In-Depth Ruby Concurrency: Navigating the Ruby Concurrency Landscape

Summarized using AI

JP Camara • November 13, 2024 • Chicago, IL • Talk

Introduction

This talk, titled In-Depth Ruby Concurrency: Navigating the Ruby Concurrency Landscape and presented by JP Camara at RubyConf 2024, dives into the complex world of Ruby concurrency, discussing when and how to use the different concurrency models available to Ruby applications effectively.

Key Points

  • Types of Ruby Concurrency:

    • Ruby concurrency comprises several layers, including processes, threads, fibers, and ractors, which can be visualized as nested dolls, each adding a layer of concurrency (see the introspection sketch after this list).
  • Processes:

    • Processes are heavy in terms of memory but run in isolated memory spaces, providing true parallelism.
    • An example discussed is Pitchfork, a multi-process server handling web requests by mapping each request to an individual process, optimizing CPU usage by adjusting process counts based on core availability.
  • Threads:

    • Threads operate in a shared memory space and achieve concurrency but not parallelism for Ruby code, due to the Global VM Lock (GVL); blocking I/O can still proceed in parallel.
    • Threads suit I/O-bound operations; Sidekiq is highlighted as a practical, effective use of threading for background jobs.
  • Reactor Pattern:

    • The reactor pattern enhances concurrency by managing multiple I/O connections efficiently without blocking threads, increasing throughput for web servers like Puma.
  • Fibers:

    • Fibers are lightweight compared to threads and allow for cooperative multitasking, meaning they need to yield control to each other.
    • The introduction of the fiber scheduler makes it possible to handle async operations transparently, making fibers more powerful in applications with high connection demands, such as the Falcon server.
  • Ractors:

    • Ractors provide a means for parallel and concurrent execution, managing memory more strictly and allowing the sharing of some components. They’re still experimental with limited library support, but libraries like Mooro are emerging.
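
As a quick illustration of those nested layers, here is a minimal sketch (not from the talk) that introspects each one at runtime:

```ruby
# Every layer of the "nesting doll" can be introspected from running Ruby code.
puts Process.pid    # the OS process we're running in
puts Ractor.current # the ractor executing this code (the main ractor)
puts Thread.current # the thread inside that ractor
puts Fiber.current  # the fiber inside that thread
```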

Conclusion

Key takeaways from the talk include:
- Maximizing vertical scalability before attempting horizontal scaling.
- Utilizing preloading to optimize memory usage in processes.
- Adjusting thread counts based on the application’s I/O percentage to ensure optimal resource usage.
- Encouraging experimentation with new concurrency models like ractors and the async fiber scheduler for future projects.
Overall, the session aimed to empower developers with greater clarity on how to navigate the Ruby concurrency landscape effectively, ensuring efficient application performance.

In-Depth Ruby Concurrency: Navigating the Ruby Concurrency Landscape
JP Camara • November 13, 2024 • Chicago, IL • Talk

When do I use a Process, or a Thread, or a Fiber? And Why? Can I use Ractors yet? What is the FiberScheduler? The M:N Thread scheduler? What's a Reactor? Do I fork, prefork, or refork? Should I care?

Do I scale up my Threads? My Fibers? My Processes? Do I use lots of lower powered, low-process horizontal scaling or high-powered, high-process vertical scaling? All of the above?

In this talk, we'll build an understanding of the Ruby concurrency landscape, and map that understanding onto concurrent gems like Sidekiq, Puma, Falcon, Pitchfork, SolidQueue and Mooro. My goal is for you to better understand how they work, what concurrency options they offer, and how you can best utilize them for scaling your applications.

RubyConf 2024

00:00:15.160 all right thank you and uh thanks for
00:00:17.199 coming to such a heady topic as your
00:00:19.439 last Talk of the day um my talk is
00:00:21.800 called in-depth Ruby concurrency
00:00:23.320 navigating the Ruby concurrency
00:00:24.640 landscape like Mike said uh when I first
00:00:27.920 found out I was doing this talk um
00:00:30.560 I told my parents I was doing a Talk on
00:00:32.320 Ruby and they were really confused um
00:00:35.640 because they thought I was doing a talk
00:00:36.840 about their dog uh and so if you've come
00:00:40.800 here to to learn about how to play with
00:00:42.719 my parents dog Ruby concurrently I'm
00:00:44.600 sorry this isn't the talk for you and
00:00:46.079 and you may want to
00:00:47.800 leave um so I'm JP Camara um I publish
00:00:51.600 technical blog posts at jpcamara.com and
00:00:53.960 over the past year I've been doing a
00:00:55.280 series on uh in-depth Ruby concurrency
00:00:57.760 basically this topic um I do some open
00:01:00.000 source on GitHub I'm active on Bluesky
00:01:02.280 at jpcamara.com and I'm a principal engineer
00:01:05.360 at Wealthbox CRM which is a CRM for
00:01:07.759 financial
00:01:09.200 advisers So today we're going to talk
00:01:11.320 about uh a group that I affectionately
00:01:13.360 refer to as the Ruby concurrency crew um
00:01:16.680 that consists of processes ractors
00:01:19.320 threads and
00:01:22.439 fibers so when I think about Ruby
00:01:24.360 concurrency I think of it sort of like a
00:01:26.759 uh nesting doll essentially as you open
00:01:29.520 up pieces of the nesting doll there's
00:01:31.600 new layers of concurrency inside and uh
00:01:35.040 at runtime in Ruby you can introspect
00:01:37.000 all those layers really easily so you've
00:01:38.920 got the process ID that you can get from
00:01:41.200 Process.pid and then you've got
00:01:42.840 ractors threads and fibers all kind of
00:01:44.360 nested within there um and you can
00:01:46.119 access all of them using .current to
00:01:47.680 introspect information about
00:01:50.399 them so the first thing we're going to
00:01:52.280 talk about at the top level of that
00:01:53.520 nesting doll is the process so processes
00:01:56.880 are parallel and concurrent and we'll
00:01:59.000 kind of dig into what that really means
00:02:00.799 exactly uh they work in isolated memory
00:02:03.240 spaces meaning that when you have
00:02:05.079 multiple processes they're uh unable to
00:02:08.119 communicate with each other, that is,
00:02:10.039 they can't share memory between each
00:02:11.400 other so things like class loading and
00:02:12.840 stuff can't be shared between each
00:02:14.400 other uh they map to an actual OS
00:02:17.040 process um so you can introspect them at
00:02:19.120 the operating system level um and
00:02:20.959 they're the most expensive form of Ruby
00:02:23.040 concurrency and we'll talk about that a
00:02:24.440 bit so we have a very simple example of
00:02:26.319 just Fork do here that'll create a
00:02:28.239 process for you
00:02:30.680 um very coincidentally Matz did a
00:02:33.200 whole section about prime numbers and
00:02:35.000 how he can generate the world's largest
00:02:36.280 prime number um and I happened to use
00:02:38.280 primes here because they are kind of an
00:02:39.800 expensive operation to do in parallel
00:02:41.800 and by the way I tried Ruby head with
00:02:44.200 the largest prime available and 10
00:02:45.879 minutes in I just had to shut my
00:02:47.239 computer off because it was just dying
00:02:48.680 under the weight of it so I
00:02:49.920 wouldn't actually try running this code
00:02:52.360 um you need to make it a little simpler
00:02:54.440 but basically I've got three ranges of
00:02:56.440 numbers from one to that very large
00:02:59.239 prime number and I'm going to walk over
00:03:01.000 them and create forks and try to figure
00:03:02.920 out what prime numbers are in my
00:03:04.959 lists and so the first thing you'll see
00:03:07.319 when I run that code is that uh the fork
00:03:10.440 actually starts running immediately this
00:03:11.920 is this is true parallelism in you know
00:03:14.239 the regular CRuby runtime and at the
00:03:17.000 same time you'll also see I've got this
00:03:18.400 process status command running at the
00:03:19.840 bottom we've got a new process that got
00:03:22.200 created so I set the process title to
00:03:24.560 Ruby with the index after it and then I
00:03:26.720 just start selecting Primes from my
00:03:28.280 numbers and while I'm doing that I'm
00:03:30.239 getting full CPU
00:03:32.519 saturation and for each one that I run
00:03:35.159 until I go to Process.waitall and
00:03:36.599 that initial process just kind of goes
00:03:38.200 off and waits I've got three
00:03:40.879 99.9% um CPU saturation processes
00:03:44.799 running at the same time.
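
The forked prime example he walks through looks roughly like this (a minimal sketch reconstructed from his description, with smaller ranges so it actually finishes):

```ruby
require "prime"

# Fork one child process per range; each child gets its own copy of memory,
# sets its process title, and saturates a core finding primes in parallel.
ranges = [1..100_000, 100_001..200_000, 200_001..300_000]

ranges.each_with_index do |range, i|
  fork do
    Process.setproctitle("ruby #{i}")
    primes = range.select { |n| Prime.prime?(n) }
    puts "process #{i}: #{primes.size} primes"
  end
end

Process.waitall # the parent just waits here for all the children to finish
```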
00:03:48.159 uh I mentioned that processes are parallel and
00:03:50.360 concurrent um I think people normally
00:03:52.239 think of just like threads and fibers as
00:03:53.640 being concurrent but really concurrency
00:03:56.120 uh whereas parallelism is sort of uh
00:03:57.959 running things simultaneously
00:03:59.319 concurrency is really about
00:04:00.519 orchestrating or composing tasks and you
00:04:03.319 essentially put those things into a unit
00:04:04.840 of work you hand it off to something and
00:04:06.680 you say run this for me in whatever way
00:04:08.439 makes sense um so for instance like
00:04:13.040 Sidekiq jobs are a great example of
00:04:13.040 concurrency and so in our case here when
00:04:15.000 we first start running because it's a
00:04:16.799 process that starts running in parallel
00:04:18.799 but in this example we've actually got
00:04:20.280 five ranges and on our weird
00:04:21.919 hypothetical computer we've only got
00:04:23.280 three cores and so as we start to go
00:04:25.840 beyond the initial three cores we have
00:04:28.000 available to us things still continue to
00:04:30.440 run things still continue to switch
00:04:31.960 around but our CPU usage for each of our
00:04:34.360 processes goes down because they're
00:04:35.680 starting to share those three CPUs and
00:04:37.960 so at this point we've got 75% and then
00:04:40.680 basically by the time we get to the end
00:04:42.000 of it we're running at about 60% CPU but
00:04:44.880 we're running concurrently we're
00:04:46.000 swapping back and forth between them
00:04:47.440 we're orchestrating these
00:04:49.600 tasks so what's an example of processes
00:04:52.160 in the real world one of my favorite
00:04:54.120 examples is a newer server from Shopify
00:04:56.400 called Pitchfork uh Pitchfork is a
00:04:58.680 multi-process what's called a
00:05:00.560 reforking server uh it maps web requests
00:05:04.199 one to one from a web request to a
00:05:06.120 process meaning for however many
00:05:07.840 processes you have that's how many web
00:05:09.400 requests you can run at any given time
00:05:11.600 and if you go over that then your
00:05:12.759 requests start to queue up and those
00:05:14.320 handle your parallel CPU and IO for
00:05:16.720 you but how do you decide how many
00:05:19.160 processes you should actually
00:05:21.560 use and for my answer is really ideally
00:05:24.680 as many processes as you have cores so
00:05:27.240 if you don't have as many processes as
00:05:28.720 you have cores you're essentially leaving
00:05:30.400 parallelism on the
00:05:32.080 table additionally if you're using a
00:05:34.400 server that only uses processes like
00:05:36.280 Pitchfork does and doesn't use any of
00:05:37.720 the other concurrency units we're going
00:05:38.919 to talk about you really want to up that
00:05:40.960 process count as well um more like 1.25
00:05:44.600 to two times as many processes as you
00:05:46.560 have CPUs and that's because of concurrency
00:05:49.919 so when you're only
00:05:52.560 running processes you really want to
00:05:53.960 take advantage of that concurrency some
00:05:55.720 things are going to block on things like
00:05:57.120 IO they're not going to be running in
00:05:58.360 parallel on CPU so you want to be able
00:06:00.479 to take advantage of that and and run
00:06:02.520 extra processes.
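
In config terms, that guidance comes out to something like this (a sketch under my own assumptions, using only the 1.25x-2x range he gives):

```ruby
require "etc"

cores = Etc.nprocessors # available CPU cores on this machine

# Process-only server (e.g. Pitchfork): over-provision so blocked I/O time
# isn't wasted, somewhere in the 1.25x-2x range.
process_only_workers = (cores * 1.5).ceil

# Process + thread server (e.g. Puma): one process per core, and let the
# threads cover the I/O concurrency instead.
hybrid_workers = cores
```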
00:06:04.440 I mentioned that processes are
00:06:06.599 expensive and one of the things that
00:06:08.160 that's most expensive about them is
00:06:09.440 they're heavy on memory and so in a
00:06:12.080 typical example you know you've got your
00:06:13.919 parent process that you're running out
00:06:15.080 of we call our Fork do like we were
00:06:16.599 doing for our prime number selection um
00:06:18.919 our child processes each have what are
00:06:21.039 called pages of memory and so those
00:06:23.440 pages of memory are completely unshared
00:06:25.000 between them but there's lots of things
00:06:26.160 that you could share in your application
00:06:27.840 you know you load your classes maybe
00:06:29.319 you've got YJIT bytecode there's lots of
00:06:31.199 things that that are important to share
00:06:32.520 but you can't do it
00:06:34.160 here and so you really want to improve
00:06:36.199 your memory usage by doing something
00:06:37.639 called preloading which you may have
00:06:39.000 seen before uh pre-loading basically
00:06:41.720 just means before I start forking I
00:06:44.240 require some amount of code um by
00:06:46.639 requiring that code in this example here
00:06:48.120 I'm requiring uh the rails application
00:06:50.199 code and I'm calling initialize and so
00:06:52.800 if you have wet running you're going to
00:06:54.080 get some bite code you're going to load
00:06:55.319 a bunch of your classes maybe some
00:06:56.800 things that are memoized or initializers
00:06:58.639 run and that all gets pre-loaded into my
00:07:00.919 parent process now when I fork my child
00:07:04.759 processes are able to take advantage of
00:07:06.360 something called copy on write so
00:07:08.000 there's pieces of memory that may never
00:07:09.560 change in my child processes they never
00:07:11.560 have to reinitialize them and so the
00:07:13.080 memory overhead on my child processes
00:07:14.879 goes down because it shares it with the
00:07:17.360 parent most servers support a preload
00:07:19.879 option so you really want to improve
00:07:21.319 your copy-on-write memory by utilizing that.
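
In a typical Puma config that preload option looks something like this (a minimal sketch assuming a standard Rails plus Puma setup, not code shown in the talk):

```ruby
# config/puma.rb
require "etc"

workers Etc.nprocessors # one worker process per core for parallel CPU
threads 5, 5            # threads inside each worker for parallel I/O
preload_app!            # load the app once in the parent, before forking,
                        # so workers share that memory via copy-on-write
```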
00:07:24.400 The most interesting thing to me
00:07:26.520 about Pitchfork is it's got this newer
00:07:28.039 concept called reforging and so what
00:07:30.199 reking is is basically the parent
00:07:32.280 process creates a child without doing
00:07:34.280 any preloading and it's called the mold
00:07:35.879 or the
00:07:37.240 template that template does the
00:07:39.160 preloading and from here things start to
00:07:41.240 pretty much look like our previous
00:07:42.520 example we've pre-loaded we've got a few
00:07:44.360 pages of memory that we can use to
00:07:46.440 create our what we call here sibling
00:07:48.120 processes but effectively child
00:07:49.400 processes so we're sharing a minimal
00:07:51.159 amount of memory we're getting some
00:07:52.240 reduced overhead we can probably use
00:07:53.759 some more concurrency and we're getting
00:07:55.520 benefits where things really get cool is
00:07:58.080 with pitchfork it has something called
00:08:00.000 refor after which means that after a
00:08:02.199 certain number of web requests it'll
00:08:04.479 actually Fork again from that mold
00:08:06.680 process and what that means is we've
00:08:08.280 gotten lots of web requests we've done
00:08:10.199 all our initializers memoization may
00:08:11.960 have happened a lot more YJIT compilation may have
00:08:14.039 happened at that point all of that stuff
00:08:16.199 is now reforked and able to be shared
00:08:18.960 with a next generation of processes and
00:08:21.360 so in this case we went from you know
00:08:22.919 three pages of memory to many pages of
00:08:25.759 memory a much more warm application um
00:08:28.639 and it continues to that at different
00:08:30.080 intervals here we have 50 100
00:08:33.640 1,000 at at Shopify it reduced their
00:08:35.880 memory usage by 30% and their latency by
00:08:39.039 nine so we kind of understand what
00:08:41.719 processes are good for in terms of
00:08:43.159 parallelization and a certain amount of
00:08:44.480 concurrency what is Pitchfork good for
00:08:46.600 so if you're currently a Unicorn user um
00:08:48.920 for your server or if you're using
00:08:50.440 Puma without threads Pitchfork might be
00:08:52.399 a good option for you uh if your web
00:08:54.680 requests are primarily CPU constrained
00:08:57.040 um Pitchfork also might be a good option
00:08:58.680 or if you just aren't aren't sure if
00:08:59.800 your code is thread safe you might want
00:09:01.480 to utilize
00:09:03.720 Pitchfork so the next thing technically
00:09:06.160 would be ractors um but I'm going to save
00:09:09.000 that for a little bit later so we're
00:09:10.120 going to come back to in a
00:09:12.279 bit so next we're going to talk about
00:09:14.680 threads uh threads are purely concurrent
00:09:17.440 whereas processes are concurrent and
00:09:19.200 parallel threads just have this
00:09:20.680 concurrent concept um they operate in
00:09:23.120 shared memory space so they don't have
00:09:24.760 to initialize a whole new block of
00:09:26.600 memory every time they can share the the
00:09:28.480 memory of their parents they do allocate
00:09:30.440 a little bit of memory but nothing near
00:09:31.920 what a process does they map one to one
00:09:34.279 for the most part with OS threads and
00:09:36.200 they're less expensive than a process
00:09:38.240 and so we've got a little example here
00:09:39.600 of Thread.new now we're going to try to
00:09:42.360 use our same prime example here um and
00:09:45.079 it's not really going to work out so
00:09:46.279 well and so the first thing you'll
00:09:48.079 notice as opposed to our process where
00:09:49.880 our process immediately started uh
00:09:51.720 running our operation in parallel
00:09:53.680 nothing really happens here at first
00:09:55.800 we're seeing our threads actually are
00:09:57.160 getting created at the operating system
00:09:58.560 level but they're operating at 0%
00:10:01.079 usage and that's because of the GVL you
00:10:03.399 may have seen some previous talks about
00:10:05.320 the GVL it's this Global Virtual Machine
00:10:07.720 Lock that every thread needs to have
00:10:09.160 access to to keep the Ruby runtime
00:10:10.800 consistent internally and so that means
00:10:12.800 our threads can only run Ruby code one
00:10:14.680 at a time uh basically until some event
00:10:18.440 comes along something blocks uh the
00:10:20.399 thread scheduler switches them they
00:10:21.920 can't do anything until that
00:10:24.200 happens and so in our case here
00:10:26.240 eventually our main thread says hey you
00:10:28.560 know threads give give me your values
00:10:30.279 which is just a way of blocking on the
00:10:31.680 thread and asking for whatever the last
00:10:33.200 value is that comes back from the thread
00:10:35.120 we set a Thread.current name so we get
00:10:36.800 those ruby 0, 1, 2 kind of options there and
00:10:39.760 then we select our primes But ultimately
00:10:42.279 we have to just keep swapping back and
00:10:43.839 forth with the gvl and so we never
00:10:45.639 achieve more than 100% of one CPU and
00:10:48.519 each thread is operating at about
00:10:50.639 33%
00:10:52.880 effectively with that in mind what are
00:10:54.920 they good for uh and they're good for
00:10:56.800 things that block so they're good for
00:10:58.639 file operation
00:10:59.800 DB calls HTTP sleep Process Management
00:11:03.160 and if you use a library like bcrypt or
00:11:05.320 zib uh those will actually release the
00:11:07.279 gvl as well and you can operate those in
00:11:09.160 parallel so each thread waits for the OS
00:11:11.639 response in
00:11:12.959 parallel so that ultimately means that
00:11:15.760 to me uh how I refer to threads is
00:11:17.560 they're really parallel-ish um
00:11:19.839 effectively like if we were to take an
00:11:21.560 example that better utilizes them so
00:11:23.360 here we're doing a slow report in the
00:11:25.120 database we're retrieving a customer
00:11:27.200 from stripe we're doing another related
00:11:29.760 API call and then we're still glutton
00:11:31.959 for punishment and we're doing more
00:11:33.160 prime number
00:11:34.560 generation and so in this case we still
00:11:36.800 have to grab that gbll first but once we
00:11:39.120 go to a blocking operation we release it
00:11:41.600 and those blocking operations can
00:11:43.200 continue to happen in parallel in the
00:11:44.959 background and so in this case what ends
00:11:46.880 up happening is we're parallel-ish
00:11:49.200 we've got three additional things
00:11:50.839 running in the background while we're
00:11:52.160 doing our prime number generation in the
00:11:55.720 foreground.
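
That threads-plus-blocking-I/O pattern looks roughly like this (a minimal sketch with a hypothetical report helper and stand-in URLs, not the talk's exact code):

```ruby
require "net/http"
require "prime"

# Each thread must hold the GVL to run Ruby code, but releases it while it's
# blocked on I/O, so the slow report and both API calls all wait in parallel
# while the main thread burns CPU on primes in the foreground.
threads = [
  Thread.new { Database.run_slow_report },                                  # hypothetical app helper
  Thread.new { Net::HTTP.get(URI("https://api.stripe.example/customer")) }, # stand-in for the Stripe call
  Thread.new { Net::HTTP.get(URI("https://api.example.com/related")) }
]

primes = (1..200_000).select { |n| Prime.prime?(n) } # CPU work in the foreground
threads.each(&:join)
```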
00:11:57.560 So what's an example of threads in the real world Sidekiq is a
00:12:00.079 perfect example um Sidekiq is a
00:12:02.160 multi-threaded job server that was kind
00:12:04.000 of one of its Innovations at the time
00:12:05.760 jobs often block on iio so it's really
00:12:08.200 valuable to have threads they can help
00:12:10.320 you paralyze your IO uh and threads are
00:12:12.800 much cheaper than processes so you can
00:12:14.279 have a lot more of
00:12:16.360 them it's pretty simple to express jobs
00:12:19.519 essentially run on a thread they pull
00:12:21.720 jobs from Redis uh job information from
00:12:24.519 Redis and then within your jobs
00:12:26.279 whenever you do something like an HTTP
00:12:27.760 call or a database call or whatever
00:12:29.800 uh those can be parallelized and so you get
00:12:31.440 better
00:12:32.760 throughput.
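
A typical I/O-heavy Sidekiq job of the kind he's describing might look like this (a minimal sketch with hypothetical names, not code from the talk):

```ruby
require "sidekiq"
require "net/http"

class SyncCustomerJob
  include Sidekiq::Job # Sidekiq::Worker on older Sidekiq versions

  # Runs on one of Sidekiq's threads; the HTTP call releases the GVL while it
  # waits, so many jobs like this can overlap their I/O in one process.
  def perform(customer_id)
    Net::HTTP.get(URI("https://api.example.com/customers/#{customer_id}"))
  end
end
```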
00:12:35.760 But how many threads should we use how much should we allocate it's
00:12:37.920 a complicated question but there is one
00:12:39.639 answer you may have seen some of Nate
00:12:40.920 Berkopec's work where he's talked about
00:12:42.880 Amdahl's Law so we don't really need to
00:12:44.880 understand this formula per se but we
00:12:46.760 should just understand a couple portions
00:12:48.320 of it so s is the proportion of our
00:12:50.240 program that can be made parallel so the
00:12:52.399 percentage in IO or CPU and then p is
00:12:55.519 the speed up factor of the parallel
00:12:57.160 portion so that's the number of processes
00:12:59.560 ractors threads or fibers that you're going
00:13:01.639 to be running in association with the
00:13:03.519 kind of I/O that you're doing.
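
Written out, the formula he's referencing is Amdahl's Law, speedup = 1 / ((1 - s) + s / p); a quick sketch of the arithmetic (my own numbers, chosen to match the ranges he quotes next):

```ruby
# s = proportion of the work that can be made parallel (your I/O percentage)
# p = the speed-up factor of that portion (number of threads, here)
def amdahl_speedup(s, p) = 1.0 / ((1 - s) + s / p)

amdahl_speedup(0.10, 5)  # => ~1.09  at 10% I/O, threads barely help
amdahl_speedup(0.25, 5)  # => 1.25   common job mix, low end of the benefit
amdahl_speedup(0.50, 10) # => ~1.82  high I/O, approaching 2x throughput
```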
00:13:06.320 So we've got this handy table here
00:13:08.639 pre-computing the formula for you
00:13:10.639 basically at the lowest end when you're
00:13:11.839 around 10% you're really not getting any
00:13:13.760 benefit out of threads you might as well
00:13:15.000 just be running processes when you get
00:13:17.160 into the more common range of around 25
00:13:19.240 to 50% CPU in your background jobs 5 to
00:13:22.399 10 threads start to help a lot from on
00:13:24.519 the low end it's about 1.25 increase in
00:13:27.079 in throughput and we get almost up to
00:13:29.120 two times throughput as you get really
00:13:31.240 really high that might be something more
00:13:32.639 like you're making just tons of API
00:13:34.240 calls or websockets or something like
00:13:35.920 that but you can get a lot of thread
00:13:37.120 benefit out of that point and a lot more
00:13:40.399 throughput so what's Sidekiq good for I
00:13:42.880 mean you're probably using Sidekiq
00:13:44.360 right now so you probably know what it's
00:13:45.519 good for but for jobs that operate with
00:13:47.199 more than 10% IO uh your app is thread
00:13:50.199 safe which most apps start out that way
00:13:52.160 so just try to keep them that way and
00:13:53.959 when you're using sidekick you know five
00:13:55.480 to 10 threads is a safe bet
00:13:59.480 how do I decide between processes and
00:14:02.240 threads and the answer is really you
00:14:04.920 want to have both and so what's a good
00:14:07.480 example of something that has both uh
00:14:10.240 Puma is a multi-process
00:14:13.279 multi-threaded server uh the
00:14:15.920 processes parallelize your CPU for you the
00:14:17.920 threads more efficiently parallelize your
00:14:20.600 IO so similar to Pitchfork when your
00:14:23.079 request comes in things can only be
00:14:24.959 parallelized uh by using uh processes
00:14:28.759 there's reactor thing for connection
00:14:30.399 handling in the middle but then we
00:14:32.680 ultimately hand off to our threads which
00:14:35.000 for something that does you know look at
00:14:36.880 those am doll's law ranges of IO you can
00:14:39.959 increase and tweak the amount of threads
00:14:41.480 you have to to benefit your throughput
00:14:44.440 more so I've thrown this reactor thing
00:14:46.800 in the middle here uh what is a reactor
00:14:49.600 and and why should you maybe care or
00:14:51.839 understand what reactors
00:14:53.720 are so reactors not ractors sorry
00:14:57.839 little buddy uh uh basically a reactor
00:15:01.680 is this Loop um that runs and interacts
00:15:04.560 with something called the kernel event
00:15:06.199 queue so pretty much every operating
00:15:08.079 system has this very highly optimized
00:15:10.160 queue where you can say hey I have this
00:15:12.320 particular operation I want to run I
00:15:13.880 want to do some HTTP and tell me when the
00:15:15.920 socket's ready to write to or read from
00:15:18.079 you can give that off to the kernel and
00:15:19.920 it it can handle thousands hundreds of
00:15:22.079 thousands of these at a time but it's
00:15:23.880 kind of inconvenient to directly deal
00:15:25.279 with that so the reactor handles that
00:15:26.839 for you you register an event handler
00:15:28.560 saying hey when when is the socket ready
00:15:30.399 to write
00:15:31.519 to. That hands off to the kernel event queue
00:15:34.680 you go off and do something else when
00:15:36.600 it's done the kernel event queue lets
00:15:37.959 the reactor know it invokes your Handler
00:15:40.040 for you and you can start writing to
00:15:41.480 your socket or reading from your
00:15:44.680 database.
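
A toy version of that loop (my own illustration of the idea, not how Puma actually implements its reactor) might look like:

```ruby
require "socket"

# Watch a server socket plus any accepted clients; IO.select asks the kernel
# which are ready, so one loop can juggle many connections without a thread each.
server  = TCPServer.new(9292)
clients = []

loop do
  ready, _ = IO.select([server, *clients])
  ready.each do |io|
    if io == server
      clients << server.accept                 # new connection: start watching it
    else
      io.readpartial(1024)                     # request data is ready, read it
      io.write("HTTP/1.1 200 OK\r\n\r\nhello") # respond and move on
      io.close
      clients.delete(io)
    end
  end
end
```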
00:15:46.800 This is the same thing as an event loop you may have heard of and
00:15:48.120 it's used by tons of stuff Puma Action
00:15:51.000 Cable EventMachine all sorts of things
00:15:52.800 it's a really powerful concept the only
00:15:54.720 reason I bring it up is because we're
00:15:55.880 going to even use it later
00:15:57.920 on so what are reactors good for they're
00:16:00.319 good for the foundation for highly
00:16:01.680 scalable IO they're good for buffering
00:16:04.319 requests and for slow clients so Puma
00:16:06.440 utilizes it so if a client is really
00:16:08.319 slowly downloading or uploading content
00:16:10.560 your threads don't get occupied the
00:16:11.959 reactor can handle thousands of these at
00:16:13.560 a time and for managing incoming
00:16:15.639 connections it's not the type of thing
00:16:17.199 you may use day-to-day in your own code
00:16:19.319 um but tools you use will use
00:16:21.720 that and then what is Puma good for it's
00:16:24.079 a general purpose web server you've got
00:16:25.959 a mixture of CPU and IO no one got fired
00:16:28.639 for choosing Puma it's got 44 million
00:16:30.480 downloads you're probably using it right
00:16:33.399 now all right the next concurrency unit
00:16:35.720 we're going to talk about are fibers uh
00:16:38.720 fibers are concurrent similar to threads
00:16:41.600 they're shared memory similar to
00:16:43.759 threads they are user space in this case
00:16:46.079 so there's no OS equivalent that we can
00:16:47.759 introspect it's something that's
00:16:49.079 essentially just managed and created by
00:16:50.959 Ruby and they're less expensive than
00:16:53.199 threads ractors processes pretty similar
00:16:55.880 interface to creating them Fiber.new
00:16:59.160 no sorry
00:17:02.480 buddy so in this case we're going to do
00:17:05.280 the same example again but what we'll
00:17:07.640 notice here is that unlike uh threads
00:17:09.839 threads need to have the gvl to be able
00:17:11.520 to run code but I've got that little
00:17:13.199 thread. main with the gvl acquired
00:17:15.039 already at the bottom that's because
00:17:16.640 fibers basically run inside of a thread
00:17:18.880 so they don't acquire the GBL but they
00:17:20.400 still can't run in parallel and that's
00:17:22.400 because fibers are actually considered
00:17:24.039 to be cooperative so fibers can actually
00:17:26.919 jump back and forth between each other
00:17:28.439 they can yield and resume they have to
00:17:30.120 communicate with each other and
00:17:31.120 cooperate with each other to be able to
00:17:32.559 be created and uh and run so at the end
00:17:35.720 here our main fiber calls resume on all
00:17:38.039 of them each one of them runs range.
00:17:40.360 select and yields the value but all we
00:17:42.760 really get is this like sequential going
00:17:45.039 back and forth between them it's just a
00:17:47.960 it just sequentially runs so it's not
00:17:50.600 really that
00:17:52.360 useful so in Ruby
00:17:55.440 3.0 there came along the fiber
00:17:57.679 scheduler and so it gives this ability to
00:18:00.280 basically put fibers kind of on steroids
00:18:02.919 um fibers essentially they still manage
00:18:05.159 the stack for you they can still operate
00:18:06.640 concurrently but we've got the
00:18:07.840 scheduler behind the scenes that will do
00:18:10.679 a bunch of extra operations for us and
00:18:12.880 will uh unlock a lot more features for
00:18:15.600 fibers and so if we take that same
00:18:17.840 reactor example that we had before and
00:18:20.080 we apply it to fibers and we'll use the
00:18:22.320 something called the async gem which is
00:18:23.840 kind of the primary fiber scheduler
00:18:25.799 implementation the first thing you do is
00:18:27.480 you create the Async block which starts a
00:18:29.159 reactor for us behind the scenes when
00:18:31.640 you call Async there you're basically just
00:18:33.640 creating a fiber behind the
00:18:35.400 scenes when we call a DB query it just
00:18:38.600 looks like a regular synchronous call
00:18:40.159 it's just the same thing we're used to
00:18:42.440 but behind the scenes what happens is
00:18:44.200 the fiber scheduler puts that into a
00:18:45.880 reactor for us it registers the event it
00:18:48.520 then puts us into a list of blocked
00:18:50.080 fibers checks for available fibers and
00:18:52.000 moves on the same thing happens with our
00:18:53.960 stripe call
00:18:55.480 here once we get to the end eventually
00:18:58.120 those operations are finished our
00:18:59.480 synchronous calls will return and the
00:19:02.039 fiber scheduler kind of handles all this
00:19:03.559 transparently for us and gives us our
00:19:05.120 result back.
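
With the async gem that shape of code looks roughly like this (a minimal sketch with hypothetical URLs, not the talk's slide):

```ruby
require "async"
require "net/http"

# The outer Async block starts the reactor; each inner Async is a fiber.
# When a fiber blocks on I/O the scheduler parks it in the reactor and runs
# another one, so both requests wait in parallel behind synchronous-looking calls.
Async do
  report   = Async { Net::HTTP.get(URI("https://example.com/slow_report")) }
  customer = Async { Net::HTTP.get(URI("https://api.example.com/customers/42")) }

  puts report.wait.bytesize   # resumes once that fiber's I/O has finished
  puts customer.wait.bytesize
end
```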
00:19:06.799 So once you have the fiber
00:19:09.320 scheduler they're good for a bunch more
00:19:11.039 stuff but they basically at that point
00:19:12.960 still just kind of look like threats
00:19:15.720 each operation becomes a part of the
00:19:17.159 event Loop in this case instead of being
00:19:19.080 put to the OS in
00:19:20.799 parallel I'll go through the same kind
00:19:22.679 of parallel is example because they kind
00:19:24.679 of get that parallel is functionality
00:19:26.480 again at the same time once they have
00:19:28.480 the fiber Schuler so we do async here
00:19:31.880 and the difference when we're doing
00:19:32.840 these async calls is they actually
00:19:34.200 automatically go into that
00:19:36.480 background and so we've got these
00:19:38.919 background processes they're being put
00:19:40.919 into the reactor and handled for us and
00:19:42.600 we can still do our um useless
00:19:46.559 calculation of prime
00:19:48.559 numbers this all seems familiar and it
00:19:51.039 seems almost kind of pointless to have
00:19:52.960 two things that nearly operate the same
00:19:54.600 way but the way that I look at them is I
00:19:57.799 I kind of compare the strengths and
00:19:59.400 weaknesses of each and so for me with
00:20:01.480 fibers um they operate more
00:20:03.600 deterministically uh meaning because
00:20:05.679 they have to cooperate there's very like
00:20:07.840 uh exact seams of where things can
00:20:09.919 actually happen where things can switch
00:20:11.559 out versus threads where you basically
00:20:13.600 can have any instruction swap while
00:20:15.640 you're in the middle of running code
00:20:17.280 they're lighter weight on me on memory
00:20:18.919 and CPU and you know if you have five
00:20:21.559 threads and five rectors this really
00:20:22.960 isn't going to matter but as you scale
00:20:24.400 up a server the scalability of the
00:20:26.280 reactor goes much higher the con is
00:20:29.760 sometimes a big one it's that they block
00:20:31.000 on CPU if a fiber starts doing a really
00:20:33.400 heavy CPU operation um there's nothing
00:20:35.919 you can do about it without it
00:20:37.039 cooperatively scheduling another
00:20:39.280 fiber so what's an example of fibers in
00:20:41.559 the real world we've got Falcon which is
00:20:43.960 a multiprocess multifiber server built
00:20:46.360 on the fiber scheduler like our previous
00:20:48.919 examples like everything in C Ruby uh
00:20:51.240 the parallelization happens from our
00:20:53.200 processes but then everything else our
00:20:55.000 connection handling our request
00:20:56.200 buffering our parallel IO all of that
00:20:57.960 just gets handed off into the fiber
00:21:00.679 scheduler and so what's Falcon good for
00:21:03.159 it's it's good really for any web app
00:21:05.120 but it's particularly good for high I/O
00:21:07.320 high connection or proxying web
00:21:09.039 applications um because of these this
00:21:11.320 fiber scheduler and the ability to scale
00:21:12.840 up these connections you can use web
00:21:14.360 sockets and http2 really easily in it I
00:21:17.440 did a benchmark recently against a
00:21:19.559 node.js websocket benchmark and Falcon
00:21:22.720 was very uh comparable in terms of the
00:21:25.039 performance which was great to see
00:21:26.159 because node.js is a highly optimized
00:21:27.799 environment for we
00:21:29.440 sockets additionally even without action
00:21:32.080 cable you can kind of just slap some
00:21:34.480 websocket functionality into your
00:21:35.799 controllers using Falcon and it just
00:21:39.159 works all right we're finally getting to
00:21:42.320 ractors and giving them their time to
00:21:43.880 shine uh ractors are both parallel and
00:21:46.880 concurrent they share memory but with
00:21:49.559 a more strict interface which we're not
00:21:51.039 entirely going to get to today they map
00:21:53.159 to a pool of os threads so they're not
00:21:54.960 directly one to one but they do get to
00:21:56.400 utilize threads behind the scene and
00:21:58.080 they're less expensive than processes
00:22:00.320 they're pretty simple to
00:22:02.159 instantiate so in our case here uh
00:22:04.799 exactly the same as when we were
00:22:05.960 demonstrating processes when we start
00:22:07.840 calling that Ractor.new we immediately
00:22:09.559 start running the uh parallel code in
00:22:11.559 the
00:22:13.360 background so that we can see that
00:22:15.360 behind the scenes when we do this
00:22:16.520 process status for our threads we are
00:22:18.480 getting actual CPU saturation every time
00:22:21.080 we create a new
00:22:22.279 Ractor and once the main Ractor calls
00:22:24.919 take on your ractors they can go through
00:22:27.279 their range select they can find our
00:22:29.159 prime numbers and then they can yield
00:22:30.640 those prime numbers at the end.
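
The Ractor version of the prime example looks roughly like this (a minimal sketch reconstructed from the description; it avoids the prime library because many libraries aren't Ractor-shareable yet):

```ruby
# Each Ractor runs in parallel on OS-backed threads, unlike plain threads under
# the GVL. Arguments are passed in explicitly; the block's return value is what
# .take yields back to the main ractor.
ranges = [1..100_000, 100_001..200_000, 200_001..300_000]

ractors = ranges.map do |range|
  Ractor.new(range) do |r|
    r.select { |n| n > 1 && (2..Math.sqrt(n)).none? { |i| (n % i).zero? } }
  end
end

primes = ractors.flat_map(&:take)
puts "found #{primes.size} primes"
```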
00:22:33.679 They work but they do need more
00:22:36.960 Community Support uh you still get a
00:22:39.240 Ractor experimental warning at the
00:22:41.000 beginning you know there's certain
00:22:42.960 common libraries that because of the
00:22:44.679 strict sharing support you can't utilize
00:22:46.440 out of the box which can be a little bit
00:22:47.840 confusing and there are some API uses
00:22:50.120 that need to be
00:22:52.400 fixed is there an example of ractors in the
00:22:54.760 real world there are people using them
00:22:56.480 but in terms of of Library support you
00:22:58.279 don't see a lot but there is a cool
00:23:00.679 library called Mooro um Mooro is a
00:23:03.400 multi-ractor experimental server uh it
00:23:06.000 uses in front of it it uses async-http
00:23:08.600 which is that same underlying fiber
00:23:10.039 scheduler Falcon technology to handle
00:23:12.240 connection handling and request
00:23:14.000 buffering but then it uses ractors to
00:23:16.159 handle parallel CPU and
00:23:18.520 IO so what's Mooro good for it's really
00:23:21.120 just good for experimentation right now
00:23:22.679 but I encourage you to to give it a look
00:23:24.440 and and try it out because it's one of
00:23:26.200 the the main examples out there of
00:23:27.640 something trying to use ractors um in a
00:23:30.279 more serious-ish
00:23:32.200 way all right that was a lot uh what are
00:23:36.039 the actual takeaways for the
00:23:38.279 talk so the first takeaway that I I want
00:23:40.960 to give you is uh maximize vertical
00:23:43.320 scale so before you start trying to
00:23:45.559 scale out horizontally you know take
00:23:47.559 advantage of everything that's exists on
00:23:49.240 your container your server whatever
00:23:51.960 always take advantage of preload you
00:23:54.000 know and if you have the option you know
00:23:55.520 use reforking uh Puma actually has an
00:23:57.880 experimental feature for reforking as
00:23:59.640 well um you get more space for
00:24:01.600 concurrency and more space for YJIT byte
00:24:03.880 code so you can run your Ruby code
00:24:05.520 faster match your Ruby process count to
00:24:08.320 available CPU count and from there you
00:24:11.360 know utilize Amdahl's Law to figure out
00:24:13.600 what kind of IO percentages you're
00:24:15.240 utilizing in each of uh and figure out
00:24:17.120 how to scale your largely your threads
00:24:19.039 from there so for instance if you have
00:24:20.960 Sidekiq running you know make sure you
00:24:22.760 take advantage of Amdahl's Law and
00:24:24.320 understand what your IO percentages are
00:24:25.760 in your
00:24:26.960 jobs because if not
00:24:29.039 you're just paying for
00:24:31.840 less and then in terms of a conceptual
00:24:34.320 compression for how uh how to understand
00:24:36.760 which one to use where I would say
00:24:39.159 whenever you're trying to parallel
00:24:40.880 parallelize IO within your own code I
00:24:43.559 would suggest using the asnc fiber
00:24:45.159 scheduler uh the asyn gem has a really
00:24:47.399 great uh interface and ergonomics and
00:24:49.760 because fibers operate more
00:24:51.600 deterministically uh it's largely a
00:24:53.559 safer way um to to write your code and
00:24:56.120 have less foot guns along the way
00:24:59.039 if you're trying to parallelize something I
00:25:00.559 would encourage you to use ractors there
00:25:02.120 are people out there using ractors for uh
00:25:05.600 parallelizing CPU operations within their
00:25:07.640 code um there's also a gem called
00:25:09.640 parallel which will run processes for
00:25:11.919 you or threads but it can also
00:25:14.159 experimentally run ractors for you so you
00:25:15.880 could use that as an abstraction and
00:25:18.000 then otherwise tune your servers and let
00:25:20.080 them do what they do best so processes
00:25:22.360 for CPU and threads fibers uh for Io and
00:25:25.960 tune them based on your server settings
00:25:29.360 there were things I didn't get to
00:25:30.480 because I think all our heads would
00:25:31.799 explode if I tried to do that but uh
00:25:34.000 basically there's an MN thread
00:25:35.919 initiative in Ruby 3.3 its threads
00:25:38.760 essentially backed by reactor um which I
00:25:40.960 think is a really cool initiative and
00:25:42.440 and it is actually available to use
00:25:44.279 under a flag um so you could try it out
00:25:46.919 today SolidQueue which is a server you've
00:25:49.399 probably heard of it's a multi-threaded
00:25:50.760 and multi-process job server which is
00:25:52.440 really nice um and then just you know
00:25:55.240 ideas for a simplified concurrency
00:25:56.760 feature but maybe that could be future
00:25:58.679 talk of some
00:25:59.760 kind uh and also I have I do have some
00:26:03.039 stickers of all the Ruby mascots if
00:26:04.919 you're interested I don't have a ton um
00:26:07.240 and I have more fibers than others I
00:26:08.799 don't know if that's some sort of
00:26:10.480 metaphor or something for how much
00:26:11.760 cheaper fibers are um but basically I do
00:26:14.480 have some stickers if you're if you're
00:26:15.799 interested in that um you can find me at
00:26:17.559 jpcamara.com or on Bluesky at jpcamara.com thank
00:26:21.279 you