How the Ruby Global VM Lock impacts app performance

Summarized using AI

Ivo Anjo • November 13, 2024 • Chicago, IL • Talk

Summary of the Talk "How the Ruby Global VM Lock impacts app performance"

In this talk, Ivo Anjo, a senior software engineer at Datadog, discusses the implications of Ruby's Global VM Lock (GVL) on application performance. He begins by explaining that the GVL, often compared to Python's Global Interpreter Lock (GIL), regulates access to the Ruby VM. This prevents multiple Ruby threads from executing Ruby code simultaneously, impacting latency and overall application performance. The key points covered include:

  • What is the GVL?: The GVL is a mechanism ensuring the correctness of the multi-threaded C program that is Ruby. It allows for concurrency (the illusion of simultaneous execution) but not true parallelism.
  • Why Concurrency Primitives are Necessary: While the GVL protects Ruby VM’s internal state, it does not safeguard application-level code, where race conditions can lead to unexpected results.
  • Impact on Performance: The GVL can significantly affect performance as adding more threads increases contention for the GVL, leading to higher latency. Through various examples, he demonstrates how threads can impact each other, especially in I/O operations or when waiting for the GVL to become available.
  • gvl-tracing Gem: The introduction of the gvl-tracing gem allows developers to visualize the state of threads and assess their waiting times and interactions with the GVL. This tool helps identify performance bottlenecks.
  • Ractors and M:N Scheduling: Anjo discusses how Ruby 3.0 introduced Ractors, which provide a way to achieve true parallelism by creating separate execution contexts, enabling multiple threads to run in parallel without being gated by the GVL. Additionally, M:N scheduling allows more Ruby threads to share fewer operating system (OS) threads, thus improving throughput.
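The I/O point in the list above can be sketched in plain Ruby (a minimal sketch; thread count and sleep duration are illustrative assumptions, not from the talk). Threads blocked on sleep or I/O release the GVL, so their waits overlap instead of adding up serially:

```ruby
# Threads that are sleeping (or blocked on I/O) release the GVL, so
# their waits overlap instead of adding up serially.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 5.times.map { Thread.new { sleep 0.5 } }
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("5 x 0.5s sleeps finished in %.2fs", elapsed)
```

Run serially, the five sleeps would take about 2.5 seconds; run on threads, the whole thing finishes in roughly 0.5 seconds.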

In conclusion, Anjo emphasizes the importance of profiling Ruby applications in real-world environments to identify performance issues related to the GVL. He encourages developers to experiment with different concurrency models, such as Ractors, to optimize application performance and manage latency more effectively. He leaves the audience with practical recommendations for tuning Ruby applications and highlights the continuous evolution of the Ruby ecosystem.


Have you ever heard of Ruby's Global VM Lock? Maybe you've heard of the GIL instead?

Spoiler: It's an implementation detail of Ruby that can have a big impact on latency and performance of applications.
At a high level, it prevents Ruby code across multiple threads from running in parallel, while still safely allowing concurrency.

Join me as we explore what the GVL is; what it is not; and how we can use the `gvl-tracing` gem to show it in action. We’ll also look into how Ractors, M:N scheduling and fibers fit into this story. By the end of the talk you’ll have the tools to evaluate if your own applications are being impacted, as well as learn a number of do’s and don’ts of Ruby performance.

RubyConf 2024

00:00:15.320 uh thanks everyone for being here and
00:00:16.920 welcome to my talk on how the Ruby
00:00:18.880 Global VM lock impacts application
00:00:21.000 performance so who am I my name is Ivo
00:00:24.400 and I'm currently a senior software
00:00:26.160 engineer at Datadog I've been a really
00:00:28.640 big fan of Ruby since I started using it
00:00:30.920 in 2014 but I've also always been a
00:00:34.079 big fan of exploring language runtimes
00:00:36.200 trying to understand what lies
00:00:37.960 beneath and how things connect and this
00:00:40.920 also got me thinking about
00:00:42.480 application performance how can we make
00:00:44.120 our applications faster how can we make
00:00:46.199 better use of the hardware that we have
00:00:48.840 and this is how I got to making tools
00:00:51.239 that allow us to uncover new performance
00:00:53.800 insights by looking at our applications
00:00:55.960 in weird new
00:00:58.079 ways this led me to work at
00:01:01.440 Datadog on the Ruby continuous profiler if
00:01:03.719 you haven't heard of Datadog we do
00:01:06.560 cloud monitoring as a service and
00:01:09.280 this includes for Ruby profiling
00:01:11.280 distributed tracing an intelligent test
00:01:13.320 runner security errors logs etc so
00:01:16.520 here's a pretty screenshot of a bunch of
00:01:18.439 metrics and some distributed tracing
00:01:20.119 going on and visualizations that we can
00:01:22.079 create from this data here's the
00:01:24.079 profiler that I work on and a flame
00:01:26.680 graph which is a visualization
00:01:28.560 that we use to show how your
00:01:30.799 resources are being spent
00:01:33.600 by the application and here's
00:01:36.000 the timeline visualization which is a
00:01:37.720 bit more of a deep dive by thread
00:01:41.040 so that's my employer let's talk
00:01:44.280 today about what's the
00:01:47.000 GVL how it can impact your
00:01:49.399 application latency and what you can do
00:01:51.840 about
00:01:52.640 it so as I said before I started in 2014
00:01:59.000 using Ruby and kind of shortly after
00:02:01.360 that at some point someone
00:02:04.399 either told me or I saw some references
00:02:06.719 that Ruby had this GVL or maybe it
00:02:09.000 was the GIL thing and at some point
00:02:12.239 someone told me oh yeah that's why Ruby
00:02:14.239 doesn't use multiple cores and I went
00:02:16.400 ah yeah okay and then what something
00:02:20.560 really interesting
00:02:22.319 happened in 2022 which was Ruby 3.2
00:02:25.840 introduced the GVL instrumentation API
00:02:28.519 and this API allows us to get
00:02:31.720 events on what the GVL is doing
00:02:36.120 and so I started experimenting with
00:02:38.760 building a timeline visualization for
00:02:40.680 this data and that's how I created the
00:02:43.360 gvl-tracing gem which produces
00:02:45.280 visualizations such as this one and I
00:02:47.840 even got a retweet by Matz which is
00:02:49.560 always cool so let's talk about what
00:02:53.920 is the global VM lock or GVL so when
00:02:58.800 you're thinking about
00:03:00.200 your Ruby application the problems you
00:03:02.000 try to solve etc you're usually not
00:03:03.840 thinking about the Ruby part directly
00:03:07.040 but the Ruby part the Ruby VM is a big
00:03:09.440 program that's written in C and has some
00:03:11.599 Ruby and has some Rust as well you might
00:03:14.400 also hear it called CRuby or
00:03:17.159 MRI because there's a few other Ruby
00:03:19.959 implementations so this is a way we
00:03:22.319 sometimes call it to distinguish oh it's
00:03:24.200 this one not JRuby or TruffleRuby or
00:03:26.799 some of the
00:03:27.799 others and in your Ruby
00:03:30.599 application when you create a
00:03:32.480 couple of threads or maybe it's not you
00:03:34.640 creating threads maybe it's your web
00:03:36.799 server such as Puma or maybe it's your
00:03:39.640 background job processor such as
00:03:41.799 Sidekiq they usually get mapped one
00:03:44.840 to one to operating system threads I say
00:03:47.239 here usually I'll put a pin on that and
00:03:49.599 come back to it later but let's go
00:03:52.079 with this for now and so each Ruby
00:03:54.360 thread that you have is mapped to an
00:03:56.439 operating system thread and why
00:04:00.720 should we care about this we care
00:04:02.360 about this because going back to Ruby
00:04:04.480 the Ruby VM is a big program
00:04:07.519 that's written in C and this big program
00:04:10.200 uses multiple OS threads so it's a
00:04:13.120 multi-threaded C application so this
00:04:16.120 brings us to the definition of what the
00:04:18.120 GVL is the global VM lock is a mechanism
00:04:21.759 to ensure correctness of CRuby the
00:04:24.479 multi-threaded C program and this both
00:04:27.360 tells us a lot and not much so
00:04:29.720 let's try to unpack this a bit let's
00:04:32.199 start with what's a lock a lock is a
00:04:34.440 mechanism that allows only one thread
00:04:36.440 the thread that's holding the lock to
00:04:38.240 work at a time so the metaphor here is
00:04:40.880 for instance a swing only one person
00:04:42.960 can be on the swing and if other people
00:04:44.600 want to go on it they need to wait and
00:04:46.440 if you try to get multiple people on the
00:04:48.280 swing at the same time maybe you can do
00:04:50.440 it but bad things are going to
00:04:52.440 happen and so when a thread inside the
00:04:54.960 Ruby VM is running Ruby code or
00:04:57.680 somehow interacting with the Ruby VM data
00:05:00.880 it needs to be holding this GVL no other
00:05:03.919 thread can be doing the same and the
00:05:05.840 other threads that want to do so
00:05:07.440 need to wait for their
00:05:09.160 turn you might have also heard about the
00:05:12.039 global interpreter lock or GIL so what
00:05:14.360 is the global interpreter lock what's
00:05:15.960 the GIL it's basically what Python
00:05:18.280 calls their GVL and because Python is a
00:05:20.639 very well-known well-used programming
00:05:22.400 language sometimes people just say yeah
00:05:24.039 yeah Ruby has the GIL it's the same
00:05:25.840 thing and actually old versions of Linux
00:05:28.479 actually had the BKL the big kernel
00:05:31.120 lock which was the same idea so
00:05:34.120 naming is hard people come up with
00:05:35.840 different names so if
00:05:37.759 someone says GIL that's fine just keep
00:05:40.520 going um you might also have seen
00:05:44.680 this PR where Koichi actually
00:05:47.600 mentioned that well the GVL is no longer
00:05:49.840 global so it's no longer the global VM
00:05:51.840 lock so maybe we should use other
00:05:55.039 words and a bunch of functions inside
00:05:56.759 Ruby were renamed to match this but
00:05:59.039 actually some functions still use
00:06:00.560 the name GVL so Koichi proposed maybe
00:06:03.080 it's great valuable lock or something
00:06:05.120 like that so don't worry too much about
00:06:07.360 what it's called it's more the concept
00:06:09.000 and what it's doing inside the Ruby
00:06:11.599 VM so what is the GVL not and this is an
00:06:15.160 important part the GVL allows
00:06:17.400 concurrency which is the illusion of
00:06:19.479 having multiple threads running Ruby
00:06:21.240 code at the same time so you do
00:06:22.960 something then you switch then you do
00:06:24.680 something else then you switch then you
00:06:26.199 switch and if you do this fast it
00:06:28.360 kind of looks like you're doing multiple
00:06:30.199 things at the same time but it does not
00:06:32.520 allow parallelism so actually running
00:06:35.000 Ruby code at the same time on multiple
00:06:37.120 cores is not possible because of the GVL
00:06:40.280 waiting for I/O can be parallel which is
00:06:43.199 something we do in all our
00:06:45.840 apps very commonly but that's waiting
00:06:47.840 that's not running Ruby code if
00:06:50.919 you want to think about it it's like on
00:06:52.280 a 90s single-core PC where maybe Matz
00:06:55.120 started developing Ruby the image credit
00:06:57.440 here goes to me because I'm a huge
00:06:59.120 hoarder of old hardware and this
00:07:01.840 one is sitting somewhere in my
00:07:03.840 father's
00:07:05.400 basement um so another thing that comes
00:07:09.240 up sometimes when I see people
00:07:10.599 discussing the GVL is if the GVL exists
00:07:13.800 why do we need Ruby concurrency APIs
00:07:16.919 such as Mutex ConditionVariable
00:07:19.560 Queue SizedQueue the whole
00:07:23.240 concurrent-ruby gem that offers a bunch
00:07:25.360 of really nice utilities etc and the
00:07:28.840 answer is and this is a very key
00:07:31.240 question because the GVL only
00:07:33.680 protects the Ruby VM data or state more
00:07:36.720 specifically data from your app might
00:07:39.280 not end up as what you wanted and let me
00:07:41.400 give you an example of what I mean
00:07:43.639 here's a very simple example where we
00:07:45.840 have two variables and we're trying to
00:07:47.479 one by one take from one and then
00:07:49.720 put in the other so at the end you would
00:07:52.039 expect that the result would be the same
00:07:54.240 that you started with but one of the
00:07:56.240 times I ran this app on my machine I
00:07:58.440 actually got this result
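A sketch of this kind of two-variable transfer (my reconstruction of the idea; the talk's exact code is not shown in the transcript, and the names and counts here are made up). Without the Mutex, the read-modify-write steps from different threads can interleave and updates get lost; with it, the total is preserved:

```ruby
# Reconstruction of the "take from one variable, put in the other" race
# (illustrative names and counts, not the talk's exact code).
balance_a = 10_000
balance_b = 0
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    2_500.times do
      # Without this synchronize block, the read-modify-write of each
      # variable can interleave between threads and lose updates.
      lock.synchronize do
        balance_a -= 1 # take one from a...
        balance_b += 1 # ...and put it in b
      end
    end
  end
end
threads.each(&:join)

puts balance_a + balance_b # the total stays 10_000 with the Mutex held
```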
00:08:00.319 and this result demonstrates why we
00:08:02.800 still need concurrency
00:08:04.400 primitives in Ruby it's because this is
00:08:06.720 a valid result for the Ruby VM but it's
00:08:09.319 incorrect at the application level and
00:08:11.759 there's actually a name for this it's
00:08:13.120 called a race condition or just a race
00:08:14.960 in computer science and the intuition
00:08:17.199 here is because we have here a
00:08:20.599 bunch of threads and all of the threads
00:08:22.560 are racing each other to do the same
00:08:24.759 thing at the same time and so that's why
00:08:27.280 we get an incorrect result if instead we
00:08:31.319 modify this application to actually
00:08:32.959 use a mutex then we get the correct
00:08:35.159 result so this is why we still need
00:08:36.839 things like mutexes etc because
00:08:39.279 otherwise our code will not be correct
00:08:41.120 if there are any races in things like
00:08:43.959 this okay so one insight is that the GVL
00:08:48.120 does not by itself protect application
00:08:50.200 correctness when Ruby code is running
00:08:52.240 in multiple threads and has races if
00:08:54.640 your code is written in such a way that
00:08:56.519 you don't get into these kinds of problems
00:08:58.480 then you don't have to use mutexes but
00:09:01.760 in some cases you might need to except
00:09:05.399 sometimes the GVL actually kind of does
00:09:07.880 protect your Ruby code often by
00:09:10.079 accident or implementation detail but
00:09:12.279 you definitely should not rely on this
00:09:14.079 for instance in Ruby 3.4 some
00:09:16.320 methods that were
00:09:18.279 previously written in C are now being
00:09:20.519 rewritten in Ruby for the VM so actually
00:09:23.480 those methods might get a different
00:09:25.240 behavior regarding the GVL now so you
00:09:27.680 really shouldn't rely on this
00:09:30.000 and even if you kind of tested it and it
00:09:32.320 seems to work fine on your machine be
00:09:34.320 careful because sometimes on your
00:09:36.200 machine it works but it will break in
00:09:38.480 production because production has a lot
00:09:40.760 more customers a lot more requests all
00:09:42.680 coming in at the same time so be careful
00:09:46.480 there um a quick aside is that the JRuby
00:09:49.480 and TruffleRuby implementations
00:09:51.440 actually don't use a GVL they use other
00:09:53.680 mechanisms to protect their own VM data
00:09:55.920 so they are able to get both concurrency
00:09:58.000 and parallelism but you can still get
00:10:00.680 race conditions so that part
00:10:03.720 does not go away okay let's talk about
00:10:06.519 the most interesting part which is how
00:10:08.560 the Ruby global VM lock actually impacts
00:10:10.800 application performance and to do that
00:10:13.160 we're going to do a few experiments
00:10:14.839 using the gvl-tracing gem and you can
00:10:16.880 also use it on your own application to
00:10:18.880 do so you can add it to your Gemfile it
00:10:21.000 needs Ruby 3.2 or later because it
00:10:23.040 needs the GVL instrumentation API then
00:10:25.560 you can start it and wrap some of your
00:10:27.320 code with it and then you get an
00:10:29.079 output file that you can take to this
00:10:30.839 website Perfetto and look at it Perfetto
00:10:34.040 is this nice website that the Google
00:10:35.839 folks built for this kind of performance
00:10:38.160 visualization data and the gvl-tracing
00:10:40.800 gem that I created just kind of outputs
00:10:42.839 in this format so that you can go there
00:10:44.519 and watch it okay I have a bunch of code
00:10:47.600 examples in my talk and if you
00:10:50.800 want to later experiment with it go on
00:10:53.320 the Perfetto website etc here's a link for
00:10:56.079 it I'll post the link again
00:10:58.040 at the end but yeah let's get
00:11:01.040 started with a few examples so you can
00:11:02.639 kind of see how the GVL is impacting
00:11:04.639 your
00:11:05.399 application here's an example of the
00:11:07.600 visualization that we produce and let
00:11:10.320 me explain how you read it different
00:11:12.800 lines represent different threads and
00:11:16.320 time flows from left to right and what
00:11:18.320 we're seeing is the state of each Ruby
00:11:20.240 thread over time some of the states
00:11:23.279 we'll be showing are started tracing
00:11:26.120 when we start the gem then we have
00:11:28.880 when a thread starts or when a thread
00:11:31.320 dies then we have the waiting state and
00:11:33.720 the waiting state is when a thread is
00:11:35.959 waiting for something either more work
00:11:37.800 to do or it's sleeping or it's
00:11:40.560 waiting for a response from the database
00:11:42.959 or something like that we have the all
00:11:45.480 important running state when actually
00:11:47.440 interesting things are happening Ruby
00:11:49.160 code is running your application is
00:11:50.760 making progress and then we have the key
00:11:53.320 critical state which is the wants GVL
00:11:55.600 state this means that the thread
00:11:57.680 actually has work to do it's like
00:11:59.399 ready to go but it's waiting because
00:12:01.639 another thread is running so
00:12:04.399 when you see this your application is
00:12:06.160 kind of being impacted in terms of
00:12:09.320 latency there's also garbage collection
00:12:11.560 although we'll not cover it a lot today
00:12:14.680 okay so this example that I have there
00:12:16.639 the source code is here on
00:12:18.920 the left basically I start two
00:12:21.279 threads and both of them are trying to
00:12:23.199 increment this counter because the
00:12:25.000 counter is a local variable each
00:12:27.560 one gets their own counter so there's no
00:12:29.199 racing here and you can kind of see that
00:12:32.079 well we have two threads they are
00:12:34.440 incrementing their own counter we see
00:12:36.279 the concurrency but not the parallelism
00:12:38.399 because you can kind of see when one is
00:12:39.880 running the other is waiting and then
00:12:41.639 they kind of go back and forth back and
00:12:43.720 forth they are taking their
00:12:46.279 turns and when multiple threads want to
00:12:48.680 work what we're seeing is that we'll be
00:12:50.240 switching between them every 100
00:12:51.959 milliseconds so that period you see
00:12:54.480 there is 100 milliseconds and both of
00:12:56.600 them are getting some time and
00:12:58.279 they're going back and forth let's add a
00:13:01.079 few more threads once you add more
00:13:03.160 threads it means that each thread
00:13:06.040 needs to wait longer for their turn
00:13:07.880 so we still see the 100
00:13:10.160 milliseconds period but we see that most
00:13:12.760 of them spend a lot of time waiting and
00:13:14.959 actually if we kind of zoom in on
00:13:17.639 one of the threads what we're seeing is
00:13:19.760 that the thread works for 100
00:13:22.279 milliseconds then needs to wait for 900
00:13:24.839 in this case then works then runs again
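The kind of program behind these counter timelines can be sketched as follows (my reconstruction; the thread and iteration counts are made up). Each thread increments its own local counter, so there is no race, but since the work is pure Ruby code the threads still take turns holding the GVL rather than running in parallel:

```ruby
# Each thread increments its own local counter, so no Mutex is needed;
# the GVL still means only one thread executes Ruby code at any instant,
# so this CPU-bound work gets concurrency but not parallelism.
threads = 10.times.map do
  Thread.new do
    counter = 0
    1_000_000.times { counter += 1 }
    counter # the block's value is readable later via Thread#value
  end
end

results = threads.map(&:value)
puts results.uniq.inspect # every thread counted to 1_000_000 on its own
```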
00:13:28.120 so this gets us to key insight
00:13:29.880 number two which is as we add more
00:13:31.920 threads to a Ruby app we can impact the
00:13:33.720 latency of each individual thread if all
00:13:36.440 of them are fighting for the global VM
00:13:38.360 lock of course this is not a very
00:13:40.760 realistic Ruby example most Ruby web
00:13:43.480 apps are not like this they have a lot
00:13:45.279 of network calls they go to the database
00:13:47.399 they do a bunch of other things and
00:13:49.639 not as much processing but this is just
00:13:52.000 to show you a very extreme case of
00:13:54.199 what can happen if all of your threads
00:13:55.839 are trying to make progress but
00:13:58.480 they're all fighting to get the
00:14:00.920 GVL another key insight we
00:14:03.560 can derive from this is that if you're
00:14:05.160 using threads your app latency no longer
00:14:07.880 depends on each request or method's
00:14:09.759 performance you can super optimize
00:14:12.199 your Ruby code you can have the
00:14:13.759 best gem ever that is ultra
00:14:16.519 optimized and it will still be impacted
00:14:19.079 if it needs to wait for the GVL before
00:14:21.199 it can actually run Ruby code and do the
00:14:23.079 super optimized thing so saying it
00:14:26.519 another way a specific Ruby web request
00:14:28.759 may be slow not because of anything that
00:14:31.480 the request is doing it may be
00:14:33.399 really well built very optimized
00:14:35.680 etc but instead because of some other
00:14:38.079 request or thread that was doing a
00:14:40.440 lot of work in the background so if you
00:14:43.120 heard of the term noisy neighbors Ruby
00:14:45.120 threads are noisy neighbors even if you
00:14:47.440 have multiple threads that are
00:14:48.759 supposedly working on separate things
00:14:51.000 they can impact each other in this
00:14:53.480 way let's look at another example so
00:14:56.279 in this case we have two threads
00:14:58.720 but one kind of finishes its work
00:15:00.759 the bottom one and just kind of waits
00:15:02.399 for more work and the top one keeps
00:15:04.279 running so this shows that the whole
00:15:07.240 100 milliseconds period I was describing
00:15:09.279 only applies when you have multiple threads
00:15:11.079 that are trying to acquire the GVL at
00:15:13.880 the same time if only one thread wants
00:15:15.880 it Ruby will give it as long as possible
00:15:18.519 and it can kind of keep executing
00:15:20.440 that's fine again when a thread is
00:15:24.639 doing a bunch of network calls it will
00:15:26.560 give up the GVL while waiting for a
00:15:28.480 response so you can kind of see in this
00:15:30.399 example you have a thread that runs then
00:15:32.360 waits a bit for the response then runs
00:15:34.440 then waits then runs then waits and
00:15:36.480 while it's waiting it yields the
00:15:38.959 global VM lock so other Ruby threads can
00:15:41.240 work so this is why if you have
00:15:44.199 a lot of network calls at the same time
00:15:46.199 Ruby can handle this
00:15:47.920 perfectly well because they don't
00:15:50.240 need the
00:15:51.480 GVL okay so we've said that threads each
00:15:54.600 get 100 milliseconds and that seems fair
00:15:57.800 are there cases where Ruby is actually
00:16:00.880 unfair and to answer this question let's
00:16:03.279 go back to the network call example and
00:16:05.560 here I've slightly
00:16:08.880 modified my example to just kind of
00:16:11.199 benchmark how fast I can fetch
00:16:13.720 the Google front page and you can
00:16:17.000 see that most of the
00:16:19.000 time is spent waiting in fact you can't
00:16:21.399 really see the running periods
00:16:24.360 that are going on in there because you
00:16:25.759 don't see just one waiting you just see
00:16:27.639 bits of waiting wait wait wait
00:16:30.240 which means that the states are
00:16:31.839 split there's a bunch of waiting there
00:16:34.880 and when I run this on my machine I
00:16:37.800 got around 17 requests per
00:16:41.000 second which is not great but well my
00:16:43.560 internet connection is not great okay so
00:16:47.079 let's try an experiment let's
00:16:48.959 add a noisy neighbor we have this where
00:16:51.680 we're trying to fetch the Google
00:16:53.880 homepage as fast as possible what would
00:16:56.399 you expect to happen if I added
00:16:59.000 a background thread that's just
00:17:00.680 off in the corner just
00:17:02.319 counting numbers these things are
00:17:05.319 completely separate what would you
00:17:07.120 expect to happen well what happens in
00:17:09.640 reality is this network performance
00:17:13.360 goes way down and I'm now able to do
00:17:15.439 two requests per second instead of 17 on
00:17:17.959 my network connection and the gvl-tracing
00:17:20.760 gem can explain what's going
00:17:22.720 on here before we had just a bunch
00:17:25.039 of waiting and now what happens is that
00:17:28.079 the thread doing network requests
00:17:30.880 spends most of its time waiting for the
00:17:33.039 GVL and the thread on the
00:17:35.679 bottom the one that's doing the
00:17:37.080 counting spends most of its time running
00:17:39.240 and hogging the GVL and actually we need
00:17:42.679 to zoom in a lot on the
00:17:44.480 visualization to understand what's
00:17:46.039 happening and what's happening here is
00:17:48.520 that the thread that's doing the
00:17:50.440 network calls runs for a bit of
00:17:52.480 time and sets up the network call and
00:17:54.000 then starts waiting and because
00:17:57.240 it starts waiting it yields the GVL and
00:17:59.559 so the bottom thread now has 100
00:18:01.760 milliseconds to run and so the thread
00:18:03.960 doing the network calls is ready
00:18:06.159 to do something a few
00:18:08.440 milliseconds after but by that time it
00:18:10.640 already lost the GVL so it needs to wait
00:18:13.000 for 100 milliseconds and this repeats
00:18:15.240 and repeats and repeats and that's why
00:18:17.799 this is so impacted as
00:18:20.039 well and actually this happened to
00:18:22.520 me when I was doing some experiments and
00:18:24.440 that's how I came up with this example
00:18:27.400 based on past experience where I
00:18:29.200 just had a background thread doing some
00:18:31.760 things and then suddenly my network was
00:18:33.919 really slow and I was really confused
00:18:35.559 about what was happening so the key insight
00:18:38.440 from here is that the global VM lock is
00:18:40.400 not always
00:18:41.799 fair um if you're using
00:18:44.440 threads if you're using fibers you get
00:18:46.679 more utilization you get more throughput
00:18:49.400 but this can impact latency so it's
00:18:51.880 useful to be aware of the trade-off
00:18:53.520 you're making here so as a bit of a
00:18:57.200 full disclaimer Ruby switches
00:18:59.760 threads every 100 milliseconds that the
00:19:01.760 thread spends running Ruby code or when
00:19:04.520 a thread is about to block when it's
00:19:06.200 doing I/O this includes network file
00:19:09.400 etc or it's trying to acquire a lock
00:19:12.039 and the lock is not available or you
00:19:13.919 sleep or you call Thread.pass or at
00:19:17.000 certain points when running C or native
00:19:19.039 code like when you go into the VM or
00:19:20.840 into a native extension there's ways for
00:19:22.919 the VM or a native extension to say oh I
00:19:25.280 don't want the GVL now you can have it
00:19:27.600 and then I want it again
00:19:29.600 okay let's talk a bit about
00:19:32.280 Ractors and M:N scheduling which brought
00:19:34.559 a lot of really cool things to Ruby
00:19:36.320 to make this better so Ractors
00:19:39.919 were introduced in Ruby 3 as a new
00:19:41.600 concurrency primitive and Ractors
00:19:43.760 provide parallelism and communicate via
00:19:45.880 messages so let's try to modify some of
00:19:48.480 the examples I've shown you to use
00:19:50.240 Ractors so for instance in the
00:19:52.679 network and counter example I've just
00:19:54.840 changed it to use a background
00:19:56.760 Ractor instead of a background thread and it
00:19:59.200 fixes the issue completely so now we see
00:20:01.559 that the top thread is able to
00:20:03.440 spend most of its time waiting and
00:20:05.320 whenever data arrives on the network it
00:20:08.080 can progress again immediately and the
00:20:09.960 bottom thread can do its counting
00:20:12.120 thing continuously so very successful
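Running the counting work in Ractors instead of threads might look roughly like this (a sketch with made-up counts, not the talk's exact code; Ractors are still experimental and Ruby prints a warning when you create one):

```ruby
# Each Ractor has its own lock (a per-Ractor GVL), so these CPU-bound
# loops can run on separate cores in parallel instead of taking turns.
# Note: Ractors are experimental; Ruby warns on first use.
ractors = 4.times.map do
  Ractor.new do
    counter = 0
    1_000_000.times { counter += 1 }
    counter # the Ractor's result, collected with Ractor#take below
  end
end

results = ractors.map(&:take)
puts results.uniq.inspect
```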
00:20:16.440 even if you remember the example
00:20:18.240 where I had a bunch of counters now
00:20:20.000 the threads don't have to wait
00:20:21.720 for each other they all can run at the
00:20:23.640 same time in parallel and so we get a
00:20:26.080 lot better performance so insight number
00:20:28.919 seven Ractors are parallel as expected and
00:20:31.520 that works nicely but actually there's a
00:20:34.679 bit more to it let's talk about
00:20:36.760 Ractors in Ruby 3.2 versus 3.3 and
00:20:40.000 after the picture I was showing you was
00:20:42.960 taken from Ruby 3.2 where we had 10
00:20:45.200 threads all running at the same
00:20:47.240 time when I ran this example on Ruby 3.3
00:20:50.480 I actually saw this where a bunch of
00:20:52.640 threads were working but the bottom ones
00:20:54.799 weren't yet so let me zoom in
00:20:57.520 on this so what we're seeing here is
00:21:00.760 that eight threads each
00:21:03.360 inside their own Ractor are
00:21:05.600 able to run but the two bottom Ractors are
00:21:08.679 not yet so why is this we see the
00:21:12.760 answer if we look at the Ruby 3.3
00:21:15.360 release announcement where there's this
00:21:17.279 section about the M:N thread scheduler
00:21:20.320 where it says the RUBY_MAX_CPU
00:21:23.360 environment variable sets the maximum
00:21:25.080 number of native threads the default value
00:21:27.360 is eight this is where that eight is
00:21:29.720 coming from and actually if for
00:21:31.880 instance you set it to four you see a
00:21:34.440 slightly different behavior where Ruby
00:21:36.080 is only using four native threads at
00:21:38.240 a time and you need to consider
00:21:41.000 that the top thread is running on the
00:21:43.080 main Ractor and the main Ractor does not
00:21:45.000 get M:N scheduling by
00:21:47.760 default so that's why you see the weird
00:21:49.400 behavior there where we have four
00:21:52.960 threads executing then the top
00:21:55.080 thread keeps its own execution even
00:21:58.200 though it's not doing anything and then
00:22:00.640 three more can execute and then three
00:22:02.360 more can execute so we are now ready to
00:22:05.039 go back to that part where I said
00:22:06.919 threads uh Ruby threads usually get
00:22:09.159 mapped to one to one to operating system
00:22:11.600 threads and we can kind of say that this
00:22:14.000 is no longer through with the MN
00:22:16.080 scheduling and we now can have multiple
00:22:19.240 threads here we have uh 10 Ruby threads
00:22:21.919 that are mapped to four operating system
00:22:25.320 thread uh with the exception well of the
00:22:28.200 main reactor if we don't have MN
00:22:30.440 scheduling enabled on the main Rector to
To be able to visualize this, think about it, and experiment a bit, I actually introduced a new threads view in the gvl-tracing gem, which I call the OS threads view. The idea is that at the top we see the Ruby threads (the "M" in M:N), so we see that we're using 10 Ruby threads, each inside their own Ractor, and at the bottom we see what the operating system is seeing, which is actually four operating system threads. We can see that the top thread has its own native thread and keeps executing alone, while the other Ractors all share the remaining three threads; once three Ractors are done, three more can execute, and then three more, etc.
You can also maybe spot something that I found a bit weird, which is: wait, those Ractors are running for like 10 seconds, and only after those 10 seconds can other Ractors run. That doesn't match the 100 milliseconds I was expecting to see. I believe this is a bug, and I'm chatting with the Ruby core folks about how this can be improved.
Another thing I've been saying so far is that the main Ractor does not get M:N scheduling unless you enable it, which you do with the RUBY_MN_THREADS environment variable. If you do, you will get M:N scheduling for all threads.
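If you want to try this yourself, the invocation might look like this (a sketch: `my_ractor_demo.rb` is a hypothetical script of yours; both variables shipped with Ruby 3.3):

```shell
# Cap the M:N scheduler at 4 native threads, and opt the main Ractor's
# threads into M:N scheduling as well (off by default in Ruby 3.3).
RUBY_MAX_CPU=4 RUBY_MN_THREADS=1 ruby my_ractor_demo.rb
```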
I actually did a few experiments, and, for instance, if you go back to the example where I had 10 threads all fighting for the GVL, I see this really weird behavior when I enable M:N scheduling: each of the threads gets 100 milliseconds the first time, but after that the main thread keeps getting 100 milliseconds while the others only get 10 milliseconds at a time. You can see that the main thread keeps its own OS thread and the other threads are all sharing the same OS thread. This also looks weird, and is probably something that can get fixed, maybe tomorrow during the hack day.

So key insight number eight is that timeline visualizations are really cool. Ruby 3.3 has been out since December last year, and maybe people haven't noticed some of these things; at least I didn't see any reference to them. Once you get the right way of looking at your performance, some of these things really pop out, and you can start going, "that seems weird."
Okay, let me say a few more things. The Shopify gvl_tools gem is also really cool because it provides GVL instrumentation as metrics: you can get a metric for how long threads are waiting to acquire the GVL, and also how many threads are waiting for the GVL, so you can use this kind of thing for monitoring. Also, the Datadog Ruby continuous profiler, which I work on, has added support for GVL information. It's still off by default, but you can enable it, and you get the same kind of timeline visualization. Here's the counters example I was showing earlier: you can see the 900 milliseconds spent waiting for the GVL, you can get a stack trace to understand what exactly was affected, and you can even look at it per web request. I have a lot of fun working on this.
Okay, so I've told you a bunch of things about performance, threads, Ractors, etc. What should you do with all this information? Well, keep using threads and fibers: in most cases, if it works for you, it will keep on working, so don't suddenly panic. You should avoid using too many threads if you are latency sensitive. A good rule of thumb to start from, if you want one: on a 4 vCPU machine with around 8 gigabytes of memory, start Puma with four workers and five threads, and tune from there.
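As a concrete starting point, that rule of thumb would look like this in a Puma config file (a sketch, not a universal recommendation; measure and tune from here):

```ruby
# config/puma.rb: starting point for a 4 vCPU / 8 GB machine
workers 4      # four forked worker processes, roughly one per vCPU
threads 5, 5   # five threads per worker (min, max)
```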
You should also think about separating latency-sensitive code from other things. Remember that the performance of one of a Ruby app's endpoints impacts the others in the same app, so if you have something like a report-generation endpoint that takes a while, don't mix it into the same web app as things you want to be fast and to respond to requests with really good performance.
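This cross-endpoint impact can be sketched with plain Ruby threads (my own illustration, not from the talk; the numbers depend heavily on the machine). The same CPU-bound task is timed twice, once alone and once while nine busy threads compete for the GVL:

```ruby
# A CPU-bound "endpoint": pure Ruby work that holds the GVL while running.
def cpu_task
  sum = 0
  1_000_000.times { |i| sum += i }
  sum
end

def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

solo = time_it { cpu_task } # baseline: no contention

# Nine "noisy neighbor" threads doing CPU-bound work of their own.
stop = false
noisy = 9.times.map { Thread.new { cpu_task until stop } }

contended = time_it { cpu_task } # same task, now fighting for the GVL

stop = true
noisy.each(&:join)
```

On a typical machine `contended` comes out several times larger than `solo`, even though the measured task did the exact same work: it simply spent most of its time waiting for the GVL.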
Also, don't guess: experiment and measure. You can use the gvl_tools gem to measure, and the gvl-tracing gem to observe. I recommend profiling or benchmarking in production, or at least trying to set up a realistic environment, because as you've seen here, this kind of thing only happens when you have a bunch of concurrency. If you run a very minimalistic example on your local machine, which doesn't have a lot of traffic, you will not see these problems; they show up when you have a lot of traffic, when you're in production.
You can also try Puma or another web server in a one-thread-per-process configuration and see how that behaves; for instance, see whether your latencies change a lot or not. If they do, maybe there's something you should tune and look at improving. Also, Ractors, JRuby, and TruffleRuby can unlock a bunch of performance and latency benefits, so consider trying them out and experimenting. Remember that the GVL is not meant to protect your code, even if it sometimes does so accidentally. And remember that garbage collection is also not parallel (a future talk idea), and it's also a bit of a noisy neighbor: if you have an endpoint that allocates a lot, it will impact everything in your app, so keep this in mind as well.
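The GC noisy-neighbor point can be seen with a tiny sketch (my own illustration, not the talk's): allocation-heavy code drives minor GC runs, and those pauses are felt by every thread in the process, not just the allocating one.

```ruby
# Count how many minor GC runs an allocation-heavy "endpoint" triggers.
before = GC.stat(:minor_gc_count)

200_000.times do
  { "key" => "value" } # throwaway objects, allocated and dropped
end

after = GC.stat(:minor_gc_count)
gc_runs = after - before
```

Each of those `gc_runs` is a pause shared by the whole process, which is why one allocation-heavy endpoint can degrade latency for every other endpoint in the app.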
The future is bright: Ruby is always improving and evolving. As I was preparing this presentation last week, someone linked to a proposal that Aaron made, also last week, about adding an environment variable to configure the 100 millisecond thread-switch interval, to maybe allow this to be a bit more flexible. That's what I mean when I say Ruby is always improving, always evolving, and I'm really excited to be part of this community. Thank you, that's all I had. Feel free to email me or catch up with me on Twitter, and here's the link again.
Explore all talks recorded at RubyConf 2024