Thanks everyone for being here, and welcome to my talk on how the Ruby global VM lock impacts application performance.

So, who am I? My name is Ivo, and I'm currently a senior software engineer at Datadog. I've been a really big fan of Ruby since I started using it in 2014, but I've also always been a big fan of exploring language runtimes, trying to understand what lies beneath and how things connect. This also got me thinking about application performance: how can we make our applications faster, and how can we make better use of the hardware that we have? That's how I got into making tools that let us uncover new performance insights by looking at our applications in weird new ways, and it's what led me to work at Datadog on the Ruby continuous profiler.
If you haven't heard of Datadog, we offer cloud monitoring as a service, and for Ruby this includes profiling, distributed tracing, an intelligent test runner, security, errors, logs, etc. Here's a pretty screenshot of a bunch of metrics, some distributed tracing going on, and the visualizations we can create from this data. Here's the profiler that I work on, and a flame graph, which is a visualization we use to show how your resources are being spent by the application. And here's the timeline visualization, which is a bit more of a deep dive, thread by thread.
So that's my employer. Let's talk today about what the GVL is, how it can impact your application latency, and what you can do about it.
As I said, I started using Ruby in 2014, and shortly after that, someone either told me, or I saw some reference, that Ruby had this GVL thing, or maybe it was the GIL. At some point someone told me, "oh yeah, that's why Ruby doesn't use multiple cores," and I went "ah, OK," and moved on. Then something really interesting happened in 2022, which was that Ruby 3.2 introduced the GVL instrumentation API. This API allows us to get events on what the GVL is doing, so I started experimenting with building a timeline visualization for this data, and that's how I created the gvl-tracing gem, which produces visualizations such as this one. I even got a retweet from Matz, which is always cool.
So, let's talk about what the global VM lock, or GVL, is. When you're thinking about your Ruby application, the problems you're trying to solve, etc., you're usually not thinking about the Ruby part directly. But the Ruby part, the Ruby VM, is a big program written in C, with some Ruby and some Rust as well. You might also hear it called CRuby or MRI, because there are a few other Ruby implementations; this is how we sometimes distinguish it: this one, not JRuby or TruffleRuby or one of the others.
In your Ruby application, when you create a couple of threads (or maybe it's not you creating the threads; maybe it's your web server, such as Puma, or your background job processor, such as Sidekiq), they usually get mapped one-to-one to operating system threads. I say "usually"; let's put a pin on that and come back to it later, but let's go with it for now. So each Ruby thread that you have is mapped to an operating system thread.
Why should we care about this? We care because, going back to Ruby: the Ruby VM is a big program written in C, and this big program uses multiple OS threads, so it's a multi-threaded C application. This brings us to the definition of what the GVL is: the global VM lock is a mechanism to ensure the correctness of CRuby, the multi-threaded C program. This both tells us a lot and not much, so let's try to unpack it a bit.
Let's start with what a lock is. A lock is a mechanism that allows only one thread, the thread that's holding the lock, to work at a time. The metaphor here is a swing: only one person can be on the swing, and if other people want to go on it, they need to wait. If you try to get multiple people on the swing at the same time, maybe you can do it, but bad things are going to happen.
So when a thread inside the Ruby VM is running Ruby code, or somehow interacting with the Ruby VM's data, it needs to be holding this GVL. No other thread can be doing the same, and the other threads that want to do so need to wait for their turn.
You might also have heard about the global interpreter lock, or GIL. What's the GIL? It's basically what Python calls their GVL, and because Python is a very well-known, widely used programming language, people sometimes just say "yeah, Ruby has the GIL, it's the same thing." Old versions of Linux actually had the BKL, the big kernel lock, which was the same idea. Naming is hard, and people come up with different names; if someone says GIL, that's fine, just keep going. You might also have seen the PR where Koichi mentioned that the GVL is no longer actually global, so it's no longer really a "global VM lock" and maybe we should use other words. A bunch of functions inside Ruby were renamed to match this, but some functions still use the name GVL, so Koichi proposed maybe it stands for "great valuable lock" or something like that. Don't worry too much about what it's called; what matters is the concept and what it's doing inside the Ruby VM.
So what is the GVL not? This is an important part. The GVL allows concurrency, which is the illusion of having multiple threads running Ruby code at the same time: you do something, then you switch, then you do something else, then you switch, and if you do this fast enough, it kind of looks like you're doing multiple things at the same time. But it does not allow parallelism: actually running Ruby code at the same time on multiple cores is not possible, because of the GVL.
Waiting for I/O can be parallel, which is something we do very commonly in all our apps, but that's waiting, not running Ruby code. If you want to think about it, it's like a 90s single-core PC, the kind Matz maybe started developing Ruby on. Image credit here goes to me, because I'm a huge hoarder of old hardware, and this one is sitting somewhere in my father's basement.
Another thing that sometimes comes up when I see people discussing the GVL is: if the GVL exists, why do we need Ruby's concurrency APIs, things such as Mutex, ConditionVariable, Queue, SizedQueue, or the whole concurrent-ruby gem that offers a bunch of really nice utilities? The answer, and this is a very key question, is that the GVL only protects the Ruby VM's data, or state. More specifically, your application's data might not end up the way you wanted. Let me give you an example of what I mean.
Here's a very simple example where we have two variables, and we're trying to take from one and put into the other, one by one. At the end, you would expect the result to be the same as what you started with, but one of the times I ran this app on my machine, I actually got this result.
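The slide's exact code isn't captured in the transcript, but a hypothetical reconstruction of this kind of example (names and counts are mine) looks like this:

```ruby
# Two counters; several threads move units from one to the other.
# `a -= 1` and `b += 1` are each a read, a computation, and a write,
# so a thread switch in between can lose or duplicate updates.
a = 1_000_000
b = 0

threads = 5.times.map do
  Thread.new do
    while a > 0
      a -= 1
      b += 1
    end
  end
end
threads.each(&:join)

puts a + b # expected 1_000_000, but on some runs it comes out different
```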
This result demonstrates why we still need concurrency primitives in Ruby: it's a valid result for the Ruby VM, but it's incorrect at the application level. There's actually a name for this: it's called a race condition, or just a race, in computer science. The intuition is that we have a bunch of threads here, and all of the threads are racing each other to do the same thing at the same time, and that's why we get an incorrect result.
If instead we modify this application to actually use a Mutex, then we get the correct result. This is why we still need things like mutexes: otherwise, our code will not be correct if there are any races like this.
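Again as my own reconstruction rather than the slide's exact code, the Mutex version makes the whole check-and-transfer atomic:

```ruby
a = 1_000_000
b = 0
lock = Mutex.new

threads = 5.times.map do
  Thread.new do
    loop do
      done = lock.synchronize do
        if a > 0
          a -= 1 # the check, decrement, and increment now happen atomically
          b += 1
          false  # keep going
        else
          true   # nothing left to transfer
        end
      end
      break if done
    end
  end
end
threads.each(&:join)

puts a + b # => 1000000, every time
```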
OK, so one insight is that the GVL does not by itself protect application correctness when Ruby code running in multiple threads has races. If your code is written in such a way that you don't get into these kinds of problems, then you don't have to use mutexes, but in some cases you might need to. Except... sometimes the GVL actually does kind of protect your Ruby code, often by accident or as an implementation detail, but you definitely should not rely on this. For instance, in Ruby 3.4, some methods that were previously written in C are now written in Ruby, so those methods might get different behavior regarding the GVL. You really shouldn't rely on it. And even if you tested it and it seems to work fine on your machine, be careful: sometimes it works on your machine but breaks in production, because production has a lot more customers and a lot more requests all coming in at the same time. So be careful there.
A quick aside: the JRuby and TruffleRuby implementations actually don't use a GVL; they use other mechanisms to protect their own VM data, so they're able to get both concurrency and parallelism. But you can still get race conditions, so that part does not go away.
OK, let's talk about the most interesting part, which is how the Ruby global VM lock actually impacts application performance. To do that, we're going to do a few experiments using the gvl-tracing gem, and you can also use it on your own application. You can add it to your Gemfile (it needs Ruby 3.2 or later, because it needs the GVL instrumentation API), then you start it and wrap some of your code with it, and you get an output file that you can take to this website, Perfetto, and look at. Perfetto is a nice website that the Google folks built for this kind of performance visualization data, and the gvl-tracing gem just outputs in that format so you can go there and view it.
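The setup looks roughly like this, based on the gvl-tracing README (the trace file name and the workload are just ones I picked):

```ruby
# Gemfile: gem "gvl-tracing"  (requires Ruby 3.2+ for the GVL instrumentation API)
begin
  require "gvl-tracing"
rescue LoadError
  warn "gvl-tracing not installed; try `bundle add gvl-tracing`"
  exit
end

GvlTracing.start("trace.json") # begin recording GVL instrumentation events

# Some work to observe: two threads competing for the GVL.
2.times.map { Thread.new { 5_000_000.times {} } }.each(&:join)

GvlTracing.stop # flush the trace; then load trace.json at https://ui.perfetto.dev
```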
I have a bunch of code examples in my talk, and if you want to experiment with them later on the Perfetto website, here's a link for it; I'll post the link again at the end. But let's get started with a few examples so you can see how the GVL is impacting your application.
Here's an example of the visualization we produce, and let me explain how to read it. Different lines represent different threads, and time flows from left to right; what we're seeing is the state of each Ruby thread over time. Some of the states we'll be showing: "started tracing", when we start the gem; then when a thread starts or when a thread dies; then the waiting state, which is when a thread is waiting for something, either more work to do, or it's sleeping, or it's waiting for a response from the database, or something like that. We have the all-important running state, when the actually interesting things are happening: Ruby code is running and your application is making progress. And then we have the critical "wants GVL" state, which means the thread actually has work to do, it's ready to go, but it's waiting because another thread is running. When you see this, your application is taking a latency hit. There's also garbage collection, although we won't cover it much today.
OK, so for this example, the source code is here on the left. I basically start two threads, and both of them are trying to increment a counter. Because the counter is a local variable, each one gets its own counter, so there's no racing here.
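A sketch of that example's shape (not the exact slide code; the duration is mine):

```ruby
# Each thread increments its own local counter for ~1 second.
# No shared state, so no race -- but on CRuby the two threads still take
# turns holding the GVL instead of running in parallel.
def count_for(seconds)
  counter = 0 # local to each thread
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  counter += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  counter
end

counts = 2.times.map { Thread.new { count_for(1) } }.map(&:value)
p counts # two counts whose sum roughly matches ONE core's throughput
```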
You can see that we have two threads, and they're incrementing their own counters. We see the concurrency but not the parallelism: when one is running, the other is waiting, and then they go back and forth, taking turns. When multiple threads want to work, what we're seeing is that Ruby switches between them every 100 milliseconds; that period you see there is 100 milliseconds, and both of them are getting some time, going back and forth.
Let's add a few more threads. Once you add more threads, each thread needs to wait longer for its turn. We still see the 100-millisecond period, but we see that most threads spend a lot of time waiting. In fact, if we slice in on one of the threads, what we see is that the thread works for 100 milliseconds, then needs to wait, for 900 milliseconds in this case, then runs again. This gets us to key insight number two: as we add more threads to a Ruby app, we can impact the latency of each individual thread if all of them are fighting for the global VM lock. Of course, this is not a very realistic Ruby example; most Ruby web apps are not like this. They have a lot of network calls, they go to the database, they do a bunch of other things, and not as much processing. But this shows you a very extreme case of what can happen if all of your threads are trying to make progress but are all fighting to get the GVL.
Another key insight we can derive from this is that if you're using threads, your app's latency no longer depends only on each request's or method's performance. You can super-optimize your Ruby code, you can have the best gem ever, ultra-optimized, and it will still be impacted if it needs to wait for the GVL before it can actually run Ruby code and do the super-optimized thing. Said another way: a specific Ruby web request may be slow not because of anything the request itself is doing (it may be really well built, very optimized), but because of some other request or thread that was doing a lot of work in the background. If you've heard of the term "noisy neighbors": Ruby threads are noisy neighbors. Even if you have multiple threads that are supposedly working on separate things, they can impact each other in this way.
Let's look at another example. In this case, we have two threads, but one of them, the bottom one, finishes its work and just waits for more, while the top one keeps running. This shows that the whole 100-millisecond period I mentioned only applies when you have multiple threads trying to acquire the GVL at the same time. If only one thread wants it, Ruby will give it the GVL for as long as possible, and it can keep executing; that's fine. Again, when a thread is doing a bunch of network calls, it will give up the GVL while waiting for a response. You can see in this example a thread that runs, then waits a bit for the response, then runs, then waits, then runs, then waits, and while it's waiting, it yields the global VM lock so other Ruby threads can work. This is why Ruby can handle a lot of simultaneous network calls perfectly well: while waiting, they don't need the GVL.
OK, so we've said that threads each get 100 milliseconds, and that seems fair. Are there cases where Ruby is actually unfair? To answer this question, let's go back to the network call example. Here I've slightly modified my example to benchmark how fast I can fetch the Google front page, and you can see that most of the time is spent waiting. In fact, you can't really see the running periods in there; you don't see just one waiting block, you see bits of wait, wait, wait, which means the states are split up, with a bunch of waiting in there. When I ran this on my machine, I got around 17 requests per second, which is not great, but well, my internet connection is not great.
So let's try an experiment: let's add a noisy neighbor. We have this code that's trying to fetch the Google homepage as fast as possible. What would you expect to happen if I added a background thread that's just off in the corner counting numbers? These things are completely separate, so what would you expect to happen?
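Here's a stand-in for that experiment; I've swapped the real HTTP fetch for a short sleep so it runs anywhere, since sleeping releases the GVL just like waiting on a socket does:

```ruby
# An "I/O" thread: sleeping releases the GVL, like waiting on a response,
# but every wake-up needs the GVL back before the next Ruby line can run.
io_done = 0
io_thread = Thread.new do
  20.times do
    sleep 0.005
    io_done += 1
  end
end

# The noisy neighbor: pure CPU work that happily uses its full 100ms slices.
noisy = Thread.new { loop { 2 * 2 } }

io_thread.join
noisy.kill.join

puts io_done # => 20, but the elapsed time can be far more than 20 * 5ms
```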
Well, what happens in reality is this: network performance goes way down, and I'm now able to do two requests per second instead of 17 on my network connection. The gvl-tracing gem can explain what's going on here. Before, we had just a bunch of waiting; now, the thread doing network requests spends most of its time waiting for the GVL, and the thread on the bottom, the one doing the counting, spends most of its time running and hogging the GVL. We actually need to zoom in a lot on the visualization to understand what's happening, which is this: the thread doing the network calls runs for a bit of time, sets up the network call, and then starts waiting, and because it starts waiting, it yields the GVL, so the bottom thread now has 100 milliseconds to run. The thread doing the network calls is ready to do something again a few milliseconds later, but by that time it has already lost the GVL, so it needs to wait for 100 milliseconds, and this repeats and repeats and repeats. That's why this case is so impacted. This actually happened to me when I was doing some experiments, and that's how I came up with this example: from past experience, where I just had a background thread doing some things, and suddenly my network was really slow, and I was really confused about what was happening. So the key insight from here is that the global VM lock is not always fair.
Whether you're using threads or fibers, you get more utilization and more throughput, but this can impact latency, so it's useful to be aware of the trade-off you're making here.
As a bit of a fuller disclaimer: Ruby switches threads every 100 milliseconds that a thread spends running Ruby code, or when a thread is about to block, such as when it's doing I/O (this includes network, file, etc.), or it's trying to acquire a lock that's not available, or it sleeps, or it calls Thread.pass, or at certain points when running C or native code: when you go into the VM or into a native extension, there are ways for the VM or the extension to say "I don't want the GVL now, you can have it, and then I'll want it again."
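A tiny illustration of a few of those yield points (my own example, not from the talk):

```ruby
require "tempfile"

# Prepare a small file so the thread has something to read.
file = Tempfile.new("gvl-demo")
file.write("hello")
file.flush

t = Thread.new do
  Thread.pass          # explicitly offers the GVL to any ready thread
  sleep 0.01           # blocking: the GVL is released for the whole sleep
  File.read(file.path) # I/O: the GVL is released while the read happens
end
puts t.value # => hello
```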
OK, let's talk a bit about Ractors and M:N scheduling, some really cool things that came to Ruby to make this better. Ractors were introduced in Ruby 3 as a new concurrency primitive; Ractors provide parallelism and communicate via messages. Let's try to modify some of the examples I've shown you to use Ractors. For instance, in the network-call-plus-counting example, I just changed it to use a background Ractor instead of a background thread, and it fixes the issue completely. Now the top thread is able to spend most of its time waiting, and whenever data arrives on the network, it can progress again immediately, while the bottom thread can do its counting thing continuously. Very successful. Even in the example where I had a bunch of counters, the threads no longer have to wait for each other; they can all run at the same time, in parallel, and we get much better performance. So, insight number seven: Ractors are parallel, as expected.
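A sketch of moving the background counting into a Ractor (my reconstruction, not the slide's code; Ractors are still experimental, so expect a warning on stderr):

```ruby
# The background counting now happens inside its own Ractor, so it no
# longer steals 100ms GVL slices from the main Ractor's threads.
background = Ractor.new do
  counter = 0
  5_000_000.times { counter += 1 }
  counter # the block's value is what we receive back below
end

# The main Ractor stays free for latency-sensitive work in the meantime.
result = background.take
puts result # => 5000000
```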
And that works nicely, but actually there's a bit more to it. Let's talk about Ractors on Ruby 3.2 versus 3.3 and later. The picture I was showing you was taken on Ruby 3.2, where we had 10 threads all running at the same time. When I ran this example on Ruby 3.3, I actually saw this instead, where a bunch of threads were working, but the bottom ones weren't yet. Let me zoom in on this: what we're seeing is that eight threads, each inside its own Ractor, are able to run, but the two bottom Ractors are not yet. Why is this? We see the answer if we look at the Ruby 3.3 release announcement, in the section about the M:N thread scheduler, where it says the RUBY_MAX_CPU environment variable sets the maximum number of native threads, with a default value of eight. That's where the eight is coming from. And if, for instance, you set it to four, you see slightly different behavior, where Ruby only uses four native threads at a time. You also need to consider that the top thread is running on the main Ractor, and the main Ractor does not get M:N scheduling by default. That's why you see the odd behavior there: we have four threads executing, the top thread keeps its own execution even though it's not doing anything, and then three more can execute, and then three more can execute.
So we're now ready to go back to that part where I said Ruby threads usually get mapped one-to-one to operating system threads. We can say this is no longer true with M:N scheduling: here we have 10 Ruby threads that are mapped to four operating system threads, with the exception of the main Ractor, if we don't have M:N scheduling enabled on it.
about it and experiment a bit I actually
00:22:36.799
introduced a new uh threads uh view on
00:22:39.760
the JVL tracing gem which is which I
00:22:41.760
call the OS threads view which the idea
00:22:44.600
is that at the top we see the M Ruby
00:22:46.760
thread so we see that we're using 10
00:22:48.799
Ruby threads each inside their own RoR
00:22:51.039
and at the bottom we see what the
00:22:52.600
operating system is seeing which is
00:22:54.159
actually for uh operating system threads
00:22:57.039
and so we can see that the top thread
00:22:59.240
actually has its own native thread and
00:23:02.120
and keeps executing alone and then the
00:23:04.600
other rectors are all sharing the three
00:23:06.520
threads and we can see that once three
00:23:08.919
rectors are done three more can execute
00:23:11.039
and then three more can execute Etc you
00:23:13.640
You can also maybe spot something that I found a bit weird, which is: wait, those Ractors are running for like 10 seconds, and only after those 10 seconds can other Ractors run? That doesn't match the 100 milliseconds I was expecting to see. I believe this is a bug, and I'm chatting with the Ruby core folks about how this can be improved.
Another thing I was saying so far is that the main Ractor does not get M:N scheduling unless you enable this feature with the RUBY_MN_THREADS environment variable. If you do, you get M:N scheduling for all threads.
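Both variables are read at VM boot, so they have to be set before the Ruby process starts; one way to see that from Ruby itself (the values here are just illustrative) is to spawn a child process with them set:

```ruby
# RUBY_MAX_CPU caps the native threads the M:N scheduler uses (default 8);
# RUBY_MN_THREADS=1 opts the main Ractor's threads into M:N scheduling too.
# On Rubies before 3.3 these variables are simply ignored.
env = { "RUBY_MAX_CPU" => "4", "RUBY_MN_THREADS" => "1" }
output = IO.popen(env, ["ruby", "-e", "puts RUBY_VERSION"], &:read)
puts output
```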
I actually did a few experiments, and, for instance, if you go back to the example where I had 10 threads all fighting for the GVL, I see some really weird behavior when I enable M:N scheduling: each of the threads gets 100 milliseconds the first time, but then the main thread keeps getting 100 milliseconds while the others only get 10 milliseconds at a time. And you can see that the main thread keeps its own OS thread while the other threads are all sharing the same OS thread. This also looks weird, and it's probably something that can get fixed, maybe tomorrow during the hack day.
So key insight number eight is that these timeline visualizations are really cool. Ruby 3.3 has been out since December of last year, and maybe people haven't noticed some of these things; at least I didn't see any reference to them. Once you get the right way of looking at your performance, some of these things really pop out easily, and you can start going, "hm, that seems weird."
say a few more things the Shopify GVL
00:25:06.440
tools gem is also really cool because it
00:25:08.600
provides GVL instrumentation as metrics
00:25:11.080
so you can get a metric for how long
00:25:12.919
threads are waiting to acquire the
00:25:16.000
GVL you can also get like how many
00:25:18.360
threads are waiting for the GVL so you
00:25:20.320
can use this kind of thing for
00:25:22.320
monitoring also the Datadog Ruby
00:25:24.880
continuous profiler which I work on I've
00:25:27.440
also added support for the GVL
00:25:29.559
information it's still off by default
00:25:31.240
but you can enable it and well you can
00:25:33.360
get the same kind of timeline
00:25:35.399
visualization here's the counter example
00:25:38.039
uh that I was showing you can see like
00:25:39.720
oh yeah the 900 milliseconds waiting for
00:25:42.200
GVL and you can actually also get like a
00:25:44.919
stack trace to understand what exactly
00:25:46.960
was affected and you can even look at it
00:25:49.080
by web request so uh I have a lot of fun
00:25:52.000
working on this okay so I've told you a
00:25:55.279
bunch of things about performance about
00:25:56.880
threads about Ractors etc so what
00:25:59.240
should you do with all this information
00:26:01.640
well keep using threads and fibers in
00:26:04.000
most cases if it works for you it will
00:26:06.480
keep on working so don't
00:26:09.240
suddenly panic you should avoid using
00:26:11.919
too many threads if you are latency
00:26:13.679
sensitive so a good rule of
00:26:16.279
thumb to start from if you want one is
00:26:18.960
like on a four-vCPU machine with around 8
00:26:21.559
gigabytes of memory start in Puma with
00:26:24.480
like four workers five threads and start
00:26:26.360
tuning from there
00:26:28.880
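As a sketch, that starting point in a Puma config file might look like this (values from the rule of thumb above; tune for your own app):

```ruby
# config/puma.rb — starting point for a ~4 vCPU / 8 GB machine
workers 4       # one worker process per vCPU
threads 5, 5    # min, max threads per worker
preload_app!    # load the app before forking, to share memory copy-on-write

# To compare against a one-thread-per-process setup, as the talk
# suggests later, try `threads 1, 1` and watch how latencies change.
```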
um you should uh think about separating
00:26:31.360
latency sensitive code from other things
00:26:33.840
remember that the performance of a Ruby
00:26:35.640
app's endpoints impacts others in the
00:26:37.840
same app so if you have like some
00:26:40.159
report generation thing that takes a
00:26:42.240
while uh don't mix it in your web app
00:26:44.799
with like things that you want to be
00:26:47.320
like fast and respond to requests
00:26:51.240
with really good
00:26:53.440
performance um also don't guess
00:26:56.080
experiment and measure uh you can you
00:26:58.240
can use the GVL tools gem to measure
00:27:00.559
you can use the gvl-tracing gem to observe I
00:27:03.440
recommend profiling or benchmarking in
00:27:05.799
production or at least try to
00:27:07.640
set up a realistic environment because
00:27:10.440
as you've seen here this
00:27:12.440
this kind of thing only happens when you
00:27:13.960
have a bunch of concurrency so if you
00:27:16.080
kind of do like a very minimalistic
00:27:17.960
example on like your local machine
00:27:20.760
that doesn't have a lot of traffic Etc
00:27:23.200
you will not see these problems these
00:27:24.799
problems show up when you have a lot of
00:27:26.399
traffic when you're in
00:27:28.760
production you can also try Puma or
00:27:31.399
another web server in a one-thread-per-process
00:27:33.520
configuration and see how that behaves for
00:27:35.360
instance see if your latencies change a
00:27:37.360
lot or not if they do maybe there's
00:27:39.840
something that you should tune and
00:27:43.120
and look at
00:27:44.679
improving also Ractors and JRuby and
00:27:47.919
TruffleRuby can unlock a bunch of
00:27:49.679
performance and latency benefits so
00:27:52.360
consider trying them out and
00:27:53.960
experimenting remember that the GVL is
00:27:56.640
not meant to protect your code and
00:27:58.559
sometimes does so accidentally and garbage
00:28:01.320
collection remember is also not parallel
00:28:03.720
future talk idea but it's also like a
00:28:06.519
bit of a noisy neighbor if you have an
00:28:08.279
endpoint that allocates a lot this will
00:28:10.279
impact everything in your app so keep
00:28:12.200
this in mind as well the future is
00:28:14.559
bright Ruby is improving and
00:28:16.240
evolving and as I wrote this slide and
00:28:19.000
was preparing this presentation last
00:28:20.480
week someone actually linked to
00:28:23.559
a proposal that Aaron made last
00:28:26.200
week about adding an
00:28:28.120
environment variable to configure the 100
00:28:29.919
milliseconds to switch threads to maybe
00:28:32.279
uh allow this to be a bit more flexible
00:28:34.640
so this is what I mean about Ruby is
00:28:36.279
always improving always evolving and I'm
00:28:38.760
really excited to always like be part of
00:28:41.559
this community thank you that's all I
00:28:43.880
had feel free to email me or catch up
00:28:47.279
with me on Twitter and here's the link
00:28:49.960
again