00:00:10.910
Well, so hi again.
00:00:13.320
I'm going to talk about GIL, the Global Interpreter Lock in Ruby.
00:00:16.289
I appreciate you all being here on the third day when everyone is usually tired already.
00:00:19.529
I will do my best to make this talk as engaging as possible.
00:00:26.550
This talk is intended to be junior-friendly, but even if you are an experienced developer, I believe you will find something new today.
00:00:34.320
So whether you're a junior, experienced, or pretending to be experienced, this talk is for you.
00:00:44.640
I have been working in the industry for a long time, almost ten years with Ruby, and I absolutely love it.
00:00:52.199
In fact, even the text on my wedding cake was written in Ruby, even though my wife is a Pythonista.
00:01:02.670
Just like many of you, I never bothered to research how exactly multi-threading works or what exactly GIL does. I had to do this research because a disaster happened to me and my project three years ago.
00:01:15.320
My project, called Vico, has been under development for five years, and I started building it even before it was officially founded.
00:01:30.270
Now, in 2017, we are thriving and have every chance of succeeding; however, three years ago, in 2014, we were on the brink of collapse due to a thread-safety mistake.
00:01:49.229
I will share that story in detail later in this talk, but for now, to explain better what happened, I'll start with a simple example of what a race condition is.
00:02:05.909
I’ll give an example of how to earn a million dollars in Ruby. You have to start small: one dollar at a time. If you have a bank account and you keep adding a dollar to it, a million times over, you'll eventually reach one million.
00:02:22.379
After fifteen years in this industry, I can assure you this is the only way to earn significant money in Ruby.
00:02:31.079
I will even add tests to ensure we reach a million, but please don’t be alarmed by those symbols below—they are just there to highlight characters in the console with color.
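A minimal sketch of what such a slide could look like. The class name `BankAccount`, the method `earn`, and the output format are assumptions made for illustration, not the speaker's actual code. The odd-looking symbols are ANSI escape codes that color the console output.

```ruby
# Hypothetical reconstruction; names are assumptions, not the original slide.
class BankAccount
  attr_reader :balance

  def initialize
    @balance = 0
  end

  # Adds one dollar to the account, `times` times in a row.
  def earn(times)
    times.times { @balance += 1 }
    self
  end
end

target = 1_000_000
account = BankAccount.new.earn(target)

# "\e[32m" switches the console to green, "\e[31m" to red, "\e[0m" resets it.
if account.balance == target
  puts "\e[32mOK: balance is #{account.balance}\e[0m"
else
  puts "\e[31mFAIL: expected #{target}, got #{account.balance}\e[0m"
end
```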
00:02:56.310
Now, imagine that instead of repeating this cycle a million times in a single thread, I ask a hundred threads to do the job. Don't try to learn the exact syntax of how to spawn threads from this example; it’s irrelevant to your understanding.
00:03:12.390
Just trust me that this code will run one hundred threads, each of which will perform the cycle of ten thousand. Initially, I will try to run this code with a Ruby implementation that does not have GIL.
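For readers who do want to see it, here is a rough sketch of the threaded version. The names `BankAccount` and `earn_with_threads` are assumptions, not the code from the slide. The final balance is deliberately not promised in a comment, because it depends on the Ruby implementation and on thread scheduling.

```ruby
# Hypothetical sketch; class and method names are assumptions.
class BankAccount
  attr_reader :balance

  def initialize
    @balance = 0
  end

  # Spawns `thread_count` threads; each adds one dollar `per_thread` times.
  def earn_with_threads(thread_count, per_thread)
    threads = Array.new(thread_count) do
      Thread.new do
        per_thread.times { @balance += 1 } # unsynchronized shared state
      end
    end
    threads.each(&:join) # wait for every thread to finish
    self
  end
end

thread_count = 100
per_thread = 10_000
expected = thread_count * per_thread

account = BankAccount.new.earn_with_threads(thread_count, per_thread)
puts "expected #{expected}, got #{account.balance}"
```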
00:03:45.329
Does anyone here know an implementation that doesn't have GIL?
00:03:47.340
Right! JRuby, thank you.
00:03:48.930
So I ran it with JRuby, and we got an error. We didn’t get our million; instead, we encountered a problem.
00:03:56.129
Why? Because it’s a race condition, which is quite easy to understand. The reason for this race condition boils down to one line of this code.
00:04:11.790
This simple line in the scope of this code is responsible for the entire issue we are facing. If we analyze it closely, we can expand this line of code into three distinct operations: first, reading from the instance variable (the bank account), then incrementing it by one in memory, and finally writing back to the instance variable.
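The expansion can be written out explicitly. This is an illustrative sketch with assumed names (`add_dollar`, `add_dollar_expanded`); it only shows that the compact line and the three-step version do the same work when nothing interrupts them.

```ruby
# Hypothetical sketch; method names are assumptions.
class BankAccount
  attr_reader :balance

  def initialize(balance = 0)
    @balance = balance
  end

  # The compact form used in the loop.
  def add_dollar
    @balance += 1
  end

  # The same work spelled out as three separate steps.
  def add_dollar_expanded
    value = @balance   # 1. read from the instance variable
    value = value + 1  # 2. increment the local copy in memory
    @balance = value   # 3. write the result back
  end
end

compact = BankAccount.new(5)
compact.add_dollar

expanded = BankAccount.new(5)
expanded.add_dollar_expanded

puts compact.balance == expanded.balance # both accounts now hold 6
```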
00:04:27.420
Now imagine that two threads are doing this simultaneously. Each thread will have its own copy of the local variable value to work with, but they will share the common instance variable of the bank account.
00:04:51.060
Picture this: the first thread reads the value from the bank account, which is currently five. The second thread reads the same value, five.
00:05:06.300
The first thread increments its value to six, and then the second thread does the same, also getting six. The first thread saves six to the bank account, and so does the second thread.
00:05:20.490
What should have been a two-dollar increase results in only one dollar being added to the bank account. This issue is known as a race condition.
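The interleaving just described can be replayed by hand in a single thread. This is a deterministic simulation with made-up variable names, not real multi-threaded code; it walks through the same six steps in the same order.

```ruby
balance = 5                # shared bank account

first_copy = balance       # thread 1 reads 5
second_copy = balance      # thread 2 reads 5

first_copy += 1            # thread 1 computes 6
second_copy += 1           # thread 2 computes 6

balance = first_copy       # thread 1 writes 6
balance = second_copy      # thread 2 writes 6 as well

puts balance               # 6, although two dollars were deposited
```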
00:05:28.310
Now, I will try to run the same code with Ruby MRI, which does have GIL. By the way, ladies and gentlemen, this will be the first introduction of GIL in this talk.
00:05:48.760
So, behold, I’m running the same code with Ruby MRI, and it appears to be correct. No matter how many times I run it, it seems to hold up.
00:06:01.470
It seems GIL saved us from the race condition, but it's too early to celebrate.
00:06:10.450
Now, imagine that a junior developer comes in and refactors the code. This talk is junior-friendly, so I'm partially blaming juniors for what has happened.
00:06:39.880
Basically, two methods—operations of reading from and writing to the bank account—have been extracted. This is actually quite a sensible refactor, and even well-known software engineers would approve of it.
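A sketch of what the refactored code might look like. The method names `read_balance` and `write_balance` are assumptions; the point is only that the read and the write are now separate method calls inside the loop. Whether the final balance comes out short depends on the Ruby version and on scheduling, so no result is promised here.

```ruby
# Hypothetical sketch of the refactor; names are assumptions.
class BankAccount
  def initialize
    @balance = 0
  end

  # Extracted by the refactor: reading the account...
  def read_balance
    @balance
  end

  # ...and writing to it are now separate method calls.
  def write_balance(value)
    @balance = value
  end

  def earn_with_threads(thread_count, per_thread)
    threads = Array.new(thread_count) do
      Thread.new do
        per_thread.times { write_balance(read_balance + 1) }
      end
    end
    threads.each(&:join)
    self
  end
end

expected = 100 * 10_000
account = BankAccount.new.earn_with_threads(100, 10_000)
puts "expected #{expected}, got #{account.read_balance}"
```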
00:06:56.160
However, GIL does not favor this change. Look, I’m running it again, and the check fails. Why is that? Well, we have two questions here to address.
00:07:12.690
The first question is: why did GIL allow the race condition to occur in the first place? And the second question is: why did it only happen when the junior developer extracted methods?
00:07:37.390
Great questions, and I will answer them, but first, I need to clarify a common misconception about multi-threading in Ruby.
00:07:42.770
This laptop has eight cores. I will now turn off seven of them and run the same code with Ruby MRI with GIL once again. Over to you—what do you think will happen when I turn off all the additional cores? Will the race condition go away or not?
00:08:11.049
Okay, yes, let’s try that. Just in case, I’m checking because sometimes it fails. You may say that yes, sometimes you will get correct results; after all, it’s all about probability.
00:08:19.220
But think of it—our code becoming unreliable due to a simple refactor which seemed innocent. I will turn off the cores and run it again. I’m ensuring only one core is available.
00:08:43.060
I will run it several times... and the race condition hasn’t disappeared. Our code is still unreliable. So, just to be safe, I’m turning my cores back on. Now we have three questions.
00:09:05.290
Why didn’t the race condition go away when we turned off other cores?
00:09:08.350
The answer is that parallelism is not concurrency. They may seem like synonyms, but they are not. Let’s examine why. Imagine we have two threads: the first is colored red and the second is colored blue.
00:09:35.980
When people hear GIL allows only one thread to run at a time, they sometimes visualize it as one thread coming after another, forming a neat queue.
00:09:54.920
However, this is not how it works; that picture is neither concurrent nor parallel. What we would wish for is each thread occupying its own core and executing simultaneously, which is both concurrent and parallel.
00:10:24.240
With GIL, you get concurrency but not parallelism. No matter how many cores are available, GIL prevents your Ruby code from running on more than one core at a time.
00:10:47.660
While one core remains engaged, both threads continuously contend for execution, with Ruby allocating a certain number of milliseconds to each thread before switching.
00:11:05.500
This explains why the race condition did not disappear even when we turned off the additional cores: GIL still allows concurrent execution, just not parallel execution.
00:11:13.660
So, let's agree on one thing today: every time MRI switches between threads, it's referred to as a context switch. This is a crucial concept that will recur throughout this talk.
00:11:36.300
When I discuss multi-threading with juniors, they often ask why, if we can’t utilize more than one core, we would consider multi-threading in Ruby at all.
00:12:01.720
Well, let's consider a scenario: imagine you need to communicate with a very slow remote API, and you are to make 25 requests. Instead of waiting for each one sequentially, you could create 25 threads, each waiting for its corresponding response.
00:12:44.210
Waiting does not consume CPU resources, so even a single core can manage 25 threads simply waiting for responses from a sluggish API.
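A small sketch of that scenario, with `sleep` standing in for the slow remote API. The method `slow_api_call` and the 0.2-second delay are assumptions made for illustration. Run sequentially, 25 such calls would take about five seconds; with one thread per call, the waits overlap.

```ruby
# Hypothetical stand-in for a slow remote API: it only waits, then answers.
def slow_api_call(id)
  sleep 0.2 # waiting releases the GIL, so other threads can run meanwhile
  "response #{id}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# One thread per request; Thread#value joins the thread and returns its result.
threads = (1..25).map { |id| Thread.new { slow_api_call(id) } }
responses = threads.map(&:value)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("%d responses in %.2f seconds", responses.size, elapsed)
```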
00:13:00.900
Returning to the prior discussion, why did the race condition occur only when the junior developer refactored code?
00:13:08.100
The answer lies in when Ruby MRI switches contexts: one of the points at which a switch can be triggered is a method call.
00:13:16.820
When the junior developer refactored, the method call was introduced into the critical section of our algorithm, leading to the race condition.
00:13:25.200
Now we’ve addressed that question, but why was GIL introduced in the first place? What was the need for it?
00:13:48.440
Here’s another example: instead of inflating bank account values, we’re trying to populate the same array with three threads, and we want to ensure we get a million elements total.
00:13:55.950
The method call array.push is quite complex. If performed concurrently, many things can break or get corrupted inside.
00:14:23.500
If I run this with Ruby MRI, it will work correctly. This is because Ruby’s GIL protects the integrity of built-in methods written in C.
00:14:39.500
It also protects your C methods, unless there are callbacks to Ruby, which is not relevant to our current discussion. GIL was created to protect the internal integrity of Ruby’s data structures.
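A sketch of the array example. How the million pushes are split across threads here (ten threads, 100,000 pushes each) is an assumption, as are the variable names. On MRI, `Array#push` is implemented in C and runs under the GIL, so the size check is expected to pass there.

```ruby
array = []
thread_count = 10            # assumption: the split across threads is illustrative
pushes_per_thread = 100_000
expected_size = thread_count * pushes_per_thread

threads = Array.new(thread_count) do
  Thread.new do
    pushes_per_thread.times { array.push(:item) } # shared array, no explicit lock
  end
end
threads.each(&:join)

puts "expected #{expected_size} elements, got #{array.size}"
```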
00:14:53.460
So let’s try running the same code with JRuby, which doesn’t have GIL. You might see an error indicating invalid array content due to concurrent modifications.
00:15:09.280
In JRuby, concurrent modifications can lead to data corruption, which will never occur in Ruby MRI due to GIL.
00:15:24.300
Today’s main takeaway is that GIL is not designed for your convenience but rather for the convenience of MRI developers.
00:15:46.370
To illustrate, I've added another check that verifies not only the array size but also its contents, and regardless of how many times I run it, the array size remains consistently accurate.
00:16:01.610
However, the contents are always incorrect, which means that while GIL protects the operation of insertion into an array, it does not protect your code surrounding that insertion.
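The exact contents check from the demo is not shown in the transcript, so the following is a hypothetical one. Each thread pushes the value returned by an assumed helper, `next_value`, so a correct run would produce 0, 1, 2, and so on in order. The push itself is protected on MRI, but the gap between computing the value and pushing it is not; whether a mismatch actually shows up depends on the Ruby version and on scheduling.

```ruby
# Hypothetical helper: a plain Ruby method call placed between reading the
# array's state and pushing to it.
def next_value(array)
  array.size
end

array = []
thread_count = 10
pushes_per_thread = 100_000
expected_size = thread_count * pushes_per_thread

threads = Array.new(thread_count) do
  Thread.new do
    pushes_per_thread.times { array.push(next_value(array)) }
  end
end
threads.each(&:join)

size_ok = array.size == expected_size
# If no thread was interrupted between next_value and push, the array would
# contain exactly 0, 1, 2, ... in order.
contents_ok = array == (0...expected_size).to_a

puts "size ok: #{size_ok}, contents ok: #{contents_ok}"
```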
00:16:13.670
This is the answer to our first question: Why did a race condition occur at all? GIL isn't designed to protect your code from race conditions.
00:16:37.490
I also owe you a vital insight about context switching, and I will get to it shortly.
00:16:55.290
Let me finally fulfill my promise and share my experience regarding what happened to my project three years ago.
00:17:03.000
Like many of you, I thought this topic was irrelevant to my work, that I would never use multi-threading, and that I was not going to encounter race conditions. I was mistaken.
00:17:18.150
My project, Vico, is a platform for e-commerce retailers that aggregates orders from various platforms like Amazon and eBay.
00:17:37.290
In the background, we use frameworks for making these API calls—can anyone guess the most popular background job processing framework in Ruby?
00:17:57.780
That’s right! Sidekiq. We were utilizing it to talk to Shopify.
00:18:02.967
However, three years ago, Rails Active Resource wasn’t thread-safe due to a bug. One morning, on the day of our Christmas party, a disaster occurred.
00:18:23.389
Our users began seeing commercial orders belonging to other users, exposing sensitive data.
00:18:40.660
Three years ago, we did not have many users, so we detected and fixed the problem fairly quickly; however, if it happened again today, it could be catastrophic given our large user base.
00:19:06.400
Such mistakes in thread safety can cost a business dearly.
00:19:20.120
One important lesson is to be smart: don't be like I was. Learn your stuff in advance to avoid being in a difficult position.
00:19:36.080
While the rest of our team was at a restaurant enjoying the view, we developers had our heads down, performing surgery on the database.
00:19:52.520
I want to discuss another important concept related to context switching.
00:20:04.090
Let’s look at the very simple example from earlier; as a reminder, it appeared to be thread-safe under MRI.
00:20:18.300
I will increase it to ten million to ensure no false negatives appear.
00:20:35.860
I’ll take this line and add 'if true'. It shouldn’t change anything in behavior. Running it again, it appears to be correct.
00:20:58.210
Next, I’ll convert it to 'unless false'; it shouldn’t change anything. But, boom! Out of nowhere, we have a race condition.
00:21:11.050
The only reasonable explanation is that the exact points at which Ruby MRI switches contexts are undocumented internal parts of Ruby.
00:21:43.320
You should never rely on these undocumented features; they can change from version to version without warning.
00:21:57.100
Consider this: I’m running Ruby 2.3 and can switch to 2.4. It’s okay; however, there were 3,000 commits between those two versions.
00:22:23.150
These changes affect internal behavior and can lead to unpredictable outcomes.
00:22:42.050
In conclusion, assume that context can be switched at any point in your Ruby code. The only safe approach is to be cautious.
00:23:08.910
You may wonder how to protect yourself from race conditions. There are many strategies, but they are far beyond the scope of this talk.
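One common strategy, not covered in the talk itself, is to guard the critical section with a mutex. This is a minimal sketch under that assumption, reusing hypothetical names (`read_balance`, `write_balance`, `add_dollar`); it is not how the speaker's project was fixed.

```ruby
# Hypothetical sketch; names are assumptions.
class BankAccount
  def initialize
    @balance = 0
    @lock = Mutex.new
  end

  def read_balance
    @balance
  end

  def write_balance(value)
    @balance = value
  end

  # Read, increment, and write happen while holding the lock, so no other
  # thread can run this section in between those steps.
  def add_dollar
    @lock.synchronize { write_balance(read_balance + 1) }
  end

  def earn_with_threads(thread_count, per_thread)
    threads = Array.new(thread_count) do
      Thread.new { per_thread.times { add_dollar } }
    end
    threads.each(&:join)
    self
  end
end

account = BankAccount.new.earn_with_threads(100, 10_000)
puts "balance: #{account.read_balance}"
```

The lock makes the result independent of where the interpreter chooses to switch threads, which is the property the unsynchronized versions lack.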
00:23:26.450
Initially, we reduced concurrency of workers to one in Sidekiq, manually fixed the database mess, and upgraded to Rails 4, where Active Resource became thread-safe.
00:23:48.350
Additionally, keep in mind that while Ruby 3.0 will introduce improvements, you cannot simply rely on GIL, as it won’t save you from all concurrency issues.
00:24:07.220
To reiterate, let’s highlight a few crucial points: only Ruby MRI has GIL; GIL isn’t a solution for race conditions; parallelism is not concurrency; and you cannot rely on GIL for your convenience.
00:24:43.210
GIL will not prevent race conditions; it is not a magical tool that makes such issues disappear.
00:24:57.420
As we conclude this talk, I won’t take questions because I may not fully understand your lovely American accents.
00:25:07.590
However, feel free to catch me in the corridor; I'm very friendly and will do my best to help. Alternatively, follow me on Twitter or email me; I’m always willing to assist.
00:25:34.220
Thanks for your attention!