00:00:15.080
All right, I’m glad to get started. First off, thank you everybody for being here. It’s really nice to have people who actually want to listen to what you have to say, especially when you've not actually given this talk before. This is my first time giving this talk, and hopefully I have some very interesting stuff in it, and hopefully people will learn a few things from it.
00:00:34.440
The obviously inflammatory title, which was clearly intended to make a statement, is 'Everything You Know About the GIL is Wrong'. Now I'm sure that there's at least one person in the room who knows way more about the GIL than I do, but for the most part, a lot of people don’t understand this particular thing. So we're going to talk about it.
00:00:56.520
My name is Jerry D'Antonio, and I am from Akron, Ohio. How many people here have ever heard of Akron? Okay, all right. Now how many have heard of LeBron James? He is a local kid who lives just down the street from me and is pretty good at basketball.
00:01:09.680
I work for Test Double; many of you have probably heard of Test Double. Some of you probably went and saw Justin's talk yesterday morning about how to stop hating your test suite. Justin is one of the founders of Test Double, and we are a consulting company out of Columbus, Ohio. Probably the most relevant thing about me, for this conversation, is that I created a gem called Concurrent Ruby.
00:01:36.119
Who here has heard of it? Okay, cool. Concurrent Ruby is a Ruby gem that provides a suite of concurrency tools to extend our options for building concurrent applications. Concurrent Ruby is used by a number of projects, including Rails, Sidekiq, Logstash, Dynflow, Volt, Hamster, and Microsoft Azure. It's really humbling to see projects like these on the list saying that they're using our work.
00:02:13.360
But there is a sad, unfortunate truth about all this: I've actually been wasting my time. This whole effort to build a concurrency gem for Ruby has felt like a fool’s errand. Why? Because everybody knows that Ruby can't do concurrency. Raise your hand if you've ever heard someone say Ruby can't do concurrency.
00:02:48.159
Let’s just get that out of the way. For those of you who have not heard it, clearly, you don’t have Twitter accounts. If you follow Twitter, you’ll find that apparently, Ruby cannot do concurrency, and if it's on the internet, it must be true. So knowing how to Google and use the internet, I thought, 'Well, before I give this presentation about the GIL, let me look up a few factoids about the GIL. That’ll be fun!'
00:03:05.599
According to the internet, here are a few things: first, Ruby has this thing called a Global Interpreter Lock, also referred to as the Global VM Lock, or GVL. The GIL is considered soulless, heartless, and purely evil. The GIL hates you and wants you, personally, to be miserable. It supposedly eats babies for breakfast, kittens for dessert, and puppies for a midnight snack. Some say the GIL is the sole cause of climate change, and it's been speculated that if there were no GIL, there would be no more problems.
00:03:39.640
So, you all have heard those assertions. But for fun, since you're all here, let’s take a look at some code. This is a quick sample program. What this does is essentially hit an API. However, I apologize that this is via PowerPoint, so colors may not show very well. I will try to explain it the best I can and will post this on the web later.
00:04:19.320
Essentially, I am going to hit Yahoo’s Finance API. I picked 20 stock ticker symbols from Bloomberg, and I'm bringing back the data for all 20 tickers and pulling out each stock's closing price at the end of 2014. I have two methods: one called 'get_year_end_closing', which retrieves the data serially, and another called 'concurrent_get_year_end_closing', which retrieves it concurrently. The latter uses the Future abstraction from the Concurrent Ruby library to fire off multiple requests at once.
00:05:25.639
What’s great is that I can do that in one line of code, and it's straightforward. I fire off 20 requests that run on background threads in a thread pool, collect the Future objects, whose state updates when each task completes, and then gather their values into an array.
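For reference, here is a minimal sketch of roughly what that script looks like. The ticker list, the URL format, and the CSV column positions are assumptions on my part rather than the exact code from the slide (and the historical Yahoo endpoint may no longer respond); the Concurrent::Future calls are the real concurrent-ruby API.

```ruby
require 'concurrent'
require 'open-uri'

# Hypothetical reconstruction: ticker list and URL format are assumptions.
TICKERS = %w[AAPL MSFT GOOG IBM ORCL F GM T VZ KO]

# Serial version: one blocking HTTP request per symbol.
def get_year_end_closing(symbol, year)
  uri = "http://ichart.finance.yahoo.com/table.csv" \
        "?s=#{symbol}&a=11&b=29&c=#{year}&d=11&e=31&f=#{year}&g=w"
  data = URI.open(uri) { |f| f.read }      # blocks the calling thread
  data.split("\n")[1].split(',')[4].to_f   # assumed closing-price column
end

# Concurrent version: each request runs on a background thread pool,
# wrapped in a Future whose state updates when the task completes.
def concurrent_get_year_end_closing(symbol, year)
  Concurrent::Future.execute { get_year_end_closing(symbol, year) }
end

serial_prices     = TICKERS.map { |t| get_year_end_closing(t, 2014) }
concurrent_prices = TICKERS.map { |t| concurrent_get_year_end_closing(t, 2014) }
                           .map(&:value)   # #value blocks until the future is fulfilled
```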
00:06:06.319
The second part of this script is the benchmarking. Has anyone here used Benchmark before? It's really useful! For those who haven't, the 'Benchmark.bmbm' method runs each task once as a rehearsal pass, to warm things up and minimize skew from garbage collection, then runs them again and prints the timings so you can compare them.
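As a rough sketch, the benchmarking half would look something like this, reusing the method and constant names from the sketch above (the report labels are my own):

```ruby
require 'benchmark'

# bmbm runs a rehearsal pass first (to warm up and reduce GC skew),
# then a second, measured pass whose timings you actually compare.
Benchmark.bmbm do |bm|
  bm.report('serial') do
    TICKERS.each { |t| get_year_end_closing(t, 2014) }
  end
  bm.report('concurrent') do
    TICKERS.map { |t| concurrent_get_year_end_closing(t, 2014) }.each(&:value)
  end
end
```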
00:06:56.440
Now, don’t say anything, but think about what you expect to see from this. As we all know, Ruby can't do concurrency; the GIL is a lock that stops anything interesting from happening. Therefore, when I execute this both serially and concurrently, the times should come out the same, right? Let’s run it and see what happens.
00:07:30.400
When you look at the output, you can see that it took about four seconds to do the serial method and less than three seconds to do the concurrent method. Clearly, something is awry with my test! Let's compare this same thing to runtimes that can perform concurrency, like JRuby.
00:08:21.520
JRuby took roughly four seconds serially, while the concurrent method took about one second. What about Rubinius? That runs on a VM without a GIL; it took similar amounts of time. MRI Ruby took less time to execute concurrently than JRuby or Rubinius, runtimes that are designed for concurrency. Apparently, the internet, which so often decries MRI Ruby’s concurrency capabilities, misled me.
00:08:45.919
Let me ask you honestly, was anyone surprised to see that MRI Ruby could perform that quickly with concurrency? Thank you for being honest. But let’s talk about why that is. Many of you might want to see a 10 times performance improvement in your applications, right? This goes against the narrative we typically hear.
00:09:27.519
I have a lot to cover, but I'm going to try to convey it clearly. First, let’s define concurrency versus parallelism. Raise your hand if you've ever encountered a concurrency versus parallelism talk before. Just to make it clear, concurrency is not parallelism.
00:09:52.880
Over the next few slides, I will reference Rob Pike, one of the creators of the Go programming language, who has spoken at length on this topic. Quoting him: 'Concurrency is the composition of independently executing processes.' Think about our previous example: we fired off multiple futures as independently executing tasks and then composed them into a useful application.
00:10:09.600
Parallelism, however, involves the simultaneous execution of possibly related computations and requires more than one processor core. Concurrency can happen with or without multiple cores. The fundamental idea of concurrency is designing our applications around independently executing processes.
00:10:30.520
Here’s the takeaway: non-concurrent programs gain no benefit from running on multiple processors. If I write my code concurrently, I can gain speed when I have parallelism available. If I only run it serially, I won't benefit regardless of the number of processors.
00:11:00.399
Now, let’s talk about the GIL. The 'L' in GIL stands for lock, and in computer science, a lock protects resources from being accessed simultaneously by multiple threads. A thread wants to access a resource, and if the lock is available, it acquires it. If not, it typically blocks until it becomes available.
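Ruby exposes this acquire-or-block idea directly as Mutex. A small, contrived illustration (my own example, not from the slides):

```ruby
lock    = Mutex.new
counter = 0

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may hold the lock; the others block
      # here until it is released, then take their turn.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter  # => 10000, because every increment ran inside the lock
```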
00:11:40.720
A thread is a sequence of program instructions managed by a scheduler. The number of threads running can far exceed the number of processors. For instance, while using my MacBook Pro, I observed more than 1,000 threads running, and it has nowhere near that many processor cores.
00:12:20.560
Many languages, including Ruby and Java, map language constructs to operating system threads. Ruby creates an OS thread for each new thread created in Ruby. Other languages, like Erlang and Go, manage their concurrency internally across operating system threads.
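You can poke at that one-to-one mapping yourself; this snippet is a hedged sketch, and Thread#native_thread_id only exists on MRI 3.1 and later:

```ruby
# Each Ruby Thread in MRI is backed by a native operating system thread.
t = Thread.new { sleep }

p Thread.list.size     # every live Ruby thread in this process, main included
p t.native_thread_id   # OS-level id of the backing thread (MRI 3.1+ only)

t.kill
```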
00:12:50.319
In a situation where the operating system decides to perform a context switch, programming languages must be designed to maintain internal consistency. Ruby uses the GIL to protect its internal state during OS context switches.
00:13:34.560
To simplify, while thread A is operating, it locks the GIL. When the OS needs to switch to thread B, thread B cannot access the locked area, causing it to wait. Eventually, when thread A completes its operation, it releases the lock, allowing thread B to run.
00:14:17.480
The GIL ensures that only one unit of Ruby code can run at any given time. Even though there may be multiple threads, only one of them holds the lock at any moment. So while the operating system may still context switch between threads, only the thread holding the lock can make progress in Ruby code; the others have to wait.
00:15:08.639
This guarantees that Ruby's internal state remains consistent and never gets corrupted. However, it provides no guarantees about the correctness of our own code.
00:16:03.520
Every variable in Ruby is a reference to an area of memory where an object is stored. So if I have two variables referencing the same object, two threads can reach that same piece of memory at the same time, and that's where the trouble starts.
00:16:49.160
To illustrate this issue, consider a situation where two threads mutate a string in memory simultaneously. The outcome can vary widely, leading to different results each time, demonstrating an issue with thread safety and logical correctness.
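Here is a tiny sketch of that situation, with two threads appending to one shared String (the sleeps just encourage interleaving):

```ruby
shared = String.new

# Both threads mutate the same object in memory. The final length is
# predictable, but the interleaving of "a"s and "b"s is not: you can get
# a different ordering on every run, even with the GIL in place.
t1 = Thread.new { 10.times { shared << 'a'; sleep 0.001 } }
t2 = Thread.new { 10.times { shared << 'b'; sleep 0.001 } }
[t1, t2].each(&:join)

puts shared  # e.g. "abbaabab...", varies from run to run
```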
00:17:34.560
Even if the code seems thread-safe, the logic may still be flawed. In a shared memory system, two threads accessing the same memory simultaneously presents a risk of corruption, which can be difficult to manage.
00:18:23.680
Ruby is an interpreted language: your code is compiled into bytecode that runs on a virtual machine. A single line of Ruby can become several bytecode instructions, and those instructions may be optimized and interleaved with work from other threads, with no guarantee that they execute in the neat linear order you perceive, which complicates thread safety.
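You can see this for yourself with MRI's instruction sequence API: even a one-line increment compiles to several VM instructions, and nothing stops the scheduler from switching threads between them.

```ruby
# Disassemble a single, innocent-looking line of Ruby.
puts RubyVM::InstructionSequence.compile('@count += 1').disasm
# Prints several instructions (getinstancevariable, putobject, opt_plus,
# setinstancevariable, ...), not one atomic step.
```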
00:19:00.000
Despite these complexities, the GIL protects Ruby’s internal state, ensuring its consistency. However, it doesn't guarantee that our code itself is safe from concurrency issues.
00:19:42.440
A memory model describes how threads interact with shared data and ensures that certain behaviors are predictable. Java, for example, made considerable revisions to its memory model to provide sound guarantees for concurrent code.
00:20:26.480
Currently, Ruby does not have a documented memory model. When Ruby was created, concurrency wasn’t a primary focus and thus, no comprehensive memory model was outlined. However, the GIL provides an implied memory model based on its behavior.
00:21:19.560
However, it's important to remember that you must not rely on this implied memory model, because it is undocumented and may change with any new release of Ruby.
00:22:03.680
The GIL in MRI Ruby prevents true parallelism. But Ruby multiplexes threads very efficiently when a program performs a lot of I/O, which is exactly what most web applications do.
00:22:53.000
Ruby handles I/O well because MRI releases the GIL while a thread is blocked on I/O, so other threads can keep running without endangering Ruby's internal state. This behavior can lead to significant performance increases, like the tenfold improvement we saw earlier.
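A quick, self-contained way to see that behaviour is to use sleep as a stand-in for blocking I/O; MRI releases the GIL while a thread is blocked like this. This is my own sketch, not the code from the talk:

```ruby
require 'benchmark'

def fake_io_request
  sleep 1   # stands in for a blocking network call or disk read
end

# Serial: roughly 5 seconds, one blocking call after another.
puts Benchmark.measure { 5.times { fake_io_request } }

# Threaded: roughly 1 second, because the GIL is released while each
# thread is blocked, so all five "requests" overlap.
puts Benchmark.measure {
  5.times.map { Thread.new { fake_io_request } }.each(&:join)
}
```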
00:23:36.600
What kind of operations are we referring to with I/O? Reading and writing files, database interactions, listening for network connections, or API calls. Everyone in this room surely performs these types of operations regularly.
00:24:27.200
So, it becomes clear that Ruby's approach to handling I/O works very well for typical application requirements. Yet, there seems to be negative sentiment surrounding Ruby's performance in concurrency.
00:25:14.200
Part of the reason is that Ruby isn’t perfect at concurrency. It excels at handling concurrent I/O, but not processor-intensive tasks, because of the GIL's limitations.
00:25:34.080
In a demonstration comparing the same operations executed both serially and concurrently, we find that MRI Ruby gains no speed on a processor-heavy operation. It doesn't get meaningfully slower either; it simply stays flat.
00:26:10.680
In contrast, when running similar operations on JRuby or Rubinius, you can observe a notable performance boost because they can exploit true parallelism.
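A sketch of that kind of comparison, using a naive Fibonacci as the processor-bound work; the function and the iteration counts are my choices, not the slide's:

```ruby
require 'benchmark'

# Pure computation: no I/O, so MRI never releases the GIL.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

Benchmark.bmbm do |bm|
  bm.report('serial')  { 4.times { fib(30) } }
  bm.report('threads') do
    4.times.map { Thread.new { fib(30) } }.each(&:join)
  end
end
# On MRI both rows come out about the same; on JRuby or Rubinius the
# threaded row can approach a 4x speedup on a four-core machine.
```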
00:26:56.960
The final reason for Ruby's perception problems with concurrency is that it lacks comprehensive tools. While Ruby provides basic concurrency tools like threads and mutexes, it doesn’t compare with the tooling available in languages like Java, Go, and Erlang.
00:27:48.560
However, keep in mind that Ruby’s charm lies in its productivity. You can accomplish complex tasks with minimal code thanks to Ruby's rich standard library.
00:28:40.680
The need remains for better tooling and a stronger advocate to promote Ruby's strengths in concurrency. This presentation was not a sales pitch for Concurrent Ruby, but I want to clarify that providing good tools can alleviate worries about the GIL.
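As one small taste of what that extra tooling looks like, here is a sketch using the real concurrent-ruby thread-pool API; the job bodies are just placeholders of mine:

```ruby
require 'concurrent'

# A bounded pool of worker threads you can post arbitrary jobs to,
# instead of hand-managing Thread objects and a work queue yourself.
pool = Concurrent::FixedThreadPool.new(5)

10.times do |i|
  pool.post { puts "job #{i} ran on thread #{Thread.current.object_id}" }
end

pool.shutdown               # stop accepting new work
pool.wait_for_termination   # block until the queued jobs have finished
```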
00:29:30.960
To conclude, let’s think about the narratives surrounding concurrency: concurrency is not parallelism. This is paramount to understand. Secondly, the GIL exists to protect Ruby's internal state during context switching, not to ensure thread safety of your code.
00:30:21.760
The GIL does prevent true parallelism in MRI Ruby, but Ruby is capable of handling multiple threads efficiently during intensive I/O operations. That represents Ruby's real capabilities.
00:30:57.220
Let’s spread the word that Ruby performs better than the often-debated perception indicates. Ruby handles concurrent I/O quite well and may enhance your application's performance considerably.
00:31:28.000
Finally, remember, keep calm and don’t sweat the GIL. Again, I am Jerry D'Antonio; I appreciate you all for being here. If you're interested in discussing further about concurrent Ruby or working on Ruby, please reach out!