Understanding the Ruby Global VM Lock by observing it

00:00:02.340 Hello, everyone! Thank you for coming to my talk on understanding the Ruby Global VM Lock, or GVL, by observing it. My name is Ivo Anjo, and I am a senior engineer at Datadog. I've been a big Ruby fan since I started using it around 2014, and I really enjoy exploring language runtimes such as the Ruby VM, Java VM, and others. I'm also passionate about application performance, as I like to make the most out of the hardware we pay for. Additionally, I enjoy creating visualization tools to uncover new insights from our applications.

00:01:16.619 Currently, I am working on building Datadog's open-source continuous profiler for Ruby, which is shipped as part of the ddtrace gem. If you're interested in learning more about it, feel free to talk to me afterwards. I would also like to thank my employer, Datadog, for sponsoring this trip. Today, I'm here to talk about the Global VM Lock: what it is, what it is not, how we can observe it, how we can measure it, and what insights we can gain from this information. So, let's get started.

00:01:29.160 I started using Ruby at my first job around 2014 because I wanted to learn more about it and become a master at the tool I was using daily. At some point, I came across references online to this thing called the GVL. Later, someone mentioned to me, 'Oh, the GVL is why Ruby doesn't run on multiple cores.' I thought I understood it, but it actually took me a long time to grasp the impact of the GVL on application performance, especially regarding application latency. It also took the efforts of developers like Jean Boussier and Matt Valentine-House at Shopify to make the GVL much easier to observe, so thanks to Jean and Matt for their hard work.

00:02:06.180 All of this led me to create the GVL tracing gem, which produces a visual timeline of what the Ruby Global VM Lock is doing. This helps us learn more about it. I was excited when Matt retweeted my work, which motivated me to give this talk and share what I have learned so far on this journey. To begin, let's ask: what is the Global VM Lock, or GVL?

00:02:40.379 When we run our Ruby applications, we often call Ruby scripts without considering the Ruby VM under the hood. However, Ruby code runs on a virtual machine, written mainly in C with some parts in Rust and Ruby. This Ruby VM is also referred to as 'CRuby' or 'MRI.' The Global VM Lock (GVL) is a mechanism that controls how multiple threads of Ruby code interact with the VM.

00:03:01.320 When we write Ruby code and create threads, these threads are mapped one-to-one to operating system threads. Although they might not show up by default, you can use your OS task manager to see the threads, confirming that for every thread in your Ruby application, there is indeed an operating system thread. The GVL is essential to ensure that multiple threads handle the VM data correctly. In simple terms, this lock mechanism allows only one thread at a time to interact with the Ruby VM, preventing potential issues.

00:03:46.479 For a clearer analogy, think of a swing: only one person can be on a swing at a time, while others need to wait for their turn. Similarly, when a thread is running Ruby code or interacting with the VM, it holds the GVL. Other threads cannot execute Ruby code until the first thread releases the lock. You might have heard about the Global Interpreter Lock (GIL) in other languages, like Python. The GVL in Ruby serves a similar function: it controls how Ruby threads access the VM, and it is an internal mechanism that supports Ruby's operation. This kind of locking mechanism can be observed in other systems as well, such as the Big Kernel Lock (BKL) in older versions of Linux. Our terminology for the GVL has evolved: it was referred to as the Global VM Lock until Ruby 3.2, when it was renamed, though the concept remains the same.

00:05:07.620 Now, moving on to what the GVL is not: it is not something that prevents concurrency. Concurrency gives the illusion that multiple things are being processed simultaneously by rapidly switching between them. Consider a ninja who switches between writing code, standing around, and planning: if done quickly, it may appear as if there are three ninjas at work, when, in reality, it's the same one ninja switching tasks. This contrasts with parallelism, where multiple tasks are genuinely done at the same time. The GVL allows concurrency but not parallelism. For example, older single-core PCs could only execute one task at a time, but they gave the illusion of multitasking by switching quickly between tasks.

00:06:06.180 Another question that arises is: if the GVL exists, why do we need concurrency APIs? The GVL serves to protect the Ruby VM state but does not protect your application state. If we only rely on the GVL, we can end up with unexpected results when multiple threads interact with shared data. For instance, we might see results that don’t match our expectations due to race conditions when using threads without proper synchronization.

00:07:32.220 To illustrate this concept, consider a code example where we attempt to transfer amounts from one variable to another while using threads. The result may not align with our intended outcome if we don’t use concurrency APIs such as mutexes to ensure proper locking around critical sections of the code.
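A minimal sketch of the kind of transfer code described here. It is not the code from the slides: the account names, the amounts, and the `Thread.pass` call (added only to make the race easier to trigger) are assumptions for illustration.

```ruby
# Transfers 1 unit at a time from :alice to :bob, with no synchronization.
def unsafe_transfer_demo(threads: 4, transfers_per_thread: 1_000)
  accounts = { alice: 10_000, bob: 0 } # shared by every thread

  workers = threads.times.map do
    Thread.new do
      transfers_per_thread.times do
        # Read-modify-write in separate steps: another thread may run
        # between the reads and the writes, so updates can be lost.
        alice = accounts[:alice]
        bob = accounts[:bob]
        Thread.pass # hint to the scheduler; makes the race more likely
        accounts[:alice] = alice - 1
        accounts[:bob] = bob + 1
      end
    end
  end

  workers.each(&:join)
  accounts
end

result = unsafe_transfer_demo
puts "alice=#{result[:alice]} bob=#{result[:bob]} (intended: alice=6000 bob=4000)"
```

The printed balances can differ from run to run, which is exactly the problem: the GVL keeps the Hash itself from being corrupted, but it does not make the multi-step transfer atomic.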

00:07:47.280 A mutex acts as a lock in Ruby, ensuring only one thread can access the shared critical section at a time. By implementing a mutex in our example, we can ensure that our application behaves as intended, showing the correct outcome after executing the transfers.
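As a rough sketch of the mutex approach, assuming two hypothetical accounts and transfer counts chosen only for illustration, the whole read-modify-write sequence can be wrapped in `Mutex#synchronize`:

```ruby
# Transfers 1 unit at a time from :alice to :bob, guarded by a Mutex.
def safe_transfer_demo(threads: 4, transfers_per_thread: 1_000)
  accounts = { alice: 10_000, bob: 0 } # shared by every thread
  lock = Mutex.new

  workers = threads.times.map do
    Thread.new do
      transfers_per_thread.times do
        # Only one thread at a time can be inside this block, so the
        # reads and writes of a transfer are never interleaved.
        lock.synchronize do
          alice = accounts[:alice]
          bob = accounts[:bob]
          Thread.pass # even if we yield here, no other transfer can start
          accounts[:alice] = alice - 1
          accounts[:bob] = bob + 1
        end
      end
    end
  end

  workers.each(&:join)
  accounts
end

result = safe_transfer_demo
puts "alice=#{result[:alice]} bob=#{result[:bob]}" # prints "alice=6000 bob=4000"
```

Four threads each performing 1,000 transfers move exactly 4,000 units, regardless of how the threads are scheduled.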

00:09:01.140 It's crucial to understand that while the GVL protects the Ruby VM, it doesn't inherently protect our application logic. Therefore, we should not solely rely on the GVL. It's essential to implement appropriate concurrency tools when needed.

00:09:36.600 One caveat: if you’ve ever worked with JRuby or TruffleRuby, you might know that these high-performance Ruby implementations do not use a GVL. Instead, they implement different strategies for protecting the VM, which allows for both concurrency and parallelism. This presents a striking contrast to how MRI operates.

00:10:13.680 Now, let's shift our focus to observing the GVL itself. Ruby 3.2 introduced the GVL Instrumentation API, which allows developers to gain insights into when threads in Ruby applications acquire and release the GVL. This API operates at the native level and provides callbacks for events related to GVL usage.

00:10:43.020 To make use of this API, I developed my own gem called gvl-tracing, which takes the information from the GVL Instrumentation API and creates a visual timeline. If you want to use the gvl-tracing gem, it's as simple as three steps: first, add it to your Gemfile (note that this API works only with Ruby 3.2 and above). Second, start tracing, specifying the output file, and run your code. Finally, stop the tracing. You will receive a JSON file that you can upload to the ui.perfetto.dev website for visualization.
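The three steps could look roughly like the sketch below. It assumes the gem is required as `gvl-tracing` and exposes `GvlTracing.start` and `GvlTracing.stop`; the `with_gvl_trace` helper, the `busy_work` method, and the output file name are hypothetical and only there so the example also runs when the gem is not installed.

```ruby
# Step 1: add `gem "gvl-tracing"` to the Gemfile (Ruby 3.2 or newer).
begin
  require "gvl-tracing"
rescue LoadError
  warn "gvl-tracing is not installed; running the workload without tracing"
end

# CPU-bound work, so the threads compete for the GVL.
def busy_work(iterations)
  counter = 0
  counter += 1 while counter < iterations
  counter
end

# Hypothetical helper: traces the block when the gem is available,
# otherwise just runs the block.
def with_gvl_trace(output_path)
  return yield unless defined?(GvlTracing)

  GvlTracing.start(output_path) # Step 2: start tracing into a JSON file
  begin
    yield
  ensure
    GvlTracing.stop             # Step 3: stop tracing
  end
end

results = with_gvl_trace("gvl_trace.json") do
  2.times.map { Thread.new { busy_work(1_000_000) } }.map(&:value)
end
puts "threads finished with #{results.inspect}"
```

The resulting `gvl_trace.json` is the file to open in ui.perfetto.dev.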

00:11:45.780 The tool provides a line for each thread, with time flowing from left to right, showcasing the state of the GVL over time. We show various states, renamed for clarity, including started, running, and waiting.

00:12:14.580 The waiting state indicates a thread waiting for work to do, for example, while waiting for a response from a network request. The running state means that a thread is actively making progress, and one of the most critical states to note is the want GVL state, where a thread is ready to run but is waiting for the GVL.

00:12:59.220 For a practical demonstration, consider a simple method counting to a billion using two threads, each executing the same code. The visual representation will show how concurrency is functioning but not parallelism. The blue parts represent when a thread is actively running, while the waiting states are the segments where the thread is sitting idle.
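A scaled-down sketch of this experiment, assuming a much smaller limit than a billion so it finishes quickly; the method names are illustrative. It compares running the counting loop twice in a row against running it in two threads.

```ruby
require "benchmark"

# Pure Ruby counting loop: CPU-bound, so it holds the GVL while it runs.
def count_to(limit)
  counter = 0
  counter += 1 while counter < limit
  counter
end

# Returns wall-clock seconds for a sequential run and a two-thread run.
def compare_runs(limit)
  sequential = Benchmark.realtime { 2.times { count_to(limit) } }
  threaded = Benchmark.realtime do
    2.times.map { Thread.new { count_to(limit) } }.each(&:join)
  end
  { sequential: sequential, threaded: threaded }
end

timings = compare_runs(5_000_000)
puts format("sequential: %.2fs, two threads: %.2fs",
            timings[:sequential], timings[:threaded])
```

On CRuby the two timings tend to be close to each other, since the threads take turns holding the GVL rather than running in parallel. Exact numbers depend on the machine and Ruby version.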

00:13:38.220 As we increase the number of threads, we observe an interesting pattern: as more threads are created, they spend less time executing the code and more time waiting for their turn. This effect illustrates how adding threads can negatively impact latency when they start competing for the GVL.
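One way to see this effect without a tracing tool is to time a short piece of work while a varying number of busy threads compete for the GVL. This is only a sketch: the method names, loop sizes, and thread counts are assumptions, and the measured values will vary between runs and machines.

```ruby
# Pure Ruby counting loop: CPU-bound, so it needs the GVL to make progress.
def count_to(limit)
  counter = 0
  counter += 1 while counter < limit
  counter
end

# Measures how long a short task takes on the current thread while
# `busy_threads` background threads keep competing for the GVL.
def latency_with(busy_threads, limit: 200_000)
  stop = false
  background = busy_threads.times.map do
    Thread.new { count_to(50_000) until stop }
  end
  sleep 0.05 # give the background threads time to start spinning

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count_to(limit)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
ensure
  stop = true
  background&.each(&:join)
end

[0, 1, 2, 4].each do |n|
  puts format("%d busy threads: short task took %.3fs", n, latency_with(n))
end
```

The short task does the same amount of work every time; any extra time it takes with more busy threads is time spent waiting for a turn with the GVL.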

00:14:31.260 To reinforce this point, let’s consider a scenario with just one thread. Even with a high degree of optimization in one particular piece of code, it may experience significant latency if it has to wait for the GVL. Consequently, this could impact user experience or performance even for what seems to be an efficient request.

00:15:25.380 In web applications, remember that the latency of one endpoint can be heavily influenced by the other requests running at the same time. This is the 'noisy neighbors' problem, where the work done by one thread adversely impacts the latency of other threads.

00:16:31.500 Another example illustrating this point involves two threads, where one performs network requests and the other acts as a timeout function included within the Ruby standard library. In this context, the network waiting periods can significantly affect overall latency, especially if threads wish to interact with the GVL at the same time.

00:17:54.540 If we look back at the network call example, we can notice patterns of waiting and execution. If that network call is slow and threads start waiting, the other will execute without hindrance until the previous one gets its turn. Interestingly enough, if a thread completes its task, the other will find the time to run without having to wait for the GVL when it is free.

00:19:25.620 As we dig deeper into scenarios like this, a question arises: is there ever an instance of unfairness in how Ruby allocates the GVL? If a thread frequently has to yield the GVL while waiting for responses, it impacts its overall execution in comparison to others in a round-robin manner, which can lead to unequal utilization.

00:20:03.420 This behavior leads to the understanding that using Ruby threads and fibers tends to enhance utilization and throughput; however, they can adversely affect latency. As a reminder, the rules for switching the GVL between threads are as follows: Ruby gives each thread up to 100 milliseconds to execute before handing the GVL to another waiting thread.

00:21:36.300 Once a thread blocks on an input/output operation, the GVL will get switched to another thread until the first thread is ready to run again. Next, let's look at Ractors, a concurrency primitive introduced in Ruby 3.0 that offers parallelism, and explore how they behave.
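The blocking rule mentioned above can be illustrated with a small sketch in which `sleep` stands in for a blocking network call (an assumption made so the example needs no network access; the method names are illustrative).

```ruby
require "benchmark"

# `sleep` stands in for a blocking I/O call; the thread gives up the GVL
# while it is blocked.
def simulated_io(seconds)
  sleep(seconds)
  :done
end

# Runs `count` threads that each block for `seconds`; returns the thread
# results and the total wall-clock time.
def run_blocking_threads(count, seconds)
  results = nil
  elapsed = Benchmark.realtime do
    results = count.times.map { Thread.new { simulated_io(seconds) } }.map(&:value)
  end
  [results, elapsed]
end

results, elapsed = run_blocking_threads(4, 0.2)
puts format("4 threads, each blocked for 0.2s, finished in %.2fs total", elapsed)
```

Because blocked threads do not hold the GVL, the four waits overlap and the total is typically close to 0.2 seconds rather than 0.8.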

00:22:16.020 Rerunning the previous examples with Ractors shows how they behave. Each Ractor has its own GVL, so work in different Ractors can run in parallel without contending for the same lock. Both the top thread (waiting on network calls) and the bottom thread (counting) can run independently without impacting one another.
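A minimal sketch of the counting example moved into Ractors, assuming Ruby 3.0 or newer. Ractors are experimental and Ruby may print a warning to that effect. The way to fetch a Ractor's result differs between Ruby versions, so the sketch checks for `Ractor#value` and falls back to `Ractor#take`; the method names and loop size are illustrative.

```ruby
# Pure Ruby counting loop, callable from inside a Ractor because it is a
# regular top-level method and touches no shared state.
def count_to(limit)
  counter = 0
  counter += 1 while counter < limit
  counter
end

# Runs the counting loop in `ractor_count` Ractors and collects the results.
def count_in_ractors(ractor_count, limit)
  ractors = ractor_count.times.map do
    # The limit is passed as an argument because a Ractor block cannot
    # read local variables from the surrounding scope.
    Ractor.new(limit) { |n| count_to(n) }
  end

  # Newer Rubies expose Ractor#value; older ones use Ractor#take.
  ractors.map { |r| r.respond_to?(:value) ? r.value : r.take }
end

puts count_in_ractors(2, 2_000_000).inspect # prints "[2000000, 2000000]"
```

Since each Ractor has its own lock, the two loops are free to run on different cores at the same time.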

00:23:05.760 The gvltools gem, created by Shopify, utilizes the same GVL Instrumentation API as the gvl-tracing gem but focuses on metrics. It provides insights into how long threads spend waiting for the GVL, which can help you to monitor latency and identify potential performance issues.
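As a rough sketch of collecting such a metric, assuming the gem is required as `gvltools` and that `GVLTools::GlobalTimer` offers `enable`, `disable`, and `monotonic_time` (in nanoseconds) as described in its README. The `gvl_wait_during` helper and `cpu_work` method are hypothetical, and the helper returns `nil` for the wait time when the gem is not installed.

```ruby
begin
  require "gvltools"
rescue LoadError
  warn "gvltools is not installed; GVL wait time will be reported as nil"
end

# CPU-bound work, so threads running it compete for the GVL.
def cpu_work(iterations = 1_000_000)
  counter = 0
  counter += 1 while counter < iterations
  counter
end

# Hypothetical helper: runs the block and returns [block_result, waited],
# where `waited` is the time threads spent waiting for the GVL during the
# block (nanoseconds), or nil when gvltools is unavailable.
def gvl_wait_during
  return [yield, nil] unless defined?(GVLTools)

  GVLTools::GlobalTimer.enable
  before = GVLTools::GlobalTimer.monotonic_time
  result = yield
  [result, GVLTools::GlobalTimer.monotonic_time - before]
ensure
  GVLTools::GlobalTimer.disable if defined?(GVLTools)
end

_, waited = gvl_wait_during do
  4.times.map { Thread.new { cpu_work } }.each(&:join)
end
puts "time spent waiting for the GVL (ns): #{waited.inspect}"
```

A number like this can be reported per request or per job to see how much latency comes from waiting for the GVL rather than from doing work.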

00:23:51.060 So, what should you take away from today? First, keep using threads and fibers as they work well for most applications. However, be cautious about using too many threads if your application is latency-sensitive. For instance, a starting point for a web app on a four-core machine with eight gigabytes of memory could be four workers with five threads each and then adjust as needed.
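For illustration, that starting point could be written as the following configuration fragment. It assumes Puma as the web server (one possible choice, not something this recommendation depends on) and its conventional `config/puma.rb` file.

```ruby
# config/puma.rb (assumed server and file name; adjust for your setup)
workers 4      # four worker processes, one per core on a four-core machine
threads 5, 5   # minimum and maximum of five threads per worker
```

Lowering the thread count per worker trades some throughput for less GVL contention inside each process, so it is worth measuring before and after any change.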

00:25:11.940 Second, separate latency-sensitive code from other processes. If you have critical endpoints that require immediate response times, consider offloading heavier tasks, such as generating large CSVs, to background jobs with tools like Sidekiq. Remember that the performance of one endpoint not only affects itself but can impact other endpoints too.

00:26:41.520 To wrap up, experiment and measure your application's performance carefully. Profiling and benchmarking in realistic setups are vital to understanding how your code performs under different conditions. The gvltools gem can aid in monitoring the GVL's effect on your application, while the gvl-tracing gem will help you to visualize what's happening.

00:27:38.580 Additionally, consider testing web servers like Puma or Unicorn under a setup of one thread per process and evaluate their impact on latency. Finally, remember that alternative Ruby implementations such as JRuby and TruffleRuby, which do not rely on the GVL, can provide both performance and latency benefits.

00:28:43.920 In conclusion, the future of Ruby looks bright, with continuous improvements and innovations. Recent advancements, such as the introduction of Ractors and ongoing discussions about modifying thread mappings, signal a robust trajectory for future enhancements. Thank you for attending my talk; please feel free to reach out via email or on Twitter if you have any questions. Don’t forget that you can also explore the Perfetto visualization tool; you don’t need to run anything yourself, just follow the links provided. Once again, thank you!