Keynote: In defence of GVL

00:00:11.840 Hi, this is Yukihiro Matsumoto, the creator of the Ruby programming language. I am here for the keynote at Euruko 2023.

00:00:19.279 Unfortunately, I am not able to attend in person, mostly due to the ongoing pandemic. However, maybe next year I can visit you physically.

00:00:34.719 This year, the title of the keynote is 'In Defense of GVL.' GVL stands for Global Virtual Machine Lock. Recently, the Python community accepted a proposal named PEP 703, which makes the Global Interpreter Lock optional.

00:00:51.399 The Global Interpreter Lock (GIL) is a mechanism that restricts execution in the virtual machine to one thread at a time. It is usually called simply the 'GIL', and its purpose is to ensure thread safety.

00:01:28.320 Let me first explain what the GIL is and the need for it. To achieve thread safety in complex programs, such as virtual machines, shared mutable data must be protected. Without proper protection, race conditions can occur. Thus, every read and write operation on such data must be exclusive.
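As a minimal sketch of what "exclusive" access means, the hypothetical Counter class below (not from the talk) protects a read-modify-write on shared data with a Mutex. The class name, thread count, and iteration count are illustrative assumptions.

```ruby
# A shared counter: `@count += 1` is a read, an add, and a write,
# so two threads could interleave between those steps.
class Counter
  attr_reader :count

  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # Exclusive read-modify-write: only one thread at a time runs the block.
  def increment
    @lock.synchronize { @count += 1 }
  end
end

counter = Counter.new
threads = 4.times.map do
  Thread.new { 10_000.times { counter.increment } }
end
threads.each(&:join)

puts counter.count # prints 40000
```

With the lock in place the total is always 4 x 10,000. Without it, the result would depend on how the threads happen to interleave.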

00:01:58.399 As you may know, developing complex programs like virtual machines and interpreters that are thread-safe is far more difficult than it seems. We often encounter Heisenbugs, which are software bugs that seem to disappear or change their behavior when one attempts to study them, as noted by Wikipedia. The language runtimes are complex, typically consisting of hundreds of thousands of lines of code.

00:03:07.360 In Ruby, we refer to the global virtual machine lock as GVL. The GVL allows only one thread to execute at a time, which keeps the virtual machine thread-safe. However, this raises challenges with the advent of multicore processors.

00:03:45.720 Ruby was created in 1993, around the same time as Python, which was first released in 1991. Thirty years ago, most computers were single-core, and threads were implemented using time-slicing techniques. This meant that threads were not primarily used to improve performance but rather for certain concurrent architectures, such as producer-consumer models. During that time, the GIL was not a problem, because with time-slicing on a single core only one thread could run at a time anyway.

00:04:43.080 As technology has advanced, however, multicore processors have become standard. The improvements in hardware over the past 30 years include a dramatic increase in memory size and computational power. Memory size has improved by 88,000 times, storage capacity by 6,400 times, and clock speeds have increased by 300 times.

00:05:05.519 This rapid advancement has been largely driven by Moore's Law. Moore's Law states that the number of transistors on an integrated circuit doubles approximately every two years. This exponential growth has made computers faster and cheaper, making them more accessible.

00:05:37.680 However, in the last five to ten years, we have encountered limitations to Moore's Law due to two key factors: quantum physics and heat density. Currently, the heat density of CPUs can exceed 200 degrees Celsius, which makes cooling them effectively quite challenging.

00:07:00.400 As a result, rather than creating fast single-core processors, manufacturers have begun to implement smaller cores distributed across a chip to manage heat better. This necessitates a move toward parallelism in computing.

00:07:42.640 Parallelism, however, poses its own challenges. In Japan, we sometimes struggle with pronouncing the L and R sounds, and parallelism is one of those more difficult words. Nevertheless, physical concurrency is essential for improving performance in software development.

00:08:21.320 As software developers, we often use threads to enhance performance. However, the GIL presents a barrier to effectively utilizing parallelism, which has led many developers to criticize its existence. There are calls from various developers within the community to remove the GIL.

00:09:28.320 If we are to abandon the GIL, we need an alternative strategy known as fine-grained locking. Fine-grained locking involves locking and unlocking around each data access to ensure exclusivity. However, this approach presents two significant problems.

00:10:22.840 First, if you forget to protect even one data access, you may end up encountering serious bugs, such as Heisenbugs, which are notoriously difficult to diagnose. This issue is theoretically solvable, as long as you can ensure that every access is protected.

00:11:37.480 The second problem is even more serious: implementing fine-grained locks can lead to increased complexity, higher memory consumption, and slower performance for the software. Both the Ruby community and the Python community have experimented with fine-grained locking and ultimately abandoned it because of these drawbacks, even though calls to remove the GIL from Python have continued to grow.
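The contrast between one global lock and fine-grained locking can be sketched in plain Ruby. This is an illustration under stated assumptions, not how any virtual machine is implemented: CoarseStore, FineStore, and Cell are hypothetical names, and the key set is fixed so that the container itself never needs a lock.

```ruby
# Coarse-grained: one lock for the whole store (the GVL-like approach).
class CoarseStore
  def initialize
    @data = {}
    @lock = Mutex.new # a single lock guards everything
  end

  def add(key, amount)
    @lock.synchronize { @data[key] = @data.fetch(key, 0) + amount }
  end

  def get(key)
    @lock.synchronize { @data.fetch(key, 0) }
  end
end

# Fine-grained: every cell carries its own lock.
class Cell
  def initialize
    @value = 0
    @lock = Mutex.new # one extra Mutex per cell: more memory
  end

  def add(amount)
    @lock.synchronize { @value += amount } # lock and unlock on every access
  end

  def value
    @lock.synchronize { @value }
  end
end

class FineStore
  def initialize(keys)
    # Fixed key set, so the Hash itself is never mutated after construction.
    @cells = keys.to_h { |k| [k, Cell.new] }.freeze
  end

  def add(key, amount)
    @cells.fetch(key).add(amount)
  end

  def get(key)
    @cells.fetch(key).value
  end
end

keys = %i[a b c d]
coarse = CoarseStore.new
fine = FineStore.new(keys)

[coarse, fine].each do |store|
  keys.map { |k| Thread.new { 1_000.times { store.add(k, 1) } } }.each(&:join)
end

puts keys.map { |k| coarse.get(k) }.inspect
puts keys.map { |k| fine.get(k) }.inspect
```

Both stores end up with the same totals. The fine-grained version pays for a Mutex per cell and a lock/unlock per access, and any accessor that forgets to take the lock silently reintroduces a race.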

00:12:54.480 In Python, the process for removing the GIL has been documented in a proposal known as PEP 703. I admire the Python community for their transparency and thorough documentation, which is an area where the Ruby community can improve.

00:14:12.320 The strategies discussed in PEP 703 for removing the GIL include improving reference counting, introducing fine-grained locks, and other enhancements. However, Python's garbage collection is more complex than Ruby’s due to its reliance on reference counting.

00:15:31.160 Python's virtual machine maintains a reference count for each object, which is updated whenever a reference is assigned. The reference count is therefore itself shared mutable data, which makes parallelism difficult.

00:16:47.680 To address the reference counting challenges, Python has proposed three strategies. The first is 'immortalization,' where reference counting is avoided for certain long-lived objects. The second is 'biased reference counting,' which keeps separate reference counts for accesses from the owning thread and for accesses from other threads. The third is 'deferred reference counting,' which postpones reference count updates to the end of a function to minimize overhead.
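The first two strategies can be pictured with a toy model in Ruby. This is only a sketch of the idea as described in the talk: ToyObject and its fields are hypothetical, it does not mirror CPython's actual object layout, and it leaves out the third strategy entirely.

```ruby
# Toy model only: names and layout are illustrative, not CPython's.
class ToyObject
  def initialize(immortal: false)
    @owner = Thread.current # thread that created the object
    @immortal = immortal
    @local_count = 1        # touched only by the owner, no lock needed
    @shared_count = 0       # touched by other threads, needs a lock
    @shared_lock = Mutex.new
  end

  def incref
    return if @immortal # immortalization: skip counting altogether

    if Thread.current.equal?(@owner)
      @local_count += 1 # fast path for the owning thread
    else
      @shared_lock.synchronize { @shared_count += 1 } # slow path
    end
  end

  def decref
    return if @immortal

    if Thread.current.equal?(@owner)
      @local_count -= 1
    else
      @shared_lock.synchronize { @shared_count -= 1 }
    end
  end

  # Total references = owner's count + every other thread's count.
  def refcount
    @shared_lock.synchronize { @local_count + @shared_count }
  end
end

obj = ToyObject.new
obj.incref                                 # owner thread: local count is now 2
Thread.new { 3.times { obj.incref } }.join # other thread: shared count is now 3
puts obj.refcount                          # prints 5

singleton = ToyObject.new(immortal: true)
Thread.new { singleton.incref }.join
puts singleton.refcount                    # prints 1, the count never changes
```

The point of the split is that the common case, the owning thread touching its own object, needs no synchronization at all.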

00:18:54.400 By applying these strategies, the Python community aims to alleviate the issues caused by reference counting without the need for the GIL. They also experiment with optimistic locks, which assume that race conditions are rare, thus reducing the need for strict locking mechanisms for shared data.

00:20:52.320 That said, structures such as lists and dictionaries require traditional fine-grained locks since optimistic locks may not apply to these types of containers. Overall, the changes described in PEP 703 require substantial effort, estimated at over five years, to implement fully.
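The optimistic idea, with a traditional lock as the fallback, can be sketched as a version-checked read. This is a generic illustration and an assumption on my part, not the mechanism PEP 703 specifies: OptimisticCell is a hypothetical class, and the lock-free read relies on CRuby's current behaviour of running Ruby threads one at a time.

```ruby
class OptimisticCell
  MAX_TRIES = 3

  def initialize(value)
    @value = value
    @version = 0 # even: stable, odd: a write is in progress
    @lock = Mutex.new
  end

  # Writers always take the lock and bump the version around the change.
  def write(value)
    @lock.synchronize do
      @version += 1 # now odd
      @value = value
      @version += 1 # even again
    end
  end

  # Readers first try without the lock and validate afterwards.
  def read
    MAX_TRIES.times do
      before = @version
      value = @value
      # Accept the value only if no write started or finished meanwhile.
      return value if before.even? && before == @version
    end
    @lock.synchronize { @value } # pessimistic fallback
  end
end

cell = OptimisticCell.new(:old)
writer = Thread.new { 1_000.times { |i| cell.write(i) } }
reader = Thread.new { 1_000.times { cell.read } }
[writer, reader].each(&:join)

puts cell.read # prints 999
```

Reads cost almost nothing when conflicts are rare, and the code only pays for the lock when validation keeps failing.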

00:22:50.360 Despite acknowledging the enormity of the challenges ahead, there remain doubts about the feasibility of PEP 703 in the real world, particularly concerning how much can actually be improved by removing the GIL. Developers also worry about backwards compatibility issues arising from such drastic changes.

00:24:44.000 In the Ruby community, Ruby is often used for web applications, an area where the GVL's impact is less pronounced because HTTP requests are stateless and independent, so the server architecture can multiplex them across threads and processes. Additionally, most Ruby applications are I/O bound, meaning that the bottlenecks typically lie outside of CPU processing, and Ruby threads release the GVL during blocking I/O operations.

00:26:47.840 This allows other threads to be scheduled concurrently, which significantly mitigates concerns over CPU-bound performance. Because of this, removing the GIL is not as critical for Ruby applications as it is for Python. In the Python community, there is significant motivation to remove the GIL, driven by computational demands, particularly in machine learning.
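The effect of releasing the GVL while a thread is blocked can be observed with a small timing sketch. Here `sleep` stands in for a blocking I/O call, which is an assumption made for the demo; `fake_io_request` and `elapsed` are hypothetical helper names.

```ruby
# `sleep` stands in for a blocking I/O call (an assumption for the demo).
def fake_io_request(seconds)
  sleep(seconds) # the thread blocks and gives up the GVL meanwhile
  :done
end

# Measures the wall-clock time taken by the given block, in seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

sequential = elapsed { 4.times { fake_io_request(0.2) } }

threaded = elapsed do
  4.times.map { Thread.new { fake_io_request(0.2) } }.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

The four waits overlap in the threaded version, so it finishes in roughly the time of one wait even though only one thread can hold the GVL at any moment.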

00:28:31.160 In Ruby, we introduced actors, known as Ractors, in Ruby 3 to help address CPU-bound tasks. Each Ractor operates with its own thread, and multiple Ractors can execute in parallel without sharing a GVL. This way, we can carry out CPU-intensive tasks, as demonstrated by the example of the 'tarai' function, which can run in parallel across multiple Ractors.

00:30:23.480 Experimentation has shown that the parallel version of a simple benchmarking program can execute nearly four times faster than its sequential counterpart. Thus, we embrace Ractors in Ruby for handling CPU-bound tasks, maintaining the GVL for now, while allowing for multithreading.
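A sequential-versus-Ractor comparison of this kind can be sketched as follows. It uses the Takeuchi ("tarai") function, which Ruby's Ractor documentation uses for a similar benchmark, as a stand-in for the program shown in the talk. The input size and the count of four Ractors are assumptions chosen to keep the run short, so no timing is claimed here.

```ruby
# Takeuchi function: deliberately CPU-heavy recursion.
def tarai(x, y, z)
  return y if x <= y

  tarai(tarai(x - 1, y, z), tarai(y - 1, z, x), tarai(z - 1, x, y))
end

ARGS = [10, 5, 0].freeze # small illustrative input (assumption)

# Sequential: four calls, one after another, on the main Ractor.
sequential = 4.times.map { tarai(*ARGS) }

# Parallel: one Ractor per call. Arguments are passed in explicitly because
# a Ractor block cannot read the surrounding local variables.
ractors = 4.times.map { Ractor.new(*ARGS) { |x, y, z| tarai(x, y, z) } }

# Ractor#value is the newer API; Ractor#take is the one in Ruby 3.0 to 3.4.
parallel = ractors.map { |r| r.respond_to?(:value) ? r.value : r.take }

puts sequential.inspect
puts parallel.inspect
```

Both versions compute the same results. Only the Ractor version can spread the four calls over several cores, which is where a speedup on larger inputs would come from.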

00:32:30.000 In summary, while we may consider the future reduction or removal of the GIL, such changes are not currently prioritized within the Ruby community, especially given that we can effectively utilize multiple cores even with the GVL.

00:34:17.680 We continue to develop Ruby to make it enjoyable for programmers, and I encourage everyone to embrace happy programming. Thank you for your attention, and enjoy the rest of the conference. This talk was sponsored by NSL Company in Japan, which has supported my work for the past 26 years, as well as USS Vision, a new consulting firm I founded. I also want to extend my gratitude to the GitHub Sponsors and the Ruby community for their ongoing support.

00:35:19.000 Thank you, and goodbye.