RubyKaigi Takeout 2021

Why Ruby's JIT was slow

Japanese: https://youtu.be/rE5OucBHm18

In Ruby 2.6, we started to use a JIT compiler architecture called "MJIT", which uses a C compiler to generate native code. While it achieved Ruby 3x3 in one benchmark, we had struggled to optimize web application workloads like Rails with MJIT. The good news is we recently figured out why.

In this talk, you will hear how JIT architectures impact various benchmarks differently, and why it matters for you. You may or may not benefit from Ruby's JIT, depending on what JIT architecture we'll choose beyond the current MJIT. Let's discuss which direction we'd like to go.

RubyKaigi Takeout 2021: https://rubykaigi.org/2021-takeout/presentations/k0kubun.html

00:00:00.960 Hello everyone! Today I'm going to talk about why Ruby's JIT was slow.
00:00:03.360 Let me introduce myself first. My name is Takashi Kokubun, and on the internet I use the alias k0kubun, where the first 'o' is actually a zero.
00:00:09.519 I'm also working as a Ruby committer in my spare time. While I originally became a Ruby committer as an ERB maintainer, these days I work on the JIT compiler and some other features such as IRB's error coloring, command shortcuts, and the show_source command, which are set to be introduced in Ruby 3.1.
00:00:20.640 I also work at Treasure Data, focusing on storage services in our platform. My first RubyKaigi talk dates back to 2015, when I presented Hamlit, a high-performance Haml implementation. Since then, we've merged a newer version of that implementation into Haml itself, and it's currently in the main branch. It hasn't been released yet, so I'm not sure if it's going to be part of Haml 6, but that's the current plan.
00:00:47.200 Now, let's dive into the main part: why Ruby's JIT was slow. In the Ruby 3.0 release, we claimed that we achieved Ruby 3x3, which means making Ruby 3 three times faster than Ruby 2.0. We kind of accomplished this in a benchmark called Optcarrot, which is a NES emulator, but it doesn't represent real-world workloads like web applications. We acknowledged that the JIT wasn't quite ready to optimize workloads like Rails, and I was kind of sad about that because my primary usage of Ruby is for web applications.
00:01:24.840 I wanted to make Rails faster by simply enabling the JIT compiler, but unfortunately, that didn't happen in the Ruby 3.0 release. If you examine benchmarks of the JIT compiler across various Ruby versions in the default configuration, you'll see that running with the JIT was actually slower than running on the virtual machine alone. In the corresponding graph, speed is shown relative to the virtual machine, so a bar that doesn't reach 1.0 means the JIT run was slower than the virtual machine.
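To make the graph's scale concrete, here is a minimal sketch of how such a ratio could be computed. The `relative_speed` helper and the throughput numbers are hypothetical, chosen only for illustration; they are not measurements from the talk.

```ruby
# Hypothetical helper: speed of a JIT run relative to the plain VM.
# A result below 1.0 means the JIT run was slower than the VM.
def relative_speed(jit_throughput, vm_throughput)
  jit_throughput.to_f / vm_throughput
end

# Made-up requests-per-second figures, for illustration only.
vm_rps = 100.0
slow_jit_rps = 90.0
fast_jit_rps = 105.0

puts relative_speed(slow_jit_rps, vm_rps) # below 1.0: slower than the VM
puts relative_speed(fast_jit_rps, vm_rps) # above 1.0: faster than the VM
```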
00:01:45.919 On the other hand, if you adjust the configuration a bit, for example by setting the max cache (--jit-max-cache) to 10,000, then performance improves significantly. In Ruby 3.0, this change can make Rails about 5% faster than the virtual machine. Thus, we found that Ruby 3.0 introduced a good optimization, making it possible to improve performance by enabling the JIT compiler, provided users are careful about their configuration.
00:02:12.720 The default configuration of Ruby 3.0 limited the JIT compiler to compiling at most 100 methods, which was insufficient for many applications. However, if you increase that to 10,000 methods, it can actually make your application run faster. So, we are planning to make this setting the default in Ruby 3.1.
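As a rough illustration of the configuration point: in Ruby 3.0 the limit can be raised on the command line with something like `ruby --jit --jit-max-cache=10000 app.rb`, where `app.rb` is a placeholder script name. The sketch below is one way to confirm from inside a program whether a JIT is active. The `jit_status` helper is hypothetical, and the `defined?` guards are there because the available `RubyVM` modules differ between Ruby versions and implementations.

```ruby
# Hypothetical helper: report which JIT, if any, is active in this process.
# The defined? guards keep this safe on Rubies that lack these modules.
def jit_status
  if defined?(RubyVM::MJIT) && RubyVM::MJIT.enabled?
    "MJIT enabled"
  elsif defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
    "YJIT enabled"
  else
    "JIT disabled"
  end
end

puts jit_status
```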
00:02:41.599 This naturally leads us to the question: why was Ruby's JIT slow with the default configuration? And even with the max cache set to 10,000 methods, it was still slow in previous versions. The root of the issue lies in the architecture of the JIT compiler.
00:03:02.720 MJIT uses a C compiler and has to allocate memory for compiled code. Each method is compiled as a shared library, and these libraries are often not placed adjacent to each other in memory. As a result, if you compile, say, 100 methods, those methods won't share much of the code that the virtual machine uses. This leads to increased cache pressure.
00:03:36.320 In Ruby 3.0, we addressed this by ensuring that the same code is shared across different methods, which prevents unnecessary duplicated code in memory. To keep the cache efficient, everything needs to be combined so that the virtual machine and the JIT-compiled code don't thrash the cache. However, there are still drawbacks to the architecture, even if we resolve the cache issues. For example, the compilation process itself remains slow; I had to wait for five minutes to run a single Rails benchmark. This prolonged compilation time is not only a hassle during development, but also presents problems in production applications.
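The effect of sharing common code across compiled methods can be sketched with a toy model. This is not a measurement: the byte counts and both helper names are hypothetical, and real code layout depends on the compiler and loader.

```ruby
# Toy model, not a measurement: all sizes are hypothetical byte counts.
# Without sharing: every compiled method carries its own copy of common code.
def footprint_duplicated(method_count, common_bytes, unique_bytes)
  method_count * (common_bytes + unique_bytes)
end

# With sharing: one copy of the common code serves every compiled method.
def footprint_shared(method_count, common_bytes, unique_bytes)
  common_bytes + method_count * unique_bytes
end

methods = 100
common = 4_096
unique = 512

puts footprint_duplicated(methods, common, unique) # 460800
puts footprint_shared(methods, common, unique)     # 55296
```

A smaller footprint means fewer cache lines are needed to hold the same compiled methods, which is the intuition behind the reduced cache pressure.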
00:04:39.520 Specifically, if you want to speed up a Rails application with the JIT compiler, the lengthy warm-up time hurts overall application performance. Additionally, while GCC is running during warm-up, it puts pressure on both the CPU and memory, which slows down the main Ruby threads.
00:05:09.080 The compilation speed is sluggish, and warm-up is similarly slow, which results in the overall performance being negatively impacted. Even though the intent of the JIT compiler is to enhance performance, the reality is quite the opposite due to these issues. Another factor is the use of position-independent code (PIC) in the architecture, which inherently makes the generated code slower. This relates to how shared libraries need to accommodate dynamic addressing.
00:05:49.120 Therefore, the choice of JIT architecture greatly matters to Ruby users. For instance, if warming up an application takes around five minutes, it changes how you manage and operate that application. During peak times, if you need to bring up instances that must perform optimally, you may have to wait for that lengthy warm-up before they can respond to requests at full speed.
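One simple way to observe warm-up is to time repeated runs of the same workload and watch how the per-iteration time changes. The sketch below assumes a placeholder workload; the `time_iterations` helper is hypothetical, and a real check would send requests to the actual application instead.

```ruby
# Hypothetical helper: time each of `count` runs of the given block, in seconds.
def time_iterations(count)
  Array.new(count) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Placeholder workload, for illustration only.
timings = time_iterations(5) { (1..100_000).reduce(:+) }
timings.each_with_index do |seconds, index|
  puts format("iteration %d: %.6f s", index + 1, seconds)
end
```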
00:06:34.800 This issue also affects programs that need to finish quickly. For example, we wanted to introduce this architecture for a competitive programming website called HackerRank, but it was rejected because enabling the JIT slowed down the execution of competitive programming submissions, which have strict time constraints. In a scenario where you're limited to two seconds, you can't afford the overhead created by the current JIT compiler.
00:08:09.440 The MJIT architecture has two significant problems here. First, pausing the running thread for a compilation can cause delays of hundreds of milliseconds, which is unacceptable in competitive scenarios. Second, while compilation is in progress, the main Ruby thread slows down, which hurts performance further.
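The trade-off under a strict time limit can be sketched with a toy model. All numbers below are hypothetical: a two-second limit, a pause in the "hundreds of milliseconds" range, and a speedup of 1.05. The helper names are also made up for illustration.

```ruby
# Toy model: all inputs are hypothetical and expressed in seconds.
# Total time with the JIT = compilation pauses + work divided by the speedup.
def time_with_jit(work_seconds, pause_seconds, speedup)
  pause_seconds + work_seconds / speedup
end

def fits_limit?(seconds, limit_seconds)
  seconds <= limit_seconds
end

limit = 2.0    # time limit for a submission
work = 1.9     # run time on the interpreter alone
pause = 0.3    # delay caused by compilation
speedup = 1.05 # how much faster the compiled code runs

with_jit = time_with_jit(work, pause, speedup)
puts fits_limit?(work, limit)     # interpreter alone
puts fits_limit?(with_jit, limit) # with JIT overhead included
```

In this model, a short program that fits the limit on the interpreter can miss it once the compilation pause is added, because the run is too short for the speedup to pay the pause back.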
00:09:11.839 There's also a concern about the default settings. If you keep the defaults without adjustment, compilation time can be significantly prolonged, which is counterproductive, especially when time limits are tight. JIT compilers are essential for dynamic languages. Developers in the Ruby community, such as Julian, have expressed concern that without robust JIT solutions, dynamic languages will struggle to compete. Python, a currently popular dynamic language, would have a competitive edge if Ruby lacked effective JIT optimizations.
00:10:29.279 In conclusion, to maintain our relevance in web applications and other areas, we need to improve the performance of Ruby's JIT compiler. It's essential that we choose an architecture that allows sustained improvements and ensures we do not fall behind alternatives like Python. Additionally, there's notable collaboration happening between the various JIT compiler implementations, where we can learn from one another. The goal is to contribute collectively, leveraging experience from both MJIT and YJIT.
00:11:20.799 As we move forward, we need a balanced approach towards the architecture of Ruby's JIT to ensure better performance. Thank you for listening to my talk today, and I hope this discussion helps guide us towards a more efficient Ruby compiler architecture.