RubyKaigi 2024

Breaking the Ruby Performance Barrier

00:00:19.000 Hello everyone, my name is Maxime Chevalier-Boisvert.
00:00:21.279 I joined Shopify in 2020, and I've been working as a tech lead on the YJIT project. This project was started in 2020 to build a new JIT compiler inside Ruby to make Ruby significantly faster.
00:00:28.560 The goal is to maintain 100% compatibility. It was built at Shopify but is fully open-source, with significant contributions from GitHub. We try to take a data-driven approach to optimization, which involves benchmarking frequently and gathering detailed metrics.
00:00:41.640 It's a team effort. I'm lucky enough to work with some of the best engineers in the world, and we have had many contributors to the project over the years.
00:01:07.119 In this talk, I would like to go over some quick YJIT news updates, talk about the history of supersonic flight, discuss the traditional ways of making Ruby code faster and why we might need a new approach, and also talk about ProtoBoeuf, our pure Ruby Protobuf implementation.
00:01:12.360 In YJIT news, Ruby 3.3 includes the third release of YJIT. YJIT was first introduced as part of Ruby 3.1, with marked performance improvements in versions 3.2 and 3.3. Our main goal for Ruby 3.3 was to ensure that YJIT was better tuned for production deployments out of the box.
00:01:36.720 In previous versions, we often received feedback that for some users, YJIT was not making their code faster, leading them to adjust various command line parameters. With Ruby 3.3, we aimed to use less memory and to ship with better defaults.
00:01:54.960 Additionally, we had a significant push to fix as many bugs as possible before the release. I believe this release has been very successful as we did not receive much feedback after the release; no complaints were reported, which is generally a good sign. We received positive feedback regarding speedups in the wild.
00:02:19.760 YJIT has been enabled on RubyGems.org, resulting in speedups of up to 20%. It has also been enabled on the Mastodon social platform, achieving impressive performance gains. Moreover, earlier this year, GitHub quietly deployed Ruby 3.3, reporting approximately a 15% performance improvement with a very smooth deployment. If you visited GitHub.com today, you may have experienced the benefits.
00:02:49.159 We also received recognition from DHH, the creator of Rails, who has been fully supportive of our work. In future versions of Rails, YJIT will be enabled by default.
00:03:05.280 Furthermore, the CEO of Shopify noted that ismyhostfastyet.com ranks Shopify as the fastest hosting provider, and with reports of a 20% speedup in response times, YJIT likely contributes to this ranking.
00:03:22.239 Now, let me start by talking about the history of supersonic flight. In 1903, the Wright brothers achieved the first successful powered flights with the Wright Flyer, reaching approximately 48 km/h.
00:03:37.400 Just over a decade later, in Great Britain, the Royal Aircraft Factory S.E.4 had a top speed of 217 km/h, marking a rapid progression in aviation speed. By World War II, the Allies had fighters like the P-51 Mustang, with a speed of around 690 km/h.
00:04:08.320 During World War II, there was competitive pressure to build faster airplanes since faster fighter planes are better in dogfights, and faster bombers can evade fire better.
00:04:25.760 At the end of World War II, we also saw the first jet-powered fighters introduced on both sides, in Germany and the UK. Under this competitive pressure to build faster aircraft, scientists found that traditional propellers lose efficiency as they approach the speed of sound, which puts a limit on how fast propeller-driven aircraft can go.
00:04:51.040 However, jet engines removed that limitation, and towards the end of World War II, fighter pilots occasionally experienced strange incidents when they accidentally broke the sound barrier.
00:05:06.960 Unfortunately, airflow behaves differently at supersonic speeds. Aircraft not designed for supersonic flight tend to shake violently as they approach the transonic region, and control surfaces can become reversed, making it easy for pilots to lose control.
00:05:27.400 As we approach the speed of sound, drag rises sharply, requiring significantly more thrust to push through the transonic region and achieve supersonic speeds.
00:05:46.960 Therefore, to build supersonic aircraft, we need to rethink aircraft design, which demands more thrust to transition through the transonic region, smaller aircraft wings to reduce aerodynamic drag, and also deal with heating due to air compression.
00:06:05.960 The first aircraft to achieve supersonic flight was the Bell X-1, a rocket-propelled experimental aircraft, flown in 1947 by Chuck Yeager. Because the X-1 was rocket-powered, it could not fly for very long distances and had to be airlifted by a modified B-29 Superfortress. Chuck Yeager ultimately became a general and lived to be 97 years old.
00:06:42.680 When we look at modern passenger aircraft, like the 787 Dreamliner, we can see that there have been many incremental improvements made to subsonic aircraft. However, the airframe still superficially resembles older aircraft like the Douglas DC-9, which first flew in 1965.
00:07:06.240 Building supersonic aircraft, on the other hand, involves a different set of constraints that demand radically different designs. Just 17 years after the first supersonic flight, the SR-71 Blackbird was developed, capable of flying three times the speed of sound.
00:07:24.280 At this moment, you might be wondering, 'This is RubyKaigi. When are you going to talk about Ruby?' I will, but first, let me discuss Python.
00:07:41.199 There are two main assumptions about performance in Python: firstly, that Python is a slow interpreted language, and secondly, that C is a fast compiled language. In the Python ecosystem, you're never allowed to complain about Python's speed.
00:08:01.000 Whenever you express that Python is too slow, people suggest that you rewrite the slow sections of your code in C, since Python is built for expressiveness. For speed, you could turn to languages like C, C++, or Go, or other system languages.
00:08:24.560 However, I believe this is not a good trade-off. Ruby has a similar issue, and it leads to a catch-22: the more code we write in C, the less of it we can optimize. It may be time to reconsider how we operate.
00:08:48.680 YJIT is the engine pushing Ruby closer to the speed of C. But as Ruby gets faster, the relative cost of calling C functions and moving data back and forth between C and Ruby grows, leading to potential inefficiencies.
00:09:04.000 Therefore, as Ruby code gets faster, maintaining C extensions becomes less appealing. Consequently, we are approaching a limit in what we can optimize with YJIT.
00:09:24.560 While we can make Ruby code run faster with YJIT, we cannot optimize C code, and calls between C and Ruby remain relatively slow compared to the machine code YJIT generates.
00:09:44.720 So, how can we reach Ruby's maximum performance potential? Would it be possible to write more gems in pure Ruby, and does Ruby give us sufficient tools to do what C does, just as effectively?
00:10:04.680 One crucial area to examine is why people write C extensions. There are three main reasons: first, to interface with external I/O APIs not available in Ruby; second, to enhance performance for tasks such as number crunching or matrix multiplication; and third, to interface with specific native libraries.
00:10:25.000 At Shopify, as we work in a web environment, we frequently utilize libraries for parsing and serialization. Let's discuss writing pure Ruby gems.
00:10:54.320 The redis-client gem has two drivers: one is a binding to a native C library (hiredis), and the other is a pure Ruby implementation. My colleagues Aaron Patterson and Jean Boussier have worked to improve the pure Ruby driver and found that with YJIT enabled, it performs on par with the native C extension.
00:11:10.760 We believe there is still room for improvement in the performance of this driver. For instance, the C API allows pre-allocating hashes with the correct capacity (rb_hash_new_capa), while this capability is currently missing from Ruby itself.
00:11:36.240 If we could bring these capabilities into Ruby, the pure Ruby driver could perform even better. Another example is the GraphQL gem, which is widely used at Shopify and across the industry.
00:11:54.240 The default GraphQL parser relies on Racc, which is relatively inefficient due to the numerous calls between Ruby and C. However, Aaron Patterson has written a pure Ruby GraphQL parser, TinyGQL, which with YJIT enabled can outperform the C-based one.
00:12:11.560 You can find a detailed blog post on the Rails at Scale blog if you're interested in more information.
00:12:28.399 The third topic I want to discuss is Protobuf, a binary serialization protocol developed by Google. At Shopify, we use it for serializing and deserializing various data, and we are adopting Twirp, an RPC protocol based on Protobuf, as our internal RPC protocol.
00:12:51.920 Unfortunately, Google's protobuf gem can be a cumbersome dependency, often leading to problems during Ruby upgrades, high memory usage, and memory leaks or crashes.
00:13:07.440 Thus, we investigated whether it was possible to create a pure Ruby implementation that performs just as well as Google's native gem. This led to ProtoBoeuf, our pure Ruby implementation of the Protobuf spec.
00:13:41.280 At this point, it is still experimental, serving as a proof of concept. However, it has many advantages: it is easier to debug and maintain, accessible to Rubyists, and highly portable across any Ruby environment.
00:14:02.080 There are no system package dependencies, and it is not tied to any specific Ruby ABI version, generating self-contained code without constraints.
00:14:23.920 Let's explore what ProtoBoeuf does. Like TinyGQL and RedisClient, it handles parsing and serialization: it operates over byte streams, performs low-level bitwise operations to manipulate bytes, and allocates objects in memory, all of which can be done in plain Ruby.
00:14:43.720 The design is such that it parses .proto files and generates self-contained Ruby code for encoding and decoding Protobuf streams. We've engineered the generated code with YJIT in mind so that it runs optimally under it.
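To give a concrete flavor of the kind of low-level work involved (an illustrative sketch, not ProtoBoeuf's actual generated code), decoding a Protobuf varint in pure Ruby boils down to String#getbyte and bitwise operations:

```ruby
# Sketch of Protobuf varint decoding in plain Ruby (illustrative only).
# Each byte contributes 7 bits of the value; the high bit of a byte
# marks whether another byte follows.
def decode_varint(buf, pos)
  value = 0
  shift = 0
  loop do
    byte = buf.getbyte(pos)   # read one byte without allocating a string
    pos += 1
    value |= (byte & 0x7F) << shift
    break if byte < 0x80      # high bit clear: this was the last byte
    shift += 7
  end
  [value, pos]
end

# 300 is encoded as the two bytes 0xAC 0x02
value, next_pos = decode_varint("\xAC\x02".b, 0)
# value => 300, next_pos => 2
```

This is exactly the kind of tight byte-level loop the generated decoding code has to run, which is why core methods like String#getbyte matter so much for performance.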
00:15:14.560 When we initially benchmarked ProtoBoeuf against Google's library, we found it to be about 7.43 times slower, which was disappointing. However, we learned that Google's protobuf library parses streams lazily.
00:15:42.360 If you don't read any fields, it won't parse them. Therefore, when benchmarking decoding plus reading fields, we found that our pure Ruby library is actually 2.36 times faster than Google's.
00:16:13.200 Even more surprisingly, with YJIT enabled, we are nearly 9.45 times faster, an outcome that astonished us.
00:16:56.639 However, I must note some caveats: while our implementation is currently much faster than Google's library at decoding, it is slower at encoding, chiefly due to a lack of optimization in the encoding logic as of now.
00:17:22.560 We are considering using it internally at Shopify, but for now you use it at your own risk, as it comes with zero support.
00:17:44.640 That being said, if you do encounter issues, we will strive to address them quickly. We developed ProtoBoeuf to examine which optimizations we could implement in YJIT to enhance its performance, and to test the feasibility of a faster pure Ruby Protobuf.
00:18:11.760 We discovered that reaching good performance was not overly challenging. We approached the problem by generating Ruby code optimized for YJIT, and by adding enhancements to YJIT itself so that the generated Ruby code executes faster.
00:18:42.480 Lastly, is enhancing YJIT to make ProtoBoeuf faster 'cheating'? Not quite! The optimizations we integrated benefit all code that uses core methods such as String#getbyte and String#setbyte.
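As a hedged illustration of why these two methods matter (again a sketch, not ProtoBoeuf's real encoder), the encoding side can write a varint into a preallocated binary string with String#setbyte, avoiding per-byte string allocations:

```ruby
# Sketch of varint encoding into a preallocated binary buffer using
# String#setbyte (hypothetical helper, not ProtoBoeuf's actual code).
def encode_varint(buf, pos, value)
  while value > 0x7F
    buf.setbyte(pos, (value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
    pos += 1
    value >>= 7
  end
  buf.setbyte(pos, value)                    # final byte, high bit clear
  pos + 1
end

buf = ("\0" * 10).b          # preallocated binary-encoded scratch buffer
len = encode_varint(buf, 0, 300)
buf.byteslice(0, len)        # the two bytes 0xAC 0x02
```

In a hot serialization loop, getbyte and setbyte run as simple integer operations, which is precisely what a JIT can compile down to fast machine code.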
00:19:01.440 With the optimizations included in Ruby 3.4's YJIT, we have been able to run ProtoBoeuf approximately 14.5% faster.
00:19:22.320 This is some of the code we generate to parse Protobuf. It's somewhat convoluted, but we designed it with YJIT's strengths and weaknesses in mind, ensuring that YJIT will execute it as swiftly as possible.
00:19:48.410 While we wouldn't recommend that you write your Ruby code this way, it is acceptable for generated code.
00:20:06.320 Looking forward, there are still numerous areas needing improvement, particularly around low-level operations in Ruby. Ruby has useful string methods that often necessitate allocating small strings, which can hinder performance.
00:20:28.480 Similarly, the buffer reads that are integral to parsing often involve string allocation. Ruby has IO::Buffer, which is oriented towards binary protocols, but neither it nor StringIO has adequately catered to our needs.
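To illustrate the allocation problem with a small assumed example (not code from the talk): reading a 4-byte little-endian integer via byteslice allocates a temporary string on every read, while a getbyte loop avoids the allocation at the cost of clunkier code:

```ruby
data = "\x78\x56\x34\x12extra".b

# Allocating approach: byteslice creates a temporary 4-byte string,
# which unpack1 then converts into an integer.
n1 = data.byteslice(0, 4).unpack1("V")  # "V" = 32-bit unsigned little-endian

# Allocation-free approach: combine the bytes by hand with getbyte.
n2 = data.getbyte(0) |
     (data.getbyte(1) << 8) |
     (data.getbyte(2) << 16) |
     (data.getbyte(3) << 24)

# Both yield the same integer, 0x12345678, but n2 needed no intermediate string.
```

Every such temporary string is garbage-collector pressure in a parsing hot path, which is why better byte-level APIs would help pure Ruby parsers.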
00:20:54.560 We are actively working on solutions to address these limitations, including better support for working with binary-encoded strings. We also lack APIs for allocating hashes, strings, and arrays with a given capacity.
00:21:12.240 Solving these issues could significantly enhance Ruby's utility for writing pure Ruby gems and improve overall performance in those scenarios.
00:21:37.440 To conclude, there has traditionally been a drive to write performance-critical code in C. However, with YJIT, we are entering a new paradigm.
00:21:50.720 As Ruby's speed continues to improve, the dynamics of writing low-level code change, and maintaining C code or native dependencies can become burdensome: it can expose developers to fragile system dependencies and make debugging harder.
00:22:07.840 In some cases, we can indeed develop pure Ruby gems that rival C performance. Therefore, to leverage YJIT effectively, we need to nurture our Ruby capabilities and keep refining Ruby for low-level code.
00:22:30.080 The YJIT team is currently working on the YJIT that will ship with Ruby 3.4, where we are already seeing noteworthy performance enhancements.
00:22:49.600 Additionally, we are focusing on enhancing the quality of the generated code and improving quality of life regarding performance analysis in production.
00:23:07.920 With Ruby 3.4, you can expect better performance across the board, comparable memory usage, and a more thoroughly tested and stable YJIT.
00:23:22.640 Lastly, if you're inclined, you can experiment with Ruby master and provide your feedback to us.
00:23:36.320 Finally, I should mention that we are also striving to enhance the performance of RSpec and other tools.
00:23:59.200 Thank you for listening. If you wish to learn more about YJIT, follow our work on the Rails at Scale blog, check out the YJIT README, or feel free to reach out to me via email or Twitter.
00:24:02.600 After the talk, I am open to discussions and questions.
00:24:12.799 Thank you very much!