00:00:19.000
Hello everyone, my name is Maxime Chevalier-Boisvert.
00:00:21.279
I joined Shopify in 2020, and I've been working as a tech lead on the YJIT project. The project was started in 2020 to build a new JIT compiler inside Ruby and make Ruby significantly faster.
00:00:28.560
The goal is to maintain 100% compatibility. It was built at Shopify but is fully open-source, with significant contributions from GitHub. We try to take a data-driven approach to optimization, which involves benchmarking frequently and gathering detailed metrics.
00:00:41.640
It's a team effort. I'm lucky enough to work with some of the best engineers in the world, and we have had many contributors to the project over the years.
00:01:07.119
In this talk, I would like to go over some quick YJIT news updates, talk about the history of supersonic flight, discuss the traditional ways of making Ruby code faster and why we might need a new approach, and also talk about ProtoBoeuf, our pure-Ruby protobuf implementation.
00:01:12.360
In YJIT news, Ruby 3.3 includes the third release of YJIT. YJIT first shipped as part of Ruby 3.1, with marked performance improvements in versions 3.2 and 3.3. Our main goal with Ruby 3.3 was to ensure that it was better tuned for production deployments out of the box.
00:01:36.720
In previous versions, we often received feedback that for some users, YJIT was not making their code faster, leading them to adjust various command-line parameters. With Ruby 3.3, we aimed to use less memory and to ship with better defaults.
00:01:54.960
Additionally, we had a significant push to fix as many bugs as possible before the release. I believe this release has been very successful: we received few complaints after the release, which is generally a good sign, along with positive reports of speedups in the wild.
00:02:19.760
YJIT has been enabled on RubyGems, resulting in speedups of up to 20%. It has also been enabled on the mastodon.social platform, achieving impressive performance gains. Moreover, earlier this year, GitHub quietly deployed Ruby 3.3, reporting approximately a 15% performance improvement with a very smooth deployment. If you visited GitHub.com today, you might have experienced the benefits.
00:02:49.159
We also received recognition from DHH, the creator of Rails, who has been fully supportive of our work. In future versions of Rails, YJIT will be enabled by default.
00:03:05.280
Furthermore, the CEO of Shopify mentioned that ismyhostfastyet.com ranks Shopify as the fastest hosting provider, with reports indicating a 20% speedup in response times, which likely contributes to this ranking.
00:03:22.239
Now, let me start by talking about the history of supersonic flight. In 1903, the Wright brothers achieved the first successful powered flights with the Wright Flyer, reaching approximately 48 km/h.
00:03:37.400
Just over a decade later in Great Britain, the S.E.4 had a top speed of 217 km/h, marking a rapid progression in aviation speed. By World War II, fighter aircraft such as the P-51 Mustang reached speeds of around 690 km/h.
00:04:08.320
During World War II, there was competitive pressure to build faster airplanes since faster fighter planes are better in dogfights, and faster bombers can evade fire better.
00:04:25.760
At the end of World War II, we also saw the first jet-powered fighters introduced on both sides, in Germany and the UK. Under this competitive pressure to build faster aircraft, scientists found that traditional propellers lose efficiency as they approach the speed of sound, which places a limit on how fast propeller-driven aircraft can go.
00:04:51.040
However, jet engines removed that limitation, and towards the end of World War II, fighter pilots occasionally experienced strange incidents when they accidentally broke the sound barrier.
00:05:06.960
Unfortunately, airflow behaves differently at supersonic speeds. Aircraft not designed for supersonic flight tend to shake violently as they approach the transonic region, and control surfaces can become reversed, making it easy for pilots to lose control.
00:05:27.400
As we approach the speed of sound, drag rises sharply, requiring significantly more thrust to push through the transonic region and achieve supersonic speeds.
00:05:46.960
Therefore, to build supersonic aircraft, we need to rethink aircraft design: more thrust to push through the transonic region, smaller and thinner wings to reduce aerodynamic drag, and ways to deal with heating due to air compression.
00:06:05.960
The first aircraft to achieve supersonic flight was the Bell X-1, a rocket-propelled experimental aircraft, flown in 1947 by Chuck Yeager. Because the X-1 was rocket-powered, it could not fly for very long distances and had to be airlifted by a modified B-29 Superfortress. Chuck Yeager ultimately became a general and lived to be 97 years old.
00:06:42.680
When we look at modern passenger aircraft, like the 787 Dreamliner, we can see that there have been many incremental improvements made to subsonic aircraft. However, the airframe still superficially resembles older aircraft like the Douglas DC-9, which first flew in 1965.
00:07:06.240
Building supersonic aircraft, on the other hand, involves a different set of constraints that demand radically different designs. Just 17 years after the first supersonic flight, the SR-71 Blackbird was developed, capable of flying three times the speed of sound.
00:07:24.280
At this moment, you might be wondering, 'This is RubyKaigi. When are you going to talk about Ruby?' I will, but first, let me discuss Python.
00:07:41.199
There are two main assumptions about performance in Python: firstly, that Python is a slow interpreted language, and secondly, that C is a fast compiled language. In the Python ecosystem, you're never allowed to complain about Python's speed.
00:08:01.000
Whenever you express that Python is too slow, people suggest that you rewrite the slow sections of your code in C, since Python is built for expressiveness. For speed, you could turn to languages like C, C++, or Go, or other system languages.
00:08:24.560
However, I believe this is not a good trade-off. Ruby has a similar issue, leading to a catch-22: the more code we write in C, the less of it we can optimize. It may be time to reconsider how we operate.
00:08:48.680
YJIT is the engine pushing Ruby closer to the speed of C. As Ruby gets faster, the relative cost of calling C functions and transferring data back and forth between C and Ruby increases, leading to potential inefficiencies.
00:09:04.000
Therefore, as Ruby code gets faster, maintaining C extensions becomes less appealing. Consequently, we are approaching a limit in optimizing Ruby performance with YJIT.
00:09:24.560
While we can make Ruby code run faster with YJIT, we cannot optimize C, and calls between C and Ruby remain relatively slow compared to the machine code generated by YJIT.
00:09:44.720
So, how can we reach Ruby's maximum performance potential? Would it be possible to write more gems in pure Ruby, and does Ruby give us the tools to do what C does, just as effectively?
00:10:04.680
One crucial area to examine is why people write C extensions. There are three main reasons: first, to interface with external I/O APIs not available in Ruby; second, to enhance performance for tasks such as number crunching or matrix multiplication; and third, to interface with specific native libraries.
00:10:25.000
At Shopify, as we work in a web environment, we frequently utilize libraries for parsing and serialization. Let's discuss writing pure Ruby gems.
00:10:54.320
The redis-client gem has two drivers: one is a binding to the native C library hiredis, and the other is a pure Ruby implementation. My colleagues Aaron Patterson and Jean Boussier have worked to improve the pure Ruby driver and found that with YJIT enabled, the new Ruby driver performs on par with the native C extension.
00:11:10.760
We believe there is still room for improvement in the performance of this driver. For instance, the C API allows for pre-allocating hashes with the correct size, while this capability is currently missing in Ruby.
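To illustrate the capacity gap mentioned here: Ruby exposes a capacity hint when allocating strings, but nothing equivalent for hashes, while C extensions can preallocate hashes directly. A minimal sketch (the sizes are illustrative):

```ruby
# Ruby lets you preallocate a string's internal buffer up front...
s = String.new(capacity: 1024) # buffer preallocated; still an empty string
# ...but there is no equivalent capacity hint when creating a hash,
# so it must grow (and rehash) incrementally as entries are added.
h = Hash.new
# C extensions, by contrast, can call rb_hash_new_capa(n) (Ruby 3.2+).
```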
00:11:36.240
If we could incorporate these capabilities into Ruby, the pure Ruby driver could perform even better. Another example is the GraphQL gem, which is widely used at Shopify and across the industry.
00:11:54.240
The default GraphQL parser relies on Racc, which is relatively inefficient due to numerous calls between Ruby and C. However, Aaron Patterson has written a pure Ruby GraphQL parser, TinyGQL, which with YJIT enabled can outperform the native C extension.
00:12:11.560
You can find a detailed blog post on the Rails at Scale blog if you're interested in more information.
00:12:28.399
The third topic I want to discuss is Protobuf, a binary serialization format developed by Google. At Shopify, we use it for serializing and deserializing various data. We are also adopting Twirp, an RPC protocol built on Protobuf, as our internal RPC protocol.
00:12:51.920
Unfortunately, Google's protobuf gem can be a cumbersome dependency, often leading to problems during Ruby upgrades, high memory usage, and memory leaks or crashes.
00:13:07.440
Thus, we investigated whether it was possible to create a pure Ruby implementation that performs just as well as Google's native gem. This led to the development of ProtoBoeuf, our pure Ruby implementation of the Protobuf spec.
00:13:41.280
At this point, it is still experimental, serving as a proof of concept. However, it has many advantages: it is easier to debug and maintain, accessible to Rubyists, and highly portable across any Ruby environment.
00:14:02.080
There are no system package dependencies, and it is not tied to any specific Ruby ABI version, generating self-contained code without constraints.
00:14:23.920
Let's explore what ProtoBoeuf does well. Like TinyGQL and redis-client, it handles parsing and serialization. It operates over byte streams, performing low-level bitwise operations to manipulate bytes and allocating objects in memory, all of which are straightforward in Ruby.
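The byte-level work described here can be sketched in plain Ruby. Below is a minimal protobuf varint decoder using String#getbyte and bitwise operations; the method name and signature are illustrative, not the gem's actual API.

```ruby
# Decode a protobuf varint starting at `offset` in a binary string.
# Each byte contributes 7 low-order bits; a set high bit means more bytes follow.
def decode_varint(buff, offset)
  value = 0
  shift = 0
  loop do
    byte = buff.getbyte(offset) # returns an Integer; no string allocation
    offset += 1
    value |= (byte & 0x7F) << shift
    return [value, offset] if byte < 0x80 # high bit clear: last byte
    shift += 7
  end
end

# 300 is encoded on the wire as the two bytes 0xAC 0x02
value, offset = decode_varint("\xAC\x02".b, 0)
```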
00:14:43.720
The design is such that it reads .proto files and generates self-contained Ruby code for encoding and decoding protobuf streams. We've engineered the generated code with YJIT in mind so that it runs optimally under the JIT.
00:15:14.560
When we initially benchmarked ProtoBoeuf against Google's library, we found it to be about 7.43 times slower, which was disappointing. However, we learned that Google's protobuf library parses streams lazily.
00:15:42.360
If you don't read any fields, it won't parse them. Therefore, when benchmarking decoding plus reading fields, we found that our pure Ruby library is actually 2.36 times faster than Google's protobuf.
00:16:13.200
Even more surprisingly, when YJIT is enabled, we are 9.45 times faster, an outcome that astonished us.
00:16:56.639
However, I must note some caveats: while our implementation is currently much faster than Google's library at decoding, it is slower at encoding, chiefly due to a lack of optimization in the encoding logic as of now.
00:17:22.560
We are indeed considering using it internally at Shopify, although for now it is use-at-your-own-risk, with no official support.
00:17:44.640
That being said, if you do encounter issues, we will strive to address them quickly. We developed ProtoBoeuf to examine what optimizations we could implement in YJIT to enhance its performance, and to test the feasibility of creating a faster pure Ruby protobuf implementation.
00:18:11.760
We discovered that optimizing the performance was not overly challenging. We approached the problem by generating Ruby code optimized for YJIT and adding some enhancements to YJIT to ensure the generated Ruby code executes faster.
00:18:42.480
Lastly, is enhancing YJIT to make ProtoBoeuf faster 'cheating'? Not quite! The optimizations we integrated benefit all code that uses core methods such as String#getbyte and String#setbyte.
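As an illustration of those core methods, here is a hedged sketch that writes a varint into a preallocated binary string with String#setbyte; the helper is hypothetical, not part of any gem.

```ruby
# Write `value` as a protobuf varint into `buff` starting at `offset`,
# mutating the preallocated string in place; returns the next free offset.
def encode_varint(buff, offset, value)
  while value >= 0x80
    buff.setbyte(offset, (value & 0x7F) | 0x80) # 7 payload bits + continuation bit
    offset += 1
    value >>= 7
  end
  buff.setbyte(offset, value) # final byte: high bit clear
  offset + 1
end

buff = "\0\0\0\0".b
next_offset = encode_varint(buff, 0, 300) # writes the bytes 0xAC 0x02
```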
00:19:01.440
With the optimizations included in YJIT for Ruby 3.4, we have been able to run ProtoBoeuf approximately 14.5% faster.
00:19:22.320
This is some of the code we generate to parse protobuf data. It's somewhat convoluted, but we designed it with YJIT's strengths and weaknesses in mind, ensuring that YJIT will execute it as swiftly as possible.
00:19:48.410
While we wouldn’t recommend that you write your Ruby code this way, it is acceptable for generated code.
00:20:06.320
Looking forward, there are still numerous areas needing improvement, particularly challenges related to low-level operations in Ruby. Ruby has useful string methods that often necessitate allocating small strings, which can hinder performance.
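A small illustration of that allocation problem: indexing into a string returns a newly allocated one-character String, whereas String#getbyte returns an Integer and allocates nothing.

```ruby
s = "hello".b
char = s[0]         # a brand-new one-character String is allocated
byte = s.getbyte(0) # just an Integer (104 for "h"); no allocation
```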
00:20:28.480
Similarly, buffer reads that are integral to parsing often involve string allocation. Ruby has IO::Buffer, which is oriented towards binary protocols, but neither it nor StringIO has adequately catered to our needs.
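For reference, IO::Buffer (added in Ruby 3.1) can wrap a string and read integers out of it without allocating intermediate strings. This is a minimal sketch with illustrative data:

```ruby
data = "\x01\x02\x03\x04".b
buf   = IO::Buffer.for(data)    # zero-copy, read-only view over the string
first = buf.get_value(:U8, 0)   # unsigned 8-bit read: an Integer, no String allocated
word  = buf.get_value(:u16, 2)  # little-endian unsigned 16-bit read: 0x0403
```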
00:20:54.560
We are actively working on solutions to address these limitations, including better support for working with binary-encoded strings. We also lack APIs for allocating hashes, strings, and arrays with a given capacity.
00:21:12.240
Solving these issues could significantly enhance Ruby's utility for writing pure Ruby gems and improve overall performance in those scenarios.
00:21:37.440
To conclude, there has traditionally been a drive to write performance-critical code in C. However, with YJIT, we are entering a new paradigm.
00:21:50.720
As Ruby's speed continues to improve, the dynamics of writing low-level code change, and maintaining C code or native dependencies can become burdensome. It may expose developers to fragile system dependencies, leading to difficulties with debugging.
00:22:07.840
In some cases, we may indeed develop pure Ruby gems that rival C performance. Therefore, to leverage YJIT effectively, we need to nurture our Ruby capabilities and strive to refine Ruby for low-level code in the future.
00:22:30.080
The YJIT team is currently working on YJIT for Ruby 3.4, where we are already seeing noteworthy performance enhancements.
00:22:49.600
Additionally, we are focusing on enhancing the quality of the generated code and improving quality of life regarding performance analysis in production.
00:23:07.920
With Ruby 3.4, you can expect better performance across the board, comparable memory usage, and a more thoroughly tested and stable YJIT implementation.
00:23:22.640
Lastly, if you're inclined, you can experiment with Ruby master and provide your feedback to us.
00:23:36.320
Finally, I should mention that we are also striving to enhance the performance of RSpec and other tools.
00:23:59.200
Thank you for listening. If you wish to learn more about YJIT, follow our work on the Rails at Scale blog, check out the YJIT README, or feel free to reach out to me via email or Twitter.
00:24:02.600
After the talk, I am open to discussions and questions.
00:24:12.799
Thank you very much!