RubyKaigi 2023

Keynote: Optimizing YJIT’s Performance, from Inception to Production

RubyKaigi 2023

00:00:02.040 Thank you.
00:00:09.960 Hi everybody, my name is Maxime, and today I'm going to be telling you about optimizing YJIT's performance from inception to production.
00:00:24.539 We all love Ruby, but optimizing Ruby's performance has proven to be a difficult and daunting problem.
00:00:35.820 There have been many different projects trying to build just-in-time compilers for Ruby to improve its performance, but most of these projects today are either abandoned or didn't reach the outcomes they hoped for at the beginning in terms of adoption.
00:01:01.140 Recently, YJIT reached an important milestone at Shopify: we've deemed it 'production ready.' This is not just talk; we've deployed YJIT to all of the storefront renderer infrastructure at Shopify, serving all of the requests to Shopify stores.
00:01:12.780 Every time you visit a Shopify store, you're using code that was run by YJIT. We're seeing significant end-to-end speedups, and it's not just us; as of yesterday, Discord announced that they are also using YJIT in production and getting similar speedups as well.
00:01:45.720 This talk is not just technical; it's about some of the decisions that led to YJIT being production-ready. I also want to share a bit about the origin story and the background behind YJIT.
00:01:58.560 There's a famous quote by Mary Kay Ash: 'Ideas are a dime a dozen; people who implement them are priceless.' What I take from this is that many good ideas exist, but there aren't enough people out there implementing these good ideas.
00:02:29.459 Unfortunately, people tend to shorten the quote to 'Ideas are a dime a dozen,' which can lead to a cynical interpretation that all ideas are equally worthwhile or worthless, merely a matter of implementation.
00:02:50.400 However, I believe that starting from a bad concept may never deliver the desired outcomes, no matter how many resources or years are thrown at it. In this talk, I want to discuss the origins and goals of the YJIT project, the benchmarks we've curated to evaluate its performance, our data-driven approach to optimization, and some of the engineering trade-offs involved in compiler design.
00:03:35.040 The YJIT project started over two years ago, originally built primarily at Shopify but fully open-sourced. Notably, we received significant contributions from individuals at GitHub as well.
00:03:56.459 A key aspect of this project is that we take a data-driven approach to optimization. To achieve this, we have a large and diverse set of benchmarks; we benchmark often and gather detailed metrics.
00:04:08.220 The effort behind YJIT isn't just my own; it's a much larger team. This project would not be where it is today without the contributions of many amazing programmers.
00:04:19.919 One of the original goals of the YJIT project was to run any Ruby code, aiming for 100% compatibility with the code that we run in production at Shopify.
00:04:41.220 We decided early on that we couldn't require developers to change the codebase to fit YJIT, as our production codebase is vast.
00:04:58.500 Our primary focus is web workloads, mainly Ruby on Rails, and we aimed to achieve double-digit speedups on real-world software. Additionally, we wanted to ensure that YJIT never slows down Ruby code—at worst, we should see no speedup and no slowdown.
00:05:18.000 Our Ruby codebase is large, so YJIT generates machine code lazily to keep compilation efficient.
00:05:34.740 We optimize our execution through runtime value promotion, type specialization, and polymorphic inline caches.
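To give a rough idea of one of these techniques, here is a minimal Ruby sketch of a polymorphic inline cache (YJIT does this in generated machine code; this is only an illustration of the idea): a call site remembers the method it resolved for the last few receiver classes, so repeated calls with those classes skip the full method lookup.

    # Toy polymorphic inline cache: caches method lookups per receiver class
    # at a single call site.
    class InlineCache
      MAX_ENTRIES = 4  # beyond this, the call site would be treated as megamorphic

      def initialize(method_name)
        @method_name = method_name
        @entries = {}  # receiver class => UnboundMethod
      end

      def call(receiver, *args)
        meth = @entries[receiver.class]
        if meth.nil?
          # Cache miss: do the full lookup and remember it (up to MAX_ENTRIES).
          meth = receiver.class.instance_method(@method_name)
          @entries[receiver.class] = meth if @entries.size < MAX_ENTRIES
        end
        meth.bind(receiver).call(*args)
      end
    end

    cache = InlineCache.new(:to_s)
    p cache.call(42)      # miss: looks up Integer#to_s, then caches it
    p cache.call(7)       # hit: reuses the cached lookup
    p cache.call("abc")   # second cache entry, this one for String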
00:05:50.400 There's more: in the past year, a lot of work has gone into YJIT for Ruby 3.2. We ported YJIT to Rust, which was not originally planned.
00:06:05.940 We've also implemented a new backend for ARM64, which gives us native support for Apple Silicon hardware and the Raspberry Pi with good performance.
00:06:11.880 In 2022, we deployed YJIT across our infrastructure, and it is no longer considered experimental.
00:06:31.380 I was thrilled with this progress.
00:06:37.860 Now, let’s go back a bit to the origins of YJIT from my perspective.
00:06:48.539 I began my undergraduate degree at McGill University in Montreal, Canada in 2004, and discovered a passion for compilers.
00:07:02.520 In 2007, I joined Professor Laurie Hendren's team at McGill's Sable Lab to pursue a master's degree.
00:07:14.639 My advisor wanted me to develop a JIT compiler for Matlab, focusing on numerical optimizations.
00:07:27.780 While working on this, I realized there was significant potential in type optimizations for dynamically typed languages.
00:08:06.560 My master's work focused on generating specialized versions of Matlab functions based on argument types and type propagation, and that work culminated in a published paper.
00:08:26.639 In 2009, I transitioned to the University of Montreal for a PhD, initially planning to pursue optimization for Python.
00:08:41.820 Ultimately, we pivoted to optimizing JavaScript with hybrid type analysis, blending inter-procedural type analysis and speculative optimization.
00:09:05.519 Traditional fixed-point type analyses tend to be expensive for dynamically typed languages, and they often can't establish type facts precise enough to be useful.
00:09:30.660 I explored a method that speculates on types in order to avoid expensive type analysis, falling back to unspecialized code when those assumptions don't hold.
00:09:49.860 However, I got scooped, which is a common challenge in academic research.
00:10:21.899 After some soul-searching, my advisor and I reconsidered our approach to type specialization.
00:10:41.460 We found a more efficient strategy without extensive type analysis, balancing between performance and resource overhead.
00:11:47.959 What is a JIT compiler? Many people think of it as simply a static ahead-of-time compiler that runs when the program starts.
00:12:04.380 However, JIT compilers can observe the running program, giving them valuable insights for optimization.
00:12:26.579 This access to live program data allows us to generate more efficient machine code than static compilers.
00:12:40.560 Through this understanding, we devised lazy basic block versioning.
00:13:05.820 This technique leverages self-modifying code to enhance execution efficiency by delaying code generation.
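As a rough mental model of lazy basic block versioning, consider the Ruby sketch below (YJIT generates machine code and patches stubs in place; this toy version only captures the laziness and the per-type specialization): a block is compiled the first time it actually runs, and a separate specialized version is kept for each combination of operand types observed at run time.

    # Toy model of lazy basic block versioning. Each "basic block" is compiled
    # on first execution, and separately for each type context observed, so the
    # compiled version can assume those types without re-checking them.
    class LazyBlockVersioner
      def initialize
        @versions = {}  # [block_id, type_context] => compiled code (a Proc here)
      end

      def run(block_id, args)
        ctx = args.map(&:class)              # the type context we specialize on
        compiled = @versions[[block_id, ctx]]
        if compiled.nil?
          puts "compiling #{block_id} for #{ctx.inspect}"
          compiled = compile(block_id, ctx)  # happens once per (block, type context)
          @versions[[block_id, ctx]] = compiled
        end
        compiled.call(*args)
      end

      private

      # Stand-in for a code generator: returns a Proc specialized to the types.
      def compile(block_id, ctx)
        if block_id == :add && ctx == [Integer, Integer]
          ->(a, b) { a + b }                 # may assume fast integer addition
        else
          ->(*xs) { xs.reduce(:+) }          # generic fallback version
        end
      end
    end

    jit = LazyBlockVersioner.new
    jit.run(:add, [1, 2])        # compiles the Integer/Integer version
    jit.run(:add, [3, 4])        # reuses it, nothing is recompiled
    jit.run(:add, [1.5, 2.5])    # compiles a second version for Float/Float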
00:13:35.100 Initially, this technique faced skepticism in the compiler community, and it took repeated attempts before the work was published.
00:14:53.840 Ultimately, our results showed that the lazy basic block versioning technique was able to achieve performance beyond traditional type analysis.
00:15:19.680 Shortly after joining Shopify, I discussed building a JIT for CRuby with my manager.
00:15:32.640 The project evolved from a toy to a dedicated effort, leveraging the principles we researched.
00:15:47.100 We generated super-instructions to optimize common sequences of YARV instructions.
00:16:04.620 Though our first prototype achieved speedups, it underperformed in Rails benchmarks.
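For context, the YARV instruction sequences that this prototype (and later YJIT) consumes can be inspected from Ruby itself; the exact instruction names in the output vary between Ruby versions.

    # Dump the YARV bytecode for a small method. The interpreter executes these
    # instructions one by one; a JIT compiles hot sequences of them to machine code.
    iseq = RubyVM::InstructionSequence.compile(<<~RUBY)
      def add_one(x)
        x + 1
      end
    RUBY

    puts iseq.disasm
    # On recent Rubies the output includes instructions such as getlocal,
    # putobject and an optimized opt_plus for the addition.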
00:16:20.400 Subsequently, we developed YJIT, which targeted x86_64 and aimed to achieve double-digit speedups.
00:16:41.100 After nine months of concentrated development, we delivered impressive performance results for realistic benchmarks.
00:17:10.239 Effective benchmarking is crucial in JIT compiler development, but historically the field has focused on microbenchmarks.
00:17:37.679 A narrow focus on such benchmarks can obscure real-world performance issues.
00:17:52.780 The choice of benchmarks significantly impacts performance evaluation in compilers.
00:18:09.279 Our benchmarking setup for YJIT is easy to use and gathers detailed metrics that guide optimization.
00:18:39.240 This allows developers to run benchmarks easily and encourages meaningful participation in the YJIT development process.
00:19:03.480 We focused on both representative and synthetic benchmarks, leading to better insights into our optimizations.
00:19:34.260 Attention to the setup process ensures that contributors are not deterred by complexity during benchmarking.
00:19:54.720 In terms of methodology, we recommend benchmarking on stable environments rather than on laptops.
00:20:11.540 It’s also essential to stabilize benchmarking results through consistent configurations.
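A minimal version of that methodology in plain Ruby might look like the following sketch (an illustration of the approach, not the actual harness used for YJIT): run warmup iterations first so the JIT has had a chance to compile the hot code, then time a fixed number of measured iterations on a quiet, stable machine.

    # Simple warmup-then-measure timing loop. Real harnesses also pin CPU
    # frequency, report multiple runs, and record the hardware and Ruby configuration.
    WARMUP_ITERS  = 15
    MEASURE_ITERS = 30

    def workload
      # Stand-in for the code being benchmarked (e.g. rendering a template).
      (1..10_000).map { |i| i.to_s }.join(",").size
    end

    WARMUP_ITERS.times { workload }  # let the JIT compile the hot paths

    times = MEASURE_ITERS.times.map do
      t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      workload
      Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
    end

    avg_ms = times.sum / times.size * 1000.0
    yjit_on = defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false
    puts format("average iteration: %.3f ms (YJIT enabled: %s)", avg_ms, yjit_on)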
00:20:31.380 The evolution of YJIT has shown continuous improvement in Ruby performance.
00:20:52.940 Performance metrics indicate significant advancements, with Active Record benchmarks running over twice as fast compared to Ruby 3.1.
00:21:14.300 We use various metrics to effectively evaluate the efficiency of our optimizations within the YJIT project.
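One concrete way to look at such metrics is through the counters YJIT itself exposes at run time; the exact set of counters depends on the Ruby version and on whether Ruby was started with --yjit-stats.

    # Inspect YJIT's runtime counters (Ruby 3.2+). Run with:
    #   ruby --yjit --yjit-stats this_script.rb
    # Without --yjit-stats only a subset of counters is available.
    if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
      stats = RubyVM::YJIT.runtime_stats
      stats.sort_by { |k, _| k.to_s }.each do |key, value|
        puts format("%-30s %s", key, value)
      end
    else
      puts "YJIT is not enabled"
    end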
00:21:43.800 Shopify's storefront renderer is crucial for all Shopify stores and requires high efficiency.
00:22:00.300 The storefront renderer processes significant traffic daily, affecting operational efficiency.
00:22:19.160 Therefore, we aim to measure YJIT performance directly against the storefront renderer.
00:22:50.300 We employ a systematic approach for testing in production systems.
00:23:06.600 As of January 2023, we had achieved around a 6% speedup, and we've recently improved that to about 18%.
00:23:18.500 These results translate into significant savings in server infrastructure.
00:23:40.000 Designing a JIT compiler entails balancing performance, memory usage, and code generation speed.
00:24:02.300 The challenge lies in optimizing for user experience while ensuring the compiler remains efficient.
00:24:23.640 YJIT has made impressive strides toward better memory usage and allocation strategies.
00:24:38.360 The implementation of code garbage collection and lazy memory allocation optimizes resource usage.
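As a rough illustration of those two ideas, here is a toy Ruby sketch (not YJIT's real memory manager, and the sizes are made up): executable memory is capped, pages are only touched as generated code actually grows into them, and when the region fills up, code belonging to methods that are no longer live is collected to make room.

    # Toy model of a capped code region with lazy page allocation and code GC.
    class CodeRegion
      PAGE_SIZE = 16 * 1024

      def initialize(capacity_bytes)
        @capacity = capacity_bytes
        @pages_touched = 0          # pages actually allocated so far
        @entries = {}               # method name => size of its generated code
        @used = 0
      end

      def compile(method_name, code_size, live_methods)
        if @used + code_size > @capacity
          garbage_collect(live_methods)     # free code for dead methods first
          raise "code region full" if @used + code_size > @capacity
        end
        @used += code_size
        @entries[method_name] = code_size
        # Lazy allocation: only touch new pages when the code grows into them.
        needed_pages = (@used.to_f / PAGE_SIZE).ceil
        @pages_touched = [needed_pages, @pages_touched].max
      end

      def garbage_collect(live_methods)
        @entries.reject! do |name, size|
          next false if live_methods.include?(name)
          @used -= size
          true
        end
      end
    end

    region = CodeRegion.new(64 * 1024)          # a 64 KiB cap for the toy model
    region.compile(:foo, 20_000, [:foo])
    region.compile(:bar, 30_000, [:foo, :bar])
    region.compile(:baz, 30_000, [:bar, :baz])  # triggers GC; :foo's code is freed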
00:25:00.000 Overall, the YJIT project demonstrates continuous improvement in being production-ready.
00:25:15.620 Looking forward, we aim to enhance functionality and explore advanced profiling techniques.
00:25:34.150 Your feedback is vital for our continued improvement.
00:26:00.210 If YJIT is beneficial for your applications, please let us know.
00:26:12.859 We value constructive feedback on both successful implementations and performance challenges.
00:26:42.800 Our quest for optimization continues, and we are grateful for your support on this journey.
00:27:06.220 Thank you for listening, and I welcome your insights and questions.