Performance

Summarized using AI

The TruffleRuby Compilation Pipeline

Chris Seaton • March 22, 2019 • Wrocław, Poland

The video, titled "The TruffleRuby Compilation Pipeline," features Dr. Chris Seaton from Oracle Labs discussing TruffleRuby, a new implementation of the Ruby programming language aimed at enhancing its performance without altering the core features of Ruby. Dr. Seaton provides a detailed technical overview of the TruffleRuby compilation pipeline, beginning with a high-level introduction to TruffleRuby's goals.

Key points discussed include:
- TruffleRuby's Purpose: TruffleRuby aims to improve Ruby's performance by running idiomatic Ruby code faster, executing C extensions in a safe environment, and enhancing interoperability with other programming languages.
- Historical Context: The Ruby community has previously tackled performance issues with initiatives like Ruby 3x3 (the "three-by-three" initiative aimed at making Ruby three times faster), the new JIT compiler introduced in Ruby 2.6, and other implementations like JRuby and MagLev.
- Performance Benchmarking: Dr. Seaton benchmarks the performance of TruffleRuby against standard Ruby by rendering ERB templates, which illustrates the speed advantages of TruffleRuby. In his tests, TruffleRuby performs significantly better than the standard Ruby implementation and even JRuby.
- Compilation Process: The compilation process of Ruby code into native machine code is outlined, discussing the challenges related to Ruby's complex semantics and the advantages provided by GraalVM, the underlying compiler used for Ruby.
- Optimizations and Type Inference: TruffleRuby employs partial evaluation and type inference techniques to optimize Ruby code execution, moving from a general to a more specific type-based execution model, improving execution speed.
- Tooling and Ecosystem: Dr. Seaton highlights new tooling initiatives for debugging Ruby applications and integrating C extensions directly within the Ruby ecosystem, enhancing performance further.
- Future Directions: The video concludes with aspirations to make programming languages more performant across the board, with TruffleRuby serving as a prominent example within the GraalVM project, which seeks to allow smooth cross-language integration and execution.

The main takeaways from the talk emphasize the ongoing research focus at Oracle Labs and the potential of TruffleRuby to significantly improve developer experience and performance in Ruby applications, encouraging developers to explore and experiment with TruffleRuby. The project remains a work in progress, with ongoing user feedback playing a crucial role in its development.


wroclove.rb 2019

00:00:14.620 Hello, I'm Dr. Chris Seaton, and I work for Oracle Labs, which is the research arm of Oracle. We conduct experiments in programming languages and various areas of computer science. Today, I'm going to talk about TruffleRuby.
00:00:21.490 TruffleRuby is our new implementation of Ruby, and I’ll discuss the compilation pipeline. I was asked to give a detailed technical talk, so I will explain what TruffleRuby is at a high level before diving into the technical intricacies of how its compilation pipeline works.
00:00:28.630 As we delve deeper, I hope everyone can follow along, regardless of their expertise level. Because Oracle Labs is a research institution, this work is not yet a product; it’s simply a research project at this stage. You shouldn’t base your purchase of Oracle products or stocks on this talk.
00:00:34.750 Let’s start with the basics of TruffleRuby. Many people are interested in improving Ruby’s performance today because it is a language that many developers enjoy using. They appreciate its capabilities, especially in building systems like Rails.
00:00:47.080 However, these features often come at the cost of performance. Unfortunately, Ruby is frequently regarded as a slower programming language compared to others. Therefore, there is a widespread interest in improving Ruby’s performance without changing the language itself, which would be a fantastic outcome.
00:01:07.270 Currently, the core Ruby team is working on a three-by-three initiative to make Ruby three times faster.
00:01:10.820 This initiative includes a new just-in-time (JIT) compiler in Ruby 2.6. The JRuby team has also been focused on improving Ruby’s performance on the Java Virtual Machine (JVM).
00:01:20.470 Historically, in the Ruby community, various implementations have attempted to enhance performance. For instance, MagLev was an implementation of Ruby that ran on a Smalltalk VM. Not many people may know that IBM created an implementation of Ruby based on OMR, a toolkit built from its JVM internals.
00:01:33.969 TruffleRuby builds on these foundations as a new Ruby implementation aimed at improving performance by executing idiomatic Ruby code faster. We also want to execute Ruby code in parallel, run C extensions in a managed environment for safety, and improve interoperability with other languages.
00:01:47.409 In today’s polyglot development world, we aim to enable users to seamlessly integrate different programming languages. Additionally, we are developing new tooling for debugging and monitoring, while maintaining high compatibility with the standard Ruby implementation. We are not looking to change Ruby, but rather to run it as it is.
00:02:02.380 The primary focus today will be on running idiomatic Ruby code faster. We achieve this by compiling it just-in-time down to machine code. This is the key to our performance increase with Ruby.
00:02:15.850 TruffleRuby is a Ruby implementation that you can install today. If you use a version manager such as RVM to manage your Ruby environment, you can readily install and run it and see its performance for yourself.
00:02:30.520 The version I will be demonstrating today is open-source. Let's examine the performance of TruffleRuby to show that it runs idiomatic Ruby code faster.
00:02:47.080 When it comes to benchmarking, there are numerous approaches and ways to measure performance. I have written a program that uses the ERB templating library, which simply says "Hello, World!" and prints the execution time.
00:03:03.200 Then, I run a loop to render that template 100,000 times. This scenario mimics typical web applications where one renders ERB templates, and the output must be sent to standard output, which cannot be optimized away.
00:03:14.470 Ultimately, the time taken is printed after executing those 100,000 renders. On the standard Ruby implementation, this takes around 2.2 seconds or so.
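The benchmark described here can be approximated with a short script. This is a minimal sketch, not the program shown in the talk: the template text, the method name `render_benchmark`, and the `out:` parameter are assumptions introduced for illustration.

```ruby
require "erb"

# Render a "Hello, World!" ERB template `iterations` times, writing every
# result to `out` so the rendering work cannot be optimized away.
# Returns the elapsed wall-clock time in seconds.
def render_benchmark(iterations, out: $stdout)
  template = ERB.new("Hello, <%= name %>!\n") # hypothetical template text
  name = "World"
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { out.write(template.result(binding)) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end
```

Calling `render_benchmark(100_000)` and printing the returned value to standard error reproduces the shape of the measurement; the absolute numbers will depend on the machine and the Ruby implementation.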
00:03:31.350 I suspect that the standard implementation is trying to improve performance through their three-by-three initiative by adding numerous small optimizations on top of the existing features.
00:03:44.640 Presently, their major focus lies with the addition of the new just-in-time compiler, which attempts to compile Ruby code into machine code at runtime for optimization purposes.
00:03:58.930 However, currently, for idiomatic code, it has not demonstrated a significant speed-up yet. It's important to note that developing a JIT compiler is a lengthy process, and it’s not an immediate criticism of their work.
00:04:15.040 The JIT compiler is functional in other contexts, such as small numerical micro-benchmarks, but it struggles with rendering templates.
00:04:30.020 This new JIT compiler is available in Ruby 2.6 behind a flag, so you can try it on your own code. However, the appealing aspect of a new implementation such as TruffleRuby is that you do not have to alter your existing Ruby code.
00:04:47.080 Therefore, if your Ruby code runs without issues in the current implementation, it should equally run in TruffleRuby without requiring changes.
00:05:04.180 The JRuby team has also invested significant time in optimization, but when reviewing idiomatic code performance on JRuby, we do not observe a performance boost compared to the standard Ruby implementation.
00:05:23.200 Using a current JRuby release, for example, we find its performance is lower than standard Ruby, despite the engineering efforts and innovative ideas brought forth by JRuby.
00:05:43.680 One experimental feature they are working on, called InvokeDynamic, shows slight performance improvements when activated, though it isn't yet faster than the standard implementation of Ruby.
00:05:54.660 In contrast, TruffleRuby, installed via RVM and executed in the same way, demonstrates significant performance improvements right away.
00:06:10.270 It does require a bit of warming up, consistent with how just-in-time compilation works, but we can achieve real-world performance increases when rendering an ERB template with idiomatic Ruby code.
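Warm-up can be observed by timing the same work in successive batches. The sketch below is an illustration under assumptions (the method name `batch_timings` and the batch sizes are hypothetical); on a JIT-compiling implementation the later batches would typically be faster than the first ones.

```ruby
require "erb"

# Time `rounds` batches of `batch_size` template renders each and return
# the elapsed seconds for every batch, in order.
def batch_timings(rounds, batch_size)
  template = ERB.new("Hello, <%= name %>!")
  name = "World"
  Array.new(rounds) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    batch_size.times { template.result(binding) }
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
end
```

For example, `batch_timings(10, 10_000)` returns ten timings that can be compared side by side.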
00:06:24.640 Ultimately, TruffleRuby is a fast implementation of Ruby, accomplishing its performance goals.
00:06:40.300 However, there exists a challenge because many languages compete for developers who often select languages for human-centric reasons.
00:06:55.030 When people choose a programming language, they typically tie themselves to a specific set of libraries, an ecosystem, tools, and a certain performance profile.
00:07:12.610 In this chart I’ll show the general performance of various languages, which reflects the investment and resources devoted by large corporations over years.
00:07:32.470 Java and JavaScript, for instance, benefit from significant funding which enhances their performance over time, while other languages without such investment tend to lag behind.
00:07:47.490 At Oracle, we are endeavoring to create a system that automatically optimizes languages to a similar performance level without requiring such heavy investment.
00:08:00.250 With TruffleRuby, we are bringing Ruby’s performance to those more competitive levels.
00:08:19.440 Traditionally, implementing languages involves extensive refinement over time, beginning with simple prototypes and progressing to full-fledged virtual machines, often with substantial effort and resources.
00:08:35.030 For Ruby, this journey involved the development of a bytecode interpreter and an eventual JIT compiler.
00:08:51.180 We want to automate this process to swiftly progress from a prototype to high-performance implementations.
00:09:10.700 TruffleRuby is one of the languages being enhanced through our efforts, as part of the larger GraalVM project, which seeks to optimize programming languages across the board.
00:09:25.360 This comprehensive project facilitates the running of languages like Ruby, JavaScript, Python, and C/C++ together seamlessly.
00:09:40.520 Now, let’s delve into the concept of compilation and what it entails. Compilation is an abstraction that can occur at many stages in computer science.
00:09:56.510 Here, I refer specifically to the compilation of Ruby code to native machine code, not asset compilation or compilation to bytecode.
00:10:11.870 For a simple Ruby program, we need to translate the Ruby code to machine code that can run on the processor. The goal is to do this efficiently and effectively.
00:10:29.480 When the Ruby program contains an addition operator, our aim is to generate a straightforward add instruction at the machine level.
00:10:42.090 That said, implementation isn’t straightforward due to Ruby's complex semantics, which can complicate the compilation process.
00:10:58.840 Compiling Ruby may seem esoteric and intimidating; yet, it’s just a function that takes Ruby source code as a string input, returning an array of bytes as output.
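To make the "string in, bytes out" idea concrete, here is a deliberately tiny toy compiler. It is an assumption-laden sketch and not how TruffleRuby or Graal work: it accepts only programs of the form `<integer> + <integer>` and emits x86-64 machine code bytes for a function that returns the sum in the EAX register.

```ruby
# Toy compiler: source string in, array of machine code bytes out.
# Only "<int> + <int>" is supported; anything else is rejected.
def compile(source)
  match = /\A\s*(\d+)\s*\+\s*(\d+)\s*\z/.match(source)
  raise ArgumentError, "unsupported program: #{source.inspect}" unless match

  left = match[1].to_i
  right = match[2].to_i

  [
    0xB8, *[left].pack("V").bytes,  # mov eax, left  (32-bit immediate, little-endian)
    0x05, *[right].pack("V").bytes, # add eax, right (32-bit immediate, little-endian)
    0xC3                            # ret
  ]
end
```

The sketch ignores everything that makes Ruby hard to compile (overflow, redefinable methods, non-integer operands), which is exactly the gap the rest of the talk addresses.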
00:11:12.690 Although the compilation process involves numerous complex data structures and algorithms, it fundamentally revolves around strings and numbers.
00:11:28.610 GraalVM provides the compiler we use for Ruby, and it is written in Java, allowing for a higher-level approach compared to traditional compilers written in C or C++.
00:11:43.650 One of the significant challenges in compiling Ruby arises from its extensive capabilities and corner cases that increase complexity.
00:11:55.930 Ruby is a vast language when compared to, say, JavaScript, which is considerably simpler. The reason being, Ruby includes numerous core library features that JavaScript lacks.
00:12:11.430 Furthermore, Ruby's meta-programming capabilities allow for intricate patterns and behaviors that complicate compilation.
00:12:26.980 If you’re curious about the challenges of optimizing Ruby, I recommend reading Charles Nutter's blog post "So You Want to Optimize Ruby," which outlines the intricacies involved.
00:12:39.970 He summarizes that compiling Ruby necessitates addressing simple operations, such as whether you can produce a direct machine code instruction for addition.
00:12:56.420 Handling scenarios where integers may overflow, or where variables need to be captured within closures, also complicates the design.
00:13:12.520 There are also global variables, method invalidation, and garbage collection to consider, so the compiler must effectively manage all these intricacies.
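A few lines of Ruby show why a single `+` cannot simply become one machine add instruction. The names `add`, `Money`, and `make_counter` are hypothetical and exist only for this illustration.

```ruby
# The same method body must cope with every case below.
def add(a, b)
  a + b
end

add(14, 2)         # small integers: a single machine add would be enough
add(2**62, 2**62)  # the sum no longer fits in a signed 64-bit word
add("14", "2")     # the same call site now concatenates strings

# User-defined classes can supply their own +, and a class can be reopened
# later to replace it, invalidating what a compiler assumed earlier.
class Money
  attr_reader :cents

  def initialize(cents)
    @cents = cents
  end

  def +(other)
    Money.new(cents + other.cents)
  end
end

# A local variable captured by a closure has to outlive the method call
# that created it, so it cannot simply live in a register.
def make_counter
  count = 0
  -> { count += 1 }
end
```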
00:13:28.110 Maintaining Ruby's dynamic nature while attempting to compile the code is a paradox we constantly navigate. Our goal is to enable the efficient execution of Ruby code, often exemplified by running Rails.
00:13:45.580 Now, let's look specifically at the TruffleRuby compilation pipeline and how it addresses several of these challenges. I will first demonstrate a Fibonacci benchmark, which calculates the nth Fibonacci number.
00:14:03.410 The Fibonacci sequence can be defined recursively, which introduces various aspects we need to optimize, including conditionals, arithmetic operations, and function calls.
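A recursive Fibonacci method of the kind described might look like the following. The exact base case used in the talk is not given in this transcript, so the `n < 2` condition is an assumption.

```ruby
# Naive recursive Fibonacci: exercises a conditional, arithmetic,
# and method calls, which are the things the compiler must optimize.
def fib(n)
  if n < 2
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end
```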
00:14:19.220 When it comes to Ruby code compilation, we start by parsing the source code into a corresponding data structure, effectively transforming strings from text into manageable objects.
00:14:35.550 This parsing process is vital as it enables us to manipulate the program more effectively, moving away from its string representation.
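The step from text to a data structure can be tried with Ripper, the standard library's interface to CRuby's parser. Ripper is used here only as a stand-in: TruffleRuby builds its own tree of executable nodes, and the helper name `parse` is hypothetical.

```ruby
require "ripper"

# Turn Ruby source text into a tree of nested arrays (an S-expression).
# For "14 + 2" the tree starts with :program and contains a :binary node
# whose operator is :+ and whose operands are the two integer literals.
def parse(source)
  Ripper.sexp(source)
end
```

Once the program is a tree rather than a string, each node can be inspected, rewritten, or executed on its own.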
00:14:52.840 From here, the unique features of the TruffleRuby execution model begin to shine. We execute the parsed tree structure directly instead of generating bytecode.
00:15:05.800 During execution, we will start inferring types based on the nodes along the way.
00:15:21.860 If nodes represent literal integers, for instance, we can optimize these by marking them as fixed numbers.
00:15:35.080 In addition, we analyze the incoming values of local variables through execution to determine their types, thus allowing for more effective type-based optimizations.
00:15:56.720 By ensuring consistent input types, we can convert from general send operations to direct method calls, expediting function execution tremendously.
00:16:11.780 As the process continues, we will continue to refine and strengthen the type of the program, ensuring that it evolves into a strongly typed version analogous to C or Java.
00:16:27.670 At each iteration, we can validate our optimized types, which allows for the safe execution of code but remains flexible enough to accommodate variations in input.
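The idea of nodes that specialize themselves while being executed, guarded by a type check, can be sketched in plain Ruby. This is a simplified illustration with hypothetical class names (`IntLiteral`, `ReadLocal`, `AddNode`), not TruffleRuby's actual node classes, which are written in Java.

```ruby
# A literal integer: its type is known from the start.
class IntLiteral
  def initialize(value)
    @value = value
  end

  def execute(_frame)
    @value
  end
end

# Reads a local variable from the frame (a Hash in this sketch).
class ReadLocal
  def initialize(name)
    @name = name
  end

  def execute(frame)
    frame.fetch(@name)
  end
end

# An addition node that records the operand types it has seen.
class AddNode
  attr_reader :state

  def initialize(left, right)
    @left = left
    @right = right
    @state = :uninitialized
  end

  def execute(frame)
    a = @left.execute(frame)
    b = @right.execute(frame)
    both_integers = a.is_a?(Integer) && b.is_a?(Integer)

    case @state
    when :uninitialized
      # First execution: specialize on the types observed.
      @state = both_integers ? :integer : :generic
    when :integer
      # Guard: the specialization is only valid while inputs stay integers.
      return a + b if both_integers
      @state = :generic # guard failed, fall back to the general case
    end

    a.public_send(:+, b) # general path: a full method send
  end
end
```

Executing `AddNode.new(ReadLocal.new(:a), IntLiteral.new(2))` with integer inputs leaves the node in the `:integer` state; feeding it a `Float` afterwards trips the guard and moves it to `:generic` while still returning the correct result.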
00:16:43.060 Moving forward, we implement a methodology known as partial evaluation, which consolidates the relevant code into a cohesive unit.
00:16:59.570 Through this process, we treat chunks of code as if they were compiled from a single method definition, enhancing performance potential.
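Partial evaluation can be illustrated with a toy interpreter that is specialized for one known program tree. The sketch below is an assumption-heavy miniature, not Graal's partial evaluator: trees are plain arrays, `interpret`, `residual_source`, and `specialize` are hypothetical names, and folding literals assumes integer addition has not been redefined.

```ruby
# Tree nodes are arrays: [:lit, 2], [:local, :n], [:add, left, right].
# The interpreter dispatches on the node kind every time it runs.
def interpret(node, frame)
  case node[0]
  when :lit   then node[1]
  when :local then frame.fetch(node[1])
  when :add   then interpret(node[1], frame) + interpret(node[2], frame)
  else raise ArgumentError, "unknown node: #{node[0].inspect}"
  end
end

# Specialize the interpreter for one fixed tree. Node dispatch happens now,
# and what is left over (the residual program) is emitted as Ruby source.
def residual_source(node)
  case node[0]
  when :lit   then node[1].to_s
  when :local then "frame.fetch(#{node[1].inspect})"
  when :add
    left = node[1]
    right = node[2]
    if left[0] == :lit && right[0] == :lit
      (left[1] + right[1]).to_s # both operands known: fold at specialization time
    else
      "(#{residual_source(left)} + #{residual_source(right)})"
    end
  else raise ArgumentError, "unknown node: #{node[0].inspect}"
  end
end

# Turn the residual source into a callable that takes a frame.
def specialize(node)
  eval("->(frame) { #{residual_source(node)} }")
end
```

The specialized lambda computes the same result as `interpret` on the same tree, but all of the tree walking has been done ahead of time, which is the sense in which the separate nodes are merged into a single unit of code.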
00:17:15.510 I will now demonstrate the actual workings of the compiler to visualize how the input is processed and executed.
00:17:29.790 Here, we use two visualization tools: the Ideal Graph Visualizer and the HotSpot Client Compiler Visualizer.
00:17:42.690 These tools facilitate understanding of how the compiler interprets the Ruby code, transitioning it from a high-level structure to executable instructions.
00:17:57.220 By taking a closer look, you can notice how the compiler structures the nodes, identifying conditional branches, arithmetic operations, and handling of literals.
00:18:09.830 We generate a comprehensive graph representation where nodes express computations and edges indicate how data flows between them and the order in which operations need to happen.
00:18:24.800 The actual implementation of the graph might appear complicated, but the compiler harnesses structure to understand and optimize code effectively.
00:18:42.020 To further illustrate how this works in practice, we can look at how the compiled graph transitions to executable instructions, all the way down to detailing the machine code generation.
00:18:55.580 Once the structure is established, we need to define proper instructions, where every compiled node transitions into machine-level instructions.
00:19:10.010 This step involves selecting registers and ensuring instructions work seamlessly together to produce efficient machine code.
00:19:23.800 The trace of instructions fulfills the coordination necessary for successful execution, managing how data flows through the program.
00:19:39.590 Finally, we arrive at the generation of actual machine code, where the abstract structures become concrete instructions.
00:19:56.220 With these specific instructions in place, our Ruby code achieves high-performance execution, successfully delivering on the promise of TruffleRuby.
00:20:12.160 C extensions also play a significant role within this ecosystem. When executing Ruby applications, many rely on C extensions, which are libraries written in C, compiled, and integrated into the Ruby environment.
00:20:27.300 Our strategy with TruffleRuby allows us to interpret C extensions using the same frameworks that power Ruby, facilitating tighter integrations.
00:20:45.000 In fact, we utilize the LLVM compiler infrastructure to handle the compilation of C language extensions, merging the benefits of both languages.
00:21:02.360 Our efforts extend beyond Ruby to incorporate a range of languages, enhancing their performance and compatibility across the entire GraalVM ecosystem.
00:21:16.310 In conclusion, the scope of TruffleRuby and GraalVM aims to democratize programming languages, allowing developers to harness Ruby’s capabilities while enjoying optimized performance.
00:21:30.780 Ultimately, we strive to empower developers to choose the languages fitting their preferences, without losing the benefits of performance or surrounding ecosystem.
00:21:46.050 If you are interested in exploring TruffleRuby and GraalVM further, there's ample information available online, including GitHub repositories where you can access the source code.
00:21:58.590 We highly encourage trying out your own applications to see the performance benefits first-hand. If you encounter any issues, please don't hesitate to reach out to us.
00:22:12.320 Remember, this project remains a work in progress, and we appreciate user input as it assists in refining and enhancing the experience.
00:22:26.540 Thank you for your attention. I’m happy to take any questions you might have.
00:22:42.290 Audience Question: Fantastic talk! Could you tell us about the underlying architecture of GraalVM? Is it based on the JVM or built from scratch?
00:22:55.710 Dr. Chris Seaton: GraalVM was initially built on the JVM, but instead of generating Java bytecode, it interacts directly with the compiler within the JVM.
00:23:11.400 We have also developed a new virtual machine called Substrate VM, allowing us to compile applications into native executables with no dependencies on the JVM.
00:23:28.930 This enables us to run Ruby implementations without requiring pre-installed JVMs, providing a faster startup than traditional Ruby implementations.
00:23:44.510 Audience Question: What parameters do you consider when deciding to optimize code or when to hold back?
00:24:00.480 Dr. Chris Seaton: We monitor applications and gather profiling data to assess when methods are invoked often enough to warrant compilation.
00:24:14.320 The decision to optimize often involves a threshold where, when a method is invoked enough, we start to compile it.
00:24:32.920 Sometimes, while compiling a method, we observe usage patterns that allow us to optimize even further by inlining methods.
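The threshold idea in this answer can be sketched as a call counter. The class name `ProfiledMethod` and the threshold values are hypothetical; a real runtime would hand the method to the compiler at the point where this sketch merely flips a flag.

```ruby
# Wraps a method body and counts invocations. Once the count reaches the
# threshold the method is marked as compiled (a stand-in for actually
# submitting it to the JIT compiler).
class ProfiledMethod
  attr_reader :calls

  def initialize(threshold, &body)
    @threshold = threshold
    @body = body
    @calls = 0
    @compiled = false
  end

  def compiled?
    @compiled
  end

  def call(*args)
    @calls += 1
    @compiled = true if !@compiled && @calls >= @threshold
    @body.call(*args)
  end
end
```

For example, `ProfiledMethod.new(1000) { |x| x * 2 }` behaves like an ordinary callable for its first 999 calls and reports `compiled?` as true from the 1000th call onwards.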
00:24:45.070 Audience Question: How do you compare memory usage with other implementations?
00:25:07.200 Dr. Chris Seaton: TruffleRuby may utilize more memory due to its complex graph data structures, particularly during the compilation phase.
00:25:23.330 However, once the program is optimized and running, memory consumption may be reduced due to more compact data structures.
00:25:39.140 Memory management in relation to JVM characteristics can be complex; however, runtime optimizations help balance additional memory usage with overall performance.
00:25:54.690 Audience Question: What are the broader visions for GraalVM? Is there potential for AWS Lambda-like services within its framework?
00:26:09.830 Dr. Chris Seaton: GraalVM is part of Oracle’s cloud offerings, and although it’s not tied to a specific product, it presents promising possibilities in various cloud service models.
00:26:25.040 Audience Question: Can simple scripts benefit from these optimizations? Could they be compiled once and cached?
00:26:42.800 Dr. Chris Seaton: We are examining options for ahead-of-time compilation methods in future developments, which may allow cached optimizations.
00:26:57.540 The current structure naturally leads to higher memory usage during compilation, but we aim to enhance boot times and execution speeds.
00:27:13.900 Audience Question: How does this affect debugging tools? Can existing tools be used, or do we need new ones?
00:27:30.720 Dr. Chris Seaton: Existing debugging tools can be used, but they may appear somewhat opaque. We've developed new debugging tools tailored for TruffleRuby that will work around Ruby's optimizations.
00:27:46.100 We also have a Chrome Developer Tools integration that makes it straightforward to debug Ruby applications, regardless of the language stack.
00:28:00.450 Audience Question: Regarding fibers, how does TruffleRuby handle them compared to MRI?
00:28:13.720 Dr. Chris Seaton: Fibers in TruffleRuby presently run as threads, which don't perform identically to the fibers in MRI. However, modifications are underway to improve fiber implementations.
00:28:28.760 Thank you for your insightful questions, and I appreciate the interest in the exciting developments of TruffleRuby.
00:28:41.830 Transcription concludes.