00:00:14.620
Hello, I'm Dr. Chris Seaton, and I work for Oracle Labs, which is the research arm of Oracle. We conduct experiments in programming languages and various areas of computer science. Today, I'm going to talk about TruffleRuby.
00:00:21.490
TruffleRuby is our new implementation of Ruby, and I’ll discuss the compilation pipeline. I was asked to give a detailed technical talk, so I will explain what TruffleRuby is at a high level before diving into the technical intricacies of how its compilation pipeline works.
00:00:28.630
As we delve deeper, I hope everyone can follow along, regardless of their expertise level. Because Oracle Labs is a research institution, this work is not yet a product; it’s simply a research project at this stage. You shouldn’t base your purchase of Oracle products or stocks on this talk.
00:00:34.750
Let’s start with the basics of TruffleRuby. Many people are interested in improving Ruby’s performance today because it is a language that many developers enjoy using. They appreciate its capabilities, especially in building systems like Rails.
00:00:47.080
However, these features often come at the cost of performance. Unfortunately, Ruby is frequently regarded as a slower programming language compared to others. Therefore, there is a widespread interest in improving Ruby’s performance without changing the language itself, which would be a fantastic outcome.
00:01:07.270
Currently, the core Ruby team is working on the Ruby 3x3 initiative to make Ruby three times faster.
00:01:10.820
This initiative includes a new just-in-time (JIT) compiler in Ruby 2.6. The JRuby team has also been focused on improving Ruby’s performance on the Java Virtual Machine (JVM).
00:01:20.470
Historically, in the Ruby community, various implementations have attempted to enhance performance. For instance, MagLev was an implementation of Ruby that ran on a Smalltalk VM. Few people know that IBM created an implementation of Ruby called Ruby+OMR, based on components from its JVM.
00:01:33.969
TruffleRuby builds on these foundations as a new Ruby implementation aimed at improving performance by executing idiomatic Ruby code faster. We also want to execute Ruby code in parallel, run C extensions in a managed environment for safety, and improve interoperability with other languages.
00:01:47.409
In today’s polyglot development world, we aim to enable users to seamlessly integrate different programming languages. Additionally, we are developing new tooling for debugging and monitoring, while maintaining high compatibility with the standard Ruby implementation. We are not looking to change Ruby, but rather to run it as it is.
00:02:02.380
The primary focus today will be on running idiomatic Ruby code faster. We achieve this by compiling it just-in-time down to machine code. This is the key to our performance increase with Ruby.
00:02:15.850
TruffleRuby is a Ruby implementation that you can install today. If you use RVM to manage your Ruby environment, you can readily install it and see the performance for yourself.
00:02:30.520
The version I will be demonstrating today is open-source. Let's examine the performance of TruffleRuby to show that it runs idiomatic Ruby code faster.
00:02:47.080
When it comes to benchmarking, there are numerous approaches and ways to measure performance. I have written a program that uses the ERB templating library, which simply says "Hello, World!" and prints the execution time.
00:03:03.200
Then, I run a loop to render that template 100,000 times. This scenario mimics typical web applications where one renders ERB templates, and the output must be sent to standard output, which cannot be optimized away.
00:03:14.470
Ultimately, the time taken is printed after executing those 100,000 renders. On the standard Ruby implementation, this takes around 2.2 seconds or so.
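A minimal sketch of a benchmark along these lines is shown below. The template text, the `render_many` helper, and the use of a monotonic clock are assumptions for illustration; the exact program shown in the talk is not reproduced in this transcript.

```ruby
require 'erb'
require 'stringio'

# Assumed template: the talk only says it renders "Hello, World!".
TEMPLATE = ERB.new("Hello, <%= name %>!\n")

# Renders the template `count` times, writing each result to `out`,
# and returns the elapsed wall-clock time in seconds.
def render_many(template, count, out)
  name = "World" # read by the template through the binding below
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count.times { out.write(template.result(binding)) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Pass $stdout and 100_000 to mirror the talk; a StringIO and a small
# count keep this demonstration quiet and quick.
elapsed = render_many(TEMPLATE, 1_000, StringIO.new)
puts format("1,000 renders took %.3f s", elapsed)
```

Writing every result to an output stream matters here: it prevents an optimizing compiler from discarding the rendering work as unused.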
00:03:31.350
I suspect that the standard implementation is trying to improve performance through the Ruby 3x3 initiative by adding numerous small optimizations on top of the existing implementation.
00:03:44.640
Presently, their major focus lies with the addition of the new just-in-time compiler, which attempts to compile Ruby code into machine code at runtime for optimization purposes.
00:03:58.930
However, currently, for idiomatic code, it has not demonstrated a significant speed-up yet. It's important to note that developing a JIT compiler is a lengthy process, and it’s not an immediate criticism of their work.
00:04:15.040
The JIT compiler is functional in other contexts, such as small numerical micro-benchmarks, but it struggles with rendering templates.
00:04:30.020
This new JIT compiler available in Ruby 2.6, accessible via a flag, allows you to attempt optimizations in your own code. However, the amazing aspect of the new implementation is that it does not alter Ruby's existing code.
00:04:47.080
Therefore, if your Ruby code runs without issues in the current implementation, it should equally run in TruffleRuby without requiring changes.
00:05:04.180
The JRuby team has also invested significant time in optimization, but when reviewing idiomatic code performance on JRuby, we do not observe a performance boost compared to the standard Ruby implementation.
00:05:23.200
Using JRuby version 2.6, for example, we find its performance is lower than standard Ruby, despite the engineering efforts and innovative ideas brought forth by JRuby.
00:05:43.680
One experimental feature they are working on, called InvokeDynamic, shows slight performance improvements when activated, though it isn't yet faster than the standard implementation of Ruby.
00:05:54.660
In contrast, TruffleRuby, installed via RVM and executed in the same way, demonstrates significant performance improvements right away.
00:06:10.270
It does require a bit of warming up, consistent with how just-in-time compilation works, but we can achieve real-world performance increases when rendering an ERB template with idiomatic Ruby code.
00:06:24.640
Ultimately, TruffleRuby is a fast implementation of Ruby, accomplishing its performance goals.
00:06:40.300
However, there exists a challenge because many languages compete for developers who often select languages for human-centric reasons.
00:06:55.030
When people choose a programming language, they typically tie themselves to a specific set of libraries, an ecosystem, tools, and a certain performance profile.
00:07:12.610
In this chart I’ll show the general performance of various languages, which reflects the investment and resources devoted by large corporations over years.
00:07:32.470
Java and JavaScript, for instance, benefit from significant funding which enhances their performance over time, while other languages without such investment tend to lag behind.
00:07:47.490
At Oracle, we are endeavoring to create a system that automatically optimizes languages to a similar performance level without requiring such heavy investment.
00:08:00.250
With TruffleRuby, we are moving Ruby’s performance down to those more competitive levels.
00:08:19.440
Traditionally, implementing languages involves extensive refinement over time, beginning with simple prototypes and progressing to full-fledged virtual machines, often with substantial effort and resources.
00:08:35.030
For Ruby, this journey involved the development of a bytecode interpreter and an eventual JIT compiler.
00:08:51.180
We want to automate this process to swiftly progress from a prototype to high-performance implementations.
00:09:10.700
TruffleRuby is one of the languages being enhanced through our efforts, as part of the larger GraalVM project, which seeks to optimize programming languages across the board.
00:09:25.360
This comprehensive project facilitates the running of languages like Ruby, JavaScript, Python, and C/C++ together seamlessly.
00:09:40.520
Now, let’s delve into the concept of compilation and what it entails. Compilation is an abstraction that can occur at many stages in computer science.
00:09:56.510
Here, I refer specifically to the compilation of Ruby code to native machine code, not asset compilation or compilation to bytecode.
00:10:11.870
For a simple Ruby program, we need to translate the Ruby code to machine code that can run on the processor. The goal is to do this efficiently and effectively.
00:10:29.480
When the Ruby program contains an addition operator, our aim is to generate a straightforward add instruction at the machine level.
00:10:42.090
That said, implementation isn’t straightforward due to Ruby's complex semantics, which can complicate the compilation process.
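To make the point concrete, here is a small illustration of why `+` cannot simply become a machine add instruction. The `add` method is a hypothetical example, not code from the talk.

```ruby
# In Ruby, `a + b` is a method call on `a`, so the same source line can
# mean integer addition, floating-point addition, string concatenation,
# array concatenation, or any user-defined `+`.
def add(a, b)
  a + b
end

p add(14, 2)          # small integers
p add(2**62, 2**62)   # result exceeds the signed 64-bit range; Ruby promotes it
p add(1.5, 2)         # mixed Float and Integer
p add("ab", "cd")     # String#+
p add([1], [2])       # Array#+
```

A compiler can only emit a single add instruction if it knows, or can cheaply check, that both operands are machine-sized integers and that the result does not overflow.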
00:10:58.840
Compiling Ruby may seem esoteric and intimidating; yet, it’s just a function that takes Ruby source code as a string input, returning an array of bytes as output.
00:11:12.690
Although the compilation process involves numerous complex data structures and algorithms, it fundamentally revolves around strings and numbers.
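The "string in, bytes out" view can be sketched with a toy compiler. This is only an illustration under strong assumptions: it accepts programs of the form `<integer> + <integer>`, it emits x86-64 bytes for `mov eax, imm32; add eax, imm32; ret`, and the bytes are printed rather than executed. The `compile` helper is hypothetical and has nothing to do with how Graal is implemented.

```ruby
# Toy compiler: takes source code as a String, returns machine code as
# an Array of bytes. Only understands "<int> + <int>".
def compile(source)
  match = /\A\s*(-?\d+)\s*\+\s*(-?\d+)\s*\z/.match(source)
  raise ArgumentError, "unsupported program: #{source.inspect}" unless match

  left  = match[1].to_i
  right = match[2].to_i
  imm32 = ->(n) { [n].pack('l<').bytes } # 32-bit little-endian immediate

  [0xB8, *imm32.(left),  # mov eax, left
   0x05, *imm32.(right), # add eax, right
   0xC3]                 # ret
end

bytes = compile("14 + 2")
puts bytes.map { |b| format("%02x", b) }.join(" ")
```

A real compiler differs in scale rather than in kind: the input is still text and the output is still a sequence of numbers.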
00:11:28.610
GraalVM serves as the compiler we use for Ruby, and it was developed using Java, allowing for a higher-level approach compared to traditional methods in C or C++.
00:11:43.650
One of the significant challenges in compiling Ruby arises from its extensive capabilities and corner cases that increase complexity.
00:11:55.930
Ruby is a vast language when compared to, say, JavaScript, which is considerably simpler. The reason is that Ruby includes numerous core library features that JavaScript lacks.
00:12:11.430
Furthermore, Ruby's meta-programming capabilities allow for intricate patterns and behaviors that complicate compilation.
00:12:26.980
If you’re curious about the challenges of optimizing Ruby, I recommend reading Charles Nutter's blog post 'So You Want To Optimize Ruby,' which outlines the intricacies involved.
00:12:39.970
He summarizes that compiling Ruby necessitates addressing simple operations, such as whether you can produce a direct machine code instruction for addition.
00:12:56.420
Also, handling scenarios where values may overflow or variables need to be captured within closures complicates the design.
00:13:12.520
There are also global variables, method invalidation, and garbage collection to consider, so the compiler must effectively manage all these intricacies.
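Method invalidation can be illustrated with a short, hypothetical example: a method is redefined while the program is running, so any machine code that assumed the old definition must be thrown away. The `Greeter` class and `call_greet` method are invented for this sketch.

```ruby
class Greeter
  def greet
    "hello"
  end
end

# A compiler would like to inline Greeter#greet into this method.
def call_greet(greeter)
  greeter.greet
end

g = Greeter.new
before = call_greet(g)

# Reopening the class replaces the method at run time. Compiled code
# that inlined the old body is now wrong and has to be invalidated.
class Greeter
  def greet
    "goodbye"
  end
end

after = call_greet(g)
puts "#{before} -> #{after}"
```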
00:13:28.110
Maintaining Ruby's dynamic nature while attempting to compile the code is a paradox we constantly navigate. Our goal is to enable the efficient execution of Ruby code, often exemplified by running Rails.
00:13:45.580
Now, let's look specifically at the TruffleRuby compilation pipeline and how it addresses several of these challenges. I will first demonstrate a Fibonacci benchmark, which calculates the nth Fibonacci number.
00:14:03.410
The Fibonacci sequence can be defined recursively, which introduces various aspects we need to optimize, including conditionals, arithmetic operations, and function calls.
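A recursive definition of this kind might look like the following. The exact base case used in the talk's benchmark is not given in the transcript, so this version is an assumption.

```ruby
# Naive recursive Fibonacci: exercises a conditional, a comparison,
# subtraction, addition, and two recursive calls.
def fib(n)
  if n <= 1
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end

puts fib(20)
```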
00:14:19.220
When it comes to Ruby code compilation, we start by parsing the source code into a corresponding data structure, effectively transforming strings from text into manageable objects.
00:14:35.550
This parsing process is vital as it enables us to manipulate the program more effectively, moving away from its string representation.
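As a rough illustration of parsing text into a data structure, the snippet below uses Ripper, the parser library that ships with the standard Ruby implementation. TruffleRuby's own parser and node classes are different; Ripper is used here only because it is readily available.

```ruby
require 'ripper'
require 'pp'

source = "14 + 2"

# Ripper.sexp turns the source string into nested arrays describing the
# program: a binary operation with two integer literals and a `+`.
tree = Ripper.sexp(source)
pp tree
```

Once the program is a tree of objects instead of a string, later stages can walk it, annotate it, and rewrite it.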
00:14:52.840
From here, the unique features of the TruffleRuby execution model begin to shine. We execute the parsed tree structure directly instead of generating bytecode.
00:15:05.800
During execution, we will start inferring types based on the nodes along the way.
00:15:21.860
If nodes represent literal integers, for instance, we can optimize these by marking them as fixed numbers.
00:15:35.080
In addition, we analyze the incoming values of local variables through execution to determine their types, thus allowing for more effective type-based optimizations.
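The idea of a tree that is executed directly and that records the types it observes can be sketched as follows. The node classes (`IntLiteral`, `ReadLocal`, `Add`) and the `state` field are hypothetical simplifications written for this transcript; they are not TruffleRuby's actual node classes.

```ruby
# A literal integer node: its value and type never change.
class IntLiteral
  def initialize(value)
    @value = value
  end

  def execute(_frame)
    @value
  end
end

# Reads a local variable from the frame (a Hash in this sketch).
class ReadLocal
  def initialize(name)
    @name = name
  end

  def execute(frame)
    frame.fetch(@name)
  end
end

# An addition node that specializes itself based on observed operands.
class Add
  attr_reader :state

  def initialize(left, right)
    @left = left
    @right = right
    @state = :uninitialized
  end

  def execute(frame)
    a = @left.execute(frame)
    b = @right.execute(frame)
    both_ints = a.is_a?(Integer) && b.is_a?(Integer)

    # First execution: record what kinds of values were seen.
    @state = both_ints ? :integer : :generic if @state == :uninitialized

    if @state == :integer
      return a + b if both_ints # guard holds: fast integer path
      @state = :generic         # guard failed: give up the speculation
    end
    a.public_send(:+, b)        # generic path: an ordinary method send
  end
end

node = Add.new(ReadLocal.new(:n), IntLiteral.new(1))
p node.execute({ n: 2 })   # integers observed, node specializes
p node.state
p node.execute({ n: 1.5 }) # a Float arrives, node falls back
p node.state
```

The guard is what keeps the speculation safe: the fast path is taken only while the observed types still match.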
00:15:56.720
By ensuring consistent input types, we can convert from general send operations to direct method calls, expediting function execution tremendously.
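Turning a general send into a direct call is commonly done with an inline cache at the call site. The sketch below is a simplified, hypothetical `CallSite` class: it remembers the method found for the last receiver class and skips the lookup while the class stays the same. It ignores singleton methods and method redefinition, which a real implementation has to handle.

```ruby
class CallSite
  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      @hits += 1 # same class as last time: reuse the cached method
    else
      @misses += 1 # new class: do the full lookup and cache the result
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:+)
p site.call(1, 2)
p site.call(3, 4)     # hits the cache
p site.call("a", "b") # different receiver class, so the cache is refilled
```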
00:16:11.780
As execution continues, we keep refining and strengthening the types in the program, so that it evolves into a strongly typed version analogous to C or Java.
00:16:27.670
At each iteration, we can validate our optimized types, which allows for the safe execution of code but remains flexible enough to accommodate variations in input.
00:16:43.060
Moving forward, we implement a methodology known as partial evaluation, which consolidates the relevant code into a cohesive unit.
00:16:59.570
Through this process, we treat chunks of code as if they were compiled from a single method definition, enhancing performance potential.
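A hand-rolled analogue of this idea is sketched below, under heavy simplification. The program is a tiny tree of nested arrays, `interpret` walks it on every call, and `specialize` walks it once to produce residual Ruby code in which the tree walk has disappeared. All names here are hypothetical, and the real system performs this specialization on the interpreter's Java code inside the compiler rather than by generating Ruby source.

```ruby
# Node shapes: [:lit, v], [:local, name], [:add, l, r], [:lt, l, r],
# and [:if, cond, then_branch, else_branch].
def interpret(node, env)
  case node[0]
  when :lit   then node[1]
  when :local then env.fetch(node[1])
  when :add   then interpret(node[1], env) + interpret(node[2], env)
  when :lt    then interpret(node[1], env) < interpret(node[2], env)
  when :if
    interpret(node[1], env) ? interpret(node[2], env) : interpret(node[3], env)
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

# Specialize the interpreter for one fixed tree: the tree is known, the
# environment is not, so the result is code that depends only on `env`.
def residual(node)
  case node[0]
  when :lit   then node[1].inspect
  when :local then "env.fetch(#{node[1].inspect})"
  when :add   then "(#{residual(node[1])} + #{residual(node[2])})"
  when :lt    then "(#{residual(node[1])} < #{residual(node[2])})"
  when :if
    "(#{residual(node[1])} ? #{residual(node[2])} : #{residual(node[3])})"
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

def specialize(node)
  eval("->(env) { #{residual(node)} }")
end

# n < 2 ? n : n + 40
ast = [:if, [:lt, [:local, :n], [:lit, 2]],
       [:local, :n],
       [:add, [:local, :n], [:lit, 40]]]

puts residual(ast)
compiled = specialize(ast)
p interpret(ast, { n: 2 })
p compiled.call({ n: 2 })
```

The residual code behaves like a single method body: all the dispatch on node kinds has been resolved ahead of time.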
00:17:15.510
I will now demonstrate the actual workings of the compiler to visualize how the input is processed and executed.
00:17:29.790
Here, we use two visualization tools: the Ideal Graph Visualizer and the HotSpot Client Compiler Visualizer.
00:17:42.690
These tools facilitate understanding of how the compiler interprets the Ruby code, transitioning it from a high-level structure to executable instructions.
00:17:57.220
By taking a closer look, you can notice how the compiler structures the nodes, identifying conditional branches, arithmetic operations, and handling of literals.
00:18:09.830
We generate a comprehensive graph representation in which nodes express computations and edges express the data flow and control flow between them, that is, which values each operation uses and the order in which operations need to happen.
00:18:24.800
The actual implementation of the graph might appear complicated, but the compiler harnesses structure to understand and optimize code effectively.
00:18:42.020
To further illustrate how this works in practice, we can look at how the compiled graph transitions to executable instructions, all the way down to detailing the machine code generation.
00:18:55.580
Once the structure is established, we need to define proper instructions, where every compiled node transitions into machine-level instructions.
00:19:10.010
This step involves selecting registers and ensuring instructions work seamlessly together to produce efficient machine code.
00:19:23.800
The trace of instructions fulfills the coordination necessary for successful execution, managing how data flows through the program.
00:19:39.590
Finally, we arrive at the generation of actual machine code, where the abstract structures become concrete instructions.
00:19:56.220
With these specific instructions in place, our Ruby code achieves high-performance execution, successfully delivering on the promise of TruffleRuby.
00:20:12.160
C extensions also play a significant role within this ecosystem. When executing Ruby applications, many rely on C extensions, which are libraries written in C, compiled, and integrated into the Ruby environment.
00:20:27.300
Our strategy with TruffleRuby allows us to interpret C extensions using the same frameworks that power Ruby, facilitating tighter integrations.
00:20:45.000
In fact, we utilize the LLVM compiler infrastructure to handle the compilation of C language extensions, merging the benefits of both languages.
00:21:02.360
Our efforts extend beyond Ruby to incorporate a range of languages, enhancing their performance and compatibility across the entire GraalVM ecosystem.
00:21:16.310
In conclusion, the scope of TruffleRuby and GraalVM aims to democratize programming languages, allowing developers to harness Ruby’s capabilities while enjoying optimized performance.
00:21:30.780
Ultimately, we strive to empower developers to choose the languages fitting their preferences, without losing the benefits of performance or surrounding ecosystem.
00:21:46.050
If you are interested in exploring TruffleRuby and GraalVM further, there's ample information available online, including GitHub repositories where you can access the source code.
00:21:58.590
We highly encourage trying out your own applications to see the performance benefits first-hand. If you encounter any issues, please don't hesitate to reach out to us.
00:22:12.320
Remember, this project remains a work in progress, and we appreciate user input as it assists in refining and enhancing the experience.
00:22:26.540
Thank you for your attention. I’m happy to take any questions you might have.
00:22:42.290
Audience Question: Fantastic talk! Could you tell us about the underlying architecture of GraalVM? Is it based on the JVM or built from scratch?
00:22:55.710
Dr. Chris Seaton: GraalVM was initially built on the JVM, but instead of generating Java bytecode, it interacts directly with the compiler within the JVM.
00:23:11.400
We have also developed a new JVM called Substrate VM, allowing us to compile applications into native executables with no dependencies on the JVM.
00:23:28.930
This enables us to run Ruby implementations without requiring pre-installed JVMs, providing a faster startup than traditional Ruby implementations.
00:23:44.510
Audience Question: What parameters do you consider when deciding to optimize code or when to hold back?
00:24:00.480
Dr. Chris Seaton: We monitor applications and gather profiling data to assess when methods are invoked often enough to warrant compilation.
00:24:14.320
The decision to optimize often involves a threshold where, when a method is invoked enough, we start to compile it.
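A threshold of this kind can be sketched with a simple call counter. The `CallProfiler` class and the threshold value are hypothetical; the actual thresholds and heuristics used by TruffleRuby are not stated in the talk.

```ruby
class CallProfiler
  attr_reader :compiled

  def initialize(threshold)
    @threshold = threshold
    @counts = Hash.new(0)
    @compiled = []
  end

  # Counts one invocation. Returns true exactly once per method, at the
  # call where the count reaches the threshold and compilation would start.
  def record_call(name)
    @counts[name] += 1
    return false unless @counts[name] == @threshold

    @compiled << name
    true
  end
end

profiler = CallProfiler.new(3) # threshold of 3 chosen for the demo only
4.times { |i| puts "call #{i + 1}: compile? #{profiler.record_call(:fib)}" }
```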
00:24:32.920
Sometimes, while compiling a method, we observe usage patterns that allow us to optimize even further by inlining methods.
00:24:45.070
Audience Question: How do you compare memory usage with other implementations?
00:25:07.200
Dr. Chris Seaton: TruffleRuby may utilize more memory due to its complex graph data structures, particularly during the compilation phase.
00:25:23.330
However, once the program is optimized and running, memory consumption may be reduced due to more compact data structures.
00:25:39.140
Memory management in relation to JVM characteristics can be complex; however, runtime optimizations help balance additional memory usage with overall performance.
00:25:54.690
Audience Question: What are the broader visions for GraalVM? Is there potential for AWS Lambda-like services within its framework?
00:26:09.830
Dr. Chris Seaton: GraalVM is part of Oracle’s cloud offerings, and although it’s not tied to a specific product, it presents promising possibilities in various cloud service models.
00:26:25.040
Audience Question: Can simple scripts benefit from these optimizations? Could they be compiled once and cached?
00:26:42.800
Dr. Chris Seaton: We are examining options for ahead-of-time compilation methods in future developments, which may allow cached optimizations.
00:26:57.540
The current structure naturally leads to higher memory usage during compilation, but we aim to enhance boot times and execution speeds.
00:27:13.900
Audience Question: How does this affect debugging tools? Can existing tools be used, or do we need new ones?
00:27:30.720
Dr. Chris Seaton: Existing debugging tools can be used, but they may appear somewhat opaque. We've developed new debugging tools tailored for TruffleRuby that will work around Ruby's optimizations.
00:27:46.100
We also have an integration with Chrome Developer Tools, which makes it straightforward to debug Ruby applications regardless of the language stack.
00:28:00.450
Audience Question: Regarding fibers, how does TruffleRuby handle them compared to MRI?
00:28:13.720
Dr. Chris Seaton: Fibers in TruffleRuby presently run as threads, which don't perform identically to the fibers in MRI. However, modifications are underway to improve fiber implementations.
00:28:28.760
Thank you for your insightful questions, and I appreciate the interest in the exciting developments of TruffleRuby.
00:28:41.830
Transcription concludes.