00:00:14.790
I'm Chris Seaton, and this is a talk about Ruby's C extension problem and how we're solving it. I work for Oracle, where I focus on a new implementation of Ruby called JRuby + Truffle.
00:00:21.430
I'll talk a bit about what JRuby + Truffle is and explain our progress, but first, I want to address the C extension problem, and why it is crucial to solve it.
00:00:32.579
Oracle wants you to know that this is just a research project, and you shouldn't buy anything from Oracle based on this being a real project that you can use. It is purely research.
00:00:43.809
We know we want to make Ruby faster. Many developers run applications and experience performance issues, and some are considering moving to other languages because Ruby is not as fast as they'd like.
00:00:50.140
The main Ruby project, MRI (Matz's Ruby Interpreter), has been pursuing this challenge by trying various optimizations over the past years. Their goal is to make Ruby three times faster by 3.0, if feasible.
00:01:01.480
Meanwhile, JRuby has always aimed to enhance Ruby's performance by running on the JVM, utilizing the optimizations it provides. Other implementations, like Rubinius, use LLVM to improve Ruby's speed, leveraging a JIT written in C++. Recently, some improvements have come from hiring a Ruby fellow to focus on making Ruby faster.
00:01:13.990
If you are new to the Ruby community, you might not know about a project called MagLev, which was another alternative implementation designed to speed up Ruby. Even IBM is working on OMR to enhance MRI's speed.
00:01:24.340
All these efforts focus on applying optimizations and new ideas about how to represent Ruby programs effectively. However, the traditional and effective way to increase Ruby's performance has been through C extensions.
00:01:36.790
C extensions are designed to allow Ruby programs to run on the Ruby interpreter while enabling users to create extensions to the interpreter using C. You compile these extensions with a C compiler, producing a binary library that extends the capabilities of the Ruby interpreter.
00:01:50.049
This approach effectively introduces new methods that appear as though they are part of the core library, and they execute with nearly the same speed as core library methods.
00:02:00.100
Historically, C extensions have delivered strong performance improvements. For example, the clamp routine clamps a number between a minimum and maximum value. This routine comes from real code used in a library for processing Photoshop files.
00:02:15.670
Unfortunately, the pure-Ruby version can be slow because it creates an array of the numbers, sorts it, and indexes to find the middle value. This gives the right answer, but it pays for an allocation and a sort on every call.
00:02:28.120
To address this issue, the PSD library offers a C extension called psd_native. Its C function takes the value, minimum, and maximum as explicit parameters and uses simple comparisons to determine the clamped value, avoiding unnecessary allocations and sorts.
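As a rough sketch of the two approaches described here, the first method below follows the array-sort-index description, and the second mirrors in Ruby the kind of comparison logic the C version uses. The method names are hypothetical and are not taken from the PSD library itself.

```ruby
# Slow version: allocates a three-element array and sorts it on every call,
# then picks the middle element.
def clamp_with_sort(value, min, max)
  [value, min, max].sort[1]
end

# Comparison-only version: no allocation and no sort, just two comparisons.
# This is the shape of logic a C implementation would typically use.
def clamp_with_comparisons(value, min, max)
  return min if value < min
  return max if value > max
  value
end

puts clamp_with_sort(300, 0, 255)         # prints 255
puts clamp_with_comparisons(300, 0, 255)  # prints 255
puts clamp_with_comparisons(-5, 0, 255)   # prints 0
```

Both methods return the same result for any ordered `min <= max`; the difference is only in how much work each call does.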
00:02:41.590
Although C extensions have greatly increased Ruby's performance so far, there are significant problems with them. Many developers have a misconception about how C extensions operate.
00:02:55.630
They believe there is a neat API that cleanly facilitates communication between the Ruby interpreter and the C extension. They assume other implementations can seamlessly swap out the API, allowing for compatibility.
00:03:09.639
In reality, there is no standardized Ruby API; what exists is essentially a dump of Ruby's internals into a header file. Developers can access any part of Ruby's internals, leading to chaos, especially for alternative implementations attempting to support C extensions.
00:03:25.149
When alternatives like JRuby and Rubinius try to interface with the C extension API, confusion arises due to the myriad ways to manipulate Ruby internals. This unpredictability stifles innovation and complicates performance optimization.
00:03:39.630
This issue isn't exclusive to alternative implementations. MRI also struggles to develop and improve its performance due to the restrictions imposed by the C extension API when attempting to implement optimizations.
00:03:51.190
As MRI seeks to enhance Ruby for version 3.0, the limitations of the C extension API may impede their progress due to the need to satisfy existing C extensions.
00:04:04.209
For instance, let's examine the OpenSSL C extension, which comes from MRI's codebase. This extension should be a best-practice example; however, it exposes several complexities.
00:04:10.110
The function in question retrieves a C pointer to a Ruby string's character data, and it is used for many purposes. OpenSSL, for example, uses this pointer to pass the password to native code functions.
00:04:23.000
This works in MRI because every string is tied to a character pointer. However, in other Ruby implementations like JRuby, strings are represented as Java byte arrays, presenting a mismatch.
00:04:36.790
As optimizations continue to evolve, complications will arise, making access to these internal pointers problematic. Moreover, C extensions provide ways to access internal pointers for arrays, which further complicates representation in various Ruby implementations.
00:04:52.480
For example, the PSD extension retrieves a native array corresponding to pixel data in an image and processes it. This forces Ruby arrays to represent heavyweight Ruby values for numbers, rather than simple, compact representations.
00:05:07.090
The C extension API also directly exposes the internals of Ruby. This complicates data management for Ruby objects, especially in terms of the underlying structures that represent Ruby's state.
00:05:19.780
There are macros in the C extension API designed to simplify access to these fields, but they often fall short. The complexity of the C extension API can lead to performance degradation.
00:05:32.870
Another drawback of the C extension API is that invoking methods can be slower. In Ruby, when calling a method, we cache the method lookup for later use. However, this is not feasible in C code, resulting in slower method calls in C than in Ruby.
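To make the idea of caching a method lookup concrete, here is a minimal sketch of a monomorphic inline cache written in Ruby. The `CallSite` class is hypothetical and greatly simplified: it ignores singleton classes and method redefinition, which a real interpreter has to handle.

```ruby
# One CallSite object stands for one place in the program where a method is called.
class CallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @lookups = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    # Only do the (slow) lookup when the receiver's class differs from last time.
    unless klass.equal?(@cached_class)
      @cached_method = klass.instance_method(@name)
      @cached_class = klass
      @lookups += 1
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:+)
puts site.call(1, 2)      # prints 3
puts site.call(3, 4)      # prints 7
puts site.lookups         # prints 1 (second call hit the cache)
puts site.call(1.5, 2.5)  # prints 4.0
puts site.lookups         # prints 2 (receiver class changed, so we looked up again)
```

A call made from C code has no such per-call-site object attached to it, which is why the lookup has to be repeated.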
00:05:51.640
In the past, this was not an issue because Ruby's execution model was different. With newer optimizations it has become relevant: native code is much harder for an optimizer to inspect, leading to suboptimal performance.
00:06:06.780
C extensions are essentially a black box; they cannot be analyzed as easily as Ruby code. This complicates optimization opportunities, especially for the powerful compiler improvements that MRI aims to implement.
00:06:17.780
To address the C extension problem, several proposed solutions have emerged over the years. Currently, libraries like FFI (Foreign Function Interface) and Fiddle allow direct calls to C functions from Ruby, circumventing the need to write C extensions.
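For illustration, this is roughly what calling a C function directly from Ruby looks like with Fiddle. The sketch assumes a Unix-like system where opening the current process exposes libc's symbols; the constant and method names are chosen for this example only.

```ruby
require 'fiddle'

# Open the current process; on Unix-like systems this makes libc symbols visible.
LIBC = Fiddle.dlopen(nil)

# Describe the C signature: size_t strlen(const char *s)
STRLEN = Fiddle::Function.new(
  LIBC['strlen'],
  [Fiddle::TYPE_VOIDP],
  Fiddle::TYPE_SIZE_T
)

# Thin Ruby wrapper so callers never see the native details.
def c_strlen(string)
  STRLEN.call(string)
end

puts c_strlen("password")  # prints 8
```

The Ruby implementation decides how the string is handed to native code, so no interpreter internals leak into the caller's code.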
00:06:32.950
While these alternatives exist, the challenge is the vast amount of existing C code. There are about 2.1 billion lines of code in the Ruby gem repository, with about half a billion lines being C extensions.
00:06:44.780
It would be useful if developers wrote C extensions with FFI, but this has not been the case. Thus, we must enable current C extensions to work without forcing developers to transition to FFI.
00:06:55.890
Another approach is to implement the C extension API as faithfully as possible on top of an implementation that still applies its own optimizations. This generally requires extensive copying between the optimized internal representation and the form the extension expects.
00:07:06.970
For those cases where we represent strings more efficiently internally, exposing them directly to C extension APIs can present challenges. Ruby strings could potentially be gigabytes in size, leading to performance bottlenecks.
00:07:22.670
Attempts were made in the past to adopt this approach, as Rubinius is still doing today. JRuby, when I previously tried it, could only support about 60% of the extensions I was interested in, whereas Rubinius managed to run 90%.
00:07:36.280
However, a significant issue arises when these attempted C extensions fail; they often do not provide clear error messages indicating incompatibility, causing confusion and complicating the debugging process.
00:07:47.880
As progress continues, it becomes evident that C extension work has limitations. While there have been advancements in MRI, issues remain, particularly in relation to the internal structures of Ruby objects.
00:08:01.590
The documentation has stated that developers should avoid directly manipulating these structures to minimize the complexities of the whole process.
00:08:10.290
JRuby, unfortunately, had to abandon their efforts on C extensions. Although they had talented developers working on it, maintaining C extensions proved too complex.
00:08:24.219
As a result, they have removed support for C extensions entirely, which they might revisit in the future depending on the developments within JRuby + Truffle.
00:08:36.780
JRuby encourages developers to write Java extensions instead of C extensions. While this may work, the lack of widespread adoption of FFI limits the potential for this approach.
00:08:48.220
To optimize Ruby while keeping many of its internals constant, IBM's OMR has introduced a new garbage collector (GC) and Just-In-Time (JIT) compiler to Ruby, benefiting MRI without sacrificing compatibility.
00:09:02.050
However, the techniques they can leverage are limited, resulting in modest performance improvements, leading us back to consider JRuby + Truffle as a more robust solution.
00:09:17.240
I'll provide an introduction to our project and how it operates. There is already a Ruby implementation that operates on the JVM called JRuby. However, the JVM itself can present challenges when optimizing code.
00:09:36.020
JRuby can push bytecode to the JVM and attempt to run it efficiently, but its success is limited based on how well the JVM handles the bytecode from JRuby.
00:09:47.430
At Oracle, we aim to take the JIT outside the JVM, rewriting it in Java and exposing it as a library. This allows more precise communication with the JIT compiler, improving performance and optimization.
00:09:59.590
To facilitate the implementation of this concept, we employed a framework on top called Truffle. This framework enables developers to write interoperability between languages, streamlining the process.
00:10:12.460
By leveraging code from MRI, JRuby, and Rubinius, we were able to develop a new Ruby implementation on top of the Graal VM, which combines the JVM with the Graal compiler.
00:10:30.490
Our JRuby + Truffle project is part of a broader effort to enhance Ruby performance.
00:10:39.800
When we deal with expressions like A + B * C, we represent this as an abstract syntax tree (AST). We can compile that AST down into an equivalent module.
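As a small sketch of what such a tree looks like, the expression A + B * C can be modeled as nodes that each know how to execute themselves. The class names are hypothetical and this is far simpler than the real Truffle node classes.

```ruby
# Reads a variable's value out of the environment.
class ReadVariable
  def initialize(name)
    @name = name
  end

  def execute(env)
    env.fetch(@name)
  end
end

# Adds the results of its two child nodes.
class Add
  def initialize(left, right)
    @left = left
    @right = right
  end

  def execute(env)
    @left.execute(env) + @right.execute(env)
  end
end

# Multiplies the results of its two child nodes.
class Multiply
  def initialize(left, right)
    @left = left
    @right = right
  end

  def execute(env)
    @left.execute(env) * @right.execute(env)
  end
end

# A + B * C: multiplication binds tighter, so it sits lower in the tree.
ast = Add.new(
  ReadVariable.new(:a),
  Multiply.new(ReadVariable.new(:b), ReadVariable.new(:c))
)

puts ast.execute({ a: 2, b: 3, c: 4 })  # prints 14
```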
00:10:55.880
This compilation produces optimized machine code comparable to what a conventional compiler would emit. Our output here is x86-64 machine code.
00:11:06.580
Through careful optimization, we can integrate Ruby's behavior within C code contexts, preserving Ruby functionality without unnecessary overhead.
00:11:20.340
For instance, in Ruby, when an integer arithmetic operation overflows, the result seamlessly transitions to a big number representation. We manage these transitions effectively while preserving efficient execution.
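The overflow transition can be sketched as follows. This is only a simulation in Ruby: it assumes a 64-bit machine integer as the fast representation and uses symbols to label which representation the result would need. The names are hypothetical.

```ruby
# Assumed range of the fast machine-integer representation (64-bit signed).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Returns the sum together with a label for the representation it needs.
def add_with_overflow_check(a, b)
  sum = a + b
  if sum.between?(INT64_MIN, INT64_MAX)
    [:machine_int, sum]   # fast path: result still fits in one machine word
  else
    [:big_integer, sum]   # slow path: fall back to arbitrary precision
  end
end

p add_with_overflow_check(1, 2)          # prints [:machine_int, 3]
p add_with_overflow_check(INT64_MAX, 1)  # prints [:big_integer, 9223372036854775808]
```

From the Ruby programmer's point of view both results are just integers; only the implementation sees the switch.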
00:11:34.240
Our approach is radical: we plan to interpret C code rather than simply compile it. We’ve created a C interpreter that will run C code within our Ruby interpreter, allowing seamless enhancements.
00:11:45.900
By interpreting the C code, we gain more control over how Ruby works and can adapt the interpreter more easily to maintain consistent integration of changes within Ruby.
00:12:02.080
This requires an intermediate representation of the C code, which we get from LLVM. We compile the C extension into LLVM's simplified representation, which we refer to as IR.
00:12:15.540
This IR is considerably less complex than raw C but retains essential logic. Instead of direct access to native pointers, IR references are more abstracted without compromising the functionality.
00:12:30.120
This abstraction allows us to optimize both Ruby and C code simultaneously. The integration ensures that when you call a C function in a Ruby context, we can inline the functions for performance gains.
00:12:42.190
An interesting avenue we've explored is developing parts of the C extension API in Ruby. For the functions that convert Fixnums to actual C integers, we implemented Ruby counterparts.
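A conversion function of this kind, written in Ruby, might look like the sketch below. The method name and the assumption of a 32-bit C `int` are illustrative only and are not taken from the JRuby + Truffle source.

```ruby
# Assumed range of a 32-bit C int.
C_INT_MIN = -(2**31)
C_INT_MAX = 2**31 - 1

# Hypothetical Ruby-side counterpart of a "Fixnum to C int" API function.
def fixnum_to_c_int(value)
  unless value.is_a?(Integer)
    raise TypeError, "expected an Integer, got #{value.class}"
  end
  unless value.between?(C_INT_MIN, C_INT_MAX)
    raise RangeError, "integer #{value} does not fit in a C int"
  end
  value
end

puts fixnum_to_c_int(42)  # prints 42
```

Because the check is ordinary Ruby code, the same compiler that optimizes the rest of the program can see through it.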
00:12:55.220
This means that every time a function in the C extension API is called, it returns to Ruby, leveraging Ruby's capabilities to manipulate and optimize how data is managed.
00:13:08.890
For example, regarding strings, by using a representation called ropes, we can avoid excessive memory copying when manipulating string data. Rather than directly managing character pointers, the representation indexes data more effectively.
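A rope can be sketched in a few lines of Ruby. The classes below are hypothetical and much simpler than a production rope: concatenation only creates a new node, individual bytes can be read by index without copying, and a flat string is built only when someone asks for one.

```ruby
# A leaf holds an actual string.
class LeafRope
  attr_reader :bytesize

  def initialize(string)
    @string = string
    @bytesize = string.bytesize
  end

  def byte_at(index)
    @string.getbyte(index)
  end

  def flatten_into(buffer)
    buffer << @string
  end
end

# A concat node just points at two children; nothing is copied on creation.
class ConcatRope
  attr_reader :bytesize

  def initialize(left, right)
    @left = left
    @right = right
    @bytesize = left.bytesize + right.bytesize
  end

  def byte_at(index)
    if index < @left.bytesize
      @left.byte_at(index)
    else
      @right.byte_at(index - @left.bytesize)
    end
  end

  def flatten_into(buffer)
    @left.flatten_into(buffer)
    @right.flatten_into(buffer)
  end
end

rope = ConcatRope.new(LeafRope.new("Hello, "), LeafRope.new("world"))
puts rope.bytesize            # prints 12
puts rope.byte_at(7).chr      # prints w
puts rope.flatten_into(+"")   # prints Hello, world
```

Handing a C extension a raw character pointer corresponds to the last step: the rope has to be flattened into one contiguous buffer first.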
00:13:20.350
Ultimately, we circumvent many traditional methods and provide the functionality directly, ensuring that developer experience is enhanced while maximizing performance.
00:13:32.640
Previous implementations of C extensions faced instability issues, particularly regarding performance benchmarks. Our new interpretations aim to provide viable comparisons, allowing us to reevaluate the usefulness of C extensions moving forward.
00:13:45.290
Although we present these theoretical approaches, we have actual performance metrics to demonstrate success in the form of benchmarks from libraries like Chunky PNG and Oily PNG.
00:14:01.620
Our native library implementations yielded significantly better results than pure Ruby versions. For instance, C extensions can enhance performance by up to ten times compared to Ruby code.
00:14:14.690
While the improvements are notable in MRI, JRuby's C extension support showed only marginal gains and trailed behind MRI's performance.
00:14:25.320
With our implementation in JRuby + Truffle, we've achieved performance that is roughly three times faster than traditional MRI running C extensions.
00:14:38.970
People are often skeptical, thinking it counterintuitive that interpreted code can outperform compiled code. This is primarily due to the inherent inefficiencies found in the traditional native code execution model.
00:14:51.060
In our evaluated benchmarks, we turned off inlining to understand where gains were achieved. It was determined that inlining across languages was a primary contributor to performance enhancement.
00:15:05.170
As we continue to refine this process, we maintain high expectations for simplified access and performance maximization. Nevertheless, several limitations remain.
00:15:18.717
For this project, it's essential to access the source code of your C extensions. Without it, we cannot guarantee the desired performance and functionality.
00:15:34.700
The requirement exists primarily due to the need to modify internal behavior for efficiency and control. Thus, the freedom to access C source code is a critical factor in our implementation.
00:15:48.150
Another consideration revolves around managing object pointers. If you're using a native library like libssl, you can’t directly pass it references to Ruby objects without introducing potential stability issues.
00:16:01.420
As such, we have built an API to convert objects to native handles and back. This, however, can limit some interactions, and we still have work to do on this front.
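The handle idea can be sketched as a simple table that maps objects to integers and back. `HandleTable` and its method names are hypothetical; the actual API in JRuby + Truffle is not shown in this talk.

```ruby
# Maps Ruby objects to opaque integer handles that are safe to give to native code.
class HandleTable
  def initialize
    @objects = {}                             # handle -> object
    @handles = {}.compare_by_identity         # object -> handle, keyed by identity
    @next_handle = 1
  end

  # Returns the existing handle for this exact object, or allocates a new one.
  def to_handle(object)
    @handles[object] ||= begin
      handle = @next_handle
      @next_handle += 1
      @objects[handle] = object
      handle
    end
  end

  # Converts a handle back to the object; raises KeyError for unknown handles.
  def from_handle(handle)
    @objects.fetch(handle)
  end

  # Forgets a handle so the object can be garbage collected.
  def release(handle)
    object = @objects.delete(handle)
    @handles.delete(object)
  end
end

table = HandleTable.new
password = "secret".dup
handle = table.to_handle(password)
puts table.from_handle(handle).equal?(password)  # prints true
```

The native library only ever stores the integer, so the Ruby implementation stays free to move or re-represent the object itself.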
00:16:17.270
While I initially criticized FFI for low usage, I must clarify that it remains an excellent alternative to writing C extensions, with robust support across various Ruby implementations.
00:16:32.050
In the future, we hope to reconcile the challenge of effectively utilizing FFI in conjunction with JRuby + Truffle to provide a streamlined interface for developers.
00:16:48.040
If you write C extensions today, embracing an FFI-centric approach could pave the way for smoother transitions in implementations.
00:17:01.950
We should also aim to develop robust baseline Ruby versions for libraries like PSD and Chunky PNG to ensure compatibility with our optimizations moving forward.
00:17:15.580
As we consider Java extensions, the approach in JRuby doesn't address the underlying problems faced by C extensions. Both Java and C extensions expose Ruby internals without a well-defined API.
00:17:29.370
JRuby + Truffle currently lacks Java extension support, but we could take Java extensions and run them through a Java bytecode interpreter for JRuby + Truffle.
00:17:44.970
This directed approach may yield fruitful long-term applications, drawing insights from work progressed in the JRuby + Truffle environment.
00:17:49.860
We have also considered utilizing LLVM IR that could allow us to enhance the overall performance while maintaining Ruby's core language features.
00:18:08.200
In terms of current JRuby + Truffle development, we've achieved classical research benchmarks that show significant performance improvements over MRI.
00:18:20.000
Our performance is from ten to twenty-five times faster than MRI in classic benchmarks, and up to ten times faster than JRuby.
00:18:31.690
While performance increases for memory allocation-bound applications or big integers can be limited, performance dramatically increases for computationally intense tasks.
00:18:46.000
For computationally intensive tasks like an N-body simulation, we’re seeing speeds of around forty times that of MRI.
00:18:58.520
MRI's current focus on making Ruby three times faster has included efforts centered on a NES emulator called OptCarrot, which serves as an interesting benchmark.
00:19:12.800
We’ve been able to run OptCarrot about nine times faster than MRI and will continue working to refine performance as Ruby 3.0 approaches.
00:19:23.880
JRuby's current performance is around double that of MRI for these benchmarks, and optimizations across all implementations will change as future improvements come.
00:19:36.390
Additionally, JRuby + Truffle has achieved a significant milestone—passing 99% of the language spec tests with 96% coverage on core specs.
00:19:49.040
While full specs for standards remain a work in progress, we are now able to run Rails applications effectively, with basic support for various components.
00:20:02.650
After years of effort, we can run a basic Rails blog application, demonstrating substantial progress and validating our continued work in this area.
00:20:14.300
However, despite these accomplishments, significant limitations persist. The use of C extensions remains the biggest challenge, impacting the overall functionality of our developments.
00:20:27.670
Currently, many essential libraries such as database drivers and OpenSSL compatibility do not work, blocking progress on most Ruby applications.
00:20:41.960
Testing dependencies like Nokogiri pose significant issues due to the reliance on C extensions, preventing us from executing most applications effectively.
00:20:54.680
While our specifications don’t cover every edge case, we are working to optimize our performance and ensure Ruby functions as intended across mixed scenarios.
00:21:07.650
To experiment with JRuby + Truffle, you can search for Graal OTN (Oracle Technology Network) to access a binary tarball containing the necessary files.
00:21:19.750
You can also find our implementation of Ruby integrated with the newly developed JIT compiler, offering everything you need to explore.
00:21:33.160
For updates, visit my website where I share relevant papers, blog posts, and project updates. You can connect with us via GitHub, Ruby community forums, or on social media.
00:21:46.050
The team behind JRuby + Truffle is one of the largest Ruby implementation teams globally, comprising talented developers dedicated to advancing Ruby technology.
00:22:01.580
Many have made crucial contributions over the years in various capacities, and it’s important to acknowledge their hard work as we continue to make strides in Ruby development.
00:22:16.120
Thank you for your attention. I'm open to any questions regarding the project and its implications for C extensions or Ruby performance.
00:22:30.570
The question raised concerns whether the project requires the C source code for libraries interfacing with Ruby. Indeed, access to the source code is essential for any component using the Ruby C API.
00:22:44.400
Thus, while binary libraries can interface with Ruby, any interaction via the C API necessitates an accessible C source code.
00:22:57.540
Questions arose regarding whether the project implements locking. No locking is performed directly by default, aligning with JRuby and Rubinius's threading approaches, which only lock what is necessary.
00:23:12.340
In investigating memory models, we aim to establish formal rules and guidelines to ensure consistency across Ruby implementations while addressing current challenges.
00:23:25.340
Another inquiry pertained to passing data into native libraries, such as whether copying data is required under native calls. Yes, that’s an anticipated issue we’re actively examining.
00:23:39.340
We analyze how to optimize data passing for various libraries to minimize overhead while ensuring reliability.
00:23:53.220
Questions arose about the results presented based on normalized warm code. Indeed, warm-up is necessary for JIT compilation to reach effective runtime performance.
00:24:07.500
With JRuby + Truffle, we intentionally favor a start-up time trade-off, allowing more extended periods for optimizations, which will yield better performance overall.
00:24:19.180
We are also developing ahead-of-time compilation solutions to bridge the gap for developers needing quickly deployed applications.
00:24:34.710
In terms of memory usage, it's challenging to provide exact figures without running typical applications. We anticipate that while JRuby + Truffle may be heavier at first glance, the long-run performance benefits will outweigh initial loads.
00:24:52.900
For optimal performance, deploying JRuby + Truffle on a powerful server that can service multiple clients efficiently could offer substantial benefits, especially when optimizations are integrated.
00:25:05.790
In summary, C extensions remain a central challenge in optimizing Ruby performance while taking advantage of the innovations implemented in JRuby + Truffle.