Ruby’s C Extension Problem and How We're Solving It

by Chris Seaton

The video titled "Ruby’s C Extension Problem and How We're Solving It" presented by Chris Seaton at RubyConf 2016 addresses the challenges posed by Ruby's C extensions and introduces JRuby + Truffle as a solution.

Main Topics Covered:

  • Overview of JRuby + Truffle: Chris introduces JRuby + Truffle, a project aiming to improve Ruby’s performance by addressing limitations associated with C extensions.
  • C Extension Problem: While C extensions have traditionally enhanced Ruby performance, they hinder significant improvements by exposing Ruby's internals. This leads to complexity and restricts innovation across various Ruby implementations.
  • Performance Challenges: Developers often experience performance issues with Ruby, prompting some to consider alternative languages. The inconsistency in the C extension API creates confusion for both MRI and alternative implementations like JRuby, impacting performance optimization efforts.
  • Alternative Implementations: Various projects like Rubinius and IBM's OMR attempt to improve Ruby performance using different strategies, but face limitations due to the dependency on C extensions.
  • Recent Advancements: JRuby previously supported C extensions but faced significant maintenance challenges, leading to their removal. Instead, JRuby now encourages writing Java extensions, although this lacks widespread adoption.
  • JRuby + Truffle Approach: This project takes a radical approach to C extensions: it interprets their C code inside the Ruby interpreter, so that Ruby and C can be optimized together. It runs on the Graal VM, which combines the JVM with a JIT compiler rewritten in Java.
  • Performance Metrics: In the benchmarks presented, JRuby + Truffle runs C extensions roughly three times faster than MRI does. On classic pure-Ruby benchmarks the talk reports larger gains, such as around forty times faster than MRI on an N-body simulation.
  • Future Directions: Access to C source code is essential for optimizing libraries interfacing with Ruby. JRuby + Truffle aims to simplify developer experience while addressing the underlying challenges posed by C extensions.

Main Takeaways:

  • C extensions pose significant hurdles for Ruby's performance optimization.
  • JRuby + Truffle introduces a novel approach to interpret C extensions, aligning with Ruby’s internal structures without the complexities of C.
  • Ongoing research and development are leading to promising performance benchmarks, indicating significant potential in improving Ruby's execution capabilities without sacrificing compatibility with existing libraries.

Chris concludes by emphasizing the importance of addressing the challenges of C extensions for Ruby's future performance enhancements, suggesting ongoing research efforts in this area.

00:00:14.790 I'm Chris Seaton, and this is a talk about Ruby's C extension problem and how we're solving it. I work for Oracle, where I focus on a new implementation of Ruby called JRuby + Truffle.
00:00:21.430 I'll talk a bit about what JRuby + Truffle is and explain our progress, but first, I want to address the C extension problem, and why it is crucial to solve it.
00:00:32.579 Oracle wants you to know that this is just a research project, and you shouldn't buy anything from Oracle based on this being a real project that you can use. It is purely research.
00:00:43.809 We know we want to make Ruby faster. Many developers run applications and experience performance issues, and some are considering moving to other languages because Ruby is not as fast as they'd like.
00:00:50.140 The main Ruby project, MRI (Matz's Ruby Interpreter), has been pursuing this challenge by trying various optimizations over the past years. Their goal is to make Ruby three times faster by 3.0, if feasible.
00:01:01.480 Meanwhile, JRuby has always aimed to enhance Ruby's performance by running on the JVM, utilizing the optimizations it provides. Other implementations, like Rubinius, use LLVM to improve Ruby's speed, leveraging a JIT written in C++. Recently, some improvements have come from hiring a Ruby fellow to focus on making Ruby faster.
00:01:13.990 If you are new to the Ruby community, you might not know of a project called MagLev, another alternative implementation designed to speed up Ruby. Even IBM is working on OMR to enhance MRI's speed.
00:01:24.340 All these efforts focus on applying optimizations and new ideas about how to represent Ruby programs effectively. However, the traditional and effective way to increase Ruby's performance has been through C extensions.
00:01:36.790 C extensions are designed to allow Ruby programs to run on the Ruby interpreter while enabling users to create extensions to the interpreter using C. You compile these extensions with a C compiler, producing a binary library that extends the capabilities of the Ruby interpreter.
00:01:50.049 This approach effectively introduces new methods that appear as though they are part of the core library, and they execute with nearly the same speed as core library methods.
00:02:00.100 Historically, C extensions have delivered strong performance improvements. For example, the clamp routine clamps a number between a minimum and maximum value. This routine comes from real code used in a library for processing Photoshop files.
00:02:15.670 Unfortunately, the pure Ruby version can be slow because it creates an array of the three numbers, sorts it, and indexes it to find the middle value. This works for clamping, but the allocation and the sort add overhead.
00:02:28.120 To address this issue, the PSD library offers a C extension called PSD native. In this C function, it makes the parameters explicit and uses simple C logic to determine the clamped value between the two, avoiding unnecessary allocations and sorts.
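The two strategies described above can be sketched in Ruby as follows. This is a minimal illustration, not the PSD library's actual code: the method names `clamp_by_sort` and `clamp_explicit` are hypothetical, and the second method only mirrors in Ruby the explicit comparison logic that the talk attributes to the C version.

```ruby
# Sort-based clamp: allocates a three-element array, sorts it,
# and picks the middle element.
def clamp_by_sort(value, min, max)
  [value, min, max].sort[1]
end

# Explicit clamp: plain comparisons, with no allocation and no sort.
def clamp_explicit(value, min, max)
  return min if value < min
  return max if value > max
  value
end

puts clamp_by_sort(300, 0, 255)   # prints 255
puts clamp_explicit(-5, 0, 255)   # prints 0
```

Both methods return the same results; the difference is only in how much work is done per call.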
00:02:41.590 Although C extensions have greatly increased Ruby's performance so far, there are significant problems with them. Many developers have a misconception about how C extensions operate.
00:02:55.630 They believe there is a neat API that cleanly facilitates communication between the Ruby interpreter and the C extension. They assume other implementations can seamlessly swap out the API, allowing for compatibility.
00:03:09.639 In reality, there is no standardized Ruby API; what exists instead is a dump of Ruby's internals into a header file. Developers can access any part of Ruby's internals, leading to chaos, especially for alternative implementations attempting to support C extensions.
00:03:25.149 When alternatives like JRuby and Rubinius try to interface with the C extension API, confusion arises due to the myriad ways to manipulate Ruby internals. This unpredictability stifles innovation and complicates performance optimization.
00:03:39.630 This issue isn't exclusive to alternative implementations. MRI also struggles to develop and improve its performance due to the restrictions imposed by the C extension API when attempting to implement optimizations.
00:03:51.190 As MRI seeks to enhance Ruby for version 3.0, the limitations of the C extension API may impede their progress due to the need to satisfy existing C extensions.
00:04:04.209 For instance, let's examine the OpenSSL C extension, which comes from MRI's codebase. This extension should be a best-practice example; however, it exposes several complexities.
00:04:10.110 The function, which retrieves a C pointer to a Ruby string's character data, serves various purposes. OpenSSL, for example, uses this pointer to pass the password to native code functions.
00:04:23.000 This works in MRI because every string is tied to a character pointer. However, in other Ruby implementations like JRuby, strings are represented as Java byte arrays, presenting a mismatch.
00:04:36.790 As optimizations continue to evolve, complications will arise, making access to these internal pointers problematic. Moreover, C extensions provide ways to access internal pointers for arrays, which further complicates representation in various Ruby implementations.
00:04:52.480 For example, the PSD extension retrieves a native array corresponding to pixel data in an image and processes it. This forces Ruby arrays to represent heavyweight Ruby values for numbers, rather than simple, compact representations.
00:05:07.090 Copying directly into the C extension API exposes all the internals of Ruby. This complicates data management for Ruby objects, especially in terms of the underlying structures that represent Ruby's state.
00:05:19.780 There are macros in the C extension API designed to simplify access to these fields, but they often fall short. The complexity of the C extension API can lead to performance degradation.
00:05:32.870 Another drawback of the C extension API is that invoking methods can be slower. In Ruby, when calling a method, we cache the method lookup for later use. However, this is not feasible in C code, resulting in slower method calls in C than in Ruby.
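To make the idea of caching a method lookup concrete, here is a toy model of a call site that remembers the method it found for the last receiver class. The class name `CachedCallSite` is hypothetical, and this is not how MRI or JRuby + Truffle implement their caches; real caches must also be invalidated when methods are redefined, which this sketch ignores.

```ruby
# A toy call site that remembers the method found for the last receiver class.
class CachedCallSite
  attr_reader :lookups

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
    @lookups = 0
  end

  def call(receiver, *args)
    unless receiver.class.equal?(@cached_class)
      # Slow path: do a full method lookup and remember the result.
      @cached_method = receiver.class.instance_method(@method_name)
      @cached_class = receiver.class
      @lookups += 1
    end
    # Fast path: reuse the cached method without looking it up again.
    @cached_method.bind(receiver).call(*args)
  end
end

site = CachedCallSite.new(:upcase)
3.times { site.call("abc") }
puts site.lookups  # prints 1
```

Three calls with the same receiver class cost one lookup. A call made from C code has no comparable per-call-site state to hold the cached result, which is the slowdown the talk describes.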
00:05:51.640 In the past, this was not an issue because Ruby's execution model was different, but with newer optimizations it has become relevant: native code is often harder for optimizers to inspect, leading to suboptimal performance.
00:06:06.780 C extensions are essentially a black box; they cannot be analyzed as easily as Ruby code. This complicates optimization opportunities, especially for the powerful compiler improvements that MRI aims to implement.
00:06:17.780 To address the C extension problem, several proposed solutions have emerged over the years. Currently, libraries like FFI (Foreign Function Interface) and Fiddle allow direct calls to C functions from Ruby, circumventing the need to write C extensions.
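As an illustration of calling a C function directly from Ruby, here is a minimal sketch using Fiddle, which ships with Ruby. It assumes a Unix-like system where opening the current process exposes the C standard library's symbols; the `LibC` module name is hypothetical.

```ruby
require 'fiddle'

module LibC
  # Open the current process, which already has the C standard library loaded.
  # Assumption: a Unix-like system where this exposes libc symbols.
  HANDLE = Fiddle.dlopen(nil)

  # Describe the C signature: size_t strlen(const char *s)
  STRLEN = Fiddle::Function.new(
    HANDLE['strlen'],
    [Fiddle::TYPE_VOIDP],
    Fiddle::TYPE_SIZE_T
  )

  # Call the C function; Fiddle converts the Ruby String to a pointer.
  def self.strlen(string)
    STRLEN.call(string)
  end
end

puts LibC.strlen("hello")  # prints 5
```

No C code is compiled here: the binding is described entirely in Ruby, so the Ruby implementation stays in control of how its own objects are represented.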
00:06:32.950 While these alternatives exist, the challenge is the vast amount of existing C code. There are about 2.1 billion lines of code in the Ruby gem repository, with about half a billion lines being C extensions.
00:06:44.780 It would be useful if developers wrote C extensions with FFI, but this has not been the case. Thus, we must enable current C extensions to work without forcing developers to transition to FFI.
00:06:55.890 We make attempts to implement the C extension API as effectively as possible while also providing optimizations alongside it. This generally requires extensive copying to manage extensions as needed.
00:07:06.970 For those cases where we represent strings more efficiently internally, exposing them directly to C extension APIs can present challenges. Ruby strings could potentially be gigabytes in size, leading to performance bottlenecks.
00:07:22.670 Attempts were made in the past to adopt this approach, as Rubinius is still doing today. JRuby, when I previously tried it, could only support about 60% of the extensions I was interested in, whereas Rubinius managed to run 90%.
00:07:36.280 However, a significant issue arises when these attempted C extensions fail; they often do not provide clear error messages indicating incompatibility, causing confusion and complicating the debugging process.
00:07:47.880 As progress continues, it becomes evident that C extension work has limitations. While there have been advancements in MRI, issues remain, particularly in relation to the internal structures of Ruby objects.
00:08:01.590 The documentation has stated that developers should avoid directly manipulating these structures to minimize the complexities of the whole process.
00:08:10.290 JRuby, unfortunately, had to abandon their efforts on C extensions. Although they had talented developers working on it, maintaining C extensions proved too complex.
00:08:24.219 As a result, they have removed support for C extensions entirely, which they might revisit in the future depending on the developments within JRuby + Truffle.
00:08:36.780 JRuby encourages developers to write Java extensions instead of C extensions. While this may work, the lack of widespread adoption of FFI limits the potential for this approach.
00:08:48.220 To optimize Ruby while keeping many of its internals constant, IBM's OMR has introduced a new garbage collector (GC) and Just-In-Time (JIT) compiler to Ruby, benefiting MRI without sacrificing compatibility.
00:09:02.050 However, the techniques they can leverage are limited, resulting in modest performance improvements, leading us back to consider JRuby + Truffle as a more robust solution.
00:09:17.240 I'll provide an introduction to our project and how it operates. There is already a Ruby implementation that operates on the JVM called JRuby. However, the JVM itself can present challenges when optimizing code.
00:09:36.020 JRuby can push bytecode to the JVM and attempt to run it efficiently, but its success is limited based on how well the JVM handles the bytecode from JRuby.
00:09:47.430 At Oracle, we aim to take the JIT outside the JVM, rewriting it in Java and exposing it as a library. This allows more precise communication with the JIT compiler, improving performance and optimization.
00:09:59.590 To facilitate the implementation of this concept, we employed a framework on top called Truffle. This framework enables developers to write interoperability between languages, streamlining the process.
00:10:12.460 By leveraging code from MRI, JRuby, and Rubinius, we were able to develop a new Ruby implementation on top of the Graal VM, which combines the JVM with the Graal compiler.
00:10:30.490 Our JRuby + Truffle project is part of a broader effort to enhance Ruby performance.
00:10:39.800 When we deal with expressions like A + B * C, we represent this as an abstract syntax tree (AST). We can compile that AST down into an equivalent module.
00:10:55.880 This compilation produces optimized machine code that mirrors optimized performance approaches that conventional compilers employ. Our output here is x86-64 machine code.
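The tree for an expression like A + B * C can be sketched as a set of nodes that each know how to execute themselves. Truffle itself is a Java framework, so the Ruby classes below (`ReadLocal`, `Add`, `Multiply`) are hypothetical and purely illustrative; they show the shape of the tree, not the compilation step.

```ruby
# Reads a named local variable from the frame (here, a plain Hash).
class ReadLocal
  def initialize(name)
    @name = name
  end

  def execute(frame)
    frame.fetch(@name)
  end
end

# Executes both children and adds the results.
class Add
  def initialize(left, right)
    @left = left
    @right = right
  end

  def execute(frame)
    @left.execute(frame) + @right.execute(frame)
  end
end

# Executes both children and multiplies the results.
class Multiply
  def initialize(left, right)
    @left = left
    @right = right
  end

  def execute(frame)
    @left.execute(frame) * @right.execute(frame)
  end
end

# a + b * c: multiplication binds tighter, so it sits lower in the tree.
tree = Add.new(
  ReadLocal.new(:a),
  Multiply.new(ReadLocal.new(:b), ReadLocal.new(:c))
)

puts tree.execute({ a: 1, b: 2, c: 3 })  # prints 7
```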
00:11:06.580 Through careful optimization, we can integrate Ruby's behavior within C code contexts, preserving Ruby functionality without unnecessary overhead.
00:11:20.340 For instance, in Ruby, when an overflow occurs (as with arithmetic operations), it seamlessly transitions to a big number representation. We manage these transitions effectively while preserving efficient execution.
00:11:34.240 Our approach is radical: we plan to interpret C code rather than simply compile it. We’ve created a C interpreter that will run C code within our Ruby interpreter, allowing seamless enhancements.
00:11:45.900 By interpreting the C code, we gain more control over how Ruby works and can adapt the interpreter more easily to maintain consistent integration of changes within Ruby.
00:12:02.080 This requires an intermediary representation of the C code that uses LLVM’s capabilities. We compile the C extension into a simplified representation, which we refer to as IR.
00:12:15.540 This IR is considerably less complex than raw C but retains essential logic. Instead of direct access to native pointers, IR references are more abstracted without compromising the functionality.
00:12:30.120 This abstraction allows us to optimize both Ruby and C code simultaneously. The integration ensures that when you call a C function in a Ruby context, we can inline the functions for performance gains.
00:12:42.190 An interesting avenue we've explored is developing parts of the C extension API in Ruby. For functions that convert Fixnums to actual C integers, we implemented Ruby counterparts.
00:12:55.220 This means that every time a function in the C extension API is called, it returns to Ruby, leveraging Ruby's capabilities to manipulate and optimize how data is managed.
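As a rough idea of what a C API function written in Ruby could look like, here is a hypothetical sketch of a conversion from a Ruby integer to a C `int`. The module name `CExtAPI`, the method name `fix2int`, and the assumption of a 32-bit C `int` are all illustrative; this is not the project's actual implementation.

```ruby
module CExtAPI
  # Assumption: the C int type is 32 bits wide.
  INT_MIN = -(2**31)
  INT_MAX = 2**31 - 1

  # Hypothetical Ruby counterpart of a "Ruby integer to C int" conversion.
  def self.fix2int(value)
    raise TypeError, "expected an Integer" unless value.is_a?(Integer)
    unless value.between?(INT_MIN, INT_MAX)
      raise RangeError, "integer #{value} does not fit in a C int"
    end
    value
  end
end

puts CExtAPI.fix2int(42)  # prints 42
```

Because the conversion is ordinary Ruby, the same optimizer that handles the rest of the Ruby program can see through it.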
00:13:08.890 For example, regarding strings, by using a representation called ropes, we can avoid excessive memory copying when manipulating string data. Rather than directly managing character pointers, the representation indexes data more effectively.
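A rope can be sketched as a small tree in which concatenation creates a new node instead of copying characters. The classes `LeafRope` and `ConcatRope` below are hypothetical and much simpler than a production rope; they only show why concatenation and indexing need no copying.

```ruby
# A leaf holds actual characters.
class LeafRope
  attr_reader :length

  def initialize(string)
    @string = string
    @length = string.length
  end

  def char_at(index)
    @string[index]
  end

  def flatten
    @string
  end
end

# A concat node just points at two children; nothing is copied.
class ConcatRope
  attr_reader :length

  def initialize(left, right)
    @left = left
    @right = right
    @length = left.length + right.length
  end

  # Walk down to the leaf that holds the requested character.
  def char_at(index)
    if index < @left.length
      @left.char_at(index)
    else
      @right.char_at(index - @left.length)
    end
  end

  # Copy into one contiguous string only when something needs it.
  def flatten
    @left.flatten + @right.flatten
  end
end

rope = ConcatRope.new(LeafRope.new("Hello, "), LeafRope.new("world"))
puts rope.length      # prints 12
puts rope.char_at(7)  # prints w
puts rope.flatten     # prints Hello, world
```

Only `flatten` produces a contiguous string, which is the kind of work a character pointer would otherwise force on every access.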
00:13:20.350 Ultimately, we circumvent many traditional methods and provide the functionality directly, ensuring that developer experience is enhanced while maximizing performance.
00:13:32.640 Previous implementations of C extensions faced instability issues, particularly regarding performance benchmarks. Our new interpretations aim to provide viable comparisons, allowing us to reevaluate the usefulness of C extensions moving forward.
00:13:45.290 Although we present these theoretical approaches, we have actual performance metrics to demonstrate success in the form of benchmarks from libraries like Chunky PNG and Oily PNG.
00:14:01.620 Our native library implementations yielded significantly better results than pure Ruby versions. For instance, C extensions can enhance performance by up to ten times compared to Ruby code.
00:14:14.690 While performance improvements are notable in MRI, approaches in JRuby showed marginal gains, trailing behind those in MRI's performance.
00:14:25.320 With our implementation in JRuby + Truffle, we've achieved performance that is roughly three times faster than traditional MRI running C extensions.
00:14:38.970 People are often skeptical, thinking it counterintuitive that interpreted code can outperform compiled code. This is primarily due to the inherent inefficiencies found in the traditional native code execution model.
00:14:51.060 In our evaluated benchmarks, we turned off inlining to understand where gains were achieved. It was determined that inlining across languages was a primary contributor to performance enhancement.
00:15:05.170 As we continue to refine this process, we maintain high expectations for simplified access and performance maximization. Nevertheless, several limitations remain.
00:15:18.717 For this project, it's essential to access the source code of your C extensions. Without it, we cannot guarantee the desired performance and functionality.
00:15:34.700 The requirement exists primarily due to the need to modify internal behavior for efficiency and control. Thus, the freedom to access C source code is a critical factor in our implementation.
00:15:48.150 Another consideration revolves around managing object pointers. If you're using a native library like LibSSL, you can’t directly reference Ruby objects without introducing potential stability issues.
00:16:01.420 As such, we have built an API to convert objects to native handles and back. This, however, can limit some interactions, and we still have work to do on this front.
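The idea of converting objects to native handles and back can be sketched with a simple table that maps integers to objects. The `HandleTable` class and its method names are hypothetical, not the project's API; the point is that native code only ever holds an integer, never a direct reference to a Ruby object.

```ruby
# Hypothetical handle table: native code holds an integer handle,
# and the table maps it back to the Ruby object.
class HandleTable
  def initialize
    @objects = {}
    @next_handle = 1
  end

  # Register an object and return an integer handle for it.
  def to_handle(object)
    handle = @next_handle
    @next_handle += 1
    @objects[handle] = object  # keeps the object alive while the handle exists
    handle
  end

  # Look the object up again; raises KeyError for an unknown handle.
  def from_handle(handle)
    @objects.fetch(handle)
  end

  # Forget the handle so the object can be collected.
  def release(handle)
    @objects.delete(handle)
  end
end

table = HandleTable.new
handle = table.to_handle("a Ruby string")
puts table.from_handle(handle)  # prints a Ruby string
table.release(handle)
```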
00:16:17.270 While I initially criticized FFI for low usage, I must clarify that it remains an excellent method for creating C extensions, providing robust support across various Ruby implementations.
00:16:32.050 In the future, we hope to reconcile the challenge of effectively utilizing FFI in conjunction with JRuby + Truffle to provide a streamlined interface for developers.
00:16:48.040 If you write C extensions today, embracing an FFI-centric approach could pave the way for smoother transitions in implementations.
00:17:01.950 We should also aim to develop robust baseline Ruby versions for libraries like PSD and Chunky PNG to ensure compatibility with our optimizations moving forward.
00:17:15.580 As we consider Java extensions, the approach in JRuby doesn't address the underlying problems faced by C extensions. Both Java and C extensions expose Ruby internals without a well-defined API.
00:17:29.370 JRuby currently lacks a defined Java extension support, but we can take Java extensions and compile them into a manageable Java bytecode interpreter for JRuby + Truffle.
00:17:44.970 This directed approach may yield fruitful long-term applications, drawing insights from work progressed in the JRuby + Truffle environment.
00:17:49.860 We have also considered utilizing LLVM IR that could allow us to enhance the overall performance while maintaining Ruby's core language features.
00:18:08.200 In terms of current JRuby + Truffle development, we've achieved classical research benchmarks that show significant performance improvements over MRI.
00:18:20.000 Our performance is from ten to twenty-five times faster than MRI in classic benchmarks, and up to ten times faster than JRuby.
00:18:31.690 While performance increases for memory allocation-bound applications or big integers can be limited, performance dramatically increases for computationally intense tasks.
00:18:46.000 For computationally intensive tasks like an N-body simulation, we’re seeing speeds around forty times faster than traditional MRI implementations.
00:18:58.520 MRI's current focus on making Ruby three times faster has included efforts centered on an emulator called Optcarrot, which allows for exciting benchmarks.
00:19:12.800 We’ve been able to run Optcarrot about nine times faster than MRI and will continue working to refine performance as Ruby 3.0 approaches.
00:19:23.880 JRuby's current performance is around double that of MRI for these benchmarks, and optimizations across all implementations will change as future improvements come.
00:19:36.390 Additionally, JRuby + Truffle has achieved a significant milestone: it passes 99% of the language specs and 96% of the core library specs.
00:19:49.040 While the standard library specs remain a work in progress, we are now able to run Rails applications effectively, with basic support for various components.
00:20:02.650 After years of effort, we can run a basic Rails blog application, demonstrating substantial progress and validating our continued work in this area.
00:20:14.300 However, despite these accomplishments, significant limitations persist. The use of C extensions remains the biggest challenge, impacting the overall functionality of our developments.
00:20:27.670 Currently, many essential libraries such as database drivers and OpenSSL compatibility do not work, blocking progress on most Ruby applications.
00:20:41.960 Testing dependencies like Nokogiri pose significant issues due to the reliance on C extensions, preventing us from executing most applications effectively.
00:20:54.680 While our specifications don’t cover every edge case, we are working to optimize our performance and ensure Ruby functions as intended across mixed scenarios.
00:21:07.650 To experiment with JRuby + Truffle, you can search for Graal OTN (Oracle Technology Network) to access a binary tarball containing the necessary files.
00:21:19.750 You can also find our implementation of Ruby integrated with the newly developed JIT compiler, offering everything you need to explore.
00:21:33.160 For updates, visit my website where I share relevant papers, blog posts, and project updates. You can connect with us via GitHub, Ruby community forums, or on social media.
00:21:46.050 The team behind JRuby + Truffle is one of the largest Ruby implementation teams globally, comprising talented developers dedicated to advancing Ruby technology.
00:22:01.580 Many have made crucial contributions over the years in various capacities, and it’s important to acknowledge their hard work as we continue to make strides in Ruby development.
00:22:16.120 Thank you for your attention. I'm open to any questions regarding the project and its implications for C extensions or Ruby performance.
00:22:30.570 The first question was whether the project requires the C source code for libraries interfacing with Ruby. Indeed, access to the source code is essential for any component using the Ruby C API.
00:22:44.400 Thus, while binary libraries can interface with Ruby, any interaction via the C API necessitates an accessible C source code.
00:22:57.540 Questions arose regarding whether the project implements locking. No locking is performed directly by default, aligning with JRuby and Rubinius's threading approaches, which only lock what is necessary.
00:23:12.340 In investigating memory models, we aim to establish formal rules and guidelines to ensure consistency across Ruby implementations while addressing current challenges.
00:23:25.340 Another inquiry pertained to passing data into native libraries, such as whether copying data is required under native calls. Yes, that’s an anticipated issue we’re actively examining.
00:23:39.340 We analyze how to optimize data passing for various libraries to minimize overhead while ensuring reliability.
00:23:53.220 Another question was whether the results presented were measured on warmed-up code. They were: warm-up is necessary for JIT compilation to reach effective runtime performance.
00:24:07.500 With JRuby + Truffle, we intentionally trade start-up time for peak performance, allowing longer periods for optimization, which yields better performance overall.
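Measuring warm code usually means running the workload for a while before timing starts. The following is a minimal sketch of such a harness, not the one used for the talk's results; the method name `measure_warm` and its parameters are hypothetical.

```ruby
# Runs the block untimed to let a JIT warm up, then times further runs
# and returns the average number of seconds per timed iteration.
def measure_warm(warmup_iterations:, measured_iterations:)
  # Warm-up phase: results are discarded and nothing is timed.
  warmup_iterations.times { yield }

  # Measurement phase: use a monotonic clock so wall-clock changes
  # cannot distort the result.
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  measured_iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

  elapsed / measured_iterations
end

average = measure_warm(warmup_iterations: 1_000, measured_iterations: 1_000) do
  (1..100).sum
end
puts format("%.9f seconds per iteration", average)
```

How many warm-up iterations are enough depends on the implementation and the workload, so the counts above are placeholders.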
00:24:19.180 We are also developing ahead-of-time compilation solutions to bridge the gap for developers needing quickly deployed applications.
00:24:34.710 In terms of memory usage, it's challenging to provide exact figures without running typical applications. We anticipate that while JRuby + Truffle may be heavier at first glance, the long-run performance benefits will outweigh initial loads.
00:24:52.900 For optimal performance, deploying JRuby + Truffle on a powerful server that can service multiple clients efficiently could offer substantial benefits, especially when optimizations are integrated.
00:25:05.790 In summary, C extensions remain a central challenge in optimizing Ruby performance while taking advantage of implemented innovations within JRuby + Truffle.