Andy Pliszka

Introduction to CRuby Source Code

by Andy Pliszka

In the talk titled 'Introduction to CRuby Source Code', Andy Pliszka provides an insightful overview of how CRuby's source code can greatly enhance a Ruby developer's understanding and efficiency. The key points covered in the presentation include:

  • Motivation to Explore CRuby: Developers often plateau in their knowledge after a few years of using Ruby. Understanding CRuby opens new avenues to grasp Ruby's object model, metaprogramming, and performance optimization by using C for critical application parts.

  • Building Ruby from Source: Andy walks through the steps to build Ruby from source, emphasizing the importance of using a stable release and performing proper configurations. He explains how to set up OpenSSL, run ./configure, and execute make and make install to compile Ruby successfully.

  • Working with C Extensions: Combining Ruby with C pairs Ruby's high-level productivity with C's low-level performance. Libraries like Nokogiri exemplify how C extensions can enhance Ruby applications by handling computationally intensive tasks efficiently.

  • Debugging Techniques: The importance of debugging when hacking on CRuby is highlighted. Utilizing tools like LLDB and GDB is recommended to explore Ruby's internals effectively.

  • Understanding the CRuby Codebase: The structure of the CRuby codebase is introduced, providing insights on how to navigate and comprehend various classes like Fixnum and Array. Andy notes the patterns within the code that are critical for modifying or adding features to Ruby.

  • C Implementation for Performance: Implementing performance-critical algorithms directly in C can yield significant speed improvements over pure Ruby, as demonstrated by a comparison of Fibonacci implementations.

  • Final Thoughts: The talk concludes with the idea that understanding and working with CRuby can reignite a developer's passion for Ruby, allowing them to explore modifications and enhance performance significantly by rewriting selected sections in C. Andy encourages experimentation with CRuby and mentions that Pivotal is hiring, inviting the audience to engage further.

This presentation serves as an excellent starting point for Ruby developers looking to delve deeper into the workings of CRuby and how to harness its capabilities effectively.

00:00:25.560 Hello everyone, my name is Andy, and I work at Pivotal Labs in New York City. Today, I will give you a quick introduction to CRuby source code. First, I'm going to tell you why you should consider looking at CRuby. Then, I will give you a quick overview of the Ruby source code, and at the very end, I'll show you how you can hack it and play with it.
00:00:45.960 So, what is the motivation? I think we have all reached a certain plateau if we have been using Ruby for a couple of years, where we stop learning anything new. If you want to truly understand Ruby's object model, metaprogramming, or even garbage collection, you should look at the source code of CRuby. Many of your questions will be answered, and you will have an epiphany when you actually see what the code does for singleton methods and ghost classes, and what's happening behind the scenes. You should also consider CRuby if you're working on a Ruby application whose data processing is a bit slow. We all know that Ruby can be slow for large data processing. You could write 99% of your application in Ruby and rewrite just the critical sections, the heavy lifting, in C, gaining code that is 10 to 50 times faster. I will show you how to accomplish this.
00:01:24.000 It's also beneficial to look at CRuby if you want to write C extensions. For example, libraries like Nokogiri and Pygments are C extensions that leverage Linux libraries. By combining Ruby and C, you essentially get the best of both worlds: Ruby's productivity, with all its testing frameworks for testing your code efficiently, and C's speed for performance-critical algorithms. At the high level, in Ruby, you have modeling, coordination of algorithms, analysis, and scripting; at the low level, in C, you have the algorithm implementations, manipulation of in-memory data structures, and integration with standard mathematical libraries.
00:02:40.000 Now, let's start with Level One, which is building Ruby from scratch. The first thing you need to do is check out Ruby from GitHub. For safety, I recommend checking out a stable release tag, let's say 2.0.0, so that the results are repeatable. If you're on a Mac, install OpenSSL, as you will need it for RubyGems. Run `autoconf` to generate the `configure` script for your checkout. The last step is running `./configure`, where you specify the prefix and set it to `my_ruby` in your home folder. This location is where your Ruby from scratch will be installed. You can pass `-O0` as the optimization flag to disable optimizations for easier debugging, and add debug flags so that all binaries include C-level debug information.
00:04:06.360 Once you've configured everything, you can build it by running `make`, which compiles all the sources and links them, creating the Ruby binary in the build folder. After building the binaries, it's good practice to run all the unit tests using `make check`. CRuby comes with a comprehensive suite of unit tests—about 13,000 of them. I've had success with this stable release tag, and all tests should pass. If you find that most tests are failing, it’s a good idea to double-check the steps you followed. Once you confirm that Ruby was built correctly, you can install it by running `make install`, which will copy the binaries to your `my_ruby` folder, set up a gem directory, and install a couple of default gems.
00:05:50.720 Once you have installed Ruby from scratch in your specified folder, you need to inform your shell about the new installation. First, set your PATH to include the `bin` directory under your `my_ruby` folder as the first element. You should also configure the GEM_HOME and GEM_PATH to point to your `my_ruby` folder. At this point, your Ruby installation should be correctly set up, and it shouldn't interfere with any existing Ruby Version Managers like RVM or rbenv. You can verify this setup by running `which ruby` to see that it's pointing to your custom Ruby installation. You can also run IRB to make sure it runs from your new Ruby.
00:06:59.680 The final verification step is to check that your gem environment is correct. If you run `gem env`, you should see that all folder paths are pointing to your `my_ruby` installation. By listing your gems, you should see a couple of default gems installed. A good test to make sure everything is working correctly is to install Rails, as it has many dependencies. Once you've successfully installed Rails, you should be able to create a new Rails application and run it. This accomplishment proves that you now have a fully functional Ruby installation compiled from source. What’s impressive is that you can modify those C files at will and experiment with CRuby.
00:09:30.560 Now, let's talk about debugging. When you make changes to CRuby or hack it extensively, it's a good idea to have a debugger set up. You can use C-level debuggers like LLDB, GDB, or even Xcode. There are also plugins available for editors such as Vim and Emacs. For example, on Mac, you would debug a simple `upcase` Ruby script using LLDB by loading the Ruby binary with your script and setting a breakpoint at the beginning of the `upcase` method. When you run it, you will be able to see the internals of Ruby in action. This approach is particularly useful for debugging issues that arise in production.
00:10:05.440 Now that we have covered debugging, let's move on to an overview of the CRuby codebase structure. The root folder contains most of the relevant files, making it easy to navigate. For instance, the Ruby `Array` class is defined in `array.c`, and other major classes such as `String` (in `string.c`) live in the root folder as well. You can begin exploring and reverse engineering how the code works. The code is generally well-structured and easy to understand if you already know a bit about Ruby.
00:11:54.720 Let's take a look at something simple, such as `Fixnum`. This is defined in the `numeric.c` file. When you look at the structure of this file, notice that it starts with a couple of include statements, followed by C macros, and then the bulk of the file: C function definitions that correspond to Ruby methods. Each file has an `Init_` function at the bottom (`Init_Numeric` in this case), which initializes the relevant classes. That is where we can locate the definition of the `Fixnum` class, which is created with `rb_define_class`. Understanding these file structures is crucial for working with the CRuby source.
00:12:53.200 Moving on to the array class, the structure is quite similar. At the top of the file, you will also find a couple of C macros, followed by method definitions in C, and then the initialization method at the bottom. We see that the `Array` class inherits from `Object`. You will notice similar patterns throughout; understanding this can significantly help with grasping how Ruby works at its core.
00:14:16.760 For instance, if you wanted to add new methods to built-in classes like the array class, it's a two-step process. First, you call the helper `rb_define_method` in the class's initialization function, specifying the target class, the method name, a pointer to the C function that will implement the method, and the number of arguments it takes. Second, you write that C function. There are some intricacies here: the C function always receives the receiver as a `VALUE` and returns a `VALUE`, so you have to handle pointers and conversions carefully. Working with CRuby means being aware of how data types convert between Ruby and C.
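To make the conversion point concrete, here is a minimal standalone model of how CRuby tags small integers (Fixnum in Ruby 2.0) inside a pointer-sized value: the integer is shifted left by one bit and the lowest bit is set as a flag. The type and function names (`value_t`, `model_int2fix`, `model_fix2long`, `model_is_fixnum`) are hypothetical and chosen for illustration. Real extension code would use the macros from `ruby.h`, such as `INT2FIX` and `FIX2LONG`, rather than this model.

```c
#include <stdint.h>

/* Simplified stand-in for CRuby's VALUE: an unsigned integer as wide as a pointer. */
typedef uintptr_t value_t;

/* Lowest bit set means "this value is a small integer, not a pointer". */
#define MODEL_FIXNUM_FLAG ((value_t)1)

/* C integer -> tagged value: shift left by one bit and set the flag bit. */
value_t model_int2fix(long n)
{
    return ((value_t)n << 1) | MODEL_FIXNUM_FLAG;
}

/* Tagged value -> C integer: an arithmetic right shift drops the flag bit.
 * This relies on signed right shift preserving the sign, which common
 * compilers such as GCC and Clang provide. */
long model_fix2long(value_t v)
{
    return (long)((intptr_t)v >> 1);
}

/* Non-zero when the value carries the small-integer flag. */
int model_is_fixnum(value_t v)
{
    return (int)(v & MODEL_FIXNUM_FLAG);
}
```

In this model `model_int2fix(21)` yields the raw value 43, and converting it back with `model_fix2long` returns 21. The one-bit tag is why a small integer can be passed around as a `VALUE` without allocating an object.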
00:15:24.080 The performance benefit of working in C is most evident when you implement computationally expensive algorithms directly. For example, a Fibonacci method can be implemented in pure C instead of Ruby. The heavy computation then runs as compiled C code without being bogged down by Ruby's slower execution. When you write this kind of C code, you also need to understand how memory is managed by hand in C, as opposed to Ruby's garbage collection model.
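A sketch of what the C side of such a Fibonacci method could look like, assuming the naive recursive definition. The function name `fib` and the choice of recursion are assumptions for illustration, not the code from the talk. The wrapper that would expose it to Ruby through `rb_define_method` is left out so that the function compiles on its own.

```c
/* Naive recursive Fibonacci: fib(0) = 0, fib(1) = 1.
 * Assumes n >= 0; the result overflows long for large n. */
long fib(int n)
{
    if (n < 2) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
```

The recursion mirrors how the same method is usually written in Ruby, which keeps a timing comparison between the two versions fair: both do the same amount of work, and only the execution model differs.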
00:16:48.560 Similarly, speed can improve dramatically when you use C for frequently used algorithms, like sorting large collections of numbers. A Ruby array is flexible because it can hold objects of any type, but that flexibility slows execution. For large datasets, a dedicated C structure that stores the numbers directly allows efficient storage and access, so a C-based array can yield clear performance benefits while the rest of the program stays in Ruby.
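A minimal sketch of such a dedicated structure, assuming the data are plain C `int` values stored in one contiguous buffer and sorted with the standard library's `qsort`. The names `int_array`, `int_array_init`, `int_array_free`, and `int_array_sort` are hypothetical. Wrapping the structure in a Ruby object is omitted here.

```c
#include <stdlib.h>

/* Hypothetical container: a contiguous buffer of C ints, with no per-element objects. */
typedef struct {
    int    *items;
    size_t  len;
} int_array;

/* Allocates zero-initialized storage for len ints. Returns 0 on success, -1 on failure. */
int int_array_init(int_array *a, size_t len)
{
    a->items = calloc(len ? len : 1, sizeof *a->items);
    a->len = a->items ? len : 0;
    return a->items ? 0 : -1;
}

/* Releases the buffer; memory is managed by hand, not by Ruby's garbage collector. */
void int_array_free(int_array *a)
{
    free(a->items);
    a->items = NULL;
    a->len = 0;
}

/* qsort comparator: negative, zero, or positive without risking integer overflow. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Sorts the buffer in place in ascending order. */
void int_array_sort(int_array *a)
{
    qsort(a->items, a->len, sizeof *a->items, compare_ints);
}
```

Because every element has the same type and size, the comparator works on raw integers instead of dispatching a Ruby method per comparison. That is where a structure like this would be expected to save time.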
00:18:30.240 As shown before, moving hot code from Ruby to C takes advantage of the speed of compiled C. Comparing Fibonacci implementations shows a large gap in timings: the Ruby version takes considerably longer, while the C implementation can return results in seconds. This disparity is why you should identify the performance-critical sections of your application and adopt C implementations where appropriate.
00:22:34.320 In conclusion, working with CRuby is not as daunting as it seems. Within half an hour, you can have a Ruby installation you can modify and experiment with. Diving into CRuby gives you a deeper understanding of Ruby itself, especially after years of working primarily in Ruby. You can add new features and integrate Ruby with C to support complex operations and modifications. Importantly, if you ever find that Ruby is the performance bottleneck in your application, remember that rewriting critical code sections in C can be a very effective fix with substantial performance gains.
00:24:51.600 Lastly, I would like to mention that Pivotal is hiring, and we have offices in various major cities. If you have any questions, please feel free to ask. Thank you very much!