Ruby Internals

Ruby-red onions: Peeling Back Ruby's Layers in C Extensions

by Emily Stolfo

In her talk "Ruby-red onions: Peeling Back Ruby's Layers in C Extensions" at RubyConf 2014, Emily Stolfo explores the intricacies of writing C extensions for Ruby. She aims to demystify the process and enhance understanding of Ruby's underlying architecture, specifically the C Ruby interpreter (MRI).

Key Points:

  • Introduction and Background: Emily introduces herself as a developer for MongoDB's Ruby driver and an adjunct professor teaching Ruby on Rails. She discusses her personal journey of writing a C extension, particularly for Kerberos authentication.

  • The Challenge of C Extensions: Emily emphasizes that while writing C extensions can seem daunting, it is a valuable skill. Many developers, including herself, initially shy away from this task.

  • Learning Experience: Emily recounts her challenge with implementing GSSAPI Kerberos support for MongoDB, highlighting that the task was postponed due to its complexity and low priority in the Ruby community. Despite alternatives like the gssapi gem, the intricacies and issues led her to opt for a direct C extension.

  • Understanding Ruby and C Interfacing: She discusses the unique structure of Ruby as a high-level language operating over a foundation of C code. Emily compares Ruby to an onion, with multiple layers of abstraction that developers need to peel back to understand the interaction between Ruby and C.

  • Key Resources for Writing C Extensions: She points out several resources for learning, including examining existing extensions like Nokogiri, the Ruby source code, and blogs. The best documentation resides in the ext directory of the Ruby source code.

  • Mechanics of Ruby and C Integration: Emily explains the need for specific macros and methods for converting between Ruby objects and C data types. She details how to define classes in C, manage data, and enable transitions from Ruby objects to C structures and vice versa.

  • Packaging and Distributing C Extensions: Steps to package a C extension within a Ruby gem include creating an extconf.rb file, maintaining a gemspec, and using the appropriate C code organization. Emily shares her experience about issues encountered after releasing her initial gem version due to missing dependencies.

  • Conclusions: By the end of her talk, Emily aims to leave the audience feeling empowered to tackle their own C extensions by illustrating that the process enriches one’s understanding of both Ruby and C, revealing deeper insights into Ruby’s architecture.

In summary, Emily's presentation is a detailed guide on writing C extensions for Ruby, providing essential knowledge, valuable insights, and encouraging developers to engage with this powerful capability.

00:00:18.439 Hi everyone! My name is Emily Stolfo, and I work for MongoDB on the Ruby driver for the database. It's the MongoDB Ruby driver. If you've ever used it, you're probably familiar with it. I will soon be working on Mongoid as well, which you may know if you are a Rails developer using MongoDB. I am also an adjunct faculty member at Columbia University where I teach Ruby on Rails. I recently moved to Berlin, so I haven't been teaching this semester, but I probably will come back in the spring and teach some more.
00:01:00.800 Today, I'm going to talk about peeling back Ruby's layers and writing C extensions. Who here has written a C extension? Raise your hand. And who here considers themselves an expert on writing a C extension? Because I'm certainly not. If you’re an expert, this talk might not be for you, but if you're curious about writing a C extension, that's why you're here, right?
00:01:23.560 Over the past year, I had the experience of writing a C extension to provide Kerberos authentication support for our driver. I learned a lot from this experience, both about Ruby itself and the C interpreter (MRI). I gathered insights on Ruby's gem ecosystem—what to do, what not to do—and I will share today what you need to know if you want to write a C extension for Ruby and package it with your gem.
00:01:44.240 I believe that Ruby is unique as a language because it has different implementations. The main ones are the C implementation (MRI) and a Java implementation (JRuby). This variety allows you to write extensions that can work with external libraries. For instance, in the case of Kerberos, which is a specific type of authentication protocol, there are external libraries that perform the complex mathematical algorithms, but they aren’t implemented in Ruby—they're implemented in C or Java. As a Ruby developer, you need to write some sort of glue that facilitates communication between your Ruby code and these external libraries.
00:02:38.959 In January 2013, a ticket was created for the Ruby team at MongoDB, which consisted of three of us at the time. The ticket requested the implementation of GSSAPI Kerberos authentication support. Initially, it was assigned to one of my colleagues, but it became a hot potato that we passed around for about a year without any of us tackling it, as nobody wanted to deal with the ticket.
00:03:00.280 Kerberos authentication isn't popular among Rubyists; it’s commonly found in large enterprises as part of stringent security policies that may require all database authentication to occur using Kerberos. Hence, the task didn’t seem to be a high priority for the Ruby driver. Nevertheless, we needed to comply with standardization across all drivers.
00:03:22.159 After about a year trying to find ways to avoid writing this C extension, we researched alternative solutions. There was a gem called gssapi, which wasn't a C extension. It used a technique called Ruby FFI that allows you to write Ruby code that directly interacts with C code without requiring the traditional C extension. Essentially, this gem was handling Kerberos authentication in Ruby while calling into the external C library, but when I tried using it, I encountered segmentation faults.
00:03:44.560 I consulted our C developers, and after several discussions involving assembly code, it became clear that debugging this flaw would be a waste of time. It would be more efficient if I wrote a C extension myself. I investigated Ruby FFI further, but that approach also led to segmentation faults, affirming that it would be better for me to just write the C extension directly.
00:04:04.239 I ultimately decided to proceed with writing the extension. I learned a lot in the process and completed it over the course of a month, from August into September. By that time, the PHP team had also implemented Kerberos in C, which allowed me to learn from their code implementation using the external library. I released my C extension right before the Barcelona Ruby Conference on September 9th. However, on September 10th, I ended up having to yank it after discovering an issue.
00:04:45.760 In this talk, I’ll explore what it means to write a C extension, why you shouldn't be afraid of it, and what benefits you can gain from writing one. To conceptualize Ruby as an onion, Ruby, particularly MRI, is a high-level language, very elegant and expressive, but under that elegance lies a substantial amount of C code.
00:05:08.560 Think of it like an onion: starting from a core understanding of Ruby objects and C, you'll provide layers of abstractions over the C code until you reach a gem that includes your written extension at its core. I will discuss the Ruby knowledge you gain from writing an extension, how to effectively write the extension, work with Ruby objects, manage the transitioning between C and Ruby data, structure the gem, and the limitations inherent in Ruby gems.
00:05:30.320 So let's start with the knowledge gained from writing a C extension. What resources can you use to learn? Besides looking at source code, numerous extensions exist, like Nokogiri, which is a well-known gem featuring a C extension. You can also refer to documentation in the Ruby source code.
00:05:41.480 The most authoritative documentation lives in the ext directory of the Ruby source code, in its README and appendix, which clearly outline what you need to accomplish as a Ruby developer working with C code. Beyond that, extensive blog material has emerged over the past few years regarding writing Ruby extensions. While these tutorials can be useful, I would still recommend focusing on the official Ruby README.
00:06:06.160 A notable insight is that when you write a C extension, you are essentially adding new features to Ruby. Ruby is merely a means to write code that will be implemented in another language (in this case, C). By writing C code, you can create abstractions that enable new functionalities in Ruby.
00:06:41.400 Moving between Ruby and C involves understanding that Ruby objects have types while Ruby variables do not. Conversely, in C, variables have types while the data itself does not. To transition between these two systems, you need to employ specific mechanisms to define fundamental constructs and data structures.
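The Ruby half of that statement can be seen from plain Ruby. The following is a minimal sketch (the variable names are illustrative, not from the talk): one variable is rebound to three objects, and each object reports its own class.

```ruby
# One variable, rebound to three objects of different classes.
# The type travels with the object, not with the variable.
thing = 'hello'
seen = [thing.class]

thing = [1, 2, 3]
seen << thing.class

thing = nil
seen << thing.class

p seen # prints [String, Array, NilClass]
```

On the C side, each of these objects arrives as the same generic VALUE, which is why an extension has to inspect the object before converting it.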
00:07:00.679 You'll interact with Ruby objects and typically be handed a structure without a clear type. Each Ruby object has an identifier or flag that tells you how to work with it, and it indicates how to convert the object into compatible C data. Understanding and manipulating these identifiers is essential as they relate to core Ruby objects like nil, arrays, strings, and so on.
00:07:21.360 So, when you encounter a Ruby object in C, you check its type by comparing it against the corresponding integer constant (such as T_STRING), and then convert the value into C data using macros such as RSTRING, which gives you access to the C structure behind a Ruby string. Each kind of Ruby object has a corresponding C data type that lets you interact with it in the expected manner.
00:07:51.439 To convert Ruby objects into C data, you use the macros and functions Ruby provides as a bridge between the two languages. Going through them, rather than manipulating the underlying structures directly, respects the layout of Ruby's data types and avoids the problems that direct manipulation can cause.
00:08:11.680 Next, let's discuss transitioning from C to Ruby. For example, if you have a C data structure and want to convert it into a Ruby object, you need to cast the C structure to a VALUE and use specific functions or macros that will transform your C data into the appropriate Ruby format. This approach is particularly valuable when wrapping external C data into custom Ruby objects.
00:08:35.240 The first step involves defining a class in your C code that will hold references to Ruby objects or manage C data. You can successfully wrap a C structure using functions like Data_Wrap_Struct, which specifies the object class, manages the garbage collection aspects, and maintains pointers or references to the C data being wrapped.
00:09:02.560 Once a Ruby object is instantiated, you can define how Ruby can interact with that C data by defining methods within your class definition. The C code effectively acts akin to an interface that allows your Ruby code to utilize the underlying C functionality in an understandable format.
00:09:29.680 Now that we understand the basics of writing a C extension, let’s discuss how to include this extension in your Ruby gem. To include a C extension there are four essential steps I believe you need to execute, as the online documentation may lack clarity in this area.
00:09:54.080 First, you need to create a file called extconf.rb. This file is a script that checks your environment or platform and whether the requisite external libraries are present on the system. The script then generates a Makefile used to build and install the extension.
00:10:17.760 The extconf.rb script can use the mkmf library to find header files on your system, enabling checks to confirm the presence of a specific external library. If the library is found, it creates a Makefile; otherwise, it aborts the installation to prevent further complications.
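As a rough illustration of that check-then-generate flow, here is a minimal extconf.rb sketch. The header path `gssapi/gssapi.h`, the target name `example_kerberos/example_kerberos`, and the helper `configure_extension` are assumptions made for this example, not details from the talk; `have_header` and `create_makefile` are the mkmf calls that do the work.

```ruby
# extconf.rb (sketch). RubyGems runs this file when the gem is installed.
# The header path and target name used below are illustrative assumptions.

def configure_extension(header:, target:)
  require 'mkmf' # loaded lazily: mkmf needs the Ruby development headers

  # have_header compiles a tiny test program that includes the header.
  unless have_header(header)
    abort "#{header} not found; install the library's development package"
  end

  # Writes a Makefile that builds the shared library for `target`.
  create_makefile(target)
end

# Run the checks only when this file is executed as extconf.rb,
# so the sketch stays inert if it is loaded from anywhere else.
if File.basename($PROGRAM_NAME) == 'extconf.rb'
  configure_extension(header: 'gssapi/gssapi.h',
                      target: 'example_kerberos/example_kerberos')
end
```

A real extconf.rb usually calls mkmf at the top level without the wrapper method; the wrapper and the file-name guard are only there to keep the example side-effect free.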
00:10:43.320 The second, optional step is to use rake-compiler, which provides Rake tasks for compiling your C extension so that you can build it and run your tests against it during development. Documentation for this tool is available online; it can simplify building extensions.
00:11:10.280 Following that, the gemspec file is important. It is simply a piece of code that describes your gem, with all the relevant metadata, including author details, dependencies, platforms, and more. Crucially, it needs to specify that you’re including a C extension so that RubyGems looks for and installs it properly.
00:11:34.720 The gemspec will need to define the relevant files you want it to look for, whether they be C files or header files. It’s also essential to indicate the files necessary for the installation process to run correctly.
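A gemspec along these lines might look like the sketch below. The gem name, author, and file paths are hypothetical; the relevant parts are `extensions`, which points RubyGems at the extconf.rb to run at install time, and `files`, which has to list the C sources and headers so they ship with the gem.

```ruby
require 'rubygems'

# Hypothetical gem that ships a C extension under ext/example_kerberos/.
spec = Gem::Specification.new do |s|
  s.name          = 'example_kerberos'
  s.version       = '0.1.0'
  s.summary       = 'Illustrative gem that bundles a C extension'
  s.authors       = ['Example Author']
  s.require_paths = ['lib']

  # Everything packaged into the gem, including C sources and headers.
  s.files = [
    'lib/example_kerberos.rb',
    'ext/example_kerberos/extconf.rb',
    'ext/example_kerberos/example_kerberos.c',
    'ext/example_kerberos/example_kerberos.h'
  ]

  # Tells RubyGems to run this script and build the extension on install.
  s.extensions = ['ext/example_kerberos/extconf.rb']
end
```

In an actual project this block would live in a file such as example_kerberos.gemspec, and the file list is often generated rather than written by hand.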
00:11:57.440 Finally, you simply need the actual C code available in your gem directory, following the conventional placement described in various resources online. Together, these steps culminate in defining classes and methods within your C code, written in such a way that they can be called easily from Ruby.
00:12:09.240 When you define a class in your C extension and want to expose it under Ruby, you’ll need to use specific patterns. Begin with defining the structure that can hold your class definition for the specific C class. This organization will keep your definitions and behaviors orderly, ensuring clarity in interfacing between Ruby and C.
00:12:33.800 The core logic behind defining the methods in C relies upon using macros to associate Ruby method names with C-function implementations according to their argument specifications. This ensures an intuitive connection so that when a method is called in Ruby, it binds to the corresponding C function accurately.
00:12:52.400 So, when defining a method associated with a class, the specific behavior you give the underlying class is crucial. It governs how your Ruby code will utilize the C code you’ve written. Each of these elements forms part of the bridge between the two languages.
00:13:07.680 As I previously discussed, the four necessary steps when packaging a C extension with your gem are: creating the extconf.rb, possibly employing rake-compiler, ensuring that your gemspec tells RubyGems you have an extension that needs installation, and making sure you have the proper C code files available.
00:13:23.920 Releasing a gem inherently involves distributing your code and its supporting files in a clean manner. Let's revisit the story of how I released this gem: I initially released the extension on September 9th. However, just a day later, I rushed to the airport for the Barcelona Ruby Conference and, while checking my email during a delay, I was unexpectedly hit with reports of installation issues regarding dependencies.
00:13:48.240 A notable issue I encountered was the inability of my gem to install on systems that didn’t have the necessary library. The error message precisely indicated that the installation of my gem was aborting due to the absence of that required library, leading me to quickly realize my oversight regarding extension dependencies.
00:14:12.240 Once I understood the implications, I sought a solution. To resolve this, I created an independent gem specifically for the C extension. This provided the flexibility to create appropriate dependencies in my primary Ruby driver. Users would still be directed towards my C extension but would not encounter installation issues for the main Ruby driver if certain dependencies were unmet.
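One way a main library can treat a separately packaged extension as optional is to attempt the `require` and fall back gracefully when the gem is absent. This is a sketch of that general pattern, not the driver's actual code; the gem name `example_kerberos_ext`, the constant, and the helper method are all hypothetical.

```ruby
# Try to load the separately packaged extension gem (hypothetical name).
# If it is not installed, the main library still loads normally.
begin
  require 'example_kerberos_ext'
  KERBEROS_AVAILABLE = true
rescue LoadError
  KERBEROS_AVAILABLE = false
end

# Guard used by code paths that genuinely need the extension.
def require_kerberos!
  return if KERBEROS_AVAILABLE

  raise 'Kerberos support requires the example_kerberos_ext gem'
end
```

With this arrangement, only users who ask for Kerberos authentication need the external C library on their system; everyone else installs the main gem without compiling anything.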
00:14:57.280 Through this experience, I’ve learned how to appropriately approach C extensions and gem dependencies. With this newfound knowledge, I hope you find it easier to tackle your own extensions as you embark on your development journey.
00:15:19.520 Code organization leads to understanding the Ruby API and using it within your C specifications. Overall, you will aim to structure it in such a way as to maximize clarity and facilitate ease of use. I sincerely hope you all now have a better grasp of the world of Ruby extensions.
00:15:44.000 It’s critical to recognize that through this process, you’ll enhance your comprehension of Ruby by engaging with its core mechanisms. I hope you find the prospects of writing C extensions less daunting with this guidance.
00:15:58.000 Thank you so much for your time!