Summarized using AI

Building C Extensions in Ruby

André Medeiros • February 19, 2014 • Earth

The video titled Building C Extensions in Ruby, presented by André Medeiros at RubyConf AU 2014, explores the integration of C extensions to enhance Ruby's performance, particularly for demanding processing tasks. The talk begins with an analogy to aviation to emphasize the iterative process of development and the necessity of C extensions for speeding up Ruby in certain heavy computational contexts. The main points addressed throughout the video include:

  • Rationale for C Extensions: Ruby, while powerful, may not be suitable for high-performance tasks like parsing data or processing images. C extensions allow developers to leverage existing C libraries to enhance functionality and performance without abandoning Ruby.
  • Simplicity of Building Extensions: Medeiros explains that creating C extensions is accessible, encouraging developers with basic C knowledge to participate. A hands-on example, bundle gem, showcases initial setup—emphasizing that understanding a few C concepts is sufficient to start.
  • Memory Management and Method Definition: Key considerations for memory allocation and method creation are outlined, including the necessity to inform Ruby about memory usage. Examples of C method declarations are provided, illustrating how to define methods that function similarly to Ruby's methods (like say_hi).
  • Good Practices: A set of guidelines for building C extensions is introduced: define the API shape early, return 'self' to allow method chaining, implement bang methods for performance, and recognize the differences in conventions between Ruby and C.
  • Debugging and Advanced Topics: Medeiros discusses debugging using gdb and the advantages of concurrency in C extensions compared to Ruby's Global VM Lock (GVL). He addresses garbage collection strategies and shares insights from his experience transitioning from C to Ruby, particularly when working with OpenCV.

In conclusion, using C extensions in Ruby can significantly boost application performance while maintaining Ruby's simplicity and readability. The insights shared by Medeiros, alongside practical examples and a focus on good practices, empower developers to explore the functionalities of C to enhance their Ruby applications.

RubyConf AU 2014: http://www.rubyconf.org.au

From time to time, when building Ruby apps, you realise there are no libraries available for what you need. Even worse, Ruby doesn't quite perform as quickly as we would expect in certain areas. There are, however, a lot of high performance, mature technologies built in C that can easily be ported to be used with Ruby. By doing this, we get to keep using our favourite language, opening it to a plethora of applications that were not possible before, and still keep things snappy.
In this talk, I will walk you through the ins and outs of building Spyglass, an OpenCV binding for Ruby. I will also talk in detail about some gotchas (memory management, lack of threading), good practices (C objects as first class citizens, how to properly test extensions), why mkmf needs to be retired and some great examples of extensions you probably already use and should be looking at.

00:00:06.820 Good morning! As I was very well introduced, my name is André Medeiros. The conference people put my name as 'Medeira,' and that’s okay because a lot of my close friends can't even spell my name right when it's there. Anyway, today we're going to talk about building C extensions in Ruby. The first question you might ask is, 'Why?'
00:00:29.689 Whenever I try to approach this subject, I like to draw a parallel with aviation, which is a passion of mine. The reason we got into aviation was to get from point A to point B faster. However, just like in software development, the process of aviation wasn't quite right the first time; it’s a bit of an iterative process involving experimentation and learning until you arrive at the desired outcome. I just let the video finish. This is kind of like Ben's mail server earlier.
00:01:12.959 The point I want to make is that there’s a lot of iteration involved in building C extensions. In the end, you do eventually get there. One of the key points I want to emphasize is that Ruby is not a slow language. There are many people who argue otherwise, but it is not slow when we consider everything that we get for free. However, it’s not always optimal for heavy processing tasks. For instance, if you're dealing with significant amounts of statistical analysis, parsing, or processing images and video, Ruby may not be the first language you'd choose. This point illustrates why C extensions were introduced to Ruby; they allow us to speed up heavy processing without abandoning our favorite language.
00:02:48.370 The exciting aspect is that creating C extensions is not hard at all. In fact, during a workshop I conducted on Wednesday, the feedback I received was from a participant who felt almost cheated because it was incredibly simple to get started on a C extension. Another exciting point is that the Ruby C API feels a lot like Ruby itself. There's a common notion that C is an incredibly difficult language. While technically it can be a bit more challenging if you're coming from a solely Ruby background, the Ruby C API makes it more accessible. If you don't know C, that's fine too—you don't need to master the whole language; the first five or six chapters of any C book would provide you with enough knowledge to start building your first C extension.
00:03:50.400 When you are building an extension, there are a couple of ways to approach the problem, depending on what you're building. First, you can treat C types as first-class citizens, which suits gems that require back-and-forth communication. One example is Nokogiri; even if you don't have it in your Gemfile, there's a high likelihood that another gem you're using does. With this approach, there is always a Ruby object in memory that holds a reference to a C object. Alternatively, Ruby can come first, which works well for one-sided communication, such as MySQL, where you send queries to the database and get results back.
00:04:52.220 Getting started with building C extensions is simple. Most of you have likely done this already with the command 'bundle gem,' as it's the standard way to start any gem these days. However, since a C extension involves more than a standard gem, you will need to add a few extra Rake tasks. The rake-compiler gem includes all the clean and compile tasks necessary to get your gem built and running. You also need an 'ext' folder at the root of your gem directory, which is where your C code lives, and an extension configuration file.
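The slide for this step is not part of the transcript. As a rough sketch of the extra Rake tasks described above, a Rakefile using rake-compiler might look like the following. The gem name "speedy_gem" and the "lib/speedy_gem" directory are assumptions taken from the example class name used later in the talk, not from the speaker's code.

```ruby
# Rakefile (sketch; "speedy_gem" is an assumed gem name)
require "bundler/gem_tasks"   # build/install/release tasks generated by `bundle gem`
require "rake/extensiontask"  # provided by the rake-compiler gem

# Adds a `rake compile` task for the C sources under ext/speedy_gem and
# registers the build products with the clean/clobber tasks.
Rake::ExtensionTask.new("speedy_gem") do |ext|
  ext.lib_dir = "lib/speedy_gem" # where the compiled library is copied
end

# Assumption: compiling first is a sensible default for this gem.
task default: :compile
```

In a layout like this, the gemspec would normally also list the extension, for example `spec.extensions = ["ext/speedy_gem/extconf.rb"]`, so that RubyGems compiles it at install time.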
00:05:44.560 The extension configuration file, conventionally named 'extconf.rb,' generally resembles a Rakefile. It relies on 'mkmf,' a small library that ships with Ruby and generates a Makefile for you with minimal effort, so you don't have to maintain one by hand or go crazy trying to configure it. This is a very basic configuration. If you look at other gems, you might find Makefiles with four to five hundred lines of code, depending on the features you're trying to implement. This is essentially your first couple of steps.
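A very basic configuration of the kind described here could be as short as the sketch below. The file path and the "speedy_gem" name are assumptions for illustration. `mkmf` and `create_makefile` are part of Ruby's standard library.

```ruby
# ext/speedy_gem/extconf.rb (sketch; path and gem name are assumptions)
require "mkmf"

# Optional dependency checks would go here, for example:
#   abort "some_lib.h not found" unless have_header("some_lib.h")
# ("some_lib.h" is a placeholder, not a real dependency of this example.)

# Generates a Makefile that compiles the C files in this directory into a
# shared library loadable with `require "speedy_gem/speedy_gem"`.
create_makefile("speedy_gem/speedy_gem")
```

Larger extensions grow this file with checks for headers, libraries and compiler flags, which is where most of the extra lines in real-world gems come from.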
00:06:34.410 The next step is to create boilerplate code. Every gem that has a C extension needs to be initialized by Ruby, because this is where we define modules, classes, or methods; otherwise, it would be pretty useless. Begin by creating your first C file, where you include the Ruby headers. This gives you access to the full breadth of Ruby's C API. You'll then declare your class, which can be named anything, such as 'SpeedyGem,' and it can be a global. Finally, your 'Init_speedy_gem' function serves as the entry point for your gem: Ruby looks for a function named 'Init_' followed by the name of the extension, so the two must match. This is how defining a class in C looks.
00:07:19.830 It's not as simple as Ruby, but it certainly isn't rocket science either. With this setup, you will have a class that you can call in Ruby. Even though it might not do much yet, it’s a necessary step. The next important aspect to consider is memory management. Although not overly complex compared to other C programs, you do need to tell Ruby how to allocate memory and the function to free it. Ruby will then manage the lifecycle of your allocated memory and will inform you when it no longer needs it.
00:08:03.300 This is essential because memory leaks are typically the biggest challenges that arise. To work with your methods, you will also need to retrieve your C variables. Thankfully, this is also straightforward using standard Ruby C API methods. You can extract your C types from Ruby objects, allowing you to perform operations on them. After this, you can start defining your own methods.
00:08:48.130 Here is a very basic 'say_hi' method that accepts one parameter, which is the name. Notice that we always pass 'self' as the first parameter. C is procedural, not object-oriented, so the receiver has to be passed explicitly, and 'self' must always be the first parameter. This method is defined quite simply and, when called on an instance of 'SpeedyGem,' will return 'Hello, [name you passed].' Naturally, there are other checks you will need to incorporate, such as verifying types. This process is straightforward; you can use the 'Check_Type' macro to enforce expectations on parameters.
00:09:37.380 Here are a few examples of types you could check, such as floats, fixnums, strings, and hashes. Essentially, any base Ruby type has a corresponding check type function. Enforcing your own classes, however, is a bit more involved, but it remains less complicated than you might think. You can also make the method flexible. For example, if you want to accept either a float or a string, there is a method to achieve that.
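The C code from the slides is not included in the transcript. As a reference for the behaviour described above, here is a pure-Ruby sketch of the same contract: one required, type-checked argument, plus a method that accepts either a Float or a String. `SpeedyGem` and `say_hi` come from the talk. The `scale` method and the error messages are hypothetical.

```ruby
# Pure-Ruby reference for the behaviour the C methods are described as having.
class SpeedyGem
  # One required argument; non-strings are rejected with a TypeError, which
  # is also what the C API's Check_Type(name, T_STRING) raises.
  def say_hi(name)
    unless name.is_a?(String)
      raise TypeError, "wrong argument type #{name.class} (expected String)"
    end
    "Hello, #{name}"
  end

  # Hypothetical method that accepts either a Float or a numeric String.
  def scale(factor)
    case factor
    when Float  then factor
    when String then Float(factor) # raises ArgumentError if not numeric
    else raise TypeError, "wrong argument type #{factor.class} (expected Float or String)"
    end
  end
end
```

A sketch like this can double as a specification: the compiled extension should give the same answers for the same inputs.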
00:10:26.470 As for method parameters, the flexibility mirrors that of Ruby itself. You can, of course, define a fixed number of parameters, just like the previous example where one parameter, 'name,' must be provided. With optional parameters, the function signature changes slightly: it receives the number of arguments passed along with their values. The Ruby C API provides a helper, 'rb_scan_args,' for checking and extracting these parameters.
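To make the optional-parameter case concrete, here is a pure-Ruby sketch of what such a method does: check how many arguments arrived, extract them, and fall back to a default when an optional one is missing. The `greeting` parameter and its default are hypothetical. In C, the extraction step would typically be a call such as `rb_scan_args(argc, argv, "11", &name, &greeting)`, where "11" means one required and one optional argument.

```ruby
class SpeedyGem
  # Takes a raw argument list, as a variable-arity C method does.
  def say_hi(*args)
    # Arity check: one required argument, one optional.
    unless (1..2).cover?(args.length)
      raise ArgumentError,
            "wrong number of arguments (given #{args.length}, expected 1..2)"
    end

    name, greeting = args               # a missing optional comes back as nil
    greeting = "Hello" if greeting.nil? # assign the default
    "#{greeting}, #{name}"
  end
end
```

Written idiomatically in Ruby this would simply be `def say_hi(name, greeting = "Hello")`. The longer form is shown only because it matches the steps a C implementation has to perform by hand.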
00:11:14.560 From there, you can also test whether they are nil or evaluate their truthiness, assigning default values if desired. After that, the rest of the method works as usual. Let's go through key functionality in Ruby's C API, starting with strings. Creating a string in plain C usually takes significantly more lines than in Ruby, but the C API offers straightforward functions for handling Ruby strings.
00:12:01.620 Creating arrays in C is just as simple, as Ruby allows great flexibility in array types. It's more akin to Ruby's nature since you can input mixed data types within arrays. Hashes in C are also easy to create; however, creating symbols may prove somewhat more complex than creating strings. It’s crucial to refrain from slipping into the habit of creating strings instead of symbols.
00:12:42.220 You can even work with blocks while constructing a C extension. For example, in a GUI front-end, you may want to set up a callback that gets executed when the user interacts with the interface, such as clicking a mouse or pressing a key. You can save that block as an instance variable or retain it somewhere for later execution. Essentially, anything accessible in Ruby is also usable within your C extensions.
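The callback pattern described here can be sketched in pure Ruby as follows. The `Window` class and its `on_click` and `simulate_click` methods are hypothetical stand-ins for a GUI binding. In a C extension the same idea is usually expressed by capturing the block with `rb_block_proc()` and storing it, for example in an instance variable.

```ruby
# Hypothetical GUI wrapper that keeps a block for later execution.
class Window
  # Stores the given block as an instance variable and returns self.
  def on_click(&block)
    raise ArgumentError, "block required" unless block
    @on_click = block
    self
  end

  # Stand-in for the native event loop reporting a mouse click.
  def simulate_click(x, y)
    @on_click&.call(x, y)
  end
end

window = Window.new.on_click { |x, y| puts "clicked at #{x}, #{y}" }
window.simulate_click(3, 4)
```

Keeping the block in an instance variable also keeps it reachable, so the garbage collector will not reclaim it while the window is alive.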
00:13:16.130 Lastly, when it comes to creating symbols in C, just remember this straightforward function format. The Ruby C API is expansive and offers a lot more than what we've covered here in a brief introduction. For those interested, I’ve prepared a cheat sheet on GitHub that includes a variety of Ruby C API features.
00:14:25.760 Now, let’s turn to good practices. When building C extensions, keep several guidelines in mind. Firstly, define your API's shape before diving into the code. Consider the numerous C libraries available that you could potentially port or create. It's beneficial to have a clear understanding of how your API should function before you begin coding. Therefore, I recommend trying out some Ruby code that captures your vision, even if it’s non-functional.
00:15:18.389 The second recommendation is to return 'self' as much as possible. This allows operations to be chained effortlessly, which makes the API easier to use. The third guideline is to provide bang methods as often as you can. You are likely to be working with large data sets, and in Ruby a non-bang method typically returns a modified copy rather than changing the original object. Bang methods alter the receiver itself, which minimizes memory overhead.
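A small pure-Ruby sketch of these two guidelines is shown below. The `Image` class, its pixel array, and the `invert` and `brighten!` methods are hypothetical and are not taken from Spyglass. The sketch only illustrates the shape of an API where bang methods mutate in place and return `self`, while the non-bang variant works on a copy.

```ruby
# Hypothetical image wrapper; pixels are plain integers in 0..255.
class Image
  attr_reader :pixels

  def initialize(pixels)
    @pixels = pixels
  end

  # Make copies independent of the original's pixel buffer.
  def initialize_copy(other)
    super
    @pixels = other.pixels.dup
  end

  # Bang method: modifies the receiver in place and returns self.
  def invert!
    @pixels.map! { |p| 255 - p }
    self
  end

  # Non-bang method: leaves the receiver alone and returns a modified copy.
  def invert
    dup.invert!
  end

  # Another in-place operation, returning self so calls can be chained.
  def brighten!(amount)
    @pixels.map! { |p| [p + amount, 255].min }
    self
  end
end

image = Image.new([0, 100, 255])
image.invert!.brighten!(10) # chained; no intermediate copies are allocated
```

With large buffers, the chained bang calls touch a single allocation, whereas each non-bang call would duplicate the whole buffer.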
00:16:01.320 Finally, it’s important to remember that Ruby doesn't always correlate directly with C libraries. This means that there will be moments when conventions differ, leading to potential confusion. For instance, when I ported OpenCV to Ruby, I discovered that an image is referred to as a 'matrix' in OpenCV. This terminology felt overly academic for my purposes, so I chose to simply call it an 'image.' The reason behind porting OpenCV into Ruby was to leave the complexities of C behind.
00:17:28.000 That's it for today. Are there any questions? Yes? [Audience question] In regards to Cliff, there was a significant push to use the Foreign Function Interface (FFI) rather than C extensions. How do you choose between the two? [Answer] Well, I tend to prefer C extensions because they give me one less layer to worry about. It's more about having direct control over the process and understanding what's going on. If something goes wrong, you know it's your fault and not an FFI issue.
00:22:11.300 [Audience question] Could you go back to the GitHub link? [Answer] I will upload these slides, but yes, here it is.
00:22:43.780 For debugging C extensions, I use gdb as you would with regular debuggers. It’s possible to attach gdb to a running Ruby process. A good practice is to set up a console Rake task that executes your gem with appropriate fixtures. If you attach gdb to the PID of the running Ruby process and cause an exception, you can use gdb to backtrace and identify the exact issue in your code.
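A console task along these lines might look like the sketch below. The task body, the "speedy_gem" name, and the choice to print the process ID are assumptions, not the speaker's code. `gdb -p <PID>` is gdb's standard way of attaching to a running process.

```ruby
# Rakefile fragment (sketch; "speedy_gem" is an assumed gem name)
desc "Open an IRB session with the gem loaded, ready for gdb to attach"
task :console do
  require "irb"
  require "speedy_gem"             # the compiled extension under test

  # Any fixtures you want available in the session could be loaded here.

  puts "Ruby PID: #{Process.pid}"  # in another terminal: gdb -p <PID>
  ARGV.clear                       # stop IRB from parsing rake's arguments
  IRB.start
end
```

Once gdb is attached and the faulty call has been triggered from the console, `bt` in gdb prints the C backtrace.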
00:23:09.200 [Audience question] In collaborating on two C extension projects, I found concurrency to be one of the advantages, since it lets you work outside of Ruby's GVL. Alongside that, I encountered challenges relating to garbage collection, which I'd like advice on.
00:23:48.870 In terms of garbage collection, you have two options. One is passing a function for Ruby's mark routine. This tells Ruby which other objects your data still references, so that they are not collected while it is alive. In most cases, you won't need to do this unless you're sharing large memory blocks across various variables.
00:24:35.840 Regarding changes to Ruby's C API: you're right about the lack of changelogs. In over a decade of experience, I’ve witnessed minimal changes in the C API. Many of the C extensions I’ve created have compiled without issue from Ruby 1.9.3 to 2.1.0. Although there could be variances in libraries or patches, most of the C API remains stable.
00:25:36.260 About OpenCV, running it within Ruby doesn't impact performance much. The overhead primarily comes from creating Ruby objects around OpenCV. Other instructions execute directly on the metal. My motivation to bridge Ruby with OpenCV stemmed from wanting the beauty of Ruby after spending three months in C++, and I wrote the bridge over a weekend.