Building Native Extensions. This Could Take A While...

"Native gems" contain pre-compiled libraries for a specific machine architecture, removing the need to compile the C extension or to install other system dependencies. This leads to a much faster and more reliable installation experience for programmers.

This talk will provide a deep look at the techniques and toolchain used to ship native versions of Nokogiri and other rubygems with C extensions. Gem maintainers will learn how to build native versions of their own gems, and developers will learn how to use and deploy pre-compiled packages.

RubyKaigi Takeout 2021: https://rubykaigi.org/2021-takeout/presentations/flavorjones.html

00:00:01.120 Hello, my name is Mike and this presentation is about native C extensions in RubyGems.
00:00:04.560 I'll explain what a native extension is, but then I'll focus on a particular challenge for C extensions: how to safely and reliably use third-party C libraries like libyaml or libxml2.
00:00:07.919 I want to show you simple working examples of three different approaches to using third-party libraries.
00:00:10.000 One of those strategies, pre-compiled extensions, is extremely powerful but tricky to implement effectively.
00:00:14.480 I will show you how to do it, and then we'll discuss the risks and benefits. After watching this talk, you'll know better when and how to write and package your own C extensions, and you'll understand the difficult trade-offs being made today by Ruby C extension maintainers.
00:00:22.240 What is a C extension? I like to think that we create extensions every day by simply writing Ruby code. For instance, should I extend Ruby to have a class named Mike? Great! Here it is! I've extended the virtual machine with a new class. That's not so hard. A C extension is the same concept, except we write our code in C rather than Ruby.
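As a rough illustration of that idea (a sketch, not the slide from the talk), defining a class in plain Ruby is already enough to add something new to the running VM:

```ruby
# Before the definition, the VM knows nothing about a constant named Mike.
defined_before = Object.const_defined?(:Mike)

# Defining the class "extends" the running Ruby process.
class Mike
end

defined_after = Object.const_defined?(:Mike)
puts "Mike defined before? #{defined_before}, after? #{defined_after}"
```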
00:00:51.199 Here is that same extension written in C. That’s a little more complicated, but not too bad. However, the problem is that MRI can't read that C code directly. We have to first get it into a form that can be required, like any other Ruby file, and this is where it can get challenging.
00:01:01.760 We first need to compile that C code into an object file and then go through a process known as linking to combine it with Ruby's C library. Only then is it in a form that can be required: a file we call a shared object.
00:01:48.079 Let's look at a simple gem I've created named 'isolated', named because it is entirely self-contained and doesn't call any third-party libraries. You might choose to write a C extension like this as a performance optimization if you have some CPU-intensive work to do. The BCrypt gem is a good example of this kind of C extension. It's iterating over cryptographic math, which is simply faster in C than it would be in Ruby.
00:02:05.680 The isolated gem contains a little bit of C code to perform CPU-intensive work, just like our 'Mike is great' example, but now with an additional singleton method defined. Because it's a small extension, it compiles and installs rather quickly, and when we peek at the directory where the gem has been installed, we see a file named 'isolated.so'. That's the C extension, the final compiled version of our C code, and it gets required by 'isolated.rb' just like any other Ruby file.
00:02:40.160 Now, there’s quite a bit happening to make even a simple gem like isolated install smoothly. Let's walk through what happens during gem install step by step, from the source code all the way to the shared library.
00:03:07.520 Here I've got the isolated source code checked out locally. The gemspec tells us where in the project we can find the extension directory. This extension directory contains all the C code as well as the extension configuration file, which is always named 'extconf.rb'. Now, 'extconf.rb' is a Ruby script, and it has one job, which is to write a precise recipe for compiling and linking the extension.
00:03:40.079 In the case of isolated, the 'extconf.rb' is as simple as it gets. 'mkmf' is short for 'make makefile', a Ruby module that's shipped with the standard library. It defines 'create_makefile' as well as a handful of other methods for advanced configuration; some of which we'll see in later examples. The 'create_makefile' method is what actually writes the recipe for compiling and linking. Because we haven't configured anything, it'll adopt a set of reasonable defaults.
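A minimal extconf along the lines described above might look like the following sketch. The target name "isolated/isolated" is an assumption made for illustration; the gem's actual file may differ.

```ruby
# extconf.rb: a minimal sketch for an extension with no third-party dependencies
require "mkmf"

# Writes a Makefile in the current directory. With no other configuration,
# it compiles every .c file next to this script and links the result into a
# shared object.
# "isolated/isolated" is an assumed target name: the part after the last
# slash names the shared object and must match the Init_isolated function
# defined in the C source.
create_makefile("isolated/isolated")
```

Running `ruby extconf.rb` in the extension directory should leave a Makefile behind, provided Ruby's development headers are installed.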
00:04:28.319 The first section of our output recipe will be compilation, and by default, 'create_makefile' will compile any C file in the current directory. So, our recipe will start out looking something like this: it reads, 'please compile isolated.c into the object file isolated.o, and make sure to include Ruby's header files.'
00:04:52.280 If we had more C files, each would be compiled individually into a similarly named object file with a dot 'o' at the end. The recipe will also contain the second section, which is to link the object files with all of Ruby's C libraries and create the shared object. This command reads 'please create a shared object by combining our object files with Ruby's library and the system's C and math libraries.' This recipe is written out by 'create_makefile' in the form of, unsurprisingly, a makefile.
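To make the two-part recipe concrete, here is a rough approximation built from the values Ruby records in RbConfig. It is a sketch of the shape of the commands on a Unix-like system, not the exact text that mkmf writes into the makefile.

```ruby
require "rbconfig"

cfg = RbConfig::CONFIG

# Compile step: turn isolated.c into isolated.o, with Ruby's header
# directories on the include path.
compile = [
  cfg["CC"], cfg["CFLAGS"],
  "-I#{cfg["rubyhdrdir"]}", "-I#{cfg["rubyarchhdrdir"]}",
  "-c", "isolated.c", "-o", "isolated.o"
].join(" ")

# Link step: combine the object file with Ruby's library and the system
# libraries to produce the shared object.
link = [
  cfg["LDSHARED"], "-o", "isolated.#{cfg["DLEXT"]}", "isolated.o",
  cfg["LIBRUBYARG"], cfg["LIBS"]
].join(" ")

puts compile
puts link
```

The printed commands differ from machine to machine, which is exactly why the generated makefile is not portable.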
00:05:22.119 Let's run this script and take a look at the results. Make is the great grandparent of rake and is commonly used to manage recipes like this with many steps. We're not going to look very hard at the makefile today, but it's good to know that 'create_makefile' has done a lot of work for us. It's taken the generic high-level description in the extconf and turned it into a rather long but precise recipe.
00:05:55.680 We can also see that this makefile is specific not only to my CPU architecture and operating system, but also to the minor version of Ruby that I'm running. The extconf is generic, but this makefile will only work on a Linux x86_64 system running Ruby 3.0.
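If you want to see which combination your own makefile would be tied to, a small sketch like this prints the relevant values. It only reads what Ruby and RubyGems already expose.

```ruby
require "rbconfig"

# CPU architecture and operating system this Ruby was built for
arch = RbConfig::CONFIG["arch"]

# Major.minor of the running interpreter, e.g. "3.0" for Ruby 3.0.1
minor = RUBY_VERSION[/\A\d+\.\d+/]

# The platform string RubyGems uses when choosing between gem files
platform = Gem::Platform.local.to_s

puts "arch=#{arch} ruby=#{minor} gem platform=#{platform}"
```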
00:06:21.440 Now finally, at the end of our journey, we'll run make and we'll see that it has compiled and linked our code, and we have the final shared object we expected: 'isolated.so'. As I mentioned before, a shared object can be required by Ruby as if it were a Ruby file.
00:06:58.320 Once it's required, we can verify everything worked by calling the method we defined in our C extension. Congratulations! We've extended Ruby by writing some C code. Now go ahead and run 'gem install rce_isolated' and verify for yourself that everything works perfectly.
00:07:01.600 So let's summarize what just happened during the gem install. Step one: the gem utility downloads the gem file and extracts it. Step two: the gemspec declares where the extconf file is. Step three: the extconf file generates a makefile. Step four: the make utility compiles our code. Step five: the make utility links our code and generates that shared object which is then just required by the gem like any other file.
00:07:37.679 I mentioned earlier that one goal of a C extension might be to optimize performance; this is the case for BCrypt. However, there's another more common reason to write a C extension: to interface with a third-party library.
00:08:06.560 The reason I'm giving this presentation is that I've spent a lot of time over the past few years working with third-party libraries and calling them from Ruby. I've been the primary maintainer of Nokogiri for over a decade; Aaron Patterson once called me 'Nokogiri Mother' in a conference talk, and I kinda liked it.
00:08:32.480 If you're not familiar, Nokogiri is a Ruby library for parsing XML or HTML documents. The formal specifications for XML and HTML are extensive. It would have been a monumental task to implement all of that from scratch in Ruby. Furthermore, I'm pretty lazy, so instead, Nokogiri relies on existing open-source libraries and interacts with them through a C extension.
00:08:49.760 Many Ruby gems use C extensions solely to integrate with a third-party library. Some examples include Psych, SQLite3, RMagick, and gRPC. You can probably think of one or two others.
00:09:07.280 Most of these gems have a thin wrapper of Ruby and C that works together to make the library's features available as idiomatic Ruby.
00:09:28.719 You might be wondering if a C extension is the only way to call third-party libraries... Heck no! The other popular method is Ruby's Foreign Function Interface, or FFI. FFI allows Ruby code to call libraries directly without the need to use a C extension as an intermediate layer. FFI is a powerful tool, but choosing when to use it... that's a topic for another talk.
00:09:54.560 Things can go wrong when installing a C extension, even relatively simple ones. If you want proof, here's a search for old BCrypt installation issues. Now, this isn't BCrypt's fault. I'm not trying to blame them. So let's talk a minute about what can go wrong on the user side while they're building a C extension.
00:10:28.960 The first thing users need is a compiler toolchain. That includes the compiler, the linker, a make tool of some kind and appropriate system header files and libraries. This is such a common problem that the Nokogiri documentation site has a separate section just on getting those basics installed.
00:10:49.919 Most Linux distros will offer a meta package like 'build-essential' that will take care of this for you. On Windows, the Ruby community is very lucky to have Lars Kanis and, before him, Luis Lavena maintaining the RubyInstaller DevKit, which ships a version of Ruby along with its entire integrated toolchain. This is great stuff! On macOS, you've got the Xcode command-line tools, which come with their own installation problems.
00:11:12.080 Users also need a working Ruby development environment. This usually isn't a problem for folks who've already installed the C toolchain, but it can be a hurdle for newbies, particularly when error messages are vague.
00:11:29.200 Finally, your users will need a consistent environment. Once a gem is built, your C extension can only run on that architecture, that minor version of Ruby, with those system library versions. You can't copy your gems from your Mac development machine to your Linux production server, and you can't upgrade Ruby and still use the gems you installed previously.
00:11:39.440 Okay, enough talk; let's do this! For the next few minutes, I'm going to show you a series of gems that each illustrate a different way to integrate with a third-party library. These gems will escalate in complexity, but each will build on the prior one, so we'll get there in small steps.
00:12:05.600 All of the code I'm presenting here is available in a GitHub project I started named 'ruby-c-extensions-explained'. Here's a short link for it, so if you want to follow along, please clone the repo and feel free to run this on your own system. All these gems actually work.
00:12:24.720 Some gems, like SQLite3 and RMagick, require that the user install third-party libraries on their system ahead of time. We'll call this the system strategy. I've created a gem named 'rce_system' to demonstrate how to find and use a library that’s already installed. The library we'll use for all of these examples is libyaml.
00:12:46.239 Looking at 'system.c', first, we're calling the 'yaml_get_version' function and returning the result in a string. That’s pretty simple, but it’s enough to demonstrate that the integration works. Looking at the extconf, it's very similar to the isolated gem, except that it has a new stanza calling 'find_header' and 'find_library'. These are helper methods that search your system's standard directories for files, and if they find them, they ensure that the compile step pulls in the header file and the link step pulls in the library.
00:13:16.239 We specifically ask mkmf to look for 'yaml.h' because that's what our C code includes, and then we also ask it to look for the libyaml library and check that it has a function named 'yaml_get_version' because that's what we're calling.
00:13:48.160 Now, we don't need to do this for every C function that we call; we just need to name one function from the library so that 'find_library' can test that linking will succeed.
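The stanza described above could be sketched like this. The error messages and the target name "system/system" are assumptions for illustration; `find_header` and `find_library` are the mkmf helpers named in the talk.

```ruby
# extconf.rb: a sketch of the "system" strategy
require "mkmf"

# Search the standard include directories for yaml.h. On success, the
# compile step can resolve `#include <yaml.h>`.
unless find_header("yaml.h")
  abort "yaml.h not found: please install the libyaml development package"
end

# Search for libyaml and test-link one known function. On success, the link
# step gains -lyaml.
unless find_library("yaml", "yaml_get_version")
  abort "libyaml not found: please install libyaml"
end

create_makefile("system/system") # assumed target name
```

Aborting with a readable message is a design choice: it turns a confusing compiler error later on into an actionable hint at configuration time.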
00:14:01.120 So, if all that went well, the recipe will look something like this: you'll see in the compile line there's an additional directory for header files and in the link line there's a directory for a library with '-lyaml', which indicates that it will pull in libyaml.
00:14:35.680 On most systems, everything is installed in standard directories, and this gets simplified a bit to this form. Let's run through this process manually before we try to install the gem. Oh snap, I don't have libyaml installed! Let me go Google for a few minutes and figure out how to get past this, and we'll come back and try again.
00:15:30.560 Okay, we're back. I've got it installed! Now I'll run the extconf to generate the makefile. I can see that the makefile is there! I'll run 'make' to generate 'system.so'. I can see that it actually has a reference to 'libyaml.so' in there, which means that I can probably require it and call our method, which calls our C function, demonstrating that all this works. That's fantastic!
00:15:55.680 Go ahead and install 'rce_system' yourself! You can see that it takes a little bit longer than the isolated gem did but not too bad. So great, everything worked! What tends to break about the system strategy?
00:16:11.679 Well, installing third-party libraries can be frustrating for users, as we just saw. Different systems may install libraries into non-standard directories. You may get a newer or older version of the library than you were expecting, and in the case of SQLite, there may be features compiled in or not compiled that you need.
00:16:30.319 So, maybe you as a gem maintainer are okay with these trade-offs. In that case, you need to improve your documentation to ensure installation works. You need to handle edge cases in your extconf for different installation directories, and you need to add complexity to your code to handle features being turned on or off. This can get really challenging in the real world across many platforms. Take a look at the RMagick or the SQLite3 extconf files if you want to see how ugly this can get.
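One hedged sketch of what handling those edge cases can look like in an extconf is below. It uses mkmf's `dir_config`, `pkg_config`, `have_header`, `have_library`, and `have_func` helpers; the pkg-config module name "yaml-0.1" and the probed function are assumptions chosen for illustration, not taken from the talk.

```ruby
# extconf.rb: a sketch of coping with non-standard installs and optional features
require "mkmf"

# Adds --with-yaml-dir, --with-yaml-include and --with-yaml-lib options so
# users can point at a non-standard install location.
dir_config("yaml")

# If pkg-config knows about libyaml, pick up its compiler and linker flags.
# "yaml-0.1" is assumed to be the module name; this returns nil when
# pkg-config or the module is unavailable, and we simply carry on.
pkg_config("yaml-0.1")

abort "libyaml headers not found" unless have_header("yaml.h")
abort "libyaml library not found" unless have_library("yaml", "yaml_get_version")

# Probing a function defines a HAVE_... macro when it is present, which is
# one way the C code can adapt to features being available or not.
# The function chosen here is arbitrary.
have_func("yaml_emitter_set_unicode", "yaml.h")

create_makefile("system/system") # assumed target name
```

A user with libyaml under a hypothetical prefix such as /opt/libyaml could then pass `--with-yaml-dir=/opt/libyaml` after a `--` separator on the `gem install` command line.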
00:19:07.680 If you're not okay with these trade-offs and you decide you want a more reliable installation experience, or you want to guarantee a specific version of a library, have I got a strategy for you!
00:19:31.200 Some gems, like Nokogiri and Psych, package the third-party library's C files. This means the gem is redistributing the library, so be careful with licensing implications if you do this. Packaging third-party libraries is a bit more work for gem maintainers to set up but could simplify your codebase and make installation more reliable.
00:19:54.560 The easiest way to package up external libraries is simply to drop the source files into your extensions directory. I've got another gem named 'package_source' ready to show you where the C code is the same as our system gem. The only difference is where we're getting libyaml. Looking at the extensions directory here, you can see a new subdirectory containing all of the C and header files from libyaml, along with the license. Make sure you understand the legal implications of redistributing the code.
00:20:39.680 The new code in our extconf does the following: the VPATH setting tells mkmf to look in the yaml subdirectory. We add all the source files explicitly and then tell mkmf where it can find libyaml's header files.
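Those three steps might be sketched as follows, using mkmf's `$VPATH`, `$srcs`, and `$INCFLAGS` globals. The directory layout and target name are assumptions, and a real libyaml build may need further preprocessor configuration that is left out here.

```ruby
# extconf.rb: a sketch of the "packaged source" strategy
require "mkmf"

# Tell make to also search the yaml/ subdirectory for source files.
$VPATH << "$(srcdir)/yaml"

# List every C file explicitly: ours plus libyaml's. Only bare file names
# are given, because VPATH supplies the directories.
$srcs = Dir.glob(File.join(__dir__, "{,yaml/}*.c"))
           .map { |path| File.basename(path) }
           .sort

# Put libyaml's headers on the include path for the compile step.
$INCFLAGS << " -I$(srcdir)/yaml"

create_makefile("package_source/package_source") # assumed target name
```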
00:20:57.160 Looking at our pseudo-recipe now, the compile phase will do something like this: we have additional '-I' arguments for the yaml include, and each file is listed individually. The link phase will be identical to the system gem, except with more object files.
00:21:24.239 Running through this manually, we see everything works as expected. That was easy! Go gem install 'rce_package_source' and see that it takes a few seconds longer than system but does work.
00:21:46.359 It takes eight seconds, and system took something like three seconds. So this strategy works pretty well for simple cases. However, looking at the recipe, we can see that there are some limitations because the libyaml code is being treated as if it were part of our Ruby C extension. That’s limiting because the same compilation step must be shared across the extension and the third-party library, and we need to be careful about filename collisions between the two file sets as well.
00:22:33.679 Many third-party libraries have their own build process based on tools like Autoconf or CMake. For those libraries, it might be really challenging to just copy the C files and let mkmf compile them. In those situations, it is possible to package the whole upstream tarball and use that at installation time, so once again, I've got a working example named 'package_tarball'.
00:23:12.560 Looking at the project directory, we can see a new directory named 'ports' that contains the upstream tarball for libyaml. An important tool I want to introduce here is 'mini_portile', an API to take a tarball of source code and compile it into a library we can link against. This was created by Luis Lavena many years ago.
00:23:57.920 So, the new block in the extconf is this 'mini_portile' block. It's a bit busy to look at, but let's break it down. First, we download the libyaml tarball, verify the checksum, and then extract the files from the tarball. Next, we configure libyaml's build system, compile it, and link the object files together. Finally, we configure mkmf to find the headers and the library.
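A hedged sketch of such a block, using the mini_portile2 gem, is shown below. The version number, tarball path, checksum placeholder, and target name are all hypothetical and would need to match the tarball actually shipped in the gem.

```ruby
# extconf.rb: a sketch of the "packaged tarball" strategy
require "mkmf"
require "mini_portile2"

# Hypothetical values: substitute the real version, location and checksum.
LIBYAML_VERSION = "0.2.5"
tarball = File.expand_path("../../ports/archives/yaml-#{LIBYAML_VERSION}.tar.gz", __dir__)

recipe = MiniPortile.new("yaml", LIBYAML_VERSION)
recipe.files = [{
  url: "file://#{tarball}",
  sha256: "<sha256 of the tarball>", # placeholder, not a real checksum
}]

# Fetch, verify the checksum, extract, configure, compile and install the
# library into a private directory.
recipe.cook

# Add the freshly built library to the compiler and linker search paths.
recipe.activate

# Let pkg-config supply the flags for the library we just built.
# The .pc file name is an assumption about what libyaml installs.
pkg_config(File.join(recipe.path, "lib", "pkgconfig", "yaml-0.1.pc"))

create_makefile("package_tarball/package_tarball") # assumed target name
```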
00:24:30.239 This last step is one of the advantages of using a standard Autoconf distribution; you can usually leverage pkg-config to set all your compiler and linker flags, which makes your extconf pretty clean. When we run extconf, we'll see that all this is happening to build the libyaml before the makefile is created, so that the compile and link steps will know how to find what we just built.
00:25:05.760 This is different from the package source, where it was all built by the makefile. Everything works! You can run 'gem install rce_package_tarball' yourself; I'm not going to show animated GIFs anymore because the installs are starting to take a long time. This takes 18 seconds, which is significantly slower than 'package_source', and that’s because 'mini_portile' is running additional compilation steps to build the library.
00:25:59.440 Now, the benefit of packaging sources like this is that maintainers can rely on a known version of the library being available, which may simplify both your code and the installation process. However, the downside is that maintainers now have an additional responsibility to keep that library up-to-date and secure for your users.
00:26:19.680 Now, some gems, like Psych, adopt a hybrid approach to give users the option to either use the system library or fall back to using the packaged version. Finally, as our gem install timing demonstrates, packaging a third-party library as a tarball can take significantly longer to install than using the system or package source strategies because it goes through that additional configuration step.
00:26:54.560 Nokogiri 1.11 was the first version in which we shipped pre-compiled libraries for most major platforms—that's Windows, Linux, and macOS, both x86 and ARM. Along with the Java extension, the pre-compiled gems account for 45 million installations; that's over 95 percent of all Nokogiri installations this year. Would it surprise you to know that I'm able to do all this from my Linux machine? It's a true story!
00:27:46.080 Let me introduce you to rake-compiler-dock, maintained by Lars Kanis. It provides a Docker-based build environment that uses rake-compiler to cross-compile for all of these major platforms. Essentially, what this means is I can run the normal gem build process in a Docker container on my Linux machine to get Windows and macOS libraries.
00:28:04.319 This is really powerful stuff. Once we assume that we can cross-compile reliably, the remaining problems boil down to modifying how we build the gem file and making sure we test adequately on all platforms.
00:28:30.640 So, once again, I've got an example gem set up to show you, named 'pre-compiled'. It is a straight copy of 'package_tarball' that we've modified a little bit. The first modification is the extension task, which is now rather complex.
00:28:55.200 We've got some local variables declaring which versions of Ruby and which platforms we want to cross-compile to. We set an environment variable to let rake-compiler know what we're doing. We signal to rake-compiler to turn on cross-compiling features and create some additional rake tasks.
00:29:22.720 We have some code here to signal to our extconf that we're cross-compiling. Finally, this block only runs when we're cross-compiling and we use it to modify the gemspec and remove the stuff we don't need in the native gems.
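An extension task along those lines could be sketched as follows, using rake-compiler's `Rake::ExtensionTask`. The gemspec file name, the version and platform lists, the `--enable-cross-build` flag name, and the files removed from the native gem are all assumptions for illustration.

```ruby
# Rakefile: a sketch of the cross-compiling extension task
require "rake/extensiontask"

spec = Gem::Specification.load("precompiled.gemspec") # assumed file name

# Illustrative lists of Ruby versions and target platforms
cross_rubies = ["3.0.0", "2.7.0", "2.6.0", "2.5.0"]
cross_platforms = ["x64-mingw32", "x86_64-linux", "x86_64-darwin", "arm64-darwin"]

# rake-compiler reads this variable to learn which Rubies to build against.
ENV["RUBY_CC_VERSION"] = cross_rubies.join(":")

Rake::ExtensionTask.new("precompiled", spec) do |ext|
  ext.lib_dir = "lib/precompiled"

  # Turn on cross-compiling and create the extra per-platform rake tasks.
  ext.cross_compile = true
  ext.cross_platform = cross_platforms

  # Passed to extconf.rb so it can tell that we are cross-compiling.
  ext.cross_config_options << "--enable-cross-build"

  # Runs only when cross-compiling: trim what the native gems don't need.
  ext.cross_compiling do |native_spec|
    native_spec.files.reject! { |path| File.fnmatch?("ports/*", path) }
    native_spec.dependencies.reject! { |dep| dep.name == "mini_portile2" }
  end
end
```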
00:29:42.960 Next, we have a couple of new rake tasks to coordinate the tasks performed in the Docker container. The idea is that I run the top task on the host and the bottom task runs inside the Docker container.
00:30:02.880 You might notice that the worker in the Docker container is both compiling the extension and packaging the gem using the gemspec just modified by our extension task.
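The host-side and container-side tasks might be sketched like this, using `RakeCompilerDock.sh` from the rake-compiler-dock gem. The task names, gemspec file name, and platform list are hypothetical, and the sketch assumes the extension task from the Rakefile is already defined.

```ruby
# Rakefile: a sketch of the host and container tasks
require "rake_compiler_dock"

spec = Gem::Specification.load("precompiled.gemspec") # assumed file name
cross_platforms = ["x64-mingw32", "x86_64-linux", "x86_64-darwin", "arm64-darwin"]

cross_platforms.each do |platform|
  # Host side: start the build container for this platform and run the
  # builder task inside it.
  desc "build the native gem for #{platform}"
  task "gem:#{platform}" do
    RakeCompilerDock.sh(
      "bundle install && bundle exec rake gem:#{platform}:builder",
      platform: platform
    )
  end

  # Container side: cross-compile the extension, then package the gem from
  # the gemspec that the extension task has just modified.
  task "gem:#{platform}:builder" do
    Rake::Task["native:#{platform}"].invoke
    Rake::Task["pkg/#{spec.full_name}-#{Gem::Platform.new(platform)}.gem"].invoke
  end
end
```

The builder task uses `Rake::Task[...].invoke` rather than declared prerequisites because the gem packaging task is only defined once the native platform task has run.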
00:30:38.720 Let’s look at the changes being made to extconf. We’ve added lines to ensure we use the cross-compiling compiler and not the default compiler. This is where we check the flag passed to us; while we don't need this, it’s useful to know we can pass information to our extconf about when we're cross-compiling.
00:30:55.280 The rest of the extconf changes are related to configuring libyaml at build time. We need to set the fPIC option so we can mix static and shared libraries together, which probably should always be set.
00:31:22.080 Additionally, the 'SUBDIRS' variable is something that’s very specific to libyaml: it tells the build to skip the tests because, while we can compile binaries for macOS, we cannot actually run them.
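As a sketch of those build-time settings, the helper below assembles the extra variables as "KEY=value" strings. The helper name is hypothetical, and the SUBDIRS value of "include src" is an assumption about libyaml's build layout rather than something stated in the talk.

```ruby
# Returns extra "KEY=value" settings for libyaml's configure step.
# `env` defaults to the current environment so existing CFLAGS are kept.
def libyaml_build_settings(env = ENV.to_h)
  settings = {
    # -fPIC lets the static libyaml objects be linked into our shared object.
    "CFLAGS" => [env["CFLAGS"], "-fPIC"].compact.join(" "),
    # Assumed value: build only the headers and the library, skipping the
    # test programs that cannot run on the build machine.
    "SUBDIRS" => "include src",
  }
  settings.map { |key, value| "#{key}=#{value}" }
end

puts libyaml_build_settings("CFLAGS" => "-O2")
```

In an extconf using mini_portile, strings like these would typically be appended to the recipe's `configure_options` before cooking.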
00:31:42.400 We have one more small change to make before we wrap this up. First, let's review the layout of the finished packaged gem. You will see that we have four C extensions in this gem, one for each minor version of Ruby that we want to support.
00:32:21.679 Remember, the C extension is specific to an architecture and a minor version of Ruby. Hence, if we're running Ruby 3.0.1, we need to load the extension located in the 3.0 directory.
00:32:55.680 Let's ensure we do that in our 'precompiled.rb'. Previously, it looked like this, and it now looks like this. What we're doing is first checking to see if the precompiled extension file is present. If not, we fall back to the extension that was compiled during installation.
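The loading logic can be sketched roughly like this. The helper names are hypothetical, and the sketch tries the versioned path first and rescues the failure rather than checking for the file up front, which has the same effect.

```ruby
# Builds the path of the precompiled extension for a given Ruby version,
# e.g. "3.0/precompiled" when running Ruby 3.0.1.
def versioned_extension_path(name, ruby_version = RUBY_VERSION)
  minor = ruby_version[/\A\d+\.\d+/]
  File.join(minor, name)
end

# Tries the precompiled extension for this Ruby's minor version first and
# falls back to the extension compiled at install time.
def require_extension(name, dir)
  require File.join(dir, versioned_extension_path(name))
rescue LoadError
  require File.join(dir, name)
end

puts versioned_extension_path("precompiled", "3.0.1")
```

Inside the gem's main Ruby file, a call such as `require_extension("precompiled", File.join(__dir__, "precompiled"))` would then pick whichever shared object is present, assuming that directory layout.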
00:33:14.080 That’s it! That’s all it took to upgrade 'package_tarball' to 'pre-compiled'—although one note is that we still must build the vanilla Ruby version for people who aren't using Linux, macOS, or Windows. I'm looking at you, FreeBSD users.
00:33:30.080 That entire process of building that gem remains unchanged. You still do it with just a 'rake gem' or a 'gem build' command. Go ahead and try it: 'gem install rce_precompiled' if you're on Windows, Linux, or macOS. It will install in like a second; for everyone else, it will take a few more seconds because it installs the package tarball version.
00:33:49.120 This strategy isn’t perfect. Remember what I said earlier about a compiled C extension being specific to a Ruby version, the platform, and system libraries. The pre-compiled strategy mostly takes care of the first two but still has edge cases around system libraries.
00:34:07.680 The big issue is that glibc, the C library on most Linux distributions, is not the same as musl, the C library used by Alpine Linux, and we’ve had to work around this a few times in Nokogiri. I'm certain there are more edge cases that will emerge as users add more platforms.
00:34:37.440 I'm also willing to bet money that you could break this by setting some Ruby compile-time flags on your system. I’m honestly surprised it works as well as it does.
00:35:03.600 So the lesson here is to ensure you have an automated test pipeline that will build a gem and test it on a target platform. This takes time to set up, but it will save you in the long run.
00:35:34.640 What’s next for me? I would love to expand on this project. I wish to include a lot of the information here in the README. I would love to extract some of these pre-compiled patterns into a gem to make it more accessible. Furthermore, I would love to work with gem maintainers interested in pre-compiling any of their gems, so please reach out!
00:36:22.080 I want to say thank you to Luis, Lars, Kouhei, David, and Aaron; without you, this would not have happened. Thank you!
00:36:50.560 Thank you.