Wrapping Rust in Ruby

00:00:18.439 Hello and welcome. My name is Sella, and I am part of the program committee. It's my honor to introduce our next speaker, Garen Torikian. Garen has been a Ruby programmer for close to 20 years, and it is still his favorite language. He maintains the CommonMarker gem as well as many other open-source projects. In his day job, he is the CTO at Yetto, a messaging platform for support teams. In his spare time, he writes fiction and non-fiction. Please join me in giving him a warm RubyConf welcome, Garen Torikian.

00:00:58.920 Hello! Hi, everybody. Thank you all for being here, and thank you for watching the live stream. A special hello to anyone watching this video in the future. I hope your world is better than the one we are in right now. This is my first conference talk in a long time and my first RubyConf presentation, so I apologize if I'm a little rusty. Today, I'm going to talk to you about Rust and Ruby. As Sella mentioned, I am the CTO at Yetto, and I'm @gjtorikian almost everywhere on the web. In this context, CTO is a pretty meaningless title; there are only three of us. The only reason I'm CTO is that I happen to be the best programmer, but that's also a meaningless title because it just means I'm the worst salesperson at Yetto. For the purposes of this talk, think of me as the Chief Translation Officer.

00:02:03.399 Enter this talk with a mindset of translation, of communicating from one language to another. Yetto is a customer support tool built from the ground up to connect data systems and people. Messages can come into a support inbox via email, and support agents can forward that conversation to GitHub, Linear, Slack, or any of the other platforms we're connected to. Our mascot is Billy the Yeti, who represents the character in the middle of our operations. As you can imagine, Yetto makes a ton of API calls to facilitate these integrations, but we also perform a lot of text manipulation.

00:02:31.120 We need to be able to convert GitHub Markdown, Linear Markdown, and whatever Slack uses into HTML, so that it can be sent back out as an email to the customer. No matter the source of the message, we deliver it through Yetto to another platform. Let's start with a story. Around 2015, the CommonMark project took off. If you're not familiar with CommonMark, it's essentially Markdown with a specification. If you had never written Markdown before 2015, let me tell you, it was a wild time. You could not trust any text input because you didn't know how it would render. So, a group of people came together and created a specification to ensure that, no matter the programming language or location on the web, the Markdown you write renders as expected.

00:05:00.320 For me, I was motivated to write the CommonMarker gem partly because there were no other CommonMark gems at the time. When I tried to write the gem in accordance with the specification, every time I benchmarked my code, I found it to be pretty slow. When I submitted this talk, I stated a controversial opinion in the abstract. Like many opinions, this one was meant to elicit a response. I wanted half of you to start clapping and applauding, while the other half crossed their arms and expected me to justify this statement. Normally, this would be the part of the talk where I show charts and benchmarks backing up my statement, but I'm not going to do that because that's not what today's talk is about.

00:05:46.439 Instead, I want to give you a topology. Many people are working on Ruby's runtime to improve its performance. In fact, the language is considered slow. However, people with actual PhDs are dedicated to improving Ruby's performance. We're looking forward to YJIT landing, and we've heard about Prism parsing this week. There’s JRuby on the JVM, TruffleRuby, Artichoke, and various other efforts to speed up Ruby. I'm incredibly grateful to all of them, and we should all be grateful to them. Yet, benchmarks still show that Ruby is slow relative to compiled languages.

00:06:01.680 So, knowing all that, what did I do to make CommonMarker fast? Simple: I just used C, which is a compiled language. The team behind the CommonMark spec developed a C library called libcmark. This library takes in CommonMark text, creates a representation of it, and then outputs HTML. Here’s what that looks like in the context of a Ruby gem: a Ruby user passes a string into the gem. That goes into some underlying glue code, which converts the string from Ruby into a C-format string. That C string goes into libcmark, which does its processing and returns the result back to the glue code. The glue code then converts that data back into Ruby data. So, all the Ruby user has to do is install the gem and get all this performance improvement for free.

00:06:54.360 We’ve taken all that heavy lifting out of Ruby and put it into C. Let’s look at some examples of what that glue code actually looks like in practice. These are all taken from the actual gem. First, we define a function in C that converts between Ruby and C data types. Ruby has no static types, which is probably not going to change anytime soon, and that’s fine. To represent anything in Ruby, the C API uses something called VALUE, which can be anything: numbers, strings, and so on. Since a VALUE can be anything, your C code needs to check whether it’s a string. If it is, you convert it into a C string. Now that we have a C string, we can start working with libcmark.
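
The check-then-convert step described above can be sketched as a toy model. This is not the Ruby C API and not the gem's actual glue code (which is written in C); it is a std-only Rust illustration in which the `Value` enum and the `to_c_string` function are hypothetical names standing in for `VALUE` and the type check.

```rust
use std::ffi::CString;

/// Hypothetical stand-in for Ruby's `VALUE`: one type that may hold anything.
#[derive(Debug)]
enum Value {
    Nil,
    Integer(i64),
    Str(String),
}

/// Check that the value really is a string, then convert it into a
/// NUL-terminated C-style string that a C library could consume.
fn to_c_string(value: &Value) -> Result<CString, String> {
    match value {
        // C strings cannot contain interior NUL bytes, so conversion may fail.
        Value::Str(s) => CString::new(s.as_str())
            .map_err(|_| String::from("string contains an interior NUL byte")),
        // Anything that is not a string is rejected, like a type error.
        other => Err(format!("expected a string, got {:?}", other)),
    }
}
```

Calling `to_c_string(&Value::Str(String::from("# Hello")))` yields a C string, while passing `Value::Integer(42)` or `Value::Nil` returns an error instead of handing the wrong data to the library.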

00:08:46.720 We take the converted string and pass it to the library, which processes it and returns HTML. Now our glue code has to prepare that string to send back to Ruby. We should remember to free any memory we've allocated and close any references we've opened. The glue code then sends that data back out to Ruby. I once heard someone, possibly Tenderlove at GitHub, say that dealing with C in this context can be humorous. One of Ruby's strengths is its tight coupling with C; it’s very easy to write C extensions. It feels somewhat Ruby-like even while writing C code. So maybe another way to say that original controversial opinion is that Ruby alone is slow. I think that's a subjective statement that we can collectively agree on.

00:09:55.760 Whenever Rubyists need to perform time-sensitive tasks, we often dip into C. Other libraries do this too. Nokogiri uses C for its HTML parsing, and the pg and mysql2 gems use it for database connections. Many other libraries also rely on C: Liquid for template parsing, Psych for YAML parsing, Oj for JSON processing, bcrypt for password hashing, Puma for serving requests, and gRPC for message serialization. This means there is C code running even when you think you're just running Ruby code. There are numerous examples and tutorials that will explain how great it is to access the power of C, often emphasizing the process of writing this glue code to convert data in and out.

00:10:36.840 It's great, and I've written C code for Java and Node, but neither compares to how easy it is with Ruby. However, it's important to note that you're not just in charge of your C glue code. Libraries like libcmark are not maintained by you, and you don't review pull requests or keep up with security implications unless someone tells you. I don't hate C; I’ve been writing it since I was 15. I believe everyone should know C, as it’s a great educational tool for understanding memory management. However, even though Rubyists agree that Ruby sometimes needs a speed boost from C, why should we continue to rely on this when Ruby can serve as a DSL for C?

00:11:24.839 As a counterargument, consider the phenomenon of memory safety issues in C. People who work with C often express their concerns over the security issues associated with it. Moreover, there is a trend in our industry of moving away from C due to these concerns. A notable example is a security vulnerability in curl that would have been prevented had curl been written in a memory-safe language like Rust. If even a professional C developer makes mistakes, what hope do I have as a humble Rubyist?

00:12:04.360 Now, let’s stop discussing C and explore the marriage of Ruby and C. Here's a familiar sight for those running and installing C extensions: when a C extension crashes, you often end up with a backtrace displaying files and line numbers that represent the stack trace. However, this doesn’t necessarily point to where the problem occurred; there can be memory mishandling in one function that manifests much later in your code.

00:12:27.280 Debugging memory issues can be very frustrating. There are tools like GDB, Valgrind, and AddressSanitizer to assist with this, but they can be difficult to set up and use. Additionally, for each platform you want your Ruby gem to support, you need to consider the CPU and operating system of the end user. As the author of a Ruby C extension, you must account for various OS and CPU idiosyncrasies. OS updates can be moving targets, and you really hope you’ve got those escape hatches covered correctly. For those who have dabbled lightly in Ruby C extensions, you may be familiar with some of these initial dragons of development. Working in C often involves tasks like concatenating strings or expanding arrays, which are all tied to memory management issues.

00:13:57.280 Many of the tools available for C programming stem from an era with heavily command-line-driven interfaces that may feel antiquated. This last point is one that often goes ignored in many tutorials and is quite irritating: has this ever happened to you? You install a gem, and while it promises to build a native extension, it crashes. What happens is your computer downloads Ruby code and C code from RubyGems. The person installing the gem must also be able to compile that C code. Even if my machine compiled the C code and pushed it to RubyGems, I need to ensure that your machine can likewise compile and run that code.

00:15:03.480 In many cases, this isn't a huge problem for developers like us who have installed various toolchains. However, it remains a legitimate frustration. Users can configure their machines in ways that might not meet expectations, creating issues during installation. Wouldn’t it be great if we could eliminate memory issues typically associated with using C by utilizing a memory-safe language like Rust? Wouldn't it be fantastic to write highly performant code in a modern language? Absolutely, it would! That’s the focus of the remainder of this talk.

00:15:54.040 I belong to a community on GitHub called Oxidize-RB, which includes Rust and Ruby enthusiasts. We are a fairly active group, so feel free to join or lurk, and don’t hesitate to ask questions—we are super friendly! My goal for this talk is to demonstrate how simple it is to integrate Rust into Ruby. I do not say this lightly; attempts to incorporate Rust into Ruby have been ongoing since at least 2015, with many projects getting started and then abandoned. However, recent years have made this integration increasingly feasible.

00:16:37.679 I should also mention what I will not cover today. You technically do not need to use C or even Rust as your glue code. If you want to tap into any faster native language, you can, but I won't delve into those other languages today. Additionally, both the Rust and Ruby communities place great value on having up-to-date and well-organized documentation. You can create static sites that combine documentation from both languages into one coherent page. I also won’t talk about concurrency aspects, such as Rust's Tokio or Async Ruby, and I will explain why later. If you're looking forward to that, just temper your expectations.

00:17:09.680 Now, let's explore how easy it is to write Rust and Ruby. I've come up with a project idea: I want to take a string and convert its characters to alternating case. If you haven’t figured it out yet, I am online all the time, so I want to quickly create silly SpongeBob memes. The first step is to find a Rust crate that accomplishes what I need. I found one in just 35 seconds, and since it works, I will use it. To initiate our project, I will call 'bundle gem' with the name of my gem, but instead of stopping there, I’ll also add '--ext=rust'. This indicates to Bundler that I want to create a Rust-backed gem.

00:18:41.279 This capability has been available in Bundler since early January 2023. It generates all the necessary files and folder structures to get Rust integrated into my Ruby gem. This initial process puts us approximately 65% of the way toward completion. If you were expecting a slow burn, nope—we’re ready. Bundler also generates a TOML file for us. TOML is used by Rust to manage its dependencies, similar to Ruby’s Gemfile. Since all of this TOML was created for us, I merely need to add my new Rust crate as a dependency at the bottom.

00:19:30.239 Next, I'll switch back to Ruby for a moment. This is how I envision the API of my gem: I want to create a class that accepts a string and allows me to call a method called 'alternate_case,' which will alternate the case of the string. To achieve this, the skeleton file created by Bundler includes an initialization method where I will also define the 'alternate_case' method. This method will call a function we have yet to define called 'to_alt_case,' which will invoke the underlying glue code in the Rust library.

00:19:54.000 With that foundational setup, we’re ready to write some glue code. To accomplish this, we're going to rely on a tool called Magnus. Bundler has integrated Magnus for us; it's an open-source project officially supported by Bundler. It handles much of the magic required to bind Rust code with Ruby. Without diving into the weeds, I want to illustrate that this snippet constitutes the entirety of the Rust code we need. Starting at the bottom right, we have the Magnus init declaration, which defines a method as an entry point, similar to a main function in other languages. For those familiar with writing C extensions, you might find this structure familiar.

00:21:05.320 In this init method, we will expose two methods: a singleton called 'new' and an instance method called 'to_alt_case.' The integer following these method names indicates the number of arguments each method is expected to take. At the top of this code, we import our crates for Magnus and our Ruby gem, ChangeCase. We also define the structure of what we want our Ruby ChangeCase class to look like in Rust. In Rust, this is done using structs, which are analogous to classes in Ruby. Magnus wrap indicates to our build system that this struct should be connected to our Ruby ChangeCase class.

00:21:52.600 The struct mirrors our Ruby code and consists of a single field, 'str', which is a string. Next, we will implement the two methods we previously defined. The 'initialize' method will create the struct, and behind the scenes, Magnus will take that Ruby string and convert it into an appropriate Rust format. Similarly, the 'to_alt_case' method takes the Ruby string, processes it through Magnus, and sends it to the Rust library we discussed earlier. Once the Rust library returns its string, Magnus converts it back into a Ruby string and returns this to the user. All the work we just described is accomplished in one line of Rust.
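
The shape of that struct and its two methods can be sketched in plain Rust. This is a minimal, std-only illustration, not the code from the talk: the Magnus attributes and the init function that bind it to Ruby are left out, and the `alternate` helper is a hypothetical stand-in for the third-party crate. The exact alternation rule (start lowercase, skip non-letters) is an assumption.

```rust
/// Hypothetical std-only stand-in for the alternating-case crate.
/// Assumed rule: letters alternate lower/upper, other characters pass through.
fn alternate(input: &str) -> String {
    let mut upper = false;
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if ch.is_alphabetic() {
            if upper {
                out.extend(ch.to_uppercase());
            } else {
                out.extend(ch.to_lowercase());
            }
            upper = !upper;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Rust-side mirror of the Ruby class: a single string field.
struct ChangeCase {
    str: String,
}

impl ChangeCase {
    /// Mirrors the Ruby-facing constructor: takes ownership of the string.
    fn new(str: String) -> Self {
        ChangeCase { str }
    }

    /// Mirrors the Ruby-facing `to_alt_case`: borrows the stored string
    /// and returns a newly allocated result.
    fn to_alt_case(&self) -> String {
        alternate(&self.str)
    }
}
```

With this sketch, `ChangeCase::new(String::from("hello world")).to_alt_case()` returns `"hElLo WoRlD"`. In the real gem, the conversion between Ruby strings and Rust `String` values on either side of these calls is what Magnus handles.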

00:22:56.760 Here’s a side-by-side comparison of writing this gem in Rust versus C. As we noticed with CommonMarker, a significant portion of the C code is dedicated to converting between Ruby and C types. There's a lot of type-checking that we have to perform. C programming tends to be more verbose regarding memory management, whereas with Rust we never had to manage memory by hand in this example. If you're using VS Code, you'll find fantastic features when working with Magnus. Automatic documentation, symbol navigation, and practical examples are all available here, which is far less common for C. In short, it’s very straightforward to dive into.

00:24:05.360 The Magnus library can handle a variety of types, not just strings or numbers. You can use Rust's 'Option' for arguments that may be nil, and vectors can be converted into Ruby arrays. The library allows for calling Ruby functions from within Rust, which can be helpful for fetching constants or configurations you'd prefer to keep in Ruby. You can also raise Ruby errors in a Rust-like manner, for example, a type error if you encounter an unexpected type. By using Rust, we've addressed the first dragon of memory safety. Now we can move on and address the second.
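
The Rust-side shapes mentioned here can be sketched with the standard library alone. This is an illustration under stated assumptions, not Magnus code: `join_words` is a hypothetical function, and a plain `String` error stands in for the Ruby exception that a binding layer would raise. It shows an optional argument as `Option`, a list as `Vec`, and a failure as the `Err` side of a `Result`.

```rust
/// Hypothetical function with the three shapes from the talk:
/// a `Vec` for a list, an `Option` for an argument that may be nil,
/// and a `Result` whose `Err` plays the role of a raised error.
fn join_words(words: Vec<String>, separator: Option<String>) -> Result<String, String> {
    if words.is_empty() {
        // In a real binding this would become a Ruby exception.
        return Err(String::from("expected at least one word"));
    }
    // `None` models a nil argument; fall back to a single space.
    let sep = separator.unwrap_or_else(|| String::from(" "));
    Ok(words.join(sep.as_str()))
}
```

The caller never checks types by hand: a missing separator is simply `None`, and an invalid input surfaces as an `Err` value rather than undefined behavior.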

00:24:56.159 For this, we'll examine another VS Code extension called CodeLLDB. As you might expect, it integrates VS Code with the LLDB debugger. In practice, it enables users to create a debugging environment in approximately five lines of JSON. You can utilize the VS Code UI to run tests or sample scripts, and when the debugger halts, we can inspect the memory and dependency libraries. Overall, setting up this environment takes only about 30 seconds, so it's super quick and efficient.

00:26:02.840 Now we arrive at the final dragon of our journey. Even though your local machine may have Rust installed and you're able to build and compile your project, you need to consider how others who install it can use it. We do not want Ruby users to be required to install Rust or any toolchains as a prerequisite for using our gem. So, how do you build for platforms you don't have access to? Feel free to shout out if you know! The answer is the magic of Docker.

00:27:00.840 We love it for good reason! Earlier, I showed an error that occurs during distribution of C-based packages. Although that error is 100% legitimate, for many C gems, the problem has been effectively resolved. Mike Dalessio, also known as flavorjones, one of the maintainers of Nokogiri, presented a brilliant talk on distributing C gems a few years back. He explained the distribution challenges clearly and discussed an enhancement to the release pipeline where Nokogiri is pre-compiled using Docker containers before being uploaded to RubyGems.

00:27:48.439 Each Docker image represents an operating system and CPU architecture pair that he aimed to target. This way, a user can download a pre-compiled gem tailored to their specific platform without performing any compilation work themselves. We let cloud computers handle all the heavy lifting, and the end-user simply downloads the final result. If both the computer in the cloud and the user are running Linux x86, voila!
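
The idea of matching a user to a pre-compiled build by operating system and CPU pair can be sketched as a lookup. This is a toy illustration, not how RubyGems resolves platforms: the `PRECOMPILED` list, the `platform_tag` and `pick_build` helpers, and the tag format are all assumptions made for the example, and real RubyGems platform names differ in places.

```rust
use std::env::consts::{ARCH, OS};

/// Hypothetical list of CPU/OS pairs that were built ahead of time.
const PRECOMPILED: &[&str] = &[
    "x86_64-linux",
    "aarch64-linux",
    "x86_64-macos",
    "aarch64-macos",
];

/// Build a simple "cpu-os" tag (an assumed format for this sketch).
fn platform_tag(arch: &str, os: &str) -> String {
    format!("{}-{}", arch, os)
}

/// Return the matching pre-compiled build, or `None` when the user
/// would have to compile from source instead.
fn pick_build(arch: &str, os: &str) -> Option<&'static str> {
    let tag = platform_tag(arch, os);
    PRECOMPILED.iter().copied().find(|p| *p == tag.as_str())
}

/// Tag for the machine this code is compiled for.
fn current_platform() -> String {
    platform_tag(ARCH, OS)
}
```

Here `pick_build("x86_64", "linux")` finds a ready-made build, while an unlisted pair returns `None`, which corresponds to the fallback of compiling on the user's machine.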

00:29:07.080 Building on these concepts, we can apply the same principles for Rust in Ruby. I have configured a GitHub Action to build versions of my gem for each platform. When the process of compiling Rust finishes, those versions are automatically released to RubyGems. RubyGems treats each OS and CPU combination as a separate platform-specific gem, which is a fantastic feature. I don't understand why many C gems aren’t utilizing this yet.

00:30:14.679 In technical terms, all of that pre-compilation could also be done with C extensions. What I haven't observed yet is a comprehensive suite of CI actions that are applicable to multiple Ruby C extension projects. The Oxidize-RB group is working to address this gap. Much of the essential Rust language glue work is already in progress, but opportunities still exist to standardize and enhance the developer experience.

00:30:40.240 To circle back to the beginning, this is what a Rust-backed Ruby gem that converts CommonMark now looks like. Most of the language integration works well because Rust, Magnus, and deployment integrations build upon existing C avenues for integration. Many individuals have spent extensive time ensuring C extensions work for Ruby, and now Rust can build upon those principles with ease. Regardless of the programming language you prefer, I personally believe the code on the left, written in C, cannot compare to the simplicity of the Rust on the right.

00:31:40.480 The only aspect missing from this code snippet is accepting options and hashes, along with toggling various settings in the underlying Rust CommonMark conversion library. I no longer have to worry if someone forgot to provide me a string; Magnus will raise an error for me. If I need to grow or shrink a structure, I can do so in a Rusty manner, confident that I have not introduced any memory-related security flaws. Of course, it isn’t always bright and sunny; environmental differences may arise when working in Rust, especially concerning CPU architectures such as 32-bit versus 64-bit and ARM.

00:32:40.280 However, I firmly believe these challenges are considerably more manageable in Rust. The responsibility for addressing architecture-related issues lies with the Rust core team, rather than the C library or the Ruby extension author. I will also note that the primary challenge with Rust bindings is understanding where data is located. This relates to navigating Ruby's garbage collection processes. If you have written enough Rust code, you know that the language will provide warnings if you're doing something incorrectly. Therefore, while having some understanding of memory is necessary, you should want to know how it works, even in Ruby.

00:33:41.760 Being aware of memory allocations helps you understand where things happen on the heap or stack. Such knowledge will assist you in avoiding issues with Ruby’s garbage collector. The final point of concern is that the Magnus API, while stable, has several areas that are still undeveloped. Many of these involve async fibers and other features. There’s ongoing work to implement these, but it's just a small open-source project and does not have a large team behind it.

00:34:11.839 If you're interested, there are opportunities to contribute. My ultimate goal is to convince you to reconsider relying on C extensions. Take a look at your codebase and see how much C you're actually using. Here are some links to the various resources I mentioned earlier. That’s the landing page for the Oxidize-RB group, where you can get involved. Additionally, there’s a QR code for yetto.app that provides more information. Finally, I've uploaded the complete demo app for the ChangeCase Ruby project, which includes the GitHub Actions and everything you need to get started. Thank you very much!