Ruby VM
Building a Ruby: Artichoke is a Ruby Made with Rust

Summarized using AI

Building a Ruby: Artichoke is a Ruby Made with Rust

Ryan Lopopolo • November 18, 2019 • Earth

The video discusses 'Artichoke', a new Ruby implementation built with Rust, presented by Ryan Lopopolo during RubyConf 2019. Artichoke aims to be compatible with MRI Ruby 2.6.3 and provides a VM-agnostic framework for Ruby Core and Standard Library, allowing for experimentation with different VM implementations. Key points covered in the presentation include:

  • Introduction to Artichoke: Artichoke is a Ruby VM targeting WebAssembly that focuses on core Ruby functionality and standard library integration.
  • Inspiration and Development Journey: The project began as a hackathon idea aimed at creating a Ruby-based application, which evolved into a comprehensive Ruby implementation. The speaker emphasizes adopting an iterative development approach, akin to the 'New Jersey' style, which prioritizes building functional capabilities over perfection.
  • Goals of Artichoke:
    • WebAssembly Support: Executing Ruby code within a WebAssembly environment, enhancing cross-platform capabilities.
    • Execution of Untrusted Code: Enabling the safe execution of user-supplied Ruby code, as seen in various applications like browsers and game engines.
    • Single Binary Packaging: Simplifying the deployment of Ruby applications by bundling all dependencies into a single executable binary.
  • Technical Implementation: The use of Rust allows for efficient compiling to WebAssembly and creates small binaries due to static linking. Demonstrations included interactive demos that exhibit Artichoke's capabilities in running Ruby code within the browser.
  • Core Features and Enhancements: Artichoke implements core Ruby features through multiple backend strategies, supporting various functionalities like environment access and IO management, while ensuring safety and performance through Rust's features.
  • Future Plans: The project aims to extract Ruby core components from MRuby, enhance file system support, and continue refining the core architecture.

The talk concludes with a call for community involvement, inviting developers to contribute to the ongoing development of Artichoke on GitHub. The audience is encouraged to participate and help shape the future of this innovative Ruby implementation.

Building a Ruby: Artichoke is a Ruby Made with Rust
Ryan Lopopolo • November 18, 2019 • Earth

RubyConf 2019 - Building a Ruby: Artichoke is a Ruby Made with Rust by Ryan Lopopolo

Artichoke is a new Ruby implementation. Artichoke is written in Rust and aspires to be compatible with MRI Ruby 2.6.3. Artichoke is a platform that allows experimenting with VM implementations while providing a VM-agnostic implementation of Ruby Core and Standard Library. This talk will discuss Artichoke’s history, architecture, goals, and implementation.

#confreaks #rubyconf2019

RubyConf 2019

00:00:12.260 Hi everyone.
00:00:13.920 We welcome you to the last day of RubyConf.
00:00:15.840 We made it! Thank you for coming today.
00:00:24.570 I'm here to talk to you about Artichoke.
00:00:26.700 Artichoke is a new Ruby VM that I've been working on for the last nine months.
00:00:31.530 I'm guessing the first time many of you heard of Artichoke was during the SyntaxError game show on Monday.
00:00:35.129 So awesome! Thank you for following up with your excitement; I appreciate it.
00:00:40.019 This deck is available at artichoke.github.io/rubyconf/2019. If you want to tweet or follow along, there are a bunch of interactive WebAssembly demos in the deck that are pretty fun to play with. You can also create your own demos at artichoke.run.
00:01:01.399 Before I get started, though, I would like to thank three people in particular who provided me with a ton of support along the way to getting as far as I have in the implementation and leading up to RubyConf. I want to thank Daniel, Steven, and Wobba. Thank you very much; I appreciate all your support.
00:01:14.640 So, what is Artichoke? Well, Artichoke is a Ruby implementation that targets Ruby 2.6.3. This includes the core and the standard library. It runs and targets WebAssembly.
00:01:28.109 Here is a WebAssembly demo that you'll see a bunch of in the deck. It embeds an iframe of Artichoke.run and can be used to execute Ruby code. Let's demonstrate that right now.
00:01:46.020 We've got this code that sets up a properties object, which is kind of a wrapper around a hash. We shove some keys into it, generate some JSON, and test it with a very nice regex validation to ensure the JSON contains what we expect and print it out. There's a lot going on here, which we'll dive into over the course of the talk.
00:02:10.229 But I guess one thing to note is that we have parts of the standard library integrated here, which is a little unobvious given that this is WebAssembly and there's no file system. However, it’s important to ask the question: Why would I want to build a Ruby?
00:02:30.300 I didn't set out to build a Ruby about ten months ago. I was trying to put together a hackathon project and wanted to build a Rube Goldberg machine, like the one you see in this music video. We were trying to build a React code editor populated with Ruby code, which we would then post to a Rust web server to hopefully evaluate the code and generate some CSS that we would pipe back to the browser to restyle the code editor.
00:02:43.560 But I never quite got that far because, just like the apocryphal game developer who becomes more interested in building the game engine, I became more interested in the Ruby bindings themselves. And thus began a long journey to build a Ruby. Along the way, I implemented a Rack-compatible web server capable of running a Sinatra web application.
00:03:18.870 This user-driven focus on feature completeness is something that has pushed the project forward. In 'The Worse is Better' essay, the author posits that there are two styles of software development. One, called the MIT style or crown jewel style, is design-focused, creating the perfect solution that's not completed until it's deemed perfect. And trying to build a Ruby in that manner seems like the wrong approach because there’s so much to do.
00:03:56.220 There are many idiosyncrasies in MRI, making it nearly impossible to get it right from the outset. Instead, Artichoke embraces the New Jersey approach: build the worst thing that can possibly work and refine it over time based on use cases, such as building a Sinatra running server to hone in on the correct implementation. That being said, we are still in the early days, and there is lots to do. I hope you are excited enough by the end of this talk to check out our GitHub and consider contributing to some of the tickets.
00:04:37.090 All of the tickets in the Artichoke set of repositories are tagged with the difficulty level we believe they might present for implementation. The ones you might be most interested in are tagged as 'easy' and are 'Help Wanted'. I encourage you to check them out.
00:05:03.069 I should mention that Artichoke is written in both Rust and Ruby, so we're looking for contributions from individuals with backgrounds in both languages. Additionally, Rust, at least so far, has proven to be quite approachable. This is only my second Rust project, and I've been able to crank out about thirty thousand lines of code, so it isn't so bad.
00:05:39.339 We've also found that some contributors who are more comfortable with Ruby start implementing parts of the core and standard library in Ruby and then graduate to implementing others in Rust, especially those that are easier to do in Rust or performance-critical.
00:06:06.629 That was a little bit of background. Now, I want to cover the high-level goals for Artichoke, exploring why these are worthwhile goals and the tools we can use to implement them. Our first goal is to build for WebAssembly, which you've seen already in the demo I showed earlier. The next goal is to execute untrusted code, and we will explore that further in the next few slides.
00:06:37.990 Finally, we want to package Ruby applications as single binaries for ease of deployment and auditability. A bit about WebAssembly: it is a virtual machine runtime that can execute code compiled from high-level languages like Rust and C++. WebAssembly is great because it is sandboxed by default.
00:06:48.660 One concrete example is that memory out-of-bounds accesses trap instead of causing undefined behavior or potentially crashing your program. This is beneficial for some of the other goals we've outlined, such as executing untrusted code. Additionally, WebAssembly is multi-platform; it is for more than just browsers, even though it can run there as well.
00:07:13.840 There are several WebAssembly runtimes, most notably Node, that allow you to execute WebAssembly code on environments like Linux servers. The big takeaway is that WebAssembly allows us to execute Ruby in the browser, which I think is pretty cool.
00:07:40.270 Next, let's discuss untrusted code. What does it mean, what types of applications have it, and why would we want to support it? I think of untrusted code as a program that offers code execution as a service. Some examples of such applications include Mozilla Firefox, which executes both WebAssembly and JavaScript in the browser, and potentially from uncertain sources. Additionally, game engines typically expose scripting capabilities because it is easier to iterate on level design in a scripting language than it is to recompile C++. Even platforms like Shopify allow users to inject and configure the platform by running code they supply.
00:08:55.680 Even Redis has an eval command that executes user-supplied Lua scripts within the context of the database. It would be nice to have Ruby available for some of these use cases too.
00:09:59.919 With the aim of developing single binary applications, we are looking for straightforward hermetic deployments. What does that mean? It means that a Ruby installation today relies on many different parts of the file system.
00:10:25.600 Just a standard Ruby install loads several shared objects and has files from the standard library scattered on disk in the Lib directory, which complicates bootstrapping new deployments, both on Linux and Windows. It also makes deploying a Docker container trickier than necessary.
00:10:46.550 Additionally, applications themselves typically load Ruby code, gems, config, and assets. It would be great if we could bundle these all together into one binary, simplifying the deployment of our applications. To summarize, our goals for this project are to build for WebAssembly, execute untrusted code, and package single binary applications in an easy-to-deploy manner.
00:11:24.620 What tools do we have available to make this a reality in Artichoke? The first and perhaps the most obvious, given the talk's title, is that we can use Rust for fun and profit in making our Ruby align with these goals.
00:11:49.950 One easy win comes from the fact that Rust has a native WebAssembly back-end, which means we can easily write Rust code and compile it to WebAssembly in a browser-consumable format. Building for WebAssembly in Rust is as simple as invoking just two lines of code, making it easy to integrate into a build process for compiling a Ruby application for the web.
00:12:25.150 Demo time again! One powerful aspect of Rust—and notably its native WebAssembly compilation—is that several parts of the ecosystem can be built for WebAssembly. Here, we have a program that includes a gem called Artichoke Web.
00:12:42.460 This gem connects to a Rust crate, or library, called Standard Web, allowing us to access JavaScript APIs in Rust. As demonstrated, we established a gem that binds to a Rust library exposing the location object in JavaScript. The Artichoke.run playground stores the program text in the URL's location hash that is served.
00:13:18.860 We can use these bindings to standard web to create a coin that utilizes the location hash of this iframe for storage. If we run this program, it will print out itself! Tada!
00:14:06.580 When we include a crate dependency, we don’t end up needing a bunch of shared objects at runtime; instead, they become linked directly into our application. This allows us to achieve a single binary distribution.
00:14:29.400 By taking that single binary and dropping it into a 'from scratch' Docker container, we can set up a full Rails installation right there. Plus, on Linux, Rust supports building against musl libc, which enhances hermeticity, enabling single binary applications without any dependencies on the host system.
00:15:00.790 So it’s all great! We've got this Rust toolchain to make our lives easier in meeting some of our goals. However, that doesn't directly build Ruby—so how do we convert these Rust tools into something that resembles Ruby?
00:15:27.680 To me, the most recognizable part of Ruby is its core. Ruby core consists of many classes and modules that form the API you use every day—things like the env object, arrays, files, regexes, and match data. To build a Ruby, we need to implement all these components.
00:16:08.100 Ruby core will consist of core objects backed by multiple implementations of the same interface. These interfaces and what they expose can be configured at compile time.
00:17:06.570 Take env, for instance. It's how you access the system environment in Ruby. It can be somewhat unintuitive how env operates within the context of WebAssembly, since this isn’t a UNIX-based platform and there’s really no environment to speak of.
00:17:38.520 However, this program will run by leveraging an alternative backend for env. Artichoke exposes two implementations for env. The first communicates with the system environment, enabling easy modifications of your path or any of your needed system environment variables. The second one is backed by a hash map and has the same API as the typical env, allowing programs expecting the standard behavior to function as intended.
00:18:34.700 This design allows us to create a Ruby that can run in untrusted contexts by selectively turning features of the API on and off based on compile-time flags. An interesting aspect for building a Ruby that may not have access to file descriptors is the concept of capturable IO in WebAssembly. While there is no file descriptor for print statements, throughout our demos we have managed to print outputs to the screen.
00:19:41.640 This works because the interpreter exposes multiple IO strategies that can optionally capture output into a buffer as it generates data, allowing later extraction of this buffered data from various interpreter components. Here’s some code that generates output, capturing issues and ensuring our X-wings are ready to attack the Death Star, with all IO captured and displayed on the screen.
00:20:28.630 Additionally, we have the ability to add and remove different IO parts to enhance the security guarantees of the Ruby ecosystem we’re building. Certain IO methods, like IO#popen and Kernel#open with a pipe, pose security risks in a WebAssembly environment or game engine—you wouldn’t want to grant access to these methods.
00:20:47.710 Fortunately, these potentially dangerous APIs can be disabled at compile time, as they're treated as optional features rather than mandatory.
00:21:12.220 Furthermore, Artichoke has an embedded virtual file system that provides functionalities such as requiring files—even when there isn’t an actual disk present. The JSON implementation in Artichoke utilizes the pure JSON implementation available in Ruby.
00:21:37.510 The Ruby files are loaded from the Lib directory and, during the build process, we embed these sources into the Artichoke binary. The interpreter accesses them through a virtual file system. This virtual file system seamlessly supports both Ruby sources present on disk and files extending from the binary, aiding our objectives for single binary applications.
00:22:17.210 Now, let’s require some JSON, turn it to JSON, and print it out. Voilà! We have JSON!
00:22:52.830 Additionally, we have another example showcasing that pattern of multiple backend capabilities. Ruby’s regex implementation is quite pervasive, enabling text matching and evaluation. Artichoke supports three regex implementations. Notably, one draws on the Ruby regex library, which allows us to validate patterns against Ruby-supported configurations.
00:23:37.380 Most of the time, after validation, we delegate regex tasks to a Rust library that performs faster for patterns without backtracking, permitting us to remove a C dependency when building Artichoke, which simplifies the process.
00:24:32.260 The regex implementation efficiently finds matches by employing a deterministic finite automaton algorithm, which sounds complicated, yet I was happy to derive from Rust’s substantial ecosystem instead of trying to implement the functionality myself.
00:25:05.040 If we repeat the same test using regex with our testing corpus, Artichoke showcases enhanced performance again, reinforcing that leveraging the Rust infrastructure provides notable performance advantages for us.
00:25:53.730 One of the unique features that Artichoke has is this concept of a source array. Running just the first two lines of this program in any other Ruby interpreter could lead to significant memory allocation issues.
00:26:23.350 The array supports the same multiple backend design as the rest of the core, allowing aggregate arrays containing specific sub-arrays. The result is a remarkable reduction in memory usage, exemplifying successful memory management through our architecture.
00:26:48.190 Regarding upcoming work for Artichoke, we’ll primarily focus on the core. True to the New Jersey style of development, Artichoke doesn’t implement its own VM or parser; it depends on MRuby for those aspects. Our efforts will revolve around extracting Ruby core components from MRuby to enable the development of our VM and parser.
00:27:50.580 Promising future enhancements include implementing a hash with a small vector backend, allowing small key-value collections to leverage the ability of computers to access linear memory efficiently.
00:28:44.180 We'll also work on range implementations that allow Ruby's range expressions (like 1 to 1000) to be managed without necessitating the creation of a full array for all elements.
00:29:24.620 Moreover, we seek to enhance the file system support to encompass an operating system backend, moving beyond the in-memory file system we use now.
00:29:58.160 That concludes everything I have! I hope you're looking forward to exploring Artichoke on GitHub and possibly submitting a pull request or two!
00:30:05.590 Please check out Artichoke.run!
00:30:07.190 Thank you very much!
00:30:09.300 If I could summarize the question, it's how do you think about enabling compilation to WebAssembly versus being a Ruby interpreter that can target WebAssembly?
00:30:27.260 Currently, I think we are definitely a Ruby interpreter that targets WebAssembly. There's a shell of a project on GitHub called Jasper, which we would love contributions for, aimed at bundling a full application into a WebAssembly bundle. Right now, that would primarily involve concatenating the sources into the virtual file system. But if we extend that vision, it could potentially encompass static compilation to WebAssembly in the future.
00:31:26.490 I have heard of the rubinius standard library but haven’t explored it yet. Most of the work on Ruby implementations has come from bootstrap processes using MRuby core, which targets the ISO Ruby standard rather than 2.6.
00:31:47.210 That said, sustaining compatibility has necessitated some surgical modifications. Regarding how Artichoke deals with third-party gems, when building the Rack clone that ran Sinatra, I incorporated vendor libraries into the repo with modifications to run on the specific interpreter version I was utilizing.
00:32:20.270 I then embedded these gem sources into the binary. While this method isn't particularly sustainable, it's why initiatives like Jasper are crucial for gathering the full array of sources from your Ruby application and packing them up effectively.
00:33:11.430 It's important to note that WebAssembly lacks threads, and the MRuby interpreter we employ isn’t thread-safe. To facilitate this, I’ve implemented a blocking version of thread and mutexing, allowing simulated thread actions to access the complete API.
00:33:47.930 This setup enables operations like accessing thread locals in a manner similar to how we manipulate env—even when no environment is available.
00:34:19.210 Thank you!
Explore all talks recorded at RubyConf 2019
+84