Writing concurrent libraries for all Ruby runtimes

by Petr Chalupa

In the talk 'Writing Concurrent Libraries for All Ruby Runtimes', presented by Petr Chalupa at RubyConf 2015, the focus is on creating concurrent classes that make full use of multicore processors across Ruby implementations. The presentation begins with an introduction to the Concurrent Ruby gem, which offers both high-level and low-level concurrent abstractions and supports MRI and JRuby, with Rubinius support in progress.

Key points discussed include:
- Concurrent Ruby Overview: This library includes features such as chainable futures, channels, actors, and transactional memory. It utilizes synchronization primitives to facilitate the development of concurrent applications.
- JRuby+Truffle Project: An experimental backend for JRuby that acts as a self-optimizing Abstract Syntax Tree (AST) interpreter to improve execution performance.
- Understanding Concurrency Challenges: Chalupa explains the complexity of concurrency, focusing on how compilers and processors can reorder execution as long as the executing thread itself cannot tell the difference, which can expose unexpected values to other threads if not properly managed.
- Building a Future Abstraction: A simple Future class is created to represent a computation that is not yet complete. Chalupa discusses the API and illustrates its use for background processing using worker threads that manage job queues.
- Implementation Details: The presentation explores performance implications and the importance of synchronization, particularly in MRI where the Global Virtual Machine Lock (GVL) is utilized. Chalupa demonstrates how to enhance the basic Future implementation to avoid unnecessary synchronization overhead.
- JRuby Considerations: The importance of ensuring that instance variables are safely shared across threads is emphasized, and elements of the Ruby core API in JRuby are adapted to maintain concurrency without compromising performance.
- Performance Benchmarking: Performance comparisons across implementations highlight the improved speed and efficiency of concurrent code in JRuby.

Chalupa concludes by emphasizing that developers should start with the Concurrent Ruby gem for concurrent programming needs and offers assistance for those looking to implement or extend functionalities. The talk highlights the ongoing evolution of concurrency in Ruby, alongside a promise to maintain compatibility with future Ruby releases.
Overall, the key takeaways stress the importance of understanding concurrency complexities and utilizing libraries that abstract those challenges effectively.

00:00:14.440 Good evening everyone. Thank you for coming. My name is Petr Chalupa.

00:00:20.000 I work on Concurrent Ruby, which is basically a collection of high-level and low-level concurrent abstractions. It is open-source, meaning we are not trying to force any particular solution on you.

00:00:33.830 You can choose the best solution that works for your specific problem. We support MRI and JRuby 9k, and are currently working on support for Rubinius, as well. This gem is already used in several projects, including Rails 5, Sucker Punch, and Dynflow, and we have just released version 1.0.

00:00:47.570 Among the high-level abstractions, we have chainable futures, Go-inspired channels, actors, Clojure-inspired agents, transactional memory, and atomic references. We also have some low-level synchronization primitives like countdown latches, semaphores, cyclic barriers, and more, which make it easier to build various kinds of abstractions.

00:01:09.049 The second project I want to introduce is JRuby+Truffle. JRuby+Truffle is an experimental backend for JRuby, and it is part of the same repository, meaning it is also open-source.

00:01:17.720 Essentially, it is an Abstract Syntax Tree (AST) interpreter with a key feature: it is a self-optimizing interpreter. This means that as the interpreter executes code, it profiles which branches were taken and what types were involved. Consequently, the nodes can specialize and, after some time, we can assume that the tree has stabilized.

00:01:41.980 Once we have that stable tree of nodes, we can feed it to the Graal compiler, which can produce highly optimized machine code. If some speculative assumption fails, we can invalidate the compiled code and fall back to the interpreter, which respecializes the nodes so that the tree can be compiled again later.

00:02:02.810 The gist of it is that JRuby+Truffle aims to be quite fast, and we will review some results later. It supports all Ruby language features without restrictions and even tackles trickier aspects like debugging and the object space. These features are always on and do not introduce overhead if you are not using them.

00:02:17.720 We have approximately 90% compatibility based on Ruby specifications, and as I mentioned, it lives in the same repository as JRuby and is included by default. So if you pass the -X+T option on the command line, you can already try it on micro-benchmarks or very small projects.

00:02:36.110 Now, let’s move to the main topic of this talk, which is implementing a concurrent abstraction. To begin with, I’d like to discuss why concurrency can be challenging. The main reason is that processors and compilers are allowed to reorder how our code is executed, as long as the result looks sequentially consistent to the thread that is executing it. You might think it would be beneficial to prohibit this behavior, but that’s actually undesirable, because this freedom is what allows compilers and processors to perform many optimizations.

00:04:02.150 However, the consequence of this reordering is that another thread may see strange or unexpected values. Since one thread executes code in a different order than what we wrote, it can lead to issues when it comes to memory access.

00:04:35.840 To reason effectively about this concurrency, we must account for all possible valid orders in which our code can be executed. To construct these orders, we need a framework to help us make sense of them, which we refer to as a memory model. I will touch on this more later.

00:05:12.360 Of course, this has a significant impact on Ruby implementations. The more aggressively an implementation optimizes, the fewer instructions it needs to execute your Ruby code, and the more frequently you can observe these effects of reordering in practice.

00:05:49.949 Let me show you an example of what can happen under these circumstances. Consider a simple class with one field called 'value.' When we create an instance and make it reachable from another thread, we have two writes to memory: the first stores the field, and another one stores the reference to the instance. If another thread reads them both later, it depends on the order in which those writes became visible, and the compiler is free to optimize that order.

00:07:02.960 Because of this optimization, it may switch the order of these writes. If that happens, the other thread might assume at first that the instance was filled in correctly when, in reality, it was not. This can result in unexpected behavior.
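The publication pattern described here can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the class name Box and the local variable shared are hypothetical and do not come from the talk's slides, and on MRI the reordering itself is unlikely to be observed.

```ruby
# Hypothetical class used only for illustration: one field called value.
class Box
  attr_reader :value

  def initialize(value)
    @value = value            # write 1: the field
  end
end

shared = nil                  # variable visible to both threads

writer = Thread.new do
  shared = Box.new(42)        # write 2: publishing the reference
end

reader = Thread.new do
  box = shared                # nil if the reader runs before the writer
  # On a runtime that reorders write 1 and write 2, box could be non-nil
  # here while box.value is still nil. Nothing in this code prevents that.
  box && box.value
end

writer.join
observed = reader.value       # Thread#value returns the block's result
puts "reader observed: #{observed.inspect}"
```

Nothing in this snippet establishes an ordering between the two threads, which is exactly the gap that the synchronization discussed in the rest of the talk is meant to close.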

00:07:35.050 This leads us to the actual implementation part. I will begin with a simple abstraction called ‘Future,’ which essentially serves as a reference for a computation that hasn’t yet been completed. It features a very simple API: you can create a new Future, fulfill it with a value, or retrieve that value using a blocking method if the future was not yet fulfilled.

00:08:12.640 Alternatively, you can check if the Future is completed, which is a non-blocking operation.

00:08:39.849 Let's see a simple way this can be applied. With Futures, we can build simple background processing. At the top, there's a simple helper that prints timestamps alongside its outputs. We implement background processing by creating a shared queue for jobs, and then we have two worker threads that continuously pop job pairs from the queue.

00:09:03.399 These worker threads compute the result and fulfill the Future with that result, finally printing the result whenever it gets computed. This allows us to construct a simple async helper that executes Ruby blocks or blocks of code in a non-blocking fashion.

00:09:32.680 This method returns immediately and gives back a Future instance. You can then hold on to that Future and call ‘value’, which will block your thread until the job is computed.

00:10:00.980 Here, we create an array of five jobs, each multiplying its index by two. The call returns almost instantly, and this check ensures that the array contains the Futures.

00:10:21.070 At the end of this process, the thread will block until all the values are computed, as we can see when we run this example. Note that we placed a sleep in the implementation to simulate computing time.
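The background processing just described might look roughly like the sketch below. It is a reconstruction from the spoken description, not the code from the slides: the names QUEUE, WORKERS, and async are assumptions, the timestamp helper is omitted, and the small Future class is a stand-in so that the example runs on its own.

```ruby
# Stand-in Future so the example is self-contained.
class Future
  PENDING = Object.new.freeze

  def initialize
    @lock      = Mutex.new
    @condition = ConditionVariable.new
    @value     = PENDING
  end

  def fulfill(value)
    @lock.synchronize do
      raise 'already fulfilled' unless @value.equal?(PENDING)
      @value = value
      @condition.broadcast
    end
    self
  end

  def value
    @lock.synchronize do
      @condition.wait(@lock) while @value.equal?(PENDING)
      @value
    end
  end
end

QUEUE = Queue.new             # shared queue of [future, job] pairs

# Two workers that keep popping pairs and fulfilling the futures.
WORKERS = Array.new(2) do
  Thread.new do
    while (pair = QUEUE.pop)
      future, job = pair
      future.fulfill(job.call)
    end
  end
end

# Schedules the block and returns a Future immediately.
def async(&job)
  future = Future.new
  QUEUE.push([future, job])
  future
end

futures = Array.new(5) { |i| async { sleep 0.01; i * 2 } }  # sleep simulates work
results = futures.map(&:value)                             # blocks until all are done
puts results.inspect
```

The call to async never waits for the job; only value blocks, and only when the result is not there yet.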

00:10:38.560 Now let's take a look at the first implementation of Future. For this, we’ll use tools that are already present in the Ruby standard library.

00:11:04.730 For example, we use mutexes and condition variables. The mutex essentially acts as a lock, while condition variables allow a thread to block until a certain condition is met.

00:11:30.310 In our case, the condition would be that the Future is fulfilled. We also utilize instance variables to store the current value of the Future. The lock allows you to create a synchronized method, ensuring that only one thread can access that block of code at a time.

00:11:56.310 Additionally, it has the property that when one thread makes changes within this section, another thread entering the same section always sees the changes made by the first thread.

00:12:20.929 With this in place, we can implement the Future class, starting with the complete method, which has to read the current value.

00:12:39.660 We protect that read with the synchronization and check whether the value is still pending or not.

00:13:00.039 In the value method implementation, we again utilize the critical section to ensure the operation is atomic. We only block the thread here if the future has not been completed.

00:13:23.310 As soon as the Future is fulfilled, it wakes up the blocked thread using the broadcast method on the condition.
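A minimal sketch of this first implementation, using only Mutex and ConditionVariable from the standard library, could look as follows. The PENDING sentinel and the error message are assumptions made for the example; the talk's slides may differ in detail.

```ruby
class Future
  PENDING = Object.new.freeze   # sentinel meaning "no value yet"

  def initialize
    @lock      = Mutex.new
    @condition = ConditionVariable.new
    @value     = PENDING
  end

  # Non-blocking check; the read is protected by the lock.
  def complete?
    @lock.synchronize { !@value.equal?(PENDING) }
  end

  # Blocks the calling thread until the future has been fulfilled.
  def value
    @lock.synchronize do
      # The loop guards against spurious wake-ups.
      @condition.wait(@lock) while @value.equal?(PENDING)
      @value
    end
  end

  # Stores the value and wakes every thread blocked in #value.
  def fulfill(value)
    @lock.synchronize do
      raise 'already fulfilled' unless @value.equal?(PENDING)
      @value = value
      @condition.broadcast
    end
    self
  end
end

future = Future.new
waiter = Thread.new { future.value }   # blocks until fulfill is called
future.fulfill(:done)
puts waiter.value.inspect
```

Every method goes through the same lock, so each call pays the synchronization cost even when the value has been available for a long time.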

00:13:41.420 This is why it's essential to have these critical sections. Let’s take a look at how this performs under different implementations, benchmarking complete, value, and fulfill methods.

00:14:18.840 We will simulate five million operations for each micro benchmark. Initially, we’ll focus on the first approach.

00:15:00.330 Next, let’s consider how to enhance the performance of this. Starting with the MRI implementation, we notice that synchronization can be quite costly.
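To get a rough feel for that cost, one can compare a plain read with a read wrapped in a mutex. The snippet below is only an illustrative micro-benchmark, not the benchmark from the talk; the iteration count is an arbitrary assumption and much smaller than the five million operations mentioned earlier.

```ruby
require 'benchmark'

ITERATIONS = 200_000          # arbitrary; the talk used five million

lock  = Mutex.new
value = 42

# Reading a variable directly.
plain_seconds = Benchmark.realtime do
  ITERATIONS.times { value }
end

# The same read, but inside a critical section.
synchronized_seconds = Benchmark.realtime do
  ITERATIONS.times { lock.synchronize { value } }
end

puts format('plain read:        %.4f s', plain_seconds)
puts format('synchronized read: %.4f s', synchronized_seconds)
```

The absolute numbers depend heavily on the runtime and the machine, so they are best read as a relative comparison.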

00:15:20.480 Avoiding this synchronization would be beneficial, especially since MRI employs a Global Virtual Machine Lock (GVL). By examining the source code, we find that the GVL actually functions similarly to Ruby mutexes.

00:15:35.850 This means that when one thread releases the GVL and another acquires it, the acquiring thread always sees all changes made by the first thread. As a result, instance variables effectively behave as volatile.

00:15:54.400 However, this behavior is undocumented, and it's surprising to note that a lot of Ruby code and many libraries depend on it, whether intentionally or not.

00:16:07.640 So let’s take a closer look at the implementation specific to MRI. We still need the mutex and the condition variable, because threads calling value must be able to block and be woken up.

00:16:32.530 In our complete implementation, however, we no longer need to protect the read of this variable. Instead, we can simply read this effectively volatile value and check whether it is still pending.

00:16:57.420 In the value implementation, we can avoid going through the synchronized block by first checking if the value is completed or not.

00:17:20.100 If it is complete, we return immediately. If not, we still have to enter the critical section and check the status again before blocking.

00:17:38.780 In fulfill, however, it's important to understand why we can't simply move that check out of the critical section as we did in the value method.

00:18:00.170 This is because the check guards an exceptional path: fulfilling a Future that is already complete. If we moved it out, every correct call to fulfill would be subjected to a double check.
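The MRI-specific variant described above can be sketched like this. It deliberately relies on the undocumented GVL behavior the talk mentions, so it should be read as an illustration for MRI only; the class name MriFuture and the PENDING sentinel are assumptions made for the example.

```ruby
class MriFuture
  PENDING = Object.new.freeze

  def initialize
    @lock      = Mutex.new
    @condition = ConditionVariable.new
    @value     = PENDING
  end

  # Unsynchronized read: relies on instance variables behaving as
  # volatile under the GVL (undocumented MRI behavior).
  def complete?
    !@value.equal?(PENDING)
  end

  def value
    # Fast path: no lock when the value is already there.
    current = @value
    return current unless current.equal?(PENDING)

    # Slow path: re-check under the lock, then block until fulfilled.
    @lock.synchronize do
      @condition.wait(@lock) while @value.equal?(PENDING)
      @value
    end
  end

  def fulfill(value)
    @lock.synchronize do
      # The check stays inside the lock: it guards the exceptional path,
      # and hoisting it out would make every correct call check twice.
      raise 'already fulfilled' unless @value.equal?(PENDING)
      @value = value
      @condition.broadcast
    end
    self
  end
end

future = MriFuture.new
future.fulfill(21 * 2)
puts future.value             # served by the fast path, no locking
```

Only threads that arrive before the value is set ever touch the mutex; later readers return from the fast path.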

00:18:25.640 Now let’s assess the performance improvement. As we can observe, the timings of this implementation are significantly better than those of the first one.

00:18:48.920 Now it's time to examine how we could improve concurrency specifically for JRuby.

00:19:02.850 We’ll start with the Rubinius implementation, since JRuby+Truffle also implements parts of the Rubinius core API and can use the same code.

00:19:18.860 In these three implementations, the instance variables are not volatile, and method calls are not protected.

00:19:36.260 This means that, looking back at that initial Future implementation, the constructor did not address the possibility of another thread seeing uninitialized instance variables.

00:19:57.820 Therefore, we need to resolve this to guarantee that final instance variables are assigned only once in initializers.

00:20:13.800 By ensuring that they are consistently shared across threads, we can avoid unexpected uninitialized accesses.

00:20:31.350 Let’s take a look at our example regarding Rubinius implementation. Here we again need a lock and condition to manage the slow path.

00:20:47.710 Additionally, we will be storing the value safely because we need to prevent it from appearing uninitialized.

00:21:05.500 To do this, we will use an atomic reference that encapsulates semantics ensuring that every assignment is visible.

00:21:25.990 The complete method reads the value using the appropriate semantics, ensuring that it reflects the most current data.

00:21:44.180 In the value implementation, we follow the same algorithm established previously, checking if the value is already set.

00:22:03.980 If it has been set, we return immediately; if not, we enter the critical section and block the thread until it is fulfilled.

00:22:21.220 In the JRuby implementation, the structure is similar; we simply swap the Rubinius-specific components for their JRuby-specific counterparts.

00:22:51.440 To store the value, we use an atomic reference from the java.util.concurrent.atomic package, and we establish a full memory barrier to ensure that the writes become visible in order.

00:23:14.119 With this memory barrier in place, we can follow through with our entire implementation.

00:23:26.100 The complete method reads the volatile variable, reflecting what the current value is.

00:23:39.630 The remaining flow through the value implementation and fulfillment retains its structure since we're ensuring that we always check conditions correctly.

00:23:51.900 As for the fulfill process in JRuby, it maintains the order established in earlier points as it intelligently replaces Ruby mutex calls with Java's intrinsic synchronization.

00:24:05.130 This allows it to minimize overhead in regard to blocking on fulfill calls, ensuring performance remains steady.
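The overall shape of the atomic-reference variant can be sketched as below. Several things here are assumptions rather than the talk's code: the names AtomicRef and PortableFuture are hypothetical, the JRuby branch (which uses the real java.util.concurrent.atomic.AtomicReference class) has not been run for this write-up, the fallback class is only a mutex-backed stand-in so the sketch runs on other runtimes, and the constructor memory barrier mentioned in the talk is not reproduced.

```ruby
# Pick a holder with volatile read/write semantics for the current runtime.
if RUBY_ENGINE == 'jruby'
  require 'java'
  AtomicRef = java.util.concurrent.atomic.AtomicReference
else
  # Hypothetical stand-in with the same get/set interface.
  class AtomicRef
    def initialize(value = nil)
      @lock  = Mutex.new
      @value = value
    end

    def get
      @lock.synchronize { @value }
    end

    def set(value)
      @lock.synchronize { @value = value }
    end
  end
end

class PortableFuture
  PENDING = Object.new.freeze

  def initialize
    @lock      = Mutex.new
    @condition = ConditionVariable.new
    @ref       = AtomicRef.new(PENDING)   # every write to it is visible
  end

  def complete?
    !@ref.get.equal?(PENDING)
  end

  def value
    current = @ref.get                    # volatile read, fast path
    return current unless current.equal?(PENDING)

    @lock.synchronize do                  # slow path: block until set
      @condition.wait(@lock) while @ref.get.equal?(PENDING)
      @ref.get
    end
  end

  def fulfill(value)
    @lock.synchronize do
      raise 'already fulfilled' unless @ref.get.equal?(PENDING)
      @ref.set(value)
      @condition.broadcast
    end
    self
  end
end

future = PortableFuture.new
future.fulfill(:ready)
puts future.value.inspect
```

The algorithm is the same as in the MRI version; only the place where the value lives changes, which is what makes it possible to swap in a runtime-specific component.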

00:24:22.380 In conclusion, the JRuby implementation has shown that it can handle concurrent code efficiently, improving on the relative performance of the other implementations.

00:24:41.790 Considering all our observed implementations at this point, it becomes clear that each iteration demonstrates distinct variations in performance.

00:25:06.420 We have found that JRuby provides significant speed improvements, which is promising for concurrent execution.

00:25:30.200 To summarize, if you require a concurrent solution, always refer to the concurrent-ruby gem first since it offers many of the tools already implemented.

00:25:45.040 If you do not find your requirements there, please reach out to us; we can assist with incorporating the functionality you need.

00:26:01.550 If you are writing a new concurrent abstraction, build it on the synchronization layer we provide, as it gives you portability across all Ruby implementations.

00:26:21.310 We will keep this maintained and evolve it along with future releases of Ruby and JRuby.

00:26:39.580 You can learn more about Concurrent Ruby by following the provided links, or by reaching out to me directly through social media.

00:27:02.600 That concludes my presentation. Thank you for your attention, and I look forward to your questions.

00:32:17.680 As for your question about the definition of volatile in this context, it pertains to ensuring visibility during concurrency.

00:32:22.910 In Java, it guarantees that once a volatile variable is written, any thread that subsequently reads it sees all changes made before that write.

00:32:29.060 The distinctions made by other programming languages, like Clojure or Elixir, indeed take different approaches, often isolating state or imposing various guarantees.

00:32:50.150 Thank you once again. If there are further questions, please seek me out.