00:00:14.440
Good evening everyone. Thank you for coming. My name is Petr Chalupa.
00:00:20.000
I work on Concurrent Ruby, which is basically a collection of high-level and low-level concurrency abstractions. It is unopinionated, meaning we are not trying to force any particular solution on you.
00:00:33.830
You can choose the best solution that works for your specific problem. We support MRI, Ruby 19, and are currently working on support for Rubinius, as well. This gem is already used in several projects, including Rails 5, Sucker Punch, and Line Flow, and we have just released version 1.0.
00:00:47.570
Among the high-level abstractions, we have chainable futures, Go-inspired channels, actors, Clojure-inspired agents, transactional memory, and atomic references. We also have low-level synchronization primitives like countdown latches, semaphores, cyclic barriers, and more, which make it easier to build other kinds of abstractions.
00:01:09.049
The second project I want to introduce is JRuby+Truffle. JRuby+Truffle is an experimental backend for JRuby, and it lives in the same repository as JRuby, which means it is open-source.
00:01:17.720
Essentially, it is an Abstract Syntax Tree (AST) interpreter with a key feature: it is a self-optimizing interpreter. This means that as the interpreter executes code, it profiles which branches were taken and what types were involved. Consequently, the nodes can specialize and, after some time, we can assume that the tree has stabilized.
00:01:41.980
Once we have that stable tree of nodes, we can feed it to the Graal compiler, which can produce highly optimized machine code. If some of the speculative assumptions fail, we can invalidate the compiled code and revert to the interpreter, which specializes the nodes again so that they can be compiled once more.
00:02:02.810
The gist of it is that JRuby+Truffle aims to be quite fast, and we will review some results later. It supports the whole Ruby language without restrictions and even tackles trickier aspects like debugging and ObjectSpace. These features are always on and do not introduce overhead if you are not using them.
00:02:17.720
We have approximately 90% compatibility based on the Ruby specs, and as I mentioned, it lives in the same repository as JRuby and is included by default. So if you pass the -X+T option on the command line, you can already try it on micro-benchmarks or very small projects.
00:02:36.110
Now, let’s move to the main topic of this talk, which is implementing a concurrent abstraction. To begin with, I’d like to discuss why concurrency can be challenging. The main reason is that processors and compilers are allowed to reorder how our code is executed, as long as the result looks the same from the point of view of the thread executing it. You might think it would be beneficial to prohibit this behavior, but that’s actually undesirable, because this freedom is what allows compilers and processors to perform many optimizations.
00:04:02.150
However, the consequence of this reordering is that another thread may see strange or unexpected values. Since one thread may perform its memory accesses in a different order than the one we wrote, other threads reading the same memory can observe states we never intended.
00:04:35.840
To reason effectively about concurrent code, we must account for all possible valid orders in which our code can be executed. To construct these orders, we need a framework to help us make sense of them, which we refer to as a memory model. I will touch on this more later.
00:05:12.360
Of course, this has a significant impact on implementations. For example, the more heavily a Ruby implementation is optimized, the fewer instructions it needs to execute your Ruby code, so you can observe these effects of reordering more frequently in practice.
00:05:49.949
Let me show you an example of what can happen under these circumstances. Consider a simple class with one field called 'value'. If we create two instances of this class, we have two writes to memory: a first one and a second one. If we read them both later, the compiler is free to optimize how those writes are executed.
00:07:02.960
Because of this optimization, it may switch the order of these writes. If that happens, another thread might assume at first that the instance was fully initialized when, in reality, it was not. This can result in unexpected behavior.
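The hazard described here can be sketched as follows. This is one plausible reading of the example, not the speaker's slide: the `Box` class, the `shared` variable, and the `:not_published` marker are all hypothetical names. Creating and sharing an object involves two writes, one to the field and one to the shared reference, and a reordering implementation may make the second visible before the first.

```ruby
# Hypothetical class with a single field, as described in the talk.
class Box
  attr_reader :value

  def initialize(value)
    @value = value # write 1: the field inside the new object
  end
end

shared = nil

# write 2: publishing the reference to the new object
writer = Thread.new { shared = Box.new(42) }

reader = Thread.new do
  box = shared # may still be nil if the writer has not run yet
  # If writes 1 and 2 were reordered, box could be visible here while
  # box.value is still unset. MRI does not do this in practice, but the
  # language gives no documented guarantee.
  box ? box.value : :not_published
end

observed = reader.value
writer.join
```

Without reordering, `observed` is either `:not_published` or `42`. The problematic third outcome, a visible `Box` whose field is still unset, is what a memory model has to rule out or permit explicitly.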
00:07:35.050
This leads us to the actual implementation part. I will begin with a simple abstraction called ‘Future,’ which essentially serves as a placeholder for the result of a computation that hasn’t yet been completed. It features a very simple API: you can create a new Future, fulfill it with a value, or retrieve that value using a method that blocks if the Future has not yet been fulfilled.
00:08:12.640
Alternatively, you can check if the Future is completed, which is a non-blocking operation.
00:08:39.849
Let's see a simple way this can be applied. With Futures, we can build simple background processing. At the top, there's a simple helper that prints timestamps alongside its outputs. We implement background processing by creating a shared queue for jobs, and then we have two worker threads that continuously pop job pairs from the queue.
00:09:03.399
These worker threads compute the result, fulfill the Future with it, and finally print the result whenever it gets computed. This allows us to construct a simple async helper that executes Ruby blocks in a non-blocking fashion.
00:09:32.680
This method returns immediately; it gives back a Future instance. You can then take that Future and call ‘value’, which will block your thread until the job is computed.
00:10:00.980
Here, we create an array of five jobs, each multiplying its index by two. The call returns almost instantly, and this check ensures that the array contains the Futures.
00:10:21.070
At the end of this process, the thread will block until all the values are computed, as we can see when we run this example. Note that we placed a sleep in the implementation to simulate computing time.
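The background-processing example described above can be sketched roughly like this. The names `log`, `JOBS`, `WORKERS`, and `async` are assumptions, and the small `Future` class is only a stand-in so that the sketch runs on its own; the talk's own implementations are discussed afterwards.

```ruby
# Stand-in Future so the example is self-contained (an assumption, not the
# speaker's code).
class Future
  PENDING = Object.new.freeze

  def initialize
    @lock = Mutex.new
    @condition = ConditionVariable.new
    @value = PENDING
  end

  def fulfill(value)
    @lock.synchronize do
      @value = value
      @condition.broadcast
    end
    self
  end

  def value
    @lock.synchronize do
      @condition.wait(@lock) while @value.equal?(PENDING)
      @value
    end
  end
end

# Helper that prints a timestamp alongside each message.
def log(message)
  puts format('%.3f %s', Time.now.to_f, message)
end

JOBS = Queue.new # shared queue of [future, job] pairs

# Two workers that keep popping pairs, run the job, and fulfill the Future.
WORKERS = Array.new(2) do
  Thread.new do
    loop do
      future, job = JOBS.pop # blocks while the queue is empty
      result = job.call
      future.fulfill(result)
      log "computed #{result}"
    end
  end
end

# Schedules the block and returns its Future immediately.
def async(&job)
  future = Future.new
  JOBS << [future, job]
  future
end

# Five jobs, each multiplying its index by two; sleep simulates work.
futures = Array.new(5) { |i| async { sleep 0.1; i * 2 } }
log "scheduled, all futures: #{futures.all? { |f| f.is_a?(Future) }}"

values = futures.map(&:value) # blocks until every job is done
log "values: #{values.inspect}"
```

The scheduling line is logged almost immediately, while the final line appears only after the workers have finished all five jobs.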
00:10:38.560
Now let's take a look at the first implementation of Future. For this, we’ll use tools that are already present in the Ruby standard library.
00:11:04.730
For example, we use mutexes and condition variables. The mutex essentially acts as a lock, while condition variables allow a thread to block until a certain condition is met.
00:11:30.310
In our case, the condition would be that the Future is fulfilled. We also utilize instance variables to store the current value of the Future. The lock allows you to create a synchronized method, ensuring that only one thread can access that block of code at a time.
00:11:56.310
Additionally, it has the property that when one thread makes changes within this section, another thread entering the same section always sees the changes made by the first thread.
00:12:20.929
With this in place, we can implement the Future class and focus on the complete method. We begin by reading the current value.
00:12:39.660
To do this, we read it inside the synchronized block and check whether it’s still pending or not.
00:13:00.039
In the value method implementation, we again utilize the critical section to ensure the operation is atomic. We only block the thread here if the future has not been completed.
00:13:23.310
As soon as the Future is fulfilled, it wakes up any blocked threads using the broadcast method on the condition.
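A minimal sketch of this first, lock-based Future, using only `Mutex` and `ConditionVariable` from the standard library. The `PENDING` sentinel, the method name `complete?`, and the error raised on a second `fulfill` are assumptions made for illustration; the slides may differ in detail.

```ruby
class Future
  # Sentinel marking "not fulfilled yet", so nil and false stay valid values.
  PENDING = Object.new.freeze

  def initialize
    @lock      = Mutex.new
    @condition = ConditionVariable.new
    @value     = PENDING
  end

  # Non-blocking check; the read is protected by the lock.
  def complete?
    @lock.synchronize { !@value.equal?(PENDING) }
  end

  # Blocks the caller until the Future is fulfilled, then returns the value.
  def value
    @lock.synchronize do
      # The loop re-checks the condition to guard against spurious wake-ups.
      @condition.wait(@lock) while @value.equal?(PENDING)
      @value
    end
  end

  # Stores the value once and wakes up every waiting thread.
  def fulfill(value)
    @lock.synchronize do
      raise 'already fulfilled' unless @value.equal?(PENDING)
      @value = value
      @condition.broadcast
    end
    self
  end
end

future = Future.new
waiter = Thread.new { future.value } # blocks until fulfill is called
future.fulfill(42)
result = waiter.value
```

Every access to `@value` goes through the same lock, which gives both mutual exclusion and the visibility guarantee described earlier, at the cost of a lock acquisition on every call.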
00:13:41.420
This is why it's essential to have these critical sections. Let’s take a look at how this performs under different implementations, benchmarking complete, value, and fulfill methods.
00:14:18.840
We will run five million operations for each micro-benchmark. Initially, we’ll focus on the first approach.
00:15:00.330
Next, let’s consider how to enhance the performance of this. Starting with the MRI implementation, we notice that synchronization can be quite costly.
00:15:20.480
Avoiding this synchronization would be beneficial, especially since MRI employs a Global Virtual Machine Lock (GVL). By examining the source code, we find that the GVL actually functions similarly to Ruby mutexes.
00:15:35.850
This means that when one thread releases the GVL and another acquires it, the second thread will always see all changes made by the first. As a result, instance variables effectively behave as volatile.
00:15:54.400
However, this behavior is undocumented, and it's surprising how much Ruby code and how many libraries depend on it, whether intentionally or not.
00:16:07.640
So let’s take a closer look at the implementation specific to MRI. We'll still need the mutex and the condition variable for blocking, but we no longer need them to protect our reads.
00:16:32.530
In our complete implementation, we need not protect the reading from this variable. Instead, we can simply read this volatile value and verify if it’s pending.
00:16:57.420
In the value implementation, we can avoid going through the synchronized block by first checking if the value is completed or not.
00:17:20.100
If it is complete, we return immediately. If not, we will still have to navigate through the critical section again to check the status.
00:17:38.780
As for fulfill, it's worth understanding why we can't simply move its check out of the critical section as we did with the value method.
00:18:00.170
This is because the check guards an exceptional path, meaning that if we moved it out, all correct calls to fulfill would be subjected to a double check.
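The MRI-specific variant described above might look like the following sketch. It relies on the undocumented GVL behavior mentioned in the talk, so it should not be treated as portable; `PENDING`, `complete?`, and the error on a repeated `fulfill` are again illustrative assumptions.

```ruby
class Future
  PENDING = Object.new.freeze

  def initialize
    @lock      = Mutex.new
    @condition = ConditionVariable.new
    @value     = PENDING
  end

  # Unsynchronized read: assumes instance variables are effectively
  # volatile under MRI's GVL (undocumented behavior).
  def complete?
    !@value.equal?(PENDING)
  end

  def value
    # Fast path: return without taking the lock if already fulfilled.
    current = @value
    return current unless current.equal?(PENDING)

    # Slow path: check again inside the critical section and block if needed.
    @lock.synchronize do
      @condition.wait(@lock) while @value.equal?(PENDING)
      @value
    end
  end

  def fulfill(value)
    @lock.synchronize do
      # The check stays inside the lock: it only fails on the exceptional
      # path, so hoisting it would double-check every correct call.
      raise 'already fulfilled' unless @value.equal?(PENDING)
      @value = value
      @condition.broadcast
    end
    self
  end
end

future = Future.new
before = future.complete?
future.fulfill(:done)
after = future.complete?
fast = future.value # takes the lock-free fast path
```

Compared with the fully synchronized version, only the read side changes: `complete?` and an already-fulfilled `value` skip the mutex entirely, while blocking and fulfilling still go through the critical section.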
00:18:25.640
Now let’s assess the performance improvement. As we can observe, the timings of this implementation are significantly better.
00:18:48.920
Now it's time to examine how we could improve concurrency specifically for JRuby.
00:19:02.850
We’ll employ the Rubinius implementation since JRuby also implements parts of the Ruby core API.
00:19:18.860
In these three implementations, the instance variables are not volatile, and method calls are not protected.
00:19:36.260
This means that if we recall that initial Future implementation, the constructor was not correctly addressing potential issues with uninitialized instance variables.
00:19:57.820
Therefore, we need to resolve this with a guarantee for final instance variables, that is, variables that are assigned only once in the initializer.
00:20:13.800
By ensuring that they are consistently shared across threads, we can avoid unexpected uninitialized accesses.
00:20:31.350
Let’s take a look at our example for the Rubinius implementation. Here we again need a lock and a condition to manage the slow path.
00:20:47.710
Additionally, we will be storing the value safely because we need to prevent it from appearing uninitialized.
00:21:05.500
To do this, we will use an atomic reference, which has volatile semantics, ensuring that every assignment is visible to other threads.
00:21:25.990
The complete method reads the value using the appropriate semantics, ensuring that it reflects the most current data.
00:21:44.180
In the value implementation, we follow the same algorithm established previously, checking if the value is already set.
00:22:03.980
If it has been set, we return immediately; if not, we enter the critical section and block the thread until it is fulfilled.
00:22:21.220
In the JRuby implementation, the format is similar; we simply switch the Ruby-specific components for the JRuby-specific counterparts.
00:22:51.440
To hold the value safely, we use an atomic reference from the java.util.concurrent.atomic package, and we establish a full memory barrier to ensure orderly execution.
00:23:14.119
With this memory barrier in place, we can follow through with our entire implementation.
00:23:26.100
The complete method reads the volatile variable, reflecting what the current value is.
00:23:39.630
The remaining flow through the value implementation and fulfillment retains its structure since we're ensuring that we always check conditions correctly.
00:23:51.900
As for the fulfill process in JRuby, it maintains the order established in earlier points as it intelligently replaces Ruby mutex calls with Java's intrinsic synchronization.
00:24:05.130
This allows it to minimize overhead in regard to blocking on fulfill calls, ensuring performance remains steady.
00:24:22.380
In conclusion, JRuby implementation has shown its capabilities to handle concurrent code efficiently, enhancing the relative performance across all implementations.
00:24:41.790
Considering all our observed implementations at this point, it becomes clear that each iteration demonstrates distinct variations in performance.
00:25:06.420
We have found that JRuby provides significant speed improvements, which is promising for concurrent execution.
00:25:30.200
To summarize, if you require a concurrent solution, always refer to the concurrent-ruby gem first since it offers many of the tools already implemented.
00:25:45.040
If you do not find your requirements there, please reach out to us; we can assist with incorporating the functionality you need.
00:26:01.550
If you are writing a new concurrent abstraction, build it on the layer we provide, as it gives you portability across all Ruby implementations.
00:26:21.310
We will keep this maintained and evolve it along with future releases of Ruby and JRuby.
00:26:39.580
You can learn more about Concurrent Ruby by following the provided links, or by reaching out to me directly through social media.
00:27:02.600
That concludes my presentation. Thank you for your attention, and I look forward to your questions.
00:32:17.680
As for your question about the definition of volatile in this context, it pertains to ensuring visibility during concurrency.
00:32:22.910
In Java, it guarantees that once a volatile variable is written, any thread that subsequently reads it sees that write and all changes made before it.
00:32:29.060
The distinctions made by other programming languages, like Clojure or Elixir, indeed take different approaches, often isolating state or imposing various guarantees.
00:32:50.150
Thank you once again. If there are further questions, please seek me out.