00:00:15.980
Hello. Today I will talk about parallel programming in Ruby 3 with Guild. My English is not very good, so you can download this presentation; I posted the link on Twitter. I want to cover several items, starting with a brief introduction to Ruby 2.6 and the improvements it brings. I will also highlight some of my own contributions and then focus on the main topic: Guild.
00:00:42.750
I proposed the idea of Guild at RubyConf 2016, and I want to share with you the progress we have made. As a programmer, I have changed jobs several times, but my mission remained the same: to improve the performance of Ruby's runtime and garbage collection. Currently, I am a member of the Cookpad team, so please stop by our Cookpad booth during this Ruby conference. Additionally, I am a father, and I was the youngest attendee at Ladies' Code Tokyo.
00:01:04.230
First, I will show you various performance improvements introduced in Ruby 2.6. The most significant change is the introduction of the MJIT compiler, but there are also several other optimizations that will benefit your applications. For example, method calls are 1.4 times faster, and block passing is 2.6 times faster than in previous Ruby versions.
00:01:17.510
One of the noteworthy features is the introduction of the transient heap, which is a new memory management mechanism. I won’t delve into the details of Ruby's memory management now, but typically, we use malloc and free functions to allocate and free memory. However, traditional methods have performance issues related to speed and space, primarily due to fragmentation. The transient heap addresses these memory issues.
00:01:43.070
While I won’t provide too many details today, as we only have 40 minutes, I want to convey the concept of the transient heap. It utilizes generational garbage collection techniques. Generally speaking, I can't fully implement a moving memory technique due to the conservative garbage collection we use; however, with some limitations and specific hacks, I introduced this moving technique to maintain compatibility with existing code.
00:02:05.419
The transient heap speeds up allocation of ephemeral memory, that is, memory that is not long-lived. If you allocate memory and free it almost immediately, it is considered ephemeral. Currently, we support the transient heap for arrays, structs, and a small number of other frequently allocated object types. Strings are a major target for the transient heap, but fully supporting them is a future task.
00:02:30.480
The following illustrates how effective allocation and garbage collection are with the transient heap compared to versions without it. The x-axis displays the number of elements in an array, while the y-axis shows the speed improvements. We observe no performance improvements with arrays of 0 to 3 elements as we use alternative optimized techniques. However, we can see significant performance improvements for 4 or more elements.
00:02:55.759
We see about a 50% speed increase in allocation and deallocation for small objects. For larger objects containing more than 80 elements, we do not observe speedups because we do not use the transient heap for large data structures. In summary, the transient heap can significantly improve the performance of Ruby 2.6, which in turn can make your applications faster.
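A measurement of this kind can be sketched in a few lines of Ruby. This is only an illustrative micro-benchmark, not the one behind the chart in the talk; the helper names `allocate_short_lived` and `measure_allocation` and the chosen sizes are assumptions made for the example.

```ruby
# Allocate `count` arrays of `size` elements and drop each one immediately,
# so every array is short-lived (ephemeral) garbage.
def allocate_short_lived(size, count)
  count.times { Array.new(size) { |i| i } }
end

# Returns a hash of { array size => elapsed seconds } using a monotonic clock.
def measure_allocation(sizes, count: 50_000)
  sizes.to_h do |size|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    allocate_short_lived(size, count)
    [size, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end
end

if __FILE__ == $PROGRAM_NAME
  # Sizes chosen around the thresholds mentioned in the talk (3 and 80 elements).
  measure_allocation([0, 3, 4, 16, 80, 200]).each do |size, seconds|
    puts format("%4d elements: %.4f s", size, seconds)
  end
end
```

Running the same script under two Ruby builds, one with and one without the transient heap, would give the kind of comparison described above. Absolute numbers depend heavily on the machine and the build.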
00:03:10.890
However, the performance improvements depend on the specific application, so I encourage you to try it on your own applications. Now, let's return to the main topic: Guild. Guild is a new concurrency abstraction for Ruby 3 that prohibits sharing mutable objects. The specification and the implementation are still a work in progress, and we have only a preliminary implementation available.
00:03:40.360
We highly welcome your comments or contributions if you have any interest in concurrent programming. The motivation behind Guild stems from two main concerns: productivity and parallel computing. Many developers struggle with creating thread-safe programs largely due to shared mutable objects, which complicate concurrent programming for Ruby.
00:04:01.810
Ruby is a productive language, and my goal is to maintain that productivity while simplifying concurrent programming. The second motivation involves harnessing multiple cores; modern computers have numerous cores, and we need to utilize them efficiently. As of now, Ruby doesn't adequately support parallel programming, and my objective is to create a convenient way for developers to take advantage of multiple cores with Guild.
00:04:27.170
I proposed the Guild idea at RubyConf 2016. The concept is straightforward: the challenges of threading in programming primarily arise from the risks associated with sharing mutable objects. Guild addresses this by prohibiting the sharing of mutable objects between Guilds. This elementary concept can significantly improve productivity in Ruby by providing a structured way to manage concurrency.
00:04:55.990
I will now outline the current design of Guild and discuss ongoing topics. The interpreter can manage multiple Guilds, and each Guild has at least one thread. Threads in the same Guild cannot run in parallel, but threads belonging to different Guilds can. By creating multiple Guilds, we can achieve parallel programming in Ruby.
00:05:17.149
Here is a simple example of creating two Guilds. The syntax is similar to Ruby's standard threading model: you pass a block when creating a Guild, and each block runs in parallel in its own Guild. However, sharing mutable objects across these Guilds is not allowed.
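Since Guild is not part of any released Ruby, the shape of such a program can only be sketched with a stand-in. The example below uses `Thread`, which the talk says has similar syntax; in the proposal, a Guild constructor (name assumed here) would take the place of `Thread.new`, and the two blocks would run in parallel rather than merely concurrently.

```ruby
# Thread is a stand-in for Guild in this sketch.
workers = 2.times.map do |i|
  Thread.new(i) do |id|
    # Each block builds and mutates only its own local objects,
    # which is the style Guild would enforce.
    local = []
    3.times { |n| local << "worker #{id}: step #{n}" }
    local
  end
end

# Wait for both blocks to finish and collect their return values.
results = workers.map(&:value)
results.each { |lines| puts lines }
```

With threads nothing stops a block from reaching into another block's data; the point of Guild is that such access to mutable objects would be rejected.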
00:05:41.340
Nevertheless, there are specific objects that can be shared between Guilds, which we define as shareable objects. By distinguishing between shareable and non-shareable objects, developers can write programs that use mutable state without worrying about thread safety. Since mutable objects cannot be shared between Guilds, it is impossible to write a program with data races between Guilds.
00:06:04.539
In conventional concurrent programs, most objects are thread-local and only a few are shared, so we can concentrate on those few shared objects to write thread-safe programs. Non-shareable objects make up most of the objects we normally use, such as strings, arrays, and hashes, all of which are mutable.
00:06:23.710
This means that within one Guild, we can work with non-shareable objects without worrying about concurrency. If you use just one Guild, you maintain compatibility with existing Ruby applications. We categorize shareable objects into four types: immutable objects, class and module objects, special mutable objects, and isolated Proc objects.
00:06:44.470
An important assumption in Guild is that shareable objects only refer to shareable objects. If a shareable object points to a non-shareable object, that could lead to concurrency issues. Sharing an immutable object is straightforward since it cannot be mutated, thus eliminating any data race conditions. However, we need to carefully consider what constitutes an immutable object.
00:07:07.510
For instance, if an array is frozen but contains mutable objects, it is not truly immutable, because those elements can still be mutated through the array's references. We need a way to ensure that objects intended to be immutable truly are, potentially by introducing a deep freeze mechanism. Other types of objects, like numbers and symbols, are naturally immutable and easier to manage.
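The difference between a shallow and a deep freeze can be shown in plain Ruby. The `deep_freeze` helper below is a hypothetical illustration written for this example, not the mechanism Guild will use, and it only walks arrays and hashes (no instance variables, no cyclic structures).

```ruby
# Hypothetical helper: freeze an object and everything reachable
# through Array elements and Hash keys/values.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each { |k, v| deep_freeze(k); deep_freeze(v) }
  when Array
    obj.each { |e| deep_freeze(e) }
  end
  obj.freeze
end

# A frozen array whose elements are still mutable strings.
shallow = [String.new("a"), String.new("b")].freeze
shallow[0] << "!"   # allowed: this mutates the element, not the array itself

# After a deep freeze, the same kind of mutation is rejected.
deep = deep_freeze([String.new("a"), { key: [String.new("b")] }])
begin
  deep[0] << "!"
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```

Only the deeply frozen structure satisfies the rule that a shareable object refers only to shareable objects.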
00:07:34.120
Class and module objects can also be shared, but they require careful handling because classes and modules are themselves mutable. If we allow references to mutable objects from classes or modules, we need to establish rigorous protocols to manage these objects. I will not go into more detail about this today due to time constraints, but it is a critical consideration in Guild’s design.
00:08:06.060
Regarding special mutable objects, sometimes we need to create structures that allow sharing data, such as shared arrays or hashes. To do so, we need to implement special protocols for accessing these structures efficiently and safely. For example, we may need to use locks or transactional handling to ensure data integrity.
00:08:28.270
However, the overhead of adding such protocols may deter their use in most applications. This is acceptable since we expect that only a few shared objects will be necessary in most concurrency scenarios.
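As a rough picture of what a lock-based protocol looks like, here is a small shared counter guarded by a `Mutex`. The `SharedCounter` class is a hypothetical example and uses threads, since Guild's special mutable objects are not specified yet; it only illustrates that every access goes through a synchronized method, which is where the overhead comes from.

```ruby
# Hypothetical shared structure: all reads and writes take the lock.
class SharedCounter
  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @value += 1 }
  end

  def value
    @lock.synchronize { @value }
  end
end

counter = SharedCounter.new
threads = 4.times.map do
  Thread.new { 1_000.times { counter.increment } }
end
threads.each(&:join)
puts counter.value
```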
00:08:41.430
The last category, isolated proc objects, introduces additional complexities. These proc objects can reference local variables, which pose a challenge when sharing across Guilds. To address this, we can implement a proc isolate method that creates a duplicate proc object without local references.
00:09:02.320
This method allows data encapsulation and helps maintain the integrity of data within Guilds. As a result, isolated proc objects can be passed between Guilds without exposing their local context. This mechanism allows for easier sharing of code without the risk of data races.
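The problem an isolate method has to solve is visible in ordinary Ruby: a proc keeps access to the local variables of the scope where it was created. The sketch below shows only that capturing behavior; the isolate method itself is part of the proposal and is not called here.

```ruby
counter = 0
increment = proc { counter += 1 }  # closes over the local variable `counter`

increment.call
increment.call
# `counter` was changed through the proc, so handing this proc to another
# Guild would effectively share mutable state.

# The proc's binding exposes the locals it can reach.
captured = increment.binding.local_variables

# A proc that uses only its arguments does not depend on outer locals.
# An isolated proc, as described in the talk, would be restricted to this style.
double = proc { |x| x * 2 }
puts double.call(21)
```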
00:09:16.890
Now, I want to share some information regarding Guild and its implementation. Other programming languages employ similar ideas to those in Guild, creating limitations on state sharing. Languages like JavaScript and Erlang have implemented such mechanisms, promoting similar approaches to concurrency.
00:09:37.150
I now want to discuss the design of communication between Guilds. The destination is identified by the Guild object itself, much like a process identifier, and when sending a shareable object, only the reference is transmitted, so nothing needs to be copied.
00:09:55.900
For non-shareable objects, we provide two ways to send them: copy and move. With these, you can write a simple server-client program using the Guild communication API, in which one Guild copies or moves objects to another.
00:10:17.850
In the case of copying, the original object is duplicated along with all of its children. Moving, in contrast, transfers the object to the destination Guild without copying; the sending Guild can no longer access it, which makes sending resource-heavy objects much cheaper.
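The two semantics can be emulated in plain Ruby to make the difference concrete. In this sketch, `deep_copy` uses `Marshal` to duplicate an object with its children, and the `Slot` class is a hypothetical holder that imitates a move by refusing access after the object has been handed over. Neither is the Guild API.

```ruby
# Copy: duplicate the object and everything it refers to.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Hypothetical holder that emulates move semantics.
class Slot
  def initialize(obj)
    @obj = obj
    @moved = false
  end

  def get
    raise "object was moved to another owner" if @moved
    @obj
  end

  # Hand the object over without copying; this slot becomes unusable.
  def move
    obj = get
    @obj = nil
    @moved = true
    obj
  end
end

original = { name: "job", payload: [1, 2, 3] }
copied = deep_copy(original)
copied[:payload] << 4            # does not affect the original's payload

slot = Slot.new(String.new("large buffer"))
received = slot.move             # the receiver now owns the very same object
begin
  slot.get
rescue RuntimeError => e
  puts "sender lost access: #{e.message}"
end
```

Copying costs time and memory proportional to the object graph, while the move only transfers ownership, which is why it suits large objects.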
00:10:31.930
This improves overall efficiency, especially when dealing with I/O objects or large data such as long strings. The idea is to use a master-worker model, where the work is spread across multiple Guilds.
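A master-worker structure of this kind might look like the following. `Thread` and `Queue` are stand-ins for Guilds and inter-Guild communication, and `run_master_worker` is a name made up for this sketch; under the current interpreter lock the workers here run concurrently but not in parallel.

```ruby
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Master sends inputs to workers through one queue and
# collects [input, result] pairs from another.
def run_master_worker(inputs, worker_count: 4)
  tasks = Queue.new
  results = Queue.new

  workers = worker_count.times.map do
    Thread.new do
      # Each worker receives a task, computes, and sends the result back.
      while (n = tasks.pop)
        results << [n, fib(n)]
      end
    end
  end

  inputs.each { |n| tasks << n }        # master distributes the work
  worker_count.times { tasks << nil }   # one stop signal per worker
  workers.each(&:join)

  Array.new(inputs.size) { results.pop }.sort
end

p run_master_worker([10, 15, 20])
```

Only small integers cross the queues in this example, which matches the idea that communication between workers should carry data that is cheap to send.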
00:10:49.560
So far, I've shared preliminary implementations, but they do have bugs. It's essential to refine and improve synchronization among components to ensure seamless operation when running multiple Guild instances.
00:11:05.150
Garbage collection is an ongoing challenge because all Guilds share a single object space. We need to fix the bugs in the garbage collection process before offering a stable release.
00:11:17.839
As we work towards the Ruby 3 release, we are dedicating efforts to refine Guild, enhance performance, introduce special protocols for synchronization, and settle the protocols for inter-Guild communication. Continuous improvement is necessary, and there is ample opportunity to optimize performance.
00:11:44.900
The naming of Guild, while meaningful at the onset, is still a work in progress. We're currently evaluating alternatives to ensure its proper representation within the Ruby environment.
00:12:02.439
I have prepared a demonstration involving a machine with 40 virtual CPUs, which means we are using two physical CPUs with hyper-threading enabled. This allows for significant parallel processing where we can simultaneously execute multiple tasks.
00:12:27.960
The demonstration involves calculating Fibonacci numbers using Guild, showcasing the potential speed-ups we can achieve compared to traditional single-threaded execution methods. By utilizing Guild's parallel processing capabilities, we can expect a performance increase by as much as 15 to 16 times.
00:12:50.599
Another workload will focus on multiple tasks using different performance measures. In this case, we can see that our serial execution takes about three hours, but with Guild, we need only 30 minutes, demonstrating a truly significant enhancement.
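The figures quoted for the two demonstrations can be put on a common scale with a little arithmetic. The helper names below are made up for this sketch, and the efficiency figure assumes all 40 virtual CPUs were available to the Fibonacci run, taking 15 times as the lower end of the quoted speedup.

```ruby
# Speedup: how many times faster the parallel run is than the serial run.
def speedup(serial_seconds, parallel_seconds)
  serial_seconds.to_f / parallel_seconds
end

# Parallel efficiency: speedup divided by the number of CPUs used.
def efficiency(speedup_factor, cpus)
  speedup_factor / cpus
end

# Second workload: about three hours serially, about 30 minutes with Guild.
second_workload = speedup(3 * 3600, 30 * 60)

# Fibonacci demo: roughly 15x on a machine with 40 virtual CPUs (assumed all used).
fib_efficiency = efficiency(15.0, 40)

puts "second workload speedup: #{second_workload}x"
puts "fib efficiency: #{(fib_efficiency * 100).round(1)}%"
```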
00:13:14.545
How much improvement we get from Guild depends on the workload; for example, calculating a smaller set of Fibonacci numbers shows negligible improvement because the overhead dominates.
00:13:32.700
However, as we broaden the scale of the workloads, the opportunities for optimization expand significantly.
00:13:44.650
The last demonstration focuses on computational tasks using a framework similar to the master-worker model. Here, we can assess execution times and performance ratios effectively.
00:14:02.020
In summary, this presentation covered the Ruby 2.6 updates, my contributions, and Guild: its design, the discussion around its implementation, and the demonstrations. Although we didn't have time for Q&A, I welcome any questions or comments you may have.