Parallel programming in Ruby3 with Guild

by Koichi Sasada

This presentation by Koichi Sasada at RubyConf 2018 introduces parallel programming in Ruby 3 through a new concurrency abstraction named Guild. The presentation begins with a brief overview of Ruby 2.6's performance improvements, including faster method calls and block passing, before turning to the features planned for Ruby 3. The most notable changes in Ruby 2.6 are the introduction of the MJIT compiler and the transient heap, a memory management mechanism that addresses the speed and fragmentation problems of traditional malloc-based allocation.

Key points discussed include:
- Transient Heap: A memory management technique that speeds up allocation of ephemeral (short-lived) memory, providing about 50% faster allocation and garbage collection for small arrays with four or more elements. Arrays with 0 to 3 elements and large arrays with more than 80 elements see no speedup.
- Guild Overview: Guild is designed to simplify concurrent programming in Ruby by eliminating mutable object sharing, thereby reducing the complexities and risks associated with thread safety. Guild allows multiple Guilds to operate concurrently on different threads, promoting productivity while leveraging multi-core architectures.
- Guild Design: Each Guild can manage its own thread, prohibiting mutable object sharing to mitigate concurrency issues. This is achieved through shareable and non-shareable object distinctions, with immutable objects being easier to manage.
- Communication Between Guilds: Shareable objects are passed between Guilds by sending only a reference, while non-shareable objects are copied or moved. Moving avoids the cost of copying, which matters for large data structures and I/O-heavy operations.
- Demonstration of Performance: The presenter ran demo workloads, such as calculating Fibonacci numbers, showing a speedup of up to 15 to 16 times with Guild compared to single-threaded execution. Another workload that takes about three hours serially was reduced to about 30 minutes using Guild.

In conclusion, Sasada emphasizes ongoing efforts to refine Guild, fix bugs for a stable release, and improve performance ahead of Ruby 3. He welcomes feedback and contributions from the community to develop this promising concurrency model further. Though not covered in detail due to time constraints, the challenges of garbage collection and synchronization were noted as areas needing continued work.

00:00:15.980 Hello, this is the time to talk about parallel programming in Ruby 3 with Guild. My English is not very good, so you can download this presentation; I uploaded the link on Twitter. Today, I want to discuss several items, starting with a brief introduction to Ruby 2.6 and the improvements it brings. I will also highlight some contributions of mine and focus on the main topic: Guild.
00:00:42.750 I proposed the idea of Guild at RubyConf 2016, and I want to share with you the progress we have made. As a programmer, I have changed jobs several times, but my mission remained the same: to improve the performance of Ruby's runtime and garbage collection. Currently, I am a member of the Cookpad team, so please stop by our Cookpad booth during this Ruby conference. Additionally, I am a father, and I was the youngest attendee at Ladies' Code Tokyo.
00:01:04.230 In this presentation, I will first show you various performance improvements introduced in Ruby 2.6. The most significant change is the introduction of the MJIT compiler, but there are also several other optimizations that will benefit your applications. For example, method calls are 1.4 times faster, and block passing is 2.6 times faster than in previous Ruby versions.
00:01:17.510 One of the noteworthy features is the introduction of the transient heap, which is a new memory management mechanism. I won’t delve into the details of Ruby's memory management now, but typically, we use malloc and free functions to allocate and free memory. However, traditional methods have performance issues related to speed and space, primarily due to fragmentation. The transient heap addresses these memory issues.
00:01:43.070 While I won’t provide too many details today, as we only have 40 minutes, I want to convey the concept of the transient heap. It utilizes generational garbage collection techniques. Generally speaking, I can't fully implement a moving memory technique due to the conservative garbage collection we use; however, with some limitations and specific hacks, I introduced this moving technique to maintain compatibility with existing code.
00:02:05.419 The transient heap speeds up allocation for ephemeral memories—those that are not long-lived. If you allocate memory and free it immediately, it's considered ephemeral. Currently, we support transient heap for array objects, struct, and a small number of other hot objects. Strings are a major target for the transient heap, but fully supporting them is a future task.
00:02:30.480 The following illustrates how effective allocation and garbage collection are with the transient heap compared to versions without it. The x-axis displays the number of elements in an array, while the y-axis shows the speed improvements. We observe no performance improvements with arrays of 0 to 3 elements as we use alternative optimized techniques. However, we can see significant performance improvements for 4 or more elements.
00:02:55.759 We see about a 50% speed increase in allocation and deallocation for small objects. For larger arrays containing more than 80 elements, we do not observe speedups because we do not use the transient heap for large data structures. In summary, the transient heap can significantly enhance the performance of Ruby 2.6, which can improve application speed.
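To try this kind of measurement on your own machine, the following is a minimal sketch, not the benchmark used in the talk. The helper name `time_array_allocation` and the chosen sizes are assumptions made for illustration; the sizes simply straddle the thresholds mentioned above (0 to 3, 4 or more, more than 80). Comparing the output across Ruby versions with and without the transient heap is left to the reader.

```ruby
require "benchmark"

# Hypothetical helper (not from the talk): time how long it takes to
# allocate `count` short-lived arrays of `size` elements each.
def time_array_allocation(size, count: 100_000)
  Benchmark.realtime do
    count.times { Array.new(size, 0) }
  end
end

# Sizes picked to straddle the thresholds described in the talk.
sizes = [0, 3, 4, 16, 64, 128]

sizes.each do |size|
  seconds = time_array_allocation(size)
  puts format("%4d elements: %.4f s", size, seconds)
end
```

Absolute timings vary by machine and Ruby build, so only the relative shape across sizes and versions is meaningful.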
00:03:10.890 However, the performance improvements can depend on the specific application, so I encourage you to try it on your own applications. Now, let's return to the main topic: Guild. Guild is a new concurrency abstraction that eliminates the sharing of mutable objects in Ruby 3. The current specifications and implementation details are still a work in progress, and we have only preliminary implementations available.
00:03:40.360 We highly welcome your comments or contributions if you have any interest in concurrent programming. The motivation behind Guild stems from two main concerns: productivity and parallel computing. Many developers struggle with creating thread-safe programs largely due to shared mutable objects, which complicate concurrent programming for Ruby.
00:04:01.810 Ruby is a productive language, and my goal is to maintain that productivity while simplifying concurrent programming. The second motivation involves harnessing multiple cores; modern computers have numerous cores, and we need to utilize them efficiently. As of now, Ruby doesn't adequately support concurrent programming, and my objective is to create a convenient way for developers to take advantage of multiple cores with Guild.
00:04:27.170 I proposed the Guild idea at RubyConf 2016. The concept is straightforward: the challenges of threading in programming primarily arise from the risks associated with sharing mutable objects. Guild addresses this by prohibiting the sharing of mutable objects between Guilds. This elementary concept can significantly improve productivity in Ruby by providing a structured way to manage concurrency.
00:04:55.990 I will now outline the current design of Guild and discuss ongoing topics. The interpreter can manage multiple Guilds, with each Guild operating on its own thread. Threads within the same Guild cannot run in parallel, but threads belonging to different Guilds can. By creating multiple Guilds, we can achieve parallel programming in Ruby.
00:05:17.149 Here is a simple example of creating two Guilds. The syntax is similar to Ruby's standard threading model; you can pass blocks to the Guild methods, allowing each block to execute in parallel within their respective Guild. However, sharing mutable objects across these Guilds is not allowed.
00:05:41.340 Nevertheless, there are specific objects that can be shared between Guilds, which we define as shareable objects. By distinguishing between shareable and non-shareable objects, developers can write programs that use mutable state without worrying about thread safety. Since mutable objects cannot be shared between Guilds, it’s impossible to write a program with data races between Guilds.
00:06:04.539 In conventional concurrent programs, most objects are effectively thread-local and only a few are shared, so those few shared objects are where we must focus to write thread-safe code. Non-shareable objects are the typical mutable objects we encounter, such as strings, arrays, and hashes, and they make up most of the objects in a program.
00:06:23.710 This means that within one Guild, we can work with non-shareable objects without worrying about concurrency. If you use just one Guild, you maintain compatibility with existing Ruby applications. We categorize shareable objects into four types: immutable objects, class and module objects, special mutable objects, and isolated Proc objects.
00:06:44.470 An important assumption in Guild is that shareable objects only refer to shareable objects. If a shareable object points to a non-shareable object, that could lead to concurrency issues. Sharing an immutable object is straightforward since it cannot be mutated, thus eliminating any data race conditions. However, we need to carefully consider what constitutes an immutable object.
00:07:07.510 For instance, if an array is frozen but contains mutable objects, it is not truly immutable, as it allows mutation through these references. We need strategies to ensure that objects intended to be immutable truly remain so, potentially by introducing deeper freeze mechanisms. Other types of objects, like numbers and symbols, are naturally immutable and easier to manage.
00:07:34.120 Class and module objects can also be shared, but they require careful handling because classes and modules are mutable. If we allow references to mutable objects from classes or modules, we need to establish rigorous protocols to manage these objects. I will not go into more detail about this today due to time constraints, but it is a critical consideration in Guild’s design.
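The frozen-array point can be seen in plain Ruby. The sketch below is illustrative only: `deep_freeze` is a hypothetical helper written for this example, not a Ruby core method and not part of any Guild API. It handles only arrays and hashes and does not guard against cyclic structures.

```ruby
# A frozen array whose elements are still mutable strings.
# String.new is used so the strings are mutable regardless of
# frozen-string-literal settings.
shallow = [String.new("alice"), String.new("bob")].freeze
shallow[0] << "!"          # allowed: only the array itself is frozen
puts shallow[0]            # the element was mutated in place

# Hypothetical helper: recursively freeze arrays, hashes, and their contents.
# Assumption: the object graph has no cycles.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each do |key, value|
      deep_freeze(key)
      deep_freeze(value)
    end
  when Array
    obj.each { |element| deep_freeze(element) }
  end
  obj.freeze
end

deep = deep_freeze([String.new("alice"), [String.new("bob")]])

begin
  deep[1][0] << "!"        # every reachable object is frozen now
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```

Only after a deep freeze like this does the array behave as an immutable value in the sense the talk describes.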
00:08:06.060 Regarding special mutable objects, sometimes we need to create structures that allow sharing data, such as shared arrays or hashes. To do so, we need to implement special protocols for accessing these structures efficiently and safely. For example, we may need to use locks or transactional handling to ensure data integrity.
00:08:28.270 However, the overhead of adding such protocols may deter their use in most applications. This is acceptable since we expect that only a few shared objects will be necessary in most concurrency scenarios.
00:08:41.430 The last category, isolated Proc objects, introduces additional complexities. A Proc object can reference local variables of the scope where it was created, which poses a challenge when sharing it across Guilds. To address this, we can implement a Proc#isolate method that creates a duplicate Proc object without references to outer local variables.
00:09:02.320 This method allows data encapsulation and helps maintain the integrity of data within Guilds. As a result, isolated proc objects can be passed between Guilds without exposing their local context. This mechanism allows for easier sharing of code without the risk of data races.
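To see why Proc objects need isolation, consider how closures capture local variables in plain Ruby. This sketch only shows the problem; it does not implement the isolation mechanism from the talk, and `make_counter` is a hypothetical helper for illustration.

```ruby
# Hypothetical helper: returns two procs that close over the same local variable.
def make_counter
  count = 0
  increment = proc { count += 1 }  # captures and mutates `count`
  read      = proc { count }       # captures the same `count`
  [increment, read]
end

increment, read = make_counter
increment.call
increment.call
puts read.call   # both procs see the same mutable state

# By contrast, this proc depends only on its argument and touches no outer locals.
square = proc { |x| x * x }
puts square.call(7)
```

If `increment` and `read` were handed to different Guilds as-is, they would share the mutable variable `count`, which is exactly what the design prohibits. A proc like `square` carries no such hidden state.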
00:09:16.890 Now, I want to share some information regarding Guild and its implementation. Other programming languages employ similar ideas to those in Guild, creating limitations on state sharing. Languages like JavaScript and Erlang have implemented such mechanisms, promoting similar approaches to concurrency.
00:09:37.150 I now want to discuss the design of communication between Guilds for sharing information effectively. The destination is identified by the Guild object itself, much like a process ID, and when sending shareable objects, only the references are transmitted, which keeps sending cheap.
00:09:55.900 If non-shareable objects are involved, we implement copy and move methods. This enables a simple server-client program using Guild communication API. When sending objects, one Guild can copy or move objects while ensuring protocols are followed.
00:10:17.850 In the case of copying, the original objects along with their children are duplicated. Alternatively, moving objects means they can no longer be accessed from the original space, significantly improving the performance of resource-heavy tasks.
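The difference between copy and move semantics can be emulated in plain Ruby. This is a sketch under stated assumptions, not the Guild communication API: `deep_copy` and `MovableBox` are hypothetical names invented here, and the deep copy uses the standard `Marshal` round trip, which works only for marshalable objects.

```ruby
# Copy semantics: duplicate the object together with its children.
# Assumption: the object graph is marshalable (no procs, IO objects, etc.).
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Move semantics (emulated): after the value is taken, the sender's handle
# refuses further access. Hypothetical class for illustration only.
class MovableBox
  class MovedError < StandardError; end

  def initialize(value)
    @value = value
    @moved = false
  end

  def value
    raise MovedError, "object was moved" if @moved
    @value
  end

  # Hand the value over and invalidate this handle. No copy is made.
  def take!
    taken = value
    @moved = true
    @value = nil
    taken
  end
end

original = { "rows" => [[1, 2], [3, 4]] }
copy = deep_copy(original)
copy["rows"][0] << 99            # mutating the copy leaves the original intact
puts original["rows"][0].inspect

box = MovableBox.new([1, 2, 3])
received = box.take!             # the receiver now owns the array
begin
  box.value
rescue MovableBox::MovedError => e
  puts "sender side: #{e.message}"
end
```

Copying costs time and memory proportional to the object graph, while the emulated move hands over the same object without duplicating it, which mirrors the trade-off described above.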
00:10:31.930 This enhances overall efficiency, especially when dealing with I/O objects or large data such as big strings. The idea is to use a master-worker model in Guild, where we can distribute workloads across multiple worker Guilds.
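The master-worker structure can be sketched with Ruby's existing `Thread` and `Queue` classes standing in for Guilds and their channels. This is an assumption-laden illustration of the shape of the program only: the helper name `master_worker_fib` is hypothetical, and because CRuby threads do not run Ruby code in parallel, this version is not expected to show the speedups reported in the talk.

```ruby
# Naive recursive Fibonacci, used as a CPU-bound task.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: a master distributes inputs to workers through a queue
# and collects [input, result] pairs from a second queue.
# Thread and Queue are stand-ins here, not the Guild API.
def master_worker_fib(inputs, workers: 4)
  tasks   = Queue.new
  results = Queue.new

  inputs.each { |n| tasks << n }
  workers.times { tasks << :done }      # one stop marker per worker

  pool = Array.new(workers) do
    Thread.new do
      while (n = tasks.pop) != :done
        results << [n, fib(n)]          # only immutable integers cross the queue
      end
    end
  end
  pool.each(&:join)

  Array.new(inputs.size) { results.pop }.to_h
end

master_worker_fib([10, 15, 20]).sort.each do |n, value|
  puts "fib(#{n}) = #{value}"
end
```

Note that only integers travel between the master and the workers, which matches the design goal of exchanging values that are safe to share.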
00:10:49.560 So far, I've shared preliminary implementations, but they do have bugs. It's essential to refine and improve synchronization among components to ensure seamless operation when running multiple Guild instances.
00:11:05.150 Garbage collection presents an ongoing challenge since the current implementation uses a single object space shared by all Guilds. Fixing bugs in the garbage collection process is necessary before we can offer a stable release.
00:11:17.839 As we work towards the Ruby 3 release, we are dedicating efforts to refine Guild, enhance performance, introduce special protocols for synchronization, and settle the protocols for inter-Guild communication. Continuous improvement is necessary, and there is ample opportunity to optimize performance.
00:11:44.900 The naming of Guild, while meaningful at the onset, is still a work in progress. We're currently evaluating alternatives to ensure its proper representation within the Ruby environment.
00:12:02.439 I have prepared a demonstration involving a machine with 40 virtual CPUs, which means we are using two physical CPUs with hyper-threading enabled. This allows for significant parallel processing where we can simultaneously execute multiple tasks.
00:12:27.960 The demonstration involves calculating Fibonacci numbers using Guild, showcasing the potential speed-ups we can achieve compared to traditional single-threaded execution methods. By utilizing Guild's parallel processing capabilities, we can expect a performance increase by as much as 15 to 16 times.
00:12:50.599 Another workload will focus on multiple tasks using different performance measures. In this case, we can see that our serial execution takes about three hours, but with Guild, we need only 30 minutes, demonstrating a truly significant enhancement.
00:13:14.545 The performance improvement from Guild depends on the workload; for example, calculating small Fibonacci numbers yields a negligible improvement because the overhead dominates.
00:13:32.700 However, as we broaden the scale of the workloads, the opportunities for optimization expand significantly.
00:13:44.650 The last demonstration focuses on computational tasks using a framework similar to the master-worker model. Here, we can assess execution times and performance ratios effectively.
00:14:02.020 In summary, the presentation covered Ruby 2.6 updates, my contributions concerning Guild, and the discussions surrounding its implementation and demonstration. Although we didn't have time for Q&A, I welcome any questions or comments you may have.