Memory Management

Ractor Enhancements, 2024

by Koichi Sasada

In this video presented at RubyKaigi 2024, Koichi Sasada discusses enhancements and developments in Ractor, a Ruby feature designed to enable parallel computing while avoiding the bugs associated with traditional threading. The talk focuses on three major aspects: supporting the 'require' method in child Ractors, supporting the timeout feature, and performance issues related to memory management. The key points are:

  • Introducing Ractor: Ractor was introduced in Ruby 3.0 to enable safe parallel computing. It separates objects into shareable and unshareable categories and minimizes object sharing to prevent critical bugs.
  • Limitations: The isolation rules currently prevent child Ractors from using important features such as 'require' and 'timeout', which hurts usability.
  • Ractor#interrupt_exec: To lift the 'require' limitation, Sasada introduces a new method, 'Ractor#interrupt_exec', which lets a child Ractor asynchronously execute an expression on the main Ractor. Like signal handling, this interrupt-based approach carries some risks.
  • Timeout Feature: A dedicated timeout monitor thread for each Ractor would be easy to build but does not scale to thousands of Ractors, so Sasada proposes managing timeouts with the native timer thread instead.
  • Performance Challenges: M:N threads speed up message passing among many Ractors by 10 to 70 times, depending on whether garbage collection is enabled. However, more Ractors also lead to more frequent garbage collection, which makes memory management a key challenge.

In conclusion, Sasada calls for community engagement and support to implement these features and improvements in Ractor, with the aim of strengthening Ruby's support for parallel programming. The session reflects ongoing work to refine memory management and improve the programming environment for Ruby developers.

00:00:07.520 Hi, good afternoon. This is the last talk of the first day of RubyKaigi.
00:00:14.960 Today, I want to present this year's progress report on Ractor.
00:00:21.119 I'm Koichi Sasada from Stores.
00:00:27.359 I will discuss improvements to, and implementations of, significant features for Ractors.
00:00:34.160 These features include require and timeout. I will also present a survey analyzing memory management in Ractor.
00:00:41.200 I'll highlight some potential future pull requests as well.
00:00:49.079 Let me introduce myself again. My name is Koichi Sasada, and I work at Stores.
00:00:57.879 I am very happy to be here this year. I am the author of the YARV virtual machine, garbage collection, the Ractor system, and the thread scheduler.
00:01:06.320 Furthermore, I am the director of the Ruby Association.
00:01:13.600 So, let's start the talk. Ractor was introduced in Ruby 3.0 and is designed to enable parallel computing in Ruby.
00:01:22.560 Ruby has threads, but threads do not run in parallel on MRI (Matz's Ruby Interpreter) because parallel computing with threads can introduce critical bugs.
00:01:30.560 To prevent such issues, we designed Ractor.
00:01:35.759 Ractor supports robust concurrent programming.
00:01:42.720 It is robust because objects are not shared, or are shared only in very limited ways, which enables bug-free parallel computing.
00:01:52.079 However, limiting object sharing between Ractors requires introducing strong limitations to the Ruby language.
00:01:57.840 We separate all objects into unshareable objects and shareable objects. Most Ruby objects, such as strings, arrays, and instances of user-defined classes, are unshareable.
00:02:13.959 We also have some special shareable objects, such as immutable objects, classes and modules, and Ractor objects themselves.
00:02:22.840 Based on this separation, shareable objects can be shared between Ractors, while unshareable ones cannot.
00:02:31.000 Although this is a simple rule, maintaining it requires many strict regulations.
00:02:37.879 For example, constants that refer to unshareable objects are restricted.
00:02:43.560 A common Ruby example is 'C = large_string', but since String objects are unshareable, this is a problem.
00:02:49.440 Only the main Ractor can access the constant C.
00:02:53.280 Other, non-main Ractors, which we call child Ractors, cannot access this constant.
00:03:06.760 The same restriction applies to global variables: child Ractors cannot access them either.
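These rules can be observed directly. The following is a minimal sketch that assumes CRuby 3.0 or later. The constant and variable names are illustrative. The small 'ractor_result' helper exists only because newer Ruby versions replaced 'Ractor#take' with 'Ractor#value'.

```ruby
# Observing the Ractor isolation rules (assumes CRuby 3.0 or later).
LARGE  = "x" * 1_000     # mutable String: unshareable
FROZEN = "hello".freeze  # frozen String: shareable
$counter = 0             # global variable

# Newer Rubies replaced Ractor#take with Ractor#value; accept either.
def ractor_result(r)
  r.respond_to?(:value) ? r.value : r.take
end

constant_r = Ractor.new do
  LARGE.size             # unshareable constant: not allowed in a child Ractor
rescue Ractor::IsolationError => e
  e.class
end

frozen_r = Ractor.new { FROZEN.upcase } # shareable constant: allowed

global_r = Ractor.new do
  $counter += 1          # global variable: not allowed in a child Ractor
rescue Ractor::IsolationError => e
  e.class
end

results = {
  constant: ractor_result(constant_r), # → Ractor::IsolationError
  frozen:   ractor_result(frozen_r),   # → "HELLO"
  global:   ractor_result(global_r)    # → Ractor::IsolationError
}
p results
p LARGE.size # the main Ractor can still read it → 1000
```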
00:03:13.680 Due to these strict rules, child Ractors lack many important features, such as require and timeout.
00:03:19.560 We also observe critical performance degradation in memory management, and possibly other issues.
00:03:27.479 Today, I will show you how we can enable these important features in Ractor.
00:03:35.560 First, let's discuss the require issue.
00:03:42.600 Currently, child Ractors cannot require any features, because require accesses global state.
00:03:55.800 Much of what require touches is governed by the rules that limit sharing.
00:04:00.680 As a result, we have prohibited require from child Ractors.
00:04:06.960 However, in many situations we need to require features from child Ractors. For instance, some methods depend on features that are required lazily.
00:04:21.200 Take the pp method, for example: it requires the pp library on its first call.
00:04:30.680 Unfortunately, due to this limitation, child Ractors cannot call pp.
00:04:39.880 This limitation affects the usability of our programming environment.
00:04:45.680 Additionally, autoload needs to require a feature when a constant is first accessed.
00:04:53.280 If we aim to support autoload, we need to allow requires from child Ractors.
00:05:06.960 So, the conclusion is that we need to support require from child Ractors, while ensuring that every require call runs on the main Ractor.
00:05:17.120 To enable the require method in child Ractors, we will introduce a new method, 'Ractor#interrupt_exec'.
00:05:21.200 This method invokes an expression on the main Ractor.
00:05:26.360 For example, calling it on the main Ractor executes the provided block there.
00:05:35.200 The method interrupts the main Ractor and runs the block, which in this example assigns to a global variable.
00:05:41.360 It processes the expression asynchronously.
00:05:48.079 This means that it does not wait for the result of the block.
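The asynchronous, non-waiting behavior described above can be modeled with plain threads. This is a sketch under stated assumptions, not the real implementation. A worker thread stands in for the main Ractor. 'MainRactorModel' and its 'interrupt_exec' are hypothetical names. The model also does not interrupt a running thread, so it sidesteps the risks discussed next.

```ruby
# Thread-based model of an asynchronous, interrupt-style exec (hypothetical
# names). A worker thread stands in for the main Ractor; real Ractors cannot
# pass arbitrary blocks around like this.
class MainRactorModel
  def initialize
    @interrupts = Thread::Queue.new
    @worker = Thread.new do
      while (job = @interrupts.pop) # pop returns nil once the queue is closed
        job.call
      end
    end
  end

  # Queue the block for the "main" side and return at once,
  # without waiting for the block's result.
  def interrupt_exec(&block)
    @interrupts << block
    nil
  end

  def shutdown
    @interrupts.close
    @worker.join
  end
end

main = MainRactorModel.new
gate = Thread::Queue.new
events = Thread::Queue.new

main.interrupt_exec do
  gate.pop                   # wait until the caller has moved on
  $assigned_on_main = :done  # e.g. an assignment to a global variable
  events << :block_ran
end
events << :caller_continued  # reached before the block finishes
gate << :go
main.shutdown

order = [events.pop, events.pop]
p order # → [:caller_continued, :block_ran]
```

If 'interrupt_exec' waited for the block, this program would deadlock. The block waits on 'gate', which the caller only fills after 'interrupt_exec' returns.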
00:06:02.960 This method is a powerful feature, enabling various systems to be built around it.
00:06:10.520 However, this method also carries risks, much like trap handlers and signals.
00:06:15.680 An interrupt can disrupt blocking calls, such as a read method waiting on the network.
00:06:22.400 When an interrupt occurs, it might wake up the read method.
00:06:31.000 The 'Ractor#interrupt_exec' method can cause similar disruptions.
00:06:40.160 This figure illustrates how the interaction between Ractors works.
00:06:47.520 First, the child Ractor calls this method, which interrupts the main Ractor.
00:06:54.720 The main Ractor runs the expression, and the child does not wait for its result.
00:07:01.680 After calling the method, the child continues with the rest of its logic.
00:07:05.680 Using 'Ractor#interrupt_exec', we can implement a Ractor require method.
00:07:15.120 The child Ractor calls 'Ractor#interrupt_exec' to create a new thread on the main Ractor, and that thread performs the require.
00:07:21.200 The child Ractor needs to wait for the result of this require.
00:07:27.040 Running the require in a new thread, rather than in the interrupted one, avoids potential deadlocks.
00:07:33.680 The diagram shows this require process.
00:07:40.000 The child Ractor calls the method, the require runs on the main Ractor, and the child waits for the result.
00:07:48.520 Most of the time, the require succeeds and returns true or false.
00:07:55.600 However, it sometimes raises a LoadError or another exception, which must be passed back to the child Ractor.
00:08:04.480 Thus, we need to handle the various types of errors that may occur.
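The require flow described above can be sketched with threads standing in for Ractors. All names here ('MainSide', 'child_require') are hypothetical. The sketch only illustrates the steps described: an asynchronous request to the main side, a fresh thread there that calls 'require', and a blocking wait in the caller. The outcome is sent back and re-raised if it is an exception.

```ruby
# Thread-based model of require delegated to the main side (hypothetical names).
# A worker thread stands in for the main Ractor and a second thread for a child.
class MainSide
  def initialize
    @jobs = Thread::Queue.new
    @worker = Thread.new do
      while (job = @jobs.pop)
        job.call
      end
    end
  end

  def interrupt_exec(&block) # asynchronous: does not wait for the block
    @jobs << block
    nil
  end

  def shutdown
    @jobs.close
    @worker.join
  end
end

# Run require on the main side in a fresh thread, then wait for the outcome.
# Exceptions such as LoadError are sent back and re-raised in the caller.
def child_require(main_side, feature)
  reply = Thread::Queue.new
  main_side.interrupt_exec do
    Thread.new do
      reply << [:ok, require(feature)]
    rescue ScriptError, StandardError => e # LoadError is a ScriptError
      reply << [:error, e]
    end
  end
  status, value = reply.pop
  raise value if status == :error
  value
end

main_side = MainSide.new
child = Thread.new do
  loaded = child_require(main_side, "json") # true, or false if already loaded
  missing =
    begin
      child_require(main_side, "no_such_feature_for_demo")
    rescue LoadError => e
      e.class
    end
  [loaded, missing]
end
result = child.value
main_side.shutdown
p result
```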
00:08:09.680 Now let's look at how we implement the Ractor require method.
00:08:15.120 A child Ractor can require successfully by delegating the call to the main Ractor.
00:08:21.200 We only need to add one line to the require method: if the current Ractor is not the main Ractor, delegate to the Ractor require method.
00:08:30.360 There shouldn't be an issue with this logic.
00:08:37.239 However, we need to consider libraries that override the require method.
00:08:47.960 Libraries like RubyGems override require to provide custom functionality.
00:09:04.200 This means that if we change only the built-in require, these overrides might not behave as expected.
00:09:11.600 Therefore, each library that overrides require needs to call the Ractor require method.
00:09:18.560 To achieve this, we need to communicate this requirement to library developers.
00:09:24.840 Another approach is to introduce a module that performs this check, so that no conflicts arise.
00:09:32.960 This would give us a 'Ractor-aware require' module that ensures consistent behavior.
00:09:42.960 However, the challenge lies in ensuring that the ancestor chain contains this Ractor-aware module.
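The idea of a Ractor-aware module in the ancestor chain can be illustrated with 'Module#prepend'. A prepended module's method runs before any override defined in the class itself, and 'ancestors' lets you check that the module is present. This is a minimal sketch with stand-in classes rather than the real 'Kernel#require'. 'BaseLoader', 'GemLikeLoader', 'RactorAwareRequire', and the 'in_child_ractor' flag are all hypothetical.

```ruby
# Hypothetical stand-ins: BaseLoader plays the role of Kernel#require and
# GemLikeLoader plays a library (such as RubyGems) that overrides it.
class BaseLoader
  attr_accessor :in_child_ractor # models "the current Ractor is not main"

  def require(feature)
    "loaded #{feature}"
  end
end

class GemLikeLoader < BaseLoader
  def require(feature)
    "gem:" + super # custom behavior layered on top
  end
end

# Hypothetical Ractor-aware module. Because it is prepended, it runs before
# the override defined in GemLikeLoader.
module RactorAwareRequire
  def require(feature)
    return "main-ractor:#{feature}" if in_child_ractor # would delegate to main
    super
  end
end

GemLikeLoader.prepend(RactorAwareRequire)

loader = GemLikeLoader.new
main_result = loader.require("json")  # → "gem:loaded json"
loader.in_child_ractor = true
child_result = loader.require("json") # → "main-ractor:json"

# The check mentioned in the talk: is the module in the ancestor chain?
p GemLikeLoader.ancestors.include?(RactorAwareRequire) # → true
p GemLikeLoader.ancestors.first(2) # → [RactorAwareRequire, GemLikeLoader]
```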
00:09:50.720 I welcome any discussion about how to merge this feature effectively.
00:10:00.400 Now, shifting gears from require, I want to touch on the timeout feature.
00:10:05.040 The current timeout library creates a single timeout monitor thread.
00:10:13.040 Other threads ask this monitor to raise an exception in them if their block does not finish in time, for example within one second.
00:10:22.360 However, this communication only works between threads in the same Ractor.
00:10:30.280 Thus, child Ractors cannot communicate with a timeout monitor running in another Ractor.
00:10:38.560 Therefore, the existing timeout method is not supported in child Ractors.
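The single-monitor design described above can be modeled in a few dozen lines. This is a simplified sketch with hypothetical names ('TimeoutMonitor', 'Expired'), not the timeout library's actual code. It relies on a Mutex, a ConditionVariable, and 'Thread#raise' within one Ractor. That reliance is why the same design cannot serve threads in other Ractors.

```ruby
# Simplified model of a single timeout monitor thread (hypothetical names;
# the real timeout library differs in detail).
class TimeoutMonitor
  Expired = Class.new(StandardError)
  Request = Struct.new(:thread, :deadline, :done)

  def initialize
    @requests = []
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @thread = Thread.new { run }
  end

  # Run the block; ask the monitor to raise Expired in the calling thread
  # if the block takes longer than sec seconds.
  def timeout(sec)
    req = Request.new(Thread.current, now + sec, false)
    @lock.synchronize do
      @requests << req # register
      @cond.signal
    end
    begin
      yield
    ensure
      @lock.synchronize { req.done = true } # unregister
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def run
    @lock.synchronize do
      loop do
        @requests.reject!(&:done)
        expired, @requests = @requests.partition { |r| r.deadline <= now }
        expired.each { |r| r.thread.raise(Expired, "execution expired") }
        nearest = @requests.map(&:deadline).min
        # Sleep until the nearest deadline, or until a new request arrives.
        @cond.wait(@lock, nearest && [nearest - now, 0].max)
      end
    end
  end
end

monitor = TimeoutMonitor.new
fast = monitor.timeout(1) { :finished }
slow =
  begin
    monitor.timeout(0.05) { sleep 1; :not_reached }
  rescue TimeoutMonitor::Expired
    :expired
  end
p [fast, slow] # → [:finished, :expired]
```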
00:10:46.720 A simple solution would be to create a timeout monitor for each Ractor.
00:10:56.760 For example, two Ractors would each have their own timeout monitor thread.
00:11:03.880 This is relatively easy and should take about thirty minutes to implement, but...
00:11:14.080 If we scale this approach to thousands of Ractors, we end up with thousands of timeout monitor threads, which is not ideal.
00:11:24.520 Alternatively, we could create a new communication path that allows child Ractors to reach the main Ractor's timeout thread.
00:11:32.240 However, implementing this is quite challenging.
00:11:41.120 In my last presentation two years ago, I discussed how to introduce a timer thread.
00:11:48.720 This thread manages timer events and I/O events.
00:11:56.000 I propose using this native timer thread for timeout management.
00:12:03.920 The main Ractor and other Ractors can request to register or unregister timeout events.
00:12:12.760 This design is still up for discussion, but it’s a starting point for timeout management.
00:12:20.840 The new 'timeout_exec' method accepts a duration in seconds.
00:12:29.840 We would also need to define what happens when the timeout expires.
00:12:36.480 With this feature, we could support timeouts in every Ractor.
00:12:45.680 In most cases a timeout never fires, so the main cost is registering and unregistering the timeout event.
00:12:53.560 I performed some benchmarks that call timeout with a block that takes essentially zero seconds.
00:13:01.960 Repeating this a million times took five seconds with the current thread-based implementation, whereas the native timer approach took three seconds.
00:13:09.560 It's not significantly faster, but it is still an improvement.
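A benchmark of this shape is easy to reproduce against the standard 'timeout' library. The sketch below measures only the existing 'Timeout.timeout' on whatever Ruby runs it. It does not implement the proposed native timer approach or reproduce the talk's numbers. The iteration count is scaled down from one million for speed.

```ruby
require "timeout"

# Time repeated Timeout.timeout calls whose block does no work, following
# the benchmark described above. Results depend on the Ruby version and machine.
def bench_timeout(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { Timeout.timeout(1) { nil } }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 10_000
elapsed = bench_timeout(iterations)
printf("%d calls: %.3f s (%.2f us per call)\n",
       iterations, elapsed, elapsed / iterations * 1_000_000)
```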
00:13:16.840 The remaining cost comes from how we read the hardware clock.
00:13:24.040 Switching to another clock API that works better yielded a speedup.
00:13:32.120 This API tolerates some error, up to four milliseconds, which is sufficient for our purposes.
00:13:39.200 The result is roughly a two-times improvement in performance.
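The trade-off between clock precision and read cost can be inspected from Ruby. The talk does not name the API it switched to. Using Linux's 'CLOCK_MONOTONIC_COARSE' as the example of a lower-precision clock is an assumption. That constant is only defined on some platforms, so the sketch checks for it first.

```ruby
# Compare the resolution and read cost of a precise monotonic clock with a
# coarse one. Assumption: CLOCK_MONOTONIC_COARSE (Linux) stands in for the
# lower-precision clock mentioned in the talk, which does not name its API.
clocks = { precise: Process::CLOCK_MONOTONIC }
if defined?(Process::CLOCK_MONOTONIC_COARSE)
  clocks[:coarse] = Process::CLOCK_MONOTONIC_COARSE
end

reads = 200_000
timings = clocks.to_h do |name, id|
  resolution = Process.clock_getres(id) # in seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  reads.times { Process.clock_gettime(id) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [name, { resolution: resolution, elapsed: elapsed }]
end

timings.each do |name, t|
  printf("%-8s resolution %.6f s, %d reads in %.3f s\n",
         name, t[:resolution], reads, t[:elapsed])
end
```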
00:13:45.360 In the final five minutes, I want to discuss performance issues we've encountered.
00:13:55.560 I usually demonstrate Ractor performance by creating 50,000 Ractors and passing a message through each of them in succession.
00:14:01.840 This measures how long it takes a message to travel around this linked ring of Ractors.
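A scaled-down version of this benchmark can be sketched with a chain of Ractors. Each Ractor receives a number, adds 1, and forwards it to the next one. The chain size and the "+1 per hop" rule are illustrative assumptions. This sketch does not configure M:N threads, so its timings say nothing about the speedups reported in the talk.

```ruby
# Scaled-down chain of Ractors (sizes and "+1 per hop" are illustrative).
# Newer Rubies replaced Ractor#take with Ractor#value; accept either.
def ractor_result(r)
  r.respond_to?(:value) ? r.value : r.take
end

def build_chain(size)
  tail = Ractor.new { Ractor.receive } # last hop returns the message
  head = (size - 1).times.reduce(tail) do |nxt, _|
    # Each forwarder adds 1 and sends the number on to the next Ractor.
    Ractor.new(nxt) { |dest| dest << (Ractor.receive + 1) }
  end
  [head, tail]
end

size = 100
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
head, tail = build_chain(size)
t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
head << 0
hops = ractor_result(tail) # one increment per forwarding Ractor
t2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
printf("create: %.3f s, pass: %.3f s, hops: %d\n", t1 - t0, t2 - t1, hops)
```

Timing creation and message passing separately mirrors the distinction the talk draws between the two costs.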
00:14:10.840 Using M:N threads provides significant time savings.
00:14:17.840 We have seen speedups of 10 to 70 times, depending on whether garbage collection is enabled.
00:14:24.200 However, the time to create 50,000 Ractors is also a performance challenge.
00:14:32.840 We noticed that with garbage collection enabled, creation slowed down significantly.
00:14:38.560 Comparing against runs with garbage collection disabled shows how costly it is.
00:14:45.680 Currently, a single garbage collection cycle is slower with Ractors than in a corresponding program without Ractors.
00:14:51.760 After running some additional benchmarks, we narrowed down the problem.
00:14:57.760 Removing the extra layers, we measured a simpler workload on Ractors.
00:15:05.960 The task was simply to create arrays.
00:15:12.000 We expected it to scale well, but instead we observed excessive garbage collection.
00:15:20.520 The GC counts were not proportional to the work; in fact, they were higher with Ractors.
00:15:26.040 We must understand how these counts interact with the system as a whole.
00:15:32.160 In particular, because all Ractors share one heap, allocation in any Ractor can trigger a garbage collection that affects the whole process.
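An experiment of this kind can be approximated by comparing 'GC.count' deltas. The sketch does the same allocation work in the main Ractor alone and then split across four child Ractors. The workload sizes are illustrative assumptions. The absolute counts, and even which run triggers more collections, vary with the Ruby version, heap settings, and machine.

```ruby
# Compare GC.count deltas for the same allocation work done in the main
# Ractor alone versus split across child Ractors (sizes are illustrative).
def ractor_result(r)
  r.respond_to?(:value) ? r.value : r.take # Ractor#value in newer Rubies
end

def make_garbage(count)
  count.times { Array.new(4) } # short-lived arrays
  count
end

def gc_runs
  before = GC.count # process-wide counter, covering GCs triggered by any Ractor
  yield
  GC.count - before
end

total = 400_000
main_only = gc_runs { make_garbage(total) }
with_ractors = gc_runs do
  workers = Array.new(4) { Ractor.new(total / 4) { |n| make_garbage(n) } }
  workers.each { |r| ractor_result(r) }
end
printf("GC runs: main only %d, 4 Ractors %d\n", main_only, with_ractors)
```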
00:15:39.760 In conclusion, increasing the number of Ractors inevitably leads to more garbage collection.
00:15:47.360 As the number of Ractors grows, keeping memory management efficient becomes more complex.
00:15:54.520 We must focus on keeping garbage collection manageable through monitoring and mitigation strategies.
00:16:00.920 This presentation proposed new methods to support require and timeout in Ractors, and surveyed memory management performance in Ractor.
00:16:07.760 I hope you will support us as we work towards these improvements.
00:16:14.320 Thank you very much.