Ractor Enhancements, 2024

00:00:07.520 Hi, good afternoon. This is the last talk of the first day of RubyKaigi.

00:00:14.960 Today, I want to present this year's progress report on Ractor.

00:00:21.119 I'm Koichi Sasada from Stores.

00:00:27.359 I will discuss improvements to Ractors and implementations of some significant features for them.

00:00:34.160 These features include require and timeout, and I will also present a survey analyzing memory management with Ractors.

00:00:41.200 I'll highlight some potential future pull requests as well.

00:00:49.079 Let me introduce myself again. My name is Koichi Sasada, and I work at Stores.

00:00:57.879 I am very happy to be here this year. I am the author of the YARV virtual machine, and I work on garbage collection, Ractor, and the thread scheduler.

00:01:06.320 Furthermore, I am the director of the Ruby Association.

00:01:13.600 So, let's start the talk. Ractor was introduced in Ruby 3.0 and is designed to enable parallel computing in Ruby.

00:01:22.560 Ruby has threads, but threads do not run in parallel on MRI (Matz's Ruby Interpreter) because parallel computing with threads can introduce critical bugs.

00:01:30.560 To prevent such issues, we designed Ractor.

00:01:35.759 Ractor is designed for robust concurrent programming.

00:01:42.720 Its robustness comes from no object sharing, or only very limited object sharing, between Ractors, which avoids the bugs that shared-state parallel computing introduces.

00:01:52.079 However, limiting object sharing between Ractors requires introducing strong limitations to the Ruby language.

00:01:57.840 We separate all objects into unshareable objects and shareable objects. Most Ruby objects, such as strings, arrays, and most user-defined classes and modules, are unshareable.

00:02:13.959 We also have some special shareable objects: immutable objects, classes and modules, and Ractor objects themselves.

00:02:22.840 With this separation, the rule is that shareable objects can be shared between Ractors, while unshareable ones cannot.
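
As a rough illustration of this separation, the sketch below asks Ruby which objects are shareable. `Ractor.shareable?` and `Ractor.make_shareable` are existing Ractor methods; the sample values are arbitrary and chosen only for this example.

```ruby
# Unshareable: ordinary mutable objects.
mutable_string = String.new("hello")
mutable_array  = [String.new("a"), String.new("b")]

# Shareable: immutable values, classes and modules, and Ractor objects.
checks = {
  mutable_string: Ractor.shareable?(mutable_string),
  mutable_array:  Ractor.shareable?(mutable_array),
  integer:        Ractor.shareable?(42),
  frozen_string:  Ractor.shareable?("hello".freeze),
  string_class:   Ractor.shareable?(String),
  ractor_object:  Ractor.shareable?(Ractor.current),
}
checks.each { |name, shareable| puts "#{name}: #{shareable}" }

# make_shareable deep-freezes the object graph so that it becomes shareable.
frozen_array = Ractor.make_shareable(mutable_array)
puts "after make_shareable: #{Ractor.shareable?(frozen_array)}"
```

Only the first two checks report false; everything else, including the array after `make_shareable`, is shareable.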

00:02:31.000 Although this is a simple rule, maintaining it necessitates introducing many strict regulations.

00:02:37.879 For example, we cannot define constants containing unshareable objects.

00:02:43.560 A common Ruby example would be 'C = large_string', but since string objects are unshareable, this is a problem.

00:02:49.440 This means that only the main Ractor can access the constant C.

00:02:53.280 Other non-main Ractors, which we call child Ractors, cannot access this constant.

00:03:06.760 The same restriction applies to global variables, which child Ractors are also prohibited from accessing.
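
A minimal sketch of the constant rule, assuming a Ruby version where Ractor is available (3.0 or later). The constant names and string contents are made up for the example, and the child reports back with `Ractor#send` so that the main Ractor can print the outcome.

```ruby
C = String.new("large string")    # mutable String, so unshareable
FROZEN_C = "small string".freeze  # frozen String, so shareable

Ractor.new(Ractor.current) do |main|
  outcome = begin
    C.size                        # touching C from a child Ractor is rejected
    :accessible
  rescue Ractor::IsolationError
    :isolation_error
  end
  main.send([outcome, FROZEN_C])  # the frozen constant is readable here
end

outcome, frozen_value = Ractor.receive
puts "C from child Ractor: #{outcome}"
puts "FROZEN_C from child Ractor: #{frozen_value}"
```

Freezing the value (or using `Ractor.make_shareable`) is the usual way to make such a constant readable from child Ractors.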

00:03:13.680 Due to these strict rules, we lack many important features such as require or timeout.

00:03:19.560 We also observe critical performance degradation in memory management and possibly other issues.

00:03:27.479 Today, I will show you how we can enable these important features in Ractor.

00:03:35.560 First, let's discuss the require issue.

00:03:42.600 Currently, child Ractors cannot require any features because require accesses global state.

00:03:55.800 Loading a feature touches state that the sharing rules do not let child Ractors access.

00:04:00.680 As a result, we have prohibited require from child Ractors.

00:04:06.960 However, in many situations, we need the ability to require in child Ractors. For instance, some methods depend on features that need to be required.

00:04:21.200 Take the pp method, for example; it requires the pp library on its first call.

00:04:30.680 Unfortunately, due to this limitation, we cannot call pp in a child Ractor.

00:04:39.880 This limitation affects the usability of our programming environment.

00:04:45.680 Additionally, autoload is a concern, because it triggers a require when a constant is accessed.

00:04:53.280 If we aim to support autoload, we need to allow requires from child Ractors.

00:05:06.960 So, the conclusion is that we need to support requires from child Ractors while ensuring that all require calls run on the main Ractor.

00:05:17.120 To enable require in child Ractors, we will introduce a new method, 'Ractor#interrupt_exec'.

00:05:21.200 This method invokes the given expression on the main Ractor.

00:05:26.360 For example, calling it on the main Ractor executes the block provided there.

00:05:35.200 In this example, it interrupts the main Ractor and runs an assignment to a global variable.

00:05:41.360 This method processes the expression asynchronously.

00:05:48.079 It means that the method does not wait for the result of the block.
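
The asynchronous behaviour can be illustrated with a thread-based analogy. This is only a sketch of the "hand over a block and return without its result" semantics: the method described in the talk is a proposal, so `AsyncTarget` and `async_exec` below are hypothetical names, and a plain worker thread stands in for the interrupted main Ractor.

```ruby
# Hypothetical stand-in for the target of the interrupt; not a Ruby API.
class AsyncTarget
  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      while (job = @jobs.pop) # nil is the shutdown marker
        job.call
      end
    end
  end

  # Hands the block over and returns at once, without the block's result.
  def async_exec(&block)
    @jobs << block
    nil
  end

  def shutdown
    @jobs << nil
    @worker.join
  end
end

events = Queue.new
gate = Queue.new
target = AsyncTarget.new

returned = target.async_exec do
  gate.pop                     # held until the caller has moved on
  $a = 1                       # the global assignment from the talk's example
  events << :block_finished
end
events << :caller_continued    # the caller keeps running in the meantime
gate << :go
target.shutdown

order = [events.pop, events.pop]
p returned  # nil: no result is handed back to the caller
p order     # [:caller_continued, :block_finished]
```

If `async_exec` waited for the block, this program would deadlock on `gate.pop`; it completes because the caller continues immediately.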

00:06:02.960 This method is a powerful feature, enabling various systems to be built around it.

00:06:10.520 However, this method also carries risks, much like handling traps and sending signals.

00:06:15.680 The interrupt mechanism can disrupt blocking calls, like a read method waiting on network I/O.

00:06:22.400 When an interrupt signal occurs, it might wake up the read method.

00:06:31.000 The `Ractor#interrupt_exec` method can do the same thing.

00:06:40.160 This figure illustrates how the interrupt happens between Ractors.

00:06:47.520 First, the child Ractor calls this method, which interrupts the main Ractor.

00:06:54.720 It runs the expression without waiting for its result.

00:07:01.680 After calling this method, the rest of the logic continues to run.

00:07:05.680 Using this Ractor#interrupt_exec method, we can implement a Ractor require method.

00:07:15.120 By calling Ractor#interrupt_exec, we create a new thread on the main Ractor to run the require.

00:07:21.200 The child Ractor needs to wait for the result of this require.

00:07:27.040 This avoids potential deadlocking scenarios.

00:07:33.680 The diagram demonstrates this Ractor require process.

00:07:40.000 The child Ractor calls this method, the require runs on the main Ractor, and the child waits for the required feature.

00:07:48.520 Most of the time, the require will succeed, returning true or false.

00:07:55.600 However, sometimes it will raise a LoadError or another exception, which we need to handle.

00:08:04.480 Thus, we need to check for various types of errors that may occur.
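
The request, wait, and error-propagation flow can be sketched with today's message passing (`Ractor#send` and `Ractor.receive`). This is not the interrupt-based mechanism from the talk: here the main Ractor has to cooperate by receiving requests in a loop. `require_via_main`, the message tags, and the bogus feature name are all assumptions made for this example.

```ruby
# Child side: send the request to the main Ractor, then block until it replies.
def require_via_main(main, feature)
  main.send([:require, feature, Ractor.current])
  status, value = Ractor.receive
  raise LoadError, value if status == :error
  value
end

Ractor.new(Ractor.current) do |main|
  loaded = require_via_main(main, "json")
  missing = begin
    require_via_main(main, "no_such_feature_for_this_sketch")
  rescue LoadError
    :load_error
  end
  main.send([:finished, loaded, missing])
end

# Main side: serve require requests until the child reports that it is done.
results = nil
until results
  case Ractor.receive
  in [:require, feature, requester]
    reply = begin
      [:ok, require(feature)]       # true or false, as with a normal require
    rescue LoadError => e
      [:error, e.message]           # pass the failure back as plain data
    end
    requester.send(reply)
  in [:finished, *rest]
    results = rest
  end
end
p results
```

The first element is true or false depending on whether the feature was already loaded; the second shows that a LoadError on the main side reaches the child.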

00:08:09.680 Now let’s look at how we prepare our Ractor require method.

00:08:15.120 A child Ractor can require successfully by delegating the require to the main Ractor.

00:08:21.200 We need to add a line to our require method: if the current Ractor is not the main Ractor, it calls the Ractor require method.

00:08:30.360 There shouldn't be an issue with the logic we outlined for this process.

00:08:37.239 However, we need to consider overriding the require method in libraries that developers use.

00:08:47.960 Libraries like RubyGems or others may override require to provide custom functionality.

00:09:04.200 This means if we change the require method, custom libraries might not behave as expected.

00:09:11.600 Therefore, each library overriding require needs to call the Ractor require method.

00:09:18.560 To achieve this, it is crucial to communicate such requirements to library developers.

00:09:24.840 Another approach could introduce a module to check for such requirements and ensure no conflicts arise.

00:09:32.960 This would allow us to create a 'Ractor-aware require' module ensuring consistent behavior.

00:09:42.960 However, the challenge lies in ensuring that the ancestor chain contains this Ractor-aware module.
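
To make the ancestor-chain concern concrete, here is a small sketch of how `Module#prepend` ordering decides which override runs first. `ToyKernel`, `RactorAwareRequire`, and `LibraryRequire` are hypothetical stand-ins; the real `Kernel#require` is deliberately left untouched.

```ruby
# Stand-in for Kernel, so the sketch does not patch the real require.
class ToyKernel
  def require(feature)
    "core(#{feature})"
  end
end

# Hypothetical module that would redirect child-Ractor requires.
module RactorAwareRequire
  def require(feature)
    "ractor_aware(#{super})"
  end
end

# Stand-in for a library, such as RubyGems, that overrides require.
module LibraryRequire
  def require(feature)
    "library(#{super})"
  end
end

ToyKernel.prepend(RactorAwareRequire)
ToyKernel.prepend(LibraryRequire)   # a later prepend lands in front

chain = ToyKernel.ancestors.take(3)
result = ToyKernel.new.require("json")
p chain   # [LibraryRequire, RactorAwareRequire, ToyKernel]
p result  # "library(ractor_aware(core(json)))"
```

Whether the Ractor-aware module is present in `ancestors`, and where it sits relative to library overrides, depends on load order, which is the difficulty raised above.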

00:09:50.720 I welcome any discussion about how to merge this feature effectively.

00:10:00.400 Now, shifting gears from require, I want to touch on the timeout feature.

00:10:05.040 The current timeout mechanism creates one timeout monitor thread.

00:10:13.040 Other threads ask this monitor thread to raise an exception in them when, for example, a one-second timeout expires.

00:10:22.360 However, this communication flow currently only works between threads.

00:10:30.280 Thus, child Ractors cannot communicate with a timeout monitor in another Ractor.

00:10:38.560 The existing timeout method is, therefore, not supported in Ractors.

00:10:46.720 A simple solution would be to create a timeout monitor for each Ractor.

00:10:56.760 This means two Ractors would each have their own timeout monitor thread.
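
For reference, this is how the existing API looks from the caller's side. `Timeout.timeout` and `Timeout::Error` are part of Ruby's standard timeout library; the durations are arbitrary values for the example.

```ruby
require "timeout"

# The block finishes in time, so its value is returned.
fast = Timeout.timeout(1) { :finished }

# The block overruns the limit, so Timeout::Error is raised in this thread.
slow = begin
  Timeout.timeout(0.1) { sleep 1; :finished }
rescue Timeout::Error
  :timed_out
end

p fast  # :finished
p slow  # :timed_out
```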

00:11:03.880 This is relatively easy and should take about thirty minutes to implement, but...

00:11:14.080 If we scale this approach to thousands of Ractors, we could end up with thousands of timeout monitor threads, which is not ideal.

00:11:24.520 Alternatively, we could create a new communication path that allows child Ractors to reach the main Ractor's timeout thread.

00:11:32.240 However, implementing this is quite challenging.

00:11:41.120 In my last presentation two years ago, I discussed how to introduce a timer thread.

00:11:48.720 This thread would manage timer events like I/O interrupts.

00:11:56.000 I propose using a native thread for this timer management.

00:12:03.920 The main Ractor and other Ractors can request to register or unregister timeout events.

00:12:12.760 This design is still up for discussion, but it’s a starting point for timeout management.

00:12:20.840 The new timeout_exec method accepts a duration in seconds.

00:12:29.840 We would also need to define what happens when the timeout expires.

00:12:36.480 With the introduction of this feature, we could support timeout management in Ractors.

00:12:45.680 Most timeout calls do not raise a timeout error; they only register and then unregister an event.

00:12:53.560 I ran some benchmarks in which the task under the timeout takes almost zero seconds.
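
The register and unregister idea can be sketched in plain Ruby. This is an assumption-laden toy, not the design from the talk: `TimerRegistry` is a hypothetical class, it uses an ordinary Ruby thread rather than a native timer thread, and it works between threads only.

```ruby
# Hypothetical single timer thread serving many register/unregister requests.
class TimerRegistry
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @events = {}                     # id => [deadline, callback]
    @next_id = 0
    @thread = Thread.new { run }
  end

  # Registers a callback to run after `seconds`; returns an id for unregister.
  def register(seconds, &on_timeout)
    @mutex.synchronize do
      id = (@next_id += 1)
      @events[id] = [now + seconds, on_timeout]
      @cond.signal                   # wake the timer thread to re-plan
      id
    end
  end

  # Cancels a pending event; true if it had not fired yet.
  def unregister(id)
    @mutex.synchronize { !@events.delete(id).nil? }
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def run
    @mutex.synchronize do
      loop do
        id, (deadline, callback) = @events.min_by { |_, (d, _)| d }
        if id.nil?
          @cond.wait(@mutex)                     # nothing registered
        elsif (delay = deadline - now) > 0
          @cond.wait(@mutex, delay)              # sleep until the next deadline
        else
          @events.delete(id)
          callback.call   # runs under the lock; must not call register here
        end
      end
    end
  end
end

timers = TimerRegistry.new
fired = Queue.new
timers.register(0.05) { fired << :slow_task }    # nobody cancels this one
id = timers.register(0.05) { fired << :fast_task }
cancelled = timers.unregister(id)                # the task finished in time
first = fired.pop
p cancelled  # true
p first      # :slow_task
```

The common path is `register` followed quickly by `unregister`, so the callback never runs; only the event that is left registered fires.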

00:13:01.960 Repeating this process a million times on the current thread system took five minutes, whereas the native timing approach only required three seconds.
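
A benchmark of this shape can be reproduced with the standard library, as sketched below. The iteration count is reduced from the million used in the talk so that it finishes quickly, and it measures only the existing `Timeout.timeout`, not the proposed implementation.

```ruby
require "benchmark"
require "timeout"

iterations = 10_000  # the talk repeats a million times; kept small here
elapsed = Benchmark.realtime do
  # Each call registers a timeout, runs an empty block, and unregisters.
  iterations.times { Timeout.timeout(1) { nil } }
end
puts format("%d timeouts in %.3fs", iterations, elapsed)
```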

00:13:09.560 It's not significantly faster, but still an improvement.

00:13:16.840 The slowdown stems from how we interact with the hardware clock.

00:13:24.040 Switching to another API that works better yielded a speedup.

00:13:32.120 This new API allows for some error tolerance, up to four milliseconds, which suffices for our purposes.
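
The talk does not name the API here. As one example of trading clock precision for speed, the sketch below compares `Process::CLOCK_MONOTONIC` with the Linux-specific `Process::CLOCK_MONOTONIC_COARSE`; treating the coarse clock as comparable to what the talk used is an assumption.

```ruby
require "benchmark"

precise = Process::CLOCK_MONOTONIC
# The coarse clock exists only on some platforms; fall back to the precise one.
coarse = if defined?(Process::CLOCK_MONOTONIC_COARSE)
  Process::CLOCK_MONOTONIC_COARSE
else
  precise
end

reads = 200_000
t_precise = Benchmark.realtime { reads.times { Process.clock_gettime(precise) } }
t_coarse  = Benchmark.realtime { reads.times { Process.clock_gettime(coarse) } }
puts format("precise: %.4fs, coarse: %.4fs", t_precise, t_coarse)
```

The timings vary by platform and kernel, so no particular ratio should be expected from this sketch.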

00:13:39.200 The result is roughly a twofold improvement in performance.

00:13:45.360 In the final five minutes, I want to discuss performance issues we've encountered.

00:13:55.560 I usually demonstrate this example by creating 50,000 Ractors and passing a message through each in succession.

00:14:01.840 This allows us to measure how much time is required to pass a message around this ring.

00:14:10.840 Using M:N threads can provide significant time savings.

00:14:17.840 We have seen performance increases of between 10 and 70 times, depending on whether garbage collection is enabled or not.

00:14:24.200 However, the creation time when instantiating 50,000 Ractors also poses a performance challenge.

00:14:32.840 Last time, we noticed that enabling garbage collection significantly slowed down the process.

00:14:38.560 Comparing with runs where garbage collection is disabled shows how detrimental it can be.

00:14:45.680 Currently, we see that a single garbage collection cycle is slower with Ractors than without them.

00:14:51.760 After running some additional benchmarks, we homed in on the problem.

00:14:57.760 Removing the extra layers, we examined the performance of Ractors directly.
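
A scaled-down sketch of such a ring, assuming only `Ractor.new`, `Ractor#send`, and `Ractor.receive`. The ring size of 20 and the hop-counting message are choices made for this example; the talk uses 50,000 Ractors.

```ruby
ring_size = 20  # far smaller than the 50,000 used in the talk

# Build the chain backwards: each Ractor forwards to the one created before it,
# and the last link in the chain is the main Ractor itself.
next_ractor = Ractor.current
ring_size.times do
  next_ractor = Ractor.new(next_ractor) do |successor|
    successor.send(Ractor.receive + 1)  # count one hop and pass it on
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
next_ractor.send(0)
hops = Ractor.receive                   # the message arrives back at main
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("%d hops in %.4fs", hops, elapsed)
```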

00:15:05.960 The task was to create additional arrays.

00:15:12.000 The expectation was that it should scale effectively, but instead, we noticed excessive garbage collection.

00:15:20.520 GC counts weren't proportional; in fact, they were higher with Ractors.
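
One way to observe this kind of difference is to compare `GC.count` for the same amount of allocation with and without Ractors. The workload size, the worker count, and the `allocate_arrays` helper are assumptions for this sketch, and the numbers it prints depend on the Ruby version and machine.

```ruby
def allocate_arrays(count)
  count.times { Array.new(8) }
end

total = 400_000
workers = 4

# Baseline: all allocations on the main Ractor.
before = GC.count
allocate_arrays(total)
single_gc = GC.count - before

# Same total work split across child Ractors; each reports when it is done.
before = GC.count
workers.times do
  Ractor.new(Ractor.current, total / workers) do |main, share|
    allocate_arrays(share)
    main.send(:done)
  end
end
workers.times { Ractor.receive }
ractor_gc = GC.count - before

puts "GC runs without Ractors: #{single_gc}"
puts "GC runs with #{workers} Ractors: #{ractor_gc}"
```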

00:15:26.040 We must understand the interaction these counts have with the system as a whole.

00:15:32.160 In particular, each Ractor that allocates memory affects garbage collection across the system.

00:15:39.760 In conclusion, we observe that increasing the number of Ractors inevitably leads to more garbage collection.

00:15:47.360 With the increased memory-management demands of Ractors, sustaining efficiency is more complex.

00:15:54.520 We must focus on ensuring manageable garbage collection, monitoring, and mitigation strategies.

00:16:00.920 This presentation proposes new methods to implement the require and timeout features and to enhance memory management in Ractor.

00:16:07.760 I hope you will support us as we work towards these improvements.

00:16:14.320 Thank you very much.