00:00:07.520
Hi, good afternoon. This talk is the last one of the first day here at RubyKaigi.
00:00:14.960
Today, I want to present a progress report on the Ractor system for this year.
00:00:21.119
I'm Koichi Sasada from Stores.
00:00:27.359
I will discuss improvements and implementations of significant features for Ractors.
00:00:34.160
These features include require and timeout, and I will also present a survey analyzing memory management in the Ractor system.
00:00:41.200
I'll highlight some potential future pull requests as well.
00:00:49.079
Let me introduce myself again. My name is Koichi Sasada, and I work at Stores.
00:00:57.879
I am very happy to be here this year. I am the author of the YARV virtual machine, the garbage collector, the Ractor system, and the thread scheduler.
00:01:06.320
Furthermore, I am the director of the Ruby Association.
00:01:13.600
So, let's start the talk. Ractor is a concurrency feature introduced in Ruby 3.0, designed to enable parallel computing in Ruby.
00:01:22.560
Ruby has threads, but threads do not run in parallel on MRI (Matz's Ruby Interpreter) because parallel computing with threads can introduce critical bugs.
00:01:30.560
To prevent such issues, we designed the Ractor system.
00:01:35.759
This system is designed for robust concurrent programming.
00:01:42.720
Its robustness enables parallel computing without data-race bugs, because objects are not shared between Ractors, or shared only in very limited ways.
00:01:52.079
However, limiting object sharing between Ractors requires introducing strong limitations to the Ruby language.
00:01:57.840
We separate all objects into unshareable objects and shareable objects. Most Ruby objects, such as strings, arrays, and instances of most user-defined classes, are unshareable.
00:02:13.959
We also have shareable objects: immutable objects, classes and modules, and Ractor objects themselves.
00:02:22.840
Because of this separation, shareable objects can be shared between Ractors, while unshareable ones cannot.
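This classification can be inspected in current Ruby with Ractor.shareable?; a few examples:

```ruby
# Ractor.shareable? reports which class a given object falls into (Ruby 3.0+).
p Ractor.shareable?("text")          # => false: a plain String is unshareable
p Ractor.shareable?([1, 2, 3])       # => false: so is an Array
p Ractor.shareable?(:sym)            # => true: Symbols are immutable
p Ractor.shareable?(42)              # => true: Integers are immutable
p Ractor.shareable?("text".freeze)   # => true: deeply frozen objects are shareable
p Ractor.shareable?(String)          # => true: classes and modules are shareable
p Ractor.shareable?(Ractor.current)  # => true: Ractor objects themselves
```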
00:02:31.000
Although this is a simple rule, maintaining it requires introducing many strict regulations.
00:02:37.879
For example, we cannot define constants containing unshareable objects.
00:02:43.560
A common Ruby example would be 'C = large_string', but since string objects are unshareable, this is a problem.
00:02:49.440
This rule means that only the main Ractor can access the constant C.
00:02:53.280
Other non-main Ractors, which we call child Ractors, cannot access this constant.
00:03:06.760
The same restriction applies to global variables, which child Ractors are also prohibited from accessing.
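A minimal sketch of this restriction, using the Ruby 3.0-3.4 API (Ractor#take to fetch the result; the exception class in current Ruby is Ractor::IsolationError):

```ruby
C = "a large string"   # an unshareable String stored in a constant

r = Ractor.new do
  begin
    C                   # reading C from a child Ractor is not allowed
    :ok
  rescue Ractor::IsolationError
    :isolation_error
  end
end
result = r.take
p result  # => :isolation_error
```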
00:03:13.680
Due to these strict rules, we lack many important features such as require or timeout.
00:03:19.560
We also observe critical performance degradation in memory management and possibly other issues.
00:03:27.479
Today, I will show you how we can enable these important features in Ractors.
00:03:35.560
First, let's discuss the require issue.
00:03:42.600
Currently, a child Ractor cannot require any feature, because require accesses global state.
00:03:55.800
Loading a feature mutates global state, which conflicts with the rules that limit sharing.
00:04:00.680
As a result, we have prohibited require from child Ractors.
00:04:06.960
However, in many situations, we need the ability to require in child Ractors. For instance, some methods depend on features that need to be required.
00:04:21.200
Take the pp method, for example; it requires the pp library on its first call.
00:04:30.680
Unfortunately, due to this limitation, we cannot call pp from a child Ractor.
00:04:39.880
This limitation affects the usability of our programming environment.
00:04:45.680
Additionally, autoload needs to run a require when the autoloaded constant is first accessed.
00:04:53.280
If we aim to support autoload, we need to allow require from child Ractors.
00:05:06.960
So, the conclusion is that we need to support require from child Ractors while ensuring that every require actually runs on the main Ractor.
00:05:17.120
To enable the require method in child Ractors, we will introduce a new method, 'Ractor#interrupt_exec'.
00:05:21.200
This method invokes a given block on the main Ractor.
00:05:26.360
For example, 'Ractor.main.interrupt_exec { ... }' will execute the provided block on the main Ractor.
00:05:35.200
The method interrupts the main Ractor and runs, for example, an assignment to a global variable there.
00:05:41.360
This method processes the expression asynchronously.
00:05:48.079
It means that the caller does not wait for the result of the block.
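As a sketch, the intended usage looks roughly like this (Ractor#interrupt_exec is a proposal and not available in released Ruby; the exact name and semantics may change):

```ruby
Ractor.new do
  # Ask the main Ractor to run this block; the child does not wait for it.
  Ractor.main.interrupt_exec do
    $flag = true   # runs on the main Ractor, so globals are accessible
  end
  # ...the child Ractor continues immediately, asynchronously...
end
```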
00:06:02.960
This method is a powerful feature, enabling various systems to be built around it.
00:06:10.520
However, this method also carries risks, much like handling traps and sending signals.
00:06:15.680
The interrupt mechanism can disrupt blocking calls, like a read method waiting on a network call.
00:06:22.400
When an interrupt signal occurs, it might wake up the read method.
00:06:31.000
Therefore, the `Ractor#interrupt_exec` method carries similar risks.
00:06:40.160
This figure illustrates how Ractor#interrupt_exec works.
00:06:47.520
First, the child Ractor calls this method, which interrupts the main Ractor.
00:06:54.720
It runs the expression without waiting for its result.
00:07:01.680
After calling this method, the rest of the logic continues to run.
00:07:05.680
Using this Ractor#interrupt_exec method, we can implement a Ractor.require method.
00:07:15.120
By calling Ractor#interrupt_exec, we create a new thread on the main Ractor to perform the require there.
00:07:21.200
The child Ractor then waits for the result of this require.
00:07:27.040
Running the require in a separate thread avoids potential deadlock scenarios.
00:07:33.680
The diagram demonstrates this Ractor.require process.
00:07:40.000
The child Ractor calls the method, the require logic runs on the main Ractor, and the child waits for the required feature.
00:07:48.520
Most of the time, the require will succeed, returning true or false.
00:07:55.600
However, sometimes it will raise a LoadError or another exception, which we need to handle.
00:08:04.480
Thus, we need to check for various types of errors that may occur.
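Under the proposal, a require in a child Ractor would look like an ordinary require, with a LoadError propagated back from the main Ractor. A sketch of the proposed behavior (not available in released Ruby):

```ruby
Ractor.new do
  begin
    require "pp"        # delegated to the main Ractor under the proposal
    pp value: 42        # the feature is now usable in the child Ractor
  rescue LoadError => e
    # an error raised during require on the main Ractor is re-raised here
    warn "require failed: #{e.message}"
  end
end
```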
00:08:09.680
Now let's look into how we prepare our Ractor.require method.
00:08:15.120
A child Ractor can require successfully by delegating to the main Ractor in this way.
00:08:21.200
If the current Ractor is not the main Ractor, we need to add a line to our require method that performs this delegation.
00:08:30.360
There shouldn't be an issue with the logic we outlined for this process.
00:08:37.239
However, we need to consider overriding the require method in libraries that developers use.
00:08:47.960
Libraries like RubyGems or others may override require to provide custom functionality.
00:09:04.200
This means if we change the require method, custom libraries might not behave as expected.
00:09:11.600
Therefore, each library overriding require needs to call the Ractor.require method.
00:09:18.560
To achieve this, it is crucial to communicate such requirements to library developers.
00:09:24.840
Another approach is to introduce a module that performs this check, so that no conflicts arise.
00:09:32.960
This would allow us to create a 'Ractor-aware require' module ensuring consistent behavior.
00:09:42.960
However, the challenge lies in ensuring that the ancestor chain contains this Ractor-aware module.
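One possible shape for such a module (a sketch only: Ractor.require is the proposed delegating method, and the module name here is made up for illustration):

```ruby
module RactorAwareRequire
  def require(feature)
    if Ractor.current == Ractor.main
      super                     # normal path on the main Ractor
    else
      Ractor.require(feature)   # proposed: delegate to the main Ractor
    end
  end
end

# Prepending would keep this check ahead of other overrides (e.g. RubyGems'),
# but ensuring it stays first in the ancestor chain is the open problem:
# Kernel.prepend(RactorAwareRequire)
```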
00:09:50.720
I welcome any discussion about how to merge this feature effectively.
00:10:00.400
Now, shifting gears from require, I want to touch on the timeout feature.
00:10:05.040
The current timeout mechanism creates a single timeout monitor thread.
00:10:13.040
Other threads can ask this monitor to raise an exception in them if, say, a one-second timeout is exceeded.
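The current, thread-based API from the standard timeout library looks like this:

```ruby
require "timeout"

result =
  begin
    # The monitor thread raises Timeout::Error in this thread after ~0.1 s.
    Timeout.timeout(0.1) { sleep }
  rescue Timeout::Error
    :timed_out
  end
p result  # => :timed_out
```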
00:10:22.360
However, this communication currently only works between threads within one Ractor.
00:10:30.280
Thus, child Ractors cannot communicate with the timeout monitor living in another Ractor.
00:10:38.560
The existing timeout method is, therefore, not supported in child Ractors.
00:10:46.720
A simple solution would be to create a timeout monitor for each Ractor.
00:10:56.760
This means two Ractors would each have their own respective timeout monitor threads.
00:11:03.880
This is relatively easy and should take about thirty minutes to implement, but...
00:11:14.080
If we scale this approach to thousands of Ractors, we could end up with thousands of timeout monitors, which is not ideal.
00:11:24.520
Alternatively, we could create a new communication path that allows child Ractors to reach the main Ractor's timeout thread.
00:11:32.240
However, implementing this is quite challenging.
00:11:41.120
In my last presentation two years ago, I discussed how to introduce a timer thread.
00:11:48.720
This thread would manage timer events like I/O interrupts.
00:11:56.000
I propose using a native thread for this timer management.
00:12:03.920
The main Ractor and other Ractors can request to register or unregister timeout events.
00:12:12.760
This design is still up for discussion, but it’s a starting point for timeout management.
00:12:20.840
The new timeout_exec method accepts a duration in seconds.
00:12:29.840
We would also need to define what happens when a timeout fires.
00:12:36.480
With the introduction of this feature, we could facilitate timeout management across Ractors.
00:12:45.680
In most cases no timeout error is ever thrown; the dominant operations are registering and unregistering events, so those must be cheap.
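The register/unregister pattern can be sketched as follows (all names here are hypothetical, assumed for illustration; the real proposed interface may differ):

```ruby
# Hypothetical sketch: ask the native timer thread for a 1.0 s deadline.
handle = TimerThread.register(1.0) do
  # fires on the timer thread only if 1.0 s elapses first,
  # e.g. to interrupt the Ractor that asked for the timeout
end
begin
  work            # the common case: finishes before the deadline
ensure
  TimerThread.unregister(handle)  # so the handler never fires
end
```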
00:12:53.560
I performed some benchmarks, where each task has an empty body that should take zero seconds.
00:13:01.960
Repeating this process a million times on the current thread-based system took about five seconds, whereas the native timer approach required only about three seconds.
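The thread-based side of such a measurement can be reproduced with the standard timeout library (a smaller iteration count here than the talk's one million zero-duration tasks):

```ruby
require "timeout"
require "benchmark"

n = 10_000
elapsed = Benchmark.realtime do
  n.times do
    # Register and immediately unregister: the empty body finishes at once,
    # so the timeout never fires and we measure pure bookkeeping cost.
    Timeout.timeout(10) { }
  end
end
puts "#{n} timeouts: #{elapsed.round(3)}s"
```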
00:13:09.560
It's not significantly faster, but still an improvement.
00:13:16.840
The slowdown stems from how we interact with the hardware clock.
00:13:24.040
Switching to another API that works better yielded a speedup.
00:13:32.120
This new API allows for some error tolerance, up to four milliseconds, which suffices for our purposes.
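I assume the two clock APIs are along the lines of CLOCK_MONOTONIC versus the cheaper, coarser CLOCK_MONOTONIC_COARSE (Linux), whose resolution is typically a few milliseconds; from Ruby they can be compared like this:

```ruby
# Precise but more expensive clock read:
t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Coarse clock: cheaper reads, millisecond-level resolution (Linux-only).
if defined?(Process::CLOCK_MONOTONIC_COARSE)
  t2  = Process.clock_gettime(Process::CLOCK_MONOTONIC_COARSE)
  res = Process.clock_getres(Process::CLOCK_MONOTONIC_COARSE)
  puts "coarse resolution: #{res}s"
end
```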
00:13:39.200
The result is an approximate two times improvement in performance.
00:13:45.360
In the final five minutes, I want to discuss performance issues we've encountered.
00:13:55.560
I usually demonstrate this example by creating 50,000 Ractors and passing a message around them in succession.
00:14:01.840
This allows us to measure how much time is required for a message to circulate around this ring.
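A scaled-down version of that ring (5 Ractors instead of 50,000), using the message-passing API of Ruby 3.0-3.4:

```ruby
# Build a chain of Ractors; each one forwards a received message to its
# successor, ending back at the main Ractor.
head = 5.times.inject(Ractor.current) do |succ, _|
  Ractor.new(succ) do |s|
    s.send(Ractor.receive)  # receive one message, pass it along
  end
end

head.send(:token)
msg = Ractor.receive  # the token arrives after one trip around the ring
p msg  # => :token
```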
00:14:10.840
Using M:N threads can provide significant time savings.
00:14:17.840
We have seen performance increases between 10 to 70 times, depending on whether garbage collection is enabled or not.
00:14:24.200
However, the creation time when instantiating 50,000 Ractors also poses a performance challenge.
00:14:32.840
With garbage collection enabled, creation slows down significantly.
00:14:38.560
Comparing runs with garbage collection disabled shows how detrimental it can be.
00:14:45.680
Currently, we see that a single garbage collection cycle is slower than in the corresponding non-Ractor program.
00:14:51.760
After running some additional benchmarks, we honed in on the problem.
00:14:57.760
Removing the extra layers, we examined the performance of the Ractor system in isolation.
00:15:05.960
The task was to create additional arrays.
00:15:12.000
The expectation was that it should scale effectively, but instead, we noticed excessive garbage collection.
00:15:20.520
GC counts weren't proportional; in fact, they were higher on the Ractor system.
00:15:26.040
We must understand the interaction these counts have with the system as a whole.
00:15:32.160
In particular, each Ractor that allocates memory affects garbage collection across the entire system.
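This process-wide effect can be observed with GC.count, which counts collections for the whole process regardless of which Ractor allocated (Ruby 3.0-3.4 API):

```ruby
before = GC.count

# Four Ractors allocating in parallel; every allocation feeds the same
# process-global garbage collector.
ractors = 4.times.map do
  Ractor.new { 50_000.times.map { Array.new(8) }; :done }
end
results = ractors.map(&:take)

puts "GC runs triggered: #{GC.count - before}"
p results  # => [:done, :done, :done, :done]
```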
00:15:39.760
In conclusion, we observe that increasing the number of Ractors inevitably leads to more garbage collection.
00:15:47.360
With the increased memory-management demands of many Ractors, sustaining efficiency is more complex.
00:15:54.520
We must focus on ensuring manageable garbage collection, monitoring, and mitigation strategies.
00:16:00.920
This presentation proposed new methods to implement the require and timeout features and analyzed memory management in Ractor.
00:16:07.760
I hope you will support us as we work towards these improvements.
00:16:14.320
Thank you very much.