Reducing Implicit Allocations During Method Calling

00:00:12.080 Hello everyone. In this presentation, I'm going to discuss the changes we made in Ruby 3.3 and 3.4 to reduce implicit allocations during method calling. My name is Jeremy Evans. I'm a Ruby committer focused on fixing bugs in Ruby. I'm also the author of 'Polished Ruby Programming', which was published in English a few years ago and in Japanese last year. This book aims at intermediate Ruby programmers and focuses on teaching principles of Ruby programming as well as trade-offs to consider when making implementation decisions.

00:00:50.360 To reduce implicit allocations, the first step is to measure them. If you don't know how many objects a method call implicitly allocates, you cannot determine whether it's possible to reduce the number of allocations. Here is a method that shows how many arrays and hashes are allocated inside a block. I don't have enough time to fully explain how it works, but the important part is that you can call the object's allocated method with a block, and it will print the number of arrays and hashes allocated by the block. Using this object's allocated method, we can examine how many objects are allocated for certain types of method calls.

00:01:34.240 The first allocation reduction optimization we're going to examine is called keyword Splat separation. This technique was implemented by Sadan early in the Ruby 3.3 development cycle. Let's consider this code example. We call an empty method that takes no arguments, create an empty array and a hash, and we use the Splat syntax on the array and then the keyword Splat on the hash when calling the method. Any guesses as to how many objects are allocated by this method call in Ruby 3.2? It turns out Ruby 3.2 allocates four arrays and one hash for this method call. Considering that passing arguments to other methods is commonly done via Splats and keyword Splats, this unnecessary object allocation can result in significant slowdowns.

00:02:25.920 Ruby 3.3 fixes this by avoiding any allocation for this type of call. We can examine why this type of call allocates so many objects in Ruby 3.2. We can run Ruby with the dump instructions option, which will dump the virtual machine instructions used, and then we can evaluate the code with the E option. The simplified dump instructions show that the method call starts with 'put self' and ends with the second 'send without block' instruction.

00:02:56.239 Let’s analyze these instructions to see where the allocations occur. The first array allocation during the method call is for Splat array true, which duplicates the array local variable. The new hash instruction allocates a new hash, and this instruction copies the contents of the keyword hash into the newly allocated hash. The new array instruction wraps that newly allocated hash in another new array, and the concat array instruction allocates a new array when concatenating the array containing the new hash to the array produced by Splat array true, even though Splat array true has already allocated a new array for this purpose.

00:03:52.360 However, something doesn’t seem right, as these instructions should only result in three array allocations and one hash allocation. So where does the final array allocation take place? The final array allocation occurs on the caller side of the method, which duplicates the array of arguments because it needs to remove the empty keyword Splat from the array. Ruby 3.2 treats code like this, which has a positional Splat and a keyword Splat, as if it has a single Splat containing both the keyword Splat and the positional Splat, and it was not efficient about allocations when doing this. Sadan fixed this in Ruby 3.3.

00:04:47.240 In Ruby 3.3, the generated instructions are much simpler because the positional Splat and the keyword Splat are separated; the keyword Splat is not combined into the positional Splat array, so it avoids the caller side array allocation. My first change to reduce implicit allocations involved implementing this particular change. However, after Sadan's changes, the instruction still used Splat array true, which allocated a new array. I determined that it was unnecessary to allocate a new array in this case, as Ruby was no longer appending the keyword Splat to the array. I changed it to Splat array false, which reduced the allocation.

00:05:21.479 As I worked on this fix, I noticed that it affected other cases. I decided I wanted to address this issue. My goal is for Ruby to have no unnecessary allocations. I believe Ruby should only allocate an object if it is necessary to do so. This goal has two corollaries: first, Ruby should allocate at most one array implicitly during method calling. While there are cases where Ruby needs to allocate an array, such as for a positional Splat parameter, there should be no reason for it to allocate multiple arrays implicitly. Second, Ruby should allocate at most one hash implicitly during method calling. A keyword Splat parameter may require a hash allocation, but there should be no reason to allocate multiple hashes implicitly.

00:06:36.960 My first step towards reducing allocations was to fix additional cases where Ruby used Splat array true when Splat array false would work. For example, in Ruby 3.2, there was an instance where no objects needed to be allocated during a method call with a method accepting an optional positional argument. In this scenario, even Ruby 3.2 does not allocate any objects. However, if you change the call to pass both a positional argument and a positional Splat argument, then Ruby 3.2 will allocate an array due to it using Splat array true instead of Splat array false.

00:07:49.280 Initially, I tried making these changes directly in the compiler, but I found that it was quite challenging. The compiler is the third step in converting Ruby source code into virtual machine instructions. The first step is the lexer, which separates the source code into tokens. The second step is the parser, which builds an abstract syntax tree from the token stream. The third step is the compiler, which takes the abstract syntax tree and generates virtual machine instructions, followed by the optimizer, which modifies the instruction list to use more efficient operations. If you want to optimize code dependent on the abstract syntax tree, those changes are best made in the compiler.

00:08:42.680 However, if you want to optimize code for all cases where specific combinations of instructions are used, regardless of the abstract syntax tree, those changes are best made in the optimizer. For instance, to remove the unnecessary array allocation in a method call, we need to change Splat array true to Splat array false. However, you cannot just change it in all cases; sometimes Splat array true is necessary for the correct behavior. The optimization requires analyzing what instructions follow the Splat array instruction.

00:09:58.200 If a send instruction follows immediately after the Splat array instruction and supports an argument Splat, but does not use keywords or a block, then it's safe to change it. To implement this change, I added the code to the optimizer that checks if the current instruction is a Splat array instruction with an argument of true and recheck the conditions that allow a transition to false. The optimizer checks for the next instruction being a send instruction to confirm safe transitions.

00:11:10.840 To ensure everything works seamlessly, I needed Ruby to output the instructions before optimization. This is achievable using a specific dump option prior to Ruby 3.4. Here are the instructions for the method call before optimization. The change from send to 'opt send without block' happens later in the optimization pass. At the point of executing this particular optimization, the instruction is still send and will match correctly.

00:12:03.520 In working on these optimizations, I also looked into special call info flags. The call info flags contain details about the method call; for example, ARG Splat means the method uses a positional argument Splat. F call means the method call does not have an explicit receiver or the explicit receiver is itself.

00:12:39.760 We are looking for instructions that contain the args Splat call info flag. This optimization is only safe if the method does not pass keywords or use a block, so we need to verify these call info flags' absence before moving forward with the elimination of array allocations. Finally, I updated the Splat array instruction argument to false, thereby eliminating unnecessary array allocations.

00:13:28.320 Some cases optimized were for positional Splats used in combination with keyword Splats, blocks, and literal keywords. I originally developed the optimization for when a positional Splat is followed by a keyword Splat as compiler changes but switched it to an optimizer change after working on related optimizations. Another case involved when a positional Splat followed by literal keywords was treated similar to keyword Splats, but the instructions differ slightly in terms of how keyword Splats are generated. The final optimization dealt with the combination of positional, literal keywords, and blocks. However, optimizing the latter case revealed a bug related to method calls accepting both.

00:14:32.640 During this period, I also found general issues such as the evaluation order issue, especially when keywords and block arguments tend to be passed incorrectly. These revealed potential performance penalties. To address the bug, I introduced a new instruction, Splat keyword, which is invoked directly before send instructions that include both keyword and block arguments, ensuring that keyword Splat would execute correctly before calling the appropriate block.

00:15:52.320 Unfortunately, fixing this bug sometimes caused previously implemented optimizations to break, leading to further analysis and adjustments. I noticed that super calls with explicit arguments presented similar evaluation order issues. Fixing one bug in passing keyword arguments through operator assignment syntax revealed constraints about how Ruby discounts block transformations if mismanaged. Consequently, I rectified these bugs in Ruby 3.3 to enhance overall function.

00:16:35.760 Up until now, all optimizations we discussed have been available in Ruby 3.3. The remaining presentation will cover additional implicit allocation reduction optimizations targeted to be included in Ruby 3.4. For instance, we have an example where multiple positional Splats are passed to a method that has a positional Splat parameter, which in Ruby 3.3 allocates three arrays. Ruby’s internal calling API does not support multiple Splats, though Ruby implicitly combines them into one.

00:17:35.520 As previously mentioned, the first allocation occurs with Splat array true, which duplicates the first positional Splat local variable. This means we cannot eliminate this allocation since Ruby requires a singular array representation for the Splat. The second positional Splat utilizes the Splat array false instruction not allocating any arrays; Ruby then combines them into a single array, which currently creates a duplication due to the call E side. That’s why I worked on implementing flags that allow the caller to notify the callee of pre-allocated arrays.

00:18:18.320 By introducing the ARG Splat mute call info flag, we reduce allocations further down to two arrays; however, this is still more than the goal of at most one array. To achieve our target, it’s essential to eliminate the array allocation for concatenation. The challenge arises from the lack of a dedicated virtual instruction for this purpose. Consequently, transitioning concat array to concat 2 array became necessary, as this allows Ruby to directly mutate existing arrays, which lifts performance efficiency to a new standard.

00:19:23.880 With the adjustments I've made, that method call which previously allocated three arrays will now only allocate one. Subsequently, I shifted attention to the case where a positional argument follows a positional Splat, leading to four allocations in Ruby 3.3. Here, they produce identical instructions just like the multiple Splat case, except with an additional instruction for wrapping.

00:20:91.440 By removing the Call E side allocation, we lower total allocations to three. In seeking solutions toward a viable outcome, I realized we needed to eliminate new array allocations entirely and integrated a push to array instruction, which effectively moves the right object from the VM stack directly into the array at the top. By combining AR spat mute and push to array, Ruby 3.4 will only allocate a single array for this call.

00:21:37.600 We also have a method that abides by Ruby's positional Splat parameter rules where theoretically avoiding allocations becomes a challenge. During efforts to resolve the identified problem, I deduced it’s possible to disable object accesses via anonymous Splat parameters, a feature initially recognized in Ruby 3.2. This feature ensures that when using anonymous Splat parameters while calling other functions, no allocations are made unnecessarily.

00:22:59.760 Upon implementing the protocol, I discovered it was vital to track anonymous states using ISC parameter flags, checking the presence of non-rest flags while invoking methods to eliminate duplications. The positive outcome rendered all allocations negligible during this method invocation. This enhancement has proven beneficial for users, mitigating costs on multiple methods of indirection effectively.

00:23:46.560 Moving forward, even generic argument forwarding saw performance boosts. My strategy effectively pivoted the old technique into an anonymous keyword Splat structure, ensuring the calls remained allocationless where conventional methods previously allocated excessive arrays and hashes. This adjusted optimization has offered Ruby clients enhanced service without inherent costs.

00:24:33.560 Finally, as I wrap up this segment, I hope my journey through these optimizations demonstrated how elusive bugs arise while pursuing enhancement goals. I believe that prioritizing errors holds immense benefits over mere performance gains and reiterate that correctness must prevail as the foundation of any code changes. We’ve discussed allocation reduction optimizations introduced in Ruby 3.3 and 3.4 in hopes of producing meaningful impacts such as further talks at future Ruby events.

00:25:32.240 If you enjoyed this presentation and want to learn more about my thoughts on Ruby programming, please consider picking up a copy of 'Polished Ruby Programming.' That concludes my presentation, and I’d like to thank all of you for listening, and a special thank you to Shopify for sponsoring my travel to RubyKaigi.