The Second Oldest Bug

Performance

Jeremy Evans

The Second Oldest Bug

Jeremy Evans • November 13, 2023 • San Diego, CA

In this presentation, Jeremy Evans discusses the second oldest open bug in Ruby's bug tracker, identified as bug 4040, which involves a SystemStackError triggered by calling methods with very large array arguments. Over the years, Ruby's handling of method calls with large argument arrays has evolved significantly, from crashing with a core dump in Ruby 1.8 to raising SystemStackError in later versions. The bug primarily affected methods defined in C and persisted through Ruby 3.2. Evans details the journey to fix this enduring issue and the complexities that arose during the process. The key points include:

Historical Context:
- The initial behavior in Ruby 1.8 resulted in crashes due to method calls with too many arguments, which transitioned to raising SystemStackError in Ruby 1.9.
- Ruby 2.2 partially addressed the issue for Ruby-defined methods, leaving C-defined methods vulnerable.
Bug Fix Process:
- Extensive debugging techniques involved using GDB and examining how arguments were passed to methods.
- The solution introduced a method to manage larger argument arrays by allocating them on the heap rather than the VM stack, thereby avoiding stack overflow.
Implementation Challenges:
- Multiple method types needed to be considered, each requiring different handling strategies, particularly for methods defined by blocks.
- The ongoing effort felt like a game of 'whack-a-mole' as addressing one bug led to the emergence of others.
Performance Considerations:
- Initial fixes resulted in a minor slowdown in method calls, prompting optimizations that improved performance in certain cases.
- Changes made were discussed extensively within the Ruby community, considering their potential impact.
Integration and Future Outlook:
- The comprehensive fix was merged into Ruby 3.3, but the possibility of reverting the patch exists if future performance concerns arise.

The main takeaway is that persistence and collaboration within the Ruby community can lead to resolving long-standing issues. Evans encourages other developers to engage with Ruby's bug-fixing process, underscoring the value of determination and teamwork in improving programming languages.

The Second Oldest Bug
Jeremy Evans • November 13, 2023 • San Diego, CA

Historically, calling a method with a very large number of arguments resulted in a core dump. In Ruby 1.9, this was improved to instead raise SystemStackError. In Ruby 2.2, the issue was fixed for methods defined in Ruby. However, in Ruby 3.2, this is still an issue for methods defined in C. This issue was reported as a bug over 12 years ago, and was the second oldest open bug in Ruby's bug tracker. Come learn about stacks, heaps, argument handling during method dispatch, and how we fixed this bug in Ruby 3.3.

RubyConf 2023

00:00:18.520 Hello, everyone! My name is Maple, and I'm on the program committee this year. It's nice to meet all of you. I used to work with Ruby internals and recognized the name of our next speaker. When I saw the opportunity to introduce him, I immediately put my name down. So, next up, we have Jeremy Evans. Jeremy is a Ruby committer who focuses on fixing bugs in Ruby. He is the lead developer of the SQL database library, the Roda web toolkit, and many other Ruby libraries. He is also the author of "Polished Ruby Programming," and we have two copies here that you can grab after the talk. I've got mine signed, and it’s an excellent read!

00:01:11.360 So, please give it up for Jeremy!

00:01:18.799 Alright, hello everyone! In this presentation, I'll be discussing what was Ruby's second oldest open bug and how we fixed this bug in Ruby 3.3.

00:01:23.960 My name is Jeremy Evans. I am a Ruby committer who focuses on fixing bugs in Ruby. I’m also the author of "Polished Ruby Programming," which was published a couple of years ago. This book is aimed at intermediate Ruby programmers and focuses on teaching principles of Ruby programming as well as trade-offs to consider when making implementation decisions.

00:01:41.399 So, what is the second oldest bug that I am discussing? I'm referring to bug 4040, which has the subject "System stack error with hash splat for large arrays." The reproduction code given in the bug report was this simple Ruby code that calls the array reference method on a hash.

00:01:57.159 This code uses a splat of a very large array with over a million entries. Interestingly, this bug is not specific to that hash method. At the time the bug was reported, this issue affected any method when called with a splatted array. If the number of elements in the array was large enough, you could reproduce the bug when calling the kernel 'puts' method. You could reproduce the bug with methods defined in Ruby, and even with methods that do not accept any arguments, such as the kernel 'object_id' method. This is because the error occurs while setting up the arguments for the method call before actually calling the method.

00:02:30.120 Ruby's behavior when passing a very large array as a splat to a method has changed over time. Back in Ruby 1.8, passing too many arguments to a method would result in the Ruby interpreter crashing and producing a core dump, both for methods defined in C and methods defined in Ruby. In Ruby 1.9, this condition was recognized, and a SystemStackError was raised instead for both cases. Ruby 2.0 and 2.1 exhibited the same behavior as Ruby 1.9.

00:03:08.480 In Ruby 2.2, Kichi Sada, the author of the Ruby Virtual Machine, fixed this issue for methods defined in Ruby that accept splats. This same behavior remains in Ruby 3.2. Now, as you can probably guess from this presentation, we fixed this issue in Ruby 3.3. So now calling the 'object_id' method with a large array splat correctly results in an ArgumentError. If you switch to the example code given in the original bug report, you can see that it now works correctly and returns a hash with a single entry.

00:04:04.799 Now, some of you may be thinking, "Who cares about this issue? Surely, nobody is splatting such a large array, right?" But do any of you have code in your applications that does something like this, where you splat a variable when calling a method? I’m assuming most of us do. Is there any chance you have code somewhere in one of your applications that accepts user input and assigns it to a variable? Again, probably most of us have code like this. Now, any chance you have that variable with user input that is later splatted? This becomes less likely, but I’m guessing at least a small percentage of the audience has code that splats user input. If so, you need to recognize that this probably allows a user to remotely trigger a SystemStackError.

00:04:36.000 Let's say your application deals with user input and you want to handle unexpected errors using 'rescue.' Unfortunately, this does not work the way you expect. If the call to your method raises a SystemStackError, your cleanup method will not be called, and that's because this rescue does not catch SystemStackError. 'Rescue' without an argument is the same as 'rescue StandardError,' and SystemStackError is a subclass of Exception, not a subclass of StandardError. So, let's say you expect some method to raise non-standard exceptions and you want to do some cleanup using a couple of methods and then re-raise the exception.

00:05:09.520 What's the problem here? Well, the problem is that your cleanup method also raises a SystemStackError, which means that your other cleanup methods will never be called. Now, hopefully, these examples were helpful in demonstrating that this issue may exist in your own code and it may be triggerable remotely. Now, I'm going to discuss details about why this bug occurred. I'm going to focus on how arguments were handled for methods defined in C in Ruby 3.2, which will generally apply to earlier Ruby versions as well.

00:05:46.479 For background, I will first discuss how methods are defined using Ruby's C API. Here's an example of how 'kernel' itself is defined in Ruby. The part of the code is the C function for the method, and this section shows how the C function is registered as a method in Ruby. The kernel does not take any arguments, so you pass zero as the number of arguments when registering the method. Since the Ruby method does not accept any arguments, the C function is called with a single argument, which is the receiver of the method.

00:06:13.639 The value here is the C type used for all Ruby objects, and both the C function argument and the C function return value must be Ruby objects. To start working on fixing this bug, I needed to find out where the issue was occurring. One of the tools I use most when debugging is 'grep.' Another tool I frequently use when debugging C code is GDB, the GNU debugger. To get a general idea of how method calling works, we run Ruby through GDB and tell Ruby to execute the 'itself' method. Then we set a breakpoint on the underlying C function that will be called.

00:06:58.400 If you're building Ruby with support for shared libraries, most of Ruby is contained in a 'libruby' shared object and not in the Ruby binary itself. So, you have to tell GDB that you want this breakpoint to be pending a shared library load. Then GDB sets the pending breakpoint, allowing you to tell GDB to run the Ruby program. GDB then runs Ruby until it hits that breakpoint. At that point, you can use the GDB backtrace command to see how you got here. The simplified backtrace output can be split into four basic sections. The bottom section encompasses everything that Ruby runs before calling the 'itself' method. The second section comprises generic method calling code, regardless of how the method was defined. The third section consists of code specific to calling methods defined in C, and at the very top, we have code specific to the 'itself' method.

00:08:34.760 Since this bug occurs when calling methods defined in C, it seems likely that the problem lies in one of these three functions or in one of the functions they call. While the backtrace is helpful, it would be best if we could see exactly where this SystemStackError is being raised. We can start that process by grepping to see where SystemStackError is defined. It’s always nice when you get a single result back from this line. It shows that the SystemStackError Ruby class is stored in a C global variable named 'rb_eSystemStackError.' If you grep for that, you'll find a bunch of results, but I'll filter them out for simplicity.

00:09:12.799 Now, the 'stack level too deep' string gives a pretty good indication that we're on the right track. Here we see it called 'rb_vm_register_special_exception.' I do not know what this function does, but based on the name, it probably registers an exception. Since we know that 'rb_eSystemStackError' is a reference to SystemStackError, we can probably assume that the first argument is being registered with it. When we grep for 'Ruby errors to stack,' we only get four results, one of which we can ignore as it's the same as the previous grep. Now, this line seems interesting. What if we go to that file and look at that line? Here's the code around that line, and there's one large indication that this is what we are looking for—the function name, which contains the words 'stack overflow.'

00:09:56.760 It looks like this is the function that's called if the stack would overflow, allowing you to raise a SystemStackError instead of having the program crash. We can use GDB to verify that. In this case, we're going to run code that should trigger the stack overflow and then set a breakpoint on the 'rb_vm_stack_overflow' function. Then we run the program, and thankfully we guessed correctly. The breakpoint is hit, and we can use the backtrace command to see how we got here. Most of the backtrace is the same as the previous one, so I've highlighted the new lines. These top two lines are both for functions with 'stack overflow' in the name, likely called when a potential stack overflow has been detected.

00:10:47.760 However, they are not the cause of the stack overflow. The function directly before those lines, named 'vm_caller_setup_arg_splat,' is the probable cause. Here’s the definition of that function: it determines which object is being splatted. If the object being splatted is not nil, we assume it is an array. We then determine the length of the array to splat, and in the normal case, we copy all objects from the splatted array to the VM stack.

00:11:20.000 Some of you are probably wondering what the VM stack is. The VM stack refers to the stack used by Ruby's virtual machine. Virtual machines are generally either stack-based or register-based, and Ruby's virtual machine is stack-based. Ruby's compiler compiles your Ruby code into virtual machine instructions that operate on the stack, and values on the VM stack are Ruby objects.

00:12:06.360 To understand the underlying SystemStackError issue, it would help to explain how things are laid out in memory. This explanation will be oversimplified, brief, and at least partly wrong, but I hope it helps some people. You can consider memory as continuous storage from the lowest memory address, represented here by all zeros, to the highest memory address, represented here by all Fs. Ruby is written in C, and while C does not use a virtual machine, C uses a stack to store stack frames. There is generally one stack frame per C function call, and this stack frame holds the memory locations for the local variables for that C function call.

00:12:57.360 When C needs to store data that is not stored on the stack, it first needs to allocate memory from the operating system, which is often done using the 'malloc' function. The section of memory where these allocations are located is often referred to as the heap. Ruby uses 'malloc' and other functions to allocate most of the memory it uses. Therefore, most of Ruby's memory usage is on the heap. One of the things that Ruby stores on the heap is the object pool, which is where Ruby objects are stored in memory. For large objects, sometimes they don’t fit in memory and are stored elsewhere on the heap.

00:13:41.160 Another thing Ruby stores here is the stack for Ruby's virtual machine. This is what this presentation refers to as the VM stack, and this is where Ruby stores stack frames for Ruby method calls and temporarily stores objects that are passed as arguments to those methods. If you try to pass too many arguments to a method call or have unbounded recursion, your program will try to write beyond the end of Ruby's VM stack. Since Ruby 1.9, Ruby recognizes when you're about to do this and raises a SystemStackError instead of having the program crash.

00:14:04.880 In this presentation, the term VM stack refers to the stack used by the currently executing Ruby code. This could be the main VM stack or one of the separate thread or fiber stacks. If we go back to the initial example, we saw that we needed over a million arguments to trigger this issue because this uses the main VM stack. However, you only need about 132,000 arguments to trigger this issue in a thread stack. One thing to be aware of is that the majority of request handling in Ruby web applications will be on a thread stack and not on the main VM stack. If your Ruby web application accepts JSON input and will splat the result of JSON input, triggering a SystemStackError remotely may be possible with less than 300 kilobytes of uploaded data.

00:14:54.040 If you're on a Ruby web application using a fiber per-request approach, like in Falcon, it may take only 16,400 arguments to trigger a SystemStackError. Furthermore, if you're tuning your Ruby web application to decrease the stack size per fiber or thread for better garbage collection performance or reduced memory usage, triggering a SystemStackError may be possible with just over 2,000 arguments. With that background, let’s get back to our example.

00:15:37.720 To refresh, we discussed this section, which copies all the objects from the splatted array to the VM stack. After this, we increase the number of arguments the method is called with, using the length of the splatted array, subtracting one for the array itself. However, before doing any of this copying, we check whether copying all of the arguments from the splatted array to the VM stack would overflow the stack so that a SystemStackError can be raised instead of the program crashing. This is the check that gets hit when you splat a large array. Unfortunately, there is no way to avoid this issue as long as you take the approach of copying the arguments to the VM stack. The only way to avoid this approach would be to not pass the arguments on the stack.

00:16:40.959 If we go back to the initial backtrace when calling 'itself' without any arguments, we see the function calling 'rb_object_itself' is named 'rb_vm_call_cfunc.' The prototype for that function looks like this: it takes the receiver of the Ruby method, the number of arguments the method is called with, a pointer to the first argument for the method, and a function pointer for the C function to call that implements the Ruby method. So, taking a step up, the function calling 'rb_vm_call_cfunc' is named 'VM_call_cfunc_with_frame,' and that function is large, so I’m going to focus on the line calling 'rb_vm_call_cfunc.' You can’t really tell that it’s calling 'VM_call_cfunc' because that’s actually stored in an invoker function.

00:17:25.399 The important part here is that the value being passed is a pointer to the first argument. In this case, it's always passing a pointer to the VM stack, and the VM stack is quite limited in size. If we could change this call to pass a pointer to somewhere else in memory, calling methods defined in C with a large array splat should work. We can start that process back in the VM_caller_setup_arg_splat function. Let’s reduce the code and focus on the case where there is an array to splat. If some condition is true, we should use an approach that does not copy arguments to the VM stack.

00:18:11.999 One simple condition is that if the number of arguments being passed to the method plus the length of the splatted array exceeds some high number, say a thousand, then we should not use the VM stack for argument passing. But how should you pass the arguments if you're not going to pass them on the VM stack? One of the easiest ways to do this is to create a temporary Ruby array for the arguments, and instead of passing a pointer to the stack, you pass a pointer to the first element of that temporary array. Here’s the code that implements that approach.

00:18:56.560 We begin by creating a Ruby array with the expected capacity. We then hide the array to ensure that it is not accessible via the object space. Next, we copy the method arguments before the splatted array into the temporary array. Then, we copy the elements of the splatted array into the temporary array. We still pass a pointer to the temporary array of arguments on the VM stack, so we need to adjust the stack pointer to reflect that. Since there is only one argument on the VM stack, we adjust the calling information to only show one argument. This is correct from a stack perspective, but it is incorrect from a method-calling perspective.

00:19:38.400 Finally, we set a new flag in the calling information named 'HeapArg' to indicate that the arguments are being passed on the heap in a temporary array instead of on the VM stack. Some of you may be wondering how the code that copies the arguments before the splat and the splat itself handles any arguments after the splat. It turns out you don't need to worry about that because the compiler implicitly converts method calls with arguments after the splat into a single array that is then splatted.

00:20:25.679 So, the method argument handler does not need to worry about arguments after the splat. However, if you splat multiple arrays like this, Ruby combines them both into a single array that is then splatted, so the method argument handler does not need to worry about multiple splats either. However, what about keyword arguments, either literal keyword arguments or keyword argument splats? It turns out this is a single case since Ruby's compiler compiles a literal keyword argument after an argument splat by creating a new hash and then keyword splatting that hash. In Ruby 3.2, for all of these calls, the compiler will actually combine the argument splat and the keywords into a single array that is splatted and then set a special flag.

00:21:13.760 Now, in Ruby 3.3, Kichi changed the compiler to not combine the keywords into the argument splat, which avoided three unnecessary array allocations and an unnecessary hash allocation. However, that means the code in 'rb_vm_call_cfunc' does need to handle keyword argument splats to get the correct behavior. So, to ensure that the keyword splat is passed properly when using a temporary array to hold the arguments, we push the keyword hash onto the array, but only if the keyword hash is not empty since empty keyword hashes are ignored.

00:22:06.240 Alright, back to the function for calling methods defined in C. Now that we have set up the temporary array for arguments on the heap, we need to change the code to use the pointer to the first element of that array instead of the pointer to the VM stack. We also need to recognize when to use the temporary array for arguments, which we can do by checking for the 'HeapArg' flag. Here’s the code for using the temporary array for arguments. We get a pointer to the temporary array containing the arguments, and from there, we get a pointer to the first element of the temporary array.

00:22:57.600 Now, that pointer points to the heap and does not point to the VM stack. We use the length of the temporary array as the number of method arguments, and finally, we call the C function using the correct number of arguments and a pointer to the first element of the temporary array. While the changes mentioned are the most important ones, there were a few adjustments needed in other functions just to check if this 'HeapArg' flag is set and handle things differently in that case. I don't have time to discuss all the other places that needed to be changed, but it required adding about five different checks for this 'HeapArg' flag.

00:23:49.920 After making those changes, it didn’t take much more work to get Ruby's test suite passing with the updates. Unfortunately, while the tests passed, there was a minor slowdown as a result. The additional checks slowed down method calling microbenchmarks by up to 7%, and considering that few users are affected by this bug, slowing down method calling to such a degree is not acceptable. I tried a couple of different approaches to speed up the implementation, and ultimately I was able to minimize the performance difference to a level that even the microbenchmark did not show it as significantly slower.

00:24:44.160 However, my approach was a little invasive, making multiple changes to a couple of methods used in the generic method dispatch code. Since this issue had already been fixed for methods defined in Ruby, and we were trying to fix it for methods defined in C, it would be preferable if the changes could be made only to the dispatch code for methods defined in C. In discussions with Kichi, I mentioned we could switch to that approach, but it would result in more duplication, and Kichi took it upon himself to implement that non-invasive approach. It turns out I was right about the duplication.

00:25:36.640 Now, here’s the 'VM_call_cfunc' function before the changes. To give you a correct perspective, I’m going to shrink the font size. Now let's see the changes: as you can see, there is quite a large increase in lines. These two lines in the normal case, when you're not passing a large argument splat, use the standard argument setup functions, and those two lines are replaced with expanded and customized versions in the case where you are splatting a large array. Now there's a maintainability trade-off here.

00:26:18.720 One option is to add features to the generic functions, making the functions more complicated. This can make the generic functions slower and more difficult to debug code that does not benefit from the features. On the plus side, it can make maintenance easier, especially if a future code can also benefit from the features added to the generic functions. My approach chose this trade-off, while the other option is to duplicate the parts of the generic functions you need and modify them for a specific use case. This can result in faster code and localizes the change, but can complicate maintenance if you need to make the same change in multiple places.

00:27:22.920 Ki's post-patch was merged back in January, ensuring that the second oldest bug would be fixed in Ruby 3.3. Or so I thought when I started working on this presentation. It turns out this bug is not a single bug but a general class of bugs. My initial patch and Koichi's commit only fixed a single instance of this class of bugs. Therefore, this issue was not fixed—or at least not completely fixed. Now, I’m going to go over some code that still raised the SystemStackError even after the patch was committed.

00:28:41.600 I mentioned earlier that Kichi made changes in Ruby 2.2 to allow this code to work; however, if you replace the normal singleton method definition with the equivalent defined singleton method call, it raises a SystemStackError. To understand why normal Ruby method definitions work but the block-based approach fails, you need to understand that Ruby has multiple method types, each of which is handled differently internally. Ruby has a C function named 'VM_call_method_each_type,' which handles all of the different method types that Ruby supports. A simplified version of that function is shown here.

00:29:56.440 This function uses a C switch statement on the type of the method, similar to a Ruby case expression. Normal Ruby methods defined with 'def' use the ISC method type. The logic specific to calling ISC methods is in 'VM_call_isc_setup,' and this is the code path that Kichi fixed in Ruby 2.2 to allow methods to accept large argument splats. Ruby methods defined with C functions use the C funct method type, and logic specific to calling C funct methods is in 'VM_call_cfunction,' this is the code path fixed by my original patch and Koichi's patch. Ruby methods defined with 'defined_method' or 'define_singleton_method' use the B method type, which is likely short for block method because these are methods defined by blocks. The logic specific to calling B methods is in 'VM_call_B_method' and this is one code path that had not been fixed, which is why calling a B method with a large argument splat raises a SystemStackError even after Koichi's patch was merged.

00:31:10.640 Unrelated to the current bug but useful to note is that Ruby uses a call cache to improve performance. The first time you call a method, the call cache directs you to the slower generic method dispatch functions. However, once Ruby figures out a more specific method handler, it updates the call cache so that future method calls at the same call site can jump directly to the more specific method handler. Going back to the 'VM_call_B_method' function, we need to determine why it fails. Here’s the definition of the 'VM_call_B_method' function. There’s one similarity here with the 'VM_call_C_func' function we fixed; all callers of this function are vulnerable to this issue.

00:32:40.919 In Ruby 3.2, all method types other than ISC use 'call_setup_arg,' which is why they were all vulnerable to this issue. I mentioned earlier that my initial patch to fix this issue made the generic argument setup code more complicated, while Koichi’s patch was less invasive because it only modified the logic for C funct methods. The duplication approach may be preferable when you only have a problem in a specific case; however, because all usage of 'call_setup_arg' is vulnerable to this issue, the duplication approach is not maintainable. The only maintainable approach that fixes this issue in all cases requires complicating the generic code.

00:33:23.680 Here's the initial code I discussed earlier, to avoid the SystemStackError. Now, between the time I originally submitted my pull request and when I started working on this presentation and discovered these additional bugs, Kichi had modified the related code to significantly improve performance. So, I had to make some changes to my original patch to adjust for Kichi's changes. One of the changes is that at the point this function is called, the splatted array has already been removed from the VM stack. Another change is that any keyword splat is passed as a separate argument and not as the last element of the splatted array. Instead of calling 'RB_ARRAY_NEW_CAPACITY' and 'RB_OBJECT_HIDE' functions, I found there was an 'RB_ARRAY_HIDDEN_NEW' function that combined those two features, so I switched to calling that.

00:34:33.599 The capacity of the array is set to the number of arguments before the splat, plus the number of arguments in the splatted array, plus one extra for a possible keyword hash. The other important change is that I switched to storing a pointer to the temporary array in the calling structure instead of only storing the pointer on the C stack or the VM stack. This was mostly for simplicity, so I could easily reference the temporary array later. If we go back to the 'VM_call_B_method' function, we can modify it to use this new approach.

00:35:26.640 Here are the changes needed. The first change is the addition of an argument to 'call_setup_arg' to indicate whether it's safe to use a temporary array for arguments. In this case, it is safe; if there was a temporary array created to handle a large argument splat, we get a pointer to the first element in that array. This behavior differs from allocating space on the C stack and copying the arguments from the VM stack to it. We need to decrement the stack pointer by two since the array is considered a single argument, and this adjustment is essential to avoid mismatches and fails in C function pointer consistency checks.

00:36:23.320 Unfortunately, this fix did not solve the issue for B methods. There is a similar issue that arises later in the 'VM_call_method_missing' function. After calling 'VM_call_B_method_body,' the VM calls a function named 'invoke_ISA_block' from C, which again copies all arguments to the VM stack, triggering a SystemStackError if copying the arguments would overflow the stack. If you remember a few slides ago, the code that called this function copied the arguments from the VM stack to the C stack and then copied the exact same arguments back to the VM stack. This unnecessary copying is one of the reasons that calling B methods is slower than calling normal Ruby methods.

00:37:02.720 I worked around this issue by using the same approach I used in the 'call_setup_arg' code by creating a temporary hidden Ruby array to store the arguments. If the number of arguments provided exceeds the limit, we call a newly added 'VM_rb_array' function to create a temporary Ruby array, copy the arguments from 'argv' into it, and return a pointer to the Ruby array. We actually always pass an array of two elements; in this case, the first argument is the temporary array, and the second argument is always a keyword hash. If 'keyword_splat' is true, the last argument in 'argv' is the keyword arguments, so it’s removed from the array and placed as the keyword hash. If 'keyword_splat' is false, we add an empty hash for keyword arguments so Ruby will not treat a flagged keyword hash passed as a regular argument as a keyword.

00:37:55.760 With those changes, you can now call methods defined with 'defined_method' or 'define_singleton_method' using larger array splats. However, this was not the only case that previously failed. There were other cases that had similar issues. While splatting a large array started working with normal Ruby method definitions in Ruby 2.2, if you called the same method using 'send', you would still get a SystemStackError. If you used 'symbol_to_proc' and called the resulting proc with a large array splat, you also got a SystemStackError. If you created a method object for the method and called that with a large array splat, it would raise a SystemStackError.

00:38:45.200 If you defined 'method_missing' and called a method that did not exist with a large array splat, it would raise a SystemStackError. Additionally, passing a large number of arguments in a C extension using 'rb_yield_block' also resulted in a SystemStackError. These were the most important cases that needed to be handled where it could be useful to pass a large number of arguments. There were also some cases where the method type only accepts zero or one argument, but passing a large array splat would raise a SystemStackError instead of an ArgumentError.

00:39:37.160 For example, if you go back to the normal Ruby method definition and you change this argument splat to a regular argument when calling the method with a large array splat, you should ideally get an ArgumentError raised. However, Ruby raises a SystemStackError in that case. Similarly, if you have a class that uses 'attr_reader', calling the method defined with a large array splat raises a SystemStackError instead of an ArgumentError. This same issue is true of 'attr_writer' as well. If you create a struct class, calling a member reader method for an instance of that struct with a large array splat raises a SystemStackError instead of an ArgumentError. The same issue persists for the member setter method.

00:40:24.240 Squashing this bug for both methods that accept large numbers of arguments and to ensure that the correct exception is raised for method types that only accept zero or one argument took quite a long time. One reason for this was that this issue was not a single bug but a general class of related bugs. Another challenge was that fixing these bugs felt like an ongoing game of whack-a-mole, where you'd fix an instance of a bug, and that would break other test cases, prompting you to fix those cases, and just when you thought you were done, something else would pop up.

00:41:09.919 It's worth noting that almost no code in Ruby's test suite passes a large array splat, so running the test suites that come with Ruby will be unlikely to catch issues with the changes I was making. To gain higher confidence that the bug fixes were actually working, I did most of my debugging by enforcing temporary array usage for all method calls with splats instead of the default behavior of only creating a temporary array for splats of large arrays. Eventually, I was able to get the entire test suite passing, fixing all cases I was aware of that raised a SystemStackError when passing a large array.

00:41:48.480 However, just because all bugs have been fixed does not necessarily mean that the fixes should be committed. When deciding whether to commit anything, it's essential to weigh the costs and benefits and feel that the benefits are worth the costs. In this case, the benefit is that you should be able to splat an array when calling any Ruby method and get the expected behavior regardless of the size of the array. I think that's a significant benefit, but the actual need to pass a large array as a splat is rare, and it's almost always better to structure your code to pass a large array as a normal argument.

00:42:22.680 Additionally, as I showed earlier in the presentation, if this issue is not fixed, it may be possible to remotely trigger a SystemStackError in some Ruby web applications. Now, there are significant costs associated with the fixes: for one, the fixes are invasive, much more so than my initial attempt to fix this issue just for C funct methods. Furthermore, there is a minor slowdown associated with this change. With all the work to improve Ruby's performance, committing this patch can feel like a step backward.

00:42:58.680 One reason for this slowdown is the extra checks added for every call to a method that’s not written in Ruby. This entails extra checks for whether a temporary array is used, even if the method call did not use a splat in some cases. Furthermore, the code to handle arguments in temporary arrays also inflates the code, resulting in additional instruction cache misses because less of Ruby's code may fit in the processor's instruction cache. At first, I thought it would be unlikely that committers would accept the minor slowdown from these corner cases.

00:43:44.680 Fortunately, during my work on fixing these bugs, I learned a lot about Ruby's internals. With this knowledge, I developed a series of patches to optimize method calling in certain cases. My goal with these patches was to offset the minor slowdown introduced by the bug fixes. One optimization was to change B method calling. Here’s the code for 'VM_call_B_method.' One thing to notice is that this function always calls 'caller_setup_arg,' which allows it to flatten the arguments into a C array and pass a pointer to the first argument to 'VM_call_B_method_body'.

00:44:18.680 I determined that this call to 'caller_setup_arg' is unnecessary in the common case where the block for the B method is defined in Ruby. However, it is needed in other cases where the block is defined in C or created by 'symbol_to_proc.' Thus, I decided to split B method call handling into two paths: one to handle the case where the block was defined in Ruby and another for other blocks. All this code is essential to determine whether the block is defined in Ruby. If the block is defined in Ruby, we call an optimized function that does not need to use 'caller_setup_arg.' If the block is not defined in Ruby, we call another function that does utilize 'caller_setup_arg,' similar to the previous implementation.

00:44:58.680 In either case, before calling the function, we update the call cache. We do this so that future calls at the same call site can jump directly to the optimized function and skip all of this code. I added benchmarks and found that these changes improved B method calling by 40% in simple cases and up to 180% in keyword argument cases. Another optimization was related to method missing calls. Here is the top of 'VM_call_method_missing_body' before the bug fix and optimization.

00:45:36.680 I determined that the call to 'caller_setup_arg' here is entirely unnecessary, as method missing calls do not modify existing arguments; they only insert an argument before those arguments. Thus, the call to 'caller_setup_arg' can be removed completely. However, we do still need to fix the calling flags. The calling flags include information about whether the call includes an argument splat or keyword arguments. Prior to removing the 'caller_setup_arg' function call, new calling flags were created. However, when removing this function call, we can simply copy the calling flags from the original method call that resulted in the method missing. This adjustment resolves the issue and improves method missing calls by 10% for simple calls and up to 110% for calls involving keyword arguments.

00:46:38.360 Similarly, I made a change to symbol procs, which are procs created by 'symbol_to_proc.' This improvement resulted in a 5% performance boost for simple calls and up to 100% for keyword argument calls. I also made a simpler change for method calls using 'send,' which improved performance by 5% for simple calls and up to 115% for keyword argument calls. To see the overall effect of the bug fixes combined with the performance optimizations, I used 'wet_bench.' Wet Bench was developed by Shopify to test web performance but is also useful as a set of general benchmarks.

00:47:20.080 It contains 33 separate benchmarks, and the results showed that performance was about the same for 12 of the benchmarks. Eight of the benchmarks got slower, in some cases by up to 3% slower, while 13 benchmarks became faster by up to 10%. This indicates that the performance increase from the optimizations outweighed the performance decrease resulting from the bug fixes. After finishing those method calling optimizations, I brought the issue up as a topic back in April during the monthly developer meeting.

00:47:59.880 This allowed other committers to provide feedback and for Mats to decide whether the benefits of the changes were worth the costs associated with them. There was some concern over the performance and invasiveness of the changes, but ultimately, I received approval to merge them. Now, there was one remaining issue. Although I had fixed all the bugs I was aware of in the code, it's essential to add a caveat. In truth, I had only fixed all bugs I was aware of in the virtual machine.

00:48:45.480 It turned out that Wet did not support the changes I was making. The Wet tests on ARM64 generally failed, with occasional failures on AMD64. So, I discussed these issues with the Wet team, and thankfully, they were able to address the Wet issues. After fixing these Wet issues and completing one last round of testing, I was able to merge the changes, ensuring that the second oldest bug and all of its incarnations would finally be fixed in Ruby 3.3.

00:50:18.560 Second, fixing an old bug is often a learning process that teaches you useful things you probably wouldn't have learned otherwise. Without the experience I gained from fixing these bugs, I would not have implemented the performance optimizations I discussed. Finally, don't worry if you can't fix a Ruby bug completely by yourself; as this issue demonstrated, other committers will likely be available to help improve your bug fixes for them to get committed.

00:51:05.080 Currently, Ruby has over 50 open bugs in the bug tracker that are over five years old, just waiting for you to fix. We look forward to your contribution. I hope you had fun learning about Ruby's second oldest bug and how we fixed it in Ruby 3.3. If you enjoyed this presentation and want to read more of my thoughts on Ruby programming, please consider picking up a copy of 'Polished Ruby Programming.' And that concludes my presentation. I'd like to thank all of you for listening!

00:51:55.680 Thank you!

Jeremy Evans

@jeremyevans