RubyConf 2021

Optimizing Partial Backtraces in Ruby 3

Optimizing Partial Backtraces in Ruby 3

by Jeremy Evans

Main Topic

The video titled "Optimizing Partial Backtraces in Ruby 3" presented by Jeremy Evans at RubyConf 2021 focuses on the optimization of backtrace generation in Ruby, specifically addressing the improvements made in Ruby 3 regarding partial backtraces.

Key Points Discussed

  • Importance of Backtraces:

    • Backtraces are crucial for debugging in Ruby as they provide context about where errors occur in source code. Without them, pinpointing issues becomes significantly more challenging.
  • Performance Issues in Older Ruby Versions:

    • In earlier Ruby versions, generating even partial backtraces (a single frame) was slow due to the redundant processing of unnecessary frames. This inefficiency affects memory usage and overall performance.
  • Optimization in Ruby 3:

    • Ruby 3 introduced optimizations to backtrace generation, significantly speeding up the process by filtering out unnecessary frames in partial backtraces. This enhancement allows for much faster access to relevant debugging information.
  • Complexity of Implementation:

    • The implementation of backtrace handling is done in C within Ruby's VM. The presentation translates the complex C algorithms into simplified Ruby pseudocode to illustrate how backtrace generation occurs.
  • Bugs and Subsequent Fixes:

    • After the initial optimization, bugs were discovered in the new algorithm that led to missing or incorrect entries in the backtrace. Jeremy describes the process of identifying these issues and the complex workarounds used to solve them. It required multiple iterations of debugging and enhancements, eventually leading to the new algorithm that was easier to manage and less prone to errors.
  • Lessons Learned:

    • Jeremy highlights the importance of not shying away from unfamiliar code, the inevitability of encountering new issues after optimizations, and recognizing when a completely new approach might be necessary over tweaking an existing one.

Conclusion and Takeaways

The video concludes with a reflection on how the continuous process of optimization can lead to increased complexity in code. The newer algorithms proposed are not just about performance gains; they also consider maintainability and potential future error reduction. The advancements in backtrace generation introduced in Ruby 3 represent a significant step forward in improving Ruby developers' debugging experience.

Overall, Jeremy Evans emphasizes that while performance is crucial, it's equally important to ensure the robustness and clarity of the underlying code involved in these optimizations.

00:00:10.000 I'm going to be discussing backtraces in this presentation.
00:00:11.679 I will discuss why backtraces are important, the implementation of backtraces in Ruby, why the production of partial backtraces was slow in older versions of Ruby, and how now, with Ruby 3, it is faster.
00:00:15.519 I will also discuss bugs that occurred as a result of the optimization, and how to fix those bugs.
00:00:22.400 My name is Jeremy Evans, and I'm a Ruby committer who mostly works on fixing bugs in Ruby.
00:00:29.439 I'm also the author of 'Polished Ruby Programming,' which was published a few months ago. This book is aimed at intermediate Ruby programmers and focuses on teaching principles of Ruby programming, as well as the trade-offs to consider when making implementation decisions.
00:00:41.040 Alright, so backtraces. Why do we care about backtraces? Well, consider programming without backtraces. Let's say you're working on a library, and after working on it, you run your tests, and it turns out one of the tests fails with an argument error.
00:00:52.960 The error message gives you an idea that something went wrong, but without a backtrace, you have no idea where the error is occurring, which makes debugging much more difficult.
00:01:20.159 Backtraces themselves are important, but how important is their performance, specifically the performance of partial backtraces? I actually don't know the answer to that question.
00:01:26.400 As mentioned, I focus most of my time on the Ruby bug tracker and fixing bugs, not implementing performance improvements. However, it happened that another Ruby committer filed issue 17031 on the tracker regarding partial backtraces not being optimized.
00:01:58.559 The other committer mentioned that they found this issue when checking memory usage in RuboCop. Since this performance issue was filed as a bug and since I fixed many Ruby bugs, I did some research into it.
00:02:29.440 This was the first time I had looked into how backtraces were generated in Ruby, which was a learning experience for me.
00:02:31.120 I think understanding how backtraces regenerate is interesting, and I'm guessing most of you are unfamiliar with the process as I was before working on this optimization.
00:02:38.400 So, let's go over this process now. Before understanding the backtrace generation process, we need to understand what backtraces actually are.
00:02:52.959 As in most programming languages, backtraces in Ruby are a representation of the call stack. Let's start off with an example of a call stack.
00:03:13.680 We start with a method `foo` that calls `bar` with a block that calls `caller`, passing zero as the argument to `caller`. This makes the backtrace include the call to this method. The default value for `caller`'s first argument is one, which excludes the call to `caller`.
00:03:39.840 Let's take the `bar` method and yield to see the result of the backtrace. The results will show that the dashes here represent the filename, in this case, represented by a dash since I was providing the Ruby code to execute on the standard input.
00:03:57.680 Now, the first entry in the backtrace is the call to `caller` inside the block passed to `foo`. Each `caller` block has a separate entry in the backtrace.
00:04:05.040 The second entry in the backtrace is the yield inside `bar`, which is the same as calling the block passed to `bar`, followed by the call to `bar` inside `foo`, and the last entry is the call to `foo` in the main script.
00:04:21.680 So, I mentioned earlier that the backtrace is a representation of the call stack. If you read the backtrace from the bottom to the top, you can see how each call was added to the stack.
00:04:54.800 When you call `foo`, it calls `bar`, which calls the block and yields. Now, let's dive into how backtraces are generated. I mentioned that backtraces represent the call stack, but Ruby does not have a separate data structure for the call stack.
00:05:38.560 Ruby uses a virtual machine to execute code. To execute your Ruby code, it compiles the source code into bytecode that is executed on the virtual machine. This VM processes the bytecode and keeps track of the execution of the current method or block using a control frame stack.
00:06:11.760 Each time your method or block is called, the VM pushes the frame onto the stack, and when the method or block returns, the VM pops the frame from the stack.
00:06:48.959 This control frame stack is not exactly a call stack, as it contains control frames that are not considered part of the call stack and are ignored during backtrace generation.
00:07:32.960 Like most parts of Ruby, the implementation of backtraces is written in C. The handling of backtraces is implemented in Ruby's `vm_backtrace.c` file, so to make the backtrace generation process easier to understand for Ruby programmers, this presentation will translate the algorithm used into Ruby pseudocode.
00:08:11.680 I'll simplify it slightly to avoid parts that are not important to this discussion. We'll start with the description of the backtrace generation process used in Ruby 2.7, which is mostly the same for earlier versions of Ruby, before describing the process following this optimization.
00:09:09.680 The backtrace generation process starts with the virtual machine's control frame. Since Ruby 2.7, the backtrace generation process starts at the earliest control frame and then runs towards the current control frame in the order each frame was added to the stack.
00:09:57.919 The first two control frames after the end control frame are dummy frames from the perspective of backtrace regeneration, so we'll ignore those. Let's assume the committed control frame returns the next control frame pushed after the caller.
00:11:59.360 Since the backtrace is returned as an array, we'll initialize an empty array for it. The backtrace generation process will iterate over the remaining control frames, needing to know the total number of frames and have a counter that increments for each control frame to avoid processing off the end of the control frame stack.
00:12:58.720 The majority of the backtrace generation process happens while iterating over the control frames, using a while loop. At the end of each loop, we will move to the next frame and increment the counter.
00:13:50.720 Now, let's go over each step of the main part of the backtrace generation algorithm.
00:14:08.320 I mentioned that different types of virtual machine control frames exist, and not all are considered during backtrace generation. Only two control frame types are considered: isec frames (instruction sequence) and c-funct frames.
00:16:04.320 Isec frames generally represent methods defined in Ruby, while c-funct frames represent methods defined in C.
00:16:37.760 These two frame types need to be treated differently during backtrace generation, as methods defined in C do not have location or line number information associated with them.
00:17:18.880 As a result, the location of c-funct frames is determined as the same as the preceding isec frame, which is why the backtrace generation process starts at the earliest control frame and runs towards the current control frame.
00:18:35.919 The backtrace generation process does not use all isec frames; it only uses isec frames that have an instruction pointer (or program counter). Isec frames without an instruction pointer can occur for internal function calls like the Enumerable's `group_by` method.
00:19:30.799 These ignored frames should not be included in the backtrace as they would complicate the entry creation by causing multiple entries for a single method call, one of which would lack location information.
00:20:33.600 The backtrace generation algorithm used in Ruby 3.0.0 ensures that we only add frames concerned with location information.
00:21:12.240 After recording the relevant position information, any future c-funct frames will be treated according to their local position, using the most recent isec frame data until the next isec frame appears.
00:21:49.760 Finally, we need to reverse the backtrace array because we want the backtrace ordered from the most recent call towards the oldest call.
00:22:48.480 At the end of this process, the arguments provided to `caller` are used to extract the entries of the backtrace that we are interested in. Now, `caller`'s default start value is one, and the default length is nil, meaning all backtrace entries after the first entry.
00:23:48.080 Hopefully, this explains why partial backtraces are slow. They require doing a whole bunch of unnecessary work. For example, if you want the most recent caller entry before the current method, you call `caller` with one as the start entry and one as the length.
00:24:49.440 This will return at most one backtrace entry, but to get that array of at most one entry, we will still create backtrace entries for most or all of the control frames before filtering it to the frames requested.
00:25:36.000 The problem was clear: we do unnecessary work that will be discarded, and the solution is also clear—stop doing unnecessary work and only add the backtrace frames that we care about.
00:26:06.000 To begin implementing this solution, we need to consider the start and length arguments passed to `caller`. For our initial example, we will assume that both start and length are provided correctly.
00:27:00.160 So, instead of iterating over the total number of control frames, we can use the start argument passed to `caller` to limit the number of iterations, which we will store in a variable called 'last'. By subtracting the desired length of the backtrace from the number of iterations, we can determine the starting point where we actually want to start adding backtrace entries.
00:27:53.839 We will continue to iterate from the same control frame as in the previous implementation, but until we reach the specified start point, we only record the last isec location for every isec control frame with an instruction pointer.
00:28:31.679 We should not actually add backtrace entries until we reach the required number.
00:28:45.919 We need to ensure that if the last entry in the return backtrace is a c-funct frame, it will have the correct location.
00:28:58.239 Next, if we are given conditions where start plus length is greater than the number of frames or if no length argument is provided, the starting point for backtrace collection should be set to zero.
00:29:33.839 This is crucial because we want to start the generation from the outermost frame in the main script.
00:30:14.080 So this is basically the algorithm that Ruby 3.0.0 shipped with. It passed all tests and specs, resulting in much faster partial backtraces.
00:30:56.960 However, unfortunately, this optimized algorithm turned out to be wrong. It took a month after the release of Ruby 3.0 for someone to report problems with it.
00:31:54.880 This optimization could lead to missing backtrace entries, implying that the loop iterations would not provide sufficient backtrace entries to match the expected count.
00:32:52.160 The problem occurs specifically when you encounter an isec control frame without an instruction pointer which may lose backtrace entries when looping through.
00:33:48.399 The solution was to decrement the counter when this happens, ensuring that you generate the intended number of backtrace entries. However, certain circumstances can cause issues like skipping off the end of the control frame stack.
00:34:45.079 To prevent this, a separate counter should be maintained to ensure that it stays within the bounds of the control frame stack.
00:35:58.560 I was pleased with these solutions but knew that with performance optimizations, you often need to check for regressions. Eventually, a second fix was committed.
00:37:29.679 The fix for this second issue was able to be implemented quickly but relied heavily on having a good test suite.
00:39:32.160 These issues showed that the initial optimizations could lead to problems, but as the code grew more complex, the measures needed to track correct backtrace entries expanded.
00:40:45.240 So, in conclusion, moving forward showed the importance of not only adopting optimizations but ensuring that all cases are covered adequately in the testing suite.
00:41:31.599 I hope you had fun learning about backtraces and their optimizations in Ruby 3.
00:41:49.679 If you enjoyed this presentation and want to read more of my thoughts on Ruby programming, consider picking up a copy of 'Polished Ruby Programming.'
00:42:34.320 The book publisher is currently running a giveaway for conference attendees, and I’d like to thank everyone for listening to my presentation.
00:43:35.040 If you have any questions, I'll be here for a few minutes.
00:44:16.840 Thank you again!