RubyKaigi 2022

Fixing Assignment Evaluation Order

RubyKaigi 2022

00:00:09.059 In this presentation, I'm going to discuss assignment evaluation order. I will explain why assignment evaluation order was inconsistent in older versions of Ruby and how we fixed it in Ruby 3.1 and 3.2.
00:00:15.480 My name is Jeremy Evans. I'm a Ruby committer who focuses on fixing bugs in Ruby. I am also the author of "Polished Ruby Programming," which was published last year. This book is aimed at intermediate Ruby programmers and focuses on teaching the principles of Ruby programming, as well as trade-offs to consider when making implementation decisions.
00:00:23.580 So, first, what is assignment evaluation order? It's not something that most Ruby programmers need to think about, but all assignment expressions have an order in which the expression is evaluated.
00:00:30.779 Here's a simple assignment example. We start by defining a method named `a` that returns an array. Then we have a method named `b` that returns one. We can now consider the assignment expression and its evaluation order.
00:00:41.640 How is this assignment expression evaluated? Note that there are three separate method calls in this assignment: there is a method called `a` which will return the array, a method called `b` which will return one, and then there's a call to the element assignment method. It is clear that this method must be called last because it's applied to the result of `a` with the argument of zero and the result of `b`.
00:01:00.539 However, it is probably not obvious which of these two methods is called first, `a` or `b`. In this case, Ruby calls `a` first, then `b`. So the assignment evaluation order is `a`, then `b`, followed by the element assignment method.
00:01:13.920 The reason that `a` is evaluated before `b` is that Ruby follows the left-to-right evaluation principle, meaning that in a typical expression, the left part is evaluated before the right part. You can see this left-to-right evaluation principle in other cases in Ruby. For example, consider this code with the same definitions of `a` and `b`. How does Ruby evaluate this expression?
00:01:40.079 With the left-to-right evaluation principle, `p` comes first. However, Ruby cannot evaluate the `p` method call yet because the results of `a` and `b` are arguments. So it continues: Ruby first evaluates `a`, then it evaluates `b`, and now that it has the results of the method calls to `a` and `b`, it can evaluate the method call to `p`.
00:02:04.860 More precisely, the left-to-right evaluation principle in Ruby means that Ruby will evaluate expressions in a left-to-right order, delaying the evaluation of the current expression if it depends on expressions to the right, until all expressions it depends on have been evaluated. Unfortunately, this left-to-right evaluation principle was not consistently implemented in older versions of Ruby.
00:02:59.519 Here is a case where older versions of Ruby did not follow the left-to-right evaluation principle. Using the same method definitions of `a` and `b` as before, we see that the assignment expression uses multiple assignments. Following the principle of left-to-right evaluation, you would assume Ruby would first evaluate the call to `a`, then the call to `b`, and then perform the element assignments.
00:03:44.400 However, that is not what happens. Before Ruby 3.1, it violated the left-to-right evaluation principle by first evaluating the left call to `b`, then the right call to `b`, followed by the left call to `a`, and finally the left and right element assignments. This evaluation order issue for multiple assignments was known for a long time; it was reported and confirmed as a bug before the release of Ruby 1.9.3, reported by `endoson`. This issue was one of the oldest still open bugs in Ruby's bug tracker, with over ten years between when it was reported and when it was fixed.
00:04:21.360 This bug was reported in Japanese. I don't speak Japanese, so I had to rely on a Google translation of the bug report and the initial discussion. Apparently, from the translation, the bug was considered difficult to fix by Mats, and Mats was not wrong. Of all the bugs I fixed in Ruby, this was the most challenging bug to fix, and it took far more time to fix than any of the other bugs I have worked on.
00:04:38.880 Not directly related to the multiple assignment order evaluation issue, there was a similar evaluation order issue for constant assignment. Here's an example: the code is similar to before, except that the `a` method returns a module. We have the assignments, and the left-to-right evaluation principle indicates that we should first evaluate the `a` method, then the `b` method, and then set the constant on the result of the `a` method.
00:05:10.860 However, what older versions of Ruby do is evaluate `b`, evaluate `a`, and then set the constant on the result of the `a` method. This constant assignment evaluation order issue was reported in 2019, and I ended up fixing this issue shortly after fixing multiple assignment. Thanks to the work I did on fixing multiple assignments, it was much easier to fix constant assignment, but the fix for constant assignment evaluation order was not merged into Ruby until after Ruby 3.1 was released.
00:05:42.240 Ruby's assignment evaluation order will not be fully consistent until the release of Ruby 3.2. I am first going to discuss the issues related to multiple assignment evaluation order, since that's the more complex and challenging case.
00:06:00.600 It may not be obvious at first glance why fixing multiple assignment evaluation order would be so challenging. After all, it seems like you are just changing the order in which the evaluations occur—moving code from one section to another. However, it's unfortunately not as simple as it seems.
00:06:30.660 One of the main reasons for this complexity is that Ruby uses a stack-based virtual machine. Ruby compiles source code into instructions that are executed on this virtual machine; most of the implementation complexity regarding left-to-right evaluation comes from having to correctly manage the virtual machine stack.
00:06:55.740 So, what do these virtual machine instructions look like? Ruby comes with tools that allow you to see the generated instructions. You can run your Ruby program with the dump option with an argument value of `I`, which stands for dumping instructions. This will allow us to see how multiple assignment instructions changed between Ruby 3.0 and Ruby 3.1.
00:07:32.040 We will start with a slightly different example. Here we have a multiple assignment expression, which sets the `b` attribute on `a` and assigns the first element of `c` with the values of `d` and `e`. When you run Ruby 3.0 with the option to dump the instructions, you get the following instructions. As Ruby uses a stack-based virtual machine, to understand how these instructions work, you also need to consider the virtual machine stack.
00:08:59.760 Currently, the stack is empty. As mentioned earlier, Ruby 3.0 evaluates the right-hand side of the assignment first. The first expression on the right-hand side of the assignment is the method call `d`, which is implemented using two instructions: the first instruction, `put self`, adds `self` to the stack (this will be the receiver of the `d` method). We will use a right arrow to show objects added to the stack by the current instruction. The next instruction is `opt_send_without_block`, used for certain method calls.
00:09:54.240 From looking at the data associated with this instruction, we can see that it calls the `d` method with zero arguments. Since the method call passes zero arguments, the virtual machine pops the top entry in the stack (which is the receiver of the method) and pushes the result of the method (we will call this `d`). We will use a left arrow to show objects removed from the stack by the current instruction. You can see that this instruction removes itself from the stack and replaces it with the result.
00:10:57.600 Next, Ruby has to evaluate the method call `e`. This is similar to the previous instruction: it pushes `self` on the stack and pops it off when calling the method, then it pushes the result of the `e` method onto the stack. This design continues for the next three instructions because this multiple assignment expression is the last expression being evaluated. The return value of the multiple assignment expression is needed and will be an array containing the right-hand side values.
00:11:51.660 The `new_array` instruction indicates a value of 2, which tells the virtual machine to pop the top two entries from the stack and push an array containing those two entries. Next, `dupe` takes the top entry on the stack and pushes it onto the stack, meaning the top two entries on the stack will remain the same after the `dupe` instruction. The `expand_array` instruction pops the top entry from the stack (which should be an array) and, with arguments of 2 and 0, pushes the first two elements into the stack in reverse order, resulting in `e` and then `d`.
00:13:05.019 Now Ruby evaluates the method call `a`. This is similar to the method calls to `d` and `e`, and the result of `a` gets pushed onto the stack. The next four instructions implement the `b=` method call. The top `n` instruction is used to push an existing stack entry onto the stack, using an offset from the top of the stack. The argument value of 1 means that the second entry will be pushed onto the stack.
00:14:02.700 The next instruction is another `send` instruction, but unlike the previous send instructions, it takes one argument. This means the virtual machine will pop two entries from the stack: the top entry (which will be the argument to the method) and the second from the top (the receiver of the method). This calls the `b=` method on `a` with the value returned by `d`, and it pushes the result of the `b=` method onto the stack.
00:14:58.560 The following two instructions are both pop instructions, each removing the top entry from the stack, which we don't need anymore. Next, Ruby needs to evaluate the method call `c`, which is similar to `d`, `e`, and `a`, and its result gets pushed onto the stack. Finally, we reach the element assignment method call for setting the first element in `c`.
00:15:29.880 The first instruction is `put object into fix 0`, which pushes the number zero onto the stack, corresponding to the first argument of the element assignment method. The next instruction is `top n` with an argument of 2, which means pushing the third entry on the stack (the second argument). The next instruction is `opt a set`, an optimized instruction for the element assignment method. This method call takes two arguments: the top entries (arguments) and the third entry in the stack as the receiver.
00:16:15.180 The result of calling this method will be pushed onto the stack. The following instructions are another two pop instructions, which remove the values of the element assignment method or the return value of the `e` method as they are no longer needed. The final instruction is `leave`, which is used for return values. Since this is the last expression being evaluated, the top entry on the stack is popped as the return value.
00:17:04.620 This means the return value will be the array containing `d` and `e`, as this is the right-hand side of the assignment. Thus, that is how multiple assignment was implemented in Ruby 3.0.
00:17:33.660 You gain an appreciation for how much Ruby does for you when going through each instruction. Now here are the instructions when using Ruby 3.1. You can see right away there are more instructions. While some of the instructions are the same, they have been reordered, and there are more instructions that deal with stack manipulation. In Ruby 3.0, this code resulted in six stack management instructions (such as dupe, pop, and top n); in Ruby 3.1, there are 13 stack management instructions.
00:18:41.460 As you might expect, due to this, multiple assignments with attribute and element assignment are slightly slower in Ruby 3.1 than in Ruby 3.0 because of the cost of correctness. Now that you have some experience interpreting these instructions, we can pick up the pace.
00:19:32.340 As I discussed, Ruby 3.1 uses the left-to-right evaluation principle for multiple assignments. It first evaluates the call to `a` and pushes the return value onto the stack, then evaluates the call to `c` and pushes that return value onto the stack. Before evaluating the right-hand side of the assignment, it evaluates the first argument to the element assignment method. If this argument's method call were to be evaluated, it would be done before the right-hand side of the assignment.
00:20:13.560 Next, Ruby evaluates the `d` and `e` method calls and pushes the results of those onto the stack. Ruby uses the same `new_array`, `dupe`, and `expand_array` instructions when the result of the multiple assignment is needed, and it pushes an array containing `d` and `e` onto the stack, resulting in `d` and `e` switching places because `d` will be used first.
00:21:11.940 Next, we look at the instructions Ruby will use to implement the `b=` method call, going through it instruction by instruction. The `b=` method call is called on the result of `a`, which requires pushing the result of `a` to the top of the stack using a `top n` instruction with an argument of five since `a` is in the sixth place in the stack.
00:21:53.220 Then, a `swap` instruction is used to swap the top two entries on the stack because the following instruction is a method call that takes one argument. We want to have the receiver of the method second from the top and the argument to the method on the top. After this, we execute the `b=` method call, which pops the top two entries and pushes the return value of the `b=` method onto the stack.
00:22:39.840 Now, we don't need the return value of the `b=` method, so we immediately pop that from the stack. Next, Ruby needs to evaluate the element assignment using similar instructions. We will go through this section instruction by instruction, starting with a `top n` instruction with an argument of three to push the receiver of the element assignment method `c` onto the stack.
00:23:14.940 Next, it uses the same `top n` instruction with an argument of three to push the first argument of the element assignment method onto the stack. Even though `top n` uses the same argument, it pushes a different stack value because the previous instruction already pushed an entry onto the stack.
00:24:06.780 Then, there's another `top n` instruction using an argument of two to push the second argument of the element assignment method onto the stack. The following instruction calls the element assignment method with two arguments, which will pop the top two arguments and the receiver from the stack, pushing the return value of the element assignment method onto the stack.
00:24:44.520 The return value is not needed, so it is popped from the stack, alongside the return value of `e`, which is also popped. The next instruction we haven't seen before is `set n`, which replaces the stack position at a given offset from the top of the stack with the value currently at the top of the stack. In this case, with offset three, we replace the bottom stack entry with the top entry to facilitate returning the right-hand side of the assignment.
00:25:57.479 This means we no longer need the top three entries on the stack, so we pop each of them, and then we return the result of the multiple assignment, which will be the array containing values `d` and `e`. While this is more complex than the Ruby 3.0 code, the Ruby 3.1 instructions correctly implement the left-to-right evaluation principle.
00:27:06.420 The challenging part lies in using the correct stack management instructions while keeping track of what offsets to use for the `top n` instructions. For instance, when making small modifications to the code.
00:27:49.380 Here, we'll compare the Ruby 3.0 and 3.1 instructions side by side. Adding an argument to the element assignment method affects several instructions in Ruby 3.0, with minimal changes localized to the instructions. By contrast, in Ruby 3.1, the addition of a single argument influences many instructions, illustrating that the offsets depend on the number of arguments in each element assignment method on the left-hand side.
00:28:36.480 When adding a local variable at the end of multiple assignments, the argument to the `expand_array` instruction will change, but all other instructions remain the same in Ruby 3.0. Meanwhile, Ruby 3.1 shows additional shifts in offsets based also on local variables added into multiple assignments.
00:29:32.520 Lastly, we discuss the implications of adding a local variable and how the net result does not change with offsets. The complexity of multiple assignments in Ruby goes beyond mere attribute and local variable counts. We also need to address splats, which don't seem to change offsets but does add a layer of complexity in actual implementation.
00:29:58.620 In conclusion, after exploring all these small changes and their impact on instruction generation, we recognize that several factors affect the top `n` and `set` instructions: the total number of assignments, types of assignments, order of assignments, number of element assignment method arguments, use of nesting, and whether the return value of assignment expressions is utilized.
00:30:45.180 Even with all that information, Ruby's compilation process poses challenges for effective instruction generation. The Ruby source code undergoes parsing, resulting in an abstract syntax tree, which complicates the prompt generation of the correct instructions.
00:31:27.600 Now, just as in the case of multiple assignment, constant assignments did not follow the left-to-right evaluation principle in older versions of Ruby. In Ruby 3.1, we see a violation where `c` is evaluated before `a`. In Ruby 3.2, `a` is evaluated before `c`, aligning with the left-to-right evaluation principle.
00:32:10.740 The lessons learned include: just because a bug is old does not mean it cannot be addressed. Both assignment evaluation order bugs existed since multiple and constant assignments were included in Ruby, and both were known for multiple years before I started working on them.
00:32:47.640 Bugs that are easier to fix are usually resolved quickly—an older bug could indicate more significant underlying issues. Hence, treat them as opportunities to enhance your understanding of Ruby.
00:33:25.380 Ruby currently has over 75 open bugs in the bug tracker that are more than five years old, just waiting for fixes from contributors.
00:33:59.940 I hope you had fun learning about how we fixed assignment evaluation order in Ruby 3.1 and 3.2. If you enjoyed this presentation and want to read more of my thoughts on Ruby programming, please consider picking up a copy of "Polished Ruby Programming."
00:34:38.820 That concludes my presentation. I'd like to thank all of you for listening. A special thanks to Cookpad for sponsoring my travel to RubyKaigi. I believe I’m out of time, so if you have any questions, please feel free to ask me during the break.