Popping Into CRuby

by Jemma Issroff

In the presentation titled "Popping Into CRuby" given by Jemma Issroff at RubyConf 2023, the speaker delves into the inner workings of CRuby, particularly focusing on the concept of 'popping' in relation to code execution and optimization. The talk aims to clarify how certain lines of Ruby code, specifically those with no side effects, do not impact performance due to a feature in the CRuby compiler that allows for the omission of unnecessary instructions during the compilation process.

Key Points:
- Understanding the Compilation Process:
  - Ruby code is first parsed into an Abstract Syntax Tree (AST), representing the program’s structure in a tree format.
- Introduction to Popped Instruction Sequences:
  - 'Popped' is a flag the CRuby compiler uses when a node's value will be discarded. It lets the compiler skip instructions that would otherwise become part of the bytecode, improving performance.
- The Role of AST and Linters:
  - Tools like linters and language servers (which implement the Language Server Protocol, LSP) rely on the complete structure of the code to work accurately, so the parser must keep every part of the code in the AST.
- Prism and Error Handling:
  - The new Ruby parser, Prism, is more error tolerant and provides clearer error messages than its predecessor, improving the development experience.
- The YARV Virtual Machine:
  - CRuby uses a stack-based virtual machine called YARV, whose instructions push values onto a stack and pop them off. A value that would be pushed only to be popped again is a candidate for removal before execution.
- Optimization by Popping:
  - A highlight of Jemma's presentation is a simplified example showing that a string whose value would be pushed and immediately popped is omitted by the compiler, so it never appears in the instruction sequence.
- Implications of Popping:
  - Understanding the distinction between what can be popped, such as unused literals, and what cannot, like return values or local variable assignments, is crucial for effective Ruby programming and optimization.

In conclusion, Jemma emphasizes the importance of recognizing the dynamic processes occurring in the background of Ruby applications. By understanding these optimizations, developers can write more efficient code that not only runs faster but also utilizes resources more effectively. The talk encourages attendees to engage with the community and further explore these concepts, equipping them with knowledge essential for advanced Ruby development.

00:00:19.359 Hello everyone, it's my pleasure to introduce Jemma Issroff.

00:00:22.439 Jemma is a Ruby core committer and works at Shopify as a senior developer on the Ruby infrastructure team.

00:00:26.960 I have to say it’s been a pleasure working with her. She is also a co-founder of WNB.rb, the Women and Non-Binary Ruby Community.

00:00:32.880 You can applaud! Additionally, she's a co-host of the Ruby on Rails podcast. Please give her a warm welcome as she presents her talk titled "Popping Into CRuby."

00:01:00.559 Thank you! Good morning everyone. A little while ago, I was looking through a file in CRuby's codebase called 'compile.c'.

00:01:04.239 As the name suggests, that’s where a lot of the compiler code lives. If the idea of a compiler isn't familiar to you, my goal is that it will be by the end of this talk.

00:01:10.159 I was looking specifically at a function called 'iseq_compile_each', and something caught my attention: a parameter named 'popped'.

00:01:18.760 The usage of 'popped' was sprinkled throughout this method, which is quite large, and I found its functionality intriguing.

00:01:25.360 Today, we're going to dive into the concept of 'popping' in CRuby. If we haven't met yet, I’m Jemma, she/her pronouns. I would love to meet you, so please come say hello at any point during the conference.

00:01:41.760 As my manager mentioned, I work at Shopify on the Ruby infrastructure team, and I'm also a co-founder of WNB.rb, the Women and Non-Binary Ruby Community. We're having a little dinner tonight at Cafe Coyote from 6:00 PM to 8:00 PM.

00:01:58.120 If you're a woman or a non-binary person, please come join us. You can find more information in the Slack if you'd like to join.

00:02:01.599 So, we can think of our Ruby applications as cars. We might have seen metaphors like this in the past, and like most car metaphors, we may not think much about what goes on under the hood.

00:02:11.440 The smoke here isn't due to something broken but rather that there are some really interesting processes occurring under the hood that we may not have considered regarding our Ruby programs.

00:02:30.160 While we know our Ruby code exists, underneath that there's also bytecode and parse trees that we should understand. Today, we'll discuss all three elements and their relation to the concept of 'popping'.

00:02:45.639 When we run our Ruby code, the first step is that it goes through the parser and becomes what's called an abstract syntax tree (AST). This is a representation of our programs in tree form.

00:02:54.959 So, how do we represent our programs as a tree? Let’s look at an example. If we have a very simple class named Conference, it has one method.

00:03:10.080 At a high level, we can see the structure of this tree, and you can check it for any Ruby file you have by passing '--dump=parsetree' to the 'ruby' command.

00:03:19.800 You'll see the entire parse tree for any Ruby code. Looking again at our Conference class, the first node we get will typically be the class node.

00:03:39.520 Each node has children that depend on its type, along with information such as its location in the source. A class node has a few children; one of them is its name, 'Conference', which Ruby refers to as a constant node.

00:03:55.159 Another child node is the method definition, which has child nodes of its own, such as the method's arguments and its body.
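To poke at this tree from Ruby itself, CRuby exposes the parse.y tree through 'RubyVM::AbstractSyntaxTree'. The sketch below assumes CRuby 2.6 or later; the method name 'talk' and its body are hypothetical stand-ins, since the slide's code isn't reproduced here, and the 'find_node' helper is our own, not part of the API. In this tree the root is a SCOPE node, and the class name shows up as a COLON2 node.

```ruby
# Minimal sketch: inspect CRuby's parse.y AST for a small class.
# Assumes CRuby 2.6+ (RubyVM::AbstractSyntaxTree); `talk` is a hypothetical method name.
source = <<~RUBY
  class Conference
    def talk
      "Popping Into CRuby"
    end
  end
RUBY

# Hypothetical helper: depth-first search for the first node of the given type.
def find_node(node, type)
  return node if node.type == type

  node.children.each do |child|
    next unless child.is_a?(RubyVM::AbstractSyntaxTree::Node)

    found = find_node(child, type)
    return found if found
  end
  nil
end

root       = RubyVM::AbstractSyntaxTree.parse(source)
class_node = find_node(root, :CLASS)
name_node  = class_node.children.first    # the class name
defn_node  = find_node(class_node, :DEFN) # the method definition
args_node  = find_node(defn_node, :ARGS)  # present even though `talk` takes no arguments

puts root.type                # SCOPE
puts class_node.type          # CLASS
puts name_node.type           # COLON2
p    name_node.children       # [nil, :Conference]
p    defn_node.children.first # :talk
puts args_node.type           # ARGS
```

The same tree can be printed in full from the command line with 'ruby --dump=parsetree', though the output format differs between Ruby versions.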

00:04:08.680 This abstract syntax tree, or parse tree, represents all the code—every line of code needs to be in this tree.

00:04:22.120 You might wonder why we can’t optimize out parts of the code at this first step and simplify it to make the parse tree smaller or faster.

00:04:41.800 The answer is, running the code isn’t the only function being performed on this parse tree; there are other consumers of the parse tree whose goals aren’t solely to run the program.

00:04:56.000 For example, language servers (tools that implement the Language Server Protocol, or LSP) and linters like RuboCop need the full tree to work correctly.
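As a small illustration of why the tree must stay complete, the sketch below (assuming CRuby 2.6+ and its 'RubyVM::AbstractSyntaxTree' API) parses a string literal whose value is never used. The literal is still a node in the tree, which is exactly what a linter needs in order to report it.

```ruby
# Minimal sketch: the parse tree keeps a literal even when its value is unused.
# Assumes CRuby 2.6+ (RubyVM::AbstractSyntaxTree exposes the parse.y tree).
root = RubyVM::AbstractSyntaxTree.parse(<<~RUBY)
  "this string is never used"
  42
RUBY

body = root.children.last   # the SCOPE node's body: a BLOCK holding both statements
p body.type                 # :BLOCK
p body.children.map(&:type) # the first entry is :STR; the integer's node type varies by version
```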

00:05:12.080 Additionally, you may have heard of Prism, a new Ruby parser which was once called YARP. It offers similar functionality but represents the data in a different manner.

00:05:27.760 Prism creates its own abstract syntax tree, using the same information but with a slightly different representation. An example of this is that an empty args node in the old parser translates to nothing in Prism.

00:05:44.320 Prism also uses clearer node names. For instance, the class name 'Conference', which the old parser represents as a COLON2 node, becomes a 'ConstantReadNode' in Prism.
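For comparison, here is a sketch of the same class parsed with Prism. It assumes Ruby 3.3 or later (where prism ships as a default gem) or the prism gem installed, and again uses the hypothetical method name 'talk'. It shows both differences mentioned above: the class name is a 'Prism::ConstantReadNode', and a method without parameters has no parameters node at all.

```ruby
# Minimal sketch: the same class parsed with Prism.
# Assumes Ruby 3.3+ or the prism gem; `talk` is a hypothetical method name.
require "prism"

source = <<~RUBY
  class Conference
    def talk
      "Popping Into CRuby"
    end
  end
RUBY

program    = Prism.parse(source).value     # Prism::ProgramNode
class_node = program.statements.body.first # Prism::ClassNode
def_node   = class_node.body.body.first    # Prism::DefNode

p class_node.constant_path.class # Prism::ConstantReadNode
p class_node.constant_path.name  # :Conference
p def_node.name                  # :talk
p def_node.parameters            # nil: no empty args node for a method without parameters
```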

00:06:01.000 You may wonder why we created a new Ruby parser when the old one works perfectly fine. The motivation for Prism includes improvements in error tolerance, which allows the parser to recover from errors without failing.

00:06:16.720 To illustrate, here’s an example of a Ruby class with several syntax errors. Rather than stopping with a generic syntax error, Prism reports clear error messages that tell you what's wrong.

00:06:35.560 In Prism, each error message explicitly states what was expected, making it easier for developers to understand how to fix them.
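A sketch of this error tolerance, assuming Ruby 3.3+ or the prism gem: the broken snippet is made up for illustration, and the exact wording of the messages depends on the Prism version, so the code prints them rather than relying on their text. The point is that 'Prism.parse' still returns a tree alongside the list of errors.

```ruby
# Minimal sketch: Prism keeps going after syntax errors and reports each one it finds.
# Assumes Ruby 3.3+ or the prism gem; the broken snippet is hypothetical.
require "prism"

broken = <<~RUBY
  class Conference
    def talk(
      "Popping Into CRuby"
  end
RUBY

result = Prism.parse(broken)

puts "errors found: #{result.errors.length}"
result.errors.each do |error|
  # Each error carries a message and a source location.
  puts "line #{error.location.start_line}: #{error.message}"
end

# A tree is still returned, so tools can keep working with the rest of the file.
p result.value.class # Prism::ProgramNode
```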

00:06:55.239 Prism is available as a gem, many projects are already integrating with it, and we hope it will be included in Ruby 3.3.

00:07:12.720 From the parser, the AST then goes through the compiler, which turns it into bytecode.

00:07:28.239 Because Prism produces its own AST, the existing compiler can't use it as-is: it expects the old parser's node types.

00:07:35.680 My recent work has focused on developing a compiler for Prism that generates bytecode that the virtual machine can execute.

00:07:50.239 The virtual machine is known as YARV, or Yet Another Ruby VM, which is where Prism's original name, YARP (Yet Another Ruby Parser), came from.

00:08:00.559 YARV is a stack-based virtual machine, and to understand that, we can run a simple Ruby program, such as 2 + 3.

00:08:19.639 For 2 + 3, the instructions push 2 and then 3 onto the stack ('putobject'), and then 'opt_plus' pops both values off the stack and pushes their sum, illustrating the popping concept.
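You can see these instructions yourself. The sketch below assumes CRuby, since 'RubyVM::InstructionSequence' is CRuby-specific; the command-line equivalent is 'ruby --dump=insns -e "2 + 3"'.

```ruby
# Minimal sketch: print the YARV instructions CRuby generates for `2 + 3`.
# Assumes CRuby (RubyVM::InstructionSequence is CRuby-specific).
iseq = RubyVM::InstructionSequence.compile("2 + 3")
puts iseq.disasm # putobject 2, putobject 3, opt_plus, leave (formatting varies by version)
p iseq.eval      # 5
```

The compiler does not fold 2 + 3 into 5 ahead of time, because '+' is a method call that could be redefined; 'leave' then returns whatever value is on top of the stack.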

00:08:36.960 You might have heard me mention the word 'pop' frequently; we put objects onto the stack and then pop them off. This will tie back to the talk’s central theme—popping.
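To make push and pop concrete, here is a hypothetical toy stack machine. Its instruction names mirror YARV's, but it is only an illustration of a stack-based VM, not how CRuby executes bytecode. The second program shows the wasteful push-then-pop pattern that popping lets the compiler avoid.

```ruby
# Hypothetical toy stack machine, loosely modeled on YARV's push/pop style.
def run(instructions)
  stack = []
  instructions.each do |op, arg|
    case op
    when :putobject, :putstring
      stack.push(arg)          # push a value onto the stack
    when :opt_plus
      right = stack.pop        # pop two operands...
      left = stack.pop
      stack.push(left + right) # ...and push their sum
    when :pop
      stack.pop                # discard the top value
    when :leave
      return stack.pop         # return the top value
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  nil
end

p run([[:putobject, 2], [:putobject, 3], [:opt_plus], [:leave]])          # 5
p run([[:putstring, "unused"], [:pop], [:putobject, 42], [:leave]])      # 42
```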

00:08:49.680 In the Conference example, you see instructions like 'putnil', which pushes a nil value onto the stack, followed by other 'put' instructions that place values on the stack.
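A sketch for disassembling a class like Conference, assuming CRuby; the method name 'talk' is hypothetical. On the versions we are aware of, the top-level sequence pushes nil with 'putnil' in place of the missing superclass and then runs 'defineclass'; the nested sequences for the class body and the method are printed after it.

```ruby
# Minimal sketch: disassemble a small class definition.
# Assumes CRuby; `talk` is a hypothetical method name.
source = <<~RUBY
  class Conference
    def talk
      "Popping Into CRuby"
    end
  end
RUBY

disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm # includes putnil (no explicit superclass) followed by defineclass
```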

00:09:08.080 I’d like to focus on one instance where a string is put onto the stack and immediately popped off, so it contributes nothing to the program's result.

00:09:26.960 When we add another line after the string, so that the string's value is no longer used, the compiler removes the string from the bytecode entirely, which saves work at run time.

00:09:44.640 This is the idea behind popping: instructions whose results aren't needed can be skipped. If the compiler determines that a string's value isn't needed, it can omit the instruction that would put it on the stack.

00:10:00.440 In my example program, after adding an integer, we can see that the compiler no longer emits a 'putstring' instruction, so no unnecessary bytecode is generated.
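A quick way to check this on your own machine, assuming CRuby (exact listings differ between versions; recent releases may show 'putchilledstring' instead of 'putstring'): compile the string once as the program's result and once followed by an integer, and look for it in the disassembly.

```ruby
# Minimal sketch: compare bytecode when a string's value is used and when it is not.
# Assumes CRuby; instruction names and formatting vary between versions.
used   = RubyVM::InstructionSequence.compile(%("hello")).disasm
unused = RubyVM::InstructionSequence.compile(%("hello"\n42)).disasm

puts used   # the string is the program's result, so an instruction pushes it
puts unused # the string's value is discarded, so only 42 is pushed

p used.include?("hello")   # true
p unused.include?("hello") # false
```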

00:10:14.960 This works because the compiler knows, while compiling, that the value won’t be used (the 'popped' flag is set), so instead of emitting one instruction that pushes the value and another that pops it, it emits nothing at all.

00:10:30.560 In other words, if the compiler can tell that a value will never be used, it can leave it out of the bytecode at compile time, before the program ever runs.
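The shape of this idea can be sketched with a hypothetical toy compiler. The node structs and method names below are invented for illustration and are far simpler than CRuby's 'iseq_compile_each'; what they share with it is a 'popped' argument that tells the compiler whether the node's value will be discarded. A naive compiler pushes every value and pops the unused ones, while the popped-aware one emits nothing for a literal whose value is thrown away.

```ruby
# Hypothetical toy compiler illustrating a `popped` flag; node types are made up.
StrNode = Struct.new(:value)
IntNode = Struct.new(:value)
SeqNode = Struct.new(:statements)

# `popped` is true when the surrounding code will discard this node's value.
def compile_node(node, popped, out)
  case node
  when StrNode
    out << [:putstring, node.value] unless popped # literals have no side effects
  when IntNode
    out << [:putobject, node.value] unless popped
  when SeqNode
    last = node.statements.size - 1
    node.statements.each_with_index do |stmt, i|
      # Only the last statement's value survives; earlier ones are popped.
      compile_node(stmt, popped || i != last, out)
    end
  end
  out
end

# Naive version: always push each value, then pop it if it isn't the last one.
def compile_naive(program)
  out = []
  last = program.statements.size - 1
  program.statements.each_with_index do |stmt, i|
    compile_node(stmt, false, out)
    out << [:pop] unless i == last
  end
  out << [:leave]
end

def compile_program(program)
  compile_node(program, false, []) << [:leave]
end

program = SeqNode.new([StrNode.new("unused"), IntNode.new(42)])
p compile_naive(program)   # [[:putstring, "unused"], [:pop], [:putobject, 42], [:leave]]
p compile_program(program) # [[:putobject, 42], [:leave]]
```

Passing the flag down during compilation, rather than deleting push/pop pairs afterwards, means the wasted instructions are never generated in the first place.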

00:10:44.960 It's essential to recognize the distinction between things that can and cannot be popped. For instance, if a value is the return value of a program or method, it can't be popped.

00:11:05.720 Examples include a string in a return statement, the last expression in a method definition (which becomes the method's return value), and local variable assignments, which still have to store their value.
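A sketch of values that cannot be dropped, assuming CRuby; the names 'greeting' and 'keynote' are hypothetical. Even when an assignment's own result is unused, the string must still be pushed so that 'setlocal' can store it, and a method's last expression is kept because it is the return value. As far as we know, what 'popped' removes for an assignment is only the extra 'dup' that would keep a copy of the value for the surrounding expression.

```ruby
# Minimal sketch: values that are still needed keep their instructions.
# Assumes CRuby; `greeting` and `keynote` are hypothetical names.
assignment = RubyVM::InstructionSequence.compile(%(greeting = "kept"\n42)).disasm
method_def = RubyVM::InstructionSequence.compile(%(def keynote\n  "returned"\nend)).disasm

p assignment.include?("kept")     # true: the value must still be stored
p assignment.include?("setlocal") # true
p method_def.include?("returned") # true: the string is the method's return value
```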

00:11:19.160 On the other hand, literals such as strings, numbers, and symbols whose values are neither used nor returned can, indeed, be optimized out.

00:11:30.720 To demonstrate this further, I’ll show additional examples of what can be popped, such as strings, numbers, or arrays of constant values.

00:11:45.800 The key is that their values aren't needed once they've been evaluated, and evaluating them has no side effects, so the compiler can optimize them away.
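A sketch showing several kinds of unused literals disappearing, assuming CRuby; the specific literals are arbitrary. Only the final symbol, which is the program's value, is pushed.

```ruby
# Minimal sketch: unused literals of several kinds leave no trace in the bytecode.
# Assumes CRuby; the literals themselves are arbitrary examples.
source = <<~RUBY
  "a dropped string"
  :dropped_symbol
  12345
  :result
RUBY

disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm

p disasm.include?("dropped") # false: neither the string nor the symbol is pushed
p disasm.include?("12345")   # false
p disasm.include?("result")  # true: the last expression is the program's value
```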

00:12:00.160 Overall, I hope that gives you a well-rounded understanding of the concept of popping in CRuby.

00:12:16.760 It turns out a lot goes on behind the scenes to make our programs faster and more efficient, and it's well worth exploring further.

00:12:35.680 If you're interested in furthering your understanding, please connect with folks who can guide you, including those involved in various projects and discussions.

00:12:50.160 I have sprinkled several resources throughout this talk for you to engage with these concepts further. Thank you for your attention!