Talks

Trolls of 2013

by Ryan Davis

In the video "Trolls of 2013," presented by Ryan Davis at the MountainWest RubyConf 2013, the main focus is on the process of writing an interpreter, specifically a subset of Ruby referred to as Ubie. Throughout the talk, Davis emphasizes the importance and ubiquity of interpreters in programming, explaining that understanding interpreter construction can greatly enhance one's approach to coding. The key points discussed in the video include:

  • Introduction and Concept: Davis introduces the concept of interpreters, highlighting that they exist in many programming languages and the value of learning about them.
  • Ubie Interpreter Overview: He begins by explaining the structure of the Ubie interpreter, which includes fundamental constructs such as local variables, function calls, and branching logic.
  • Parsing and S-expressions: The process involves parsing Ruby source code into S-expressions, which are fundamental for understanding the structure of the code being interpreted. He gives insights into how various node types, like literals and function calls, are processed.
  • Test-Driven Development: Davis employs a test-driven approach to developing the interpreter, creating test cases for expected outcomes, which is a crucial practice in programming.
  • Functionality and Extensions: The talk covers various constructs, including conditionals and loops. Davis explains how to implement 'if' statements and while loops to facilitate iterative processes within the interpreter.
  • Implementation of User-Defined Functions: He illustrates the creation of functions, demonstrating how the Ubie interpreter handles function definitions and recursive calls.
  • Error Handling and Node Types: Throughout the development, Davis addresses errors related to unknown node types and emphasizes understanding these errors to improve the interpreter's functionality.
  • Final Development Stage: By the conclusion of his presentation, he summarizes the capabilities of the Ubie interpreter, which now can handle fundamental types, operations, and user-defined functions, and he encourages further exploration of sophisticated programming concepts.
  • Personal Project Bonus: As a light-hearted conclusion, Davis shares a personal project analyzing English word ligatures and ends with a discussion of his kitten fostering experience.

The main takeaway from Ryan Davis's talk is that learning to write interpreters enhances one's programming skills and understanding of language mechanics, encouraging programmers to explore more complex programming concepts through this lens.

00:00:20.519 Again, thank you for having me. Thank you for sticking it out at the end of the day. My topic is going to be "Trolls of 2013." I'm Ryan Davis from CLRV. This talk follows up on my presentation "Occupy Ruby: Why to Moderate the 1%" that I gave last year at Escadia Ruby. It consists of 130 slides, and I aim to cover them in 30 minutes, which is roughly 4.3 slides per minute. So, I'm going to have to go really fast. Please hold your questions until the end.
00:00:26.119 Actually, the title is quite a bit of fun. What really happened was that I tweeted that I was done with my slides for West, and someone I called out as a troll last year jokingly said, "Let me guess, the title of your talk is 'Trolls of 2013.'" So, for fun, I got together with Mike and decided to troll the troll. Hence, the title of this talk. However, the actual content is about writing an interpreter. What I have here is actually 160 slides, which means I have to condense my content to around three slides per minute. I have loads of code to share, and I was asked to challenge your thinking. I fully intend to do that; however, I will be moving incredibly fast. Much of what I go over will be a pattern that I will revisit numerous times throughout the talk. You don't need to understand everything from the beginning to comprehend the material later, so just bear with me.
00:01:16.720 I was informed a few days ago that my talk was expanded to 45 minutes, but I had prepared for 30, so that is what I will aim to stick to. Now, the real first question is, why would you ever want to do this? I had conversations with some people last night, and we realized that interpreters are everywhere in our daily use. Ruby is an interpreter, Bash is an interpreter, and nearly everything we interact with involves interpreters in one way or another. However, conceptually, this topic is one of those things like Lisp; if you learn it, it alters your brain in useful ways. This interest is not solely because I am a programmer, though I admit it is one of my minority interests. It's really because we can, and because it will change the way you think.
00:02:39.679 Quick show of hands: how many here have a computer science background and have taken either formal language theory or written interpreters in school? This talk may not solely be for you, but I might cover tools that you are unfamiliar with and that you might want to learn. We will be going over the Ubie interpreter, which is a very small subset of Ruby, but it will cover branching logic, while loops, local variables, and the invocation of functions. This is the most complex piece of code I expect you to be able to understand during my talk.
00:03:11.640 In this function, you can see that we have local variables and what we call primitive function calls. These are supported by the underlying runtime. We also have user-defined functions, and in this case, those function calls can be recursive. The roadmap for this process looks something like this: we have the parser, with runtime at the core, then the environment, local variables and conditionals, and finally, functions on top of everything else. All my parsing is done using a gem called Ruby Parser. It takes Ruby source code and parses it into a structure that, while initially confusing, eventually reveals some patterns.
00:04:43.240 When you start parsing, you'll see elements that include an if-node. The if-node contains a condition, a true case, and a false case—each section corresponds to a recursive call. Altogether, if you analyze the structure, you will find that it represents conditional operations. For a quick breakdown of vocabulary: this is called an S-expression. The terminology comes from Lisp, with 'car' representing the type of node we've parsed, and 'cdr' representing the rest, which may include sub-elements. You get all of that for free using the Ruby parser. Parsing can be incredibly complicated and is its own topic; Michael Jackson gave a talk two years ago called "Parsing Expressions in Ruby," which I recommend.
00:05:32.720 Now, let's discuss the interpretation process, which is the core of the runtime. How does one get a result like seven from the expression 3 plus 4? The first step is to start with the source, and you ask the parser to parse it. The output is an S-expression. You then process it by its type, working from the inner values outward. This reflects how most programming languages work, particularly Ruby. Most languages use a method called 'application order evaluation;' however, some languages opt for different methods, allowing for greater flexibility and more complex macro systems.
00:06:44.280 In the case of the expression, we would first interpret the inner expression, identifying the values 3 and 4. Then, we would evaluate the outer expression, processing the addition operation. While this is just one interpretation method, for brevity, I will demonstrate these principles using Ruby to define and implement much of the semantics of this language. This approach is known as a meta-circular interpreter.
00:07:34.079 The S-expression processor is responsible for dispatching processing methods based upon node types. For instance, when we encounter a literal node, we call the 'process_lit' method. We're also applying test-driven development principles to interpretive processes, which involves implementing unit tests that take an input source and an expected value, asserting equality. There’s nothing particularly magical about this approach; it’s just how I would handle most projects. For simplicity's sake, I’ll refactor this to use an asserting method called 'assert_eval' that tests our expressions directly.
00:08:59.560 The first thing we want to do is get our sanity test working. We start by instantiating a Ubie interpreter instance, then define our first test case, which is relatively straightforward. We expect the interpreter to return the value of three when we input it. If we then parse the expression '3 + 4', we expect the output to be seven. This standard test case is common among programming languages when initializing a new interpreter. However, running the test results in an 'uninitialized constant UbieInterpreter' error.
00:10:50.320 Since we haven't implemented anything in our interpreter yet, it raises an error, thus we subclass from S-expression interpreter and name it the Ubie interpreter. To proceed, we establish our evaluation method. Concerning method definitions, I’ve chosen to name the evaluation method 'eval.' However, since this name is a common Ruby method, we encounter a private method error instead of a standard no-method error, but that’s an acceptable outcome.
00:12:13.400 Next, we delve into implementing aspects of the Ubie interpreter. The 'eval' method is crucial, as it takes the source code and passes it off to the parser. Afterward, the parsed S-expression is passed to the 'sexp_process' method, inherited from the S-expression interpreter. At this stage in the process, we encounter the 'unknown node type: lit' error. This indicates that we've stumbled upon a node that the interpreter hasn't defined yet; thus we go ahead and raise this as an error. We've talked about 'process_lit' previously, so now let’s implement it.
00:13:51.920 Simultaneously, we address 'unknown node type: call,' which is more complex. Instead of overlooking this error, we'll inspect the S-expression closely. We discover aspects such as the node type for the call, the receiver, the message involved in the call, and the arguments present. Employing multiple assignment, we can extract components from our S-expression into named variables for easier tracking, focusing particularly on those key components necessary for our operations.
00:15:22.680 Next, we implement the processing of these S-expressions. Defining the receiver is integral; therefore, we conduct a recursive evaluation of the receiver field. Meanwhile, we iteratively map through arguments to apply the recursive processing of those values as well. After solidifying this framework, we can finally call the receiver and send it the resulting arguments. This part of the implementation takes shape quite rapidly, setting an important precedent as we build out our evaluator.
00:16:56.680 Having instantiated an S-expression interpreter and defined our evaluator, we begin testing our inputs to ensure proper functionality. It's important to realize that up until this point, we’ve essentially crafted a rudimentary calculator. To expand beyond that, we’ll need to integrate conditionals into our code, which will involve defining the truthiness of expressions. In this case, we will assess expressions for truthy or falsy conditions.
00:18:55.519 In Ruby, we recognize that 'nil' and 'false' are falsy, while everything else is truthy. Testing for these conditions, the initial execution uncovers another error regarding unknown node types, so we proceed to define it generically. By carefully unpacking elements within our expressions, we can assist the interpreter in identifying conditions and evaluating accordingly. This will ultimately allow us to enable the interpreter to handle 'if' statements with greater accuracy.
00:20:24.480 As we continue, we will develop our approach to managing 'if' statements correctly. When processing an if-statement, we prioritize evaluating the condition before proceeding to handle the true or false cases. Understanding this order of operations ensures our interpreter functions intuitively. Similarly, once we've established that our conditions of 'true' and 'nil' work correctly, we can move on to fleshing out additional error handling for any unknown node types that may arise.
00:21:45.840 Continuing with the test-driven approach, as we encounter errors indicative of the interpreter's uninitialized states, we can broadly categorize these issues and implement solutions to mitigate them. Following a repeatable pattern of understanding the error, fleshing out the identified node type, and subsequently processing these node types, allows us to have a fast-paced yet productive development cycle. Utilizing tools such as autotest to continuously verify if any changes are successful reinforces the reliability of this interpreter model.
00:23:41.920 Now we've reached a critical juncture: we have defined fundamental constructs for our interpreter. As I tie these principles together, I’ll introduce while-loops, focusing on how to implement a loop that will iterate under specific conditions, much like in Ruby. Analyzing the flow of our code will allow us to maintain control over how the interpreter identifies conditions and manages code execution.
00:25:29.440 Ultimately, through creating conditions and loops, we shape our interpreter's functionality to handle more robust constructs, such as while-loops that encapsulate valuable logic for iterative processes. We've outlined structural requirements, built on foundational elements, and defined branching logic within our language framework. Going further, we will introduce function definitions, establishing a more sophisticated interaction dynamic.
00:27:04.160 Now, we will construct a simple function definition. For example, we will define a function called 'double' to take an argument and return the doubled value. When we call 'double 21', we expect to receive 42 in return. As each call executes, the necessary logic will be applied, maintaining the structure we’ve defined. Throughout these construction processes, we will leverage our principles of variable assignment and environment management to facilitate seamless recursive interactions.
00:28:09.720 With our foundational pieces in place, we can expand our implementation by refining it to incorporate Fibonacci calculations, enabling us to compute the sum of Fibonacci numbers through functional definitions and recursive calls. We’ve encountered challenges; for instance, recognizing an 'unknown node type' error as we delve deeper expedites our understanding of node structure, allowing the interpreter to manage simple expressions and variable definitions more precisely.
00:29:00.560 The next steps toward enhancing our interpreter will involve constructing a more intricate environment class, capable of supporting recursive calls while preserving unique contexts across multiple invocations. Each function call will maintain its unique scope while ensuring that the previous values remain accessible, as established through our previously defined environment and scope management.
00:30:30.600 Wrapping this all together, we’ve now implemented the Ubie language to support fundamental types and operations, including numeric types, conditional branching, and local variable handling. The interpreter has evolved to allow for user-defined functions while implementing arrays, hashes, and stack management. Upon successfully determining the initial code implementation in around 130 lines, we can condense our tests down to approximately 70 lines.
00:31:25.680 Additionally, I encourage further experimentation with additional functionalities; Perhaps exploring richer types or different scopes may provide better learning opportunities. Ultimately, guiding forward with linguistically-driven materials, such as books like "Structure and Interpretation of Computer Programs" and "The Little Schemer," may enhance your understanding of interpretive processes.
00:32:30.000 Before concluding, if you have any questions, feel free to approach me afterward, as I will be available in the hallway.
00:34:00.000 Thank you for your attention, and as an improvised bonus round, I want to share a personal project: I created a Ruby program that analyzes ligatures in English words! The results surprised me; for instance, the word "anti-product" made it to the top five. On that whimsical note, thank you once again, and as I wrap up, here’s a glimpse of my favorite kittens from my last fostering batch.
00:36:30.000 I look forward to your questions!