
Summarized using AI

Beware the Dreaded Dead End!!

Richard Schneeman • November 08, 2022 • Denver, CO

In the talk "Beware the Dreaded Dead End!!", presented at RubyConf 2021, Richard Schneeman addresses a common pain point for Ruby developers, syntax errors, and introduces the dead_end library, which is designed to make them easier to fix. The talk covers what syntax errors are, why they are hard to understand, and how dead_end helps developers locate and resolve them.

Key points discussed include:

- Understanding Syntax Errors: Syntax errors can be daunting because the real problem is often not where Ruby reports it. Developers frequently get vague messages that do not point to the actual source of the error, which leads to frustration.

- Demo of dead_end: Richard shows dead_end finding a syntax error in a real file. It repeatedly hides blocks of code that parse on their own and re-checks the document until it parses, which isolates the lines that actually cause the error and gives much clearer feedback.

- Core Functionality of dead_end: dead_end uses Ripper, the parser that ships with Ruby, to check whether code is syntactically valid. It can identify missing keywords, extra ends, missing brackets, and other common mistakes that lead to syntax errors.

- Algorithm and Internal Logic: Richard outlines the logic behind dead_end's search, which combines indentation with the lexer's output, including how it handles incorrect indentation and ambiguous cases. The search works through blocks of code in a structured way, so it can report multiple syntax errors at once.

- Importance of Community and Contribution: Richard emphasizes community contributions and the importance of user experience in developer tools, pointing to Rust, whose community treats unclear error messages as bugs.

- Possible Future Integrations: He mentions plans to integrate dead_end into Ruby core for the Ruby 3.2 release, which would make it available to every Ruby developer.

The talk concludes by stressing the value of better developer tooling: a syntax error does not have to be the end, and can instead be an opportunity to improve and learn. The overall takeaway is that developers can evolve their own tools to make programming more intuitive and failures more forgiving, changing how we perceive and handle syntax errors.


Nothing stops a program from executing quite as fast as a syntax error. After years of “unexpected end” in my dev life, I decided to “do” something about it. In this talk we'll cover the lexing, parsing, and indentation-informed syntax tree search that power the dead_end Ruby library.

RubyConf 2021

00:00:11.759 Hello everyone, my name is Richard Schneeman, and I want to talk to you about one of the scariest things a Ruby programmer can face.
00:00:25.359 Wait a second, do you hear that? Everyone run! Oh no, oh no! There are more of them! Okay, wow, they followed me from RubyKaigi!
00:00:31.359 I can't believe it! Two conferences in a row! Who would have believed it? Wow, what a coincidence!
00:00:41.680 So, if you've seen that talk, know that there is a lot of new content in this one, and you will not be disappointed.
00:00:47.520 And while dinosaurs are scary, there’s something that’s even scarier: syntax errors. Okay, don’t laugh. Syntax errors are scary.
00:00:59.600 Just look at this unexpected syntax error. Okay, it’s horrifying! Where’s the problem? Here’s some different code. Wanna guess where Ruby thinks the problem is? Yeah, last line? Nope, wrong! This is frustrating! I am frustrated!
00:01:16.880 What if we had something better? What if we could take this and turn it into something that tells us exactly where the issue lies? It looks like it's missing a 'do', okay?
00:01:22.960 Have you ever seen a cooking show? They show you the final product. I want to show off Dead End in action.
00:01:28.400 Here’s a demo of how it finds syntax errors. Here’s a real document with a real syntax error, reported on the last line, of course. This symbol up here means that the entire document cannot parse.
00:01:39.759 So, we're going to transform it until it can.
00:01:44.880 When a valid source code block is found, Dead End safely removes it after each step in the search. Then Dead End re-evaluates the whole document to see if it’s parsable yet. Parsing failed, so we need to keep looking. Still failing. Still failing. Expanding. Still failing. Are you the syntax error? Are you my syntax error? Still failing.
00:02:02.960 Searching for that syntax error... still failing. Now, still can't parse the document. Maybe it's up here? Maybe... no. Yeah, it doesn’t look right. I don’t think it’s in there.
00:02:25.760 Okay, the document is going to be checked, and... oh, okay. Wow! Okay, the parser reports that our minimal document is now parsable. That’s great news! Once this happens, Dead End has found a way to transform our document; it can stop searching and instead focus on the invalid code blocks.
00:02:42.400 This right here is the actual output of Dead End on that file. The issue is on line 36: there’s a missing end statement. You can have it today for the low cost of free; just install the gem.
00:02:54.160 By the time you see this slide, it will probably have at least half a million downloads. I am talking with Ruby core about integrating Dead End directly into Ruby. We are targeting the Ruby 3.2 release in 2022, so not this year, but next year.
00:03:06.000 This also means that there’s plenty of time for you to give it a shot, give it a spin, and provide me with some feedback. Alright, you saw one example of Dead End finding a syntax error. What else can it do?
00:03:31.040 When you miss a keyword like 'if', 'do', or 'def', Dead End finds the problem. What about a missing 'end' keyword? Can it do that? Yes, it finds the problem.
00:03:41.840 What about a missing curly bracket? Dead End finds the problem. What about a square bracket? Dead End finds that problem too. What about a missing pipe character? Dead End finds that one as well. But what if you have a problem with a missing family member in a Korean thriller drama? No, sorry, Dead End can't assist with that.
00:04:07.680 That was too dark, too deep. So, let’s rewind and try this again. I have a problem with a missing Marvel universe character! Crocodile Loki, come on! Still funny, still good, and still top. It's Crocodile Loki! This year is lasting forever. Oh, that was such a good show.
00:04:19.120 Today, we're going to dig into Dead End. You can follow the algorithm yourself by running it with the --record CLI flag. You'll see each step along with the annotated source code. That’s actually how I generated the slides you just saw: they came from Dead End. If you don’t want to make slides by hand, write a program to make them for you.
00:04:39.199 So today we’re going to talk about syntax errors, lexing, and parsing, and I’ll touch on some of Dead End's internals. But first, who’s this guy they're letting up on stage? They'll let anyone on stage these days.
00:05:04.320 I go by Schneems on the internet. If you forget how to pronounce my name, you can go to my blog, schneems.com, and click the little play button. I also created Code Triage, which is a platform for learning how to contribute to open source.
00:05:17.840 To date, I’ve helped over 60,000 developers; the number is around 62,000 now. You can sign up for Code Triage if you want to start contributing to open source.
00:05:29.039 Speaking of open-source contributions, I’m glad you brought that up. I'm working on a book called 'How to Open Source'. You can go to howtoopensource.dev to buy a pre-release copy. It's not quite ready yet, but it focuses on contributing to projects, especially for developers who are unsure how to start, or who started and got stuck. I've been running Code Triage for years, along with conducting research and interviewing developers.
00:05:57.600 So, this book is kind of like the synthesis of all of that work. The book is at howtoopensource.dev. You can also sign up for codetriage.com, and I’ll email members whenever the book is ready.
00:06:15.360 When I’m not working on open-source, I also like to get paid. Currently, that's happening through Heroku and Salesforce, where I'm working on Salesforce Functions. It’s an easy way to work with data inside of Salesforce using the language you love. If you love Java or JavaScript, you can use it right now, and we will roll out Ruby support later.
00:06:29.120 People also tell me that I am an exceptional programmer, mostly because my programs generate a lot of exceptions. It’s okay; I know what I am. Syntax errors, though, are the main exception we’re going to focus on today.
00:06:40.639 Let's start from the beginning: what is a syntax error, and why are syntax errors so hard to understand? Well, this code works wonderfully, comparing 'a' and 'b' all day inside a while loop. When Ruby parses the code, it converts it into an abstract syntax tree, which you can see here.
00:07:05.199 The tree is beautiful, and Ruby's parser looks upon it with great happiness and purpose. But then a stranger comes upon the land, and the stranger has a secret power. Behold the octothorp! With one key, the stranger transforms the code.
00:07:23.520 A critical line had been commented out, and without that line, the tree is no longer whole. Huge sections of the code are no longer reachable; the code no longer parses. Our parser is sad, and honestly, I’m a little sad too. With that, the stranger leaves, and behind them stands a syntax error.
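The tree-versus-no-tree story above can be reproduced with Ripper from Ruby's standard library: Ripper.sexp returns an S-expression tree for code that parses and nil when it cannot. This is a minimal sketch; the loop below is a stand-in, since the slide's exact code isn't in the transcript.

```ruby
require "ripper"

# A stand-in for the slide's loop that compares a and b.
code = <<~RUBY
  a = 1
  b = 5
  while a < b
    a += 1
  end
RUBY

tree = Ripper.sexp(code)
tree.first # => :program

# The stranger comments out the `while` line with an octothorpe.
broken = code.sub("while a < b", "# while a < b")
Ripper.sexp(broken) # => nil, because the leftover `end` has nothing to close
```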
00:07:59.599 In short, a syntax error occurs when Ruby's parser cannot build a valid parse tree. But why is it difficult to understand? When the parser finds an error, it reports the place where it got stuck, and that often isn’t where the developer made the mistake.
00:08:27.280 For instance, here a developer forgot a bracket. As the parser builds the tree, it hits an error because it wasn't expecting a comma. Ruby's parser has rules about what method definitions should look like, and when it finds something unexpected, it raises an error.
00:09:06.320 The problem is that the error isn’t caused by the comma. It’s caused by a missing bracket; that’s a major difference. The location of the parse error isn’t always where the developer made the mistake.
00:09:22.880 Here's another example: a developer forgot a space after the 'def'. Ruby believes the error is on the last line: when parsing the module definition, it starts looking for a matching 'end', and when it hits an unexpected combination of tokens, it raises a syntax error. The result isn't helpful to a human. It doesn't say what to delete or change to resolve the issue, because where the parser fails is different from where the human made the mistake.
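To see the gap between where a mistake is and where the parser notices it, the sketch below subclasses Ripper and records the current line number (Ripper#lineno) each time the parser reports an error through its on_parse_error callback. The class name ErrorLine and the sample file are made up for illustration, and the sample uses a missing end rather than the missing space from the talk.

```ruby
require "ripper"

# Hypothetical helper class: remembers which line the parser was on
# each time it reported a syntax error.
class ErrorLine < Ripper
  def error_lines
    @error_lines || []
  end

  # Ripper calls this for each syntax error it reports.
  def on_parse_error(message)
    (@error_lines ||= []) << lineno
    message
  end
  alias_method :compile_error, :on_parse_error
end

# The developer forgot the `end` that closes `if loud` (after line 4).
source = <<~RUBY
  class Dog
    def bark(loud)
      if loud
        puts "WOOF"
    end

    def sit
      puts "sitting"
    end
  end
RUBY

parser = ErrorLine.new(source)
parser.parse
parser.error?            # => true
parser.error_lines.first # near the bottom of the file, not next to line 4
```

Every `end` after line 4 gets matched to the wrong opener, so the parser only notices a problem when it runs out of input.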
00:10:41.120 Dead End's goal is to turn the parser's problems into something a human can recognize as an issue. How does Dead End work? It uses a library called Ripper. No, it’s not a band name; Ripper is a Ruby parser that ships with Ruby.
00:11:06.480 You can just require it, so the gem has no external dependencies. Ripper can evaluate code and indicate whether there’s a syntax error. Given the broken code from before, Ripper confirms a syntax error is present, and if we fix it, Ripper confirms that too.
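Checking validity with Ripper takes only a few lines. This is a sketch assuming nothing beyond the standard ripper library; valid_ruby? is a hypothetical helper name, not dead_end's public API.

```ruby
require "ripper"

# True when Ripper can parse the source without a syntax error.
def valid_ruby?(source)
  !Ripper.new(source).tap(&:parse).error?
end

valid_ruby?("def hello\n  puts 'hi'\nend\n") # => true
valid_ruby?("def hello\n  puts 'hi'\n")      # => false
```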
00:11:37.200 But that’s the hard part: it’s difficult to guess what the developer intended. So instead of fixing code, we comment it out, much as developers comment out code they don’t want to run. A naive version of that approach, however, didn’t yield useful results.
00:11:58.000 So instead, Dead End uses indentation and lexing to deconstruct the source code from the outside in. Commenting out blocks of code that are valid on their own removes those nodes from the document. Once the orphaned syntax nodes are commented out as well, the document reaches a valid state, and those orphaned lines are what the output shows as the actual issue.
00:12:26.080 The output reveals the 'else' and 'end' lines, and as a developer, it's obvious that an 'if' should be there too. However, naively commenting out and expanding based on indentation alone can lead to wrong answers.
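The primitive this approach rests on is "hide some lines, then ask the parser again." Below is a hypothetical helper, valid_without?, that swaps chosen lines for blank lines (so line numbers do not shift) and re-checks the rest with Ripper. The sample is an assumed stand-in for the missing-if example; deciding which lines to try hiding is the job of the indentation- and lexer-guided search, which this sketch does not attempt.

```ruby
require "ripper"

def valid_ruby?(source)
  !Ripper.new(source).tap(&:parse).error?
end

# Hypothetical helper: swap the given 1-based line numbers for blank lines,
# so the remaining lines keep their numbers, then ask the parser again.
def valid_without?(source, hidden_lines)
  kept = source.lines.each_with_index.map do |text, index|
    hidden_lines.include?(index + 1) ? "\n" : text
  end
  valid_ruby?(kept.join)
end

# The developer forgot the `if name` line above the first `puts`.
source = <<~RUBY
  def greet(name)
      puts "hello, friend"
    else
      puts "hello, stranger"
    end
  end
RUBY

valid_ruby?(source)            # => false
valid_without?(source, [3])    # => false: line 6's `end` now has nothing to close
valid_without?(source, [3, 5]) # => true: the `else` and its `end` are the suspects
```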
00:12:52.960 Let us look deeper into some gotchas. We know the developer has a syntax error, but we can't assume everything else about the code is correct. For example, a syntax error caused by a missing 'do' can lead the parser to report an extra 'end' below it.
00:14:02.560 Should we rely only on indentation? If we did, we might delete lines that remove the wrong 'end' and still end up with a document that parses, because the 'do' matches the other 'end'. This shows the complexity this approach can introduce.
00:14:52.160 We can fix this by using the lex output from Ripper. The lex output tells us which tokens are in the code, such as our 'do' and 'end'. This information is essential: when we consider removing lines, knowing which keywords they contain tells us which blocks can be removed safely.
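Ripper.lex returns the token stream even for code that does not parse, so keywords can be inspected in a broken file. This sketch counts do and end keyword tokens; keyword_counts is a hypothetical name, and a real implementation would also need to track def, if, class, and the other keywords that are closed by end.

```ruby
require "ripper"

# Hypothetical helper: count `do` and `end` keyword tokens in the lexer output.
def keyword_counts(source)
  counts = Hash.new(0)
  Ripper.lex(source).each do |_pos, type, token, *|
    counts[token] += 1 if type == :on_kw && %w[do end].include?(token)
  end
  counts
end

balanced = keyword_counts("[1, 2].each do |x|\n  puts x\nend\n")
balanced["do"]  # => 1
balanced["end"] # => 1

missing = keyword_counts("[1, 2].each do |x|\n  puts x\n")
missing["end"]  # => 0, so this `do` has no partner
```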
00:15:54.000 Even with correct indentation, removing the wrong lines may yield a false positive: the reported lines might be missing context, like the 'def hello' that surrounds them. If we rely on indentation alone when commenting out code, we can easily miss the relevant blocks and fail to point at the true issue.
00:16:44.960 Ambiguous syntax is harder still: depending on the context we examine the code in, it can point us toward the wrong fix. The search has to strike a careful balance, because not every error has one precise location.
00:18:36.720 For example, Ruby's parser cannot tell a missing 'if' from a missing 'end' based solely on the errors it encounters. That complicates things significantly! But knowing that ambiguity exists means we can account for it after the search: the algorithm still has access to all the relevant code, which leaves room to show the surrounding context.
00:19:45.440 What’s more, a document can contain multiple syntax errors, sometimes stacked on top of one another. We don’t want to stop after finding the first one, but we also shouldn’t fall back to showing all of the code indiscriminately.
00:21:00.960 So, how do we handle multiple errors? We can modify the search to make multiple passes, working from both ends toward the middle. This helps us locate errors 1 and 2 without failing to find the last one, and gives us both flexibility and results.
00:22:06.560 Now that we’ve covered some technical points, let’s move on to artificial intelligence. Who knows about AI? Raise your hand or type in the chat!
00:22:27.160 In code, artificial intelligence often refers to search algorithms, and one common example is pathfinding. Dead End uses a search algorithm to find our problematic code. It is a variation of uniform cost search, which is closely related to, and sometimes called, Dijkstra’s algorithm.
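For readers new to it, here is a textbook uniform cost search over a small weighted graph: it always expands the cheapest path found so far, using a plain array as a stand-in for a priority queue. This is a generic illustration, not Dead End's code; Dead End's frontier holds blocks of code rather than graph nodes, and the graph below is made up.

```ruby
# Textbook uniform cost search: always expand the cheapest path found so far.
# A plain array stands in for a priority queue to keep the sketch short.
def uniform_cost_search(graph, start, goal)
  frontier = [[0, start, [start]]] # entries are [cost so far, node, path]
  explored = {}

  until frontier.empty?
    cheapest = frontier.each_index.min_by { |i| frontier[i][0] }
    cost, node, path = frontier.delete_at(cheapest)
    return [cost, path] if node == goal
    next if explored[node]

    explored[node] = true
    graph.fetch(node, {}).each do |neighbor, step_cost|
      next if explored[neighbor]

      frontier << [cost + step_cost, neighbor, path + [neighbor]]
    end
  end

  nil # the goal is unreachable
end

graph = {
  a: { b: 1, c: 4 },
  b: { c: 2, d: 5 },
  c: { d: 1 },
  d: {}
}

uniform_cost_search(graph, :a, :d) # => [4, [:a, :b, :c, :d]]
```

The cheap-looking direct edge a→c costs 4, but the search keeps expanding whichever partial path is cheapest, so it finds the total-cost-4 route to d before the cost-6 route through b→d.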
00:23:06.640 If you want to learn more, look for resources on search algorithms. In particular, seek out visualizations of search algorithms; they make them much easier to understand.
00:23:28.880 Now that we know a little about AI, let’s look at some internals. If Dead End were a cake, we’d start with messy code that has syntax errors. We clean it up, tidy it, and hand it to the search algorithm, which is exciting but complex.
00:24:07.280 When syntax errors are found, we add context where needed and send detailed feedback back to the user. Syntax errors happen in every app; they’re common! So, like every other good library, Dead End uses monkey patching to catch them.
00:25:19.120 It hooks into the require method, and when a syntax error is raised, the error gets passed to Dead End. The source code of the file that caused the error is read from disk and passed into the search object. However, we don’t simply use that raw document.
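Hooking require to intercept syntax errors can look roughly like the sketch below, which uses the same alias-and-redefine pattern RubyGems uses to patch Kernel#require. The SyntaxHook module and the sketch_original_require alias are hypothetical names, and Dead End's real patch differs in its details; here the hook only records the path before re-raising.

```ruby
require "tempfile"

# Hypothetical collector standing in for "hand the error to Dead End".
module SyntaxHook
  @seen = []

  class << self
    attr_reader :seen

    def handle(path, error)
      @seen << path
      warn "#{error.class} while requiring #{path}; a search would start here"
    end
  end
end

# Keep the original require under a new name, then redefine require so
# every SyntaxError raised while loading a file passes through the hook.
module Kernel
  alias_method :sketch_original_require, :require
  private :sketch_original_require

  def require(path)
    sketch_original_require(path)
  rescue SyntaxError => e
    SyntaxHook.handle(path, e)
    raise
  end
  private :require
end

file = Tempfile.new(["broken", ".rb"])
file.write("def oops\n") # missing `end`
file.flush

begin
  require file.path
rescue SyntaxError
  # The hook saw the error first, then re-raised it unchanged.
end
file.close
```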
00:25:57.359 The first step cleans it up with a clean document class that handles the various gotchas discussed previously. It removes comments and whitespace, and it joins lines that belong together, such as chained method calls, lines ending in a trailing backslash, and heredocs.
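As a taste of the cleaning step, this sketch blanks out comment-only lines, using Ripper.lex to tell real comments apart from a # inside a string. Lines are replaced with empty lines rather than deleted, so reported line numbers still match the original file. The helper name blank_comment_lines is hypothetical, and joining method chains, trailing backslashes, and heredocs is left out.

```ruby
require "ripper"

# Hypothetical helper: replace comment-only lines with blank lines.
def blank_comment_lines(source)
  comment_lines = []
  code_lines = []

  Ripper.lex(source).each do |(line, _col), type, _token, *|
    case type
    when :on_comment then comment_lines << line
    when :on_sp, :on_nl, :on_ignored_nl then nil # whitespace is neutral
    else code_lines << line                      # any real token keeps the line
    end
  end

  blank = comment_lines - code_lines
  source.lines.each_with_index.map do |text, index|
    blank.include?(index + 1) ? "\n" : text
  end.join
end

source = <<~RUBY
  # leading comment
  def hello
    # explain the greeting
    puts "# not a comment" # trailing comment
  end
RUBY

cleaned = blank_comment_lines(source)
# Lines 1 and 3 become blank; line 4 keeps its string and trailing comment.
```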
00:26:32.320 Then it converts the cleaned source into objects called code lines. Once we have the lines representing the document, we pass them to our search class.
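A code line can be modeled as a small value object. The Struct below is a hypothetical stand-in for Dead End's code line objects, carrying what a search needs: the line number, the raw text, its indentation, and whether it is currently hidden.

```ruby
# Hypothetical stand-in for a code line object.
CodeLine = Struct.new(:line_number, :text, :hidden) do
  # Number of leading spaces; a search can use this to group lines into blocks.
  def indent
    text[/\A */].size
  end

  def empty?
    text.strip.empty?
  end

  def visible?
    !hidden
  end
end

source = "def hello\n  puts 'hi'\n\nend\n"
code_lines = source.lines.each_with_index.map do |text, index|
  CodeLine.new(index + 1, text, false)
end

code_lines.map(&:indent) # => [0, 2, 0, 0]
code_lines.map(&:empty?) # => [false, false, true, false]
```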
00:27:03.520 Next, the search class takes over. It is driven by a while loop that explores a frontier, which holds the code blocks generated from the source code. On each pass, the frontier checks whether any problems are left to address.
00:27:46.960 For the code blocks in the frontier, we check with the parser whether the original document still has issues once they are hidden. Based on the answer, we expand blocks according to indentation, and only hide code once it is confirmed to be valid Ruby.
00:28:49.680 The results can include multiple syntax errors, which are then formatted into detailed output. That final stage completes the pipeline Dead End runs: cleaning, searching, and formatting.
00:29:09.440 Thank you! I will be here all week, folks.
00:29:16.000 Beyond Dead End, there's another amazing gem called Error Highlight, which shows you exactly which method call raised a NoMethodError.
00:29:21.520 It was created by Yusuke, known as Mame on GitHub, who gave an excellent talk at a conference I used to run called Keep Ruby Weird. I recommend checking it out!
00:29:48.799 With Error Highlight and Dead End, I also want to touch on the importance of community values when handling errors.
00:30:07.360 I've been writing some Rust code over the past couple of months, and I've observed that their community takes an aggressive stance towards error messages.
00:30:17.600 They not only state there's a problem but also strive to accurately suggest how to fix it. When I opened an issue, the community tackled it quickly and merged a fix within a month.
00:30:35.040 Though not perfect, this community treats user experience issues as critical bugs. If we invested more energy into our error handling, we could elevate user experience.
00:30:46.640 You can add Dead End to your project and try it out today! Feel free to give feedback on what works or doesn’t work. Hopefully, we can have that finalized before Ruby 3.2 ships.
00:31:06.880 You can also pre-order my book on how to open-source at howtoopensource.dev or sign up to triage issues on Code Triage.
00:31:31.760 Today, we talked about lexing, parsing, syntax errors, AI, and pathfinding! But remember, technical details aren't the most important part.
00:31:42.000 The important part is that everyone sitting in this room is the future of developer tooling. Programming is inherently difficult, but our tools can help us.
00:32:19.200 One of the best ways to judge a system is to see how it fails. Care, grace, and beauty applied to our failure modes create experiences that delight us, teach us, and elevate our code.
00:32:37.200 A syntax error doesn't have to mean the end; it can be the beginning of a beautiful programming story. My name is Richard Schneeman.
00:32:48.880 You may have heard I'm writing a book. You may also have heard I run Code Triage.
00:32:59.039 I want you to go forth and be an exceptional programmer! Bye!
00:33:06.890 You’re still here? It’s over! Go home!
00:33:20.720 Alright, bye!