Debugging Rails Itself

by Sean Griffin

The video titled "Debugging Rails Itself" presented by Sean Griffin at RailsConf 2018 explores the complexities and processes involved in debugging the Rails framework, which has been contributed to by around 5000 developers. The talk focuses on the various steps one must undertake to diagnose and fix bugs within Rails, showcasing the intricacies of navigating a large legacy codebase.

Key points discussed include:
- Understanding Bugs: Griffin begins by defining what constitutes a bug, emphasizing the difference between observed and intended behavior. He illustrates that not all surprises in behavior are bugs; if it is documented, it is expected behavior.
- Initial Steps: The first step involves verifying if the bug resides in Rails or is an application-specific issue. Documentation must be checked to confirm discrepancies.
- Creating Reproducible Test Cases: A critical step in bug reporting is the requirement for a minimal, isolated test case that demonstrates the issue. Griffin explains how to create an inline gem file pointing to the Rails version relevant to the bug report and emphasizes the necessity of not including other gems.
- Debugging Techniques: Griffin shares various debugging techniques once the bug is confirmed, such as using Git bisect to find the commit introducing the bug. He also discusses using Git context to understand the history of changes to specific lines in the code, thus providing context rather than blame.
- Fixing the Bug: After identifying the problematic code, Griffin suggests that fixing the bug involves understanding the context of the previous code and the initial intentions behind its implementation. He encourages collaboration and suggests taking breaks when stuck.

- Encouragement to Contribute: A significant takeaway is that anyone can contribute to Rails. The speaker reassures viewers that the skills required are much the same as those used in their applications, thus demystifying the process of working on a large codebase.

In conclusion, Griffin's talk demystifies the complexities of debugging in Rails by breaking down the process into manageable steps and encouraging more developers to engage with the Rails community. The overall message emphasizes that tackling bugs in a framework, regardless of its size, can be approached using familiar techniques and tools.

00:00:11 Hello everybody, it's story time. Let me tell you a tale. It's 2 a.m. and you get a page; production is down. You're searching frantically for a bug in your code, but you find nothing.

00:00:18 Slowly, a realization begins to creep over you: the bug isn't in your code. You will need to venture into the belly of the beast. It's time to enter the lion's den; it's time for debugging Rails itself.

00:00:36 My name is Sean Griffin. I am a tenx hacker ninja guru at Shopify. Please tweet at me while I'm giving this talk; I would appreciate it very much. I recently moved back to America after living in Canada for a few years, and I work remotely now.

00:00:50 I find it a lot harder to find somebody to pair with, so to remedy that, my wife and I made a pair for me. Her name's Ruby, and she's the youngest attendee that RailsConf has ever had.

00:01:09 If you don't like baby pictures and dad jokes, boy, you are just not in the right talk. Today, my pair and I are going to walk you through the exact process that we use when we think we have a bug in Rails.

00:01:27 First of all, I want you to know that anybody can work on Rails. Big open-source projects like Rails can seem intimidating; they are complicated and not terribly well-factored.

00:01:41 Everything includes 15 million modules, sometimes twice, and so it can often appear to be something that the average programmer wouldn't be able to comprehend. But I'm here to tell you: anybody can work on Rails. I know it can be surprising to hear. It's just not as scary as you think. It's just a large legacy codebase that's had far too many people touching it.

00:02:02 Most of the techniques that we use to debug Rails are the same techniques that you would use in your application. The first step in this whole process is to determine if we even have a bug in Rails, or if it's a bug in our application, or if it's a bug in another gem.

00:02:14 We really have to answer: what even is a bug? So we should probably define what the term 'bug' actually means. The term originates from the days of vacuum tube computers. Sometimes a bug would fly into the tubes, causing the system to malfunction.

00:02:35 So, there's a bug in the system used to literally mean that a physical bug was located somewhere inside your computer. Unless you're running Rails on a vacuum tube system that has a fly in it, then you don't have a bug in Rails.

00:02:59 If you want an actually useful answer, I generally go with the definition of a difference between the observed behavior and the intended behavior of a system.

00:03:13 One thing that's important to understand is that the intended behavior doesn't necessarily always mean the behavior that you would expect. Saying this violates the principle of least surprise does not make it a bug.

00:03:29 I'm going to immediately ask you 'least surprising to who?' because it's a subjective assessment. Generally, we won't make breaking changes because somebody found the API surprising.

00:03:49 A lot of bugs will crop up from the usage of a feature that we didn't anticipate or some interaction between two features that we didn't test for. When these sorts of bugs come in, it can often be difficult to assess whether there are bugs or not.

00:04:09 Sometimes the answer is just going to be 'no, this is not supported usage.' I'm sorry, but other times we do eventually change how we intend to feature.

00:04:29 Active Record enums are a great example of this. One of the things that you could have almost always been able to do was have an enum that was backed by a non-integer column. The feature was designed very specifically to be backed by an integer column.

00:04:57 For every single value, it was mapped to an integer. That's what we could most easily automatically generate. But we also gave you the ability to specify for each new variant what the actual value is.

00:05:16 There was nothing to stop you from declaring the value to be a string. If the backing column was a string column, that would just sort of work, but it was never officially supported.

00:05:33 In Rails 5, there may have been a big rewrite of that feature to use the underpinnings of what eventually became the attributes API. Because this wasn't a supported usage, we didn't have tests for it.

00:05:56 The type that we created for this inherited from the integer type, and so the inversion broke string columns. When the bug report came in, our initial inclination was: "we kind of wanted to make it work, but it's just never been supported, so let's close it."

00:06:16 But then ultimately we decided that this was a reasonable thing for people to expect to be able to do. The easiest way to determine if something is a bug or not is to check the documentation.

00:06:34 If the behavior differs from what's documented, then that's 100% a bug. It might be a bug in the documentation, but there is a bug somewhere.

00:07:00 Keep in mind though, if the method you're calling doesn't appear in our API documentation, it's not part of our public API.

00:07:14 Rails does not define public API as public visibility in Ruby. We define it as what appears in our API documentation.

00:07:28 Alright, so we think it's a bug. What's next? Well, buckle up and put on your serious business glasses. We need a reproducible test case. This is something we insist on having for all issues opened on Rails.

00:07:49 You can find templates for these in the Rails guides. We have one for each of our gems. This is what the one for Active Record looks like.

00:08:06 The first thing we do is create an inline gem file to point at the version of Rails that we're reporting for. This is intended to be a single-file gem. We use an API to put this in that single file.

00:08:23 Now we have two templates: one for the latest release and one for Rails master. I usually just like to point it at the master for these because there's not a ton of value in reporting a bug that's already been fixed.

00:08:50 Even if I am specifically reporting it for an older version of Rails, I'll usually point it at, for example, the 5.2 stable branch instead of the actual 5.2 release, to make sure there isn't just a fix on the branch that hasn't been released yet.

00:09:08 The next thing we do is create some minimal isolated environment for that gem. This will look very different depending on which gem you're reporting a bug in.

00:09:31 For Active Record, this involves creating a database connection and a minimal schema, and maybe declaring a few models. We're using SQLite for the template because most bug reports won't be specific to any single database.

00:09:48 Not requiring people to have an existing installation of Postgres with credentials is a good thing.

00:10:07 Finally, we have the test in this template, or in your reports. It doesn't have to be a literal test; it can just be some print statements and calling a few lines of code. What's important is that the output of this script needs to be something that we can use to easily identify whether the bug is occurring or not.

00:10:26 The test also does not have to be a single file. It's completely fine to just have a full Rails app that we can clone down to reproduce the issue. We prefer the single file if you can do it that way.

00:10:48 Regardless of whether you use this single file executable test case or a full Rails app, what's important is that the bug needs to be demonstrated with no gems other than Rails.

00:11:08 The reproduction script serves two purposes: first, it makes tracking down the bug infinitely easier since we have an easy way to verify whether it's fixed or not. Moreover, this forces people reporting issues to actually verify that the bug they are reporting is a bug in Rails.

00:11:27 Not a bug in their application or a bug in another gem. We will reject the report if the gem file contains gems other than Rails or SQLite, which are the things you need to use Rails.

00:11:47 If the test case has private APIs, that’s another reason we will reject these reports. Bugs are just differences in behavior in public APIs only. You'd be surprised how many reports come in.

00:12:10 We ask for a reproduction script, and then the issue gets closed because, as they wrote the reproduction script, they realized, 'Oh, I'm sorry, this isn't a bug in Rails.'

00:12:28 Okay, so we know we have a bug. Let me know it's in Rails. What's next? Before we fix it, there are two things I want to find: I want to find the code that's causing the bug, the line that we eventually need to change, and then I also want to find the commit that introduced the bug.

00:12:48 So put your serious business glasses back on; it's time to go bug on table. We can find the commit and the problem code in either order, but if at all possible, I prefer to find the commit first.

00:13:07 It's generally easier to do, and there's a good process to find it as long as we know the code was working in some previous version of Rails.

00:13:27 To find the first commit that introduced the bug, we're going to use git bisect, the greatest tool that was ever gifted to us unworthy programmers. If we know a commit where the bug happens, then a commit where the bug doesn't happen, and we have a test script that we can run, bisect will tell us the commit that introduced the issue relatively quickly.

00:13:54 If we're going to be bisecting, the first thing we need to do is take our test scripts and point them at a local checkout of Rails, where Rails has internal dependencies like Active Record.

00:14:23 Even though now, that is actually just in the main repository, up until 5.2, Active Record lived in a different repo, and oftentimes any given commit of Rails will also only work with a specific commit of Active Record.

00:14:42 It's not just enough to have the inline gem file point at your local checkout of Rails. We want to use the gem file that's in the Rails repository so that we know that any other internal dependencies of Rails are also pointed at the right commit.

00:15:00 This is how you do it, and this is the basic process you'll use to bisect an issue. You go to a commit that has the bug; I think master is usually a perfectly fine place to start.

00:15:19 Then you go to a commit that you know doesn't have the bug. If I don't know for sure which version of Rails the issue first appeared in, I'll just go back to 4.2.

00:15:34 Even if 5.1 was the last working version, bisecting all the way back to 4.2 actually only adds one or two extra commits that you have to test.

00:15:54 From there, it's going to take you through a binary search of the commits between those two points. That means that's going to take you to the commit that is right in the middle of them, and ask you if the bug is occurring or not.

00:16:19 If it's occurring, it's going to take you halfway through the half that's left, and so on and so forth. Usually, it will take about seven steps for it to be able to tell you exactly which commit introduced the issue.

00:16:37 If you don't feel that you can fix the bug at this point, that's fine. If this is all you can contribute, please do that. This is one of the most helpful things a new contributor to Rails can do.

00:16:58 If I go to an issue and there's a comment with a reproduction script, and then there's another comment that bisected and says, 'This is the commit that introduced the issue,' that makes my job significantly easier.

00:17:20 Thanks a lot! The issue was able to get fixed in a few days, as opposed to I mean it would have gotten fixed eventually.

00:17:39 But just booting it makes people also much more likely to pick it up, as there's less work for us to do.

00:17:58 This is great if somebody opens up a bug report and says, 'Yeah, this worked in 5.1 and doesn't work in 5.2,' but what happens when we don't know what version—when we don't have a version where this worked?

00:18:18 What happens if the last version where this worked is so old that our test script doesn't actually run against it?

00:18:34 Well then, it's time to put on your legacy code hat and Ruby like addresses when she is working on legacy code. We're just going to have to do some old-school debugging here.

00:18:54 Now I'm a dinosaur debugger; a lot of you are going to be rolling your eyes at me and saying that you could just do all of this with Pry way faster, and that's probably true.

00:19:12 If you are good with Pry, then you should continue to use Pry. My strategies are not as efficient because it is very hard to fight decade-plus of muscle memory of just adding more print statements.

00:19:31 But here's a list of things I like to put in print statements. When I first started working on Rails, I knew nothing about its internals or how anything worked, so I'm going to kind of give you some examples of how I went about fixing bugs.

00:19:50 I would start from whatever method is getting called in the tests and work inwards from there. So, to do that, we're going to call the method method to get me a method object for our method.

00:20:09 It's a useful method of debugging when all you have is a method name. Some objects override the method method. For example, a Rack request will return GET, POST, or whatever the request method is.

00:20:27 Some Active Record models have a column called method. Somebody actually had that in a reproduction script they sent in once, and I spent longer than I should have trying to figure out why the method method was just telling me I had the wrong number of arguments.

00:20:48 In these cases, we can still call the method method, but we have to use this method instead. We get an unbound instance method object for the method method, call the bind method, and then call our method.

00:21:04 It helps to be methodical when working with the method method.

00:21:23 Once you have a method object, you can call useful methods like source location to get the file and line number where the method was defined. If this method returns nil, that means that the method was defined in C.

00:21:45 What you are looking at is not a bug in Rails because Rails does not have C code—or at least that method is not part of the bug.

00:21:57 You can also call the super method to get the method that super would be calling in that method, which is very useful if the thing that you're looking at is 'save' which is defined 20 times.

00:22:12 When you're using this form of debugging, it's important to tunnel-vision yourself on the problem at hand. You don't have to understand how the entire system works to figure out how to fix a single bug.

00:22:32 For example, if you are looking for a bug in callbacks and you're digging in through save because that's the method that got called, when you get to the definition of save that has to do with pessimistic locking of Active Record models, it's probably safe to assume that’s not where the bug is actually occurring.

00:22:50 Just move on from there. You don't have to try and understand every single thing you see if it looks like it's probably unrelated to the bug that you're trying to fix.

00:23:05 This isn't always the best place to start. If the bug is very clearly related to callbacks, for example, it's often much easier to just see what called the callback rather than starting at save.

00:23:27 In this case, I often just pass a kaput scholar in the callback, and I'll show you the file and line numbers of every caller in the stack at that point.

00:23:50 Or if this is something that's getting called a lot and you just want the output of the first time it was called, you can stick arrays in there.

00:24:02 Which will also give you a call set. Finally, the 'p' method is one of the most useful for this sort of debugging.

00:24:20 It calls inspect on its argument, prints that out, and then just returns it. You can just sort of litter all over your code without having to restructure anything.

00:24:41 One thing to keep in mind with 'p' though is that inspect can have side effects, and this can sometimes cause bugs to disappear. Shoutout to obscure MRI bugs!

00:25:00 I had a recent issue where calling String#start_with was hanging forever, and calling inspect on an unrelated class somehow made that not happen.

00:25:23 The most well-known instance of inspect having side effects is if you call inspect on an Active Record model, that will show you all of its attributes.

00:25:43 And that causes typecasting to occur. We've had bugs where things were getting mutated that shouldn't have been, and the act of typecasting caused that to change.

00:26:01 So, once we've tracked down the problematic code this way, we still want to find the commit that introduced it. To do this, we're going to use a tool called git blame; except we're not gonna do that because this isn't about assigning blame; it's about additional context.

00:26:26 So we're going to be using git context instead. Blaming people is terrible, and you should not blame people.

00:26:43 If you would like to stop blaming people, this is some code you can stick in your global git config, and you will never have to blame people again.

00:27:01 So git context does is show you the last commit that changed each line in the file.

00:27:14 Sometimes the last commit that changed the line will not be the commit that introduced the bug. Sometimes it will be a formatting change, or it will have moved something to a new file.

00:27:34 We try to avoid these sorts of changes in Rails since we use git context so frequently, but sometimes they do appear.

00:27:54 In this case, we have to do what's called a recontext for that commit, which is where we call git context on the parent of that commit.

00:28:19 If you use Vim, the Fugitive plugin that most people use for accessing Git from Vim has some really helpful shortcuts for this.

00:28:41 If you hover over a commit and then you press tilde, you will recontext it on the parent. GitHub also has a really nice UI for this.

00:29:03 I typically just open the commits in GitHub whenever I need to do this.

00:29:23 So you might be wondering why we even care about the commit that introduced the bug.

00:29:41 The answer comes back to the definition I used for a bug earlier: it's when the observed behavior is different from the intended behavior.

00:30:00 This means that we need to know what the intention was. Even if we are fairly certain that we do have a bug and we don't care what they were intending in this commit.

00:30:17 Seeing the intent behind the code that broke it is still very useful for writing a fix. One of the things that we want to make sure is that we do not regress whatever they were trying to fix when they broke this code in the first place.

00:30:40 Hopefully, the test suite will catch that, but legacy codebases aren't always as well tested as you would like. Sometimes finding the commit is less helpful.

00:30:57 If you're lucky, it'll be a very long commit message that talks about why they are doing it the way they are, what their intentions were, and what was on their mind.

00:31:15 This is why I always advocate for people writing very long commit messages.

00:31:34 If you're less lucky, the commit will say, 'changed stuff.' If you're really unlucky, the commit will say, 'initial.' So now that we've tracked down the code that's the problem and the commit that introduced it, it's time to fix the bug.

00:31:51 And I actually don't have much advice to give on this one.

00:32:05 The process of fixing the bug becomes apparent fairly quickly. Not always, but what you're going to do is going to be incredibly dependent on the bug you're trying to fix.

00:32:24 I know this can be not a great thing to hear, but there's no set solution for how to fix a bug in Rails.

00:32:38 What I can tell you is that the process of fixing the bug is no different than any other legacy codebase. You try and understand the code path that's the problem, try to understand the problem, and hope that the fix becomes apparent.

00:32:59 And sometimes that just takes a lot of time. Sometimes you'll feel stuck, like Ruby when these situations occur.

00:33:16 Take a break! Maybe sleep on it, go home, play some video games, come back at it with a fresh set of eyes.

00:33:29 Another thing I like to do is find a pair. Having someone else come help you work on your code can really change your context on a problem.

00:33:46 It's worth going through the struggles when it starts to feel rough because when you finally fix it, it feels like this. I know a lot of you might be saying: this is not about anything specific to Rails!

00:34:09 Almost everything you've said can apply to any large legacy codebase, and yeah, that's all Rails is: it's just a large legacy codebase.

00:34:25 Anyone can work on Rails. We just use the same techniques we use in our applications on a framework that's used by more people.

00:34:41 The same tools, the same process, the same strategies they all work. You just have to get past the fear of open source.

00:35:00 If you want more Ruby, if this one's not enough, baby pictures for you, my wife is also giving a talk that's going to be in room 315 at 2:30 p.m. I highly recommend it.

00:35:18 Having been married to her, I had the privilege of seeing it early, and it's a good talk.

00:35:31 I want to thank Shopify for sending me out here, even though none of you made it because you got buried in snow.

00:35:48 Thanks to the Diesel Corps team for watching Diesel's channel, I'm at a conference, and I know you all are watching on the livestream, so hi guys!

00:36:06 I have some Ruby stickers; if you would like a Ruby sticker, come get one afterwards. I will be taking questions down off the stage as soon as I get off, so come ask me questions if you have any.

00:36:25 This is my contact information if you'd like to reach out. Thank you very much.

00:36:45 Thank you!