Deletion Driven Development: Code to Delete Code!

00:00:15.190 Welcome to a talk that I like to call 'Deletion Driven Development.' My name is Chris Arcand. Here's where you can find me on Twitter and GitHub. I'm a really social person and love meeting new friends at conferences, so be sure to say hello. My username everywhere is just chrisarcand.

00:00:21.290 We are here in the lovely city of Cincinnati, Ohio. I've never been here before, but I'm really enjoying the week, especially the giant plates of cheese with a bit of chili on them—they're really good. I hail from northwest Minnesota, right on the Canadian border. Minnesota goes by a bunch of names that you've probably heard. One is the Land of 10,000 Lakes, and another is the North Star State. It's a super cold place. If you're from Canada, you might know it as the great white south.

00:01:10.220 If you're at a Ruby conference like this one, you might vaguely remember it as that one place where those JRuby guys live, right? Well, I live in the Twin Cities of Minneapolis and St. Paul. We have absolutely gorgeous summers there, with beautiful forests and lakes to enjoy. In the winter, we love playing hockey, and the winters often look as beautiful as they do in pictures. However, if you've been there, you know that I'm lying; it often looks a lot like a scene of a man on a snow bicycle riding through a blizzard.

00:01:32.750 After such a blizzard, sometimes things look like a line of cars parked on the street. But hey, just to repeat, the summers are lovely, and you should at least come visit during the summer sometime if you're nearby. I'm a Ruby developer at what Aaron Patterson has always described as a small start-up you might have heard of, called Red Hat. I work remotely out of Minnesota, as there’s no engineering office there.

00:02:10.560 At Red Hat, I work on ManageIQ, which is an open-source cloud management platform that powers Red Hat CloudForms downstream. It basically aggregates all your enterprise infrastructure into one place and adds a bunch of functionality on top of it. The codebase is hosted on GitHub; it’s easy to find. If you want to learn more, feel free to talk with me afterward or visit manageiq.org. We're always on the lookout for good developers, so if you're interested in joining us, please see me, send me an email, or whatever.

00:02:59.760 I also have a ridiculous amount of swag with me at this conference. If you feel like you don’t have enough stickers, shirts, lanyards, screen cloths, or even ManageIQ candies, please seek me out here at the conference. Now, you might wonder why I am here. I am here because I love programming, and as you can probably imagine, I love writing code. However, there's something else I love even more: I love deleting code.

00:03:54.900 Ruby has been a successful programming language for some time now, and we, as Ruby developers, often maintain legacy applications that have been developed over many years. A consequence of our long-term success is that these applications may contain unused, obsolete, and unnecessary code. Today, I'm going to tackle a specific case. I'm going to talk about methods that stand worthless and dead, unused by any callers in the application.

00:04:10.220 Now, that might be fine for frameworks where the public API is exposed and never called within the framework itself, but in terms of an application, it just adds cruft. So, how does code like this end up in our projects? There are a couple of reasons I can think of. Consider a developer from the beginning of the project years ago who adds methods that they think will be useful someday, but they actually aren’t. They never end up getting used, and the implementation might change underneath them, rendering them ineffective. That’s a kind of over-engineering.

00:04:57.550 There could also be poorly written code. Imagine a brand new, inexperienced developer joining the project—they write a very specific method that isn’t very flexible and is completely unhelpful beyond the single spot where it's used. Perhaps it could be refactored and written better more generally. Now, hopefully, these two examples are just caught in code review, but sometimes they aren’t.

00:05:30.790 It could also be that things have just been refactored over time and methods aren’t needed anymore. So, you might ask, who cares? The short answer is that unnecessary code is confusing. It adds complexity where there shouldn't be any, creates an unnecessary maintenance burden on future developers, and makes you scroll more in your text editor, which is annoying.

00:06:06.530 But don't just take my word for it. Other people think so too. There's a great post by Ned Batchelder called 'Deleting Code' from 2002. In this post, there's a snippet that I'd like to share: "If you have a chunk of code you don't need anymore, there's one big reason to delete it for real rather than leaving it in a disabled state: to reduce noise and uncertainty." Some of the worst enemies a developer has are noise and uncertainty in their code because they prevent effective work with it in the future.

00:06:52.700 Now, before we get wound up trying to implement a feature, we might already be lost in the noise and uncertainty. I ask: What if we could programmatically find unused code to delete ahead of time, before we try to implement something else in that area? It turns out we can, to an extent. So today, I'm going to describe how a static code analyzer can be built to find potentially uncalled methods. Now, because Ruby is a dynamic, duck-typed language and this analyzer is only static, it's not going to be a hundred percent accurate. However, it has the potential to point out areas in our code where we can clear out some cruft and add a bunch of deletions to our GitHub stats.

00:07:46.880 We start with some Ruby code that we want to analyze. The first thing we need to do is transform the code into a data structure that we can reason with, which brings us to part one: parsing the code. Some of you know how language parsing works and are very familiar with how it works with Ruby, while others may not, and I think it’s important for everyone to understand how things work from the ground up. So this is really a high-level overview of how general language parsing works—from a grammar, how Ruby does it, and how we’re going to do it.

00:08:46.230 So, do you understand the following sequences of characters? 'The boy owns a dog.' Okay. 'A boy bites the dog.' Kind of weird, but okay. 'Loves boy, though.' Now, how could you programmatically determine which of those are correct and which are not? There's a way to do it, and I'm going to provide a couple of definitions that might be familiar. One is context-free grammar or CFG—this is a set of rules that describes everything contained within the language, essentially answering the question of what sentences are in the language and what are not.

00:09:41.460 We also have Backus-Naur form or BNF. This is just one of the two main notation techniques used to describe this CFG. Here is a context-free grammar for all the sequences of characters that we showed you earlier. It’s really simple to look at. Everything you see on the slide is a symbol, and symbols are split into two groups: the non-terminals (with little brackets) and the terminals (which I’ll represent as all caps). The way it works is that a symbol on the left is replaced with an expression on the right. An expression can be a combination of non-terminals and terminals, and a non-terminal is always replaced via some rule.

00:10:38.220 Now, terminals are the actual tokens found within the language; they terminate at that point, thus the name 'terminals.' So let’s look at the first example: 'The boy owns a dog.' Now, notice I didn't say sentences; I said sequences of characters. We don’t actually know that this set of characters is an actual sentence, but we have a rule for it, and we can apply it. If this is truly a sentence, that means it has to be a subject followed by a predicate at least within our simple grammar.

00:11:51.580 If we try to split it out and say 'the boy' is the subject and 'owns a dog' is the predicate, we can keep following the rules by replacement. If 'the boy' is truly a valid subject, it has to be an article followed by a noun. Let's see—'the' is an article, and 'boy' is a noun. We go through and see the rule for the article: it has to be one of the terminals, 'the' or 'a,' which it is.

00:12:39.900 We can continue parsing down until we find those terminals. In the end, we’ve identified every part of what we call a sentence in our grammar, and if we rearrange a bit, we see a parse tree: the discrete representation of our language that we can use to reason about the sentence. What about 'A boy bites the dog?' Again, if you go through it all, it has the same structure; it parses out correctly and is totally fine. However, do boys often bite dogs? Maybe, it could happen, but it seems a little weird.

00:13:34.860 We'll come back to 'Loves boy, though.' As you can imagine, this one doesn’t work out; there's a reason why. If this is a sentence, it has to begin with a subject. If that’s a subject, it has to begin with an article, and an article isn't valid if it's anything but 'the' or 'a'—it’s a syntax error. It doesn’t belong in the language.
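
The replacement process described above can be sketched as a small recursive-descent recognizer. The grammar rules below are reconstructed from the spoken description (the slide itself is not part of this transcript), so the exact rule set, and the names `parse`, `parse_sentence`, and `sentence?`, are assumptions made for illustration.

```ruby
# Toy grammar (an assumption, reconstructed from the talk's description):
#   <sentence>      ::= <subject> <predicate>
#   <subject>       ::= <article> <noun>
#   <predicate>     ::= <verb> <direct-object>
#   <direct-object> ::= <article> <noun>
#   <article>       ::= THE | A
#   <noun>          ::= BOY | DOG
#   <verb>          ::= OWNS | LOVES | BITES
TERMINALS = {
  article: %w[THE A],
  noun:    %w[BOY DOG],
  verb:    %w[OWNS LOVES BITES]
}.freeze

RULES = {
  sentence:      [:subject, :predicate],
  subject:       [:article, :noun],
  predicate:     [:verb, :direct_object],
  direct_object: [:article, :noun]
}.freeze

# Expands `symbol` against the front of `tokens`, consuming tokens as it goes.
# Returns a parse tree as nested arrays, or raises on a syntax error.
def parse(symbol, tokens)
  if TERMINALS.key?(symbol)
    token = tokens.shift
    unless TERMINALS[symbol].include?(token)
      raise ArgumentError, "syntax error: expected #{symbol}, got #{token.inspect}"
    end
    [symbol, token]
  else
    [symbol, *RULES.fetch(symbol).map { |part| parse(part, tokens) }]
  end
end

# Splits the text into upper-case word tokens and parses it as a sentence.
def parse_sentence(text)
  tokens = text.upcase.scan(/[A-Z]+/)
  tree = parse(:sentence, tokens)
  raise ArgumentError, "syntax error: unexpected #{tokens.first.inspect}" unless tokens.empty?
  tree
end

# True when the text belongs to the toy language.
def sentence?(text)
  parse_sentence(text)
  true
rescue ArgumentError
  false
end
```

With these rules, `sentence?("The boy owns a dog.")` and `sentence?("A boy bites the dog.")` both succeed, while the third sequence fails on its very first token because that token is not an article.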

00:14:18.000 So, in programming terms, the written sentences—the code we're talking about here—equate to these conclusions: 'The boy owns a dog' makes sense; 'A boy bites the dog' is technically correct, but might not be what we meant. In software terms, that could be a software bug, right? You write Ruby, you don't get a syntax error, but it might be the wrong thing. And 'Loves boy, though,' is a syntax error—it doesn’t work.

00:15:02.560 What does this all have to do with Ruby? Well, you can ask Ruby the same thing: how does Ruby know the meaning of these characters? How does Ruby know that this is a class definition named 'Person,' that it has two methods, 'initialize' and 'say_hello,' that there's an instance variable, and so on? Well, Ruby does the same thing that we did with my English examples.

00:15:48.960 My English examples were easy because we skipped lexing, tokenization, and actual programmatic parsing. We just kind of relied on intuition, saying, 'This looks like a subject.' But hopefully, that captures the high-level essence of parse trees for you. It’s a bit more complex, and how Ruby actually accomplishes it is as follows.

00:16:18.610 Ruby has the infamous parse.y grammar file, which it gives to Bison. The resulting parser code is used to scan through your Ruby files, tokenizing them and then parsing the tokens into an abstract syntax tree, which is then compiled to instructions for the virtual machine. The parser generated from Bison is what's called an LALR(1) parser, and I won’t describe how an LALR parser works today because it’s a bit out of scope.

00:17:08.800 However, I'll plug this excellent book called 'Ruby Under a Microscope' by a fantastic person named Pat Shaughnessy. In it, he explains all about Ruby internals. The first chapter is entirely about tokenization and parsing, including an in-depth explanation of the parsing algorithm. It’s a fascinating read, and you should definitely check it out. So how are we going to do it?

00:18:11.760 We’re going to use a gem called Ruby Parser, which is a Ruby parser written in Ruby using Racc, an LALR(1) parser generator. Let’s take a look at an example. We'll have a class named Person with a method greet that takes in a name and just says 'Hello, name.' You can initialize a new person and say hello. Ruby Parser has a class method called for_current_ruby that gives you an instance of Ruby Parser for the current running version of Ruby.

00:18:54.760 You can then call the parse method on that instance with your code, and it gives you back an S expression, or sexp. This is a notation for nested list data that originally comes from Lisp. Nested list data is tree-structured, so it’s the perfect notation for describing parse trees. This data contains the structure of our code.

00:19:36.210 You can see the block node up here is the top-level context, containing a class node named 'Person.' Inside that is a method definition node named 'greet' that takes an argument 'name,' etc. Now that we have a parser to put our code in a format we can work with, we need a way to process it, which brings us to part two: processing the S expression.
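
To make the shape of that data concrete, here is a hand-written approximation of the S expression for the Person example, using plain nested arrays in place of the Sexp objects that Ruby Parser returns. The exact node layout is written from memory of Ruby Parser's output and should be treated as an assumption; `node_types` is a hypothetical helper added only to walk the tree.

```ruby
# Hand-written approximation of the S expression for:
#
#   class Person
#     def greet(name)
#       puts "Hello, #{name}"
#     end
#   end
#
#   Person.new.greet("Reuben")
#
# Each node is an array whose first element is the node type.
PERSON_SEXP =
  [:block,
   [:class, :Person, nil,
    [:defn, :greet, [:args, :name],
     [:call, nil, :puts, [:dstr, "Hello, ", [:evstr, [:lvar, :name]]]]]],
   [:call, [:call, [:const, :Person], :new], :greet, [:str, "Reuben"]]]

# Collects every node type in the tree, depth-first.
def node_types(sexp)
  return [] unless sexp.is_a?(Array)
  [sexp.first] + sexp.drop(1).flat_map { |child| node_types(child) }
end
```

Because Ruby Parser's Sexp class is a subclass of Array, a walker written against nested arrays like this one should carry over to real parser output with little change.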

00:20:02.130 Before we do something useful with our S-exp, we need a way to easily manipulate it and start gathering general information that we care about, such as finding where exactly methods are defined and in what classes. We will begin processing everything by building a very minimal class called MinimalSexpProcessor.

00:20:50.040 The goal of this class is simply to run dispatch, calling a method for a given node type in our S expression if one exists. Our initialize method will build a sort of dispatch hash. We will take the public methods in the interface, find all that start with the prefix 'process_' and key them within a hash according to their suffix. For example, if we had a method named 'process_definition,' we’d find the method by its prefix, take the suffix, and store the method name as a symbol in the processor's hash, keyed by the suffix. Note that the suffix corresponds to a node type in our S-exp.

00:21:45.300 Next, we’ll write the main method for our processor. Every node in the tree will be passed into this method. It simply looks into the processor hash to call the correct processor method given the current expression's node type. If there isn’t one, we call a default method that we can set as an option. If we don’t have a method to call and haven’t set a default, we just return nil. Additionally, we’ll put a cute little warning output to indicate we didn’t recognize the node type.

00:22:34.880 If we're calling the default method without recognition, we can log that it happened. The class, as you’ve noticed, is pretty simple and doesn’t provide processors yet—it’s merely a base class. To demonstrate an example of a processor subclass, I’m going to define a subclass called SillyProcessor.

00:22:52.350 Within our SillyProcessor, we will define two processor methods. If we encounter a method definition in the expression, 'process_definition' will be called, while 'process_not_definition' will serve as the default for all other node types. The methods simply call 'puts' to tell us information about the nodes. You can see if we encounter a method definition node, we’ll say 'Processing a method definition node'—and for everything else, we’ll just indicate the node type.

00:23:39.960 Both of these methods call another method called 'process_until_empty.' This method iteratively shifts and calls 'process' on the next node in the expression until the expression is empty. Every processor method calls this in the end to continue processing at the next node. Lastly, we’ll fill in our initialize method to call super on the parent first, and then set the options we desire.

00:24:23.260 We'll say that if you don’t understand or don’t have a processor for the current node, we’ll call 'process_not_definition' and turn off warnings because we expect most of the nodes probably won’t be identified. We’ll demonstrate an example.

00:25:11.140 Again, we’ll get the expression from Ruby Parser and then call process on that. Then we can see the outcome—it goes through and finds all the nodes. For the method definition node, it performs a unique action. You might be thinking this seems pointless, and that’s because it is.
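
The base class and its silly subclass can be sketched together as follows. This is a minimal reconstruction from the description, not the code shown on the slides: the option names `default_method` and `warn_on_default` are assumptions, plain nested arrays stand in for Ruby Parser's output, and the handlers are named `process_defn` and `process_not_defn` because Ruby Parser labels method definition nodes `:defn` (the transcript renders these as 'process_definition' and 'process_not_definition').

```ruby
class MinimalSexpProcessor
  attr_accessor :default_method, :warn_on_default

  def initialize
    @default_method = nil
    @warn_on_default = true
    # Dispatch hash: node type => name of the method that handles it.
    @processors = {}
    public_methods.each do |name|
      next unless name.to_s.start_with?("process_")
      @processors[name.to_s.sub(/\Aprocess_/, "").to_sym] = name
    end
  end

  # Every node is passed through here and routed by its node type.
  def process(exp)
    return nil if exp.nil?
    type = exp.first
    handler = @processors[type]
    unless handler
      warn "Unknown node type: #{type.inspect}" if warn_on_default
      handler = default_method
    end
    handler ? send(handler, exp) : nil
  end
end

class SillyProcessor < MinimalSexpProcessor
  def initialize
    super
    self.default_method = :process_not_defn
    self.warn_on_default = false # most node types have no handler here
  end

  def process_defn(exp)
    puts "Processing a method definition node (#{exp[1]})"
    process_until_empty(exp)
  end

  def process_not_defn(exp)
    puts "Processing a node of type #{exp.first}"
    process_until_empty(exp)
  end

  # Shifts items off the expression, processing each nested node in turn.
  def process_until_empty(exp)
    until exp.empty?
      item = exp.shift
      process(item) if item.is_a?(Array)
    end
  end
end

SillyProcessor.new.process([:class, :Person, nil, [:defn, :greet, [:args, :name]]])
# Processing a node of type class
# Processing a method definition node (greet)
# Processing a node of type args
```

Note that `process_until_empty` consumes the expression as it goes, so a tree can only be processed once in this sketch.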

00:25:46.200 However, we’ve now added the next tool to our toolbox. Now we can run whatever code we want at a given node, allowing us to build more complex things like a method tracking processor. Using information from Ruby Parser, we can now record where we see method definitions in classes and their line numbers. We’ll set up this class with the same options as our SillyProcessor, but we’ll implement a couple of new things: two new stacks, a method stack and a class stack.

00:26:34.030 We will utilize these to keep track of where we are in the code tree. We also have a method locations hash that we’ll populate with the method signature as keys and the file name and line number as values. Now the file and line number will be taken directly from Ruby Parser. We will define a couple of processor methods, so 'process_definition' will shift off the node type from the expression.

00:27:30.970 The next thing will be the name, and then we’ll call a method that takes a block and process the rest of the expression within that block. This doesn’t accomplish much other than signify we’re in a method by calling 'in_method,' and 'process_class' does the same thing for classes. 'Process until empty' remains the same as you saw before.

00:28:09.470 Here are the two location methods that actually do the work. Now, this might be a little complicated to see, but it’s very simple. They both just push the current method or class onto their respective stacks, yield to the block passed in, and pop it off again afterward. We’ll also record the current method signature in our method locations hash with its location. Additionally, if you enter a new class, that creates a new method space, because different methods could be in that context.

00:28:56.130 So, we save the old method stack when we go into a class, use a new one for that class, and then revert back to the old stack when we finish processing it. Lastly, we have some little helpers: the current class name would be the first thing on the class stack, while the current method name would be the first thing on the method stack.

00:29:24.480 We’ll record a signature that looks similar to the one you’re accustomed to seeing, with a class name followed by a hash and the method name. Let’s expand our example a bit. We have a Person class with greet, and we’re going to add a little say goodbye method that does nearly the same thing. We also have a Dog class with a bark method.

00:29:55.850 The important thing to pay attention to is where the methods are defined. We have greet on line 6, say goodbye on line 12, and bark on line 12. So if we do the same thing as before and pretty-print the method locations, we can see that we found 'Person#greet' on line 6 and 'Person#say_goodbye' on line 12, along with 'Dog#bark' on line 12.
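
A minimal sketch of the method tracking processor follows. Several things here are assumptions for illustration: `Node` and the `s` helper stand in for Ruby Parser's Sexp (which carries file and line information), the dispatch is condensed into a single `process` method instead of inheriting from the base class, and the line numbers in the example tree are made up rather than taken from the slides.

```ruby
# Stand-in for ruby_parser's Sexp: an Array that also remembers its line.
class Node < Array
  attr_accessor :line
end

# Builds a Node; the first argument is the (made-up) source line number.
def s(line, *items)
  node = Node.new(items)
  node.line = line
  node
end

class MethodTrackingProcessor
  attr_reader :method_locations

  def initialize(file)
    @file = file
    @class_stack = []
    @method_stack = []
    @method_locations = {} # "Class#method" => "file:line"
  end

  # Condensed dispatch: call process_<type> if it exists, else just recurse.
  def process(exp)
    handler = "process_#{exp.first}"
    respond_to?(handler) ? send(handler, exp) : process_until_empty(exp)
  end

  def process_defn(exp)
    exp.shift # node type
    name = exp.shift
    in_method(name, exp.line) { process_until_empty(exp) }
  end

  def process_class(exp)
    exp.shift # node type
    name = exp.shift
    in_class(name) { process_until_empty(exp) }
  end

  def process_until_empty(exp)
    until exp.empty?
      item = exp.shift
      process(item) if item.is_a?(Array)
    end
  end

  # Pushes the method, records where it was defined, then pops it again.
  def in_method(name, line)
    @method_stack.unshift(name)
    @method_locations[signature] = "#{@file}:#{line}"
    yield
    @method_stack.shift
  end

  # A class opens a fresh method space; the old one is restored afterward.
  def in_class(name)
    @class_stack.unshift(name)
    saved_methods = @method_stack
    @method_stack = []
    yield
    @method_stack = saved_methods
    @class_stack.shift
  end

  def signature
    "#{@class_stack.first}##{@method_stack.first}"
  end
end

tree = s(1, :block,
         s(1, :class, :Person, nil,
           s(2, :defn, :greet, s(2, :args, :name)),
           s(6, :defn, :say_goodbye, s(6, :args, :name))),
         s(11, :class, :Dog, nil,
           s(12, :defn, :bark, s(12, :args))))
tracker = MethodTrackingProcessor.new("people.rb")
tracker.process(tree)
p tracker.method_locations
```

For this made-up tree, the hash maps 'Person#greet' to people.rb:2, 'Person#say_goodbye' to people.rb:6, and 'Dog#bark' to people.rb:12.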

00:30:29.230 Awesome! Now we know where methods are defined. The generic processors we’ve built so far process the S expression tree and record method locations, providing the footing we need to build our dead method finder.

00:31:04.590 The only thing we need to do now is process the call nodes within the tree, see what methods are being called, and line them up with the definitions we were tracking.

00:31:12.850 That brings us to part three: building the dead method finder. Yes, we've finally reached the point where we can build the dead method finder that we’ve been working towards!

00:31:23.890 Our dead method finder will subclass from the method tracking processor. In this one, there are two important collections that we will maintain: known, which is a hash containing sets. We will use this to maintain a mapping of method names to the set of classes that define them.

00:31:42.130 And also, we will maintain a set of called methods. Here are two processor methods: for 'process_definition,' we’re going to key into that known hash, adding the current class name to the set of classes that define this method.

00:32:06.030 Next, we will process the call nodes with 'process_call,' which will add the method being called to the set of called methods. Then we will define 'uncalled_methods.' This is where we will take the difference between known and called methods; in other words, the ones that weren’t called.

00:32:55.490 For each uncalled method, we will key into the known hash to find where a method by that name is defined by class and line number. We also have a little helper method called 'plain_method_name' because the method name from Ruby parser is a string, but we’re going to use symbols.
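
The core of the dead method finder can be sketched like this. To keep the example self-contained it does not inherit from the method tracking processor and omits line numbers; it only keeps the `known` hash of sets, the `called` set, and a class stack. Plain nested arrays stand in for Ruby Parser's output, and `process_defn` is used as the handler name because Ruby Parser labels method definitions `:defn`. All of this is an illustrative reconstruction rather than the code from the slides.

```ruby
require "set"

class DeadMethodFinder
  attr_reader :known, :called

  def initialize
    # Method name => set of classes that define a method by that name.
    @known = Hash.new { |hash, name| hash[name] = Set.new }
    # Names of every method seen in a call node.
    @called = Set.new
    @class_stack = []
  end

  # Condensed dispatch: call process_<type> if it exists, else just recurse.
  def process(exp)
    handler = "process_#{exp.first}"
    respond_to?(handler) ? send(handler, exp) : process_until_empty(exp)
  end

  def process_until_empty(exp)
    until exp.empty?
      item = exp.shift
      process(item) if item.is_a?(Array)
    end
  end

  def process_class(exp)
    exp.shift # node type
    @class_stack.unshift(exp.shift)
    process_until_empty(exp)
    @class_stack.shift
  end

  def process_defn(exp)
    exp.shift # node type
    @known[exp.shift] << @class_stack.first
    process_until_empty(exp)
  end

  # A call node looks like [:call, receiver, method_name, *arguments].
  def process_call(exp)
    @called << exp[2]
    process_until_empty(exp) # also visits the receiver and the arguments
  end

  # Known methods that were never called, grouped by defining class.
  def uncalled_methods
    (Set.new(@known.keys) - @called).each_with_object({}) do |name, by_class|
      @known[name].each { |klass| (by_class[klass] ||= []) << name }
    end
  end
end
```

Matching is done purely by method name, which is why the result is only ever a list of potentially uncalled methods: a call to `bark` on any object marks every `bark` definition as used.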

00:33:41.750 Let’s expand our example once again. With 'greet' and 'say goodbye,' instead of calling puts directly, let’s create a little helper called 'speak' that does it for us. We’ll also add a 'pet_dog' method in 'Person' that takes in a dog object and uses 'send' to invoke 'pet' on it. With 'Dog,' we’ll add a little attribute accessor called 'fed' because, hey, maybe you want to keep track of whether or not you've fed the dog.

00:34:48.280 Lastly, we’ll add a 'pet' method as well. To add a bit of realism, I have a dog named Reuben, so I will instantiate myself as a person and Reuben as a new dog, and we’ll mess around with saying hello to Reuben. I can even get Reuben to bark, and then I will pet him. Now we process the S-exp and print the uncalled methods. We should see 'Person#say_goodbye' is supposedly uncalled and 'Dog#pet' is supposedly uncalled.

00:35:40.230 Now, if you’re paying close attention, you’ll find that this is incorrect. There’s a problem here. On line 38, I’m calling 'pet' for Reuben, but the method doesn’t get recognized as being called. Why is that? It’s because we hit an edge case. The problem is that our 'process_call' records 'send' as the method being invoked. While the code does call the method 'send,' we also want to take into account the method that 'send' invokes on our behalf.

00:36:16.550 So let’s review our S expression to find where the actual method being called is located and add some logic to determine that this is the method being called. We will say, 'Hey, when the method called is 'send', 'public_send', or '__send__', look through the arguments and find the literal being sent.' We’ll then record that as the method being invoked.
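
That rule can be sketched as a small helper. It assumes the call node layout `[:call, receiver, method_name, *arguments]` with symbol literals appearing as `[:lit, :name]` and string literals as `[:str, "name"]`, using nested arrays in place of Ruby Parser's Sexp objects; `invoked_method_name` is a hypothetical name, not something from the talk.

```ruby
SEND_METHODS = %i[send public_send __send__].freeze

# Returns the method a call node really invokes: for send-style calls with a
# literal first argument that is the literal, otherwise the call's own name.
def invoked_method_name(call_node)
  _type, _receiver, name, first_arg = call_node
  return name unless SEND_METHODS.include?(name)

  literal = first_arg.is_a?(Array) &&
            %i[lit str].include?(first_arg.first) &&
            first_arg[1].respond_to?(:to_sym)
  literal ? first_arg[1].to_sym : name
end
```

When the argument to `send` is computed at runtime, a static analyzer has nothing to go on, so this sketch simply falls back to reporting `send` itself.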

00:37:02.250 Let’s try it again! Great, 'pet' is recognized and no longer appears as uncalled. This is an improvement, but it’s still not correct. What about this attribute accessor here? We never used the accessor 'fed,' right? So, it should indeed be in the output.

00:37:36.860 Looking at our S expression for that, we realize that 'attr_accessor' is itself a method call that defines methods for a getter and a setter, which is why our method tracking processor doesn’t identify them. Let’s add another case in our call processor to handle that. We’ll say, 'Hey, when you encounter 'attr_accessor', record the methods it defines as known methods.'
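
A minimal sketch of that case, assuming the call node for `attr_accessor :fed` has the shape `[:call, nil, :attr_accessor, [:lit, :fed]]`. The helper name `accessor_methods` is hypothetical; in the finder itself, each returned name would be recorded as known for the current class.

```ruby
# Returns the getter and setter names that an attr_accessor call defines,
# or an empty list for any other call node.
def accessor_methods(call_node)
  _type, _receiver, name, *args = call_node
  return [] unless name == :attr_accessor

  args.select { |arg| arg.is_a?(Array) && arg.first == :lit && arg[1].is_a?(Symbol) }
      .flat_map { |arg| [arg[1], :"#{arg[1]}="] }
end
```

A single `attr_accessor` call can list several attributes, so the helper returns a getter and setter pair for each symbol literal it finds.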

00:38:10.890 Now record_method does the same thing as we were doing in our process definition method, except it also double-checks that the method’s location is recorded. While we’re here tinkering, it’s a good idea to pretty up that output, making it more readable. Let’s define a method called 'report' that loops through the uncalled methods.

00:38:44.890 For each of them, we look up the location of that method via method locations in our parent class. We’ll throw all of that into a pretty formatted report, and skip a class if it has no uncalled methods. If there are some, we will join it all together and print it out. Processing the code now becomes getting the S expression, processing it, and then calling report. Here's what it looks like. After fixing the attribute accessor case and formatting the output, you can see now that 'say_goodbye' is marked as supposedly uncalled.

00:39:45.520 And our attribute accessor is not used either, which is great! So now, remember that this static analysis is not always a hundred percent accurate and that these methods are potentially uncalled. Therefore, we do some manual checking ourselves to ensure that these elements are indeed deletable.

00:40:10.030 In this straightforward example, they do look deletable, so we can proceed to delete them. Doesn’t it feel great? Isn’t deleting code fun? It’s perfect! So, hey, we've done it! We created a dead method finder.

00:40:37.600 Now we can start finding code to potentially delete; the time has come to open a dozen pull requests with a bunch of red deletions, right? But wait— are we truly done? No! Ruby is complex to parse and Ruby has a lot of edge cases.

00:41:15.450 But the good news is that handling edge cases is easy. Think about, for example, 'reuben.fed = true'. If we actually did use that accessor, it would still be marked as uncalled.

00:41:46.320 Now, if my dog were here and saw that being fed was an issue, he would probably say, 'Just deal with it!' And that is an actual picture of my dog wearing sunglasses. Well, back to the edge case! It’s an attrasgn node, meaning attribute assignment. The variable 'reuben' has the message 'fed=' sent to it with the value of true, so we can again record that method as called.
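
Here is a sketch of treating attribute assignment like any other call. It assumes Ruby Parser represents `reuben.fed = true` as `[:attrasgn, [:lvar, :reuben], :fed=, [:true]]`, mirroring the layout of a call node; `called_methods` is a hypothetical standalone walker rather than the talk's processor method.

```ruby
require "set"

# Walks a tree of nested arrays and collects the names of methods invoked
# through ordinary calls and through attribute assignment.
def called_methods(exp, called = Set.new)
  return called unless exp.is_a?(Array)

  called << exp[2] if %i[call attrasgn].include?(exp.first)
  exp.each { |child| called_methods(child, called) }
  called
end
```

With this in place, the setter shows up in the called set, so an accessor that is only ever assigned to is no longer reported as dead.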

00:42:15.260 It looks great! There are still many other edge cases, though. What about Rails methods? Every bit of Rails DSL in controllers and models that you're used to using would present edge cases, and there are a lot of them.

00:42:43.540 For instance, there’s 'after_commit', 'before_create', 'after_create', 'before_update', 'before_destroy', and 'before_filter', except it’s not called 'before_filter' anymore; now it's 'before_action.' Then there's 'around_save', 'validate', 'validates_length_of', 'after_validation', etc. It can feel overwhelming! What about my own DSL?

00:43:39.750 In ManageIQ, we have our own virtual column implementation for ActiveRecord. It digs deep into ActiveRecord internals to allow us to treat any methods as a database column, among other things. It’s mainly used for reporting attributes of entire tables, sprinkling in extra attributes.

00:44:21.930 For example, you could have a class Disk that has many partitions, and you might define a virtual column called allocated_space with an integer type that utilizes those partitions. We, in fact, add a DSL to Rails models in the form of these virtual column calls.

00:45:18.340 The point I’m trying to make is that while all of these custom DSL and Rails DSL calls aren't identical, most of the time they look similar. They take symbol arguments that name methods to be called.

00:45:35.440 Thus, we can go in and mark them all as called. Essentially, the takeaway is this: as with most things, with the right tools, the job isn’t very difficult, and customization is easy. It’s easy to execute this code on your projects right now!
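
One way to sketch that idea: treat the symbol arguments of known DSL calls as methods that will be called. The list below uses a few of the callback names mentioned above and is not exhaustive; `DSL_METHODS` and `dsl_called_methods` are hypothetical names, and the node shapes are assumed to match Ruby Parser's call nodes.

```ruby
# A few of the Rails callbacks named in the talk; extend to suit your project.
DSL_METHODS = %i[
  after_commit before_create after_create before_update before_destroy
  before_action around_save validate after_validation
].freeze

# Returns the method names a DSL call refers to by symbol, e.g.
# `before_action :authenticate` refers to the method `authenticate`.
def dsl_called_methods(call_node)
  _type, _receiver, name, *args = call_node
  return [] unless DSL_METHODS.include?(name)

  args.select { |arg| arg.is_a?(Array) && arg.first == :lit && arg[1].is_a?(Symbol) }
      .map { |arg| arg[1] }
end
```

Only top-level symbol literals are collected, so options such as `only: [:show]` are left alone because they arrive as a hash node rather than a literal.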

00:46:08.780 There’s a Ruby gem called DeBride. The author of this gem told me last night that it’s pronounced as DeBride, but apparently someone joked that it’s pronounced as DeBried. So we’ll stick with that. To debride something means to remove dead, contaminated, or adherent tissue and/or material, which sounds a lot like deleting useless code.

00:46:51.300 When I first thought about programmatically finding dead code to delete, I went down the exact same path we just explored, starting with Racc and Ruby Parser. I then discovered the lovely simplicity of processing S expressions with a gem called SexpProcessor. It was created to easily carry out generic processing of S expressions produced by Ruby Parser. Importantly, it provides a MethodBasedSexpProcessor subclass to facilitate the method and class tracking we performed with our method tracking processor today.

00:47:47.880 Then to my delight, I stumbled upon DeBride, which does precisely what we developed today with our dead method finder. After all, everything you see here, from Ruby Parser forward, is written by the same person—Ryan Davis.

00:48:28.680 And you are quite fortunate because Mr. Ryan Davis is here at this conference! Ryan, are you here in this room? There you are! Can I get a round of applause for Ryan for all the fantastic work?

00:48:51.320 I have been hacking on DeBride intermittently over the past several months, customizing it for ManageIQ and finding crufty code to delete on a project that started on Rails 1.2.3 nearly a decade ago. I thought it would be fun to rebuild the basic concepts for you today. All the code you’ve seen is a modified and minimalist example of Ryan’s very excellent work.

00:49:38.580 What does DeBride provide that we haven't covered today? It addresses more edge cases.

00:49:49.260 Our simple method tracking processor and dead method finder form the core of what DeBride accomplishes, but there's so much more to consider! How about finding methods in singleton classes? What about numerous other uncommon Ruby syntax, like calling methods with colons? All those tiny edge cases add up.

00:50:16.920 Ruby is a highly flexible language, and there are many cases yet to be handled. DeBride also adds plenty of lovely options like excluding particular files, white-listing methods based on a pattern, or focusing on a specific path. There’s a Rails mode just like the one you saw earlier.

00:50:44.190 As I mentioned, I’ve been busy hacking on DeBride to find even more dead code. Now, besides adding your own DSL, white-listing patterns, and so forth, you can easily do something like this to find other criteria for what might be deletable code.

00:51:24.490 Remember what I said about methods that might be overly context-specific with only one caller? There’s a very hacky way to find those methods. Instead of keeping a set of called methods, we can keep a hash with a call count for each value.

00:51:56.360 Every time we encounter a call, we just increment that count, and the methods whose count ends up at one are the ones with only a single caller.
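
The counting variation can be sketched in a few lines. As before, nested arrays stand in for Ruby Parser's output, and `call_counts` and `called_once` are hypothetical helper names.

```ruby
# Counts how many call nodes mention each method name.
def call_counts(exp, counts = Hash.new(0))
  return counts unless exp.is_a?(Array)

  counts[exp[2]] += 1 if exp.first == :call
  exp.each { |child| call_counts(child, counts) }
  counts
end

# Method names that appear in exactly one call node.
def called_once(exp)
  call_counts(exp).select { |_name, count| count == 1 }.keys
end
```

Since counting is by name only, a count of one is a hint to go and look at the call site, not proof that the method should be inlined.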

00:52:05.520 Perfect! These are cases where a method might not even need to exist, or perhaps the calling method itself could just handle it. It really depends on your context.

00:52:21.140 I’ve been busy enough hacking on deleting code that, even though I’ve opened a few minor changes in the project, I still have plenty more I want to refine and share. Pushing more work upstream is certainly ahead of me!

00:52:38.430 Yes, please! Ryan says yes, please! Remember, tools like this are just one option in your toolbox for cleaning up your codebase. Today we looked at one way: parsing and statically analyzing Ruby itself to find potentially uncalled code.

00:53:11.850 But there are many additional tools to use in combination! For example, there's Old Code Finder by Tom Copeland, which is a Ruby gem that checks code content by date and authorship. Maybe a person named Fred used to work at your company years ago and no longer does; you might want to inspect their code because there's a good chance there’s more you could get rid of.

00:53:51.600 There’s also 'Unused' by Josh Clayton over at Thoughtbot; it’s written in Haskell and utilizes CTags to statically find unused code. It's not particularly constrained to one programming language. I’m a huge, huge fan of using CTags, and although I haven’t explored that yet, I really want to. Others have mentioned a library called 'Scythe'; I haven’t heard of this one yet.

00:54:26.500 But I see a couple of nods, which suggests it might be worth checking out! Before I go, here’s a parting message for you: if you used Merb before it was merged into Rails, you might recognize this: 'No code is faster, has fewer bugs, is easier to understand, and is more maintainable than no code at all.'

00:55:03.390 So, when you go home from this awesome conference, delete some code—it feels fantastic! Thank you!