The art of deleting code

Claudio Baccigalupo • November 08, 2022 • Denver, CO

The video titled "The Art of Deleting Code" features Claudio Baccigalupo discussing the process and techniques involved in identifying and removing dead code from a Ruby codebase. As projects grow, certain code blocks become irrelevant, and this presentation aims to help developers find and delete such code effectively. Claudio shares his experience of removing 50,000 lines of Ruby code from a large codebase, showcasing methodologies that can aid developers in their own projects.

Key Points Discussed:

  • Mindset towards unused code: Claudio begins with a personal reflection on his inclination to maintain clean, unused spaces in both his physical and coding environments, emphasizing the challenge of handling a large codebase effectively.
  • Process for code deletion: He details a three-step process for deleting code:
    • Finding code to delete: The first step involves recognizing code that may no longer be necessary by examining it.
    • Confirming code is unused: He discusses using tools like Git to investigate the history of the code—utilizing commands such as git blame, git log, and git bisect to understand how and when the code was used.
    • Communication with team members: It's essential to articulate the rationale behind removing code, especially in Pull Requests (PRs), so that team members can review them without confusion.
  • Example of deletion process: Claudio narrates a case where he investigates an option called 'header_icon' and, by checking its history in Git, discovers that it is no longer used. He emphasizes the importance of writing a well-structured commit message that links to the history of the code for reviewers' convenience.
  • Best practices for code removal: He advises on keeping commits small and manageable and mentions the Git stash command for pausing current work to handle code deletions effectively.
  • Advanced techniques: Claudio also presents two advanced strategies:
    • Using code coverage tools to identify which sections of the code have not been executed, suggesting places where dead code may reside.
    • Utilizing git bisect to efficiently find the commit that caused the code to stop being used, rather than manually searching through many commits.

Conclusion:

Claudio concludes with a call to action for developers to take ownership of their codebases by removing unnecessary code, likening it to cleaning up trash on a beach. His talk encourages attendees to submit proposals at events, suggesting that sharing knowledge is valuable within the programming community. By embracing a clean code philosophy, developers can contribute to maintaining more manageable and efficient codebases.


As the size of your project grows, some blocks of code become irrelevant. How can you find and delete them? And how can you be sure code is actually dead?

This talk will review a few techniques I learned from removing 50,000 lines of Ruby code at work.

We will see how git can help, with commands like blame, log --follow, and bisect. We will talk about static analysis, and running code coverage in development. We will explain how Ruby meta-programming can conflict with the "Find in Project" approach. We will show how to be nice to reviewers when submitting Pull Requests that delete code.

RubyConf 2021

00:00:11.200 Come in.
00:00:12.880 Hello.
00:00:14.000 Welcome.
00:00:14.920 Everybody, welcome to RubyConf.
00:00:22.320 It's exciting to see so many people gathered here after so long for the love of a programming language.
00:00:25.119 I personally love Ruby.
00:00:28.160 I know they say that the spot right after lunch is the worst one because people just want to close their eyes.
00:00:34.079 But I actually want to indulge that in a second, if you let me. I want to do an experiment for one minute, so actually just go and close your eyes.
00:00:44.160 And, you know, if you're home or watching this video, just look away from the screen. Okay, now I want you to picture an object that you own, that you have in your room, like your bedroom or your office.
00:00:53.199 Think of something that you haven't used in a long time. It can be a book that you haven't read in a while, or maybe a shirt that doesn't fit you anymore. And I want you to think about how it would feel to get rid of it.
00:01:09.520 Is it pain? Is it joy? Is it anxiety, or is it freedom? You can open your eyes now.
00:01:32.799 I asked this question to different friends and co-workers, and I got very different responses.
00:01:36.000 Some people really like to collect objects, and they feel good with many items around them, while other people, like me, prefer the opposite.
00:01:45.200 I like pristine spaces; I like not having unused objects around me. I think it might be a personality trait. Different people have different personalities, and I realized that in my case, this doesn't only apply to my real-world environment, but also to my code.
00:01:58.960 I love Ruby, and I especially love opening a blank new Ruby file, or maybe a file that's like 10 or 20 lines of code. Last year, I joined a corporation that has probably one of the biggest Ruby codebases in the world.
00:02:15.680 To give you some numbers, I actually checked, and in the first quarter I was there, the amount of code that was added was 500,000 lines of code. Of course, there are many employees, but for somebody like me, that's overwhelming. I open a Ruby file, and it has 2,000 lines of code, which gives me a headache.
00:02:39.519 And, I don't just want to complain about it; I want to do something about it. What I've been doing about it is deleting code.
00:02:58.560 It might sound weird, but I'm passionate about deleting code, and I've never heard a talk about it. That's why I'm here to give this talk. In fact, in my first quarter, I deleted about 50,000 lines of Ruby code, and I'm sure there is more.
00:03:18.640 I'm sure if you work at a company, you can probably find some code that can be deleted as well. So, if you're passionate about it, or if I manage to make you passionate about it, then this talk is for you.
00:03:36.239 This talk, as I was saying, really comes from experience. I have made many pull requests, and when I looked back at those, I realized a pattern was forming. Every time I was creating a pull request, there were some steps that I was repeating.
00:03:54.480 To give you the alternative, I didn't just make one pull request that says 'delete 500,000 lines of code' or something, because my co-workers would be like, 'What are you even doing? You just joined; you can't do that!'
00:04:01.360 So that's not what I did. I made very small pull requests that followed a pattern, and that's what I'm going to talk about today.
00:04:05.760 If you're still in the back, you can come closer. The technical part hasn't started yet. So this is kind of like the pattern or process I found, which is easy to understand. The first step is: how do you find code that can be deleted?
00:04:35.199 How do you even get there? That's kind of like the recognition part. Then, of course, you want to make sure that you can delete it. You don't want to just type backspace and pray that production doesn't go down.
00:04:50.160 Finally, you probably have co-workers, so they also need to understand why you did it. They might need to review your code, so there’s also some effort that goes into that.
00:05:06.560 Let's get started! Here's some Ruby code, since we are at RubyConf. This is a method, a pretty short one; it's nine lines of code. To give you some context, this is a method called 'model' that displays a model in an ERB view.
00:05:31.280 You don't have to type HTML and CSS; you can just invoke this method we wrote in your view, and it accepts some options regarding how the model should be displayed.
00:05:45.199 So, this method already existed, and one day I opened this file because I had to add an option to this method. The options are a hash, and there are different options.
00:06:03.759 So, you know, you might be curious about this code, just like I was when I opened it: what is this code doing? Because I knew I had to edit it, but I wanted to read it a little bit and understand what was going on.
00:06:24.639 I see there's an option called 'header_icon,' and I might guess what that is doing. Then I see this conditional option, and I'm like, 'What is that option exactly?' I have this curiosity; I want to understand the code because maybe I can reuse it.
00:06:45.280 Or maybe that code is actually code that can be deleted; I don't know.
00:07:01.680 So before I even start typing, I get curious: what is this 'header_icon' option? Is it even used anywhere?
00:07:05.760 So the first tool that we can all reach for is 'Find in Project.' This looks different based on your editor; it can be something like Command+Shift+F, or you can do it in your terminal or in a graphical interface.
00:07:29.639 Basically, what I'm doing here is I see this header icon, and I want to see where else it is in the code: who is actually using this option? That might give me some insight.
00:07:48.639 And it turns out that this option is only used in this file, so now I have a suspicion: I'm like, 'Wait, why is there an option that nobody's calling?' You know, maybe I'm on the right track; maybe this code is not used.
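What 'Find in Project' does can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the file names, the body of the `model` helper, and the `find_in_project` method are all hypothetical, since the slide code is not part of the transcript.

```ruby
require "fileutils"
require "tmpdir"

# Return { relative_path => [line numbers] } for every .rb or .erb file
# under `root` whose text mentions `identifier`.
def find_in_project(root, identifier)
  matches = {}
  Dir.glob(File.join(root, "**", "*.{rb,erb}")).sort.each do |path|
    lines = File.readlines(path)
    hits = lines.each_index.select { |i| lines[i].include?(identifier) }
    matches[path.delete_prefix("#{root}/")] = hits.map { |i| i + 1 } unless hits.empty?
  end
  matches
end

# Hypothetical two-file project: a helper that reads the option and a view
# that calls the helper without ever passing it.
DEMO_MATCHES = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "app", "helpers"))
  FileUtils.mkdir_p(File.join(root, "app", "views"))
  File.write(File.join(root, "app", "helpers", "model_helper.rb"), <<~HELPER)
    def model(options = {})
      icon = options[:header_icon]
      [options[:title], icon]
    end
  HELPER
  File.write(File.join(root, "app", "views", "show.html.erb"), <<~VIEW)
    <%= model(title: "Apps") %>
  VIEW
  find_in_project(root, "header_icon")
end

p DEMO_MATCHES
```

In this toy project the only match is line 2 of the helper itself, which is the same suspicious result described above: the option is read, but no caller passes it.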
00:08:01.679 However, let's not forget that Ruby has metaprogramming. Maybe somebody, some evil co-worker, called '.send' and passed an interpolated string, or one of those other things you can do in Ruby.
00:08:25.440 So I can't be sure yet that I can delete this, and I want to go a little deeper.
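To make the metaprogramming concern concrete, here is a minimal sketch of how a dynamically built method name hides a caller from a text search. The `IconRenderer` class and its methods are hypothetical names chosen for illustration; they do not come from the talk.

```ruby
# Hypothetical class: these names are illustrative and not from the talk.
class IconRenderer
  def render_header_icon
    "<i class='icon header'></i>"
  end

  def render_footer_icon
    "<i class='icon footer'></i>"
  end

  # The method name is assembled at runtime, so this line never contains
  # the literal text "render_header_icon".
  def render(part)
    public_send("render_#{part}_icon")
  end
end

# The only caller in this imaginary codebase, kept as a string so we can
# search it the way an editor would.
CALL_SITE_SOURCE = "IconRenderer.new.render(:header)"

puts IconRenderer.new.render(:header)                 # the method is reached
puts CALL_SITE_SOURCE.include?("render_header_icon")  # false
```

The method runs, yet searching the call site for its name finds nothing, which is why an empty search result is a hint and not a proof.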
00:08:40.640 The next tool I reach for is Git. Now, not every Ruby programmer uses Git; there are other version control systems, but it's definitely one of the most common.
00:08:58.400 So, with Git, what I can do is use 'git blame' and the name of the file.
00:09:01.679 What 'git blame' does is, for each line of code, tell me who wrote it and when, and which commit it came from, so I can find out why from the commit message. That number you see at the beginning is called a SHA, and it identifies the commit in which the line was introduced.
00:09:19.919 For instance, I'm looking for this one. The reason why I'm doing this exploration is that if someone added it, maybe that commit message is going to tell me why. What was it doing?
00:09:30.240 So I can look more specifically at this commit.
00:09:44.720 There are more seats available if you want to come and join us over there.
00:09:51.040 There is another Git command to look at a specific commit, and that is 'git show.' So I type 'git show' with a SHA, and now I'm investigating this commit.
00:10:05.760 It's titled 'create an icon helper' and was created three and a half years ago. It does indeed add this option called 'header_icon' that I'm curious about.
00:10:31.760 Not only does it do that, but in a separate file called 'delete_app.rb,' it also used this option. This makes sense, right? Somebody added an option and also used it. Why would someone add an option just because?
00:10:55.040 So, what this is telling me is that three and a half years ago, somebody added the option and was using it somewhere. But then 'Find in Project' is telling me that it's not used now.
00:11:18.320 So, what happened? Now I'm kind of like Indiana Jones trying to find the Holy Grail. What I want to do now is look at this other file, 'delete_app.rb.' What's the story of this file?
00:11:30.720 So there is another Git command for that: it's called 'git log.' You type 'git log' with a file name, and normally it shows the history of that file.
00:11:48.320 But I'm getting an error here that says 'unknown revision or path not in the working tree.' This error simply means that the file doesn't exist anymore.
00:12:04.800 So I might think 'cool,' the file doesn't exist anymore, so I'm good—nobody's calling it. But once again, I want to be more specific. Okay, it doesn't exist anymore, but can I see when it was deleted?
00:12:22.320 Like can I still look at the history of a file? It turns out you can do that with 'git log'; you just need to be more explicit with the options.
00:12:35.040 So you can do 'git log --follow', which continues the history of a file even across renames. Then you add '--' before the file name, so Git knows that what comes next is a path. Now I can actually see the history of the file even though it's been deleted.
00:12:56.760 The most recent commit, called 'remove the delete app card,' actually makes sense. It says basically that this file was deleted, and it was done two and a half years ago.
00:13:22.960 So now the story in my mind is complete: three and a half years ago, somebody added the option and used it. Two and a half years ago, somebody else removed the only usage of this option from the code but forgot to delete the option itself.
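The difference between the two 'git log' invocations can be reproduced in a throwaway repository. The sketch below drives Git from Ruby and assumes a `git` executable is on the PATH (it does nothing otherwise); the `git` and `deleted_file_history` helpers, the file contents, and the demo identity are made up for the example, while the commit titles echo the ones mentioned in the talk.

```ruby
require "open3"
require "tmpdir"

# Assumption: git is installed. The demo is skipped when it is not.
GIT_AVAILABLE = system("git", "--version", out: File::NULL, err: File::NULL) == true

# Run a git subcommand inside `repo`; return [combined output, success?].
def git(repo, *args)
  identity = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]
  output, status = Open3.capture2e("git", "-C", repo, *identity, *args)
  [output, status.success?]
end

# Create a file, commit it, delete it, commit again, then ask for its history
# in two ways: with a bare path, and with the explicit "--" separator.
def deleted_file_history
  Dir.mktmpdir do |repo|
    git(repo, "init", "--quiet")
    File.write(File.join(repo, "delete_app.rb"), "model(header_icon: :trash)\n")
    git(repo, "add", "delete_app.rb")
    git(repo, "commit", "--quiet", "-m", "Create an icon helper")
    git(repo, "rm", "--quiet", "delete_app.rb")
    git(repo, "commit", "--quiet", "-m", "Remove the delete app card")

    _error, plain_ok = git(repo, "log", "--oneline", "delete_app.rb")
    history, explicit_ok = git(repo, "log", "--oneline", "--follow", "--", "delete_app.rb")
    { plain_ok: plain_ok, explicit_ok: explicit_ok, history: history }
  end
end

puts deleted_file_history[:history] if GIT_AVAILABLE
```

Without '--', Git cannot tell whether the argument is a revision or a path once the file is gone; with it, the history of the deleted path is listed, newest commit first.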
00:13:42.080 That happens; we're all busy building million-dollar features.
00:14:02.720 So that is the whole story: there was a commit where it was added, and then there was a commit where it stopped being used.
00:14:19.199 Finally, I have everything with me to write what I would call a good commit message.
00:14:35.360 So I can delete the code and type 'git commit.' The way I write the commit message goes like this: 'Remove unused option header_icon. It was introduced in 27b but its last invocation was removed in 86e.'
00:14:53.600 The reason I call this a good commit message is that it's compact but poignant. What I mean by that is that anybody who reviews this commit has all the tools to understand what I'm doing.
00:15:12.160 Especially if you use GitHub, GitHub translates those numbers and SHAs into links so you can click on them, and it actually takes you to those commits.
00:15:31.760 For this reason, I don't need to type out 'three and a half years ago, x did this' and so on, because you can just click on those links and see for yourself.
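The shape of that commit message can be captured as a small template. This is a sketch, not a tool from the talk: the `deletion_commit_message` method and its keyword names are hypothetical, and the subject/blank line/body layout is the usual Git convention, assumed here.

```ruby
# Build a commit message that names what was removed and points reviewers
# at the two commits that tell the story.
def deletion_commit_message(what:, introduced_in:, last_use_removed_in:)
  <<~MESSAGE
    Remove unused #{what}

    It was introduced in #{introduced_in} but its last invocation
    was removed in #{last_use_removed_in}.
  MESSAGE
end

puts deletion_commit_message(
  what: "option header_icon",
  introduced_in: "27b",
  last_use_removed_in: "86e"
)
```

The two abbreviated SHAs are the ones quoted in the talk; in a real message they would be long enough for the hosting service to recognize and link them.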
00:15:51.680 So this commit message is short but really powerful, and I'm doing this basically to do a favor to myself. What I mean is, I'm going to submit this commit, and at my company, just like many other companies, somebody has to review the code changes.
00:16:11.680 Maybe the person who reviews this is going to be reviewing it tomorrow or three days from now. If I just type something like 'deleted unused code,' they might come back and say, 'Are you sure? What is this for?'
00:16:32.080 Three days from now, maybe I'm working on another feature, and I have to interrupt myself. Why do that if I have already done all the work today?
00:17:00.320 This is the way in which I share this kind of code with my co-workers.
00:17:06.720 Another thing I do to make sure that my changes are approved in a very easy way is to keep this commit small. So this specific commit is just deleting those lines of code.
00:17:30.720 So when somebody sees it, they're like, 'Oh, okay, I get it.' I look at the code, I get what's happening, I can click, I can ask, but it's not 300 additions, 400 deletions, and so on.
00:17:51.840 Now, normally, when I find code that can be deleted, I'm in the middle of a gigantic feature. I'm like, 'Oh, I'm doing all these things; I'm building the next big feature!'
00:18:05.680 But now I also want to delete this.
00:18:20.399 So it's easy to put it all together in one commit. It's easy to think, 'I can just do one commit, and I have a commit message that's like a book,' saying it's doing all these things at once.
00:18:41.920 But that would still not be nice to my co-workers. So, instead, what I suggest is that if you're working on a big feature, pause whatever you're doing, just put it aside, make a small commit to delete code, and then bring your work back.
00:19:02.720 In fact, this pause is not hard to do. There is a Git command for that; it's called 'git stash.'
00:19:18.960 Let me show you an example. Let's imagine I'm working on a feature, and meanwhile, I've already touched this file; I've already added another file.
00:19:24.960 And I'm like, 'Wait, I want to pause this; I want to, you know, put this aside and delete this.' So what you can type is, you can type 'git stash.'
00:19:47.760 If you add the '--include-untracked' option, which is simply '-u', it’s going to stash all the files, including the untracked ones, which by default are not stashed.
00:19:59.200 So remember to use that option. You stash, then you do your very small commit; you know, you push it, and then you bring your work back with 'git stash apply.'
00:20:16.800 So maybe this is simple, maybe this is complicated, and you know I'm here to hear your feedback after the talk. But it's really just a process.
00:20:43.360 So if you are in the mindset, and you really just want to make sure that some code gets deleted and that the approval for that PR is, you know, just a thumbs up, this is something that has worked for me so far.
00:21:03.920 In the last part of this talk, I want to talk about a couple of other techniques that are a little more advanced, but they still fit into this process.
00:21:33.760 The first one is recognition. I said before, I happened to open a file that had some unused code, and I saw it.
00:21:52.160 But if you work in a very big codebase, you probably have files that you never open, and maybe those are the files that have code that can be deleted.
00:22:08.160 So it's kind of like a paradox. How do you even find it if you never open them? I developed my own strategy. I don't know if anybody does this.
00:22:27.680 The way in which I do it is with code coverage. Code coverage is a tool that's normally used for testing.
00:22:42.720 It's available in almost every programming language, and in Ruby, there is one that I really like, which is called SimpleCov. Normally, when you do code coverage in testing, you write your tests, you run them, and then this tool comes up and tells you which lines of code have been executed by your tests.
00:23:03.680 But nothing prevents you from using these tools in development or in production, which is what I do. In development, I start the tool, then locally I just play around with the app, you know, try to hit all the methods and stuff for a while.
00:23:23.200 Then I stop the tool, and it comes up telling me which lines of code I have executed.
00:23:43.120 Sometimes I do this in production as well. I let users roam around for like an hour, you know, do whatever they need to do.
00:23:59.920 If a file has 100% coverage, it means that it's been completely executed; all the lines are used for one reason or another.
00:24:25.200 But if that's not the case, that gives me a hint. Maybe customers have been going around for an hour, and there is a method that’s never been executed.
00:24:44.000 I have a hint; next time I have some spare time, maybe I can start there. I don't have to randomly open a file and hope that I find a good one, especially in a very big codebase.
00:25:00.960 So code coverage has helped me with that; it just gives me hints, places where I can start. There might be other static analysis tools out there that do similar things.
00:25:19.680 The good thing about code coverage tools is that probably you already have them in your codebase for testing. They’re really easy to use.
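The idea of running coverage outside of tests can be sketched with Ruby's built-in Coverage module, which is what SimpleCov builds on. Note the swap: this example uses the standard library directly, not SimpleCov, and the `DemoApp` module and `never_executed_lines` method are hypothetical stand-ins for a real application and a real user session.

```ruby
require "coverage"
require "tmpdir"

# Hypothetical application code. It is written to a temporary file so that it
# is loaded after tracking starts; only files loaded after that are measured.
APP_SOURCE = <<~RUBY
  module DemoApp
    def self.used_helper
      "icon".upcase
    end

    def self.dead_helper
      "dead".reverse
    end
  end
RUBY

# Load the code, exercise it the way a user session might, and return the
# 1-based numbers of executable lines that never ran.
def never_executed_lines
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo_app.rb")
    File.write(path, APP_SOURCE)

    Coverage.start
    load path
    DemoApp.used_helper # stands in for "playing around with the app"
    counts = Coverage.result.find { |file, _| File.basename(file) == "demo_app.rb" }.last

    # Each entry is an execution count, or nil for non-executable lines.
    counts.each_index.select { |i| counts[i] == 0 }.map { |i| i + 1 }
  end
end

p never_executed_lines
```

The body of `dead_helper` is reported because nothing called it during the session. As in the talk, that is only a hint about where to start looking, not proof that the method is dead.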
00:25:38.360 Finally, I want to talk about another technique, and here is where you need to be awake because this is the hard part. It's another Git command that I want to talk about.
00:25:57.960 So this is the exact same slide I had before where I said this 'delete_app' file was using the option that I was mentioning, but this file is gone.
00:26:09.480 It’s pretty easy to understand that it’s not invoking the code anymore; the file has been deleted completely.
00:26:24.960 Now I want to talk about another scenario, which goes like this: the file itself still exists, but it has different code.
00:26:41.600 Three and a half years ago, somebody was using the option in this file. Today, the file is still here, but the code is different.
00:27:05.760 So in other words, three and a half years ago, the code that I'm trying to find was there. Today, I do a 'Find in Project,' and it’s not there; it’s not in that file.
00:27:19.440 So somewhere in the middle, somebody went and changed this file and removed the only usage of that option.
00:27:31.440 The question is: where, and when? You know, there might be dozens or hundreds of commits in the middle, and I don't want to go one by one and check.
00:27:57.440 So is there a faster way to do that? There is, and it's a command called 'git bisect.'
00:28:04.720 So to the right, these are all the commits that have ever affected this file. I know that somewhere here, there is one commit that removed the only usage of that option.
00:28:22.440 So what I do is type 'git bisect start.'
00:28:39.520 Then I type 'git bisect good' with the oldest commit (good means the option was invoked; the code was there; we know that).
00:28:56.480 Then you type 'git bisect bad' with the most recent commit (we know the option is not there).
00:29:02.720 When you type this, Git takes you exactly in the middle. That's why it's called bisect. It takes you, you know, no matter how many commits there are, it takes you in the middle and it says, 'What about this commit? Is the code here invoking that option or not?'
00:29:18.000 Now, you have to do that work. You can use 'Find in Project'; what I use is 'grep' to count how many occurrences of 'header_icon' are in that file.
00:29:39.840 In this case, zero means it’s not using the option, so now all I have to type is 'git bisect bad', because I know this code is not there.
00:29:54.720 What 'git bisect' does now is take me to the middle of the second half. It says, 'What about here?'
00:30:12.640 And now you do this again. So was it there? In this case, it was there, so you just type 'git bisect good'.
00:30:23.760 Now, that takes you to the middle of the third quarter, and you do that again; it only takes four or five steps until you get to a single commit.
00:30:42.960 In this case, it tells me which was the first bad commit.
00:31:08.560 Once you're done with 'bisect,' you type 'git bisect reset' to just go back to your normal work.
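The search that 'git bisect' performs is a binary search over the commit history. The following is a simulation in plain Ruby, not a wrapper around Git: the `Commit` struct, the made-up SHAs, and the `first_bad_commit` method are assumptions for illustration, with "good" meaning the file still mentions the option.

```ruby
# Each hypothetical commit records whether the file still mentions the option.
Commit = Struct.new(:sha, :uses_option)

# Binary search between a known-good (oldest) and a known-bad (newest) commit.
# Returns the first bad commit and how many commits had to be inspected.
def first_bad_commit(commits)
  good = 0                  # index of the newest commit known to be good
  bad = commits.length - 1  # index of the oldest commit known to be bad
  inspected = 0

  while bad - good > 1
    middle = (good + bad) / 2
    inspected += 1
    if commits[middle].uses_option
      good = middle # like answering `git bisect good`
    else
      bad = middle  # like answering `git bisect bad`
    end
  end

  [commits[bad], inspected]
end

# Sixteen commits, oldest first; the option stops being used at index 5.
HISTORY = (0...16).map { |i| Commit.new(format("c%02d", i), i < 5) }

culprit, steps = first_bad_commit(HISTORY)
puts "first bad commit: #{culprit.sha} after #{steps} checks"
# prints "first bad commit: c05 after 4 checks"
```

Halving the range each time is why a history of sixteen commits needs only four checks, and why hundreds of commits need fewer than ten.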
00:31:20.960 Now that we have this, we can use it in the same way as before; we can say, 'Remove unused option header_icon. It was introduced, but its last invocation was removed in 1927.'
00:31:41.040 Then people can click there to see that it's not that the file was deleted; it's that that specific line of code was deleted, and then the rest of the commit is similar.
00:32:03.760 So, to wrap up, these are all the Git commands I mentioned in my talk and I find them pretty useful to delete code.
00:32:20.720 The first one is you're just curious. You're looking at some code: when or why was this added? You do a 'git blame.'
00:32:41.040 Then, maybe, you find the line of code and you want to see what else happened at the same time; you can do 'git show' with that commit.
00:33:03.840 If you're looking at a specific file and want to know the history, even if the file was renamed or deleted, you can do 'git log' with those options.
00:33:28.400 If you know there was a commit that changed something but you don't know which one, you can do this in a rapid and efficient way with 'git bisect.'
00:33:51.920 With all this, you have what you need to write a good commit message.
00:34:14.720 So, first of all, if you're in the middle of something, put it away, stash it—do 'git stash.'
00:34:32.960 Then write the commit. My suggestion is to never use 'git commit' with the '-m' option, meaning inline in your terminal.
00:34:44.960 Just do 'git commit,' and open your editor. So then you have all the space that you need to write your message.
00:35:00.000 Then you can bring your work back with 'git stash apply.'
00:35:14.240 To conclude, I just want to mention that, as I said at the beginning, this is a talk that really comes from experience and passion.
00:35:43.680 It's really not about looking back. I never really care about who left the code there or why it's there; I'm perfectly aware that these things happen.
00:36:02.560 Probably, it might have been me; I forgot about it, and that's really not relevant to me doing this.
00:36:19.680 It's kind of similar to when you go to the beach, and there's a piece of trash. Some people don't even notice it, and then others see it and think, 'Well, I didn't leave it there, so I don’t have to pick it up.'
00:36:36.480 But then there are people who pick it up because it doesn't matter; we don't want to linger in the past; it's just more about building a better future.
00:36:54.880 Finally, I want to thank the committee for accepting my talk because, in my mind, this was kind of a very personal and even weird talk.
00:37:06.400 But it actually was accepted, so if you're in the audience and you have any passion or theme or anything that's related to Ruby and you have doubts and think it will never get accepted, mine was!
00:37:24.560 So you really only have to submit an abstract which is like 300 words. I encourage you all to do it because maybe next time, you're going to be standing on this stage giving a talk.
00:37:41.600 That concludes my presentation. Thanks for coming, and have a great rest of your day!