Harry Potter and the Legacy Codebase
Summarized using AI

by Kerri Miller

In this engaging talk titled "Harry Potter and the Legacy Codebase," Kerri Miller explores the challenges and techniques associated with dealing with legacy code in software development. She likens approaching legacy code to navigating a mysterious labyrinth at Hogwarts, emphasizing the importance of understanding its history and the developers who created it. The key points covered in the presentation are as follows:

  • Understanding Legacy Code: Legacy code is often perceived as problematic and poorly written; however, it represents past solutions to development challenges. Knowing the context of its creation can offer insights into the development culture.
  • Code Archaeology: Miller introduces the concept of Code Archaeology 101, where developers must gather knowledge about legacy code through three key steps: surveillance, excavation, and analysis. This approach helps unlock the code's historical context and offers direction for improvements.
  • Collaborative Learning: The talk encourages developers, or 'students,' to engage with previous developers (the legacy people) to understand decisions that led to specific implementations. This dialogue can be invaluable in gaining perspective on the codebase.
  • House System: The use of a Hogwarts-inspired house system organizes participants into groups (House Reorg, House New Hire, House Fed Up), assigning roles in the 'battle' against legacy code, fostering collaborative problem-solving.
  • Importance of Documentation: Documentation is highlighted as essential in bridging communication gaps between teams and new developers. Recommended practices include keeping a comprehensive README, creating a glossary for terms used in the codebase, and formulating visual aids for understanding complex relationships in code.
  • Evolutionary Perspective: The talk emphasizes viewing legacy code through an evolutionary lens—acknowledging that what exists serves a purpose based on historical constraints, and improvements should be gradual rather than revolutionary. The method also highlights the importance of focusing on small, manageable code changes instead of attempting to rewrite large segments of it.

Throughout the presentation, Miller uses witty references to 'monsters' to illustrate the intimidating nature of legacy code, while encouraging attendees to embrace these challenges as opportunities for growth and learning. The main takeaway is the significance of uncovering the stories behind legacy code and learning from those who came before to ensure better practices for future generations of developers.

00:00:20.560 Yeah, I told my father I was coming to Dallas to do a talk for a convention, and he said, 'You're gonna need a hat. So get a big hat!'
00:00:32.000 Okay! So, as we said, my name is Kerri Miller, and I'd like to welcome you all to this very special convention that we're having here today—a special workshop by our mutual employer, Hogwarts Online.
00:00:39.760 For those of you who don't know me already, I am a member of House Minnesota, which was created 20 years ago by the great wizard Matz and, of course, propelled forward by the enigmatic DHH and the speed demon mage. We have a very fine tradition of two-space indentation and funny gem names, and I expect all members of the house to uphold those traditions. Now, I am based out of Seattle, Washington, although, just like with Schneems, from the hat and sandals you might think Austin. But no, it’s actually Seattle, Washington.
00:01:11.760 Do you guys mind if I take this off? I keep whacking into things. There we go. The hat will let me know if I’m doing anything wrong.
00:01:32.479 I work for a Ruby and JavaScript consultancy called Northwest Independent Ruby Developers. I'm also the lead instructor at Ada Developers Academy. We had a very successful Indiegogo campaign over the summer. We're a developer training academy. The unique thing about us is that we're not for profit, and we actually offer a 12-month program. It’s a six-month classroom curriculum followed by a six-month internship, and it’s entirely free to the students. In fact, we actually pay them a thousand dollars a month during the program!
00:02:00.320 This is not a staged shot by the way; the team here on the left had just figured out OAuth, and everyone else is like, 'How the hell? What's going on? You got Facebook to work? This is great!' I’m very proud of them.
00:02:31.040 But today, we’re here to talk about monsters. Hogwarts Online has organized this workshop because we have a plethora of monsters infesting our basement right now, and there’s a battle coming. So we’ve invited you, the members of these three different houses, to come and help us devise a battle plan because war is coming, people, and we need to be ready.
00:03:00.800 Now, who here has seen a legacy code monster before? Can I see your scars? Yeah, they’re on the inside.
00:03:03.360 For those of you who haven’t seen it, is anybody here weak of heart, or expecting? Any expectant fathers? Okay, you might want to avert your eyes.
00:03:14.240 This is called Slide 10. Yeah, those are legacy code monsters. These are nasty beasts—nasty, nasty beasts. They’ve infested our home, but there’s one thing we have to know about them.
00:03:35.760 All too often, we assume that legacy code is brittle and wrong, that the developers before us deliberately set out to leave traps for us. They were doing this on purpose! I mean, who here has ever written just horrible, awful, obfuscated code on purpose? Okay, a couple of Perl developers here. Excellent!
00:03:52.239 Well, when we think about code this way, we’re looking at it outside of the context of its original creation. Legacy code is really a storehouse of domain knowledge. It is literally a record of every choice that was forced upon developers as they moved along to try to solve a problem. Today’s bad code was yesterday’s best solution given the resources and the knowledge at hand to solve a particular problem at a particular point in time.
00:04:39.360 It’s by studying the code that came before us that we understand the culture of the wizards that came before us and how we want to deviate from that culture going forward. We’ve created an ideal breeding ground for these beasts, and they exhibit a lot of these characteristics.
00:05:01.600 You may perceive this assignment as some sort of punishment for that recent incident involving the visiting professor, the polyjuice, and Hagrid’s pumpkin, of which we will speak no more. This is not a punishment. In fact, you should embrace this assignment as a wonderful opportunity to earn many points for your house.
00:05:46.720 As we unravel this puzzle, we'll be handing out some rewards. This mess comes about, as I said, because every behavior that you see someone exhibit is perfectly influenced by a world that has been perfectly built to support that decision. Somebody decides to commit stuff into master, like say, S3 keys, not because they don’t know better—well maybe they don’t know better—but because the environment let them do that.
00:06:24.800 The environment being the tools or the people around them. We’re all forced to take shortcuts, and we all get sloppy sometimes. We focus on the details of implementation rather than a strategic picture, or maybe we just don’t have complete information about the requirements. Really, you know, someone gave us a really crappy user story.
00:06:57.600 Or maybe we just don’t have expert level knowledge about how to build an SOA architecture or implement Redis appropriately, or God help you, tune MySQL. We don’t set out to write code that decays into sticky masses. We really don’t do that. So I’d like to take an evolutionary perspective, which is kind of what we’re talking about today.
00:07:36.160 Somebody earlier this morning used the phrase "vestigial code," which I really love. It’s that idea of a creature having a trait, like tails on humans, or the fact that whales still have small bits of hair because they’re mammals, and it’s totally useless in the current environment. Evolutionarily speaking, it will stay there until the environment turns that adaptation into a hindrance, and then evolution will pull away from it.
00:08:14.080 Code works the same way. We have bad code, we have dead ends, we have things that are slow, but it doesn’t matter until it matters. We’ve all written a really bad Active Record complicated thing, and it’s super slow, but it doesn’t matter because that page gets three hits a week. Then all of a sudden it blows up and now it’s bringing everything down, and people are talking about, ‘Well, let’s ramp up more dynos and let’s get a bigger database server,’ and all you have to do is change two lines of Active Record to fix it.
00:09:02.720 That’s the evolutionary perspective coming into play. To learn some of the secrets of this beast, Hogwarts Online is now offering a new class this semester for first-year and above students: Code Archaeology 101. Now, within the context of legacy code, archaeology is really the best metaphor from the social sciences for understanding what we do as developers working with legacy code.
00:09:52.200 Just as an archaeologist goes out and digs up pot shards or looks at ruins or tries to translate some ancient codex, we too are working with the artifacts of a culture that came before us—whether that be the developer, the team, or the company that created those sorts of things.
00:10:17.920 When working with legacy code, however, the truth is that the research and the methodical exploration of the code is more often where the true reward comes from. It’s through this analysis that we create these narratives, and that gives us the insight we really need to solve the problems of today. Those who came before us dealt with these problems before. In the Ruby community, we have a really bad habit of not ever really knowing that there were developers before about '98 or '99.
00:11:01.920 I mean, the MVC pattern came out in the 70s, and sometime DCI was written about then. These things had been solved before.
00:11:43.920 In this class, these are some of the potential things that we’ll be learning over the course of the semester, and I encourage you to pick one and dive in. In the real world, archaeology is carried out through these three steps: surveillance, excavation, and finally, analysis.
00:12:05.440 Now, each house here—House Reorg, House New Hire, and House Fed Up—you all have a role to play in executing on these three steps. You will, of course, be graded on a final presentation at the end of class at the Triwizard Cup, and of course, bonus points will be given out for dioramas.
00:12:29.440 I have a lot of love for Ron; I think it’s a ginger thing—we have to stick together. Souls aren't easy to steal, people; they fight back! I encourage people to be brave because whenever you’re dealing with legacy code, it’s scary. These beasts rear their heads and they chomp at you and they bite.
00:13:00.640 But if legacy code wasn’t hard, it wouldn’t have the reputation, so there’s some truth to this. It can be difficult to deal with. I know it’s tempting to dismiss wizards who came before us as doddering fools full of quaint ideas, but in truth, resist this because every line of code is part of a greater story.
00:13:38.640 Legacy people are our storytellers; they’re the ones who can tell us, 'Oh, that line of code is there because we have to do this.' And maybe you don’t know about that, and that is super important to the business.
00:14:04.000 Readmes are great; I love readmes. I love code samples; I love tests. But these are documentation, and documentation is knowledge, and knowledge is not wisdom. True wisdom can only come through the telling of the story. So, find your legacy people—even if you hate them—and take them to lunch! Go out for coffee. Get them to tell you their stories of fighting these demons and these battles, and you’ll learn an awful lot.
00:14:44.800 So, House Fed Up, you’re tired of things. You’ve been complaining for a while. It’s time to get to it. Your job is to collect information about the activity and the context of the current time. The contexts you’re going to be studying are the overall architecture of the system, implementation details of classes down through methods.
00:15:25.440 Who calls this class? What are its neighbors? Who is it entangled with? Who does it go out on dates with, and who is it cheating on? You’ve got to find these things out. You do that by taking inventory when you get started.
00:16:11.760 Look at code coverage; look at the tests. Take some benchmarks, look at New Relic, and see how things are performing; where are the hotspots? Myself, in the grand tradition here, I will talk about one of my gems, which is called Turbulence, which charts churn rate versus complexity to identify hotspots in your code where bugs are going to crop up.
00:16:52.560 Run these sorts of tests; wire them into your CI so you have a running record, and you’re establishing a baseline for improvements. Are we improving or not? Are we moving at least in the right direction?
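As a rough illustration of the churn-versus-complexity idea mentioned above, here is a minimal sketch. It is not the Turbulence gem's actual implementation or API. The `FileStats` struct, the `hotspots` method, the file paths, and all the numbers are hypothetical, and ranking by the product of churn and complexity is an assumed heuristic chosen for simplicity.

```ruby
# Hypothetical input: per-file churn (how many commits touched the file)
# and a complexity score from whatever metric tool you already run.
FileStats = Struct.new(:path, :churn, :complexity)

# Rank files by churn * complexity, highest first (an assumed heuristic).
def hotspots(stats, limit: 3)
  stats.sort_by { |s| -(s.churn * s.complexity) }.first(limit)
end

# Made-up sample data, for illustration only.
STATS = [
  FileStats.new("app/models/basilisk.rb", 42, 31.0),
  FileStats.new("app/models/poison_barrel.rb", 5, 12.5),
  FileStats.new("app/helpers/hat_helper.rb", 30, 2.0),
  FileStats.new("lib/legacy/importer.rb", 3, 80.0)
]

hotspots(STATS).each do |s|
  puts format("%-32s churn=%3d complexity=%5.1f", s.path, s.churn, s.complexity)
end
```

Running something like this on every CI build and keeping the output gives the running record the talk asks for: a file that keeps climbing the list is a candidate for the next excavation.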
00:17:34.240 Now normally, I wouldn’t include a slide like this. Professor Weasley, as I said, fellow ginger, but asking dumb questions is very important when you’re new to a team. Not asking them is probably the worst mistake you can make when you’re joining a team.
00:17:42.080 If you’re trying to fix a problem, the same old same old isn’t going to cut it. You have to question the assumptions of those people around you.
00:18:11.680 Do that through flagging mythology. Listen carefully to what people complain about. What are the anathemas and heresies of a team? If you mention PostgreSQL, does everybody scoff at you and say, 'Oh, they’re going to lose data'? Do people say, 'No, Ruby can’t scale'? Do people say, 'No, Erlang is too hard for that particular service'? These sorts of things are going to inform you a lot about the culture of the team.
00:19:04.200 And then finally, the culture of the code. I don’t want you to be spying on your fellow students, so this isn’t any sort of—we’re not going back to those days here at Hogwarts Online. But when somebody says, 'We can't use technology XYZ because they won’t let me'—why not? Why can’t we use technology XYZ, and who are they?
00:19:48.720 All too often we move from, 'Here’s a cool idea,' all the way to, 'No, it’s a rule, and it’s in the employee handbook.' Remember that every rule has somebody’s name attached to it. Even for the dumbest rule and the dumbest decision made in the code, there’s a reason for it.
00:20:43.440 When I worked at Amazon, there was actually a line in the employee handbook that said all developers must wear proper undergarments at all times. I can honestly say that the reason was 9/11.
00:21:24.920 When we’re exploring legacy code, we encounter legacy processes as much as we do legacy code and legacy people. It’s important to keep this in mind. House Fed Up, it’s your job to separate which is which—what is legacy that is holding us back and what is legacy that will propel us forward.
00:22:07.920 Now, I want to talk about a little bit of code. It recently came to my attention that we have basilisks as well in the basement, which I did not understand. By the way, I am looking to adopt a new dog. However, one really cool trick we can do is remove their poison, so that’s good. They will just be stone; they won’t be dead.
00:22:55.920 However, the bean counters down in the potions department have decided that this is incredibly wasteful and they want us to actually keep the poison. At some point, somebody made a poison barrel class that looks very Rails-ish, doesn’t it? That’s pretty good.
00:23:43.440 Then they bolted that onto the remove poison so that we grab a poison barrel to throw the poison in the barrel. What’s wrong with this code? Anyone? Yeah, this code lies to us about what it does. Because of that, there’s more points of complexity and lines of code about barrels than about removing poison. The actual storage of the poison is a side effect here of remove poison.
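The slide code is not part of this transcript. The following is a hypothetical reconstruction of the kind of method being described, with assumed class and method names (`PoisonBarrel`, `Basilisk`, `store`, `remove_poison`), in which a method named for removal quietly stores the poison as well.

```ruby
# Hypothetical reconstruction; the names and bodies are assumptions,
# not the code shown on the speaker's slides.
class PoisonBarrel
  attr_reader :contents

  def initialize
    @contents = []
  end

  def store(poison)
    @contents << poison
  end
end

class Basilisk
  attr_reader :poison

  def initialize
    @poison = :venom
  end

  # The name promises removal only, but the body creates and fills a
  # barrel: storing the poison is a hidden side effect.
  def remove_poison
    barrel = PoisonBarrel.new
    barrel.store(@poison)
    @poison = nil
    barrel
  end
end
```

A caller reading only the method name would not expect a barrel to be created and returned, which is the sense in which the code "lies" about what it does.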
00:24:41.840 This violates the single responsibility principle, and we’re propelling forward a mythology about what this particular piece of code does. Once our survey tells us where the worst code is and what really needs to be fixed, now it’s time to dig in. House New Hire, this is where you shine.
00:25:27.680 Because you are not carrying forward any sort of sense of, 'They said we can't use Vim,' or 'They said we can't use Redis.' You have the most to gain, and you’re going to learn the most during this time. So it’s your job to dig into the code’s provenance to find its place in the logical architecture and control flow and subtly change it to improve these problems.
00:26:14.720 Sometimes finding the relationship between a derelict code fragment and nearby objects is going to be difficult, but you can come back to House Fed Up and talk to them about why these things are tangled up.
00:26:59.440 Before you get in there, though, you need to make a plan. Now, plans don’t always work out—in fact, most of my plans always fall through. Except for the hat; I think the hat worked out pretty well. TSA loved it, by the way.
00:27:40.160 Legacy code monsters love to scare us. They love to jump out of the shadows and say, 'Ooh!' and then you say, 'Oh, look at the tentacles, and oh, look at the teeth, and ah, it's a yak!' Don’t fall for it! Make sure to stick to your plan. If you have a goal to fix a thing, fix that thing.
00:28:13.680 If enough very inviting corridors open up to one side, or you start to see traps and other beasts in the shadows, you can defend yourself against these by creating GitHub issues or emailing people about it. But focus on your one target; shave as few of these evil yaks as possible.
00:28:45.680 You want to avoid overwhelming force whenever possible. You want the lightest touch that’s called for because it would be super satisfying—even for me—to go down and cast Stupefy on all of these beasts in the basement, just blast them to oblivion or lock them in a dungeon and starve them out somewhere, and just replace them. We can rebuild this from the ground up, three weeks no problem!
00:29:46.840 How many people actually said that? Just me? Okay, I think pretty much everyone though in your heart has been like, 'Oh, I could just rewrite this whole thing; it’d be no problem.' And then nine months later, you ship.
00:30:20.320 Because we have Conway's Law that we keep running into, right? Code represents the social fabric of those who are writing it—the social structure of the teams and those divisions can be reflected in our code.
00:31:27.360 The problems we have today that our legacy code represents—even if we rewrote it, we would still have the same problems. So back to basilisks. We go through our existing code and replace remove poison with this new method called harvest poison I’ve created, which at least is honest about what it’s doing.
00:32:09.040 We extract the caring about the container and everything; we put that down, and then we leave a breadcrumb behind in the warning that we’ve deprecated this. As soon as other people start running their tests—because we know they’re all running their tests, right? We’ll start to see deprecations popping up.
00:32:49.480 And you’ll start spreading that knowledge in case we miss one of those legacy things that we’ve replaced. Now, experienced developers, I’m sure you can look at this, and you’ll find other improvements, but I think this gets the point across a little bit. This is all about being graceful and doing small things.
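The refactoring described above might look roughly like the sketch below. It is an assumption-laden reconstruction, not the slide code: the class names, the `store` method, and the wording of the warning are all hypothetical. The honest method takes its container as an argument, and the old name survives only as a thin wrapper that prints a deprecation warning.

```ruby
# Hypothetical sketch; names and message text are assumptions.
class PoisonBarrel
  attr_reader :contents

  def initialize
    @contents = []
  end

  def store(poison)
    @contents << poison
  end
end

class Basilisk
  attr_reader :poison

  def initialize
    @poison = :venom
  end

  # Honest name: the caller supplies the container, so this method no
  # longer needs to know anything about barrels.
  def harvest_poison(container)
    extracted = extract_poison
    container.store(extracted) if extracted
    container
  end

  # Breadcrumb for callers we have not migrated yet: same behavior as
  # before, plus a warning on stderr whenever it runs.
  def remove_poison
    warn "[DEPRECATION] Basilisk#remove_poison is deprecated; " \
         "use harvest_poison(container) instead."
    harvest_poison(PoisonBarrel.new)
  end

  private

  def extract_poison
    extracted = @poison
    @poison = nil
    extracted
  end
end
```

Because `warn` writes to standard error, the message surfaces in anyone's test output as soon as a forgotten call site runs, which is how the knowledge spreads without a big-bang rewrite.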
00:33:35.760 So, documentation, House Reorg, this is kind of your baby because you own it now. You know it’s your big ball of mud, so you’ve got to tend to it a little bit. It’s your garden.
00:34:05.920 Once we get the physical and logical position of all the stuff pinned down, we need to determine the implementation details, the edge cases that aren’t accounted for, where the tests go, and just the value of this piece of code. We fixed it, but what are we going to do with it next week? What’s going to be the problem in a month?
00:34:44.800 The most critical part of analysis, of course, is documentation. We’ll do that through blog posts for internal or external consumption, emails to people, readmes, and hopefully some tests. Bonus points, of course, will come for interpretive dance at the Triwizard Cup.
00:35:29.920 Whenever I start a new project or company or a new team or anything, I just look at the readme. I know how to set up a Rails environment; I’ve done this a billion times. I can check it out, git clone, and do that. I look at the readme and I just follow the steps in the readme.
00:36:12.960 You know, the steps. You know what they are. Most projects usually say, 'Yes, hi, welcome, you’re riding the rails,' which is great, right? But not so much great because every single step of setting up a new project should be documented. Surely we all know that we have to go find Tony, and Tony’s got a copy of the test database, and like we gotta copy it over.
00:36:55.760 And oh okay, you gotta get a USB stick and put it over there, and oh you don’t have Redis installed and a particular version of Rabbit. Why isn’t that in the readme of the project for a new developer to check out or a contractor? I bill a lot of hours for it not working!
00:37:34.640 Additionally, every single project and team has its own concepts and TLAs, so I encourage you to create a dictionary, a wiki, or a thesaurus of some sort, somewhere—a grimoire that notes what this means when we say it. If you describe the word spec in a different way than other people, that’s important, and you should note that down.
00:38:23.520 If you have a TLA—I actually used to have to fill out TPS reports when I was at Amazon, seriously—transactions per second. Right? Every time we did it, we were just like, 'This is great,' and nobody ever got it. None of the bean counters ever understood, so write things down and document it!
00:39:05.680 You are developing a specialized language, you know, even amongst your friends—like jokes and sorts of things, right? You’ve got a special language that identifies you as a tribe and as a team of people working together on a project. You are developing your own language.
00:39:49.600 If you’re going to be welcoming new people in, document it, because this is about alleviating the pain of the developers who are going to have to deal with your crap in six months when your code of today becomes legacy code.
00:40:34.000 Another artifact that we can create here is a map. Most legacy projects are usually Rails-y, so there are a couple of gems. One is Rails ERD; another one is called RailRoady, and they'll generate an entity-relationship diagram. Are people familiar with those?
00:41:21.680 It’s basically a bunch of circles and blobs—it's UML with arrows pointing to, like, 'Here’s the basilisk, here’s the poison barrel, and here’s the barrel superclass,' with arrows showing inheritance and methods that get called back and forth. It shows the relationships between objects in your system.
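To give a feel for what such a map contains, here is a minimal plain-Ruby sketch that emits Graphviz DOT text for inheritance links only. It is not how Rails ERD or any other gem works; the `inheritance_dot` method and the three example classes are hypothetical, and method-call arrows are left out to keep it short.

```ruby
# Hypothetical example classes, named after the ones in the talk.
class Barrel; end
class PoisonBarrel < Barrel; end

class Basilisk
  def harvest_poison(container); end
end

# Build Graphviz DOT text: one node per class, plus one edge for each
# class whose direct superclass is also in the list.
def inheritance_dot(classes)
  lines = ["digraph models {"]
  classes.each do |klass|
    lines << %(  "#{klass.name}";)
    parent = klass.superclass
    lines << %(  "#{klass.name}" -> "#{parent.name}";) if classes.include?(parent)
  end
  lines << "}"
  lines.join("\n")
end

puts inheritance_dot([Barrel, PoisonBarrel, Basilisk])
```

If Graphviz is installed, saving the output to a file and running `dot -Tpng` on it produces an image that can be printed and pinned to the wall.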
00:41:56.240 Now, I love to go down to my local Kinko’s with this on a zip drive. Oh my God, somebody showed a zip drive this morning; it was Clint, and I was like, 'Oh man, the zip drive and the Jaz!'
00:42:29.440 Anyway, I put it on a USB stick, head down to Kinko's, and print out a large-format parchment, then put it up on the wall. I worked on a Rails project once with 400 Active Record models, and the lines were going everywhere. But as soon as we printed it out and put it on the wall, people would walk by and say, 'Oh, this whole thing we can throw that away—why do we have that?'
00:43:09.680 Or this piece over here: 'This needs to be a service.' Making it visual matters because not everybody is a code person: some people are visual, some people are kinesthetic, some people are synesthetic. We can encourage people to relate to our code in different ways and leverage the strengths that people bring to our team.
00:43:55.440 Better yet, do what I did on that project. Find a senior developer or a veteran of the code, if you will. If they’re not senior, lock them into a room with a whiteboard and order in lunch. Just go through the code: 'What does this do? What does this do?' Draw it on the board. Draw your own relationship maps between objects in your system to really understand how they relate to each other and make a battle plan for attacking the seams.
00:44:31.840 So, you've all done well so far, and it's time to prepare for a presentation. Now you completed the surveillance, the excavation, and the analysis of this legacy code beast, and I've decided you've done such a wonderful job that we are going to forget the incident with the visiting professor, the polyjuice, and the pumpkin. Didn't think I'd mention that again, did you?
00:45:13.760 But there will be other legacy code beasts in the basement to contend with. We may have fixed this particular one with the basilisk and the poison barrel, but others are going to come up. However, I have every confidence in your abilities. You are a self-selected group here at this conference. Your companies have sent you here because they have faith in you, or you are here by choice because you have faith in yourself.
00:46:03.440 And you want to do better, and you can do better. Attack these things and bring people along with you because, remember, legacy code beasts are phantasms—they're boogeymen that scare our young apprentices.
00:46:41.440 They are really the unintended side effects of a series of choices made by wizards who came before us. These were the best choices that they could make at the time they made them with the resources, materials, and knowledge they had at hand. By keeping that in mind, we can make assumptions and ideas about how we will create our code.
00:47:34.080 Code that will become legacy code for the next generation coming up behind us, and we do not want them to curse our names as strongly as we curse those who came before us, right? Seeking out an understanding of those who stood and made those decisions is a critical clue to unlocking the secrets of legacy code beasts and finally defeating them.
00:48:22.720 My name is, in fact, Kerri Miller. I’m a whole bunch of these things, but not necessarily in this order. I do everything on the internet at Kerri’s or GitHub’s and the Twitters and the whatnots. Cool! Well, thank you so much; this has been a lot of fun.
00:49:02.640 I have one more slide because I’d be remiss if I did not have a bonus pony slide for the bronies in the audience. Yes, my people!
00:49:10.080 Okay, well thank you very much!
Explore all talks recorded at Big Ruby 2014