Here Be Dragons

Talks

Here Be Dragons

Katrina Owen

#software-maintenance

#ethics

#collaboration

Here Be Dragons

by Katrina Owen

In the talk "Here Be Dragons" by Katrina Owen, presented at BathRuby 2015, the speaker addresses the complexities and moral dilemmas developers face when maintaining legacy code and the need for responsibility in software development. The central theme revolves around the inherent entropy of code and the challenges of balancing immediate fixes against potential future consequences. Owen shares her experience with a particularly mundane bug in a gossip-sharing application, illustrating how seemingly minor issues can reveal deeper problems within a codebase.

Key Points Discussed:
- Encountering a Bug: Owen recounts a personal experience fixing a bug related to email notifications in a gossip-sharing application, which evolved into an exploration of the deeper issues within the code.
- Moral Responsibility: Despite being able to walk away after making a quick fix, Owen emphasizes the ethical responsibility developers have towards their code, driven by a fear of unaddressed issues.
- Code Quality Examination: The talk critiques the lack of testing, dead code, and confusion caused by poor documentation and coding practices in the legacy code.
- Impact of Defective Code: The speaker argues that when good developers experience rushed timelines and unclear requirements, it often leads to the creation of lower-quality code, emphasizing situational pressures over individual failings.
- Cooperation vs. Defection: Using a game theoretical approach, Owen discusses how decisions in code quality can be seen through cooperation (striving for better coding practices) versus defection (neglecting best practices), ultimately influencing the overall quality of the software.
- Conclusion on Software Development: The talk concludes with a call for developers to recognize their contributions to code quality, urging them to consider each commit as a choice between cooperation and defection in the ongoing effort to improve code standards and reduce entropy.

Main Takeaways:
- Developers have a moral obligation to confront and improve legacy code, instead of ignoring underlying issues.
- Choices made during development have long-term impacts, and fostering a culture of code quality can lead to more stable and maintainable software.
- The accountability lies not just in fixing immediate problems but also in understanding the broader implications of our coding practices.

00:00:23.180 Every once in a while, I come across a bug that I can't help but admire. Bugs like these consume me; I forget to eat, and I have trouble sleeping. It's exciting and stimulating, and when I figure out the devious ways in which they cause the system to fail, I feel victorious. Princess, one dragon, zero.

00:00:54.320 However, this particular bug was not like the others. This was the mind-numbingly boring kind of bug—the kind that causes all of the other items on my to-do list to suddenly seem urgent. The bug was in an application whose sole purpose was to allow villagers to post gossip on the web. Who got hitched? Who had a baby? Who has a birthday coming up? The gossip would then get piped straight into the village rag's production system so it could be printed on actual paper.

00:01:11.190 The broken feature was one that allowed Bob to tell everyone that they should come gossip about it when Alice posted a picture of her new baby. In other words, the broken feature was the ability to gossip about the gossip. The full bug report read as follows: I tried to share by email, but nothing happened. I opened the application, put in my email address, clicked the button, and lo and behold, the villager was right—nothing happened. I activated the JavaScript console and tried again, which resulted in a 500 error. The JavaScript was trying to make an asynchronous call to an endpoint inside the application, and the response that came back in the console was a wall of text formatted as HTML.

00:02:02.520 I copied out the call with all the parameters so that I could call the endpoint using curl, redirect the output into a file, and then open it in the browser. This showed me a stack trace that was exactly one line long. The execution had not even reached the controller action; a before filter exploded while trying to send a message to nil. A quick investigation revealed that the before filter was completely irrelevant, and we could simply bypass it altogether. So I was now expecting to get a 200 OK response, a snippet of JSON, and an email, but instead, I got another stack trace formatted as HTML. This time, the code managed to get all the way down to the email template before it blew up due to something related to translations.

00:02:58.180 In the stack trace, the entire feature was outlined. It starts in the controller in a method named 'share by email or SMS.' This method calls out to a module called 'sharing,' which has a method called 'send to recipients,' presumably a collective term for email addresses and phone numbers. The next stop in the stack trace is 'send to email' in the same module, which further delegates down to an action mailer that sends an email using a text template that is internationalized and interpolates strings from a locale file. The error message indicated the exact line of code that was causing the explosion. The internationalization function was failing due to fancy string interpolation that took four arguments, while the ERB template was apparently only sending in three of them. Through a process of elimination, I determined that the missing key was 'tidbit.sender.' I patched the ERB template and then kicked off the email.

00:05:02.240 At this point, I did receive a proper response and the email arrived in my inbox, so my work here was done. Management would be impressed, but the truth is I did what the French call 'le minimum sans chicane.' This term mocks the bare minimum and pokes fun at bureaucracy for setting standards so low that you literally could not do worse because at that point, you wouldn't have done it at all. Now, having fixed this feature, I was sorely tempted to tiptoe away. I knew that if more bugs were discovered a few months down the line, I wouldn't be in trouble because my name isn’t even in the commit history for the feature, aside from the two lines I just touched.

00:05:54.290 Still, I was unable to turn my back on this code. I'd like to pretend that the reason I couldn't leave it alone is that I am a good person, but the truth is I am driven by fear. The code might be decent, or it could be a stinking quagmire of rot and decay, and without at least glancing at it, I can't know. I fear stinking quagmires of rot and decay; I dread getting sucked into them and being thigh-high in mud and leeches while fighting off stack traces with one hand and wrangling boolean gates with the other.

00:06:06.110 The last thing I want to do is wade into the swamp, yet I fear that it's going to cause me X hours of pain—let's say three. If this thing blows up in my face one night, then I'm very likely to spend those three hours getting it wrong, which could lead to three to four days of hell and pain writing scripts to parse access logs to put lost orders back into the database. Guess how I know this? It's this fear that motivated me to take one tiny step—to eyeball the code, look at it, and make an evaluation about how likely it was to blow up later.

00:06:28.060 Before we look at the code, I'd like to take a moment to talk about tests. Moving on, in the controller in the 'share by email or SMS' action, the first thing that caught my eye was a constant that I had never seen before: 'tinyurl.' It wasn't present in the stack trace, nor did the email I received contain a shortened URL. The logical explanation for this is that it is only used in the SMS portion of the feature, and it is true that we are building the text message for the SMS here in the controller on every request, whether or not we've received any phone numbers to actually send this SMS to.

00:07:00.830 However, the text message also does not contain a shortened URL. In fact, once we've generated the tiny URL and assigned it to this local variable, that variable is never referenced again. Let's take a look at the tiny URL module. We have a base exception class, and we also have an exception class. Incidentally, the only thing the base exception class is used for is to define the exception class, which in turn is used exactly once. This is a bit cryptic, but the method name does a good job of explaining the comment. This is both cryptic and misspelled, but the method name explains what's going on. This is the core piece of tiny URL.

00:08:02.089 You don't have to read the code; just notice that we are calling out to an external service over HTTP every time we call the 'share by email or SMS' action. We call out over HTTP and then throw away the result. The author of this code has thoughtfully added caching, so we might not actually make the call on every request. Besides, it only happens in staging. In production, you'll notice that the case statement always returns. Therefore, the last line of the method will never be reached. Additionally, you can see that the case statement has exactly two outcomes: true and false. The caching mechanism manages to use both a module variable and a module instance variable in the same breath.

00:09:02.630 It turns out that the tiny URL module is not used by any code anywhere ever, and so it can safely be deleted. The code has been racking up expletive points pretty quickly. You know, expletive points, or XP, if you're American, are awarded for poor code quality. We found two bugs where only one was expected. Bugs are known to congregate, so there are likely more of them. There is dead code, a gratuitous case statement, and the code uses both a module variable and a module instance variable. The comments either state the obvious or are incoherent; we are building up a text message even when we're only sending emails. The entry point into the feature is a method named 'share by email or SMS.' Obviously, the method does two things that are so unrelated that we can't even use that in the method name.

00:09:54.710 There are no tests. The code is calling out over HTTP and never using the result. So we've accumulated 72 expletive points. But wait, there's more. An 'unless' else is being used to determine the response for this action, and it's switching on a flag called 'invalid_name,' giving us a double negative for the happy path. The 'invalid_name' flag itself is kind of spectacular: a name must not exceed 40 characters in order to be valid, yet the significance of the number 40 is unknown. Out of curiosity, I kicked off the curl command with an invalid name, and I received a reasonable error message and an email.

00:10:36.340 The process goes like this: having determined that the name is invalid, we send out the emails and text messages, and then return an HTTP 400 Bad Request. Now, you'll notice that nil is totally legit, and it seems likely that hitting the controller action without a name would result in an email being sent out. Instead, we get a 400 Bad Request, and the error message tells us that they've already been told. Obviously, it turns out we don't get an email because the 'send_to_recipients' method has a guard clause against nil names. Let's see what else the sharing module has to offer.

00:12:02.030 The 'send_to_recipients' method is very complicated-looking; fortunately, there’s a comment explaining what the method does. Unfortunately, every single part of that comment is wrong. There is no 'message' key; the hash key is called 'SMS.' Additionally, there is no 'email_recipients' key; it's just 'email,' and there is no 'SMS_recipients' key; it's called 'lumpers.' You might think that it can't be more wrong, but even if all the keys were named correctly, the comment would still be wrong because we are not sending an SMS to emails and numbers. We just fixed a bug in the email template that the action mailer uses to handle the email portion of the feature.

00:13:37.980 The method contains code to clean up the text message, but that code never gets called because, if the hash never has a 'message' key in it, it then cleans up the name. This is reasonable, aside from the fact that the controller validated that the name was less than 40 characters long before it passed into this method. So having cleaned up the name, it could now be shorter and therefore valid. Then it goes ahead and validates the length of the name, this time with a cutoff of 41.

00:14:42.750 Now, once we've gotten past all the validation and cleanup, we get to this block. We combine emails and phone numbers that came in separately into a single array, and then we iterate over the whole lot and use regex to figure out whether we have a phone number or an email. We can finally dispatch the message using the correct protocol.

00:15:04.500 Now, according to the comment, the 'send_to_email' method sends emails, but the comment is only partially correct. This is quite confusing, so I'm going to illustrate it with an example: Alice posts a picture of her new baby, and Bob goes to the website and enters Charlie's email address to tell him about it. The tidbit was created by Alice. If we don't have Alice's email address, we will skip blithely past the part where we actually dispatch the email to Charlie, whose address we happen to have because Bob just gave it to us.

00:15:43.490 Incidentally, it's quite likely that we don't have Alice's email address because, in this application, you sign up with your cell phone. Now, we do prefer to report good news. Finally, in the other branch of the conditional, we send the SMS message, which, according to the comment, sends SMS. What do these codes say about the developer who wrote it? It's easy to criticize, and a whole lot of fun.

00:16:14.730 It's tempting to believe that whoever wrote this feature is stupid, a bad developer, or at the very least, a bad person. This is known as the fundamental attribution error. When somebody writes terrible code, we make assumptions about why that is. At one end of the scale, we attribute bad code to individual traits—suggesting that the person isn’t skilled enough or doesn’t care enough. On the other end, we assume that it's situational—a momentary lapse due to contextual pressures. So on one end they're a bad person, and on the other end they’re just having a bad day.

00:17:31.480 The forces that push us around from moment to moment are invisible. They're obvious to us in our own lives; we cannot help but notice them because we're in the thick of it. Those pressures are hard to see from the outside, and the further you are from the situation, the harder it is to imagine them. The result is that people we don’t know are bad developers, while we and our friends are just having a bad day.

00:18:30.430 So how does an experienced, smart developer like the author of this 'share by email or SMS' feature end up with a mess like this? Most likely, what happened to them is what happens to all of us: rush jobs, rapidly changing requirements, and then again, who knows? Maybe there was no deadline; maybe they understood the code perfectly; maybe they were just bored or drunk.

00:19:10.710 If you strip away all the fuzzy, muddled humanity of it all, it really comes down to an algorithm based purely on rational choices. Imagine a game where player 1 and player 2 may either cooperate or defect. If both players choose to cooperate, they both get three points—a reward for mutual cooperation. If both players choose to defect, they get one point each—a punishment for mutual defection. If player 1 cooperates and player 2 defects, then player 2 gets five points known as the temptation to defect, and player 1 gets zero points, known as the sucker's payoff.

00:20:20.890 In terms of software development, cooperating means whatever it means to you and your team; it could be taking the time to write tests or making sure that your methods are short, that their names are expressive, and that you've deleted all the trailing whitespace. Defecting is not doing those things—like committing with no thought to the next person who will see it. Here’s the thing: if player 1 knows that player 2 is going to defect, then it is in their best interest to defect as well; both players then get one point.

00:21:23.290 On the other hand, if player 1 knows that player 2 will cooperate, then it is in their best interest to defect because they’ll get five points, while player 2 will get none. The rational choice, then, is always to defect. Yet in doing so, everyone loses. Notice that this is not a zero-sum game; the players do not have strictly opposing interests. The only way to attain the highest combined score is to cooperate; if everyone looks out for themselves, nobody wins.

00:21:53.900 And of course, this is overly simplistic; it draws out a world where strangers meet once and never interact again. The more interesting question is when to cooperate and when to defect in an ongoing relationship with another person. This arithmetic of cooperation maps disturbingly well to the costs we pay to deliver software.

00:22:08.490 When everyone on the team cooperates, the team delivers good software with a reliable cadence. Everyone looks good, and everyone can be productive. If one person consistently defects, that person appears efficient to outside observers; they seem to be checking off features on the backlog and fixing bug after bug. Never mind that they most likely introduced a number of those issues in previous fixes; they seem productive, delivering heroic saves quickly and reliably.

00:23:05.850 The rest of the team slows down, both from trying to understand the code produced by this person and from having to shore up, rework, and rewrite it whenever they need to make changes. The time it takes them to make a change increases. If the whole team defects, there’s an initial rush where features are delivered rapidly, but we quickly hit a point where each change takes longer and longer to make, until we reach a pathetic but stable equilibrium where progress inches forward in an endless, painful struggle.

00:23:44.500 Humans are biased towards cooperation; we will readily cooperate even if doing so makes no rational sense. Yet, in being irrational, we will gleefully defect when we have no incentive to do so. Sometimes it’s easier to just not—neglecting to write that test or rename those variables. We want to do the right thing, but good intentions are meaningless when we're tired or distracted, and we decide that just this once we’ll commit and move on with our day.

00:23:56.560 Every commit we make tips the balance ever so slightly in one direction or the other—an inch toward entropy or an inch toward increased order. Perfection is unattainable and, ultimately, irrelevant. If most of the commits you make are shifting the balance toward increased order, then no matter how slight that shift is, you will have a codebase that constantly improves.

00:24:26.450 So, I urge you to ask yourself before you commit: am I cooperating or am I defecting? Thank you.

BathRuby 2015

Explore all talks recorded at BathRuby 2015