Here's to history: programming through archaeology

by Eleanor Kiefel Haggerty

The video titled "Here's to History: Programming Through Archaeology" presented by Eleanor Kiefel Haggerty at RailsConf 2018 explores the parallels between programming practices and archaeology. The essence of Haggerty's discussion is centered around understanding the coding process as an archaeological venture where the history of code and development decisions are preserved like artifacts in an excavation site.

Key Points:

  • Introduction to Archaeology and Programming:

    Haggerty begins with a playful introduction, connecting her background in archaeology to programming. She emphasizes how, just as archaeological records reveal human culture, code commits preserve the decisions of developers.

  • Human Intention and the Archaeological Record:

    She discusses the importance of recognizing human intention in both archaeology and programming. Codebases can reflect individual programmers' styles and intentions, akin to how artifacts reveal the behaviors and choices of past societies.

  • Understanding Context and Stratigraphy:

    Haggerty draws a parallel between the stratigraphy of archaeological sites and the commit history in version control systems like Git. Commit logs serve as historical snapshots, allowing developers to piece together the 'why' behind programming decisions.

  • The Importance of Clear Commit Messages:

    The speaker reiterates the significance of maintaining clear and informative commit messages in code history. She cites examples from her work where a lack of context led to confusion, just as archaeological artifacts can lose their meanings without proper documentation.

  • Case Studies in Debugging:

    Haggerty shares personal anecdotes illustrating how tracing back through commit history is similar to archaeological excavation. She describes issues encountered while changing code, emphasizing how good commit messages and context helped identify and resolve problems efficiently.

  • The Relationship Between Technology and Humanity:

    The talk concludes by reflecting on how modern programming is a continuation of human adaptive behavior. The development process should be treated with pride, and decisions made in code should be seen as valuable contributions to the heritage of technology.

Concluding Thoughts:

Haggerty encourages programmers to take pride in their work, acknowledging that every line of code is not just a technical decision but a historical artifact that contributes to the archive of human technological development. She champions the idea that understanding the past—both in archaeology and programming—can inspire better practices and meaningful contributions in future coding endeavors.

00:00:11.000 Hi there! I'm going to take you on a bit of a trip to the past today. Hello, this is a greeting in ancient Greek, and that's hello in Australian.
00:00:18.210 Now, ancient Greece might seem like a long way away, and well, that’s because it is. I've traveled a really long way to get here today, all the way from Australia.
00:00:30.119 I’m from Melbourne, which is at the bottom. Some people think it's a little upside down on the other side of the world; maybe it is! Many of you know Australia for our wildlife.
00:00:46.590 These are regular photos my mom sends me. If you'd like regular Python updates, feel free to follow me on Twitter. Not about the language, I’m afraid—generally more related to the reptiles.
00:01:04.010 Australia is also known for its other animals, like kangaroos. This is a marsupial, known as something that young children like to ride to school.
00:01:17.400 We also have this little thing called a platypus, also referred to as a duck-billed platypus. What makes it so interesting is that it's a semi-aquatic, egg-laying mammal.
00:01:30.479 That means it lays eggs instead of giving birth to live young, and it can also produce milk. I think they're actually one of the only animals that can make their own custard. That's kind of weird!
00:01:42.260 And this is a crocodile, and it looks just adorable! This is really cute.
00:01:55.370 I work at a place called The Conversation. Most of our small team of developers is based in Australia, but we have some in London and one in Brazil. We also have 150 editorial staff spread around the world for the eight different regional editions that make up The Conversation.
00:02:02.870 The Conversation is a non-profit, independent source of news and views and academic news. Basically, our editorial staff works with academic experts to report on a huge range of topics: economic trends, politics, climate issues, and cultural reviews.
00:02:20.180 One of my favorite parts of The Conversation is called 'Fact Check,' where we strive to tackle misinformation and test claims made by political leaders against the evidence they present. This is particularly important in our current political climate.
00:02:40.730 We have an on-site readership of 10 million and 30 million readers through republication each month. Currently, there are 1.1 billion range records in our data warehouse, and we're continuing to expand. And yes, we do have a U.S. edition. I promise we don't publish fake news.
00:03:00.320 But I wasn't always a programmer. It was just over two years ago that The Conversation gave me a wonderful opportunity to learn on the job. Before that, I had very briefly dabbled in Ruby and written a few small programs here and there.
00:03:20.420 I generally spent my time working with artifacts in antiquity and heritage museums, as well as studying classical languages. This is generally what you first learn when you learn a modern language: here's the 'Hello, World!' in Ruby.
00:03:39.739 And here's what you first learn for classical Greek.
00:03:46.030 The Greeks did tend to be a little dramatic. This is from Homer's 'Odyssey,' if you're interested.
00:03:51.670 When I first began programming, one of the first things I learned about was the Law of Demeter. I remember this specifically because I thought it must have been a coincidence that something in programming, and something prevalent enough to learn early on, could have ties to Greek mythology.
00:04:10.000 Now, the Law of Demeter was named for its origin in the Demeter project, which was named in honor of Demeter herself. Demeter was, in Greek mythology, the goddess of agriculture and fertility, known for her association with sacred law and legislators.
00:04:28.120 And it's pronounced 'Demeter,' although this does depend on which dialect of Greek you speak. Thanks to Demeter, the connection between programming and the classics became clearer.
00:04:39.850 Coding didn’t seem like such a dramatic change anymore. The links between these two disciplines, which may seem tenuous at best, suddenly made sense.
00:04:47.020 Over time, I found that there are some really strong parallels between the two fields, and now I'm beginning to identify many facets of programming with history.
00:04:59.719 For example, Kubernetes comes from the Attic Greek word 'kybernētēs,' meaning 'to steer' or 'to govern,' and frequently referring to a helmsman or captain. You might be most familiar with Kubernetes for deploying containers.
00:05:22.090 There's also Project Aristotle, a study by the People Analytics team at Google that tried to determine what makes a team effective, taking its name from Aristotle as a tribute.
00:05:40.419 The tribute is to a saying attributed to him, loosely translated as 'the whole is greater than the sum of its parts,' meaning we achieve more if we work together. Then, of course, there's Zeus, the god of the sky and thunder and ruler of the gods, which is also the name of an IDE (Integrated Development Environment).
00:05:54.709 So, although we don't realize it, we are surrounded by history. I'd like to go a little bit deeper today, and talk about human intention in the archaeological record and what this means for us as programmers.
00:06:06.229 We'll discuss stratigraphy and context, as well as technology itself. Although you've already had a small taste of history, there will be a few lessons along the way.
00:06:20.989 How often do you dig through a commit history, peeling away layers of complexity and sifting for clues, trying to answer: why does this code do what it does? You're seeking to separate the important from the unnecessary or irrelevant, only to be stopped in your tracks.
00:06:42.429 You're derailed by historical debris: an ill-defined method with an amusing but uninformative commit message. We've all been there, and I know I certainly have.
00:06:55.129 Every day we write code and commit it, and those commits, for better or worse, preserve a piece of history: our reasons, our approach, and often our emotions. This history lives on throughout the life of a project, yet the whole story—the reasoning behind certain decisions—is rarely clear.
00:07:15.709 The obvious decision may not always have been the best one, and when we finally manage to peel back the complexity, we sometimes find ourselves in a worse state of confusion than where we began.
00:07:22.219 So, how can we make sense of this? To me, the answer is clear, and it lies in archaeology.
00:07:33.799 What is archaeology? What does an archaeologist do? Thanks to popular culture, you may have thought that maybe one of these represents archaeology, and that's totally okay.
00:07:42.280 However, I'm afraid that these are common misconceptions. Archaeology is not about dinosaurs, it's usually not piles of gold, and it's definitely not Indiana Jones busting through booby traps. Adventure stories like these, prevalent in popular culture, ignore the painstaking work involved in carrying out excavations and analysis.
00:08:05.410 Archaeology is instead a study of cultural history, material culture, people, and the traces they left behind. It is a never-ending detective story, and often, code is also a never-ending detective story.
00:08:23.680 Yet, archaeology is also the science of the past, using observations and evaluations to test ideas and theories about what happened in antiquity, providing us with fascinating and frequently beautiful windows into the past.
00:08:31.240 I know this sounds rather romantic, a window into the past to interpret as you will, but it's important to remember that the archaeological record is a distorted version of past events. Artifacts are broken, buildings are burned or collapsed, and food remains are usually only partially preserved, if you're lucky.
00:08:52.270 Not all sites, and in fact hardly any, with the exception of Pompeii, are perfectly preserved. Once archaeologists recognize that these processes influence the preservation and evolution of a site, they are able to look for reliable ways to reconstruct past human behavior.
00:09:05.800 So let's talk about human intention in the archaeological record and what it means for us as programmers. One of the most frequently heard clichés is that you cannot see the individual in the archaeological record, but it does contain direct evidence of individual action and human intention.
00:09:36.250 For instance, the digging of a rubbish pit, the construction of house foundations, or the scratchings on a piece of ceramic to exile your least favorite politician. We can often see such individuality with just a glance at a code base.
00:09:54.360 We can identify how certain tests have been structured a certain way or look at the timestamp on a commit. Yet, when I first began programming, and frequently when pair programming, I’d hear my colleagues exclaim things like, 'Oh, that looks like something that Margaret would write,' or 'This class has James all over it!'
00:10:06.390 This is how I saw it: I couldn’t see the individual in the code. I could not see the quirks that made one piece stand out as being written by one person instead of another. I could not see intent, purpose, or individuality, and I couldn't fathom how my colleagues could.
00:10:41.370 Now, for those of you who have been programming and working with others for a long time, this might seem like a very small thing, but the realization that you could actually see the individual within lines of code reassured me.
00:10:51.480 It reminded me of how I would search for traces of individuality from antiquity. There's this beautiful Greek word 'praxis,' the main discussion of which comes from Aristotle.
00:11:09.990 Aristotle was a philosopher in the 4th century BC, at the height of classical Greece. He was a student of Plato and tutor to Alexander the Great.
00:11:24.060 Aristotle’s politics and ethics treated praxis as an activity performed for its own sake—an activity which is undertaken as a realization of the intrinsic capabilities of the human psyche.
00:11:36.390 Now, what I love about this word 'praxis' is its innate reflexivity. You, as the actor, are the one making the decisions when you write a method or a class model. You are making a decision to write it a certain way.
00:11:50.550 In archaeology, I like to think about praxis as gaining knowledge about the world—the realization, in a physical sense, through excavation. The result is an examination of the relationship between humans and societal structures or an interpretation of these relationships.
00:12:01.860 So, let's think about interpretation as constructing a story of past behavior—a story that has potential for alternate explanations. Think back to what I said earlier about archaeology being an interpretive window into the past.
00:12:20.010 We don't just find artifacts; we describe, identify patterns of behavior, and put them in time and space in ways that best suit our perceived interpretation. We write code that fits into particular times and particular spaces.
00:12:40.540 We write certain patterns of behavior, certain language conventions, team conventions, or personal preferences, and we are writing code for the future—to potentially come back and excavate to interpret.
00:12:56.870 Common human behavior is to take shortcuts. Many people do it all the time, and as programmers, we frequently do too because we're human. But there are many reasons why we may need to write a particular piece of code a certain way.
00:13:30.540 You might be under time pressure at work or the code itself might be forcing something to be a certain way. It can be easy to forget these things when you are the one looking back, trying to reconstruct this other story.
00:13:46.860 You might not know how to interpret it—what the reasoning was, the meaning, the story behind the code. Sometimes we forget that there are alternate explanations and interpretations for things.
00:14:09.710 Different circumstances require different actions, and acknowledging these actions can help guide understanding of the idiosyncrasies that form the whole picture.
00:14:22.029 This relationship between praxis, excavation, and interpretation puts individual intention and desire into the picture.
00:14:27.880 These traces of individuality that we leave behind in our code personify it, and they express our human decision-making. In the same way that the individual is present in the archaeological record, so are we in the designs we create.
00:14:40.959 Okay, so let's talk about context now—context and stratigraphy. As programmers, we are lucky to be able to create a history that remains flat and readable. We have the luxury to rewrite and sanitize history in pursuit of a clean history.
00:14:55.520 Manipulating a single commit is easy, but it's essential to make sure that history isn’t polluted and context isn’t lost. Think about the ability that gives us to view historical snapshots of a codebase at any point.
00:15:16.060 An archaeologist rarely sees more than a single reference frame at any one point. Portions of sites are uncovered, everything is recorded as data, and new reference frames are revealed.
00:15:38.220 The first layer is forever destroyed by virtue of the second being revealed, so let's describe this in a programmatic sense from a developer's perspective.
00:15:53.590 In archaeology, these reference frames are called stratigraphy. Stratigraphy provides a fundamental basis for understanding chronological relationships in the archaeological record. In the same way, think of your git log.
00:16:05.300 When you type 'git log,' you get the commits made in that repo listed in reverse chronological order: a sequence, a story told from end to start, with the most recent commits first.
00:16:25.800 To understand the why of how events occur, we can use chronological relationships from the archaeological record to transition from static material to dynamic behavior.
00:16:39.490 Archaeological sites are formed in complex ways; they are not instantaneously formed and preserved, with the exception of Pompeii. The archaeological record forms as a cumulative record, much like a timeline but with a few more layers.
00:16:57.570 Let’s take a look at the Temple of Apollo at Corinth. The temple itself has had a tumultuous history, much like most of ancient Greece. Periods of prosperity were followed by seemingly unending wars and conflicts.
00:17:17.100 It was ransacked by Romans and razed to the ground in 146 BC. Its walls were dismantled, and the territory was given to a neighboring city. Corinth started again and rebuilt, but almost all that remains is the Roman version.
00:17:37.220 So how do we date what predates the Roman era? We can do that with context! The context is the place and associations of artifacts; it's the relationships we can infer from associations.
00:17:57.300 It is a precise location where an object is found and recorded before it is removed from a site. The architectural features of the temple place it in the archaic period, broadly from the 8th to the 5th century BC.
00:18:17.610 Yet the only external evidence for the dating of the temple comes from a Middle Corinthian krater, which is a pot. This was found among the chips of stone lying between the cuttings for the foundation walls of the temple.
00:18:32.060 This means that the Temple of Apollo can be accurately dated to the middle of the 7th century BC, around 625 BC. We can understand the why of the temple but not entirely how we know it was dedicated to Apollo.
00:18:53.130 Not much of the site remains, and there are only two features that allow us to ascertain that. Once we accurately date it, how can we determine its dedication?
00:19:07.690 In archaeology, we rely on context and association. Just quickly, this is Apollo, the son of Zeus and Leto. He has been recognized variably as the god of light, truth, prophecy, healing, music, and poetry.
00:19:20.470 He was an oracle god and considered the leader of the muses and the patron god of music. Temples of Apollo were quite common throughout the Greek and Roman worlds.
00:19:32.740 It's easy to imagine that the iconography of a temple dedicated to such a god would be prolific. But remember, what we see today is not an accurate reflection of how things were in the past.
00:19:42.090 The archaeological record forms in complex ways, just as the current iteration of our code is not what it was a week or a month ago.
00:19:57.480 We know Corinth was sacked by Romans and razed to the ground, and while a few surviving fragments could have been sculptures or dedications, they offer little indication of what cult it was dedicated to.
00:20:12.740 But just north of the temple, a deposit of aryballoi was excavated. These were perfume or oil flasks, and they are generally considered suitable dedications to Apollo.
00:20:26.130 Pausanias, a Greek traveler and geographer, visited in 175 AD, after the Roman destruction of the city. He described a temple dedicated to Apollo located in the exact same spot where we find it today.
00:20:41.860 It is the complementary evidence of written and archaeological records, and the context of items within the record, that has allowed archaeologists to date the temple and identify its dedication.
00:20:56.190 And without that evidence, we still might not have a date or a dedication. Just as working out the size of a class is often impossible until you can see how and where it’s used, understanding where something has come from is the main challenge of archaeology.
00:21:11.180 And we're lucky to have this ability—the ability to decipher where things belong. This innate desire to belong has been a driving force throughout human history; we want to know where we've come from and how we got there.
00:21:31.890 It's often context and associations between us, between artifacts, and between code that allow us, in the future, looking back, to decipher such belonging.
00:21:52.580 From the late 1800s to the early 1900s, there was a debate raging in anthropological, scientific, and cultural circles in North America: when did humans first arrive in North America? At that time, the generally accepted timeline for occupation was between 9,000 and 8,000 BC.
00:22:14.370 But then, in the 1920s, a stone spear point was found lodged between the ribs of a bison. This was significant, as this species of bison had been extinct for thousands of years. It had gone extinct at the end of the last ice age in the Pleistocene.
00:22:32.830 The Pleistocene was the geological epoch that lasted from about 2.5 million years ago to 11,500 years ago—it is the first epoch of the Quaternary period, between the Pliocene and the Holocene epochs, and it corresponds with the end of the Paleolithic Age.
00:22:51.860 The stone point is characterized as a Folsom point, which has been found widely across North America. It is the context—this direct association between the bones of an extinct species and a spear point crafted by human hands—that conclusively proved that humans were in North America during the last ice age.
00:23:10.750 This evidence indicated that humans arrived thousands of years earlier than previously believed. So remember, archaeologically, context has to do with place and association among artifacts and the relationships we can infer from such associations.
00:23:29.230 Context allows us, as programmers, archaeologists, and historians, to build up webs of associations. Think about methods: they are all about defining behaviors so that you can apply them easily to different situations.
00:23:39.540 Or consider how you use modules to group and add context. Not all knowledge about a site can be found in its history, issues, or pull requests.
00:23:49.900 Removing an artifact from a site without properly documenting it means it has lost its context, age, usage, and perceived meaning. It has little to no scientific or cultural value.
00:24:10.980 I recently ran into a surprising issue at work where I had to make a small change to an error message: it was simple and easy.
00:24:20.490 I checked my change in situ, and while I could trigger the action to show the message, I wasn't seeing the error. The logs showed me a 404 error, while I was expecting a 403.
00:24:35.370 So, I flicked back to master, and while I could confirm my change wasn’t the issue, somewhere in the past—god knows how many years back—the status code changed from a 403 to a 404.
00:24:52.300 I had a regression test, and then I began to bisect. After I’d gone back a year, I had to bring out 'git blame' for context.
00:25:05.370 I definitely recommend using an alias for 'blame.' It didn't take long to realize that the feature in question was introduced all the way back in 2012.
00:25:19.960 So, where did it go wrong? I grabbed the hash of the commit that introduced the feature and checked it out—suspecting it was an authorization error.
00:25:29.880 I did 'git log' on that five-year-old commit, to look for related or suspicious commit messages or branch markers around the time it was introduced.
00:25:47.990 I had gone a week or two back, and this suspicious little commit caught my eye. It isolated the authorization issue that had been broken for years.
00:26:05.580 Once I isolated where the problem originated, implementing a fix didn’t take very long at all. This small little change snowballed into an epic excavation of the codebase, leading me all the way back to the beginning of its history.
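The search described above, using a regression test to narrow down where a 403 became a 404, is what 'git bisect' automates with a binary search over history. As a rough sketch of that idea in Ruby: the `Commit` struct, the `HISTORY` data, and the `good?` check below are all invented for illustration and are not taken from the talk or from Git itself.

```ruby
# Hypothetical history, oldest commit first. Each entry records the HTTP
# status the app returned at that commit (invented data for illustration).
Commit = Struct.new(:sha, :status)

HISTORY = [
  Commit.new("a1", 403),
  Commit.new("b2", 403),
  Commit.new("c3", 403),
  Commit.new("d4", 404), # the regression enters here
  Commit.new("e5", 404),
  Commit.new("f6", 404)
].freeze

# Stand-in for running the regression test against a checked-out commit.
def good?(commit)
  commit.status == 403
end

# Binary search for the first bad commit. Assumes the oldest commit is good,
# the newest is bad, and the history flips from good to bad exactly once.
def first_bad_commit(history)
  low = 0                    # index of a known-good commit
  high = history.length - 1  # index of a known-bad commit
  while high - low > 1
    mid = (low + high) / 2
    if good?(history[mid])
      low = mid
    else
      high = mid
    end
  end
  history[high]
end

culprit = first_bad_commit(HISTORY)
puts "first bad commit: #{culprit.sha}" # prints "first bad commit: d4"
```

Each step halves the range of suspect commits, which is why even a history going back years can be searched in a handful of test runs. The commit message on the culprit is then what tells you why the change was made.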
00:26:24.480 Mid last year, The Conversation launched into Indonesia as its fifth region and third language. The first language was English, the second was French, and the third was Indonesian.
00:26:44.890 Preparing a platform to support multiple languages is no small feat, and it requires continuous localization, planned changes, and input from translators. The list goes on and on.
00:27:01.890 But the Indonesian launch went really smoothly. We were happy, Anita was happy, and readership from Indonesia continued to grow.
00:27:19.420 However, we began to see Indonesian leak into our specs, which was noticeable in acceptance specs using Rack::Test, where the tests and app code would run on the same thread.
00:27:38.890 A spec that ran at some point earlier would change locale instead of using the fallback English, and it was just a few here and there, so we didn’t worry much about it at the time.
00:27:53.800 We had more important things to do, until it became noticeable. Pretty soon, our build hygiene began to deteriorate. Enough was enough.
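To illustrate how a locale set in one spec can leak into a later one when both run on the same thread, here is a minimal sketch. `FakeI18n` is a hypothetical stand-in written for this example, not the real i18n gem, and the talk does not show what the actual cause or fix was. The sketch only demonstrates the general failure mode and one common way to scope a locale change.

```ruby
# FakeI18n is a hypothetical stand-in for a locale setting stored per thread.
# It is not the real i18n gem.
module FakeI18n
  DEFAULT_LOCALE = :en

  def self.locale
    Thread.current[:fake_locale] || DEFAULT_LOCALE
  end

  def self.locale=(value)
    Thread.current[:fake_locale] = value
  end

  # Switch locale for the duration of the block, then restore the previous
  # value even if the block raises.
  def self.with_locale(value)
    previous = locale
    self.locale = value
    yield
  ensure
    self.locale = previous
  end
end

# A spec that sets the locale and never resets it...
def leaky_spec
  FakeI18n.locale = :id
end

# ...changes what a later spec on the same thread observes.
def later_spec
  FakeI18n.locale
end

leaky_spec
puts later_spec.inspect # prints :id, not the English fallback

# Resetting between specs, and scoping changes to a block, avoids the leak.
FakeI18n.locale = FakeI18n::DEFAULT_LOCALE
FakeI18n.with_locale(:id) { later_spec }
puts later_spec.inspect # prints :en
```

Because the setting lives on the thread rather than on the individual spec, the order in which specs run decides whether the leak is visible, which fits the intermittent failures described here.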
00:28:07.780 I pulled out my `git bisect`, and unlike the previous example, I had three specific pointers to guide me: I had a timeframe of when the issue began, and I knew that the problem was isolated to acceptance specs.
00:28:26.290 I also knew that the issue had to do with i18n (internationalization). These three pieces of information meant that isolating and identifying the problem didn't take long.
00:28:44.750 But, like before, I relied on a good commit message and history to guide me. These are both fairly standard workflows for identifying problems like these, and such changes and solutions appear straightforward.
00:29:01.000 Just a small change to an overview may seem simple or easy, until suddenly you can't trigger an exception or see a change in situ.
00:29:17.590 This is why leaving developers with good context is important. Otherwise, you might have a method whose name doesn't accurately reflect its purpose, or an unhelpful commit message.
00:29:35.300 You aren’t able to decipher what's going on, leading to the loss of the whole story or context. Your code becomes an undated, unexamined pot on a dusty shelf in a museum.
00:29:58.270 It is a temple without a fragment, a spear point lying alone without any idea of its purpose. Changing code like that is dangerous.
00:30:11.080 Isolating and identifying those few problems didn’t take much because I'd been left with really good history. I had clear and concise commit messages.
00:30:36.140 While I was lucky to know what sort of changes to look for, it was still immensely helpful. I would have taken a lot more time trying to find where the change was made if the codebase was filled with vague terms like 'fix,' 'WIP,' or 'oh dear.' Those can be a bit tricky.
00:30:54.730 Although team conventions may differ, my team generally follows GitHub's recommended guidelines: a commit message should start with a short summary of 50 to 75 characters.
00:31:10.680 It’s followed by a blank line and more explanatory text below it, written in the imperative form. The summary should complete the sentence 'This commit will…'.
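The conventions just described can be checked mechanically. The following is a small sketch, not an official tool: the method name, the list of vague summaries (taken from the examples mentioned in the talk), and the sample commit message are all assumptions made for illustration. It treats 75 characters as the upper bound and cannot judge whether the summary is actually in the imperative.

```ruby
# Summaries the talk calls out as unhelpful, compared case-insensitively.
VAGUE_SUMMARIES = ["fix", "wip", "oh dear"].freeze
MAX_SUMMARY_LENGTH = 75

# Returns a list of problems found in a commit message. An empty list means
# the message passes these (deliberately simple) checks.
def commit_message_problems(message)
  lines = message.lines.map(&:chomp)
  summary = lines.first.to_s.strip
  problems = []

  problems << "summary is empty" if summary.empty?
  problems << "summary is too vague" if VAGUE_SUMMARIES.include?(summary.downcase)
  if summary.length > MAX_SUMMARY_LENGTH
    problems << "summary is longer than #{MAX_SUMMARY_LENGTH} characters"
  end
  if lines.length > 1 && !lines[1].strip.empty?
    problems << "second line should be blank"
  end
  problems
end

# Hypothetical message: short imperative summary, blank line, then context.
GOOD_MESSAGE = <<~MSG
  Return 403 when a user lacks permission

  The authorization check fell through to the not-found handler,
  so forbidden requests were reported as 404 instead of 403.
MSG

puts commit_message_problems(GOOD_MESSAGE).inspect # prints []
puts commit_message_problems("fix").inspect        # prints ["summary is too vague"]
```

A check like this only catches form. The explanatory body, the part that records why a change was made, still has to be written by a person.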
00:31:34.460 It's important that commit messages don't rely too heavily on assumed knowledge because you can't always assume that the code itself is self-evident of the original problem.
00:31:52.520 This is why context is important, both archaeologically and programmatically.
00:32:06.039 But all is not lost! Remember, as developers, we have the ability to create a history that is flat and readable, and one we can sanitize and improve.
00:32:22.350 To finish, let's talk a bit about technology itself. We work with technology every day, and it’s fascinating.
00:32:35.690 The word 'technology' itself comes from the Attic Greek feminine noun 'techne,' meaning skill, craft, or art. Aristotle had a focused but simple and restricted concept of 'techne.'
00:32:56.330 He defined it as a rational faculty exercised in making something—a productive quality. The suffix 'ology' comes from the Greek 'logia,' meaning to study or collect knowledge.
00:33:10.680 We can then trace that to 'logos,' a masculine noun, or 'lego' as a verb broadly meaning discourse, expression, history, thought, and interestingly enough, reasoning and computation.
00:33:32.350 So we can be fluid with our interpretation of technology. I like to think of it as an outwardly expanding yet nested set of actions and relationships involving human intent and desire for making and creating.
00:33:56.090 In a broad sense, technology has defined us as a species—not just because we use tools, but because we have all interacted with some form of technology.
00:34:10.800 This reliance on technology creates unprecedented complexity, leading to a confidence that technology can solve various problems and a dependence on it for the mundane.
00:34:26.570 This negative side of technology is becoming more evident. So let's look to the past.
00:34:40.420 Archaeology focuses on technological changes as adaptations to problems—problems like population shifts, needs for stone technologies, and manipulation of natural materials through to computers.
00:35:01.890 Programming is largely an adaptation to problems, and code is a way of fixing them. Humans throughout history have taken pride in their technological achievements, and we programmers are no different.
00:35:18.920 It's important to take pride in what you do, no matter how small your contribution may seem. Take pride in your code, your intentions, your decisions, your commits—your history.
00:35:35.080 Thank you for your time!