Software Development Lessons from the Apollo Program

by Julian Simioni

In "Software Development Lessons from the Apollo Program," Julian Simioni explores the intersection of software development and space exploration, focusing on the Apollo program of the 1960s. Although software engineering had little formal knowledge to draw on at the time, the team building the Apollo Guidance Computer (AGC) navigated unprecedented challenges and produced an engineering marvel that successfully flew missions to the moon. Simioni conveys that while key figures in software engineering, like Fred Brooks, were just beginning to pen their ideas, Apollo's engineers were learning critical lessons about the field in real time.

Throughout his talk, Simioni outlines several key points:

  • Origins of Software Development: The term 'software' was first used in 1950, but engineers at MIT were just discovering its importance and complexity while building the AGC. In the early stages, there were no deadlines or formal documentation, allowing for innovative experimentation.
  • Handling Failures Gracefully: Simioni emphasizes how the AGC was built to manage failures without drastic alerts, contrasting it with modern computing where a system crash often overshadows the underlying performance.
  • Rigorous Testing: The Apollo team employed extensive testing methods, particularly integration tests, to ensure various subsystems worked harmoniously, focusing on failure cases, which is still a lesson applicable in today's software development practices.
  • User Interaction: The story illustrates how astronauts initially resisted automation, wanting to retain their manual capabilities, yet ultimately benefitted from the automation the AGC provided. It highlights a common pitfall in development: users may not always know what they want.
  • Team Dynamics: The growth of the Apollo project to 400 engineers reflects challenges in scaling teams correctly and effectively communicating between various units—a core issue that remains relevant for today’s software teams.

Simioni concludes with the notion that the Apollo program teaches modern software developers essential lessons about the past, particularly about communication, problem-solving, and embracing user behaviors. These lessons transcend time and remain relevant in today's context of software development and engineering practices.

00:00:23 I'm Julian, and this is my first Rails conference. It's great to be here. We're not going to talk about Rails at all; we're going to talk about space.
00:00:30 This is one of my favorite pictures ever—it's the Earthrise picture from Apollo 8. My whole life, I've loved space. It has inspired me tremendously since I was a little kid, and I know it inspires tons of other people.
00:00:39 I got into a technical field because I loved space as a child. Raise your hand if the same applies to you. I've spoken to many people, and it seems to be a common case. Space is incredibly cool, and everything about astronomy or manned space exploration is both awesome and inspiring.
00:01:01 Above all else, the Apollo program is what inspires people so deeply. Landing on the moon was perhaps the greatest achievement that humanity has ever accomplished. People talk about impressive rockets like the Saturn V, which generated seven and a half million pounds of thrust upon takeoff—that's a lot of power.
00:01:18 They discuss astronauts who trained for years, demonstrated bravery, and took risks that most of us cannot even comprehend. They also refer to mathematicians and the remarkable things they did, such as the guidance equations that helped get people to the Moon safely and back. Then they say things like, 'Oh yes, we landed on the moon with less computing power than a wristwatch.' As a software developer, this realization saddens me.
00:01:51 For a long time, I didn't believe there was anything in the Apollo program that could inspire us software developers. Fortunately, it turns out that's not the case at all. The Apollo Guidance Computer was built to take people to the Moon.
00:02:13 To give you a sense of its scale, its display was a little bigger than your hand, about nine by nine inches. This device was built to make an impossible journey possible. It only had two kilobytes of RAM. In today's terms, that's not much.
00:02:35 It was incredibly slow. I did some back-of-the-envelope calculations, and it turns out that all the computation done on all the Apollo missions could be done by my little laptop today in under a second. However, my laptop may or may not make it through this talk.
00:02:54 The Apollo Guidance Computer never encountered an unknown bug in thousands of hours spent in space. Despite the harsh environment of space, it never failed. The hostile conditions included gamma rays and any metallic debris that could short-circuit systems and cause numerous problems.
00:03:11 This computer was remarkable; it functioned exceptionally well even during a time when much was still unknown about software development. Work began on it in 1961, and it was completed in 1968. To put this into perspective, in 1965, the MIT Instrumentation Lab, which was responsible for building it, acquired a new computer, the IBM System/360.
00:03:39 Does anyone here know who worked on that? Come on, everyone knows Fred Brooks, right? He authored the book 'The Mythical Man-Month.' That book wasn't published until 1975, long after we had landed on the moon.
00:04:07 Consider what that means: The software developers coding for the Apollo Guidance Computer were learning the same things as Fred Brooks at the same time. Back then, if you were a manager with software developers working under you and your boss told you that your project was lagging, you might have considered adding more developers to speed things up.
00:04:25 Today, however, it's different. You might tell them to read 'The Mythical Man-Month' because you know that extra hands usually won't help. Moreover, the word 'software' was first used in 1950 in a research paper, but the context is crucial because early computers did one thing only.
00:04:44 They were built primarily by men, and those computers could perform a single calculation, like computing a Fibonacci sequence. If you needed a computer to perform a new task, you'd either modify the existing one extensively or build another. Over time, as computers improved, they became more configurable. Builders, often disdaining the tedious task of configuring, hired women for the role.
00:05:38 Only later did enlightened minds realize that configuring computers was not a simple task but a complex field of software development. Interestingly, during the Apollo program, some male engineers were reluctant to share the fact that they managed software teams because it was deemed uncool.
00:06:07 Here's my takeaway: If you are a woman in software development facing doubts about your abilities, ignore the naysayers. Your predecessors, including your grandma, were coding long before you and did a fantastic job. Conversely, if you're someone who claims that women are less capable of coding than men, please stop. Such beliefs are unfounded.
00:06:56 Now let's return to the Apollo Guidance Computer. How many of you have built a web application, perhaps in Rails, that relies on external services or APIs? Do you remember what happened the first time one of those services went down? I can almost hear the collective groan from broken systems.
00:07:37 Once everything is operational again, you add a fix so that the same failure cannot catch you twice, and then, inevitably, a different issue arises. With time and maturity, your system ideally becomes resilient enough to handle unexpected failures gracefully.
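To make the idea of handling a failed external service gracefully concrete, here is a minimal Python sketch. It is not from the talk: the function `fetch_with_fallback` and the stand-in services `healthy_service` and `flaky_service` are hypothetical names invented for illustration, and a real application would likely add timeouts, retries, and logging.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    fetch: Callable[[], str],
    cached: Optional[str],
    default: str = "service temporarily unavailable",
) -> str:
    """Return fresh data when the service responds, else degrade gracefully."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError):
        # The external service is down: prefer the last known value,
        # and fall back to a placeholder only if nothing was cached.
        return cached if cached is not None else default


def healthy_service() -> str:
    # Hypothetical stand-in for an external API that is responding.
    return "fresh data"


def flaky_service() -> str:
    # Hypothetical stand-in for an external API that is currently down.
    raise ConnectionError("upstream unreachable")


print(fetch_with_fallback(healthy_service, cached="stale data"))
print(fetch_with_fallback(flaky_service, cached="stale data"))
print(fetch_with_fallback(flaky_service, cached=None))
```

The caller always gets a usable answer: fresh data when the service is up, the last cached value when it is down, and a placeholder message only when there is nothing else to show.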
00:08:24 During the Apollo 11 landing, a critical moment occurred when Neil Armstrong and Buzz Aldrin approached the lunar surface. Within the last ten minutes, which were deemed critical, their dashboard lit up with the master alarm alerting them to a '1201' error.
00:09:01 Mission Control was frantic, trying to comprehend the meaning behind this alert. Fortunately, a resourceful MIT engineer in Mission Control had prepared a cheat sheet detailing every possible error code, and upon reviewing it, confirmed they were good to land if no other issues were present.
00:09:53 That engineer was so shocked that he physically could not speak, but he managed a thumbs up, and he was ultimately awarded a medal by the president for his critical decision-making.
00:10:22 With today's computers, the expectation is that they handle issues like infinite loops without crashing. Concepts such as task management and processes did not exist in the 1960s. Computers of that era took a simpler approach, running whatever work they could without prioritization.
00:10:56 With the Apollo Guidance Computer, they introduced a priority-based scheduling system, ensuring vital operations like firing thrusters were prioritized. Thus, the key takeaway is to manage failures gracefully and be prepared for the unexpected.
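The idea of priority-based scheduling can be sketched in a few lines of Python. This is an illustration only and does not reproduce the AGC's actual scheduler: the function `run_cycle`, the job names, the priority values, and the fixed `capacity` per cycle are all assumptions made for the example. When more work arrives than fits in a cycle, the most important jobs run and the rest are shed instead of bringing the whole system down.

```python
import heapq
from typing import List, Tuple

Job = Tuple[int, str]  # (priority, name); a higher number is more important


def run_cycle(jobs: List[Job], capacity: int) -> Tuple[List[str], List[str]]:
    """Run at most `capacity` jobs this cycle, most important first.

    Returns (ran, shed): names of the jobs executed and the jobs dropped.
    """
    # heapq is a min-heap, so negate the priority to pop the highest first.
    # The index keeps ordering stable for jobs with equal priority.
    heap = [(-priority, index, name) for index, (priority, name) in enumerate(jobs)]
    heapq.heapify(heap)

    ran: List[str] = []
    while heap and len(ran) < capacity:
        _, _, name = heapq.heappop(heap)
        ran.append(name)

    # Whatever is left did not fit: shed it instead of crashing the system.
    shed = [name for _, _, name in sorted(heap)]
    return ran, shed


jobs = [
    (1, "update display"),
    (9, "fire thrusters"),
    (5, "process radar data"),
    (8, "compute guidance"),
]
ran, shed = run_cycle(jobs, capacity=3)
print("ran: ", ran)
print("shed:", shed)
```

With room for three jobs, the display update is the one that gets dropped, while the thrusters and guidance still run.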
00:11:45 An essential point: if your system can handle an unexpected failure on its own, it should not sound the alarm bells. Raising a panic while the situation is stable merely creates unnecessary stress.
00:12:10 Shifting gears a bit, let's delve into the topic of testing. At the time of the Apollo program, software testing was a critical aspect of development as lives were at stake. They had unit tests, albeit the engineers referred to them as 'immune tests' since it was a new term for them.
00:12:55 They didn’t discuss their unit tests much because they were primarily concerned about their integration tests, as those were pivotal in the engineering context of this monumental task.
00:13:09 In 1972, the Apollo program engineers published a retrospective that addressed various challenges faced throughout the software development effort. Notably, they identified two consistent issues, the first being estimating project schedules—a problem we still face today.
00:14:03 The second challenge was obtaining up-to-date specifications and requirements. As with many fast-paced projects, frequent changes made it difficult to keep everyone informed, and communication remains a challenge in software development today.
00:14:49 The Apollo team discovered that thorough integration testing was key to identifying misalignments, which shows that collaboration and communication are as important as coding skills.
00:15:46 Interestingly, the Apollo engineers placed a significant emphasis on what they dubbed the 'off-nominal cases.' For every one test that demonstrated success, they created a hundred that handled possible failures, ensuring systems could withstand a range of unexpected scenarios.
00:16:40 By contrast, many developers today may overlook failure scenarios, favoring basic functionality tests such as verifying a user can log in successfully without stress-testing what could go wrong. The Apollo guidance system succeeded partly because they anticipated potential failures and prepared for them.
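As a rough illustration of weighting tests toward off-nominal cases, the Python sketch below pairs one success test for a login function with a batch of failure cases. Everything here is hypothetical and invented for the example: the `log_in` function, the `LoginError` exception, and the in-memory `USERS` store are not from the talk or from any real library.

```python
import unittest


class LoginError(Exception):
    """Raised when a login attempt is rejected."""


# Hypothetical in-memory user store, for illustration only.
USERS = {"neil": "eagle"}


def log_in(username, password):
    if not isinstance(username, str) or not isinstance(password, str):
        raise LoginError("credentials must be strings")
    if not username or not password:
        raise LoginError("credentials must not be empty")
    if USERS.get(username) != password:
        raise LoginError("invalid username or password")
    return True


class LoginTests(unittest.TestCase):
    def test_nominal_login(self):
        # The single "happy path" case.
        self.assertTrue(log_in("neil", "eagle"))

    def test_off_nominal_logins(self):
        # Many more cases cover what can go wrong.
        bad_inputs = [
            ("neil", "wrong"),    # wrong password
            ("nobody", "eagle"),  # unknown user
            ("", "eagle"),        # empty username
            ("neil", ""),         # empty password
            (None, "eagle"),      # missing username
            ("neil", None),       # missing password
            ("NEIL", "eagle"),    # usernames are case-sensitive here
        ]
        for username, password in bad_inputs:
            with self.subTest(username=username, password=password):
                with self.assertRaises(LoginError):
                    log_in(username, password)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoginTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The ratio here is seven failure cases to one success case, far short of a hundred to one, but the shape is the same: most of the test effort goes into what happens when things go wrong.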
00:17:24 Now let's shift focus to the dynamics of teamwork during the early Apollo missions. When the Instrumentation Lab at MIT signed the contract for the guidance computer in 1961, it made no mention of software; their focus was merely on delivering a computer.
00:18:19 They were uncertain about whether it would be a simple hardware task or if they would need a fully programmable computer to achieve their objectives of moon landings. The uncertainties surrounding system navigation and overall feasibility led to a slow developmental phase.
00:19:08 At that stage, engineers enjoyed considerable freedom to experiment and innovate, as they had no defined deadlines or stringent oversight. It wasn’t until aspirations of flying to the moon became clearer that structured timelines emerged.
00:19:34 While they initially stumbled through the process, once they acquired an understanding of spacecraft design and navigation, a full team of 400 software developers sprang up around the Apollo guidance computer project.
00:20:06 This expansion led to numerous requirements, extensive documentation, and formalized deadlines. Subsequently, project teams committed to improving communication and defining clear development goals.
00:20:42 It's important to ensure teams are the right size for their tasks. A burgeoning space initiative with hundreds of employees might resemble inefficient startups; conversely, smaller teams in sensitive sectors need appropriate security measures.
00:21:24 Over time, the Apollo program's engineers realized that independence was crucial to their success. However, they felt pressure to integrate more tightly as the moon landings neared, leading to concerns among contractors that MIT's rogue engineers were operating freely.
00:21:59 Many of the leading engineers left the Apollo guidance project after the success of Apollo 8, seeking new challenges after completing a great achievement. But the mission was a milestone accomplishment that foreshadowed the future of software and aerospace development.
00:22:46 Moving on, let’s reflect on user relationships—specifically astronauts. If you've seen 'The Right Stuff,' you know astronauts can be described as talented yet sometimes difficult individuals, akin to many software developers.
00:23:26 Similar to how developers can be inflexible, astronauts often hold strong opinions about their missions. They express reluctance towards automation, feeling confident in their rigorous training.
00:24:07 An example of this occurred when engineers suggested creating an automatic re-entry program to ensure astronauts navigated safely back to Earth. However, astronauts insisted on handling this manually, preferring to rely on their skills rather than technology.
00:24:51 Though they expressed initial disinterest, these automated systems played a crucial role in every mission, showcasing that astronauts ultimately recognized their value in critical moments.
00:25:38 Now, let's discuss interface design. The Apollo Guidance Computer required astronauts to communicate effectively with a system lacking modern screens and controls. Instead, teams used segmented LEDs and dials to build a functional interface.
00:26:33 The engineers designed an innovative system allowing astronauts to execute two-step commands: a verb (the action) followed by a noun (the object it acts on). They were developing a unique interaction model under enormous pressure.
00:27:40 Ironically, despite initially planning to replace this interface with something better, the cadences established with verbs and nouns became indispensable and highlighted engineers' foresight. Remarkably, even today, systems like Vim reflect similar principles.
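A verb-and-noun interface of this kind can be sketched as two lookup tables and a dispatcher. The Python below is a toy illustration only: the numeric codes, the readings, and the `execute` function are hypothetical and are not the real AGC verb and noun tables.

```python
from typing import Callable, Dict

# Hypothetical codes chosen for illustration; they are not the real AGC tables.
# A noun names a piece of data.
NOUNS: Dict[int, Callable[[], str]] = {
    1: lambda: "altitude 12000 ft",
    2: lambda: "velocity 450 ft/s",
}

# A verb names the action to apply to that data.
VERBS: Dict[int, Callable[[str], str]] = {
    10: lambda value: f"DISPLAY {value}",
    20: lambda value: f"MONITOR {value} (refreshing)",
}


def execute(verb: int, noun: int) -> str:
    """Apply the action named by `verb` to the data named by `noun`."""
    if verb not in VERBS:
        return f"OPERATOR ERROR: unknown verb {verb:02d}"
    if noun not in NOUNS:
        return f"OPERATOR ERROR: unknown noun {noun:02d}"
    return VERBS[verb](NOUNS[noun]())


print(execute(10, 1))
print(execute(20, 2))
print(execute(99, 1))
```

Because verbs and nouns are independent, adding one new verb makes it available for every existing noun, which is the same composability that Vim's operator-and-motion commands offer.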
00:28:54 As I close this segment, I encourage everyone to embrace the legacy of the Apollo program and apply its lessons—innovation, resilience, and teamwork—as we navigate the future of software development.
00:29:03 Thank you for your time, and I'm happy to take any questions.