Software Development Lessons from the Apollo Program

Julian Simioni • March 17, 2014 • Earth

In the talk "Software Development Lessons from the Apollo Program" presented by Julian Simioni at MountainWest RubyConf 2014, the speaker explores the significant yet underappreciated role of software development in the Apollo Program. Simioni discusses how the Apollo Guidance Computer, developed in the 1960s, contained sophisticated software that was essential for the success of lunar missions.

Key points discussed include:

- Historical Context: The Apollo Guidance Computer was built before fundamental programming technologies like UNIX or the C programming language were developed, yet it achieved incredible feats, such as supporting the first moon landing without any computer failures.

- Software Complexity: Despite its seemingly primitive architecture, the Guidance Computer was a general-purpose machine that executed complex tasks through interleaved programming and prioritized processes, showcasing advanced concepts of computing at the time.

- Significant Incidents: During the Apollo 11 moon landing, the computer signaled a 'Master Error Program Alarm,' but thanks to smart contingency planning and a solid understanding of the system, the mission continued, enabling a successful landing.

- Testing Approaches: Engineers valued integration testing more highly than unit testing, because it exposed miscommunication between teams about specifications and requirements on a complex project.

- Team Dynamics: Simioni contrasts the small, exploratory teams of early Apollo missions with the larger, deadline-driven teams later in the program, emphasizing that both had roles but required different operational approaches.

- User Interaction: Anecdotes regarding astronauts’ reluctance to use automated features highlight the importance of understanding user needs and behavior in software design, which parallels modern user experience challenges.

In conclusion, Simioni emphasizes the importance of thoughtful system design that incorporates user experience, communication, and the ability to handle errors gracefully. The lessons from the Apollo Program remain relevant in today's software development landscape, showcasing how collaboration and structured testing contribute to successful outcomes.

By Julian Simioni

In August 1961, the MIT Instrumentation Laboratory was awarded the contract for a guidance system to fly men to the moon, the first contract for the entire Apollo Program. The word software was not mentioned anywhere. Six years later, 400 engineers were employed on the project writing software.
The resulting Apollo Guidance Computer is to this day a marvel of engineering. It included a realtime operating system and even a software interpreter. Despite weighing 70 pounds, it ran on only 50W of power. Only one guidance computer was present in each Apollo spacecraft, with no backups: it never failed in thousands of hours of space flight. Before the first work on UNIX or the C programming language had begun, the Apollo Guidance Computer had already taken men to the moon.
NASA and MIT kept meticulous records, giving us the opportunity to look back today on some of the pioneers of our industry, relive their experiences, and maybe learn a few things ourselves.

MountainWest RubyConf 2014

00:00:25.439 All right, can you guys hear me? Sweet. So it's true we're going to continue the trend. This talk is not about Ruby at all; this talk is about space. Let's hear it for space! All right, that's good. It's four o'clock on a Friday, and there was no guarantee any of you would be awake. I love space; I love space exploration. I've had that passion my whole life. I know I got into a technical field because I love space and space exploration, and I bet a lot of other people did too. Raise your hand if you did.
00:00:44.960 Exactly. I mean, there's just so much that's really inspiring. If there's one thing above all else that inspires so many people, it's the Apollo Program. Right? I mean, we landed on the moon—how cool is that? And there are just so many things that you can be inspired by, like the Saturn V rocket that has millions of pounds of thrust and shoots fire a quarter mile out from its engines as it's going into space. That's awesome.
00:01:03.680 There’s incredible math for orbital mechanics, and electrical engineers built fantastic hardware to make everything work. Astronauts were the best pilots at the time, test pilots for crazy stuff like the X-15. Then they went into space, and they were brave—it's truly inspiring. You hear things like, "Oh yeah, we landed on the moon with less computing power than is in my watch." That really frustrates me as a software developer. We want to hear awesome things about software, but no one talks about software in the Apollo Program.
00:01:30.000 For a long time, I figured maybe there wasn't much software involved. Maybe they just had some computer in the Apollo capsule, and it was just a bunch of circuits, smaller than anything else that came before it, and probably not pretty or programmable. But that’s all wrong. There was an incredible amount of software development effort that went into the Apollo Guidance Computer, and it is just awesome.
00:01:52.400 So, that's what it looked like. It's not very pretty, and it's true it's not very fast. A double-precision floating-point multiply took one millisecond, and I did some rough math—my little computer over there could do all the computations done in all the Apollo missions in one second. But I'm not even sure that thing will last through this talk.
00:02:05.520 Whereas in all the Apollo missions, the Apollo Guidance Computer never once failed. It hit some known bugs, and they had to do some crazy workarounds, but there was never a failure. That's pretty awesome! What was even more amazing is that it's not just a bunch of little circuits that are pre-programmed. If you look at the hardware, it's a general-purpose computer, pretty much exactly like any computer we have today.
00:02:28.080 It has an instruction set that's really familiar; it has RAM, it has read-only memory. Programming was pretty much similar to writing assembly today. And what's even crazier is that they built all of this in the 1960s! To give you an idea, the project started in 1961. In 1950, software was mentioned in a paper for the first time—the word "software." Before 1950, they had written software, but they didn't realize that this software thing was going to be different than just building a computer.
00:02:45.200 Firefox is 11 years old now; when they started building the Apollo Guidance Computer, software was as old as Firefox! Think about that. In 1965, they were about halfway through building the Apollo Guidance Computer. The MIT Instrumentation Lab, the team that built it, got a brand new fancy computer to do all their work on the ground. It was called an IBM System/360.
00:03:02.160 Now, there's a certain famous person who worked on that project. Who knows who he is? That’s right! What did he give us, and when did he write that? In 1975. By then, the Apollo Program was all shut down. All the software developers working on the Apollo Guidance Computer were contemporaries of Fred Brooks, learning exactly the same things and making the same mistakes.
00:03:23.040 In 1968, Apollo 8 was orbiting the moon with men on board. The Guidance Computer was essentially done by then; they had some work to finish up for other missions. But the Apollo Guidance Computer was feature-complete in 1968.
00:03:30.000 In 1969, two developers at a place called Bell Labs started working on this new programming language called C and a new operating system called UNIX. So, the Apollo Guidance Computer was completed before those were even started—that's pretty crazy. What do you learn when you build software in the 1960s? I want to first ask you all a question: who has built some sort of web app or system or something that relied on someone else's system, any sort of API or anything like that? Almost everyone, right?
00:03:53.760 What happened to your system the first time their stuff broke? Everyone’s laughing—yes, someone made an explosion! Hopefully, you got smart, and later on, you were able to handle anything that the other system breaking could throw at you. Maybe your stuff didn't work that great, but at least it kind of chugged along, and you didn't look so bad.
00:04:14.080 Let's now talk about the pivotal moment in the entire Apollo Program, which is obviously the first landing on the moon during Apollo 11. Neil Armstrong and Buzz Aldrin were in the lunar module, orbiting the moon, and they were starting their descent toward the moon's surface. The last few minutes, about the last 10 minutes or so, were something no one had ever done. They had only gotten into a tight orbit around the moon for Apollo 10.
00:04:37.680 This was by far the most workload-heavy part of the entire mission. Neil Armstrong said, on a scale of one to ten in terms of difficulty, he called it a thirteen. These guys were focused, right? They were working hard, just a couple minutes from landing, and then suddenly, on their little dashboard, a huge thing lights up, saying 'Master Error Program Alarm!' Something's wrong with the computer. Oh no! They started freaking out, distracted, and flew three miles past their landing site into a field of craters, which was no good.
00:05:00.720 Back in Mission Control, everyone was freaking out. They had never seen this particular error before in all the simulations they did. There happened to be a guy from MIT there, who had been working on the Apollo Guidance Computer. He had written down a little cheat sheet of every single error code that could be displayed and what to do about it, which was an extremely smart idea.
00:05:23.680 He looked up the error code that they had never seen before, and his notes said: as long as nothing else looks wrong, you're still go. So he told everyone, "Okay, we're still go for landing," and he actually got a medal from the president for saying that.
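The cheat sheet described above is essentially a lookup table that maps each alarm to a call decided in advance. A minimal sketch of that idea follows, assuming made-up alarm codes (`ALARM-A`, `ALARM-B`) and rule names; none of these come from the talk or from the real flight software.

```python
from enum import Enum


class Call(Enum):
    GO = "go"
    NO_GO = "no-go"


# Hypothetical alarm codes and rules, invented for illustration only.
CHEAT_SHEET = {
    "ALARM-A": "go_if_nothing_else_wrong",
    "ALARM-B": "abort",
}


def decide(alarm_code: str, other_problems: bool) -> Call:
    """Look up an alarm and return the call that was decided in advance."""
    rule = CHEAT_SHEET.get(alarm_code)
    if rule == "go_if_nothing_else_wrong":
        return Call.NO_GO if other_problems else Call.GO
    # Explicit abort rules and unknown alarms both fall back to the safe call.
    return Call.NO_GO


print(decide("ALARM-A", other_problems=False))
```

The point of writing the table ahead of time is that the hard thinking happens before the emergency, so the lookup itself takes seconds.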
00:05:43.440 We all know, obviously, they landed on the moon. So what happened? There was this program alarm that said the computer was doing something wrong. In 2014, if you write some code that does an infinite loop or uses way too much CPU, your computer will generally not crash. Sometimes it will, but generally not. In the 1960s, it generally would crash.
00:06:05.600 The idea that you could have one computer running multiple programs, interleaved to appear simultaneous, was brand new. They called it time-sharing, and it was first discussed in 1957. So, the concept that you could have multiple programs running together was extremely new. The Apollo Guidance Computer did this; there were programs to fire rockets, programs to fire thrusters, programs to take input from astronauts, display information, and send data back to Mission Control: all sorts of different tasks.
00:06:29.440 At the time, almost every single system that did this just let whatever program wanted to run have a little slice of time and then moved on to the next one—very simple and straightforward. They called it static time-sharing. Obviously, our computers don't do that today; we give everything a priority, and then very smart and sophisticated logic decides what we should run based on the highest priority tasks.
00:06:52.720 The Apollo Guidance Computer did this too. Some engineers at the time hated this idea; they thought, "You can't know what's running at any given time, it's hard to test well." What turned out to be causing that program alarm? It was too much CPU usage. The Apollo Guidance Computer was working incredibly hard during the landing. It calculated the trajectory to go in, fired thrusters at the right times, and sent back data to Mission Control.
00:07:10.080 The engineers knew this. They calculated everything and said, "Yeah, during landing, there'll be peaks to about 93% CPU usage, but that's okay; we've got everything under control." However, there was some other hardware on the lunar module that sent extra signals to the Guidance Computer, causing it to use an additional 10% CPU, which was not good.
00:07:30.320 Of course, they were smart enough to give these signals a very low priority. They were signals from the rendezvous radar, which they didn't even use unless they were going back to the command module. The priority-based system was smart enough to say, "Here’s the stuff that’s really important that we need to run. If something else happens to be running that we don't expect, as long as it's lower priority, we'll just not run it, and everything will be fine." Without that system, they wouldn't have landed on the moon during Apollo 11; they would have had to abort.
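The priority idea described above can be sketched as a scheduler that admits work in priority order and sheds whatever no longer fits in the CPU budget. This is an illustration only, not the Apollo Guidance Computer's actual executive: the task names, the priority numbers, and the way the roughly 93% planned load is split across tasks are all assumptions. Only the 93% and 10% totals come from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int     # higher number = more important (assumed convention)
    cpu_percent: int  # share of one scheduling cycle the task needs


def schedule(tasks: list[Task], budget: int = 100) -> tuple[list[str], list[str]]:
    """Admit tasks in priority order until the CPU budget is spent.

    Returns (names that run this cycle, names that are shed this cycle).
    """
    running: list[str] = []
    shed: list[str] = []
    used = 0
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        if used + task.cpu_percent <= budget:
            running.append(task.name)
            used += task.cpu_percent
        else:
            shed.append(task.name)
    return running, shed


# Planned landing work adds up to 93%; the unexpected radar work adds 10%.
tasks = [
    Task("guidance", priority=3, cpu_percent=60),
    Task("thrusters", priority=3, cpu_percent=20),
    Task("telemetry", priority=2, cpu_percent=13),
    Task("rendezvous_radar", priority=1, cpu_percent=10),
]
running, shed = schedule(tasks)
print("running:", running)  # the three planned tasks
print("shed:", shed)        # ['rendezvous_radar']
```

With a static time-sharing scheme every task would get its slice regardless of importance, so the extra 10% would have eaten into the guidance work instead of being dropped.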
00:07:53.440 The takeaway is to spend a little extra time making your system handle failure gracefully. As a side note, if you handle something gracefully and then present a huge error that says 'Program Alarm!' and it's super scary, you could maybe do a little better. Something like 'Everything's okay, even though we're running a little low on free CPU.'
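For the web-app case raised earlier in the talk, where someone else's API breaks, both halves of this takeaway can be sketched together: degrade instead of crashing, and report the handled problem calmly. The function names, the fallback value, and the message wording below are all hypothetical.

```python
import logging

logger = logging.getLogger("dependency")


def fetch_with_fallback(fetch, fallback):
    """Call a dependency; if it raises, log the failure and degrade."""
    try:
        return fetch(), "ok"
    except Exception as exc:  # broad on purpose: any dependency failure degrades
        logger.warning("dependency failed, using fallback: %s", exc)
        return fallback, "degraded"


def status_message(state: str) -> str:
    """Describe a handled problem calmly instead of raising a scary alarm."""
    if state == "ok":
        return "Everything is okay."
    return "Everything is okay: one data source is unavailable, showing the last known value."


def broken_api():
    # Stand-in for someone else's system failing.
    raise TimeoutError("upstream timed out")


value, state = fetch_with_fallback(broken_api, fallback=42)
print(value, "|", status_message(state))
```

The caller still gets a usable value and a state flag, so the rest of the system can keep chugging along while the failure is logged for later.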
00:08:16.960 Okay, let's talk about something hopefully we all love—let’s talk about testing. Yeah, testing. Obviously, lives were at stake in the Apollo Program, and they did a lot of testing. They did a lot of automated testing and unit testing—they even called it unit testing. They put quotes around it because it was so new. They didn't really know if it was actually a thing.
00:08:36.560 The engineers saw the value of unit testing—they said it was great, it helped them, and it was cool. But they didn't love it. They didn't say that unit testing was the reason they landed on the moon, and I think the reason for this is pretty straightforward.
00:08:52.960 As you pointed out, when you write functional code, it's easy to test. All the code for the Apollo Guidance Computer was functional, not just in the programming sense but in an actual mathematical sense—it involved guidance equations with inputs and outputs. It was easy to test.
00:09:12.240 Furthermore, in the 1960s, the computer science degree did not exist. Everyone programming for the Apollo computer was trained in math or had learned enough math to essentially have a degree in math. When you put a bunch of mathematicians together and tell them to code up functions they know really well, they can generally do it. They also did a lot of integration testing.
00:09:29.840 When you read what they talked about regarding integration testing, they loved it! They said it solved so many problems and helped them a lot. In 1972, MIT published essentially a retrospective of software development for the Apollo Guidance Computer, identifying two major problems that plagued the entire project.
00:09:49.600 The first problem was how to estimate schedules and meet them. They had no idea how to solve that then, and we still don’t really know how to solve it today—it’s a tough problem. The other hard problem was getting up-to-date specifications and requirements of other systems. That's hard, and we have that problem today.
00:10:09.680 But what they discovered is that integration testing can help you figure out when you’re wrong. When you write unit tests, it’s generally all your code or someone on your team's code. The whole nature of integration testing is that it includes someone else's code too, or their hardware.
00:10:27.920 So, you put everything together, your whole system, and see what happens. If your integration tests pass, you don’t necessarily know anything, but if they fail, you know you did something wrong. What you probably did is coded a great implementation of the wrong thing—you wrote something that worked great, except you didn't know what to write.
00:10:48.320 Then you can go out, and there’s no easy process for this, which is why it’s still a problem today. You can talk to the other people you work with—other parts of your system—and figure out which part went wrong. It doesn’t cure everything, but it helps.
00:11:05.920 This leads to an interesting takeaway: unit tests prove your code, while integration tests prove your communication. We all know communication is one of the hardest parts of software, maybe even harder than writing the actual code. So, that’s pretty cool.
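A small sketch of that takeaway, using an invented units mismatch between two hypothetical teams: each team's unit tests pass, yet the integration test fails because the teams never agreed on what the number means. The function names, the 150 threshold, and the scenario are assumptions for illustration, not anything from the Apollo code.

```python
import math


# Team A: converts the raw sensor reading (meters) to feet.
def read_altitude_feet(raw_sensor_meters: float) -> float:
    return raw_sensor_meters * 3.28084


# Team B: decides when to start the final descent, and assumes meters.
def should_start_final_descent(altitude_meters: float) -> bool:
    return altitude_meters <= 150.0


# Unit tests: each team's code does exactly what that team intended.
assert math.isclose(read_altitude_feet(100.0), 328.084)
assert should_start_final_descent(100.0) is True


# Integration test: wire the two together the way the real system would.
def integrated_decision(raw_sensor_meters: float) -> bool:
    return should_start_final_descent(read_altitude_feet(raw_sensor_meters))


# At 100 m the craft should start its final descent, but the combined
# system sees 328.084 and says no. The bug lives in the communication.
integration_passed = integrated_decision(100.0) is True
print("integration test passed:", integration_passed)
```

Neither function is wrong on its own terms; each is a great implementation of a different specification, which is exactly what the failing integration test reveals.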
00:11:23.920 Let's talk about teams. This is after Apollo 13 landed, and everyone was happy. In 1961, the Apollo computer contract was signed, not mentioning anything about software—they didn’t know they'd have to write any code. Soon after, they realized they would, settling into having about 20 software developers.
00:11:41.120 These 20 developers had no managers, no deadlines, no requirements, and very little communication with anyone else. The reason is that in 1961, they didn’t know at all how they would get to the moon. They didn't know at a high level what a spaceship that goes to the moon would look like—would it be one big spaceship or two smaller ones? Would it be one giant rocket or two little ones?
00:12:00.720 No one knew how to navigate to the moon with math. They figured it could be done, but they didn’t know how, nor if they could build a computer to do it, so there was no point in setting a deadline. What they wanted to do was experiment, fool around, and prove out all these concepts.
00:12:20.640 A couple of years later, they knew how they were going to get to the moon. They decided to use two smaller ships—one was strictly for landing on the moon; it turned out to be a lot more fuel-efficient. We knew that with a sextant, just like sailors navigating the Atlantic or Pacific, one could look at the stars, the moon, and angles to figure out their position.
00:12:39.920 It was proven you could use math and accelerometers to achieve precise orbits around the moon without any outside help. Around this time, there were 400 software developers, and they had lots of managers, lots of deadlines, and extensive communication. The reason for this was that they were focused on getting specific things done. J.F.K. said we had to be on the moon by 1970.
00:12:59.840 So they had a pretty harsh deadline. Last year, if you were here, Sara talked about open and closed modes of thought. In open mode, you're receptive to new ideas, exploring. In closed mode, you’re focused on getting specific tasks completed. This perfectly maps to these two team sizes. Early on, you're just trying to explore what can work, and later on, you're trying to get things done.
00:13:19.920 It goes without saying that you should ensure that whatever team size you're on is set up for doing what you need to do right now. No startups begin with 400 people—at least, they don’t last long if they do. Likewise, we’d all be nervous if we discovered the Google Search team had only 20 people; that would not be cool.
00:13:40.640 A side learning from this is that switching from a small scrappy team to a large process-oriented one is really painful. NASA complained like crazy when there was a really small team, saying, 'You guys got to get your act together; you got to get organized.' When they got bigger, many engineers left. They said, 'We like exploring new things, we like fooling around, we like not having managers.' After Apollo, a good number of the engineers left to pursue other opportunities.
00:14:02.640 But the point is, we wouldn’t have landed on the moon with only 20 software developers throughout the entire process, nor would we have accomplished it with 400 developers all the way through. So, make sure your team is the right size.
00:14:23.680 Let’s talk about working with users. Actually, let’s talk about working with astronauts. Who's seen 'The Right Stuff' or read the book? Great movie, great book. How does it describe the astronauts? Goofballs? Cowboys? They are super confident, super arrogant, and super hard to work with—they're just like software developers.
00:14:40.960 So, we all know a few things about working with users, and one thing that really stood out to me is that astronauts, just like other users, don’t know what they want. There’s a great story that illustrates this. The Guidance Computer was also an autopilot; you could set it to fly the craft through almost everything except the actual landing on the moon, which had to be done manually to avoid boulders and other dangers.
00:15:02.560 So the Apollo Guidance folks said, 'Hey astronauts, we can build you an autopilot that will handle the re-entry at the end of your mission for you—just press some buttons, sit back, and we’ll get you through the atmosphere.' It's a tense moment; if you come in too fast, you burn up. If you don’t come in steep enough, you fly back out into space.
00:15:24.960 The computer can manage all that pretty well. However, the astronauts had trained to do this manually and said, 'No, there’s no way some computer is going to do this for us.' Many astronauts had friends who had died in accidents during the 50s and 60s, so it was sort of reasonable, but they insisted that every single one of them would manually handle the re-entry.
00:15:45.440 In the end, the astronauts never flew a fully manual re-entry. On the first mission, they tested manual control for about a minute. When you’ve been in space for a week and a half and there are three minutes of work the computer could easily do to help you, you’ll probably just press the button and sit back.
00:16:03.760 Here’s another interesting thing: let's say you've built a computer that can navigate you to the moon. With astronauts on board, how do they and the computer communicate with each other? Keep in mind, human-computer interaction as a field will not be invented until 1980. You don’t really know much about working with users; you can’t use a CRT screen because they’re too heavy, and you can’t use an LCD screen because they haven’t been invented yet.
00:16:24.480 Again, astronauts despised the very idea of automation. They just wanted a joystick that would manually control everything with actual cables, not even fly-by-wire. So what do you build in that case? Thanks to the work of some brilliant people (who aren't me), there is a JavaScript emulator of the entire Apollo Guidance Computer.
00:16:41.680 It's live demo time—who’s excited? All right! In all its glory, this is the interface of the Apollo Guidance Computer. You get three lines of output—it's all the computer tells you about anything. You have this stuff that basically shows what it’s running.
00:16:46.640 You communicate with this computer with a verb and then a noun. For example, verbs might be like 'show me some data.' So, verb 0 6 will show you some data, and noun 65 is like the current system uptime. If you hit that, it’ll tell you that this thing’s been running for zero hours, twenty minutes, and two point fifty-seven seconds.
00:17:04.640 Let’s run verb 1 6—this says, 'Show me some data,' and keep updating it. Now noun 65, and it’s the same thing, but it's updating every second. You have this very simple but powerful system where information can be entered quickly—astronauts might be under some time pressure to ensure the spacecraft is doing what it should.
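The verb and noun scheme demonstrated here can be sketched as two small lookup tables, one for actions and one for data. The numbers 06, 16, and 65 follow the talk's description; the fake clock, the fixed sample count for the monitor verb, and every function name are assumptions, and this is not how the emulator or the real computer is implemented.

```python
from __future__ import annotations

from typing import Callable


def uptime_noun(elapsed_seconds: float) -> tuple[int, int, float]:
    """Noun 65 in this sketch: uptime as (hours, minutes, seconds)."""
    hours, remainder = divmod(elapsed_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(hours), int(minutes), round(seconds, 2)


def display_once(noun: Callable, clock: Callable[[], float]) -> list:
    """Verb 06 in this sketch: show the data a single time."""
    return [noun(clock())]


def monitor(noun: Callable, clock: Callable[[], float], samples: int = 3) -> list:
    """Verb 16 in this sketch: keep showing the data (a fixed number of samples)."""
    return [noun(clock()) for _ in range(samples)]


NOUNS = {65: uptime_noun}
VERBS = {6: display_once, 16: monitor}


def execute(verb: int, noun: int, clock: Callable[[], float]) -> list:
    """Combine any verb with any noun, as the keypad interface allowed."""
    return VERBS[verb](NOUNS[noun], clock)


def make_clock(start: float) -> Callable[[], float]:
    """Fake clock that advances one second per reading, for the demo."""
    ticks = 0

    def clock() -> float:
        nonlocal ticks
        value = start + ticks
        ticks += 1
        return value

    return clock


print(execute(6, 65, make_clock(1202.57)))
print(execute(16, 65, make_clock(1202.57)))
```

Because verbs and nouns live in separate tables, adding one new noun makes it available to every existing verb, which is where the small vocabulary gets its power.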
00:17:24.000 By the end of the Apollo program, the astronauts were so proficient with this that the engineers were astounded. It provided a really simple but powerful interface. Now, who here is reminded of a certain program we might use almost every day that has verbs and nouns, and you can combine them in interesting ways? It’s like vim!
00:17:47.680 I have no idea if this influenced the design of vi or vim—vi was developed in the 70s and then vim in the early 80s. But I think it's really cool that, even 40 years later, we're still using that same concept they devised. And I use vim every day; I'm a big fan.
00:18:04.400 So, I think that’s awesome. You all should check this out; you can simulate an entire launch, and it’s amazing, but I don’t have time to show it to you. Let’s go back to the slides.
00:18:21.680 If you want to make my job shorter, we can do it in a matter of seconds per slide instead of ten. Let’s try that! Okay, so there’s a launch checklist—let’s open this thing up. Okay, I think I did these two. Now, verb 37.
00:18:36.720 Verb 37 says 'run a program,' and we want noun 0 1—that’s like the pre-launch program to make sure everything looks okay. Nope, it didn’t work. It reset instead. That’s verb 37; now, enter noun 0 1. It’s supposed to flash for a second, so it’s supposed to be calibrating something. Maybe it's taking a second. Nope!
00:18:54.080 It says, 'Hold on, let’s try...' I have another one! I have another one open in Chrome in case I close Firefox. Okay, it said verb 37; let's enter noun 0 1 again. HCI wouldn't be invented for another 20 years. There we go! So now it says 'Program one is running.' Yeah!
00:19:10.520 This little thing over here is calibrating, and that will happen any second now—we're figuring out where the guidance computer is. As soon as this switches to two, let’s go back to our launch checklist.
00:19:29.440 Major mode two, the pre-launch procedure, will take place. Okay, we're in program two now. Press the launch button, bam! Now, this thing says, 'The Saturn first stage is firing,' and switches to, I think, velocity. One of these is altitude, and the other is downrange distance. They’re increasing! You can imagine we're in a rocket and being pressed back into our seats. It's not very dramatic when it’s just a couple of lines of text.
00:19:44.480 But you can still play around with this. It’s pretty cool. Now we’re really going back to the slides. I’m Julian Simioni, I work at 42 Floors, and you can tweet at me and read my blog when I update it. That’s a picture of me—I'm a pilot and it’s amazing. You all should get your pilot's licenses.
00:20:03.920 Thank you so much, everyone!