GoGaRuCo 2014

Let's Build a Computer!

by Ryan Davis

In the presentation titled "Let's Build a Computer!" at GoGaRuCo 2014, Ryan Davis explores the fundamentals of computer construction using a bottom-up design approach, starting with the NAND gate and progressing through various logical constructs to build a functioning computer. The curriculum used stems from the book "Nand to Tetris," which guides learners through the complexities of computer science by starting from the simplest logic gates and expanding to a complete computer system.

Key Points Discussed:
- Introduction to the NAND Gate: Davis begins by explaining the NAND gate, highlighting its role as a universal gate capable of constructing all other logic gates, such as NOT, AND, and OR.
- Building Logic Gates: He demonstrates how to build other gates using the NAND gate, including the construction of NOT, AND, and OR gates using truth tables and De Morgan's Law.
- Complex Gate Construction: The talk progresses to more complex gates like XOR, MUX (multiplexer), and DMUX (demultiplexer), detailing their logical implementations and relation to corresponding coding structures.
- Arithmetic Logic Unit (ALU): The ALU's function is discussed, showcasing how it processes inputs and executes various computing tasks, including addition through half adders and full adders.
- Memory Construction: Davis differentiates between combinational and sequential logic, emphasizing the role of memory components such as flip-flops in holding data and addressing memory efficiently.
- Final Computer Assembly: The presentation culminates in assembling a complete computer from the discussed components, emphasizing that once foundational elements are established, creating a computer becomes trivial.
- Conclusion and Reflection: Davis reflects on the educational value of the Nand to Tetris curriculum, encouraging attendees to see that the complexity beneath everyday computing is within reach and unified by a common logic of computer architecture.

This technical journey offers insight into how fundamental logic can manifest into the devices we rely on every day, ultimately nurturing a deeper appreciation for the underlying principles of computing. It emphasizes learning through hands-on engagement with computer components, making complex subjects approachable for beginners and inspiring further study in computer science.

00:00:13.759 Thank you.
00:00:15.360 The year is 3014. After a series of demotions, I find myself as the sous chef and assistant sous chef on the intergalactic spaceship Icarus. One moment, I don't have presenter notes, and that's going to kill me. Oh, thank God!
00:00:27.599 After an unfortunate accident involving our phase matter array and a dark matter cephalopod that was terribly hungry, I find myself stranded in space. All I have is a power supply, a bucket of NAND gates, and this book. Everyone join in: lost in space!
00:01:22.000 So, let's build a computer and get home, shall we? Welcome to GoGaRuCo 3014. To set some expectations, this is a highly technical talk, but I hope it will be highly approachable. I have 229 slides, and I have to go incredibly fast. I will not have time for Q&A, but I will be out during the break afterwards.
00:01:36.320 Very importantly, this is not my field of expertise at all. The last time I did anything considered hardware, I was paired with an ex-Navy electronics guy, so I didn't exactly learn anything. I'm terrified.
00:02:01.520 Seattle.rb has a study group, and we studied this book in the spring of this year. The curriculum hails from a university in Israel and is taught by two professors, whom you can see there. I'm not going to try to pronounce their names because I don't want to mess them up. It's been taught in several American colleges and even high schools.
00:02:21.280 I was frankly blown away by how good this book was. Really, I love this book. I love this book so much that I am here doing this talk. The book goes from the very basics up to a running computer with a game or another application. It's very open-ended.
00:02:31.440 It's a lot of material to cover in 12 weeks, let alone in 30 minutes. We're only covering the hardware aspects, which barely touch machine language. This is the layout of the book. The actual chapters are shown in a diagram from the book itself.
00:03:03.440 As you can see, we're only doing chapters one through three and five, with hints at chapter four. This, more than anything, is the thesis of the book and the theme of this talk: We don't just build a computer; we build Boolean logic. With that, we build arithmetic, and so on. Every layer is entirely dependent upon the layers below it.
00:03:22.720 This is bottom-up design at its finest. Very importantly, though, perhaps the most critical point in this talk is that this is not Comic Sans. See those 't's? They're all slightly different because this font is hand-drawn and called 'Sketch Note.' It is directly related to the book I just showed earlier, and I think it's really awesome, so I'm using it throughout.
00:03:49.360 So, let's get started with Boolean logic. Let's start with the NAND gate and the most obvious question that follows: What is a NAND gate? It’s really simple. It’s a NOT gate, represented by a little circle, and an AND gate, represented by the D shape. Together, they make NAND.
00:04:15.280 The lines are just pretend wires and aren't actually part of the symbol. The truth table for it is the exact opposite of an AND gate. While AND loves agreement and returns true when both inputs are true, NAND hates agreement and returns true whenever there’s a false.
00:04:36.080 If we were to code it, it might look something like this. Simple enough, though savvy Rubyists might try to add it to an object. The traditional symbol for NAND is actually pretty boring, and it turns out that the turned ampersand Unicode character is way cooler, so I decided to go with that.
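(The slide's code isn't captured in the transcript; a minimal Ruby sketch of what a NAND might look like, with hypothetical names:)

```ruby
# NAND: true whenever at least one input is false.
def nand(a, b)
  !(a && b)
end

nand true,  true   # => false
nand true,  false  # => true
nand false, false  # => true
```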
00:05:09.680 NAND is special; like NOR, it's considered one of two universal gates. This means you can build all other logic gates out of just one of these types of gates. We’re going to use the NAND gate to build a whole computer.
00:05:43.520 As I said before, complexity arises from simplicity. We're going to build NOT, AND, OR, and all the others from just this lowly NAND. The first thing we're going to build is a NOT gate. How do you do that using only NAND?
00:06:30.319 The truth table is simple enough: It returns the opposite of its input. How do we get that? If you look closely, it’s already there. If you give a NAND gate two trues, you get a false. If you give it two falses, you get a true. So, wiring the input signal to both input wires yields a NOT.
00:06:55.680 Now let's work on building an AND gate. This time we can use NAND and NOT. An AND gate is simply a negated NAND gate, which we can expand to NOT NOT AND. The double negative cancels itself out and we get an AND. Thus, we have an AND gate: a NAND whose output passes through a NOT.
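(Continuing the sketch, building on the `nand` above — helper names are mine:)

```ruby
# NOT: wire the same signal to both NAND inputs.
def not_gate(a)
  nand(a, a)
end

# AND: a NAND followed by a NOT -- the double negation cancels out.
def and_gate(a, b)
  not_gate(nand(a, b))
end
```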
00:07:24.000 Next up is the OR gate. We will use the gates we have available to build it. We're probably all familiar with OR, but how do we construct it from AND and NOT gates? After all, they are all kind of combined. It’s actually pretty easy using De Morgan's Law, so let's do a quick refresher on that using everyone's favorite — Venn diagrams.
00:07:55.680 Let's start with set A and set B. The union of the two results in OR. The opposite of that would be NOT (A OR B), which is the area outside the union. By De Morgan, this can equivalently be expressed as NOT A AND NOT B — everything outside the union of A and B — and that's exactly what we don't want, so let's negate that too.
00:08:29.280 This yields the equivalent of A OR B using our gates, and that’s De Morgan's Law in a nutshell. So let’s implement it, shall we? First, we flip it using De Morgan’s. We take the source code and translate it from bottom up, giving us NOT A, then NOT B, then we AND the results, and then we apply NOT to the final result.
00:08:34.880 The chip layout is straightforward at this point, but if you look at it in terms of code, we can see some simple optimizations. Remember that we built AND with two NANDs and NOT with one NAND? Here we can see that NOT AND is just NAND, so let's replace three NAND gates with one. That's better! Next, let’s replace NOT with the equivalent NANDs.
00:09:01.440 This gives us an OR gate made of three NAND gates instead of five — a nice optimization that adds up in the long run. Okay, now we’re in the swing of things! Next, we tackle XOR, and then we’ll be done with standard logic. We now have OR available to us, so it's much easier to construct.
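(A sketch of the De Morgan construction and the three-NAND optimization side by side:)

```ruby
# OR via De Morgan: A OR B == NOT(NOT A AND NOT B) -- five NANDs.
def or_gate(a, b)
  not_gate(and_gate(not_gate(a), not_gate(b)))
end

# Optimized: the trailing NOT+AND collapse into one NAND, and each
# remaining NOT is a one-input NAND -- three NANDs total.
def or_gate_opt(a, b)
  nand(nand(a, a), nand(b, b))
end
```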
00:09:41.600 XOR, or exclusive OR, means one or the other, but not both. It translates to: (A AND NOT B) OR (B AND NOT A). That may sound easy to express, but it adds up: one NAND each for the two NOTs, two each for the two ANDs, and three for the OR — nine NAND gates in total.
00:10:30.960 Let's convert this and see how we can clean it up. Moving left to right, let’s replace the NOT gates with their NAND equivalents, then expand the AND gates to their two NAND gate equivalents, and finally expand the OR to three NAND gates.
00:11:01.440 At this point, it remains entirely equivalent to what it was before, but you may spot some clumping, and you might detect some patterns really quickly. Specifically, NOT gates in the stream can cancel each other out; this happens when we have an AND gate feeding into an OR gate.
00:11:35.520 Here's a little hint: whenever an AND gate feeds into an OR gate, the AND ends with a NOT while the OR starts with a NOT, so back-to-back NOTs appear. Let's identify and remove them next. There's also a non-obvious optimization I had to work out by constructing the truth tables, though there may be a smarter way to arrive at it.
00:12:07.799 As you know, I'm just a sous chef. This is somewhat hand-wavy, but knowing that NAND hates agreement, we can share a single NAND at the front for the same effect, dropping nine gates down to four. That represents a huge optimization in the grand scheme of things since there are tons of XORs and ORs inside a computer.
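(A sketch of that shared-NAND trick — four NANDs, with the first one feeding both branches:)

```ruby
# XOR in four NANDs: t is shared by both branches.
def xor_gate(a, b)
  t = nand(a, b)
  nand(nand(a, t), nand(b, t))
end
```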
00:12:49.920 MUX is the first name that doesn't have an obvious analogy in Ruby. It sounds big and scary, but it's really just an if statement. When the select flag is true, we pass B through to the output as is; when the select flag is false, we pass A through. That's it! It's really simple, just a dumb name.
00:13:09.120 How do we create this as chips? It's a dumb name — multiplexer. Come on: it's either the select flag AND B, or NOT the select flag AND A. This took some getting used to for me: we calculate both branches every time and use an OR gate to choose the right result.
00:13:30.720 Converting it to equivalent NANDs gives the following. I'm skipping the optimization steps for time, but it’s the same process we've gone through before. DMUX has no analogy in Ruby; it's the opposite of an if statement. However, it’s still pretty easy as it decides which of two outputs to assign based on the selector.
00:13:52.960 Ignoring Ruby's scoping rules, here's our truth table for DMUX: when the selector is true, the input goes to output A; when false, it goes to output B. Essentially, it's just parallel computation: A is the input and selector, B is the input and NOT selector. There are no optimizations to be had here, so it’s a straightforward port to NANDs.
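(Sketched in Ruby with the gates above, hypothetical names again:)

```ruby
# MUX: an if statement in gates. Both branches are always computed;
# the OR picks whichever one the selector lets through.
def mux(a, b, sel)
  or_gate(and_gate(sel, b), and_gate(not_gate(sel), a))
end

# DMUX: the opposite of an if -- route one input to one of two
# outputs. Returns [a, b].
def dmux(input, sel)
  [and_gate(input, sel), and_gate(input, not_gate(sel))]
end
```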
00:14:12.320 In source code, I would keep it as AND and NOT to keep the gates clear. That's the end of our binary operations. Now, let's venture into wider branching.
00:14:31.360 Or8Way returns true if any of its eight inputs are true; it can be implemented in Ruby simply and straightforwardly, albeit messily, just to fit on the slide. We could implement it with seven OR gates chained in series — eight inputs and one output — but there's a flaw: the implementation is serial, passing through seven gates to get an answer.
00:14:56.640 However, the same inputs and output can be achieved by implementing Or8Way as a binary tree, dropping the gate depth from seven to three.
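(A sketch of the tree shape:)

```ruby
# Or8Way as a binary tree: three gate delays instead of seven.
def or8way(bits)
  a, b, c, d, e, f, g, h = bits
  or_gate(or_gate(or_gate(a, b), or_gate(c, d)),
          or_gate(or_gate(e, f), or_gate(g, h)))
end
```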
00:15:22.400 Wide MUXes are a mainstay in computers, as you'll see. A four-way MUX takes a two-bit selector to decide which of four inputs it will output. Much like the theme of this talk, a four-way MUX is simply built from more two-way MUXes, as you may guess. An eight-way MUX is built from four-way MUXes, and there's absolutely no difference here.
00:15:42.480 With a bit of thought, you can work these out on your own. It's just an exercise in editing and figuring out your editor's macros. For the sake of time, I'm going to consider this an exercise for the reader — or viewer. Just remember that complexity builds from simplicity, and you'll do fine.
00:16:05.120 Now let's talk about 16-bit logic. It sounds complex, but it isn't. It's just 15 more of the same simple thing in a single chip. A 16-bit NOT would look roughly like this, and so on. Again, this is trivial and an exercise for the reader.
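(For instance, a sketch of a 16-bit NOT:)

```ruby
# Not16: the same one-bit gate, sixteen times over.
def not16(bits)
  bits.map { |bit| not_gate(bit) }
end
```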
00:16:26.560 With that, we now have all the basics we need to start computing. Now, we’re going to build Boolean arithmetic. Specifically, we’re going to build adders. Not these kinds of adders, just good simple arithmetic.
00:16:53.760 So, remember back in elementary school how you were taught to add two numbers? You start with the least significant digits, calculate a sum and a carry, and repeat until done. Using two inputs to calculate a sum and a carry is called a half adder; using two inputs plus the previous carry to calculate a sum and a carry is called a full adder. Computers add numbers just like elementary school kids; the only difference is that computers use base two, not base ten.
00:18:50.880 The half adder is easy — the sum is one only if exactly one of the two inputs is one, which means it's XOR. There's a carry if both inputs are one, which is an AND. If you take two half adders and connect them, you create a full adder. The sum is one when an odd number of the three inputs are one, leading to a three-way XOR.
00:19:13.920 There’s a carry if two or more of the three inputs are one; you connect those together with an OR gate. Now we know how to add two bits and a carry, so let's scale that up to 16-bit numbers. You simply wire up 16 full adders to their 16 inputs.
00:19:32.880 You provide a zero to the first carry-in, and each carry-out becomes the carry-in for the next adder, chaining them through. But Ryan, you say, doesn't this have the same problem as Or8Way? Very smart — yes, it does!
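(A sketch of the serial version just described, reusing the earlier gate sketches; bits are least-significant first:)

```ruby
# Half adder: sum is XOR, carry is AND.
def half_adder(a, b)
  [xor_gate(a, b), and_gate(a, b)]  # => [sum, carry]
end

# Full adder: two half adders, their carries joined by an OR.
def full_adder(a, b, c_in)
  s1, c1  = half_adder(a, b)
  sum, c2 = half_adder(s1, c_in)
  [sum, or_gate(c1, c2)]
end

# Ripple-carry Add16: each carry-out feeds the next carry-in.
def add16(xs, ys)
  carry = false
  xs.zip(ys).map do |x, y|
    sum, carry = full_adder(x, y, carry)
    sum
  end
end
```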
00:19:51.600 What they don’t teach you in grade school is that there’s more than one way to add numbers together. For example, I was tutoring a fellow student in junior high on absolutely rudimentary math, and he blew me away by adding numbers left to right. At the time, I had never even thought about it.
00:20:23.440 Indeed, you can compute all the carries that each column would have in parallel, but to do so you must rearrange things. Going back to the full adder, we have the sum and carry, but we’re primarily interested in the sum since that's what causes all the delay. We remove the carry and end up with what's called a partial full adder, or PFA.
00:20:57.440 This calculates a sum, plus whether the column would propagate an incoming carry into the next column or generate a carry on its own. Then it's simply a matter of using those propagate/generate bits to calculate all the carries with what's called a carry-lookahead adder (CLA). The CLA takes four propagate/generate pairs and produces four carry outputs.
00:21:30.960 You connect four PFAs and one CLA to create a four-bit adder. Going one level deeper, you connect four four-bit adders together with another CLA to create a 16-bit adder. This time it's logarithmic depth instead of linear. That’s easy, right? Well, kind of, until you look at what a CLA is.
00:22:04.720 You take all those propagate-generate bits and utilize them to compute all carries in parallel, creating what I call a 'carry lookahead monster.' This is as clean as I can diagram it; it’s all in NANDs. Sorry about that. This is as convoluted as it gets.
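(In Boolean terms rather than raw NANDs, a sketch of what the monster computes — where each PFA supplies g, generate: both inputs one, and p, propagate: exactly one input one:)

```ruby
# Four-bit carry lookahead: every carry computed directly from c0,
# in parallel, instead of rippling through the adders.
def cla4(ps, gs, c0)
  p0, p1, p2, p3 = ps
  g0, g1, g2, g3 = gs
  c1 = g0 || (p0 && c0)
  c2 = g1 || (p1 && g0) || (p1 && p0 && c0)
  c3 = g2 || (p2 && g1) || (p2 && p1 && g0) || (p2 && p1 && p0 && c0)
  c4 = g3 || (p3 && g2) || (p3 && p2 && g1) ||
       (p3 && p2 && p1 && g0) || (p3 && p2 && p1 && p0 && c0)
  [c1, c2, c3, c4]
end
```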
00:22:49.760 Should I go into all the crazy details about how a CLA works? This slide shows the Scheme code that I used to generate the NAND gates; it's not exactly readable, is it? No, not really — it's complex. While it does a great job, I wouldn't want to look at this code day-to-day.
00:23:10.560 Here you have the raw mathematical definition, defining carry propagate and generate at every level. However, please note that it’s recursive. If we expand it fully to four levels of recursion, you get it written and formatted this way, which can be tedious.
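(In my notation — not necessarily the slide's — the recurrence and its full four-level expansion are:)

```latex
c_{i+1} = g_i \lor (p_i \land c_i)
c_4 = g_3 \lor p_3 g_2 \lor p_3 p_2 g_1 \lor p_3 p_2 p_1 g_0 \lor p_3 p_2 p_1 p_0 c_0
```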
00:23:37.680 At this stage, it's just a matter of wiring, and I can handle that! But to complete the book, you don't need all this added complexity; I just pursued this rabbit hole because I find it fun and because I might be a little insane.
00:24:10.560 An incrementer is a device that adds one to the input — that's its job! The simplest way to implement it is to wire up a bunch of half adders and feed the first one a hardwired one, enabling it to do its job.
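(A sketch of that straightforward version, reusing the half adder from before:)

```ruby
# Inc16: a ripple of half adders seeded with a hardwired one.
def inc16(bits)  # bits are least-significant first
  carry = true   # the hardwired one
  bits.map do |bit|
    sum, carry = half_adder(bit, carry)
    sum
  end
end
```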
00:24:38.560 If you care about speed, this gets terrifying. This example is from a Z80, designed to be incredibly fast, but I don't want to dive too deep into that rabbit hole. Instead, I decided that the straightforward implementation was perfectly acceptable, and I decided to stick with that.
00:25:08.560 Let's discuss the ALU, or Arithmetic Logic Unit. This represents the greatest complexity within the entire book. Hardware-wise, it consists of two 16-bit inputs, X and Y, six flags to specify which operations to execute, two output flags to report the status of the output, and the actual 16-bit output.
00:25:47.679 Here's the hardware definition language (HDL) declaration for the inputs and outputs. I put this here to illustrate how ugly it is. It's gross, really, and here's the implementation, which is equally unappealing.
00:26:04.919 I included this section because the part down here took me a long time to solve, and it amazed me when I finally figured it out. You don't need to understand this code; I wanted to show how interesting things get as the chips grow bigger.
00:26:30.479 This MUX sends parts of its result to four different places. We don’t have a programming language analogy for this. We treat functions as mathematical functions — they’re black boxes. Input goes in, output comes out. But this MUX pushes its results to four different places concurrently.
00:27:05.679 A mind map might be more understandable. I find this approach easier to grasp than raw source code, as the eye can pull out the flow if the diagram layout is good. There are two things to note here: the And16 and the Add16 are calculated every single clock cycle.
00:27:40.480 The outputs go to different places; at times, they only partially go to one. The topmost wire is one bit, while the two lower wires are eight bits of a 16-bit result. I find that mind-blowing. However, since this is merely a chip, you must consider it more like a standard function.
00:28:08.400 The fascinating aspect is those six little flags — they effectively create our instruction set. With those flags, you can derive all sorts of outputs — constants, negations, increments, decrements, additions, and so on.
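(Those six flags are the book's zx, nx, zy, ny, f, and no. A sketch of their semantics on plain Ruby integers — the real chip works on 16-bit buses:)

```ruby
# Hack-style ALU flags: each stage conditions the inputs or output.
def alu(x, y, zx:, nx:, zy:, ny:, f:, no:)
  x = 0  if zx             # zero the x input
  x = ~x if nx             # negate the x input
  y = 0  if zy             # zero the y input
  y = ~y if ny             # negate the y input
  out = f ? x + y : x & y  # f selects add vs. and
  no ? ~out : out          # optionally negate the output
end

alu(5, 7, zx: false, nx: false, zy: false, ny: false, f: true, no: false) # => 12 (x+y)
alu(5, 7, zx: true,  nx: true,  zy: true,  ny: true,  f: true, no: true)  # =>  1 (the constant 1)
```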
00:28:31.040 Mark Armbrust, an exceptionally helpful contributor on the Nand to Tetris forums, designed an ALU with just 440 NAND gates. It's amazing — once you get used to looking at this material, you can scan the diagram and identify a tree of ANDs, rows of NOTs, and ORs. It's not complex; it's just busy.
00:29:02.960 Now we get to the exciting parts! All this calculation stuff feels a bit 19th-century. It suffices if all that you want to do is dial knobs, pull levers, and calculate trajectories for cannon fire. This technology was built for those purposes.
00:29:28.360 Now, let us create memory, which enables us to perform useful tasks.
00:29:32.640 So what’s the difference between combinational logic and sequential logic? Combinational chips are very straightforward. You have some inputs that feed some logic and produce outputs; sequential chips are complex by design.
00:29:49.440 They have the same inputs feeding into some input logic that interacts with memory before routing to output logic heading out. The memory's actions synchronize with a signal from the clock.
00:30:09.440 Here’s our basic memory element: the DFF or data flip-flop. Here’s how it works: while this isn’t typical Ruby, it defines the input now as the output later.
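(A sketch of "input now, output later" as a Ruby object:)

```ruby
# DFF: the output now is whatever the input was on the previous tick.
class DFF
  attr_reader :out

  def initialize
    @out = false
  end

  def tick(input)  # called once per clock cycle
    @out = input
  end
end
```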
00:30:21.760 In Nand to Tetris, they treat this as an atom. You acquire DFF chips the same way you get NAND chips and build everything else from them. I’ve poked some more and discovered that the DFF itself can be created with NANDs.
00:30:41.920 Mark, the same individual, wrote a detailed article explaining this process. You start with a NAND to build what's called an RS latch.
00:30:45.920 You clock it, double it up, and create an RS flip-flop, which finally extends to a DFF. These are simply details that you don’t need to finish the book; however, I find them fascinating.
00:31:01.040 There's a huge reason why I wanted to do this book: to learn the concepts I didn't learn in college. So, we have a DFF that outputs its previous input; but for memory, we need to hold onto data and change it only when we want to.
00:31:21.040 That's where the Bit comes in. The Bit takes an extra input, the load flag: when load is set, it takes on the new input; otherwise it holds its current value. At its core it still has a DFF — so we already have our basic unit.
00:31:39.920 Since this is just an IF statement followed by conditional calculations, translating it is straightforward. An IF is just a MUX; memory is just a DFF. Most fascinating is the DFF feeding back into the MUX, providing the previous result as the current input.
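(A sketch of the Bit, reusing the DFF and mux sketches — note the feedback: the DFF's own output is one of the MUX's inputs:)

```ruby
# Bit: a MUX in front of a DFF. When load is true, the MUX admits the
# new input; otherwise the DFF's current output loops back around.
class Bit
  def initialize
    @dff = DFF.new
  end

  def out
    @dff.out
  end

  def tick(input, load)
    @dff.tick(mux(@dff.out, input, load))
  end
end
```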
00:32:03.440 That's the entirety of a Bit, but a single bit isn't very useful. We want larger numerical values, so we repeat what we did with And16 and scale it up.
00:32:22.960 That's genuinely all a register is: a collection of bits along with an input, a load flag, and an output. It's quite simple, just packaged more conveniently. But as powerful as a single register is, it's limited:
00:32:53.840 We want to address a nearly arbitrary amount of RAM. Thus, we need many registers, plus a way to address the register we wish to interact with. Otherwise, RAM has the same structure as a register — an input, an output, a load bit, plus an address selecting which value we want to read or write.
00:33:33.760 Yet this is sequential logic, which means we need to work out the addressing: how do we route the input to the right register, and how do we retrieve the value from the addressed register and direct it to the output?
00:34:06.640 For RAM 8, we utilize a 16-bit DMUX eight-way to route the input to the designated register and a 16-bit MUX eight-way to send that register's output. The load signal is distributed to all registers.
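(A behavioral sketch — plain array indexing standing in for the DMux8Way/Mux8Way16 pair:)

```ruby
# RAM8: eight registers behind a three-bit address. The load signal
# only takes effect on the addressed register.
class RAM8
  def initialize
    @registers = Array.new(8, 0)
  end

  def tick(input, load, address)
    @registers[address] = input if load
    @registers[address]
  end
end
```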
00:34:40.160 And if we want larger memory, that’s where the complexity arises from simplicity. We're essentially stacking these like Legos to construct a larger memory bank from previous memory sizes.
00:35:08.960 Each step builds toward achieving a sufficient memory amount. So, how do you build RAM 64 from RAM 8? Again, that’s an exercise for the reader.
00:35:33.760 Now we have logic and memory in place; we’re on the home stretch. The next crucial element is a counter — it remembers where we are in our program.
00:35:56.960 The counter is responsible for providing the address of the next instruction in ROM that we are to execute. It looks something like this. It may appear complex, but ignore the extra logic; examine the bottom where the register feeds back to the incrementer.
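(A behavioral sketch of that loop:)

```ruby
# Program counter: with neither reset nor load, the register's output
# feeds back through the incrementer, so it simply counts.
class PC
  def initialize
    @value = 0
  end

  def tick(input, load: false, reset: false)
    @value = if reset then 0
             elsif load then input
             else @value + 1  # the feedback through the incrementer
             end
  end
end
```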
00:36:16.960 That's the part that astounds me! We don’t do that in programming. So, we’ve gathered all components we need to build a computer! We’ve achieved so much already and come so far.
00:36:48.560 Yet I'm worried that the next slide is a bit of a letdown. That’s a computer. Yep, we have memory.
00:36:55.560 We have a CPU consisting mostly of an ALU and some logic. We have input and output. Our CPU takes three inputs and calculates four outputs.
00:36:59.840 It takes a value from memory, an instruction to execute, and a reset flag to restart at PC zero.
00:37:05.680 The CPU calculates the value to store in memory, determines whether to write it, where to send it, and what the next PC value should be.
00:37:25.520 Figuring out the CPU and its internals was the most rewarding task throughout this entire book; I leave this as another exercise for the reader.
00:37:47.680 ROM is where the program resides and runs from address zero, much like RAM, but it’s read-only. The keyboard is simply a single direct memory access word linked to current keyboard input.
00:38:07.760 Similar to the keyboard, the screen is also memory-mapped, but it's writable as well as readable: it takes an input value along with an address and a load bit.
00:38:25.840 Memory encompasses the RAM, the screen, and the keyboard. The screen's contents and the keyboard's value are memory-mapped and can be addressed via standard pointer access.
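(The book's memory map, sketched as plain Ruby:)

```ruby
RAM_WORDS   = 16_384  # 0x0000..0x3FFF: general-purpose RAM
SCREEN_BASE = 16_384  # 0x4000..0x5FFF: the screen, 8,192 words
KEYBOARD    = 24_576  # 0x6000: the keyboard, one read-only word

def read_word(ram, screen, keyboard, address)
  case address
  when 0...RAM_WORDS          then ram[address]
  when SCREEN_BASE...KEYBOARD then screen[address - SCREEN_BASE]
  when KEYBOARD               then keyboard
  end
end
```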
00:38:50.240 This allows for impressive pointer arithmetic; you can traverse your screen and draw fancy patterns with it.
00:39:16.600 So, our original abstract architecture diagram is almost more intricate than the real thing. Constructing the computer at this point took less than a minute in source code.
00:39:38.640 You have three components; you wire them together. The connections aren't shown because they're bi-directional and cross all over the place, but that's it!
00:39:54.680 Boom! Done! It literally took me a minute to build, which was somewhat of a letdown. That’s all it requires to create a computer once you’ve assembled all the components.
00:40:18.160 So, let’s go home! It turns out that the Apollo Guidance Computer (AGC) presents an amazing parallel to this project.
00:40:41.040 Julian Simon gave an excellent talk regarding the Apollo program’s development process earlier this year at Mountain West; if you haven't had a chance to view it, I highly recommend you do.
00:41:08.360 It turns out the AGC was completely constructed from NOR gates. I'm not trying to pull the wool over your eyes with this talk — it’s definitely achievable, and it resulted in sending people to the moon.
00:41:22.360 If it weren't for all these sizable components — and if I weren't a complete klutz working on a breadboard — I would genuinely consider building one of these for the fun and the challenge.
00:41:39.840 Thank you. Because I have four minutes left, one more thing: MiniTest is an awesome Ruby testing framework. It is fast, simple, powerful, and clean!
00:42:00.640 It's in Ruby and modular. MiniTest Bisect is a plug-in that I released a couple of weeks ago that isolates random test failures. Has anyone experienced random test failures without knowing the cause?
00:43:03.520 That's what we want to address: it's called a test-order-dependency failure. I have a slightly outdated video demonstrating a previous version of this.
00:43:26.880 So, if we have numerous tests, where each dot takes a couple of seconds to run, and we have a reproducible failure, we can apply MiniTest Bisect. The file-bisection phase shown here no longer exists, due to the way MiniTest loads and randomizes tests.
00:43:53.360 The base idea is to identify the minimal set of files necessary to reproduce this failure, ensuring that subsequent phases are faster. I’ve omitted details, but the core notion remains.
00:44:12.680 It determines that two files reproduce the failure, and now it seeks the minimal number of methods required to replicate it.
00:44:35.920 And it's finished! That's all it takes to produce a minimal reproduction of this test-order-dependency failure. You can install it now with `gem install minitest-bisect`, and it's straightforward.
00:44:58.720 MiniTest GC Stats is something Aaron and I brainstormed at Coffee Line, implemented, and released that day.
00:45:21.440 So if your tests are slow, ask yourself: Is it your implementation or GC? Is it because your tests produce too much garbage? When using MiniTest GC Stats, you can determine the reason.
00:45:46.640 Running it with the `--g` flag gives you an output of your top tests by memory allocated. Aaron ran this against ActiveRecord and pinpointed the worst test, which produced nearly a quarter million objects.
00:46:06.160 You can get it now with `gem install minitest-gcstats`. Thank you!