Metaprogramming
Shaving with Ockham
Summarized using AI

Shaving with Ockham

by Jim Weirich

The video titled "Shaving with Ockham" features Jim Weirich as the keynote speaker at the MountainWest RubyConf 2008. In this engaging talk, Weirich emphasizes the importance of simplicity in programming, drawing inspiration from William of Ockham’s principle that entities should not be multiplied beyond necessity.

Key points discussed include:
- Programming Philosophy: Weirich opens with a reflection on his journey into programming, highlighting the flexibility and creativity involved in software development compared to other forms, like poetry.
- Language Simplicity: He uses Lisp as a key example of a simple yet powerful programming language, illustrating how its foundational concepts (atoms, lists, and core operations) can lead to powerful abstractions in programming.
- Historical Anecdotes: Weirich shares stories from his early career, including his work on projects using Fortran and Forth, emphasizing how those experiences shaped his understanding of language design and software complexity.
- Race Conditions in Multi-threading: He discusses challenges he faced with multi-threading in legacy systems, particularly with shared modifiable data, leading him to explore Erlang as a solution. Erlang's model of immutable data and message-passing is highlighted as a safe alternative to conventional multi-threaded programming.
- Complexity in Modern Programming: Weirich critiques the commonplace acceptance of code complexity and standards that elevate non-simplicity, arguing for the necessity of simplicity in coding practices.
- Practical Implications: He concludes with practical advice for developers, advocating for focusing on minimalism and the identification of core functionalities to keep codebases manageable and effective. He suggests that abstraction should serve simplicity rather than complicate it.

In conclusion, Weirich encourages programmers to resist the allure of complexity and instead strive for simple, elegant solutions that effectively solve problems without unnecessary elaboration. His insights resonate deeply with Ruby programmers and emphasize a core philosophy applicable to all programming languages.

00:00:11.719 All right, it is a real pleasure for me to introduce Jim Weirich. I'm sure Jim feels like I'm a little stalker, but he's my favorite person in the Ruby community. I love his approach and everything he's done; his work is elegant and beautiful. Last year, we talked about who we wanted as a keynote speaker, and I always said Jim. This year, I got Jim, so it's with great honor that I introduce Jim Weirich. Thank you very much.
00:00:52.960 But I didn't mention that there's a story behind that. You see, last year I was working for a company that didn't allow me to go out to conferences; I had to take vacation time and pay for it myself. This year, however, I'm working for Edge Case, and they allow me to do all kinds of things. So, instead of approaching me, Mike approached my boss and asked if I would come out. I only learned about it afterwards.
00:04:49.720 I want to talk about how I got here. I'm going to start at the very beginning. I was a physics major in college back in the late '70s. Boy, that dates me. Don't anybody say I wasn't even alive back then! So, I'm a physics major, and I needed a particular math course that didn't happen to be offered that semester. My adviser said, 'Why don't you take this Introduction to Fortran course? It'll be useful, and who knows, you might enjoy it.' Little did he know!
00:06:01.440 So I signed up for the course, Introduction to Fortran. I went into the class and I sat there, and I knew a little bit about Fortran because, as a physics major, I had to write programs that would graph the results of my physics experiments. I knew enough Fortran to drive a plotter and make graphs. But then the instructor, on the very first day, started writing on the blackboard, and the stuff he wrote looked like this: it had way too many parentheses to be Fortran! I sat in that class puzzled over what he had written on the blackboard. It took me about three classes. I would come in, look, and say, 'Huh, what is this? This is not Fortran. Why am I in here?' Then, on the third day, suddenly something clicked, and something turned in my mind. I said, 'I get it! I see what he's doing!' I’ve been hooked ever since.
00:07:10.400 I took all the computer science courses they would allow me to take and still finish with a Physics degree. I loved it. I remember this piece of code I saw on that day that clicked in my mind, and we're going to talk about this a little bit. Do you recognize what this is? This is Lisp. Giles mentioned this earlier at this conference, and we're going to run with that theme just a little bit. Lisp is a list processing language. You process lists of things, such as a list of fruits: apple, banana, and pear. They're very simple to construct: parentheses surround a list; they're space delimited. Nothing else you need to do to make a list. So, this is a simple list of three elements. This is another list of three elements, and the third element happens to be another list.
00:08:56.480 Let's ignore the funny tick marks for right now; we're going to get back to those. The difference between the first list and the second list is that the second list is also an expression that can be evaluated. Lisp has very simple rules for evaluating expressions. It says, 'Look at the first thing in the list, and if it's a function, apply that function to the rest of the list after you evaluate the rest of it.' So, we apply the function 'member', and if we go back to the earlier screen there, 'member' asks the question: is banana a member of the list apple, banana, pear? The answer in this case is true.
00:10:13.600 That's very interesting, but why do we need ticks? We need ticks because lists evaluate everything in the list. It figures out if the first thing is a function, then it evaluates banana. What's the value of banana in Lisp? It doesn't have a value; so, if you put a tick in front of it, the value of anything behind a tick is that thing, so it acts as a quote mark. We quote banana and we quote the list apple, banana, pear. Why do we have to quote? Let’s give you another example. Let’s surround this apple, banana, pear in the first list with 'setq fruit,' and that sets a variable in Lisp, so now the variable fruit has the value of apple, banana, pears. If we evaluate the list 'member', tick broccoli, and give it fruit, notice fruit is not ticked, so we’ll go grab the value of that; we’ll grab the value of ticked broccoli, which is just broccoli, and we ask, 'Is broccoli a member of the list apple, banana, pear?' The answer will be false, spelled nil in Lisp. We're used to that in Ruby anyways. Nil is false in Ruby as well, so that's not so surprising, and that's probably where it comes from in Ruby.
00:11:22.760 So, the syntax of Lisp is trivial. There are only two things: atoms and lists. We've seen the atoms; they're just identifiers for the most part. Numbers are also identifiers, but we're going to pretend for right now that numbers don't even exist. Lists are just things in parentheses, and you can nest lists. So the first thing is a list of a, b, c, and the second thing is a list of two elements. The first element is a sublist of a and b, and the second element is the atom c. We've seen the 'member' name list before, and this empty parenthesis thing here, that's the empty list.
00:12:31.720 In Lisp, there is this very bizarre relationship between nil and the empty list: they are the same thing. Nil represents the empty list. Another kind of bizarre choice in Lisp, I think. I think Scheme, which is a dialect of Lisp, doesn't have this bizarre relationship, but this is the list I learned back then. So, what can you do with lists? Well, there's some basic functionality: you can operate on a list. There's 'car', which takes the first element of the argument that you pass it. If you give it a list of a, b, and c, it will return a, the first element. There's 'cdr', which returns everything but the first element. You can think of these functions kind of as head and tail. Why they're named 'car' and 'cdr' is a historical accident.
00:13:25.160 The machine they first implemented Lisp on had two registers called the address register and the decrement register. So, 'car' was the contents of the address register, and 'cdr' is the contents of the decrement register, a little historical trivia for you. There's a third function called 'cons' that means construct, and we're going to construct a new list out of two things, and the list's head will be the first argument, in this case, apple. The rest of the list—the tail of the list—will be the list of a and c. So if you 'cons' apple on a, b, c, you get apple, b, c. Very simple: take apart a list, build up new lists, very simple list construction operations. We also have a function called 'equal' that compares the value of two atoms. It only works on atoms; if they're the same atom, 'equal' returns true. If they're different atoms, it returns nil. If you try to use 'equal' on lists, you get funky results; sometimes it returns true, and sometimes it returns nil.
00:14:36.200 So it’s really hard to tell when, so you just don’t use ‘equal’ on things that are not atoms; just one of the rules of the game. What else do we have? We got a function called 'atom' that asks the question, 'Is my argument an atom?' If it is, it returns true; if it's not, it returns false. We have five functions: 'car', 'cdr', 'cons' for list manipulation; 'equal' for testing equality of atoms; and 'atom' to test whether you're an atom or not. On top of that, we have some special forms. These are not functions in Lisp; they are handled in a special manner. The first one we want to talk about is something called 'cond,' which is the conditional in Lisp. 'Cond' takes a list of ordered pairs, and the pairs have a condition and a value. 'Cond' will go through and evaluate the condition, and if the condition is true, it returns the value. If the condition is not true, it moves down to the next item and sees if its condition is true and returns its value. If not, it'll continue going until it reaches the end of the list, in which case if nothing is true, it will return nil at the very end.
00:15:55.640 So it's like a big if-else structure. The key is that it's not a function in that you only evaluate the value if the condition is true. In Lisp, functions always evaluate the arguments; in 'cond', you do not always evaluate, so it's not a function, and that's an important distinction. Lambda is a function abstraction. We've seen Lambda in Ruby, right? Same thing: it creates an anonymous function with arguments and an expression that gets evaluated when you call it.
00:16:37.680 Now, this is page 13 from the Lisp 1.5 Programmers Manual. This is the Lisp interpreter minus some helper functions that are pretty trivial. Oh, we've got power! Okay, I have to walk over and touch the keyboard occasionally. Okay, cool! Um, this is the Lisp interpreter from page 13 of the Lisp Programmers Manual 1.5. Ancient, ancient version of Lisp, but if you notice, you see things like 'atom' and 'equal' in the definition of this pseudo code. And even though it's not written in parentheses, this can actually be transliterated to the parentheses style of Lisp very trivially. So essentially, this is the Lisp interpreter in Lisp itself, and the thing to point out is that you can see the 'car', 'cdr', 'cons', 'equal', 'lambda', and I'm just going to skip over 'quote'. 'Quote' is how the backtick or tick is handled, and 'cond' is right there. So this Lisp interpreter handles all those primitives I've shown you.
00:18:50.480 Now the interesting thing about these five functions, the two special forms, and the list way of applying functions is that this is a Turing complete language, which means that you can calculate anything that any other programming language can calculate. This language, composed of five functions, two special forms, and an evaluator, can calculate anything as well. It is Turing complete. Very simple constructs put together in amazingly interesting ways. Now how can you work this? We haven't mentioned numbers at all, right? We haven't mentioned anything like 'ands', 'ors', or 'nots', or anything! So how do you put these simple things together to get that kind of power? Let's look at this: consider the logical 'not'. Let's write logical 'not' as a function. Well, we use Lambda to build functions. It takes an argument, a. We run it through 'cond'. If a is true, then the answer to 'not' will be false or nil. So 'cond', a, nil, meaning the answer is if a is true. If a is not true, we'll go to the next pair, and it is true. T is true, so the answer is true.
00:19:53.520 So if a is true, the answer is nil, if a is nil or not true, the answer is true. So, it's the logical 'not' operation written using nothing but Lambda, 'cond', and some values. The 'and' is the same way: if a is true, then 'and' is true if b is true as well; otherwise, 'and' is false. 'Or' is written the same way down here. I'm not going to run through the logic, but that actually works out to be the 'or' function, so we can build up more interesting pieces of logic from the basic pieces that Lisp gives us.
00:20:53.000 How would you use this? Well, this is the function 'not'. Right? Well, Lisp says if this appears in the first position of a list that you’re trying to evaluate, that is the function to use. So all we need to do to call this function is to construct a list and pass it an argument. Now, it's a very verbose function name, but this is how you call an anonymous function without a name in Lisp. The blue part, the Lambda part, that's in blue is the first element of the list; that's the function you pass in at the T, and you apply the Lambda to the T and you get nil. You do the same thing and pass it a nil, and you would get true. So we can call these Lambda things, but you know what? If I have to go around and write this out every time I want to write a not function, that's going to get really tedious. I wonder if there's a way that we can attach this Lambda anonymous function to a variable somehow and reuse it.
00:22:11.520 Well, turns out that's not too hard. Consider this expression using 'and', 'or', 'not' function names that we're used to. How can we get our definitions tied to them? Well, we'll embed them in a Lambda. The Lambda takes the arguments 'not', 'and', and 'or'—three separate arguments—and then we make this whole thing, this whole Lambda, the first element of a function that has three arguments. The first argument is going to be the Lambda for 'not', and when you call this function, the Lambda for 'not' will be bound to the parameter 'no'. Likewise, arguments two and three will expand out to our 'and' and 'or', and they will be bound to the names 'and' and 'or'. So when we evaluate the gray part, the 'and t or nil', we will look up and we ask if it has a value; the value is yes, it has the value of that first argument Lambda which is the 'not' function. We apply the 'not' function, or excuse me, and would be the second function, but you get what I mean. Everything ties together.
00:24:10.680 Now, if you had to actually write a system like this, it'd be really, really tedious. Saying something is Turing complete does not mean that it's not tedious to write code in, but if you work in a real Lisp system, what you're really working with is the gray part and that big Lambda, and all the other stuff is kind of there already for you to run. It’s as if someone had written all that for you already. You can think of a Lisp system in this manner: taking tiny building blocks and putting them together in interesting ways. There are three pieces of this that I noticed when I was looking at Lisp. First of all, it's got a small core: just atoms and lists; that's the only thing it works on. It's got simple rules for manipulating that small core: the five functions, the two special forms, excuse me, the five functions here. The next part is it's got powerful abstractions for building new things out of those simple rules and small core, and the Lambda being the main abstraction we use in Lisp. That is why Lisp is such a powerful language.
00:26:02.480 The basis is trivial; it's simple. Anybody can do it. I wrote a Lisp interpreter—a page 13 I wrote that in Ruby the other week in about an hour, and I had a Lisp interpreter. It didn't do much. It's on my blog if you want to go check out a Ruby version of that page 13. That was kind of fun. I always wanted to write a Lisp interpreter. So, powerful abstractions—and that's what takes simplicity and makes it powerful—is adhering to these three principles. After college, I went to work down at RCA for the missile test project down at Cape Canaveral. I was working down there during the first few years of the shuttle launches. We didn’t do software on the shuttle, but we did software for the radar ranges that would track the shuttle and track various launches down there.
00:27:29.480 I was going through my files the other day, and I found this: this is the graphical configuration display and logging system's detailed design specification, a network communication protocol. This is a design spec I wrote while I was down there in my very first job out of college. It is humbling to read this. Here’s the thing: I just took a course on Ethernet, right? I learned about the seven network layer protocols. So, of course, our system had to have all seven layers of protocol in it. This is the state of networking back then; the physical layer, the data link layer, and the network layer were provided by a vendor. Everything else above that, we had to write ourselves.
00:28:19.720 TCP/IP was invented, but it really hadn't broken out of the Unix world very much, and we didn't have any Unix machines at the time; we didn't even know about TCP/IP in particular. So, we were working on this. The Cordial system was designed to take information and display it to really important people who had absolutely nothing to do with the launch, like generals—not the people who actually ran the launch like the launch controllers or the radar controllers or the people in the control room. They didn’t need this stuff; this was for the generals sitting back, and they wanted to see what was going on during a launch at any time, so we gave them status information.
00:29:36.640 Now, this is a snippet out of the spec, and I just want to show you how good I was at writing specs even that early in my career. I had it down! I can’t believe this. This is an algorithm that we would use at some level in the protocols we were designing. I have no idea if this works or not; it has no unit tests. We didn’t even write it in real code; it was pseudocode. Why did I do that? Because I didn't know any better back then.
00:31:07.920 Has anyone ever...you who have used 5 1/4 inch floppies? Okay, good! It had assembler; I think it had Basic too, but I wasn't a real fan of Basic at the time. And it had—and this is critical—this is the thing to remember. It had a memory-mapped color graphic system. And actually, while I was reading about this system on that ancient Computer Museum website, it had this little ditty that I had totally forgotten about but is absolutely correct. When you deleted a file, it repacked all the remaining files back to the front of the disk and it used the 8K of screen RAM for a buffer to do it.
00:31:38.320 And it’s absolutely right: when we would delete a file, the screen would go crazy with colors as it moves stuff using the screen RAM as a buffer. I'm really thankful that site's there to bring back these wonderful memories. This is one of the computers that we would have to display status information on. The other system that we used was a PDP-1, and it had a graphics terminal that was really fancy, but you communicated it through an RS-232 port, and you sent it very specialized graphics commands. So we had two different kinds of display systems: absolutely totally nothing in common about them. One was memory-mapped; one was kind of a command-based vector system—different architectures, PDP-1 versus an 8080 microprocessor, no language in common.
00:33:23.920 If you've never heard of Forth, it’s a very simple language. Here’s a string of Forth code: 6 1 +. The first thing you notice is what is the plus doing at the end instead of in the middle? It's because Forth is reversed Polish notation, just like HP calculators, which I had one at the time, so I thought this was really cool. We have a next pointer that points to the next instruction to be executed, and so we execute that instruction, which is a 6. Six pushes a value of six onto the stack, and the one numbers push themselves on the stack. So, one pushes itself on the stack. Then plus takes the two things on the stack, adds them together, and pushes back seven.
00:34:05.640 So it's a stack-based language, pushing things on and off of the stack. Just as you execute such a trivial language, we'll talk about this 6 1 +. Suppose I want to take this 1 + and refactor this. Forth was the language I learned about factoring and refactoring, and it’s a great language for that. We wanted to refactor this into a...well, a function. The Forth word for that is a word. We want to create a word that implements the 1 + operation. We'll call it add 1. So, to do that, to define a word in Forth, use a colon, which introduces a term, and a semicolon, which terminates a word. The first thing after the colon is add 1, and then you just put the code you want to execute in the middle. No parameters, nothing special; very lightweight.
00:35:02.760 Forth definitions longer than about three lines were really, really long. Definitions were probably about a line long. So, you built up your program in little, itty-bitty pieces. So how was this implemented? The green part created an entry in our dictionary. Forth was composed of words. Words were in the dictionary; they were arranged in vocabularies. The dictionary entry started with a link to the next entry in the dictionary, so 'add 1' had a link to whatever word came after it, whatever word was defined immediately before it.
00:36:05.760 Then, it had the string add 1 in it, and then after that was what was called a code word. That code word always pointed to machine instructions. We’ll get back to the code word here in a second. After that, we have the address of the code word of the word that implemented the number one. One was actually a function in Forth. Interesting! After that, we have the address, oh excuse me, and the code word of one points to machine code as well, and that's critical. After that, we have the address of plus, which points to the code word of the word that defines plus, and after that we have the address of semicolon, which points to the machine code of the word that implements the semicolon action, which is return from this function. So here we have push one on the stack, add it to the top two elements on the stack, leave it there, and then the next is the jump back into the interpreter step.
00:37:48.560 So it was trivial to build up these tiny little definitions in assembler. Okay, so what did a Forth system look like? A Forth system had a handful of primitive words, like plus, minus, fetch from this address, store at this address, print this number out—very simple words defined at a primitive level. Built upon those primitive words, it had higher-level words like if, while, then, emit, print a number—just basic operational things. Now, the key thing is to a programmer, the primitive words and the higher-level system words looked identical. As a programmer, I couldn't tell whether I was calling a primitive or not, it was no difference to me.
00:39:55.240 Above the system words, we had a whole slew of vocabulary words. We had an interactive compiler, we had virtual storage, we had a screen editor, we had an inline assembler, and this was all running on a machine that has less memory than your wristwatch. My very first computer was a single-board computer that had 4K of memory, and I was running a Forth system in 4K of memory. So, very tight, very concise, perfect for the machines of that day. So, the thing to remember is that these primitive words are machine-specific. If I wanted to port Forth from one machine to another, all I had to do was rewrite the primitives, and everything else was system-independent. It didn't matter; they were written in terms of the smaller primitive words.
00:41:43.760 In fact, when a new machine came out, it was often Forth that was the first language that was available on those new machines. I think IBM even had some machines that used Forth as a boot-up language; it was in ROM. You'd start the machine up, and there’s Forth, and you would use Forth to load the rest of the operating system. A lot of the Nintendo console games were written in Forth back then. So, how does this solve our graphics problem? Well, it's very simple: we just had to add some high-level user-defined graphics words, and we had to write a graphics driver that was specific to either the memory-mapped driver or the terminal-based driver.
00:43:36.360 We wrote those as primitives down at the bottom level. We wrote higher-level machine things, and it actually worked out great. We implemented the whole thing as far as graphic drivers, and on both machines, we were displaying the same thing. Now, unfortunately, I didn't stick around with the Cordial project to the very bitter end, so I don’t know if it was an ultimate success or not, but this piece of it actually worked out very nicely. So again, a small core, simple words, simple rules, just had a two-instruction interpreter, and a powerful way of building abstractions upon the lower words. Actually, it was a lot of fun to work with.
00:44:35.480 If you have heard of it, lately there has been some revival in the interest of Forth. And I think the language Refactor is essentially Forth written on top of some really fancy graphic primitives. It does some really fancy things, but it’s Forth underneath. So, it's a cool thing to check out. After leaving RCA down at Cape Canaveral, I came back to the Midwest. I wanted to get closer to family. I worked for the General Electric Aircraft Engines company in Cincinnati, Ohio. We made jet engines. I love this jet engine; it's cool! Have you ever seen this? This is called a UDF—an unducted fan engine. If you've ever looked at jet engines, military engines are long and skinny, commercial engines are big and fat.
00:46:30.960 The reason is military engines want power, so you make them long and skinny to get that power. Commercial engines want fuel efficiency, and if you flow a lot of air around the jet engine, it increases fuel efficiency, except it takes a lot of weight to make that cowling and the fans around it to make the air flow around the turbine portion of the engine. NASA had a great idea, and they collaborated with General Electric to build an unducted fan, which had rotating blades on the outside of the engine that would pull the air around the engine, keeping the central core a little slimmer than what was needed, reducing weight and increasing fuel efficiency at the same time. Really cool engine! Never caught on; I think it had something to do with those spinning blades on the outside.
00:48:11.520 They actually conducted fuselage penetration tests. Ever see on MythBusters how they have those air cannons that launch things? That’s what they did! The blades spinning at full speed would easily pierce entirely through the entire fuselage of the engine, so that might have something to do with why it wasn't adopted. But when I started working there, the big thing in jet engine design was digital controls. Back in the old days, jet engines were controlled by a series of cams and levers. So when you said 'give it more fuel,' mechanical things engaged inside the engine and they would lift levers and would increase the fuel flow. They would carve these cams out, design these cams, so you would get just the right amount of fuel with just the right amount of airflow to get the right kind of power requirements. It was tedious. It took incredibly long to get those cams designed right because you’d have to build an engine, try it out, and if the cams were wrong, take the cams back to the engineering department, reg grind them down, and bring them back.
00:49:28.320 The digital engine controls would keep those schedules internal in a computer, and the computer would monitor the airflow, pressure, and temperature. If you said, 'Give me more power,' the computer would say, 'Okay, this is what I have to do at this set of temperature, pressure, and whatever else was monitoring.' It was trivial to change the schedule in the computer. A guy sat at a terminal connected by an RS-232 port to that thing and typed in the new fuel schedule. Wow! The engine was running! I was sitting in a test cell once, and he missed a comma in the fuel set schedule, and the engine went, and the guy sitting at the controls just shut it down really quickly. They were not allowed to change the fuel schedule during an engine run after that, but it was still easy to change. It was a little too easy at that point.
00:50:57.960 This proves out, so we would get digital engine data from the FADEC, and the software I wrote would talk to the FADEC and pull that data down and we would convert that data to analog signals. Isn't this typical of a legacy system? We have beautiful, precise analog data, and we convert it—or precise digital data—and we convert it to analog so we can plot it on charts. We would take the digital data and run it to what we called digital to analog converters. DAX and DAX are essentially—they were simple; you write a value to the DAC between one and 100, and it sets a voltage out.
00:52:00.120 The output of that, the grapher would sense. So we had a DAC software driver that would drive all the DAX in the system. We had a table that was updated with the new digital data that would come in, and we had a data update software that would update this table. I want to point out that these two pieces of software were in separate threads, and that the data table in between them is what's called shared data. Now, what happens when you have two threads reading and writing, and at least one of them modifying shared data? You have the potential for race conditions. What’s someone say? Fun! Actually, it was fun. This was an exciting project to work on. This was the first time I had worked with threads, and I had gone to classes on it. I knew about all the all the things involved; I knew what to do: to set up locks around these two accesses to prevent them from overwriting each other.
00:53:31.760 And this worked great, except that we had to update those DAX every 20 milliseconds—that's 50 times a second, which back in that day was pretty fast! In fact, our software was barely fast enough. We couldn't ask it to do anything more without it running out of time to do all this updating. We did a lot of analysis and profiling, and we found out that these system locks were causing a lot of slowdowns. So, the idea was: the system locks are too slow; let's not use them! Now, I know you're supposed to be very careful about shared data, and we knew this, so we sat down and we analyzed the situation.
00:55:11.640 We went down to the machine instructions, timed the machine instructions, and we filtered it down to this particular fashion we calculated that it was very, very reliable way to update our software. In fact, there was one chance in a million of it failing. One chance in a million! You’d ride an airplane that had one chance of a million of failing, right? I mean, it's way higher than that in real life. One chance of a million seems minuscule! So we entered the test phase, put the software in the test cell, and started using it and trying it out—not on real engines yet, but we were just testing it. We found that the system failed once a day, sometimes twice a day, but not more than that; just about once or twice a day.
00:56:59.840 Let’s do some math, okay? Fifty times a second by sixty seconds in a minute by sixty minutes an hour by eight hours in a working day is about a million and a half. Once or twice a day! Doggone it! Just don’t run the system that long! Fortunately, we found this out in testing. We went back and revised the way we were doing it. We came up with a way of updating the data tables without using system locks that was 100% reliable, and I don't even remember how we did it! Now we came up with some trick, and it actually worked out very nicely. But this is a point—threaded programs are hard, and they are hard in a way that people who don’t write threaded programs on a daily basis fail to comprehend.
00:58:37.280 The reason they're hard is that shared data and you're updating and the possibility for race conditions that happen once in a million times. Your unit tests are not going to show once in a million type of failures; they'll only show up at the worst possible time or a case in the under testing. So shared data can be bad—too many variables. If you want to write concurrent and we want to write concurrent programs now because hardware concurrency is the big thing. It's what's getting us that extra boost of power now just throw more cores in the system and take advantage of it. But writing programs, I was writing a threaded program on a single processor. Think about the race conditions that are inherent when you have two processors running at the same time on the same memory. How do you deal with that? How can you write reliable threaded multiprocessing programs and take advantage of that extra concurrency?
01:00:19.000 How many people have played with Erlang? Anybody in here? Okay, I've only dabbled in it, so this is going to be a very high-level look at Erlang. But it's very, very interesting what they've done with that. Erlang has four types essentially: atoms, which are lowercase identifiers; variables, which begin with an uppercase letter; tuples, which are curly braces; and lists, which are square braces. The difference between a list and a tuple is that tuples are always the same size. If I have two tuples: A and B, you cannot concatenate anything to that; it's always going to be two elements. Lists can grow like in Lisp, we can cons one to a list; you can do the same thing in Erlang with its square bracket lists.
01:01:52.520 Now it has an operation that looks an awful lot like an assignment, so if you see this in an Erlang program, it assigns the tuple JY, comma WI, to the variable name, and that actually works. However, if you do this—assign John Doe to name—it fails, because name already has a value and you cannot reassign variables in Erlang. Variables don’t change values! I wonder if people told the guys in Erlang that variables change values; I have a little naming thing here that I'm having trouble with. That’s just the way Erlang works, okay? So you cannot change the value of a variable once it’s been assigned. The thing you're doing right there is not really an assignment at all; it's really a pattern match. You match the thing on the right-hand side against the thing that's on the left-hand side, and if it's an interesting thing like a tuple here, the atoms have to match exactly.
01:03:27.120 And any value that matches against a variable will be stored into that variable, assuming the variable is not already assigned. If the variable already has a value, it has to match that value, so it's a complex pattern matching operation. It's not assignment at all; it's pattern matching even though it looks like assignment. So this last operation, since the name contains Jim, comma Wich, and the Jim matches the Jim, the WI will be assigned to the variable last. So it’s a way of pulling elements out of a tuple. Cool! And you do the same thing with lists. If you have a list with this vertical bar between it, and it will match the head and the tail of the list on the right side.
01:04:38.400 So this thing will pull out A and stick it in H, which is the head, and pull out the BC as a list and stick that in T for tail. So, it's kind of like CDR and CAR kind of combined into one operation. So you can pull lists apart; you can put them together just like you can in Lisp. Now, this is a function definition in Erlang. And even though it looks like it's function definitions, it’s really only one function definition. Let’s see how this works. Suppose you call 'member' and you give it the atom B and the list A, B, and C. Erlang will go through the list of member definition declarations here and say does this match the first element?
01:05:59.440 Well, the first element has an item, which could be anything because it’s a variable, and an empty list. Well, the second arm is not an empty list, so that won't match. However, the third one will: the second one doesn’t match because the head of the list has to be the same as the first argument that doesn't matter. So we go down to the third one, which says the first argument can be anything, and the last argument has to be a non-empty list that has a tail. So the tail goes into T, we throw away the head of the list because we're not interested in it, and then we recurse and call 'member' with the item and the tail of the list that got passed in. So now we're calling 'member' again a second time, but this time with a different argument list. We go through, and the first one doesn't match because BC isn't empty; the second one matches because the head of the list is now matching the head or the first argument of the member function, so we're going to trigger the second thing and return true.
01:07:58.240 So the value of asking the question is B a member of the list A, B, C? is true. Bizarre! No loops, no assignment statements, no if-then-elses—it's all done with pattern matching and recursion. Wow! Well, the big thing about Erlang is creating processes. So let’s see how that works in Erlang. Let’s say we have a client and a server, and the client is going to form up a message and send it to the server, and it does it like this: here’s the double function. It takes two arguments: a PID, which is a process ID; we identify any process in Erlang by its PID, and that gets passed in P. It’s going to be the PID of the person we want to send the message to, and we pass in a value N, which is going to be a number.
01:09:07.760 Then we use the bang operator. Here, PID bang! And PID bang constructs the tuple. Self is my own PID; it’s the PID of the currently running process. So we construct a tuple that has my PID, self, as the first element of the tuple. The second element of the tuple is another nested tuple that contains the atom double and the number that we passed in as an argument. And that tuple gets sent to the server PID. So, PID, comma double, comma 3 in a nested tuple get sent to the server and then we sit and wait. We wait for an answer to come back, and we want to receive a PID and then a tuple with the word 'answer,' the atom answer, and the result. The result of this double function is going to be the result that gets sent back to us.
01:10:04.920 Well, we got to write the server-side too, and it looks similar but different. Okay, the server sits in a loop and we receive, we sit there and we wait for a message to come in, and the message has to be from—that's the from PID—that'll match the from PID. And since the variable the PID will go into that has to have the word double in the second tuple, and it has to have a number. If it matches that, we send back to the PID identified in 'from'. We send back to it the answer that has our PID; this is the server PID now, the atom answer, and 2 * N. So we double N and we send the message back to the client. So we send in a 3, we double it, we send back an answer of 6.
01:10:31.720 Then we sit in a loop and we reparse. We just lose volume. You might bump your mic there, I don't know. Testing, testing, testing, testing. Testing... to be plugged in? May battery light? Does that mean something? Yeah, that’s probably flashing.
01:11:14.640 I can go longer. Mike's been waiting a year for this and he's going to get every single minute he can out of it! Mike, everybody will be gone, and I’ll still be talking to you! So we can see in Erlang it’s very simple—a little bit strange syntax thing going on there, but it’s actually pretty simple to set up a client and a server that we can send messages to and receive messages back.
01:11:46.640 And it happens asynchronously. In Erlang, we don’t care whether the client is on the same machine, on the same virtual machine, on the same network. As long as it’s somewhere that we can get to on the same box or on the same network, this works across boxes. When we call double, we don’t care where it sits; we just want that number being doubled and sent back to us. There are all kinds of patterns available in Erlang to handle things like this, and we're just barely scratching the surface.
01:12:52.440 But simple message-passing primitives and simple recursion, simple pattern matching. Now, processes in Erlang are fast, and these are some numbers that came out of the Programming Erlang book. If I remember correctly, 3 microseconds CPU time, 9 microseconds wall time to spin up gazillions of processes. This is the per-process spin-up time, spinning like thousands and thousands and thousands of processes in an Erlang program. So processes are cheap in Erlang! Erlang is not object-oriented. In Erlang, the processes are our building blocks rather than objects. Just like in Ruby, we send messages to objects; in Erlang, we send messages to processes.
01:13:41.960 And I think there's some parallel in there—a parallelism in there that I don't quite grok yet because I haven't used Erlang enough. But it’s very, very interesting. Now remember what was bad about my example of a threaded program? What was the killer feature? Shared data! Actually, technically shared modifiable data. Now in Erlang, you have no modifiable data. I cannot reset a variable once it’s been set. There is no way to modify data; there's no way to share data that can be modified between processes. No external locking required; it all happens automatically through the primitive send and receive operators built into Erlang.
01:14:38.520 So Erlang totally avoids 99% of the problems involved in writing concurrent programs. Well, maybe 90% 'cause you still got to design the thing to work on processes, but it eliminates a lot of the problems. Again, we see a small core: atoms, tuples, lists, and processes. We see simple rules—no shared data—and we see powerful abstractions. In this case, it's messaging and pattern matching that we’re using for our powerful abstractions.
01:17:57.560 It all kind of depends on how you count languages or notations, right? The number is not going to be exact, but this is how I counted it. We see the <div> and the <slash> <div> and the <br> and the <a> tag—okay, so that’s HTML. We identified that. The yellow stuff, the display equals and then a little later we see a none—that's CSS. The purplish-bluish stuff that's tag live, which is a Java library that lets you evaluate expressions nested inside of all this stuff—there's a lot of that sitting there. And I had someone say, 'But that's just XML, which is the same as HTML.' They say yes, but it’s improperly nested inside the other HTML, so it doesn’t count. It’s different! It’s a different notation! I mean, you cannot put that cCo in if right there in a reg within HTML; it’s not—it’s a different language! Totally nested! In fact, this is bizarre: start at the top; we have a single quote, then we have a double quote. Go down the next line; we got another single quote, but it’s not a matching single quote; it’s a different single quote!
01:19:42.080 And follow down; now we got another double quote that doesn’t match the original double quote we first had. Now we got another double quote that does match now we got a single quote that closes the first single quote, and then we have a double quote. What? Okay, that matches the currently open double quote, and then we have a single quote that closes the original single quote! My gosh! I have no idea how you would write a parser for this! I would have no idea how to write the parser for this! There’s also Java down here at the bottom. We request the get context path—that’s Java. The <percent> angle brackets on there—that’s not really a language, but it’s part of the JSP notation, so I’m counting that as separate.
01:21:09.239 And of course, we got JavaScript embedded in there too! So, yeah, I counted seven languages. It kind of how you count it! This is a mess! Aren't you glad? I'm kidding, not kidding. This is a real page from a client I worked at. Sanitized! Okay, so you can’t tell the domain anymore; so I’m not going to put blame on them, but this is common in the Java world. I’ve been working on a Rails project, and it’s not that far off from a Rails project either sometimes! So, you know? Complexity, complexity! We are plagued with complexity in all the software we write.
01:23:11.440 I'm not going to bother to talk about C++ or Java generics; just imagine for yourself what they're like. So how do we come to this point? How did we get to the point where this complexity exists in the systems that we're writing? I think there are two basic reasons. First, we as programmers, like the quote from Frederick Brooks, we love to build our castles in the sky—our castles built on air—and we love it! This is what we’re here for, is to do this kind of stuff! So if it's complicated, that’s fun for us.
01:24:08.760 Right? This one took me a while to get! If you—I think most people have got it now—but he’s a little bit intent on solving a problem that he doesn't look around and see there’s other solutions involved! I think we get that way sometimes, which leads us back to William of Ockham. And he said—I don’t know what he said—no, this is what he said: 'Entities should not be multiplied beyond necessity.' Or if we put this in programmer terms, the simpler solution is the correct solution in just about every case.
01:25:29.120 So I love this quote. This is Tony; he was in computer science before I was. He’s the guy who invented—or discovered—the quicksort, and he worked on parsers and compilers and, actually, in the very early days of computing, built one of the very first Algol compilers. He said this: 'There are two ways of constructing a software design. One way is to make it so simple there are obviously no deficiencies, and the other way is to make it so complicated there are no obvious deficiencies.' The first is much, much harder than the second! So, quite simply, it is hard! We live in a society that glamorizes complexity.
01:26:56.680 I was reading a rant the other day. It was one of your typical Java versus Rails type rants, and he said, 'Ruby will never become mainstream because I have written a 100,000 line Java program, and Ruby could never handle that.' Well, first of all, Ruby would never need—you would never need 100,000 lines of code in Ruby to do the same thing. But complexity is what they’re thriving on. And we live in a society that counts lines of code as a productivity measure! How crazy is that?
01:28:01.400 So when I go in and refactor things down to half the number of lines of code, it might be negatively productive! That doesn’t make sense! Okay, don’t get smug. So we live in a society that values complexity. The second issue is that we as programmers love to abstract our solutions. I don’t think the world needs a foodata Slw word processor, and yet we write a generic parser that will parse everything in the world.
01:31:42.840 I haven't had a chance to dive in and use it yet, but the philosophy behind it—the idea of everything is Ruby and keep it simple all the way down and consistent all the way down. Remember: small core, simple rules, powerful abstractions. And I see that in Rubinius all the time, so my message for you—the most important thing I have to say to you tonight is this: simplify what you’re doing in your code! Don’t get so head down in solving this problem that you don’t see the simpler solutions around you. And the key to simplifying is to identify that small core of functionality that really solves your problem, develop the simple rules for that small core, and use powerful abstractions to get there.
01:35:34.560 I’ll take—I’ll take questions.
01:36:05.560 See, my message was so simple there are no questions! My brain’s like at this point! And I know mine too! I’m going to say it again: I really thank you guys for being out here for so late! This is brain-draining for me and I have the adrenaline going on my behalf. So I thank you for staying awake and listening. You do have a question up there.
01:36:35.560 I don’t have a question, but I have a comment about the VAR variables. If you think about them as mathematical variables in the equation you have—they don't change values there! So if you want to change your talk to not make fun of the developer but get the point, you can! You could change it.
01:37:19.440 But you can still make, you know, that’s a good point! Because I know when I first started learning Fortran way, way, way back before you were born, saying X = X + 1 just blew my mind! What does that mean? Yes! Although it's fair to say that the variable does change at once from an undefined state to whatever its value is if you look like that!
01:38:12.480 Yes, we will grant that point! I'm going to Yes one more comment! In your example with the 20 languages, there was actually a PA error in the CSS. Was it the single quotes in the CSS? So sign needed to be a code on? Oh right! Oh, it did display colon! That was dis—yes it did! You’re right! And I actually did copy that from a real system, so I don’t know what I was—I did genericize it a bit, so that may have been a typo. We're not going to blame this; it might be my fault!
01:39:17.440 But yes, that's real! Just a quick comment! It is probably fair to say that in Ruby, constants are.
01:40:16.960 Oh, Ruby constants are! Yes, that’s very true. Anything else? Yes!
01:40:26.640 In terms of simplification, where do you lean towards like meta-programming to make things drier versus magic? When the meta-programming’s kind of—I personally love meta-programming.
01:41:00.080 And I will confess: at least I love Lisp! Lisp was my second language—okay, after Fortran. I love it! But in terms of simplicity, you know, I will write meta-programming. You have to to play with it and do it! But in a real program, when I want to keep it simple, I say does this solve a problem I’m having, and if meta-programming solves a problem and makes the entire solution simpler, then it is the right answer!
01:41:48.800 If you’re doing a whole bunch of meta-programming to remove one line of code, then I usually say, 'Joe, that's not a good idea!' Truthfully, I do that too, so it’s not! That's not just a thing on Joe, but he’s—you gotta pair program with this guy sometime! It's fun! Time for one more? One more comment? Yes!
01:42:28.840 Are you coming to the Hack Fest? You know what? I think I'm going to grab some food and I'm going to come up to the Hack Fest. I'm going to simplify all your code!
Explore all talks recorded at MountainWest RubyConf 2008
+28