Implementing Virtual Machines in Ruby (and C)

Summarized using AI


Eleanor McHugh • November 10, 2016 • Cincinnati, OH

In her talk at RubyConf 2016, Eleanor McHugh delves into the intricacies of implementing virtual machines (VMs) using Ruby and C, which cater to both philosophical and technical dimensions of computer languages. The session aims to demystify the concept of virtual machines for attendees who may have some exposure but lack deep understanding. Throughout the presentation, Eleanor highlights the significance of various components necessary for modeling a computer-like machine in code.

Key points discussed include:

- Understanding Virtual Machines: Eleanor explains what virtual machines are and how they are essential in various programming languages and game emulators, often without users realizing it.

- Machine Philosophy: The talk touches on a whimsical view of machines having opinions, emphasizing the need for human empathy during the creation of software.

- Types of Virtual Machines: She outlines various VMs like the JVM, .NET CLR, SECD machine, and C virtual machine, and their roles in programming language execution.

- Hands-on Demos: Eleanor showcases practical examples, including building simple VMs using C and Ruby, where she introduces basic operations like push, add, and print.

- Memory Management: Memory challenges in programming are discussed, emphasizing the importance of handling and storing state, stack management, and the execution flow through dispatch loops.

- Optimization Techniques: She shares insights on optimizing code execution through techniques such as direct threading, explaining how this can lead to efficiency gains.

In addition to theoretical discussions, Eleanor shares anecdotes from her programming experiences and nostalgia surrounding older languages and platforms, highlighting the evolution of computing practices. She creatively relates modern programming challenges back to foundational concepts found in retro gaming and older languages like BASIC.

The talk concludes with Eleanor encouraging developers to deepen their understanding of virtual machines and their implementations in various programming languages, suggesting that deeper knowledge could lead to improved programming practices and innovative creations. Overall, attendees are left with a blend of philosophical insight into machines and practical knowledge on virtual machine implementation, fostering a richer understanding of the subject matter.


RubyConf 2016 - Implementing Virtual Machines in Ruby (and C) by Eleanor McHugh

Most of us who've played games or worked in any one of a number of popular programming languages will have used virtual machines but unless we've taken a course in programming language design we probably have only a loose idea of what these are and how they work.

In this talk I'll look at the various parts necessary to model a computer-like machine in code, borrowing ideas as I go from real-world hardware design. We'll use a mixture of C and Ruby as our modelling languages: C is the lingua franca of the VM world whilst Ruby is the language which brought us monkey-patching...


00:00:15.600 Excellent! I can't see any of you, which is a bit of a relief today because there are people in this audience who are going to know what a pile of rubbish I’m about to talk about.
00:00:20.860 So, wherever I go, I find people building VMs, and I can just wander around conferences and dabble in a bit of this and a bit of that. Maybe they taught it to you in university; today isn't really going to be that kind of talk, because it's kind of mad.
00:00:27.279 Hands up, anyone who has ever thought it would be a good idea to write a virtual machine in Ruby? Okay, there are not a lot of us. Now, hands up anyone who has ever seen any random talk I’ve given on this subject in the past. That's good!
00:00:36.160 I’m trying to be more artistic as I get older and actually include more human-friendly bits in my slides. I promise the wall of code effect will kick in soon.
00:00:41.260 We're going to build machines for the simple reason that we talk to machines, and machines talk to us. We spend an awful lot of time worrying about process and testing, trying to force the machine to do what we told it, but we spend very little time asking machines what they think.
00:00:48.160 That’s a bit of a strange point of view because if you think about it, you can model everything as a machine. We’ve got a whole pile of parts as a set of machines bound together in an overarching machine.
00:00:58.960 Machines obviously have opinions because I’ve got a hell of a lot of opinions. We like to think philosophically and wander around, talking in all this lovely language about thinking stuff, but I think our machines need some good engineers and some good analysts. And I do mean psychoanalysts in a lot of cases.
00:01:04.268 A lot of machines are born buggy and they need as much love and comfort, and empathy from us as they can get in their little silicon diodes. So this is sort of a talk about machine philosophy.
00:01:12.820 I promise this is the only part that’s going to be excruciatingly human. I’m going to frame it in a language we will understand: Ruby. And we’re going to love learning about machines today.
00:01:20.140 Hopefully, I’d say a thousand flowers bloom, but I can’t help but think of nuclear bombs going off whenever anyone says that. I don’t really want this audience to run away and build a thousand digital nuclear bombs and unleash a thousand awful languages on the world.
00:01:28.990 Unless, of course, you do want to do that! So, I’m going to do that by making some machines of our own. There are images of machines.
00:01:35.370 We’re going to build meta machines today. Who here has played with system virtualization—anyone? I could filter you out on Parallels, VMware, or VirtualBox users, but most people think of that when they think of virtualization. Or maybe they think of containers and Docker, which is a sort of lower-level abstraction.
00:01:42.130 We’ve all at various times played with hardware emulation, and this is the point where I now do what you’re not supposed to do and break the fourth wall.
00:01:47.799 There is an entire world of online game communities obsessed with playing games from the 80s—the whole retro gaming culture. They rely on emulators to do that, which means somebody has to implement old machines in software to make them work like old machines.
00:01:54.850 I still run a Psion Series 5 environment, emulated, on one of my modern palmtops, because it’s got one of the nicest basic interfaces for typing and getting random stuff out of my head when I’m traveling.
00:02:01.360 Plus, it’s really nice to program in old languages and their version of BASIC. They never called it BASIC, and it was so bare-bones that you ended up having to do your own memory management to do anything. Which is BASIC!
00:02:07.600 Memory management is so much fun! This down here is Doom running, and it thinks it’s on RISC OS being emulated in Linux, running in Firefox, written in JavaScript. It has a full networking stack and it works!
00:02:13.390 You can actually play Doom together from your web browser. So we just kind of need... and over here, I don’t think any talk about emulation is complete without mentioning Einstein, the Newton MessagePad emulator.
00:02:19.480 Now, this flattens modern phones: they're not powerful enough to emulate a Newton, which is bizarre because the Newton used a StrongARM processor, an actual ARM chip. Yet modern quad-core ARM chips at 1.6 gigahertz have great difficulty emulating a 166-megahertz single-core StrongARM chip.
00:02:25.450 This is just phenomenally weird. You’d think it would be much easier. The kind of VMs we’re interested in today are more about program execution. We’re interested in what a machine is, and that’s not what somebody else’s machine is.
00:02:31.209 It’s about figuring out random machines we can build. I have four basic virtual machines up here. This over here, nobody even thinks of it as a VM. This is the .NET environment—Common Language Runtime.
00:02:37.150 At one time, you’d never program that as a VM because it’s taken care of by compiler tools, but it’s a virtual machine that describes how .NET actually hangs things together. We all know the JVM because it’s almost impossible to get away from it.
00:02:46.360 I mean, my inbox is so full of Java job ads, and I haven’t done Java commercially since 1998. Just the demand for Java programmers is so high that someone who hasn’t done it since 1998 looks like a fair bet for a job.
00:02:51.599 I don't think it's even the same language in many very basic structural ways. These down here are two of the simplest virtual machines we ever deal with.
00:02:58.540 As for the fourth VM, any Forth programmers might find this interesting. This is the C virtual machine or, at least, a simplification of it. C has a virtual machine; when you write a C program, you're programming for a virtual machine—you just don’t know it.
00:03:04.769 And this over here, who loves functional languages? Right, this is very famous—the SECD machine (Stack, Environment, Control, Dump). I can never remember what the acronym stands for.
00:03:09.970 But basically, most functional languages are built using this particular virtual machine structure inside because it’s the easiest way to actually guarantee functional properties while living in an environment that’s full of mutable state.
00:03:16.480 Inside your functional language is a mutable runtime. You’re running on Erlang, and you think you're safe—the BEAM VM has mutability in it. You need mutability at a certain level.
00:03:21.510 So this is my really bad and awful hand-drawn diagram which we will see bits of throughout this talk, explaining approximately what a computer is. These timings are all very approximate, and some of them aren't truly accurate—they just look true.
00:03:28.120 When we're programming a computer, we have a series of instructions we want to get executed and some state that makes them happen in the right sort of order. So, we’ve got this lovely CPU here.
00:03:35.370 My MacBook Air can execute a single opcode in about a quarter of a nanosecond. Thanks to yesterday, I’m finally going to get my measurement right for once: I’m not good in three dimensions!
00:03:42.130 That’s about a nanosecond: in the time my MacBook takes to access a register or do an addition, light will only travel about that far. That's really small. We don’t normally think about doing things in that frame because light hasn’t gone very far, but it’s important to know because all of these are orders of magnitude apart.
00:03:55.979 My registers are as fast as my CPU, for all effective purposes. I’ve got some kind of stack that probably lives in those registers that allow me to perform calls and returns. I’ve got a cache, first level and second level, to talk to main memory, because my memory is slow and has a narrow pipe to access.
00:04:02.070 Furthermore, if that’s a register access, a hundred of those in a row is a heap access. You can start to visualize why some things are a lot slower than others in a way that you can’t with just numbers on a board.
00:04:08.160 We’ve got a whole pile of buffers that control our I/O, so we can talk to the outside world. We've got this lovely hard drive over here—who still uses a hard drive in anything that they actually run on? Who has moved over to SSDs?
00:04:14.940 Do you think the speed gain you get out of using an SSD is just because it's 500 megabytes per second transfer speed? It’s the latency! By the time you get here and are talking to a hard drive, you’re getting something every few milliseconds or tenths of milliseconds.
00:04:25.680 Every time you want to talk to it, you’re waiting milliseconds for the drive to move the head to another place before it can start reading. At this point, that’s phenomenally slow compared to the computer: on those timescales, it’s as if you’d sent off a space probe and were waiting for it to phone home before this is finished.
00:04:32.100 That’s the difference in orders of magnitude! By the time you get to the network, you're now so swamped by the cost of light and electrons moving that this is completely decoupled from the idea of computing.
00:04:38.670 Every now and then, over long latencies, data turns up from a remote place, you do something with it, and then you send it back, and wait a very long time.
00:04:44.430 I spend far too much time poking around in these things. Normally when I give this talk, I’m giving it to a Go audience.
00:04:49.100 The first thing I dig into is playing around with memory because, yes, it’s a horrible problem to solve. Being able to use a piece of memory for two different types at the same time just doesn’t work like you think it would.
00:04:56.490 In Ruby, if I start talking about memory first, I’m going to start talking about Fiddle pointers, because there’s this whole thing in the standard library called Fiddle, a foreign function interface.
00:05:02.400 How many people really want to think about malloc right now? So we do dispatch loops instead—they’re more fun!
00:05:08.690 Dispatch loops are where machines are actually being machines. Those are where machines have to deal with the fact they’re stateful and they live across time.
00:05:14.160 A CPU effectively consists of a dispatch loop that fetches some instructions, decodes them to figure out what to do with them, and then executes.
00:05:20.429 The execution is quite interesting because it’s not part of the decoding, it’s a separate thing, where a lot of work happens.
00:05:27.020 There are people who think, 'Oh, it’s just like I’ve got an instruction, so it just happens.' But it’s under the hood; there are millions of transistors going off in your average modern piece of hardware to figure out what the actual decoding should be.
00:05:34.120 You’re usually not running on a machine that has the same instruction set natively as the one you think you’re programming for. They all effectively do dynamic recompilation in core to look like some bizarre selection of supported instructions.
00:05:41.780 This is something called transport triggering: effectively, you write to a couple of addresses, and then something happens because you wrote the right magic number. This is not how programmers think, especially if they’re doing web apps!
00:05:48.200 We’re going to start off with a C switch. I’m not really going to talk in great detail about the code—there’s loads of code in here because there were people who downloaded my slides. This is the closest thing to publication I ever get.
00:05:56.130 There are a few simple structural features, though, that are easy to point out. Firstly, I’m taking for granted we have some kind of stack abstraction we can push things on and pop things off.
00:06:02.070 That’s all we really need to know about it at this level. We don’t care what kind of stack we might have. Later in the talk, we might have time to introduce you to the two main kinds of stack, but then again we might just run out of time.
00:06:08.320 The first thing is we need some opcode; we need some instructions that our program can run. I've defined four very simple instructions: we can do a push, an add, a print, and exit a program.
00:06:15.780 Those are literally just numbers; they're just labels that I need to be able to use later to do something down here. We have a nice example of a program where I've been really lazy—these are all integers.
00:06:22.680 There’s no attempt at any clever typing to differentiate instructions from numbers. In fact, inside the average processor, there’s no level of attempt to differentiate numbers and instructions.
00:06:29.560 That’s why you have to be careful with buffers and not let people overrun them—because all kinds of weird things can happen. As an aside, many of the best games ever written for the Zilog Z80 used opcodes that Zilog didn’t know existed.
00:06:35.590 They were never part of the specification—they were bugs, accidental overlaps of particular sets of gate paths that thirteen and fourteen-year-olds in their bedrooms discovered by just trying to write every single possible number as an instruction to see what happened.
00:06:42.600 This led to emergent behavior even in a process as simple as that! The designers didn’t even know it was there, which I think is really amazing.
00:06:48.180 So we’ve got a stack, a stored state, and a program that’s going to run. We have a very simple interpreter. I’ve put a couple of registers in, because the easiest way to think about a lot of operations is to be sure I'm handling the operands correctly.
00:06:55.950 It’s nice to know the left and right operands that give me the answer. I'm easily confused by this stuff, so I need to make it as explicit as possible.
00:07:03.780 We’ve got this little loop that will just go on forever; we can leave that running, and we can just inject numbers into it, and it will try to do something within the frame of what it knows how to do.
00:07:10.600 It's a very dumb machine; this one. All of the ones I’m going to show you today are, but as you can imagine, push, add, print—these could all be much more complicated operations.
00:07:19.330 Anything you can imagine as a discrete step in some kind of process can be an operation. And eventually, we just interpreted this program. Now we can do this sort of stuff in Ruby as well.
00:07:25.600 People don’t do this sort of stuff in Ruby that way. We’ll get to some ways later that are much more elegant, I think, but it’s very simple.
00:07:32.640 Yet again, the code is almost identical; it’s identical in structure. We can use an array as a stack because arrays have stack semantics, which is nice.
00:07:39.390 We’ve got a program counter just so that we know where we are in our program; we can step through, and you’ll notice sometimes our operations themselves read the program.
00:07:45.370 Yes, this is a proper Turing machine; it can read its own tape and decide to do things with the tape based on what it’s read. So, we’ve got everything there you need to go off and build moon landers, nuclear bomb timing controllers, or probably not—
00:07:52.540 because we don't have any concept of time in here. Yeah, and as you can see, we have a very elegant way of expressing our program.
00:07:58.840 We’ve got symbols now. Symbols under the hood are basically just incredibly big numbers that aren't going to collide very often, so being able to read push 13 and so on, compared to having to cast PUSH, is just a little bit nicer.
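A minimal Ruby sketch in the shape she describes, with an array as the stack, a program counter, and symbols as opcodes dispatched through a case statement (the opcode names and sample program are my own illustration, not the slide code):

```ruby
# A minimal stack VM: array as the stack, a program counter stepping
# through an array of symbols and literals.
def interpret(program)
  stack = []
  pc = 0
  loop do
    case program[pc]
    when :push            # the next cell holds the literal to push
      pc += 1
      stack.push(program[pc])
    when :add             # pop two operands, push their sum
      stack.push(stack.pop + stack.pop)
    when :print
      puts stack.last
    when :exit
      return stack.last
    end
    pc += 1
  end
end

interpret([:push, 13, :push, 29, :add, :print, :exit])  # prints 42
```

Because the machine can read its own tape, the same loop works unchanged however complicated the individual operations become.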
00:08:05.400 I think I prefer Ruby to C anyway. I've been wondering if this is going to be my favorite VM for all time because I put in a proposal about six years ago for real-time Ruby; I got turned down for being ludicrous.
00:08:12.360 It’s not ludicrous anymore, which really cheers me up. So, we can do slightly more complicated things.
00:08:18.900 What we were doing previously was just having a great big switch statement, but switch statements are quite slow; the C switch statement carries a whole pile of boilerplate under the hood that slows down the dispatch.
00:08:26.850 What we can actually do is optimize that a little bit by not dealing here with a switch, but by dealing in a set of pointers that will go off and do something.
00:08:31.950 So with direct calls, effectively, when I read a program, it tells me exactly where I should go off and run a piece of code. I don’t have to interpret it.
00:08:37.560 I’ve already got it effectively compiled down to where it lives, and it’s a direct call, so it’s nice and easy to understand when you’re actually writing these things.
00:08:42.960 The code is shorter; in fact, most of it’s just disappeared. I mean, basically, we just start interpreting by calling an opcode and reading an opcode, and then we just go round and round and round.
00:08:50.130 We can sort of do something similar in Ruby. I’ve put a few bits of additional handy boilerplate in here, and I've also included a problem that I’m going to explain and then show you how to solve.
00:08:57.160 Firstly, we want to have anything that loops round permanently and also have a nice clean exit form. We can use ‘catch’ and ‘throw’ for that because that’s quite cost-effective for un-winding the stack.
00:09:03.170 That will get us out of our infinite loop at the top. I’m doing something a little bit strange—I’m taking in a program that’s basically a string or an array of different data I want to pass through to my dispatch loop.
00:09:15.590 I’m checking to see whether or not any of those things that are being partially passed in are methods that I actually understand. I’m compiling a list directly down to method calls.
00:09:22.600 I expected this to be quite difficult to port from C to Ruby, but it turns out it's trivially easy. So now when I’m actually running my program, my VM object is just off calling itself.
00:09:31.629 It’s looking up a method to call whenever it needs to execute something. I’m sure there’s some kind of stack nightmare going on, but I’m not going to worry about that.
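The direct-call version she describes can be sketched as a VM whose compile step turns opcode symbols into Method objects up front, with catch and throw giving the clean exit from the infinite loop. The class and method names here are my own illustration, not the slide code:

```ruby
# Direct-call dispatch: symbols are "compiled" to Method objects once,
# so the run loop just calls them instead of re-interpreting names.
class TinyVM
  def initialize
    @stack = []
  end

  def push(operand)
    @stack.push(operand)
  end

  def add
    @stack.push(@stack.pop + @stack.pop)
  end

  def print_top
    puts @stack.last
  end

  def exit_program
    throw :done, @stack.last
  end

  # Turn [:push, 13, ...] into [#<Method: push>, 13, ...] ahead of time.
  def compile(program)
    program.map { |op| op.is_a?(Symbol) ? method(op) : op }
  end

  def run(program)
    code = compile(program)
    pc = 0
    catch(:done) do          # throw in exit_program unwinds this cleanly
      loop do
        op = code[pc]
        if op.is_a?(Method) && op.arity == 1
          pc += 1            # one-argument ops consume the next cell
          op.call(code[pc])
        else
          op.call
        end
        pc += 1
      end
    end
  end
end

TinyVM.new.run([:push, 13, :push, 29, :add, :print_top, :exit_program])  # prints 42
```

Compiling once up front is what saves the per-instruction lookup cost; the loop itself stays as dumb as the switch version.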
00:09:39.480 Now, over here, I’ve got a dangerous method. This is a method I don’t want running if this program down here is actually malformed because it raises an exception; it never actually completes running the program.
00:09:46.790 So I’m going to solve that because you should never allow data into your program without properly sanitizing it to make sure it’s safe.
00:09:54.480 We could have the VM equivalent of an SQL injection attack here, but we don’t, because we’ve got a blacklist that says: I’m not going to allow any program it runs to call my compile or interpret methods.
00:10:02.190 I’m also not going to allow it to run any method that's defined on an object; it’s only allowed to run things that I, as VM, have decided to allow.
00:10:08.030 And, obviously, it goes without saying that the code from before would be used in a subclass, and we’ll see that later.
00:10:14.200 Also, because we’re compiling, it’s kind of nice to give some useful errors back to people to explain why you’re refusing to compile the program they've given you.
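A hedged sketch of that sanitising compile step: a blacklist stops programs from reaching the VM's own machinery or anything defined on Object, and unknown names get a useful error. All the names here are illustrative, not the slide code:

```ruby
# A sanitised compile step: only opcodes the VM has explicitly exposed
# are allowed through; everything else fails with a useful message.
class SafeVM
  BLACKLIST = [:compile, :run, :instance_eval, :send].freeze

  def initialize
    @stack = []
  end

  def push(n); @stack.push(n); end
  def add;     @stack.push(@stack.pop + @stack.pop); end

  def compile(program)
    program.map do |op|
      next op unless op.is_a?(Symbol)   # literals pass straight through
      if BLACKLIST.include?(op) || Object.public_method_defined?(op)
        raise "#{op} is not allowed"    # the VM's own machinery is off-limits
      end
      raise "#{op} is not known" unless respond_to?(op)
      method(op)                        # safe: compile down to a Method
    end
  end
end
```

The Object check is what blocks the "SQL injection" class of program: nothing inherited from the base object model can be named as an opcode.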
00:10:21.600 Down here, we allow people to pass in actual methods, not just names of methods. I thought it would be quite nice to allow them to pass in random methods.
00:10:28.150 Who knows? Somebody might have written a clever method that happens to close over my current object and do something useful, but I want to make sure that isn't on my blacklist.
00:10:35.020 So first, I’m going to check it’s not called something I've said I don’t want to deal with. I can’t do this in the Go version of this talk at all; there’s no way to do most of this stuff in Go, which is frustrating!
00:10:41.310 Seven years of abusing its reflection library—I cannot make it do this, which is really frustrating. But we can then go through and check, 'Is that method actually known?'
00:10:46.480 There’s no point compiling it in if it's not known, and you could say that’s a little bit naughty in Ruby because my object might need that method in the future. I’m sort of cutting it off from being able to do lazy compilation.
00:10:54.200 This is a bit of a weird concept to start with because in this case, I have a method I have to unbind from whatever object it originally came from and bind it to the current object.
00:11:00.000 This gives us a lovely thing you can do in Ruby but, I'm sure, shouldn’t do in production code. I can now effectively inject a method at runtime and say, 'Please handle this dispatch.'
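A sketch of that runtime injection, with one assumption stated up front: Ruby only lets you rebind an unbound method to an object that is a kind of the method's origin, so here the donor opcode lives in a module the VM includes. The class, module, and method names are my own, not the slide code:

```ruby
# Foreign opcodes live in a module so rebinding is legal on any Ruby.
module ExtraOps
  def double_top
    @stack.push(@stack.pop * 2)
  end
end

class InjectableVM
  include ExtraOps

  def initialize
    @stack = []
  end

  def push(n); @stack.push(n); end

  # Accept symbols or foreign Method objects; re-home methods on self.
  def compile(program)
    program.map do |op|
      case op
      when Symbol        then method(op)
      when Method        then op.unbind.bind(self)  # inject at runtime
      when UnboundMethod then op.bind(self)
      else op
      end
    end
  end

  def run(program)
    code = compile(program)
    pc = 0
    while pc < code.length
      op = code[pc]
      if op.is_a?(Method) && op.arity == 1
        pc += 1
        op.call(code[pc])
      else
        op.call
      end
      pc += 1
    end
    @stack.last
  end
end

donor = InjectableVM.new
vm = InjectableVM.new
vm.run([:push, 21, donor.method(:double_top)])  # => 42
```

The interesting part is the `Method` branch: a method plucked off some other object is unbound from its origin and bound to the running VM before it ever gets dispatched.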
00:11:06.860 It works perfectly, and I don't have a very good way of faking that in Go. At the same time, I’ve got to put in the normal boilerplate to ensure people aren't calling things on the blacklist, so this is effectively a sanitized, sandboxed VM.
00:11:11.560 It's already able to check that dangerous instructions aren’t allowed. There aren’t any talks I’m aware of that show you how to do that in C, because it would be a phenomenal amount of code.
00:11:19.270 I’m really pleased with this; it took me a day to knock together, and I was just so wowed! Our interpreter is exactly the same as it was.
00:11:29.680 Here’s some program code that will run; some of it is going to pass and some of it is going to fail. All in all, it’s a very small amount of code to actually have a decent VM structure.
00:11:38.060 We get into the bits that always make my head hurt. Direct calls and switches are available in... well, switches are available in every programming language; we can fake them with if-else statements if we have to.
00:11:45.210 Direct calls are available in any programming language that will either give you function pointers or allow you to dynamically call a function. Once we get into indirect threading, we start moving into territory that used to be specific to GCC.
00:11:51.760 C has finally acquired this ability to have a label in a program and actually treat that address as a useful bit of data with which to figure out where to do a jump.
00:11:58.509 This is a great way of speeding up the insides of dispatch loops. Instead of having to individually work out the next one and jump to it or do a function call which creates stack frames and all the additional nonsense, I can just say, 'Well, that’s like 40 bytes away; please jump back 40 bytes.'
00:12:04.570 This is going to be one opcode, and it’s going to be in the cache—it’s going to be very inexpensive. We’re running with this now at ten to the minus nine, rather than at ten to the minus seven.
00:12:10.820 This gives a hundredfold theoretical speed-up, which is lovely. I don’t like the notation that you have to use in C—double ampersand looks entirely wrong.
00:12:17.280 This bit of code I think is relatively easy to follow. We’ve got clear go-tos telling us where to go and things like that.
00:12:24.090 Indirect threading isn’t as fast as direct threading; direct threading is effectively JIT compilation of indirect threading, for one of a host of bizarre terms we use in the industry.
00:12:32.770 What we’re going to do is write a routine that knows how to compile a whole set of the opcodes to the offsets they would actually have in the function that's the dispatch loop.
00:12:39.600 When I first wrote this program, of course, I forgot about the compilation step and wondered why I kept getting seg faults. But you don’t even reliably get a seg fault.
00:12:46.360 This is the point where I always need a whiteboard. I have a call stack; when I call a function, I’ve got a whole pile of stuff on that call stack that now makes sense.
00:12:52.930 I’ve got addresses, relative to the start of that function's code, to jump to. Then I return from that, but I keep all the numbers I had in there, the numbers telling me those locations, and I carry them back up with me.
00:12:58.670 This seems quite reasonable if you’ve just spent seven years working in Go, where escape analysis means everything in that situation is automatically shoved onto the heap.
00:13:05.620 But C doesn’t have escape analysis; so the moment I dump the stack frame by going up the call stack, all of that’s gone. I’m no longer accessing memory that’s actually paid for by my process.
00:13:12.440 Sometimes it will be accessible, sometimes it won’t; it comes down to whether or not the operating system had time to clear out a stack frame and do various bits of patching.
00:13:18.740 Sometimes, I could effectively get a compiled version of the program back from a dispatch loop after it’s been called once, and it might run perfectly well the next time I ask for it; other times, it will seg fault.
00:13:25.430 So I thought it would be nice if we fixed that. The only way to really do direct threading and be able to understand how the code works is to set up a compiler.
00:13:33.920 When you go into your interpret function, it’s actually going to ask for the dispatch table from that dispatch loop you pass into it, and that table tells another function living under it roughly how to patch up the program.
00:13:40.650 Then it hands the data back and just runs, and you can leave it running or pass useful data around later, because you’ve abstracted it away.
00:13:46.480 That probably makes no sense in English, but I assure you that the C code works for it. Yes, apparently, I am going to show some code.
00:13:52.300 Right, which means, firstly, I have to remember which piece of random software I'm using. I don’t still have the version that does the seg fault, unfortunately, so we’re now going to talk about direct threading.
00:13:58.170 Oh, alright! I should mention I'm a Luddite and I don't get on very well with technology. Nobody ever believes that because of the kind of talks I normally give, but it’s really true!
00:14:06.290 I’m not the person that any of this stuff was invented for! So effectively compile and run. We get a very unexciting output because all we are doing is adding two numbers.
00:14:17.600 However, we’re adding two numbers that have just been compiled down to the most efficient form of dispatch loop possible in around about 20 lines of C total for running a dispatch loop and doing the compilation.
00:14:24.300 This is kind of neat, and it makes me think I might yet be able to become a C programmer one day if I can get my head around this one.
00:14:31.240 It’s not that easy to deal with writing large dispatch loops with lots and lots of different labels. I thought it would be nice to do C’s equivalent of metaprogramming—so, hello, macros!
00:14:38.420 The previous implementation is certainly much less code for doing the same kind of problem as a toy, but now I’ve got this nice little macro language that defines primitives.
00:14:44.810 Here’s the actual compilation table that the compiler has to know about to be able to dispatch compilation. The interpreter follows almost identical structure.
00:14:53.350 The place where this technique is most often used is inside Forth interpreters. Forth has a mode that interprets a program and uses another mode to create this dictionary that consists of optimized versions.
00:14:59.930 With the right structure to understand the program and organize the memory, the end result is efficient. I believe direct threading can be done in Ruby, but I didn’t really have time to write it.
00:15:08.290 I thought I’d have time to write it, but I’ve spent far too much time playing around with Fiddle pointers, which we might get to possibly in a while.
00:15:15.420 You can malloc in Ruby, and build really efficient doubly linked lists! Unfortunately, monkey mind sometimes takes over when I’m doing this.
00:15:23.030 Direct threading is probably not a good idea because it's something even more complicated than the last bit of Ruby code, but it’s something I plan to play with.
00:15:31.060 What would happen in principle is that we’d be doing a whole pile of execution flow where we would keep calling another method, until we run out of stack.
00:15:37.900 But hopefully, if we have a return somewhere down that loop, we can work our way back up without blowing our computer’s stack.
00:15:45.990 So that’s why I haven’t got a version; I know it’s going to be a pain to debug. It’s going to be a couple of days and probably best done with alcohol.
00:15:52.200 Now, the dispatch loop is half of what’s in a CPU; the other half is registers. It depends on how we’re doing for time; I need to know when I’m supposed to stop.
00:15:59.170 I’m probably not going to have time to play with memory today, so I’m going to jump back and show you something. When you build a machine, you pick a level of abstraction that your machine works at to do useful tasks.
00:16:07.060 To be effective, we need the standard harness we’ll be using for the next few examples. This is a virtual machine as far as I’m concerned; it’s all the cool bits, all put into one VM class.
00:16:14.850 We’ll subclass from that. I’m not sure that subclassing is really how I would want to do it in practice; I’ve fallen out of love with class hierarchies some years ago.
00:16:20.850 They always seem kind of fragile in all the wrong ways, but they do work. We’ve got a simple structure for running any kind of interpretive virtual machine.
00:16:27.500 We’ve got the ability to load the program, we can run the interpreter, we can read the program, and then we can compile it down to efficient direct method calls.
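A minimal reconstruction of such a harness (mine, not the talk's actual code): `load` stores the program, `run` is the dispatch loop, and each opcode is an array whose head names a method that the subclass provides.

```ruby
# VM base class: subclasses supply the opcode methods.
class VM
  def initialize
    @stack = []
    @pc = 0
  end

  def load(program)
    @program = program
    self
  end

  def run
    while (op = @program[@pc])
      @pc += 1
      send(*op)   # dispatch straight to the opcode's method
    end
    self
  end
end

# A subclass in the spirit of the talk's adder.
class Adder < VM
  def push(n)
    @stack.push(n)
  end

  def add
    @stack.push(@stack.pop + @stack.pop)
  end

  def result
    @stack.last
  end
end

Adder.new.load([[:push, 2], [:push, 3], [:add]]).run.result # => 5
```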
00:16:34.400 The program we’re going to run is an adder. We’ve just invented an adder. We’re one-third of the way to building an ALU, an arithmetic logic unit.
00:16:41.250 I’ve done something slightly radical; we can add negative numbers! So, I’m going to hop back up there in a minute and show this actually running.
00:16:47.780 We’re going to print out a bit of state while it's running. It's going to show us what's on the stack.
00:16:53.460 We've got the ability to push some data onto the stack and when we do an addition, we consume the top two items on the stack and push the answer back.
00:16:59.900 We can also jump if the top of the stack is not zero. Until now we haven’t thought about branching or moving around in the program ourselves; we’ve only thought in terms of linear flow.
00:17:06.930 There are an infinite number of ways to move around in a program that you can choose to make opcodes interesting. Our program is quite simple; we’re going to add 13 and repeatedly add -1 to it.
00:17:14.920 If we don't hit zero, we’ll keep jumping back up to the push phase where we pushed the -1.
00:17:21.350 As you can see, this isn’t the nicest format for writing programs with loops. I have to count the opcodes to know where I’m going to jump to; there’s stuff a compiler would do to make that go away.
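The countdown loop described above can be sketched like this (the indices and opcode names are my own, not the talk's exact encoding); `jnz` jumps to an absolute instruction index while the top of the stack is non-zero, which is why you have to count opcodes to find the target.

```ruby
# A small stack machine with a conditional jump opcode.
class StackVM
  def initialize(program)
    @program = program
    @stack = []
    @pc = 0
  end

  def run
    while @pc < @program.length
      op, arg = @program[@pc]
      @pc += 1
      case op
      when :push then @stack.push(arg)
      when :add  then b = @stack.pop; @stack.push(@stack.pop + b)
      when :jnz  then @pc = arg unless @stack.last.zero?
      end
    end
    @stack
  end
end

# push 13, then loop: push -1, add, jump back to index 1 while non-zero.
program = [
  [:push, 13],   # 0
  [:push, -1],   # 1
  [:add],        # 2
  [:jnz, 1],     # 3
]
StackVM.new(program).run # => [0]
```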
00:17:28.550 We’re not interested in that high-level, artsy-fartsy stuff—we’re just interested in feeding more opcodes into the machine and making it work.
00:17:35.350 We’re doing Jacquard looms. Not doing anything more modern! Alright, let's see if we can run that.
00:17:41.480 I don't need to do that because I've got a hash tag on it so I won't flash it. We have sat for the requisite number of loop iterations, counting down from 13.
00:17:49.820 We won’t see the initial 13, but we will see zero because we’re doing the print state too late. But anyway, you can see we have a working loop!
00:17:57.830 It has counted; it’s trivial to implement this. The first time I started getting into this stuff I was 14 or 15, when I got a home micro running BASIC.
00:18:05.020 It was called Locomotive BASIC; it was exciting because it was fast. One of my favorite magazines at the time, dedicated to my particular micro,
00:18:10.130 had a two-part article on how to program in Forth. The guy who wrote it had written a Forth interpreter in the worst spaghetti BASIC you could imagine,
00:18:20.420 with line numbers and colons in the middle of lines. It ran Forth programs, and I fell in love with it, in all the wrong ways.
00:18:27.080 So these kinds of things give me a nostalgic buzz. I apologize if nobody else feels it, but it’s kind of neat. Stack machines have limitations; they’re based on a stack.
00:18:35.990 That’s really nice from the point of view that you don’t have to worry about decoding operands—you don’t have to take things out of program memory.
00:18:43.830 On the other hand, stack machines aren’t as compact as machines that have actual memory addresses, where you can have a more compact encoding.
00:18:51.370 Also, they’re not necessarily very fast. You can do all kinds of bizarre optimizations to speed up stack programs.
00:18:57.730 I have known people who insist on writing low-level math routines in Forth rather than assembler because they got faster results.
00:19:04.110 So we’re going to move on to an accumulator machine. I’m not sure if anybody ever really built one of these, but effectively, this is where you have one register.
00:19:12.760 That register is where everything happens. If anybody did ever build one of these, it would have been because that piece of silicon was on the same lump, not over there on a different breadboard.
00:19:20.780 This would have been a huge optimization in certain kinds of architectures. There’s not a lot that we have to change to cope with having just one register.
00:19:27.640 I still have the push and the pop because I have no way of referencing registers or memory—just this one register that would carry a result.
00:19:34.680 It slightly changes how some of the operations occur. I’m not having to index into an array to figure out where to check a value to do a jump.
00:19:41.780 I’m going straight to the register, which, if you’re doing it in C, would be a great speed-up. As you can see, our program is growing a bit out of control to accomplish the accumulation.
00:19:48.260 Accumulator machines are actually really inefficient, but they do function as expected.
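A rough sketch of the one-register accumulator machine described above (my reconstruction, not the talk's code): every arithmetic result lands in the single accumulator, and the jump tests the accumulator directly instead of indexing into a stack.

```ruby
# Accumulator machine: one register, immediate operands only.
class AccumulatorVM
  def initialize(program)
    @program = program
    @acc = 0
    @pc = 0
  end

  def run
    while @pc < @program.length
      op, arg = @program[@pc]
      @pc += 1
      case op
      when :load then @acc = arg                  # load an immediate
      when :add  then @acc += arg                 # add an immediate to the accumulator
      when :jnz  then @pc = arg unless @acc.zero? # test the register directly
      end
    end
    @acc
  end
end

AccumulatorVM.new([[:load, 13], [:add, -1], [:jnz, 1]]).run # => 0
```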
00:19:54.240 I’m not going to bother running that one; it’s not very interesting. Register machines are what most of us use. Once you’ve got a register machine with more than one register, you start moving into territory where compilers can do tricks.
00:20:01.480 You start moving into territory where instead of saying, 'I’ve got an array of two registers,' you can actually say, 'I have a register file, please treat it as if I have an infinite number of registers.'
00:20:09.080 Then you get single instruction, multiple data (SIMD): effectively vectorized operations. There’s no reason why a register can’t be a vector.
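A toy illustration of that point (my own, not from the talk): if a register holds an array instead of a scalar, a single `add` dispatch performs an element-wise, SIMD-style operation.

```ruby
# A register machine whose registers may hold scalars or vectors.
class RegisterVM
  def initialize(nregs)
    @regs = Array.new(nregs, 0)
  end

  def set(r, value)
    @regs[r] = value
  end

  # One dispatch; many additions when the operands are vectors.
  def add(dst, a, b)
    x, y = @regs[a], @regs[b]
    @regs[dst] = x.is_a?(Array) ? x.zip(y).map { |p, q| p + q } : x + y
  end

  def get(r)
    @regs[r]
  end
end

vm = RegisterVM.new(3)
vm.set(0, [1, 2, 3])
vm.set(1, [10, 20, 30])
vm.add(2, 0, 1)
vm.get(2) # => [11, 22, 33]
```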
00:20:16.260 Now we’re doing vector math with a single dispatch. There are people in the City of London... well, before we get to the City of London, where the hypercube machines run, there’s the matrix case.
00:20:24.160 This is where we say, 'I have a matrix,' rather than just a vector. Now you can do a whole pile of stuff with matrix transforms.
00:20:31.440 You can build processors that can naturally handle 3D movement; if you’re doing high-performance physics, you often wish that machines had matrix registers, and they don’t!
00:20:39.130 But a hypercube is basically what the financial trading systems in the City of London run on. We’re no longer just looking at a matrix; oh, no.
00:20:48.000 We can have an arbitrary number of dimensions in the matrix, but we’re still doing it off a single dispatch. These kinds of tricks are the reason why NumPy is so fast.
00:20:54.940 We could have that in Ruby quite easily. Does anybody want to help? Please, we’re two decades behind; there must be someone who wants to see gas pipeline traversal.
00:21:03.160 This is something I've been thinking about over the last couple of years; my actual job is designing anonymous identity systems. I spend a lot of my time making data flow in one direction.
00:21:10.580 Trying to figure out how to find the right branch of the graph at any particular point to be useful. If you can have a register that’s a graph and another register that’s a glass, there’s a whole pile of stuff you can do with that.
00:21:18.430 You would never have previously thought of achieving this by writing code for it because it would be too complicated. But now you’re just dealing with two things, and you can talk in terms of higher concepts.
00:21:25.420 So we’ve moved, from the start of this talk, from worrying about some numbers that happen to be labels in switch statements and some very simple things, to taking any problem
00:21:33.550 and turning it into a machine that’s reliably scriptable, where our program can be written in effective terms.
00:21:39.880 The more complex we make those registers, the more we’ve turned assembly language into a DSL. This is a totally transformative thing!
00:21:46.640 I mean, it’s like I’ve been saying for years; we can do Ruby assembler, and we’re going to skip past the memory stuff because we haven’t really got time, even though it’s really fun.
00:21:53.990 We’re coming up to the bits that I really shouldn’t talk about. Once we’ve got a dispatch loop, we want to reach out to the rest of the world.
00:22:02.100 We can either be polite about it and use Ruby C extensions, or maybe we can use RubyInline, which is still a lightweight way of wrapping C functions and calling them from Ruby.
00:22:08.350 Or we could use Fiddle's C bindings to access C functions that way. Is there anybody who remembers Wilson?
00:22:15.540 It’s always good to have someone in the audience who knows they’re as mad as you are! Wilson is an x86-64 assembler as a DSL in Ruby.
00:22:22.240 We can malloc and grab chunks of memory. We can write stuff into those chunks of memory. If we have a stream of x86-64 machine code that we’ve produced in Ruby and injected into one of those chunks...
00:22:30.500 Then we can decide to turn it into a function pointer and call it! I think we just optimized a Ruby program with a direct JIT to assembler.
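The first half of that trick, mallocing a chunk and writing bytes into it, is straightforward with Fiddle (my sketch; actually executing the buffer would additionally need platform-specific machine code and an mprotect call, which is the part Wilson handles).

```ruby
require 'fiddle'

# Grab a raw 16-byte chunk from malloc; RUBY_FREE releases it when the
# pointer is garbage collected.
buf = Fiddle::Pointer.malloc(16, Fiddle::RUBY_FREE)

# Write a byte into it: 0xC3 is the x86-64 `ret` instruction, purely for
# illustration -- we never execute it here.
bytes = [0xC3].pack('C')
buf[0, bytes.bytesize] = bytes

buf[0, 1].unpack1('C') # => 0xC3
```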
00:22:38.330 I’m not saying that’s a good idea, because for one thing, this was written as a joke. I make no guarantees it’s robust.
00:22:45.310 You can probably guess who’s behind it. I’m not blaming them, but we all know who’s behind it.
00:22:52.760 I would have liked to do the memory part, because one of the things about playing with pointers in Ruby is you get a lot of seg faults while figuring it out.
00:23:00.130 Those seg faults aren't always bad things. A seg fault is sometimes alright. When I’ve written a really complicated C++ program
00:23:06.130 and I get a seg fault, I’ve done something stupid and I’m never going to figure it out; I should throw away a whole chunk of code and start again.
00:23:10.950 But when I’m playing around with pointers in Ruby, I’m actually saying, 'Oh, I’ve wandered outside the space I allocated; I wonder why.'
00:23:17.760 I can actually understand the code I’ve written! I can sometimes work back through it, and I occasionally claim to do that.
00:23:23.250 Honestly, I can capture the segfault and ask myself, 'Did it really matter? Was I going to use that memory?' No? Alright—continue!
00:23:29.730 I think I can find a use case for never say die. I was expecting Aaron to give me drinks for that.
00:23:39.900 I found this talk yesterday! A friend of mine in the NGO community posted around this; it’s well worth going and reading.
00:23:47.000 He works at Valve, and mostly he’s concerned with low-level optimizations of code so his cache lines don’t blow up.
00:23:53.230 It’s just a brilliant talk! He basically has 2D and 3D animations, made by proper professional animators at Valve,
00:23:59.950 of cars not being able to get down those very narrow bus connections from main memory to the processor; it’s a really good video.
00:24:05.860 And then, finally, there are four books which, if you want to learn more, are really interesting.
00:24:12.560 The one up in the corner is an Elsevier publication, which means it’s probably about one hundred and twenty dollars. I managed to finally get a legal copy of it as a PDF.
00:24:19.810 I got it for like forty bucks—I got a print copy, so I didn’t feel bad about having a bootleg before that.
00:24:29.160 Only about four chapters of it are genuinely useful, because most of it’s about the other kinds of virtual machines, not process ones.
00:24:34.940 If you can track down a copy of Java Virtual Machine, it’s hilariously good fun because it comes with a floppy disk.
00:24:38.370 That’s a 3.5-inch floppy with the Jasmin assembler on it, so you can write your own low-level Java stuff.
00:24:44.960 Threaded Interpretive Languages is a book from the early 80s published by Byte Books, all about the great exciting world of threading together the insides of a Forth.
00:24:50.700 It doesn’t use the word Forth anywhere, for copyright reasons. And it contains all those diagrams you find in computer science books of how lists work.
00:24:56.920 It’s solid, dense diagrams of real utility. The next one is my all-time favorite programming book; I’ve mentioned it in many talks over the years.
00:25:06.190 The chap who wrote ‘A Portion II’ no longer exists; it’s ten chapters long, and eight of them are actually about implementing different programming languages.
00:25:12.410 The last two chapters are about how to do specific optimizations to them. All the code is in Pascal.