00:00:15.680
All right, well, I'm sure people will keep filtering in, but I'll kick off. So, hi everyone, I'm Alex and I'm really excited to talk to you today about two of my favorite things in the world: Ruby and Harry Potter.
00:00:23.039
The title of my talk is "Rubeus Hagrid: Writing Harry Potter with Ruby," and this whole talk is based around a kind of crazy idea. Can we use Ruby, just the regular Ruby language that we know and love, to write a brand new Harry Potter story automatically? This immediately raises some other questions, like probably the first one is why on earth would we want to do that?
00:00:42.660
Then, what would that actually look like? What can we achieve if we use Ruby to write a Harry Potter story? The big one, obviously, is how on earth do we actually do this? So let's start with the 'why' of why we might want to do this.
00:01:07.080
I realize I'm probably talking to two slightly different audiences in the room right now. First, we have people like me: true blue Harry Potter fans. For all of us, this is pretty straightforward to answer. Just imagine a nice big, beautiful pile of brand-new Harry Potter books, which means we can stay happily in the Wizarding World forever.
00:01:32.700
Now there's also going to be a lot of you who have never caught the Harry Potter bug and might be a bit baffled by it all. For all of you, I'd recommend visualizing a big beautiful pile of money, because that is what awaits you if you can satiate the rabid hunger of people like me. By the way, if you're not quite sure which category you fit into, there's a little test I've come up with. Just look at this picture, and based on your reaction, you can sort yourself into the appropriate category.
00:02:11.849
So those are some of the ways, and maybe some slightly more serious reasons, why we're hopefully, this is an interesting topic. It's going to give us a nice intro to Natural Language Processing (NLP), which is really a hot topic right now. It's also, I think, going to reveal a lot of the simplicity, beauty, and elegance of Ruby. We can actually achieve this with very little Ruby code.
00:02:41.940
Also, I hope it's going to reveal something more general about tackling seemingly hard problems. I'll talk a little bit about that at the end. Okay, so now what can we achieve? What will this actually look like when it's done? I'm going to give a bit of a spoiler and show you the output of the sort of final program that we will write. Let's give it a read.
00:03:02.880
"Neville, Shamus, and Dean were muttering but did not speak when Harry had told Fudge mere weeks ago. But now he was crying, actually crying, tears streaming down the sides of their heads. 'Who revealed a spell to make your Bludger?' said Harry, anger rising once more." So, it's definitely not Pulitzer Prize-winning stuff, but it more or less makes sense. It certainly has the style of a Harry Potter story, and I hope you'll be doubly impressed when you see just how little code goes into making this.
00:03:48.989
Now, the big question is how we actually go about doing this. On the face of it, this does seem like quite a hard problem, right? Just look at even one sentence from that extract that I just read and think about it: how do we begin writing Ruby code to create a sentence like this?
00:04:10.709
There are a couple of key ideas that I want to introduce to help us get started. One idea is that we want to tell this story word by word. At any point, we just want to be focused on generating the next word in the story. The key idea is that pretty much all of us in this room have a great source of inspiration for this problem right in our pockets or bags, and those are our smartphones.
00:04:29.700
So, why are our smartphones a good way to get started on this problem? Pretty much every modern smartphone has, if I can use my laser pointer, one of these predictive keyboards. Usually, we use a predictive keyboard as a typing aid to write things faster, but what's kind of interesting is that we can actually use this to generate sentences, basically unsupervised.
00:05:00.180
Here's a video I recorded on my phone. Basically, what I'm doing is hammering the middle suggestion button, and you can see that it starts to generate an English sentence that sounds like it's been written by a human. What else is interesting about this is that it's not just imitating any human; it's supposed to be imitating me. Some of you might know this already: your predictive keyboard adapts to you over time. It learns your style and tries to imitate that.
00:05:30.810
So, how does your phone do this? Well, let’s take an example. This is what I get as my suggestions when I type 'birthday' into my phone. The first suggestion is 'party,' followed by 'cake,' and then 'wishes' is the third suggestion. So somewhere in the memory of my phone, it knows that out of all the times I've written 'birthday,' let’s say I've used the word 'party' 30 times to follow it, 'cake' about 20 times. By knowing what I've used in the past, it can suggest the words I might want to use when I've written 'birthday.'
00:06:07.750
So why is this relevant to our problem? We can take that same idea and start doing the same thing with the way that J.K. Rowling uses language in the Harry Potter series. For example, the word 'golden' appears about 200 times in the Harry Potter books. These are the top continuations that come after 'golden.' The word 'egg' is the word that follows 'golden' most often—thirteen times after 'golden.' 'Snitch' is the next most common, and so on.
00:06:37.360
A couple of bits of terminology I'll keep using throughout this talk: the initial word that we use to generate the suggestions, we'll call this the 'head word,' and the suggestions, we'll call them 'continuations.' The third key idea is that we want to break our approach into two phases. First, we want to learn the style of J.K. Rowling and the Harry Potter books, and the second step is we want to use everything we've learned to generate new stories.
00:07:13.650
So let's first look at the learning stage. This learning stage is actually super simple. All we want to do is look at every single word that J.K. Rowling uses in the Harry Potter books and basically collect stats for each head word. We'll look at the continuations that come after it and how many times each continuation appears. We'll do this for 'golden,' as we saw, but then for ‘goldfish’ and ‘golf’ and for all the words in the books—about 20,000 unique words altogether.
00:07:35.700
How would we represent this in Ruby? Well, you can guess from the layout that a nice way to represent it is just as a simple Ruby hash. So something like this is what we want to end up with. Okay, let’s look at how we actually do this in code. The very first thing that we’re going to need is some copies of the books in machine-readable format, so just plain text is completely fine.
00:08:07.410
Oh, I forgot to mention, by the way, I put some notes and all the slides online, and you can find links to these text files. I'll share that link again at the end. We start with those text files, then we want to start by doing some cleaning—something called tokenization, which basically means getting rid of special characters, lowercasing everything, and turning everything into symbols. That’s going to be a lot more memory efficient to work with.
00:08:30.090
So, taking a sentence like this, we’ll end up with output like this. Once we've tokenized everything, we’re ready to actually build up our of the head words and the continuations. This is a really nice example of how elegant and simple Ruby can make this. So this is all the code that we need to do this. Let’s have a look at what’s going on here.
00:09:01.450
We start off by using this nice built-in method, 'each_cons', which is short for 'each consecutive'. That’s basically going to take each consecutive pair of words. We’ll start with 'the cat' and then 'cat sat' and so on. For each head word, we’re going to say: if we haven’t encountered this head word before, let’s just initialize a new hash inside.
00:09:40.560
So to go along with that head word, we’d start with a new hash that has a default count of zero, and then we’ll just say, for this combination of head word and continuation, let’s increment the count by one. That’s all that’s going on here. In the first iteration, we’ll get there and say ‘cat’ follows the word ‘the’ one time, and then on the next iteration, ‘Sat’ follows ‘cat’ one time.
00:10:06.560
We’ll continue iterating through all the words in this example sentence, and we’ll eventually end up with something like this. We do exactly the same procedure but instead of this example sentence, we do it on that corpus of all the Harry Potter books, and that’s our learning phase done. That’s all we have to do to learn the style of the Harry Potter books from J.K. Rowling.
00:10:36.660
Now we need to figure out how to use that to generate new stories. There are a few different approaches, so let's start with the simplest, which is called the greedy algorithm. Okay, so why is it called the greedy algorithm? Well, it takes the biggest, juiciest option at each step.
00:11:06.310
This means it just goes for the most likely continuation, the one that's appeared most often. In our 'golden' example, remember we said 'egg' appeared more than any other word after 'golden' thirteen times, so we would just always pick 'egg' after 'golden.' Once we pick ‘egg,’ the next most frequent continuation is ‘and’, so we'd pick that one next, and we just continue on like that until we have a story. This is really nice and easy to implement in Ruby.
00:11:42.170
Here, we’re taking all of our continuations and we’ve got the word ‘and’ and the count of how many times it appears. We’re just using the 'max_by' method to get the continuation with the highest count. That’s all we’re really doing. So if you’ve really been paying attention, you've noticed one other problem: what do we do about our very first word in the story?
00:12:11.060
When we’re doing our first word, we don’t have any previous word to continue from, so we have to do something a little different to start our story. There are a bunch of different approaches, but in this case, we can just start with a completely random word. So any random word in our vocabulary—that’s what’s happening here with the ‘dot sample’ method. Just pick a random word to start us off and, in this case, I’m going to make a 50-word story.
00:12:40.060
I’m just going to repeatedly apply the greedy algorithm and word by word, hopefully, build up the story. So how does this work in practice? I ran this for the first time, and this is what I ended up with: "Ono said Harry. A few seconds later, they were all the door, and the door, and the door, and the door." Okay, so not a great start, but maybe I just got unlucky, right? We started with a random word, so maybe this was a really bad choice. Let me try it again.
00:13:04.580
So this is my second attempt: "tisha slee several of the door, and the door, and the door, and the door, and the door." Okay, so the good news is that we won't struggle to find a title for our new Harry Potter story; the bad news is it's obviously pretty much everything else. So obviously, something's gone horribly wrong here. What was going on?
00:13:43.590
Let’s walk through what’s happening while we’re running this. Let's say we start with the word 'several'. The word that appears most often after 'several' is 'the.' The word that follows 'the' most often is 'door', and after 'door' is 'and.' Okay, all good so far. The problem is that the most common word that comes after 'and' is 'the,' so we get stuck in a loop. You might wonder, does this always happen? Sadly, the answer is yes.
00:14:21.570
Weirdly enough, the word that gives us the longest story without going into a loop is actually 'conference.' Using our greedy algorithm with 'conference' as the starting word gives us the best we can do—20 words. So obviously, we can rule out our greedy algorithm and say this simply doesn’t work.
00:14:50.320
Let’s try a completely different approach to generation and go to the opposite end of the scale. Let’s get really random with something called the uniform random algorithm. It’s a fancy name for something that's extremely simple: we just look at our potential continuations and we pick one randomly, with equal probability.
00:15:14.320
In this case, if we have three continuations, we pick one of the three with equal probability. In reality, we’ll probably have a lot of continuations. So in this case, we have 117 potential continuations after 'golden,' and we just pick one of them randomly. This is also really nice and easy to do in Ruby. We can again use the ‘dot sample’ method, and it will pick one of our continuations at random.
00:15:45.320
So how does this work in practice? Here’s an example: "Debris from boys accompanied him bodily from Ron, yelling the waters. Harry laughing together soon, father would then bleated the smelly cloud." It's probably better than the greedy algorithm, but unless you’re into avant-garde Harry Potter fan fiction, this is probably a bit weird.
00:16:27.730
Why is this? The other thing I would point out is that, apart from the names, this doesn’t really seem like Harry Potter. If you took the names out, you’d never guess this was a Harry Potter story. So why isn’t this working so well? Well, there are a lot of reasons, but let's look at one example word here: the word 'house.' After the word 'house' in the Harry Potter series, the word 'elf' appears over a hundred times.
00:17:01.000
But it’s just one of those 200 potential continuations after 'house,' so it has a 1 in 200 chance of being picked. By the way, ‘house elf’ refers to our friend Dobby from earlier, if you didn’t know. The phrase 'house prices' appears in Harry Potter, but only once. However, this has exactly the same chance of being picked as 'house elf.' So obviously a program that’s as likely to talk about 'house prices' as 'house elves' isn’t really doing a great job of imitating Harry Potter.
00:17:37.190
This gives us some clues about how to improve this. Our final and best algorithm is what we call the weighted random algorithm. What we do here is look at the situation: the word 'house' appears about 700 times in the series, and 100 of those times it’s followed by 'elf.' Logically, this means it should really have more like a one in seven chance of being picked, right?
00:18:06.210
That’s exactly what this algorithm does. We rescale the probabilities to match how many times each word appears. Again, this is surprisingly easy to implement. Just to reinforce that point, you might end up with a situation where the most frequent continuation would have a high probability, say a 1/2 probability of being picked, and others with 1/3, 1/6, and so on.
00:18:43.840
So, this is again surprisingly easy to implement in Ruby. This one's maybe slightly more difficult to understand, but the intuition is that you can think of it like a raffle where each word gets an entry equal to the number of times it appears. In our ‘house elf’ example, 'elf' gets 100 entries to the raffle, whereas 'prices' only gets one entry.
00:19:53.910
We’re going.
00:24:58.210
So, the first step is really, when you're tackling a new hard problem, ask someone familiar with the problem. What are the building blocks for solving this problem? How do we tackle this one step at a time? How do we break it down?
00:25:27.230
The second lesson is that we should really pay attention to our failures. Obviously, we had a bit of a false start when we were telling our stories. We could have simply said, 'Oh yeah, that doesn’t work.' But what we really had to do is look in detail at the failure—try to figure out why it failed, and work out where the problem was.
00:26:01.590
It may sound obvious, but when tackling a really hard problem, just really pay attention to those failures. Ask someone who knows again about the problem to explain why it didn’t work and what specifically about your approach was flawed.
00:26:55.640
The last lesson is that finding a good metaphor for a problem is really valuable. When I was wrapping my head around this, I found the metaphor of the predictive keyboard to be really helpful. Bad metaphors are easy to come up with, but good metaphors, which capture the essence of a problem, are much more challenging to create.
00:27:12.750
A good metaphor captures all the essential parts of the problem. You can strip out things that aren’t relevant and modify others, but there are certain parts you just have to keep. The second quality of a good metaphor is that it allows you to play around with it and learn something about the original problem.
00:27:54.420
Let’s take another example I really like. Suppose I’m Dumbledore trying to schedule Hogwarts classes. I have to schedule classes so there aren’t any clashes. If a student is taking two classes, they shouldn’t be scheduled at the same time, but I want my timetable to be efficient.
00:28:35.510
For example, here’s a bad scheduling: hopefully you can see the colors. I've scheduled Ancient Ruins and Arithmancy at the same time for the same student. This is a failed attempt. I can look at this in a tabular form, but there’s a nice metaphor: graph coloring.
00:29:04.470
I just draw out the classes as dots on paper. If two students are taking two classes, I draw a line between the dots. For example, if Hermione is taking Ancient Ruins and Arithmancy, I draw a line between those dots. My challenge is to color in the dots so the ends of a line never share the same color.
00:29:41.560
For example, a valid coloring eliminates all conflicts and results in a valid timetable. This metaphor is useful. After considering the timetable in a tabular form, I can draw on this metaphor and actually start playing around with it to make progress.
00:30:09.460
So those are the key lessons about hard problems that I’ve learned from this talk. Thank you for listening. I put the slides and notes online at this address. I think we're short on time, but feel free to come up to me afterward if you have any questions.