Ancient City Ruby 2014

Oh, Oh, Oh, It's Magic!

You know? Let's explore computer vision using Ruby and OpenCV! In this talk, we will learn techniques for speeding up our code, fetching data from the network, and doing image recognition, all in Ruby. Let's harness the power of Sauron's Eye (my webcam) together!

00:00:00.840 Hello, everybody! First, I want to say thank you to everyone here running the conference. It's amazing, and I'm very happy to be here, so thank you very, very much for having me. I've never been to St. Augustine before, and it's very beautiful. So, I want to say happy Friday! Happy Friday, everyone! Woo! We made it!
00:00:20.720 Okay, I'm going to ask you to do something that probably no other speaker asks you to do. I want you to get on the internet right now. Actually, hold on. We’re going to try out a little test here. Get on the internet and tweet this at me: @TenderLove photo me. We’re going to see if Twitter's rate limits or the Wi-Fi here are actually working. Is anybody tweeting at me? Is it working? For those of you who have tweeted at me, you can only do it once, but you can do it more than once. What I have done is set up a daemon on my machine that, anytime you tweet @TenderLove photo me, it will take a photo of me from the webcam here and reply to you on Twitter. I made sure to wash my face before so hopefully, it won’t look too crazy.
00:01:12.000 So, my name is Aaron Patterson, and I'm on the internet as Tender Love. I noticed that someone here is wearing Google Glass. I also have Google Glass at home. I'm on the Ruby core team and the Rails core team. This doesn't mean that I know what I'm talking about; it just means that I'm terrible at saying no. Anyway, I work for a very small startup out of Dallas, Texas. I'm a remote worker for this company. You might have heard of them; it's AT&T. Yes, I work for AT&T as an open-source programmer.
00:01:53.120 Yesterday was extremely awesome! I had a great time at this conference. Leon talked about caves, which is really awesome because I live in a cave—I don't really get out much. Katrina had many takeaways in her talk, and where I’m from, we call those 'to-gos,' not takeaways. Richard talked about testing the untestable, which I thought was very interesting because he didn't really answer the question: I mean, if it's untestable, how can you test it? By definition, if it's untestable, how is this possible?
00:02:12.600 Anyway, I went on a ghost tour and thought that was really cool because I met a bunch of investigators who presented lots of evidence. Every time they said 'investigator' or 'evidence,' all I could imagine in my head was air quotes. I was just thinking about what if they were showing air quotes. Anyway, I love programming. I love programming a lot! It’s also my job, and I’m super happy that I can do it every day.
00:03:04.360 Specifically, I love Ruby programming. I love Ruby programming a lot! The reason I work on open-source stuff as my job is that I mainly work on Rails, and I’m paid to basically do whatever open-source stuff I want to do, but I mostly do Rails. The reason I do that is that Rails is what gave me a job to be a Ruby programmer. I love Ruby so much, and I want everybody else to be able to do Ruby as their job too, so I work on Rails to ensure that companies will continue to use it, and we continue to have jobs programming Ruby.
00:04:09.319 So anyway, I’m paid to do this, and I feel obligated to talk about work stuff. I’m going to first talk about business up front, and then when we're done with that, we’re going to party in the back. Yes, this is a mullet talk! I do look young, but the hair is not real.
00:04:56.480 I’m going to complain—that’s the first part. I’m straight up complaining. I'm going to talk about Rack. I think that Rack is shackling us, and this is going to be fairly specific about Rails internals and stuff. If you don’t really understand, don’t worry about it; just tune out and wait for the back end of the mullet. I want to talk about the shackles of Rack in terms of streaming. Basically, I want streaming.
00:05:34.840 I think streaming is very important for the future of the web. We need to be able to stream to our clients so that we can reduce latency. I want to get data out to the clients as quickly as possible. You see this with technologies like Node; Node makes it very easy to stream stuff out to the client, and I want to do this in Ruby. Frankly, I hate JavaScript, so I will happily talk to people about that later.
00:06:00.960 Another important part about doing streaming is that I think we can actually gain some performance benefits from supporting this in Rails: supporting streaming as a first-class citizen. I'll show you exactly what I mean. Here’s an example of how streaming could work. On the left side, we have our server code, and on the right side, we have app code. The app says, 'Okay, write some data out to the client,' and in Rails, we automatically do HTML escaping for you if your string is not marked as HTML safe.
00:06:48.120 What’s cool is that if we had this buffer that we actually wrote to, we could have a background process saying, 'Okay, pop off the buffer. If that thing is not HTML safe, make it HTML safe, then write it out. Otherwise, just write the thing out.' We could actually have some parallelism going inside our application, so we could do two things at once. This is concurrent-friendly. Unfortunately, today’s safe buffer looks something like this. We say, 'Okay, take a safe string and concat it to another safe string,' or 'concat a non-safe string onto that,' and we have to do all that checking right at that moment and we're buffering all this stuff up.
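A minimal sketch of the write-only buffer idea described above, in plain Ruby. The `SafeString` and `StreamingBuffer` names are hypothetical stand-ins, not Rails classes, and `ERB::Util.html_escape` from the standard library is used here in place of whatever escaping Rails does internally.

```ruby
require "erb"

# Hypothetical stand-in for Rails' notion of an HTML-safe string.
SafeString = Class.new(String)

# Sketch of a write-only buffer: the app pushes chunks and gets nothing
# back, so a background thread is free to escape and emit them.
class StreamingBuffer
  DONE = Object.new.freeze

  def initialize(out)
    @queue = Queue.new
    @writer = Thread.new do
      until (chunk = @queue.pop).equal?(DONE)
        # Escape only chunks that are not already marked as safe.
        chunk = ERB::Util.html_escape(chunk) unless chunk.is_a?(SafeString)
        out << chunk
      end
    end
  end

  def write(chunk)
    @queue << chunk
    nil # no useful return value, which is what allows the concurrency
  end

  def close
    @queue << DONE
    @writer.join
  end
end

out = String.new
buffer = StreamingBuffer.new(out)
buffer.write(SafeString.new("<p>"))
buffer.write("Tom & Jerry")
buffer.write(SafeString.new("</p>"))
buffer.close
puts out # <p>Tom &amp; Jerry</p>
```

Because `write` returns nothing the caller can depend on, the escaping can happen later and on another thread, which is the property the concat-style safe buffer lacks.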
00:07:36.440 This is not concurrent-friendly because you may use the return value of that, so we have to give that return value back right away. When you write to a buffer, on the other hand, we know you’re not going to use a return value. It’s like, 'Okay, we're writing this to the buffer; go.' Anyway, what we do in Rails as the base use case is what most people do: they assemble a page, that page gets assembled into one very large buffer, and it gets sent out to the client. So Rails currently buffers the entire page.
00:08:23.760 Now, there’s a dirty little secret I have to tell, which is that streaming may not actually stream. To understand why this is, we have to take a look at what the write system call does. For example, if you have this file and you say file.write to write some data out to it, it may not actually write to the file system. It may buffer that, so Ruby has internal buffers that save that data in memory and then write it out to the system call. Even your system has buffers where, if you use the write system call, it may buffer that before actually flushing to the file system.
00:09:10.560 So, when we say write, it may not actually write. There’s actually a small buffer involved there. What if we just grew that buffer? What if that buffer became infinite, or what if it was the size of a page? You can imagine that’s how buffering should work—that’s how we should support a full in-memory buffer. We can say that buffering is a form of streaming where the streaming’s buffer size is very large.
00:10:28.680 So today in Rails, we have a stack that looks basically like this. We have an adapter that sits between Rack and your controller. That streaming adapter is responsible for handling streaming. The streaming interface in Rails looks a little bit like this: we add a header on the response, and then we do this loop where we write to the stream. Every five seconds, it’s going to write out to the stream, and our stream looks like an IO object. We want to treat that stream as if it's an IO object, so if you know how to use an IO, you know how to use streaming.
00:11:16.920 So how does it work? I don’t actually have enough time to tell you this because I have 172 slides, so I'm going to handwave over this a lot. Basically, the gist of things is that we have, on the left side, your app code and on the right side, the middleware. That isn’t exactly what the middleware looks like; I'm just using lambdas to represent it. That’s how the Rack stack works: you take all these lambdas and chain them together like this.
00:12:06.760 What happens today is that when you want to do streaming in Rails, we have two threads running: one is running your app code, and the other is running the Rack middleware. When the request comes in, it goes down this lambda chain through the middleware, and we get to the one that’s like, 'Okay, it’s time to call Rails; it’s time to call your controller.' At that point, we say, 'Okay, hey controller, get ready.' We call the controller, the controller adds headers and writes the response, and as soon as you call write on the stream, we say, 'Oh, hey, other thread, Rack middleware thread, it’s okay for you to return back up the stack.' The user wanted to write, so we’re going to return back up the stack, and then the web server is going to start writing data out.
00:13:07.040 So how does Rack fit into this? Why is Rack really bothering me? I'm going to show you exactly why Rack bothers me. This is what a Rack interface looks like. All you have to do to implement Rack is respond to call. I’ve left out most of the objects just for clarity. You have something that responds to call, takes an environment which is a hash of stuff, and then you return an array that has three values in it: the status code, the headers as a hash, and then an output.
00:13:47.360 The output has to respond to each, and the web server loops over each of them and writes that out to the socket. Simple enough; this is great! Everybody was excited about this because it seems very easy to write a web service. Now we need to do resource management, so when a request comes in, we need to get a database connection and give you a database connection in your controller. We say, 'Okay, let’s get the database connection, call the process that actually processes your controller, then release the database connection.' So we get that resource, and then we process your page, and then finally, we disconnect from the database.
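The interface described above can be sketched in plain Ruby without loading Rack itself. `HELLO_APP` and the `serve` helper are hypothetical names for illustration; `serve` is only a rough stand-in for what a web server does with the returned triple.

```ruby
# A Rack-style application is anything that responds to `call`, takes the
# environment hash, and returns [status, headers, body].
HELLO_APP = lambda do |env|
  body = ["Hello from #{env['PATH_INFO']}\n"] # body only needs to respond to `each`
  [200, { "Content-Type" => "text/plain" }, body]
end

# Hypothetical stand-in for the web server side of the contract.
def serve(app, env, socket)
  status, headers, body = app.call(env)
  socket << "HTTP/1.1 #{status}\r\n"
  headers.each { |name, value| socket << "#{name}: #{value}\r\n" }
  socket << "\r\n"
  body.each { |chunk| socket << chunk } # the server loops over the body
ensure
  body.close if body.respond_to?(:close)
end

socket = String.new
serve(HELLO_APP, { "PATH_INFO" => "/cards" }, socket)
puts socket
```

Anything with `<<` works as the socket here, so a `String` is enough to see what would be written.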
00:14:37.680 So it seems straightforward enough, right? We just build up the page, then disconnect. Now you can see why we have to buffer: the page has to be completely built before we can release the database connection. So we think, 'Okay, I want to implement a streaming server.' Let’s take a look at a very simple streaming server that streams out 'hello world' every half second. The way this works is based on the fact that Rack calls each on your body.
00:15:26.000 So we implement each at the top and say, 'Okay, you called each. I’m going to start feeding that block every half second.' Okay, easy enough! But then we want to embed a database connection. So we connect to the database—get the database—and then we say, 'Okay, let’s get the output,' and then disconnect from the database and send the output to the web server. This isn’t going to work because we return that object all the way back up to the web server, and the web server is like, 'Okay, I’m going to call each.' This thing tries to get rows from the database, but we don’t have a database connection anymore.
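The ordering problem described above can be reproduced with a small sketch. `HelloStream` and `with_connection` are hypothetical names, the `count` and `interval` parameters are illustrative assumptions, and the "connection" is only a log entry rather than a real database.

```ruby
# A body object whose `each` feeds the server one chunk at a time.
class HelloStream
  def initialize(count: 3, interval: 0.5)
    @count = count
    @interval = interval
  end

  def each
    @count.times do |i|
      yield "hello world #{i}\n"
      sleep @interval
    end
  end
end

# Hypothetical middleware that mimics "connect, call the app, disconnect".
def with_connection(app, log)
  lambda do |env|
    log << :connect
    response = app.call(env)
    log << :disconnect # runs before the server ever calls `each`
    response
  end
end

log = []
inner = ->(_env) { [200, { "Content-Type" => "text/plain" }, HelloStream.new(count: 2)] }
_status, _headers, body = with_connection(inner, log).call({})
body.each { |chunk| log << chunk } # what the web server does after `call` returns
p log # => [:connect, :disconnect, "hello world 0\n", "hello world 1\n"]
```

The log shows the disconnect happening before the first chunk is produced, which is why a body that reads rows inside `each` would find its connection already gone.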
00:16:15.160 So this obviously won’t work; our cleanup code is wrong. We have to fix our cleanup code, and the way we do this looks like this: we have to implement close, and Rack will call close on the object. As soon as that body is closed, like a file handle, we disconnect from the database. Now the other problem we have is error handling. What happens if there is an error during streaming? We have to rescue and say, 'Okay, you messed up when you were constructing the streaming object. We got an exception, and now we have to disconnect from the database.' So now we have repeated code: we have to disconnect up here in close, and if there is an exception, we need to do it down here too.
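A rough sketch of that cleanup arrangement, assuming a hypothetical `FakeConnection` in place of a real database connection. `RowStream` and `rows_app` are illustrative names as well; the point is only that the disconnect call ends up in two places.

```ruby
# Hypothetical connection object; only `rows` and `disconnect` are assumed.
class FakeConnection
  attr_reader :open

  def initialize
    @open = true
  end

  def rows
    raise IOError, "not connected" unless @open
    %w[row1 row2]
  end

  def disconnect
    @open = false
  end
end

# Body that keeps the connection until the server calls `close`.
class RowStream
  def initialize(connection)
    @connection = connection
  end

  def each(&block)
    @connection.rows.each(&block)
  end

  def close
    @connection.disconnect # cleanup on the normal path
  end
end

def rows_app(connection)
  lambda do |_env|
    begin
      [200, {}, RowStream.new(connection)]
    rescue StandardError
      connection.disconnect # the same cleanup, repeated for the error path
      raise
    end
  end
end

connection = FakeConnection.new
_status, _headers, body = rows_app(connection).call({})
body.each { |row| puts row }
body.close
```

The connection stays open while the server iterates and is released only when `close` is called, at the cost of duplicating the disconnect in the rescue branch.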
00:17:30.640 The other problem we have is header management. Let’s say you have some streaming code that looks like this: you want to add headers—that’s great! You can only add headers before you start streaming. If you started streaming, the headers have already been sent, and it doesn’t matter whether or not you send headers anymore; they’re not going to make a difference. This means that you probably have a bug in your code, and we want to raise an exception when you try to add a header. So we freeze these headers on write, and as soon as you call write, you want to just freeze the headers.
00:18:24.640 So what happens is that, as soon as you call add header, you get an exception. You’re notified, 'Oh, I messed up in my code; I didn’t actually want to add a header here because I've already started streaming out to the socket.' We have to keep track of a flag for whether or not you've written to the stream. If you remember, though, in our Rack stack, any one of these middleware could actually try to write headers. As we’re returning back up the stack, we may not have actually written to the socket yet, so it’s valid for any of these middleware to write to those headers.
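Freezing the headers on the first write can be sketched as follows. `StreamingResponse` is a hypothetical class for illustration, not the Rails response object.

```ruby
class StreamingResponse
  attr_reader :headers, :chunks

  def initialize
    @headers = {}
    @chunks = []
  end

  def write(chunk)
    @headers.freeze # the first write commits the headers; freezing again is harmless
    @chunks << chunk
  end
end

response = StreamingResponse.new
response.headers["Content-Type"] = "text/event-stream"
response.write("hello\n")

begin
  response.headers["X-Late"] = "too late"
rescue RuntimeError => e # FrozenError on recent Rubies, a RuntimeError subclass
  puts "rejected: #{e.class}"
end
```

This covers only the application side. As the talk goes on to explain, middleware that still needs to add headers on the way back up the stack makes a plain freeze insufficient.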
00:19:30.000 As soon as the client code wrote to the stream, we tried to freeze the headers, but that’s not going to work because these middleware need to write to them too. So we need another flag: have we actually written to the socket? If we have, then we should raise an exception in the middleware. So now we have two flags, one for whether the user has written and one for whether we’ve actually written to the socket, plus database cleanup code that’s repeated.
00:20:15.040 Finally, we get something that looks like this. This is a totally handwavy implementation, but you can see that in each of the steps, these steps are completely disparate from each other; not only that, but some of the steps are repeated. You can’t just look at a particular piece of code and figure out what the execution path is going to look like. The responsibilities are scattered throughout the code.
00:21:00.240 Another problem is that every middleware needs to be aware of these rules and implement them, because if any one middleware messes up, it ruins the bunch. If you forget to call close on something, it’s going to mess up your entire stack and you won’t know. My main gripe, the death knell for me, is the environment hash that gets passed into call. If you think about this hash a little bit, it’s actually a global variable, and we know that global variables are bad. I hope I don't need to reiterate how bad they are.
00:22:02.840 We get a user agent in this hash—that’s great! You can access anything about the request you want to know there, but eventually you get middleware that are like, 'Hey, I need to share some data. Let’s share some data.' The only thing they can share is this hash. We get people who write stuff; this is all Rails core code. We say, 'Okay, we need to share the cookie jar between two middleware.' We have session middleware that looks up your session, and another middleware that looks up the cookie jar.
00:23:05.280 We need to share that information, but you’ll notice that we have leaking knowledge; you have to know what that key is, and now it’s in multiple places. Also, the order matters. If we don’t execute the jar before we execute the session middleware, the session middleware isn’t going to have a cookie jar. There’s no way; we need to make sure that that order is correct.
00:23:56.520 Additionally, we have no introspection—there's no way to look at a particular middleware and say, 'Hey, I know what’s going to be next in the stack.' There’s no API for that. The other problem is that everything is public. This @app.session thing here: anybody could access this, so there could be middleware in the world saying, 'I’m going to inject my own custom middleware for, I don’t know, devise or something in the middle of this, and I'm going to use this session,' and what if we on the Rails core team want to change that key?
00:24:42.680 It’s impossible; we can’t do it and maintain backwards compatibility. You end up with hilarious code; let me quote something hilarious here. This is an excerpt from the Rails code where we have, at the top, a method called cookie jar. You’ll see that this cookie jar method lazily puts an object into the Rack environment hash, saying, 'Okay, is there a cookie jar there? If not, let’s instantiate one and shove it into the hash.' Then we have another middleware somewhere else like, 'Hey, let’s look up the cookie jar. If there was a cookie jar there, let’s do something; if there wasn’t, then we'll do something else.'
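The shared-hash pattern and its ordering hazard can be illustrated with a short sketch. The key names and helper methods here are hypothetical and are not the actual Rails keys or code.

```ruby
COOKIE_JAR_KEY = "example.cookie_jar" # hypothetical key, not the real Rails one

# Lazily stash a jar in the env hash, as the method described above does.
def cookie_jar(env)
  env[COOKIE_JAR_KEY] ||= {}
end

# First middleware: touching the jar is what creates it.
def jar_middleware(app)
  lambda do |env|
    cookie_jar(env)["seen"] = "yes"
    app.call(env)
  end
end

# Second middleware: must know the same key and must run after the first.
def session_middleware(app)
  lambda do |env|
    jar = env[COOKIE_JAR_KEY]
    env["example.session"] = jar ? "session from jar" : "no jar found"
    app.call(env)
  end
end

endpoint = ->(env) { [200, {}, [env["example.session"]]] }

right_order = jar_middleware(session_middleware(endpoint))
wrong_order = session_middleware(jar_middleware(endpoint))

puts right_order.call({}).last.first # session from jar
puts wrong_order.call({}).last.first # no jar found
```

In the wrong order nothing fails loudly; the second middleware simply sees no jar, and it cannot distinguish a misplaced middleware from a jar that nobody created.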
00:25:24.400 What’s annoying about this is that when there is no cookie jar, you can’t tell whether this middleware is in the wrong place or whether nobody ever called the method. Maybe you’re not getting any cookie jars all day long, and you can’t tell if you made a mistake. My point is that Rack is encouraging bad patterns in our code. Rack is not necessarily a bad thing, but when you try to take these use cases into account, a lot of bad patterns arise.
00:26:18.040 We can’t safely deprecate things. On the core team, we can’t say, 'Hey, I want to change that key.' None of this information is private to us. We can’t safely change it and give you an upgrade path. There’s no way for us to refactor this code. So when I look at that particular key in that cookie jar, I feel locked into it. I have to maintain that to maintain backwards compatibility; I'm shackled to this.
00:27:05.120 So what I really want to do is break the shackles of Rack. I think we need to have something that’s not based on some hash; we need to have some kind of object-oriented defined API. I don’t have an entire answer for this, but I feel like maybe we need to steal from Node or J2EE; there are ideas around there that we can adapt. These are just a bunch of thoughts that I’ve been thinking about that really bother me in my day job, and I wanted to share. Unfortunately, right now, it’s a huge yak shave; it’s mostly just a little sparkle in my eye.
00:27:52.640 So if you want to talk about this stuff with me later, come ping me. I’m happy to talk about it, or we can do it during the Q&A. I think now we’re going to enter the party in the back portion of my presentation that I like to call 'Oh, Oh, Oh, It’s Magic!' You know Magic: The Gathering? How many people here play Magic: The Gathering? A few? How many of you know what it is?
00:28:39.160 If you don't know, it’s a collectible card game that has been around for at least 20 years. I started playing it in high school, which was about 20 years ago. I didn’t have any money then, so I quit playing because these cards cost money. Then, maybe ten years ago, when I had a job and disposable income, and also no friends, I spent tons of money on this game. I eventually quit playing after a while because, mainly, I went to a tournament. I thought, 'You know what? I’m going to meet people; I’m going to play in a tournament.' So, I played in a tournament and got destroyed by some 13-year-old kid who just slapped me in the face and insulted my mother.
00:29:31.480 It was basically like playing Call of Duty, except in real life. Those kids are brutal, man, so I quit. Anyway, recently somebody said, 'Hey Aaron, I’m going to a conference, and I’m going to play Magic: The Gathering again. Would you like to play?' So I was like, 'Yes! I have cards! I’d love to play!' I opened my closet and found out I have something like 6,000 cards. That’s a lot of cards!
00:30:15.000 Looking at this collection, I had a bunch of questions: 'What cards do I have? Are these cards any good? How much are the cards worth?' Some of them are good, some of them are bad. I wanted answers, but there are too many cards to sift through. So I thought, 'This process of looking through all these cards is repetitive; computers should do repetitive processes for us.' I wanted to create a system that would do this for me.
00:31:01.680 So I put on my robe and wizard hat and built a system that would handle this for me. Let’s look at the hardware first. I have my laptop with a light box and a webcam. The webcam points down into the light box and takes photos of the card. I'll show you a quick demo of what happens: you see a live feed from the camera on the right.
00:31:29.960 When I run it, it brings up something on the left. The card is extracted from the image, and on the right, the three cards are what the system thinks it is, suggesting those to me as matches. I keep putting cards under it, and it recognizes the card on the left while suggesting what it thinks is one of these on the right. I save, confirm it’s right, and it continues.
00:32:31.760 The high-level process is that we take a photo, extract the card from the photo, and then identify the card, trying to recognize it. I'm glossing over identifying the card; we’ll zoom in and talk about that soon. Card identification starts with a known corpus of cards—images tied to data. The system guesses based on that corpus and prompts me, asking, 'Hey Aaron, I think it’s one of these.' If it’s right, I confirm; if it’s wrong, I tell it the right one and teach the system. Each time I do this, it saves that data, increasing the corpus.
00:33:20.440 We have a main corpus and learned data on top of that, hopefully improving guesses to the point where I won't have to be involved anymore. The parts look like this: we have corpus information storage using SQLite; card matching using LePhash; and OpenCV for recognizing and extracting the card from the image and cropping it. Let’s discuss information gathering to get that initial corpus of data.
00:34:09.480 I wrote a web scraper to scrape the Wizards of the Coast website, extracting the card name, image, set, and rating. Knowing which set it belongs to is important because some cards are worth more in older sets. I set up my data model like this: the card has an image. It makes sense, right? I tried importing all the data, but it totally broke because of one stupid thing: there are two cards that share one image. They’re two unique cards but share the same image. So, I had to update my model to say that an image has many cards, and a card has one image.
00:34:54.680 After rerunning the program, it broke again because of one card with two images. I learned that the relationship is actually that an image has many cards, and a card belongs to many images. Now, let’s talk about the scraping technologies I used, which include promises in Ruby. How many JavaScript programmers do we have here? You love promises, right? Yeah, we have them too.
00:35:43.800 A promise in Ruby looks a bit different than in JavaScript. We create a promise and can sleep in it to do some computations. You can get the status of the promise to tell whether it's running or completed. If we call value on the promise, it will return the value after computation. We can even raise on it to cancel the operation.
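The operations listed above (sleep inside it, ask for its status, call value, raise on it) match Ruby's built-in `Thread` API, so the following sketch assumes that the "promise" being described is a plain `Thread`. That reading is an assumption, not something the transcript states outright.

```ruby
promise = Thread.new do
  sleep 0.1 # pretend to do some slow work
  6 * 7
end

p promise.status # "sleep" or "run" while the work is in progress
p promise.value  # joins the thread and returns 42
p promise.status # false once it has finished normally

# Raising inside the thread from outside acts as a crude cancel.
cancelled = Thread.new { sleep }
cancelled.report_on_exception = false # keep the demo output quiet
sleep 0.05 # give the thread a moment to start
cancelled.raise(RuntimeError, "cancelled")

begin
  cancelled.value # re-raises the exception that ended the thread
rescue RuntimeError => e
  puts "promise failed: #{e.message}"
end
```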
00:36:30.640 The main problem with these promises is you can have too many in the system. You can create an infinite number of them, which may not be efficient for computations. To resolve this, I built an executor pool to limit the running promises. We start a pool of threads (or promises) that pop off a queue of a particular size. The result is considerably more efficient, using parallel processing.
00:37:21.320 After writing this executor pool, we run promises in parallel. I can say I want to handle only five of these requests at a time. I used this to download a lot of data from the Wizards of the Coast website. I ended up downloading about 1.6 GB of data in around 40 minutes. I wonder if they noticed me doing it.
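A minimal sketch of such a pool, using a fixed number of worker threads that pop jobs off a bounded queue. `ExecutorPool`, `submit`, and the `backlog` parameter are hypothetical names, and error handling is left out: a job that raises would kill its worker and leave its result unfulfilled.

```ruby
class ExecutorPool
  STOP = Object.new.freeze

  def initialize(size:, backlog: 10)
    @queue = SizedQueue.new(backlog) # bounded, so producers block when it is full
    @workers = Array.new(size) do
      Thread.new do
        until (job = @queue.pop).equal?(STOP)
          job.call
        end
      end
    end
  end

  # Returns a one-slot queue; pop it to wait for the job's result.
  def submit(&work)
    result = Queue.new
    @queue << -> { result << work.call }
    result
  end

  def shutdown
    @workers.size.times { @queue << STOP }
    @workers.each(&:join)
  end
end

pool = ExecutorPool.new(size: 5) # at most five jobs run at once
futures = (1..20).map { |n| pool.submit { sleep 0.01; n * n } }
squares = futures.map(&:pop)
pool.shutdown
p squares.take(3) # => [1, 4, 9]
```

With `size: 5`, no more than five jobs run at a time regardless of how many are submitted, which is the kind of limit that keeps a scraper from opening an unbounded number of connections.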
00:38:09.560 Next is perceptual hashing. This is how I compare images, looking for a match in the database. I have a bunch of images, and I want to find the one that corresponds to the image I capture. I use the library LePhash, which wraps up libphash from the phash gem. It calculates a perceptual hash for images; think of it like calculating a hash for a key.
00:39:07.200 For example, we have a reference image on the left and a scanned image on the right. We generate numbers for both images. Phash allows us to compute the Hamming distance between these two values, indicating how many changes need to be made to make them the same. The lower the number, the closer the match; fewer changes mean higher confidence.
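For integer hashes, the Hamming distance is the number of bit positions in which the two values differ. A small sketch, assuming the perceptual hashes are available as non-negative Ruby integers (the 8-bit values below are made up for illustration):

```ruby
# Count the bit positions where two non-negative integer hashes differ.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

reference = 0b1011_0110
scanned   = 0b1001_0111
p hamming_distance(reference, scanned) # => 2
```

XOR leaves a 1 bit wherever the inputs disagree, so counting the 1 bits gives the distance. Identical hashes give 0.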
00:39:56.800 I also extended SQLite to add a Hamming distance function in the database. With about 22,000 unique cards, I needed to compare hashes between my scanned image and those cards. Performing this in Ruby initially took too long, so I implemented the Hamming distance function in the database for better efficiency.
00:40:44.520 Now that I have a giant corpus of images, I need to calculate hashes for them. I wrote code to calculate four hashes at a time, taking advantage of my four CPUs. Surprisingly, the CPU usage was underwhelming. Since MRI has a Global VM Lock, I sent a patch to allow the gem to unlock the GVL while calculating Hamming distances. This resulted in a fourfold speedup, allowing me to use all CPUs effectively.
00:41:35.480 Recognition and cropping is the fun part, using OpenCV to find the card in the image, extracting, and resizing it. However, I realized I didn’t know what I was doing with OpenCV and found a lot of cargo-culted code on GitHub. Most of it is in Python or C++. I often had to reference various sources to create workable code. Fortunately, the authors of the OpenCV book are also authors of OpenCV, so I recommend checking that out.
00:42:24.160 The process involves taking an image from the webcam. We want the image rectangular, looking at the card, but we get an angled shot. I take a photo, and we need to preprocess it by changing it to grayscale. This can be done in OpenCV, and the result is an image represented in gray. We need the grayscale for edge detection using OpenCV’s Canny edge detector, which identifies all edges in the image.
00:43:12.340 Next, we find contours, filtering out holes. Contours are edges that form boxes. The outer contour surrounding the card is critical. When we plot all the contours, we need to figure out the largest one because I know the largest contour surrounds the card.
00:43:51.720 After identifying this largest contour, we create a polygon to define it. Apparently, we have to ensure we get the points in clockwise order, but they appear counterclockwise when plotted. This results in needing to reverse the points. Once we obtain the points, we can get a rectangle around the card. However, the card is still not rectangular due to the angle of the webcam.
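One way to check and fix the point order without relying on any OpenCV call is the shoelace formula. This is a sketch of the general technique, not the code used in the talk, and it assumes image coordinates where y grows downward and points given as `[x, y]` pairs.

```ruby
# Twice the signed area of a polygon (shoelace formula).
def signed_area2(points)
  points.each_with_index.sum do |(x1, y1), i|
    x2, y2 = points[(i + 1) % points.size]
    x1 * y2 - x2 * y1
  end
end

# In image coordinates y grows downward, so a non-negative value means the
# points already run clockwise as seen on screen; otherwise reverse them.
def clockwise(points)
  signed_area2(points) >= 0 ? points : points.reverse
end

# Top-left, bottom-left, bottom-right, top-right: counterclockwise on screen.
corners = [[0, 0], [0, 10], [10, 10], [10, 0]]
p clockwise(corners)
```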
00:44:39.560 To correct this, we tell OpenCV, 'Here are the points that form a rectangle,' and give it the width and height. This provides a warp matrix that we use to create a properly positioned card. Once I have that new image region, I have the flattened image of the scanned card I was trying to capture.
00:45:33.400 For card matching, it's relatively straightforward: we calculate the hash for the captured image, go to the database, sort the records by distance from that hash, and save the card. As mentioned earlier, after saving the match, it becomes part of the corpus. Now, I can determine what cards I have and how good they are.
00:46:20.160 I can now answer some questions. 'What do I have?' I have scanned all these images and stored them. 'How good are they?' I can order them by their rating to find my best and worst cards. The best card I own is recognized in the Magic community for its value; conversely, the worst card is not well known and is considered terrible.
00:47:10.240 As for how much they're worth, I haven’t integrated with any pricing systems yet. There are pricing websites available, but I haven't added that functionality into my system. I do face challenges with this system, such as the time it takes the camera to focus on a card placed beneath it.
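The matching step can be sketched in memory as a nearest-neighbour lookup by Hamming distance. The talk does this inside SQLite; the `Card` struct, the `best_matches` helper, the card names, and the 8-bit hashes below are all hypothetical placeholders.

```ruby
Card = Struct.new(:name, :phash)

# Number of differing bits between two non-negative integer hashes.
def bit_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Return the `limit` corpus entries closest to the scanned hash, nearest first.
def best_matches(corpus, scanned_hash, limit: 3)
  corpus.min_by(limit) { |card| bit_distance(card.phash, scanned_hash) }
end

corpus = [
  Card.new("Card C", 0b0000_1111),
  Card.new("Card A", 0b1111_0000),
  Card.new("Card B", 0b1010_1010),
]

scanned = 0b1111_0001
guesses = best_matches(corpus, scanned)
p guesses.map(&:name) # => ["Card A", "Card B", "Card C"]

# Confirming a guess stores the scanned hash too, so the corpus grows.
corpus << Card.new(guesses.first.name, scanned)
```

Storing the confirmed scan alongside the reference entry means a later scan of the same physical card can match at a smaller distance, which is how the learned data improves the guesses.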
00:47:58.960 When you have 6,000 cards, focusing isn’t fast enough. Sometimes, the system fails to identify the card correctly. The good news is that when I teach it the correct identification, it retains that knowledge for future recognition. Another challenge stems from similar artwork. It's tough to differentiate between cards that look almost identical.
00:48:43.280 Finally, my cat loves to move things around, and it's a constant source of disruption. He'll see a camera cable or box and play with them, so I have to manage that. Thank you very much! I have stickers of my cat if anyone would like one.