Dave Tapley
Whose turn is it anyway? Augmented reality board games.

Summarized using AI


Dave Tapley • April 25, 2017 • Phoenix, AZ

The video titled "Whose turn is it anyway? Augmented reality board games" presented by Dave Tapley at RailsConf 2017 explores the integration of augmented reality into the classic board game PitchCar. The presentation highlights the challenges of keeping track of game elements when playing board games, which can detract from the fun. Tapley discusses how he approached this problem by using web technologies, image processing, and mathematical principles to create an augmented reality solution.

Key Points Discussed:

- Introduction to PitchCar:

- Tapley explains the rules of PitchCar, a simple flicking game where players take turns moving wooden disks around a modular track. He emphasizes the challenge of keeping track of the cars' locations when they go off the track or collide.

- Capturing Image Data:

- Tapley demonstrates how he captures images using a webcam through the WebRTC API, leveraging JavaScript for real-time data processing. He discusses his initial struggles and discoveries, specifically how the massive data strings from image captures can be handled.

- Real-Time Processing with OpenCV:

- The use of the OpenCV library is discussed, particularly how it can analyze captured images to locate the cars. Tapley describes transforming data from a data URI to a format usable by OpenCV to detect car positions.

- Augmented Reality Mechanics:

- The talk covers developing a mask for detecting colors corresponding to the player's cars, employing image processing techniques to differentiate between the track and non-track environments. Tapley highlights various mathematical transformations, such as the Hough transform, to accurately locate circular objects (the cars) in the image.

- Action Cable Communication:

- Tapley shares how he utilized Action Cable for real-time updates between the server and client to enhance responsiveness, making the game more interactive.

- Live Demo:

- Towards the end, a live demo showcases the system detecting cars and indicating their positions on and off the track with real-time feedback from the participants.

- Takeaways and Future Possibilities:

- Tapley reflects on the learning experience from creating this project and suggests that advancements in JavaScript frameworks would benefit future developments in AR and gaming applications.

Conclusions:

Tapley's presentation effectively combines technical insights about web technologies and image processing with the practical application of augmenting traditional gaming experiences. The integration of these technologies not only enhances gameplay but also demonstrates the value of innovative problem-solving in the tech development space.


RailsConf 2017: Whose turn is it anyway? Augmented reality board games. by Dave Tapley

Board games are great, but who has time to keep track of what's going on when you just want to have fun? In the spirit of over-engineering we'll look at PitchCar -- probably one of the simplest games in the world -- and see how far we can go with web tech, image processing, and a bunch of math. Expect to see plenty of code, some surprising problems and solutions, and of course: A live demo.


00:00:11.719 Hello, hi! My name is Dave Tapley. You can find me on Twitter at Dave Tapley. Firstly, thank you for coming to my talk. It's going to be fairly broad, and I'm going to show you how I've been engineering a solution to a problem in a board game.
00:00:24.300 If you are here because of the board games in the title, I want to clarify that there is actually only one board game I'm going to discuss. However, if you are here because 'augmented reality' is in the title, then yes, I have been working on an augmented reality app for the web using Rails.
00:00:36.329 To that point, almost nothing I am going to show you is probably the right way to solve these problems. It is definitely not the right tool for the job, but I have had a lot of fun with this little project. I've gotten to use some older and newer technologies in weird and wonderful ways.
00:01:02.670 But before we get into the engineering fun, we need to talk about board game fun. So, introducing PitchCar that you can see on the screen. I thought the best way to show you PitchCar, which is very simple, is to explain how it works. The game is simple: players take turns to place a piece at the start, and each person has one of these little cars, which is just a small wooden disk.
00:01:24.150 Players take turns to flick their cars around the track, and the first person to complete two laps is the winner. The track is made up of little pieces that jigsaw together; it's just a series of straights and corners. There is also a smaller track that you can build, providing 630 possible combinations of track, so you have plenty of replay opportunities.
00:02:05.490 One thing to notice about the track is these little red walls. The walls are really what makes the game fun; you can slide around them and bounce off them. However, a side effect is that there are places where there are no walls.
00:02:20.900 Coming off the track is just part of the game; it happens all the time. The good news is that you're not out of the game; you just go back to where you were. However, remembering where to return can be hard, and sometimes you can knock yourself or other players off the track simultaneously.
00:02:51.769 Even if you manage to remember where you have to go, it's challenging because it's a game all about being in the right place at the right time. It's very easy to misremember: maybe you were half an inch to the left, or the other player was behind you rather than in front. We're all human and there's no harm done, but as an engineer, I thought we could do better. So, let's have a computer track it!
00:03:16.620 Broadly speaking, I mean capturing images from the webcam and then identifying where the cars are. Then, if someone leaves the track, we will want to show where they were so they can be placed back accurately. That's cool!
00:03:50.680 Before we get into how I engineered the solution, I want to do a brief overview to justify some of the decisions I made. There are broadly three things I needed to do. One is to capture the images from the webcam. I had never played with this before; it turns out there’s this Web Real-Time Communications API.
00:04:14.890 No surprise, it involves JavaScript. Like most people, I cautiously say that JavaScript is the language of the web, and it's okay! However, I haven't played with any new frameworks in a while, so I thought I'd give Vue.js a try.
00:04:37.130 I won’t say too much other than I'm kind of back into JavaScript now, which I never thought I would say. Once we’ve captured the images, we need a way to process them with some computer vision. That sounds terrifying in Ruby, so I basically cheated and used this old, but super stable, 18-year-old OpenCV library with a Ruby wrapper, so we don’t have to look at any C code.
00:05:15.919 I wanted it to be real-time, as I didn’t want to have to submit the form every time someone took a shot, as that would be tedious. I also got to play with Action Cable, which was a good entry point for me.
00:05:40.610 In summary, this is how broad the talk is going to be: we're going to see the WebRTC protocol, and I'll explain how much I like Vue.js, probably more than I should. OpenCV will be on the back end, which will involve a bunch of math, but you will be okay, and then Action Cable will be responsible for sending it all back and forth.
00:06:07.150 The plan is that the bulk of the talk will focus on capturing these images, sending them over to the server, processing them with OpenCV to determine where the cars are, enabling us to show you how to place them back when they come off the track. If we get through all of that, we might even see a demo that may or may not work.
00:06:49.930 The first step was capturing the image, which seemed like the logical place to start. I've never really seen Vue components before, so I found this new webcam component on GitHub, which has a simple 'Hello World' implementation. The main element that Vue is going to take over is the video tag, which implements WebRTC, and below that, there's a button element for triggering the photo capture.
00:07:34.010 The only interesting thing about that button is the '@click' directive in Vue, which is meant to handle the on-click event. In the image tag, there's a 'v-bind:src' that allows the source of the image tag to update based on whatever is in the component's photo variable.
00:08:06.339 The corresponding JavaScript snippet implements the behavior of that 'take photo' button when you click it. It looks fairly simple, but I started wondering what this 'photo' variable is. How do you represent a file or image data in JavaScript? Spoiler: it’s a string.
00:08:44.379 The 'photo' variable is a massive string, specifically a Data URI. Mozilla describes it as a URL prefix for the data scheme, allowing content creators to embed small files inline in documents. 'Small files' loosely means under a gigabyte, which I think is reasonable for this context.
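The anatomy of such a Data URI string can be sketched with nothing but the Ruby standard library; the `text/plain` payload below is a stand-in for the much larger image payload the webcam produces:

```ruby
require 'base64'

# A Data URI is "data:<mime>;base64,<payload>" — one big string.
payload  = Base64.strict_encode64('hello')
data_uri = "data:text/plain;base64,#{payload}"

# Everything after the first comma is the actual data.
header, data = data_uri.split(',', 2)
mime    = header[/\Adata:([^;]+)/, 1]   # the declared MIME type
decoded = Base64.decode64(data)         # the original bytes

puts mime
puts decoded
```

An image capture works the same way, just with `image/png` and a payload that can run to megabytes.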
00:09:05.150 I wanted to play around with these Data URIs, so I found a handy website where you can input a URL, hit 'generate Data URI,' and receive the output at the bottom. However, that output box can be misleading: it looks short, but if you copy the text out of it you find it's actually a very large string.
00:09:34.400 What amazed me was that if you stick that massive string in the source attribute of an image tag and load it in a browser, it just works! So, I realized that source attributes don’t always have to link to files; they can also contain the data itself.
00:10:07.570 As a result, I was able to call that a success. Now that we have our Data URI in Ruby land on the server, the next step is to get it loaded into the OpenCV library.
00:10:36.790 If you look at the OpenCV documentation, you'll find the closest thing to an image class: IplImage (IPL stands for Intel Image Processing Library). It has load methods, but they expect a filename. This presented the challenge of figuring out how to fit our string-shaped Data URI into that file-shaped load method.
00:11:14.740 It's fairly obvious that everything after the comma in a Data URI is the actual data. Thus, I figured I could slice off everything before the comma and use just the data part, but it seemed a bit ugly to handle that manually. Thankfully, someone had already written a Data URI gem, which I found extremely helpful.
00:11:57.000 This gem allowed me to handle Data URIs well. I simply opened a file in binary mode, wrote the data to it, and then passed that to the IPL image load method. I was shocked that it worked right off the bat. Typically, we would say, 'You can’t just write binary strings to files in Ruby,' but it really worked the first time.
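The string-into-filename detour can be sketched with just the standard library; the `OpenCV::IplImage.load` call is the filename-based wrapper method the talk refers to, left commented out here so the sketch stays stdlib-only:

```ruby
require 'base64'
require 'tempfile'

# Stand-in for the PNG bytes decoded from a Data URI payload.
png_bytes = Base64.decode64('aGVsbG8=')

# Write the binary string to a temp file so a filename-based
# loader (like OpenCV's IplImage.load) can read it.
tmp = Tempfile.new(['frame', '.png'])
tmp.binmode
tmp.write(png_bytes)
tmp.flush

# image = OpenCV::IplImage.load(tmp.path)  # ruby-opencv wrapper call
round_trip = File.binread(tmp.path)
tmp.close!

puts round_trip == png_bytes  # the bytes survive the detour through disk
```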
00:12:36.090 Once we had our image loaded, I needed to give a brief segue regarding matrices. I know most of you are probably familiar with matrices, but for those who aren't, it’s essentially a rectangular array of values. In essence, images are just matrices, with each cell in the matrix representing the color of a pixel.
00:13:15.290 OpenCV makes it clear by having its image class actually inherit from the matrix class. Thus, in OpenCV, an image is just a table of pixels. If you need more convincing that an image is just a table, there’s a spectacular website where you can upload an image and download it as a spreadsheet, with the background colors of the cells set to match the pixels. It's fabulous!
00:13:51.300 Now, with our image successfully loaded, let's begin the process of identifying the car pixels. You could assume that the blue pixels belong to the car, so we need a way to instruct OpenCV to tell us which pixels are blue. As web programmers, we can refer to a web color definition of blue and use the equality method that OpenCV provides, which gives us a binary matrix showing whether each pixel is blue.
00:14:32.000 The result is what I am referring to as a mask—a new matrix that, instead of containing colors, contains truth values. For every pixel, is it blue? While it would be nice to visualize the masks directly, OpenCV has limitations on visualization. However, the matrix class does offer a save image function, but again, it requires a filename.
00:15:26.020 I decided to take a page from our earlier strategy and attempt to work with Data URIs again, but this didn't work quite as easily as before. The principle remains the same: write to a temporary file in binary mode, read it back as a string, and then use base64 encoding. Eventually, we could recover the Data URI again.
00:15:51.140 While I assumed this would be computationally intensive, it turns out that the file system often doesn't bother writing these files directly to disk, so performance remains adequate. Once we have our mask as a Data URI, we can stick it into the source attribute of an image tag, generating a displayable image.
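The reverse trip — raw mask bytes back into something an `<img>` tag can display — is just the encoding run the other way; the bytes below are a stand-in string, not a real PNG:

```ruby
require 'base64'

# Stand-in for the bytes of a saved mask image.
mask_bytes = "\x89PNG-ish mask bytes".b

# Wrap them back up as a Data URI for an image tag's src attribute.
data_uri = "data:image/png;base64,#{Base64.strict_encode64(mask_bytes)}"

puts data_uri.start_with?('data:image/png;base64,')
```

The resulting string can be assigned straight to the Vue component's photo variable and the browser renders it with no file involved.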
00:16:20.289 However, I have to emphasize that the chances of a pixel being exactly the blue color we defined, within the large RGB color space, is essentially zero. Instead, we want to consider a range of blues. Fortunately, OpenCV can help with this. A very similar method allows us to supply two colors, defining a min/max range, and OpenCV will tell us which pixels fall within that range.
00:17:11.340 This way, we can create a more dynamic mask from the pixels that are blue but also avoid just aiming at random bits of dust that happen to be blue in the image. To visualize these mask changes, I sent the data back from the Action Cable server to the browser, creating a new data URI containing the updated mask with the range function.
00:18:01.790 On the JavaScript side of this, when creating the subscription in the Vue component's created callback, I added a function that would be invoked whenever it receives a message from Action Cable. When the message containing our mask data URI arrives, I can set that to the corresponding model variable.
00:18:50.300 The beauty of this is that now the image just magically updates whenever JavaScript receives the action cable message. I was quite impressed by how simple this setup was. This approach eliminates the need to copy-paste our mask Data URI every time.
00:19:23.470 However, we still didn't have a way to dynamically update the color ranges for our mask detection, so we looked into that while getting used to Vue.js. In the HTML snippet, you'll see a 'v-model' directive, which binds the input inside it to a range object holding the min and max values for our color thresholds.
00:19:49.230 Additionally, I've added an @change attribute which is similar to an on-click handler. This is part of the developing pattern we're seeing, where changes to our input send messages back over Action Cable. The methods are defined in the component where I specify that every time the HSV input is adjusted, we send an update message to Action Cable.
00:20:35.840 This message will indicate the full ranges for red, green, and blue colors. When this data comes back to the Ruby side, we can extract the ranges and utilize them for our OpenCV processing too.
00:21:16.480 After this exchange, we update our mask accordingly. The final result was a visually pleasing little component that would dynamically adjust as we input different color ranges. Essentially, every input change sends a message of Action Cable with updated ranges, and with each alteration, a new mask is created and returned as a Data URI.
00:21:56.640 The outcome was pleasing as we were able to sift through very specific colors and accurately pinpoint where the car was on the track. However, merely having the mask wasn't enough. We wanted to ascertain the exact XY coordinates of the car's position.
00:22:39.720 This is where the Hough transform comes into play! This remarkable piece of math allows us to analyze a matrix of trues and falses and identify where the circles are located. The good news is that since the car is circular, we can get its x and y position from the resulting object.
00:23:47.200 Using a few parameters from the wisdom of the internet, I found and tweaked the settings until I could reliably identify the center of the car’s circle in the image, returning the center coordinates. This worked surprisingly robustly and effectively.
00:24:17.820 The final challenge was determining which car fell off the track. In a simplified state of the game, it's pretty clear that the blue car is on the track and the pink car got knocked off. However, the orange car is elsewhere, potentially under a chair as players get excited.
00:25:11.850 Reflecting on this scenario, I thought we might have three possible states: a car is on the track, off the track but still in the image, or outside the camera's view entirely. However, as I simplified my logic, I realized that there are really only two conditions: a car is either on the track or it's not.
00:25:56.300 This simplification led to defining a mask where all track pixels are true, allowing me to delete everything that isn't on the track. Consequently, if the car is in the image at all, by definition it must be on the track.
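The "delete everything that isn't on the track" step is an element-wise AND of two boolean matrices; a toy 3x3 version, with both masks invented:

```ruby
car_mask   = [[false, true,  false],
              [false, false, false],
              [true,  false, false]]   # two candidate car pixels

track_mask = [[true,  true,  true],
              [true,  true,  true],
              [false, false, false]]   # bottom row is off the track

# A detected pixel survives only if it also lies on a track pixel.
on_track = car_mask.each_with_index.map do |row, y|
  row.each_with_index.map { |v, x| v && track_mask[y][x] }
end

p on_track.flatten.count(true)  # only the on-track pixel remains
```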
00:26:42.820 After successfully building this mask—simply requiring some geometry based on the start line of the track—I developed a tool with Vue.js and Action Cable that allows you to visualize and adjust the mask.
00:27:19.990 At this stage, I had to introduce some highly sophisticated code for describing tracks. Essentially, every track starts with a straight piece, then turns, and I’ve manually crafted this code based on my understanding.
00:27:59.010 I crafted an algorithm to determine the next piece based on its previous piece, tracking its rotation and preserving the scale, allowing me to draw the track in the proper location on the mask.
00:28:39.400 The next piece should follow a pre-determined order: starting with the first piece, I modified how the code handles drawing the corners onto the mask, using OpenCV's existing methods to facilitate the drawing.
00:29:40.280 By integrating recursive functionality, each piece of the track could be identified and positioned correctly based on the previous piece's attributes, ensuring the path is drawn accurately and systematically.
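The piece-by-piece layout can be sketched as a fold over the piece list, tracking position and heading as each piece is placed; the piece names and unit step are invented, and real PitchCar corners would also carry arc geometry:

```ruby
# Headings cycle east → south → west → north; corners turn 90 degrees.
HEADINGS = [[1, 0], [0, 1], [-1, 0], [0, -1]]

def layout(pieces)
  pos, dir = [0, 0], 0
  pieces.map do |piece|
    dir = (dir + 1) % 4 if piece == :right
    dir = (dir - 1) % 4 if piece == :left
    dx, dy = HEADINGS[dir]
    pos = [pos[0] + dx, pos[1] + dy]   # place the piece one step along the heading
  end
end

# A loop of four straights and four right corners should return home.
path = layout(%i[straight right straight right straight right straight right])
p path.last  # back at the origin if the pieces really close the loop
```

Drawing each piece into the mask at its computed position and rotation is then a matter of OpenCV's existing line- and arc-drawing methods.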
00:30:46.299 As these methods combined, we reached the point where our mask could facilitate our car detection logic—a simple premise where, if the car was in the imagery, it confirmed its presence on the track.
00:31:24.710 The core logic checks whether it sees the car and confirms its status. If the car goes off the track and can no longer be seen, we simply leave a marker where it originally was so players could replace it accurately back on track.
00:32:03.500 Now, let’s see if this setup works in real-time! Because the track was slightly bumped during setup, we'll fine-tune the position to ensure alignment.
00:32:24.330 This is the demo where we can visually check if the augmented reality setup works. So essentially, as cars are flicked, the system should track their movement accurately.
00:32:45.210 The system should grab hold of car locations and provide live feedback on their status while interacting in real-time game scenarios, allowing a smooth gaming experience.
00:33:08.400 Every time a car goes off the track, the UI should respond by painting an overlay or question mark to indicate where it was last seen, giving players accurate feedback.
00:34:09.300 To summarize, once you explore the features and toolsets of Vue.js and Action Cable, you'll find incredible versatility in creating dynamic interactive projects with reactive elements, whether it's games or any real-time application.
00:35:03.500 I think one of the biggest takeaways is the power of using Action Cable to facilitate real-time communication with web components. As Rails continues to evolve, a lot more JavaScript will surely define how we create the next generation of reactive applications.
00:35:43.870 The continuous evolution of these technologies means we will see better automations in tracking systems and enhanced visual feedback systems in gameplay, making experiences even smoother.
00:36:53.800 Engaging in projects like these allows us to think creatively about the power of augmented reality and its capacity to revolutionize board games. Thank you all for listening!