Talks

Dōmo arigatō, Mr. Roboto: Machine Learning with Ruby

by Eric Weinstein

In this video, 'Dōmo arigatō, Mr. Roboto: Machine Learning with Ruby,' recorded at RubyConf 2016, Eric Weinstein focuses on implementing machine learning in Ruby, a language not typically associated with the field.

Key Points discussed in the talk include:
- Introduction to Machine Learning: Weinstein clarifies that machine learning is not limited to languages like Python or Java, emphasizing Ruby's potential for such tasks. He dedicates the talk to his brother, notes the enthusiasm of the Ruby community, and stresses that the content is meant to be accessible to all skill levels.
- Understanding Machine Learning Concepts: The talk introduces machine learning principles such as supervised learning, neural networks, and generalization, explaining how machines can identify patterns in data. He notes that a basic grasp of high-school-level mathematics helps in engaging with machine learning algorithms, though it is not required.
- Supervised Learning with MNIST Dataset: The MNIST dataset, which consists of handwritten digits, is highlighted as it serves as an excellent example for classification tasks. Weinstein describes preparing data, including defining features and labels, and splitting the dataset into training and test sets.
- Neural Networks Basics: He explains how neural networks operate similarly to biological neurons, training the model through a method called backpropagation, which iteratively adjusts weights to improve accuracy. He notes the distinction between memorization and generalization in training machines.
- Implementation in Ruby: The practical application of these concepts is demonstrated using the Ruby gem `ruby-fann` to train a neural network that classifies digits in the MNIST dataset. Weinstein illustrates the setup of the neural network, discussing input nodes, hidden layers, and the importance of monitoring overfitting during model training.
- Demo Presentation: The talk culminates in a live demonstration where Weinstein showcases a developed application that predicts numbers drawn by users, further reinforcing the concepts discussed.
- Takeaways and Call for Community Engagement: The conclusion emphasizes the opportunity for the Ruby community to grow by actively contributing to machine learning projects, while also urging caution regarding biases in data that can lead to skewed model outputs.

In summary, Weinstein fosters a message of empowerment within the Ruby ecosystem, encouraging developers to leverage machine learning and collaborate to enhance the tools available.

00:00:15.880 First, thanks so much for coming to this talk. It really means a lot to me that you're willing to spend your valuable time here, so thank you. I also want to thank the conference organizers, the city of Cincinnati, and everyone speaking and attending for doing everything that makes this community fantastic. I'm super glad to see you here; give yourselves a round of applause—you all are great!
00:00:32.590 This talk is titled 'Dōmo Arigatō, Mr. Roboto: Machine Learning with Ruby.' A couple of months ago, I was thinking about how machine learning is often associated with Python or Java, but it doesn't have to be that way. I may mispronounce some Japanese terms, since my Japanese is non-existent, but I will do my best. This talk is dedicated to my younger brother, Josh, who passed away unexpectedly this summer.
00:01:07.869 All right, part zero—this is a computer talk, so we have to start with zero! I tend to speak very quickly, especially when I'm excited about topics like Ruby and machine learning, which I find super exciting. I will try to slow down and maintain a normal pace. If I start speeding up or going off track, please wave or give me some kind of signal so I can adjust—just make it big so I can see it from up here because the lights are really bright. Feel free to shout; that's also okay.
00:01:29.930 I plan to talk for about 35 minutes and then have some time at the end for questions. It’s funny; I gave a talk last year at RubyConf about garbage collection, and I was so prepared that I finished my slides multiple days in advance, which never happens. However, Matz sat down front-row center right before I started, and I rushed through the entire talk in about 25 minutes. So, I’ll try not to do that again; I’ve been practicing imagining Matz in the audience during my talks, even though I don’t think he’ll show up this time.
00:02:02.140 My name is Eric, and I'm a software engineer and manager at Hulu. A friend of mine cheerfully described Hulu as 'Netflix with ads,' which isn't wrong. You can find me on GitHub and Twitter under this handle. I write a lot of Ruby and JavaScript for work, and even a bit of Go, which is nice. My side projects typically involve Ruby or Clojure. I'm proud to say that I'm also a newly-minted contributor to Hydrus, so if you haven't heard of it or are curious about what it is, come find me after the show!
00:02:39.319 I’ve been writing Ruby for about five years and about a year ago, I wrote a book called 'Ruby Wizardry,' which teaches Ruby to kids aged eight to twelve. If you're interested in that, come see me afterward. Unfortunately, I’m out of stickers, but we have a 30% off promo code from the folks at No Starch. If you want to buy the book online this week, just go to No Starch's website and use that promo code for a discount!
00:03:06.660 While this isn’t a lengthy talk, I think it helps to provide an overview of what we will cover. I’ll talk a bit about machine learning in general—particularly supervised learning and neural networks—and then focus on machine learning with Ruby using the MNIST dataset, which I will elaborate on shortly. But first, let's do a quick show of hands: how many of you feel very comfortable with machine learning? Raise your hand; great! Now, what about supervised learning? Cool! And how many of you are familiar with neural networks? Interesting! There seems to be a mix of familiarity, which is perfectly fine.
00:04:09.329 The good news is, if you didn’t raise your hand, you’ll still be just fine. This talk is somewhat introductory, and you do not have to be a mathematician to engage with machine learning. However, it does help to have some high school-level math skills, such as basic statistics, first-year calculus, or linear algebra, to better understand how machine learning algorithms function. Nevertheless, you don't need to know them to use the tools we will look at or to grasp the content of this talk.
00:05:11.000 When I think of machine learning and artificial intelligence, I envision the ability to recognize patterns. What I mean by machine learning can be boiled down to one word: generalization. It's about getting a computer to derive its own rules for handling and processing data so it can make generalizations, without being explicitly programmed for every case. Think of it as pattern recognition; you observe what constitutes a car and iterate on that knowledge, recognizing different representations across various contexts, whether in Cincinnati, Norway, or even on the moon.
00:05:59.280 The goal of machine learning is to help machines detect underlying patterns in datasets, grouping them and making predictions about them. To illustrate this, let's consider supervised learning, which involves having a dataset with labeled examples—like a list saying 'this is a car' or 'this is not a car.' In supervised learning, you can perform classification or regression based on the underlying data, which allows the machine to generalize from known data to unseen data.
00:06:47.070 In terms of classification, imagine that example of recognizing cars—determining whether something is a car or not. Regression, on the other hand, might involve analyzing housing data based on features such as square footage or proximity to good schools. You analyze the data to find a relationship, using scatterplots to determine housing prices based on these features, essentially performing function approximation. The aim is to explain data and uncover patterns that might escape human recognition.
00:07:30.390 To implement machine learning, we need to identify features and labels. For instance, the MNIST dataset is a collection of tens of thousands of handwritten numbers. The features here are the raw pixel values of those handwritten digits, and the labels correspond to the digits themselves—0 to 9. Thus, the process involves separating our data, defining our features and labels, and splitting the dataset into training and testing sets.
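A minimal Ruby sketch of that preparation step might look like the following; the `load_mnist_examples` helper and the 85/15 split ratio are assumptions for illustration, not part of any particular library:

```ruby
# Hypothetical sketch: each example is a [pixel_values, label] pair,
# e.g. [[0.0, 0.12, 0.99, ...], 7].
examples = load_mnist_examples # assumed helper for reading the MNIST files

features = examples.map { |pixels, _label| pixels }
labels   = examples.map { |_pixels, label| label }

# Hold out a portion of the shuffled examples for testing; train on the rest.
shuffled     = examples.shuffle
split_index  = (shuffled.length * 0.85).floor
training_set = shuffled[0...split_index]
test_set     = shuffled[split_index..-1]
```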
00:08:11.880 The training set includes the data that the machine learning algorithm will learn from, while the test set is used for evaluation after training. Importantly, feeding the test data into the algorithm during training is considered cheating, as it doesn't genuinely test the algorithm's ability to generalize to unseen data. You might work at a machine learning company where products involve making real-time recommendations, in which case you might test your algorithms against historical data or update them with new inputs regularly.
00:08:59.880 A crucial distinction in machine learning is between memorization and generalization; the latter is the true goal. The labels from 0 to 9 represent the digits we aim to predict based on the images of handwritten numbers. Each image corresponds to a vector of pixel intensities, where black pixels are represented by a non-zero value and white by zero. Understanding how to convert these images into a usable feature vector is key to our predictions.
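As a rough illustration (assuming 8-bit grayscale pixels in the 0-255 range), turning an image into such a feature vector could look like this:

```ruby
# Flatten a grayscale image (an array of rows of pixel values) into a single
# feature vector, scaling each pixel from 0..255 down to 0.0..1.0.
def to_feature_vector(image_rows)
  image_rows.flatten.map { |pixel| pixel / 255.0 }
end

# A white background pixel (0) stays 0.0; inked pixels become non-zero values.
```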
00:09:53.880 We've focused on supervised learning, so let's now discuss neural networks—machine learning tools modeled after the human brain. Neural networks are built from perceptrons, which are akin to biological neurons. A perceptron operates almost like a function: inputs (like dendrites) receive signals that are processed, and the result (the axon's output) is sent onward, determining whether the neuron fires or not. We can model this as a simple function, where weights represent each signal's importance and a threshold dictates whether the neuron activates.
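To make the weights-and-threshold idea concrete, here is a small self-contained Ruby sketch of a single perceptron (purely illustrative, not part of ruby-fann):

```ruby
# A single perceptron: a weighted sum of inputs compared against a threshold.
class Perceptron
  def initialize(weights, threshold)
    @weights   = weights
    @threshold = threshold
  end

  # "Fires" (returns 1) when the weighted sum of the inputs crosses the threshold.
  def fire(inputs)
    weighted_sum = inputs.zip(@weights).sum { |input, weight| input * weight }
    weighted_sum >= @threshold ? 1 : 0
  end
end

neuron = Perceptron.new([0.5, -0.3, 0.8], 0.4)
neuron.fire([1.0, 0.2, 0.7]) # => 1 (0.5 - 0.06 + 0.56 = 1.0, above the threshold)
```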
00:10:41.410 During training, we initialize neurons with random weights and then gradually adjust these weights as we learn from the data, using a method called backpropagation. This involves making predictions, assessing where we went wrong, and propagating error signals backward through the layers of the network to fine-tune the weights—letting the network learn. It's a cyclic process that continues until we reach a satisfactory level of accuracy, which can often be over 90% for certain datasets.
00:12:47.340 While neural networks can achieve impressive results, they're also often black boxes; interpreting the specific weights can be complex. The architecture typically consists of an input layer corresponding to the features, hidden layers whose size and count are hyperparameters to tune, and an output layer that aligns with the labels expected from the dataset. For our case, since we're working with digits, the output layer needs ten neurons.
00:13:35.700 Additionally, you can adjust other parameters, such as the learning rate, which determines how quickly the network adjusts during training. A smaller learning rate means the network takes longer to learn but is less likely to overshoot a local minimum while reducing error. With more than three layers, we enter the realm of deep neural networks, where intricate architectures have led to breakthroughs in performance, particularly in advanced projects from the big tech companies.
00:14:59.880 Now that we've discussed the theoretical side, we can look at the actual data we'll be using—the MNIST dataset. We'll use the Ruby gem called `ruby-fann`, which interfaces with the Fast Artificial Neural Network (FANN) library written in C. Then, we'll discuss developing an application that uses the trained network to classify handwritten digits accurately. The MNIST dataset contains images of handwritten digits that have been resized and centered, making them simpler for the model to analyze without complex image processing.
00:16:05.279 The dataset consists of 60,000 training examples and 10,000 test cases to evaluate our model’s performance after training. It’s open-source and available for anyone interested in experimenting with it, further ensuring a transparent foundation for machine learning projects. When I tweet out and share the slides, I’ll also provide the links to access the dataset to facilitate your exploration.
00:17:15.579 I trained a neural network on the MNIST dataset to see how well we could perform. I set a maximum of 1,000 epochs, but in practice, training often reaches minimal error before that marker. The goal was to get the highest correct classification rate, and I found that the model obtained an impressive 99.99% accuracy on the training data. However, when I tested the algorithm on unseen data, it dropped down to about 93%. This suggests we're managing our risk of overfitting, which is something I want to discuss next.
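One way to measure that test-set accuracy is to run each held-out example through the trained network and take the output neuron with the highest activation as the prediction. A sketch, assuming `fann` is a trained `RubyFann::Standard` network and `test_set` holds `[pixels, label]` pairs:

```ruby
# Count how many held-out examples the trained network classifies correctly.
correct = test_set.count do |pixels, label|
  outputs   = fann.run(pixels) # ten activations, one per digit
  predicted = outputs.each_with_index.max_by { |value, _index| value }.last
  predicted == label
end

accuracy = correct.to_f / test_set.length
puts format('Test accuracy: %.2f%%', accuracy * 100)
```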
00:18:09.339 Overfitting occurs when a model learns too much about the noise and specifics of the training data instead of capturing the broader trends. When training too tightly on idiosyncratic elements of the data, generalizability diminishes, leading to poor performance with new cases. Factors contributing to noise can include errors in labeling, variability in human writing styles, and excessive complexity in the model itself, which emphasizes patterns that might otherwise be irrelevant.
00:19:01.350 As we develop neural networks, it's essential to check for unusually high weights at various stages, monitor the number of neurons, and simplify the network where possible to avoid overfitting. As I said, achieving about 93% accuracy is promising and indicates a balanced trade-off. If you're interested in experimenting, feel free to pull down the GitHub repository and try your own parameter tuning. Let's see if anyone can exceed my accuracy of 93.28%.
00:20:07.740 Now that we understand how neural networks function and how they can be applied to the MNIST dataset, we'll look at how I built and trained this neural network in Ruby. Additionally, I will present a small network for the MNIST data and showcase an app I developed, allowing users to test the model's accuracy and utility. As always, a little searching on the internet reveals that someone has already built something similar, sometimes even better. I'm greatly indebted to Jeff Usher, who created a Ruby project working with the MNIST dataset on GitHub. I highly encourage you to check it out!
00:21:16.800 When I started working on my demonstration, I was surprised to find Jeff's repository, which handles touch events and adds some flair that mine currently lacks. My contributions tend to focus on implementing the core features while keeping things cohesive. On the front end, I overshot my target: it probably only needed about 30 lines of JavaScript, but ended up involving many more once I brought in React, ES6, and Webpack.
00:22:12.689 The submission code in the React component uses the Fetch API to send canvas data to a Sinatra server, which processes the input and returns a JSON prediction. The intent is a seamless user experience: the frontend provides a drawable canvas, displays the prediction, and lets the user keep interacting. That's the frontend side; on the backend, it employs Sinatra and the ruby-fann gem to carry out training and testing. This combination lets us see the theory discussed earlier transformed into a functional application.
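A hedged sketch of what such a Sinatra endpoint could look like; the `/predict` route, the `pixels` payload key, and the saved `mnist.net` file are assumptions for illustration, not the talk's exact code:

```ruby
require 'json'
require 'ruby-fann'
require 'sinatra'

# Assumes a network was trained offline and saved with fann.save('mnist.net').
NETWORK = RubyFann::Standard.new(filename: 'mnist.net')

# Receives the canvas pixels from the frontend's fetch() call and responds
# with the network's best guess as JSON.
post '/predict' do
  payload = JSON.parse(request.body.read)
  pixels  = payload.fetch('pixels') # assumed key: a 576-element array of 0.0..1.0 values

  outputs    = NETWORK.run(pixels)
  prediction = outputs.each_with_index.max_by { |value, _index| value }.last

  content_type :json
  { prediction: prediction }.to_json
end
```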
00:23:21.090 Here you can see that framework put into practice: we take the training data, create a new instance of the RubyFann artificial neural network, and define our architecture accordingly. For this particular project, I set up 576 input nodes corresponding to a 24x24 pixel image, so that every pixel has its own input. I chose 300 hidden neurons—something that can be adjusted based on performance needs—and set the number of output nodes to 10, representing the digits 0 through 9. As we train, the algorithm runs for a set number of epochs, which dictates how many times we pass through the training data, continuously backpropagating and refining the weights until we hit a desired mean squared error.
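A sketch of that setup with the ruby-fann API; the `training_inputs` and `training_outputs` arrays stand in for the real MNIST data (576-element pixel vectors and 10-element one-hot label vectors):

```ruby
require 'ruby-fann'

# One input per pixel of a 24x24 image, 300 hidden neurons, one output per digit.
fann = RubyFann::Standard.new(
  num_inputs:     576,
  hidden_neurons: [300],
  num_outputs:    10
)

train_data = RubyFann::TrainData.new(
  inputs:          training_inputs,  # e.g. [[0.0, 0.12, ...], ...]
  desired_outputs: training_outputs  # e.g. [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ...]
)

# Train for at most 1,000 epochs, reporting every 10, and stop early once the
# mean squared error drops below the desired value (0.001 here).
fann.train_on_data(train_data, 1000, 10, 0.001)
```

After training, calling `fann.run` on a new pixel vector returns the ten output activations, as in the accuracy sketch shown earlier.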
00:24:56.560 While the front end employs React and ES6, the backend uses Ruby and Sinatra with the ruby-fann gem to streamline the training process. This setup allows us to apply everything we've talked about so far in a practical way. I think I've spoken quite a bit already, so now I'd like to attempt a demonstration—always a bit risky, but it should be an exciting opportunity to see the model working in action!
00:25:39.880 As you can see, I'm running a local instance. The first thing to do is draw a number—let's say a seven. The model should then predict which digit it is. There we go, it thinks it's a three; let's move on and experiment with a smiley face! This type of interaction lets you test the boundaries of the model's understanding by observing how it interprets various shapes drawn by a user.
00:28:00.280 The tool predominantly identifies digits, but at times, it showcases some interesting predictions even for unconventional drawings. With some practice, we can even trick the algorithm into somewhat plausible interpretations. It’s important to remember that our tool doesn’t inherently understand these drawings but rather predicts based on the features learned during training. I hope you all enjoyed that quick demo; it certainly provides a very hands-on look at the utility of neural nets.
00:30:20.540 As we wrap up, let's summarize what we covered: We explored machine learning and understood its core principles, particularly focusing on supervised learning where labeled data helps predict unlabeled samples. We discussed neural networks as powerful tools for machine learning with their strengths and weaknesses, emphasizing generalization while being cautious about overfitting. Also, we explored how all this can be accomplished in Ruby, giving us a unique opportunity to engage with machine learning within the Ruby ecosystem. It's an exciting journey that potentially calls for us to build and enhance our tools further.
00:31:57.000 To conclude, I share this thought: While Ruby may not yet have the same robust tools as Python or Java for machine learning, the Ruby community is vibrant, motivated, and full of talented individuals. If we want to elevate Ruby's presence in the machine learning landscape, we need to actively contribute and create tools ourselves—be the change we want to see. I encourage you all to think about contributing to projects and building a better foundation for our community to harness machine learning effectively.
00:32:52.010 And finally, I'd like to leave you with a cautionary reflection drawn from personal experience. While developing intelligent models, we must be vigilant about the data we input, as biased or inaccurate data can lead to biased models, potentially influencing decisions in harmful ways. Careful consideration is paramount, especially in projects with significant social implications. This ends my presentation, and I truly appreciate your time. Now, I welcome any questions or discussions if you have them!