00:00:25.760
Who remembers email before Gmail? Who remembers the massive amount of spam that we used to get before Gmail? Yes, we do. I remember when I switched from having my Excite email account to having Gmail. It was like entering a new haven where I didn't have to spend all of my day marking things as spam or not spam and deleting things out of my inbox. It was wonderful.
00:00:56.800
When I was a kid, I listened to my favorite radio station and I had a cassette in the tuner, waiting for that opportune moment when I could run across the room and hit record. After about six hours, I'd have maybe 20 minutes of a good mixtape. Of course, I no longer have to do that because I have Spotify and Pandora on my phone. It's really amazing because I don't have to spend any time making mixtapes anymore, although they are kind of fun to make.
00:01:36.320
What do these two things have in common? They're both using data to solve a problem and to make our lives much easier: Pandora makes our life easier, and Gmail makes our life easier with spam filtering. So today, I'm here to issue every single one of you a challenge: Somewhere in this room, there is somebody who's going to make the next Gmail, or maybe it's a group of people. Data, as we all know, is a big deal. Data, big data, whatever you want to call it—it’s a bunch of marketing jargon, but it is a big deal. Data is the new bacon.
00:02:17.920
I really like this shirt; I don't actually eat bacon, but this is so awesome! But you might be saying that I am at the wrong conference. I'm not at JavaOne, I'm not at PyCon, and I'm not at Clojure. This is not a big data conference; this is a Ruby conference. I must be in the wrong place because Ruby is not a big data language.
00:02:49.040
But I am going to agree with what Ben pointed out before this talk, which is that Ruby has lots of tools. Ruby has plenty of tools, whether it's in C libraries, regular Ruby, RubyGems, or Java through JRuby. Ruby has many tools. But unfortunately, a lot of us don't know how to go about solving data problems because data science and machine learning are kind of a big mess. If you were to open up a journal on machine learning, you would see lots of material that most of us don't understand because, honestly, most of the academics are trying to understand it themselves.
00:03:37.440
On top of that, Ruby is not about complex math. Matz created Ruby for our happiness; complex math was not created for our happiness. But that can actually work in our favor because Ruby is such a wonderful language. I personally love working with Ruby.
00:04:12.640
My name is Matthew Kirk. I have been doing data science and machine learning for eight years, a long time before I even realized that it was called machine learning; I learned it as operations research. I have been doing Ruby for five years. I really love both—I love Ruby and I love complex math. I like machine learning.
00:05:09.600
Today, we're going to cover feedforward neural networks. Doesn't that sound exciting? Neural networks are an extremely vast subject, so we're going to condense it quite a bit because this is 30 minutes long. We're only going to talk about feedforward neural networks. There is plenty more that you can learn about, but let's just focus on one thing. We're going to go through an example of how to classify strings to a particular language, and we're going to do it in a test-driven fashion. Just to prove to you that I'm not making things up, we're going to actually demo it at the very end.
00:06:40.560
Neural networks are a supervised learning method, which, if you remember from the last talk, is basically the idea that you have data points that have particular labels, and you want to learn something from them: a particular pattern. Neural networks are really interesting because you don't have a lot of restrictions; you can map just about anything with them. That's why I like to call them kind of a 'sledgehammer'; they're really good at approximating just about any function.
00:07:31.680
If anybody here has done computer science, you've most likely heard about neural networks a little bit, and some of you probably have seen a graph like this. Neural networks are split up into three different layers. You have the input layer, which is simply the inputs: values between zero and one, true-or-false type inputs. It's quite simple to understand; it's just the input of what you're looking at. The thing that confuses people the most would have to be the hidden layer, where we're adding complexity to this particular function. We can't see what's going on in there; it's more or less like a private method of this function.
00:08:30.599
Lastly, we have the output layer, which again is just an output from this function. Inside these neural networks, conceptually, we have neurons. If you have ever taken biology, we all have neural networks in our brains comprising little neurons that communicate with other neurons. It's the same idea with artificial neural networks, except that artificial neurons are just a weighted sum. For example, x1 and x2 are weighted together and then wrapped in activation functions. The output of these neurons has meaning, which leads us to a better analogy of logic.
00:09:22.960
Neural networks and perceptrons are based on the idea of threshold logic, where you can input data and determine whether it's true or false, similar to a digital logic gate or Boolean logic. Neural networks simplify things further with fuzzy logic; things no longer have to be zero or one; they can be somewhat true or somewhat false, essentially a range between 0 and 1. This is extremely powerful because it allows us to operate in the gray area.
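To make the contrast between threshold logic and fuzzy logic concrete, here is a minimal Ruby sketch. The method names `threshold` and `sigmoid` are illustrative choices, not taken from the talk's code, and the sigmoid is only one of several possible smooth functions.

```ruby
# Hard threshold: the answer is either 0 or 1, like a digital logic gate.
def threshold(x)
  x >= 0 ? 1 : 0
end

# Sigmoid: a smooth curve that maps any real number into the range (0, 1),
# so an answer can be "somewhat true" or "somewhat false".
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

[-2.0, 0.0, 2.0].each do |x|
  puts format("x=%5.1f threshold=%d sigmoid=%.3f", x, threshold(x), sigmoid(x))
end
```

The threshold jumps straight from 0 to 1, while the sigmoid passes through every value in between, which is the gray area described above.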
00:10:42.800
We can think of a neuron as taking two inputs, adding them together in a particular logical way, much like a digital logic gate, and then outputting a signal to the next neuron. The weighted sum is wrapped in a function that ensures all outputs fall between 0 and 1. The terminology can be complex, but the critical takeaway is that the curve looks like a learning curve. When we start learning something, we often begin at a low level of understanding and, over time, improve until we reach a plateau. Activation functions such as the Gaussian and the sigmoid help bring outputs closer to binary classifications.
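As a rough sketch of a single neuron behaving like a logic gate, the Ruby below wraps a weighted sum in a sigmoid. The weights and bias are hand-picked for illustration (an assumption, not values from the talk) so that the neuron approximates an AND gate; the bias term is also an addition that the talk does not mention explicitly.

```ruby
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# A neuron: a weighted sum of the inputs plus a bias, wrapped in an activation.
def neuron(inputs, weights, bias)
  sum = inputs.zip(weights).sum { |x, w| x * w } + bias
  sigmoid(sum)
end

# Hand-picked values (assumed for this sketch) that make the neuron act like AND:
# the sum only becomes positive when both inputs are 1.
and_weights = [10.0, 10.0]
and_bias = -15.0

[[0, 0], [0, 1], [1, 0], [1, 1]].each do |pair|
  puts "#{pair.inspect} -> #{neuron(pair, and_weights, and_bias).round(3)}"
end
```

The outputs are never exactly 0 or 1, only close to them, which is the fuzzy version of the gate.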
00:11:07.920
There is one important aspect to neural networks, and that is finding the weights in these sums. You could brute force it, which would take an inordinate amount of time, or you could use training algorithms like backpropagation. Essentially, the process aims to minimize error. To visualize this, imagine standing at the top of a hill in thick fog, seeking the steepest descent. Walking in that direction repeatedly will ultimately lead you to the valley bottom.
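The walk down the foggy hill can be sketched in a few lines of Ruby. This is plain gradient descent on a one-variable toy error function, not backpropagation itself; backpropagation applies the same downhill idea to every weight in a network. The error function, `learning_rate`, `tolerance`, and `max_iterations` are all assumptions made for the sketch.

```ruby
# Toy error surface with its valley bottom at w = 3 (chosen for illustration).
def error(w)
  (w - 3.0)**2
end

# Slope of the error surface at w: which way is downhill, and how steeply.
def gradient(w)
  2.0 * (w - 3.0)
end

# Repeatedly step in the downhill direction until the steps become tiny
# or the iteration limit is reached.
def descend(start, learning_rate: 0.1, max_iterations: 1000, tolerance: 1e-9)
  w = start
  max_iterations.times do
    step = learning_rate * gradient(w)
    break if step.abs < tolerance
    w -= step
  end
  w
end

w = descend(10.0)
puts "w = #{w.round(4)}, error = #{error(w).round(8)}"
```

Each step only uses the local slope, which is all you can sense in the fog, yet the walk still ends up near the bottom of the valley.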
00:12:06.040
The valley bottom represents minimizing error. To do this effectively, we want our model to accurately represent the real data it is modeling. However, neural networks can be dense, and you can delve deeper into concepts like RBF networks and deep learning, which means there is a vast quantity of information available.
00:12:56.080
Feedforward neural networks create complex models, ultimately outputting true or false types of statements, yielding a fuzzy logic function. But being at a programming conference, let’s focus on practical implementation: What can you actually do with a neural network? To set the stage, being an avid foreign language student, I once typed a German word into Google Translate and wondered how Google achieved its translations.
00:13:11.960
This inquiry led me to the realization that Google probably employs an advanced system for classifying languages, especially among languages written in the Latin alphabet like English, German, Polish, Swedish, and Finnish, as these share common characters, making them hard to tell apart. The first step in any machine learning problem is collecting data, as it serves as the foundation. Without data, machine learning can't proceed. I chose to use the most translated book, the Bible, as it is freely accessible, providing ample data with minimal intellectual property issues. However, text isn't inherently numerical.
00:14:52.520
What we could do is split the text into stems or analyze letter frequencies. I opted to try letter frequency, extracting how often letters appeared in the text belonging to different languages. This analysis often uncovers certain characteristics and commonalities across languages, making it possible to determine patterns. This is a bit like an episode from the show 'Ghostwriter' where they cracked codes using the prevalence of letters in the English language.
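A minimal sketch of the letter-frequency idea in Ruby follows. The method name `letter_frequencies` is hypothetical, and matching on `\p{L}` (any Unicode letter) is an assumption about how characters such as ä or ł should be handled.

```ruby
# Turn a string into a hash of letter => relative frequency.
def letter_frequencies(text)
  letters = text.downcase.scan(/\p{L}/)
  return {} if letters.empty?

  counts = letters.each_with_object(Hash.new(0)) { |ch, acc| acc[ch] += 1 }
  counts.transform_values { |count| count.to_f / letters.size }
end

p letter_frequencies("abba")
p letter_frequencies("Zażółć gęślą jaźń")
```

Spaces, digits, and punctuation are dropped, so the result is a numerical fingerprint of the text that a network can take as input.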
00:15:20.480
Each language exhibits specific letter usage, such as Polish having many Y's and Z's or Finnish containing numerous vowels. These traits allow us to create a probabilistic model to classify languages effectively. Building the neural network would typically start in an environment like IRB while trying various gems. I believe that applying test-driven development to machine learning challenges offers a fantastic approach, as it's greatly beneficial while tackling scientific problems. You formulate a hypothesis, test it, and iterate over time.
00:16:40.720
In test-driven development for neural nets, you'd test elements like seams and whether your model underfits or overfits data. Conceptually, your model would consist of many input nodes corresponding to unique characters across the languages you're analyzing, alongside hidden units and final output nodes representing language classifications.
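The shape of such a model can be sketched as a forward pass in Ruby. Everything here is an assumption for illustration: the 26-letter input set, the hidden layer size of 12, the random (untrained) weights, and the omission of bias terms. It shows only the layout of the network, not a working classifier.

```ruby
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# One layer: every neuron takes a weighted sum of all inputs from the layer before.
def layer(inputs, weights)
  weights.map do |neuron_weights|
    sigmoid(inputs.zip(neuron_weights).sum { |x, w| x * w })
  end
end

# Feed forward: inputs -> hidden layer -> output layer.
def feed_forward(inputs, hidden_weights, output_weights)
  layer(layer(inputs, hidden_weights), output_weights)
end

characters  = ("a".."z").to_a   # one input node per character (assumed set)
languages   = %w[English German Polish Swedish Finnish]
hidden_size = 12                # arbitrary choice for this sketch

rng = Random.new(42)
random_weights = lambda do |rows, cols|
  Array.new(rows) { Array.new(cols) { rng.rand(-1.0..1.0) } }
end

hidden_weights = random_weights.call(hidden_size, characters.size)
output_weights = random_weights.call(languages.size, hidden_size)

# A uniform letter distribution stands in for a real frequency vector.
inputs  = Array.new(characters.size, 1.0 / characters.size)
outputs = feed_forward(inputs, hidden_weights, output_weights)

languages.zip(outputs).each { |lang, score| puts "#{lang}: #{score.round(3)}" }
```

Because the weights are random, the scores are meaningless until training adjusts them; the point is that there is one output score per language.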
00:17:30.720
To illustrate testing a seam, one would check input validity: all character frequencies for a string should sum to one, indicating a proper probability distribution. This means that if you had a sentence made up of only the characters A and B in equal numbers, each should account for 50% of the representation. Additionally, testing for underfitting and overfitting is crucial, as mathematical models tend to do one or the other. As a model gets more complex, its internal (training) error keeps decreasing, while the actual error on unseen data first falls and then rises again, much like a polynomial curve. The key is to keep both errors low, which is what cross-validation checks.
00:18:51.680
Cross-validation entails partitioning your dataset into a training set, used to train the neural network, and a validation set, which is held back and used to measure the error on data the network has not seen. For example, you could split your dataset to validate a language classifier and set a target error rate, such as staying under 5%.
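A minimal sketch of that split and error measurement in Ruby is below. The helper names `train_validation_split` and `error_rate`, the four made-up sentences, and the keyword-matching stand-in classifier are all hypothetical; a real run would use the trained network and far more data.

```ruby
# Shuffle the labeled examples and split them into [training, validation].
def train_validation_split(examples, validation_ratio: 0.2, seed: 1)
  shuffled = examples.shuffle(random: Random.new(seed))
  cut = (shuffled.size * validation_ratio).round
  [shuffled.drop(cut), shuffled.take(cut)]
end

# Fraction of validation examples the classifier gets wrong.
def error_rate(validation, classifier)
  return 0.0 if validation.empty?

  misses = validation.count { |text, label| classifier.call(text) != label }
  misses.to_f / validation.size
end

# Toy data and a stand-in classifier, both invented for this sketch.
examples = [
  ["in the beginning was the word", :english],
  ["and the light shines in the darkness", :english],
  ["im anfang war das wort", :german],
  ["und das licht scheint in der finsternis", :german]
]
classifier = ->(text) { text.match?(/\b(das|und|der)\b/) ? :german : :english }

training, validation = train_validation_split(examples, validation_ratio: 0.5)
rate = error_rate(validation, classifier)
puts "trained on #{training.size}, validated on #{validation.size}, error #{rate}"
```

In a test, the measured rate would be compared against the chosen target, for example asserting that it stays under 0.05.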
00:19:39.840
Crucially, it’s not about the precise number but rather documenting your first assumptions to gauge improvements or required adjustments in error control as you progress. I'd also like to highlight Occam's Razor, which suggests that if a model requires too many iterations, it might not be ideal; simpler models often provide better outcomes.
00:20:26.080
Thus, it’s beneficial to maintain a maximum iteration limit in your model. Let’s go ahead and examine some practical code. In the seam test, my goal is to ensure that characters from A through Z are processed correctly. I verify that vectors contain those characters without duplicating any letters and that they sum to one.
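A seam test along those lines might look like the following sketch. It assumes Minitest (which ships with standard Ruby installs) and a hypothetical helper named `char_vector`; the talk's actual test framework and helper names are not shown in the transcript.

```ruby
require "minitest/autorun"

# Hypothetical helper: maps a string to a hash of character => frequency,
# keeping only the letters a through z.
def char_vector(text)
  letters = text.downcase.scan(/[a-z]/)
  counts = letters.each_with_object(Hash.new(0)) { |ch, acc| acc[ch] += 1 }
  counts.transform_values { |n| n.to_f / letters.size }
end

class SeamTest < Minitest::Test
  # A pangram should produce each letter from a to z exactly once as a key.
  def test_keys_are_unique_letters_from_a_to_z
    vector = char_vector("The quick brown fox jumps over the lazy dog")
    assert_equal(("a".."z").to_a, vector.keys.sort)
  end

  # The frequencies form a probability distribution, so they sum to one.
  def test_frequencies_sum_to_one
    vector = char_vector("abab")
    assert_in_delta 1.0, vector.values.sum, 1e-9
    assert_in_delta 0.5, vector["a"], 1e-9
  end
end
```

Checking the input at this seam means a malformed vector fails fast, before it ever reaches the network.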
00:21:02.320
Additionally, when testing cross-validation, I compare verses from Matthew and Acts, using a helper function to measure the predicted output against the expected language. Part of the appeal of neural networks is their speed: once trained, a prediction is just a fixed number of weighted sums, so the cost depends on the size of the network rather than on the amount of data it was trained on, essentially going from O(n) complexity to O(1).
00:22:02.720
For instance, here's a demonstration of my application: typing in a German stem like "gazoon" yields an immediate result. The network recognizes the language, showcasing its speed and efficiency compared to traditional computational methods.
00:22:42.640
Moving forward, I encourage you to explore my GitHub repository, where you can find the source code for the language predictor. I'm also writing a book on machine learning; more information can be found on my website. You can also follow me on Twitter.
00:23:45.920
In conclusion, this is just the beginning. Neural networks and machine learning are expansive fields, but they are vital to learn about. Many of us work on Ruby on Rails applications that generate data, enabling us to craft exciting tools through machine learning. It’s an exhilarating area to delve into, and I wholeheartedly encourage each of you to explore this domain further. Thank you for your attention!