Test Driven Neural Networks with Ruby

by Matthew Kirk

The video titled "Test Driven Neural Networks with Ruby" features Matthew Kirk presenting at MountainWest RubyConf 2014. The talk shows how to use test-driven development (TDD) to implement neural networks, specifically feedforward neural networks, to identify the language of an input sentence. Kirk begins by discussing the role of data in modern technology, pointing to tools like Gmail and Spotify that use data to improve the user experience. He argues that Ruby has a rich set of tools for data science, even though it is not traditionally considered a big data language.

The presentation introduces neural networks as supervised learning models, highlighting their structure composed of input, hidden, and output layers, akin to biological networks. He explains concepts like weighted sums and activation functions, clarifying how they relate to logic gates and fuzzy logic. Kirk mentions the importance of finding weights in neural networks and proposes the use of training algorithms, particularly backpropagation.

Key points discussed include:

The Structure of Neural Networks: Networks consist of input, hidden, and output layers, whose neurons compute weighted sums passed through activation functions.
Learning Process in Neural Networks: Training minimizes error with methods like backpropagation, illustrated by an analogy of walking downhill in fog (gradient descent).
Data Collection and Features: The presentation uses the Bible as a data source because it is translated into so many languages, and letter frequencies as the input features.
Testing in TDD: Tests check the model's seams, such as valid input vectors, and guard against underfitting and overfitting by measuring accuracy with cross-validation.
Implementation and Example: A demo of the language classifier returns predictions instantly as words are typed.

Kirk encourages the audience to utilize the provided GitHub resources for further exploration and expresses his excitement about the future of machine learning within the Ruby ecosystem. He concludes by highlighting the expansive potential of neural networks and their applications in real-world data problems, urging attendees to engage more deeply in this innovative field.

00:00:25.760 Who remembers email before Gmail? Who remembers the massive amount of spam that we used to get before Gmail? Yes, we do. I remember when I switched from having my Excite email account to having Gmail. It was like entering a new haven where I didn't have to spend all of my day marking things as spam or not spam and deleting things out of my inbox. It was wonderful.

00:00:56.800 When I was a kid, I listened to my favorite radio station and I had a cassette in the tuner, waiting for that opportune moment when I could run across the room and hit record. After about six hours, I'd have maybe 20 minutes of a good mixtape. Of course, I no longer have to do that because I have Spotify and Pandora on my phone. It's really amazing because I don't have to spend any time making mixtapes anymore, although they are kind of fun to make.

00:01:36.320 What do these two things have in common? They're both using data to solve a problem and to make our lives much easier: Pandora makes our life easier, and Gmail makes our life easier with spam filtering. So today, I'm here to issue every single one of you a challenge: Somewhere in this room, there is somebody who's going to make the next Gmail, or maybe it's a group of people. Data, as we all know, is a big deal. Data, big data, whatever you want to call it—it’s a bunch of marketing jargon, but it is a big deal. Data is the new bacon.

00:02:17.920 I really like this shirt; I don't actually eat bacon, but this is so awesome! But you might be saying that I am at the wrong conference. I'm not at JavaOne, I'm not at PyCon, and I'm not at Clojure. This is not a big data conference; this is a Ruby conference. I must be in the wrong place because Ruby is not a big data language.

00:02:49.040 But I am going to agree with what Ben pointed out before this talk, which is that Ruby has lots of tools. Ruby has plenty of tools, whether it's in C libraries, regular Ruby, RubyGems, or Java through JRuby. Ruby has many tools. But unfortunately, a lot of us don't know how to go about solving data-type problems because data science and machine learning is kind of a big mess. If you were to open up a journal on machine learning, you would see lots of comments about things that most of us don't understand because honestly, most of the academics are trying to understand it themselves.

00:03:37.440 On top of that, Ruby is not about complex math. Mats created Ruby for our happiness; complex math was not created for our happiness. But that can actually work in our favor because Ruby is such a wonderful language. I personally love working with Ruby.

00:04:12.640 My name is Matthew Kirk. I have been doing data science and machine learning for eight years, a long time before I even realized that it was called machine learning; I learned it as operations research. I have been doing Ruby for five years. I really love both—I love Ruby and I love complex math. I like machine learning.

00:05:09.600 Today, we're going to cover feedforward neural networks. Doesn't that sound exciting? Neural networks are an extremely vast subject, so we're going to condense it quite a bit because this is 30 minutes long. We're only going to talk about feedforward neural networks. There is plenty more that you can learn about, but let's just focus on one thing. We're going to go through an example of how to classify strings to a particular language, and we're going to do it in a test-driven fashion. Just to prove to you that I'm not making things up, we're going to actually demo it at the very end.

00:06:40.560 Neural networks are a supervised learning method, which, if you remember from the last talk, means that you have data points with particular labels and you want to learn a pattern from them. Neural networks are really interesting because they impose few restrictions; you can map just about anything with them. That's why I like to call them a 'sledgehammer': they're really good at approximating just about any function.

00:07:31.680 If anybody here has done computer science, you've most likely heard a little about neural networks, and some of you have probably seen a graph like this. Neural networks are split into three layers. You have the input layer, which is simply the inputs: values from zero to one, like true-or-false inputs. That part is simple; it's just the input of what you're looking at. The part that confuses people most is the hidden layer, where we add complexity to the function. We can't see what's going on in there; it's more or less like a private method of this function.

00:08:30.599 Lastly, we have the output layer, which again is just the output of this function. Inside these networks, conceptually, we have neurons. If you have ever taken biology, you know our brains contain networks of little neurons that communicate with other neurons. Artificial neural networks follow the same idea, except that an artificial neuron is just a weighted sum: for example, inputs x1 and x2 are multiplied by weights, summed, and wrapped in an activation function. The outputs of these neurons have meaning, which leads us to a better analogy: logic.
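
A rough Ruby sketch (not code from the talk) of the neuron just described: a weighted sum of the inputs plus a bias, wrapped in a sigmoid activation. The bias term and the method names `sigmoid` and `neuron_output` are illustrative assumptions.

```ruby
# Logistic sigmoid: squashes any real number into the open interval (0, 1).
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# A single neuron (hypothetical helper): weighted sum of the inputs plus a bias,
# wrapped in the sigmoid activation function.
def neuron_output(inputs, weights, bias)
  raise ArgumentError, "inputs and weights differ in length" unless inputs.size == weights.size

  weighted_sum = inputs.zip(weights).sum { |x, w| x * w } + bias
  sigmoid(weighted_sum)
end

# Two inputs, x1 and x2, with hypothetical weights 0.8 and -0.4 and bias 0.1.
puts neuron_output([1.0, 0.0], [0.8, -0.4], 0.1) # sigmoid(0.9), about 0.711
```

Whatever the weights, the output stays between 0 and 1, which is what lets it be read as a degree of truth.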

00:09:22.960 Neural networks and perceptrons are based on threshold logic: you feed in data and determine whether it's true or false, similar to a digital logic gate or Boolean logic. Neural networks relax this with fuzzy logic; values no longer have to be zero or one but can be somewhat true or somewhat false, anywhere in the range between 0 and 1. This is extremely powerful because it lets us operate in the gray area.
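
Threshold logic and its fuzzy counterpart can be sketched side by side. The weights [1, 1] and threshold 1.5, which make a step neuron behave as an AND gate, and the steepness of the sigmoid are illustrative assumptions, not values from the talk.

```ruby
# Threshold (step) neuron: fires 1 if the weighted sum reaches the threshold, else 0.
def threshold_neuron(inputs, weights, threshold)
  sum = inputs.zip(weights).sum { |x, w| x * w }
  sum >= threshold ? 1 : 0
end

# Fuzzy version: the same weighted sum, shifted by the threshold and passed through
# a steep sigmoid, gives a degree of truth between 0 and 1 instead of a hard 0 or 1.
def fuzzy_neuron(inputs, weights, threshold, steepness: 10.0)
  sum = inputs.zip(weights).sum { |x, w| x * w }
  1.0 / (1.0 + Math.exp(-steepness * (sum - threshold)))
end

# Hypothetical parameters that turn the step neuron into an AND gate.
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

[[0, 0], [0, 1], [1, 0], [1, 1]].each do |a, b|
  puts "#{a} AND #{b} = #{threshold_neuron([a, b], AND_WEIGHTS, AND_THRESHOLD)}"
end

# Partially true inputs give a partially true answer.
puts fuzzy_neuron([0.9, 0.8], AND_WEIGHTS, AND_THRESHOLD).round(3)
```

With inputs 0.9 and 0.8, the fuzzy neuron returns about 0.881: mostly true, rather than a hard 1.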

00:10:42.800 We can think of a neuron as taking two inputs, combining them in a particular logical way, much like a digital logic gate, and then sending a signal on to the next neuron. Wrapping the weighted sum in an activation function ensures that every output falls between 0 and 1. The terminology can get complex, but the key takeaway is that the sigmoid has the shape of a learning curve: when we start learning something, we begin with little understanding, improve over time, and eventually reach a plateau. Activation functions such as the sigmoid and the Gaussian push outputs toward values that read like binary classifications.
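
Chaining such neurons gives the feedforward pass, where each layer's outputs become the next layer's inputs. This is a minimal sketch, assuming fully connected layers with sigmoid activations; the 2-3-1 layer sizes, the weights, and the method names are hypothetical.

```ruby
# Logistic sigmoid activation.
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# One layer: each row of `weights` holds one neuron's incoming weights,
# and `biases` holds one bias per neuron in that layer.
def layer_forward(inputs, weights, biases)
  weights.zip(biases).map do |neuron_weights, bias|
    sigmoid(inputs.zip(neuron_weights).sum { |x, w| x * w } + bias)
  end
end

# Feed the signal forward: input layer -> hidden layer -> output layer.
def feed_forward(inputs, layers)
  layers.reduce(inputs) { |signal, (weights, biases)| layer_forward(signal, weights, biases) }
end

# Hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden = [[[0.5, -0.6], [0.3, 0.8], [-0.7, 0.2]], [0.1, -0.1, 0.0]]
output = [[[1.2, -0.4, 0.9]], [-0.3]]

p feed_forward([1.0, 0.0], [hidden, output])
```

The hidden layer's three outputs are never seen directly by the caller, which matches the "private method" description of the hidden layer.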

00:11:07.920 There is one important aspect to neural networks, and that is finding the weights in these sums. You could brute force it, which would take an inordinate amount of time, or you could use a training algorithm like backpropagation, which aims to minimize error. To visualize this, imagine standing at the top of a hill in thick fog, looking for the direction of steepest descent. If you keep stepping in that direction, you will eventually reach a valley bottom.
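
To make the downhill walk concrete, here is a sketch that trains a single sigmoid neuron by gradient descent on squared error. It is a simplification, not full multilayer backpropagation: for one neuron, backpropagation reduces to the gradient shown in the comments. The learning rate, tolerance, iteration cap, and the OR training data are assumptions chosen for illustration.

```ruby
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Train one sigmoid neuron by (online) gradient descent on E = 0.5 * (y - t)**2.
# For a single neuron, backpropagation gives dE/dw_i = (y - t) * y * (1 - y) * x_i,
# where y is the neuron's output and t the target.
def train_neuron(examples, learning_rate: 0.5, max_iterations: 10_000, tolerance: 0.01)
  weights = Array.new(examples.first[0].size, 0.0)
  bias = 0.0
  iterations = 0

  max_iterations.times do
    iterations += 1
    error = 0.0
    examples.each do |inputs, target|
      output = sigmoid(inputs.zip(weights).sum { |x, w| x * w } + bias)
      error += 0.5 * (output - target)**2
      delta = (output - target) * output * (1 - output)
      # Step downhill: move each weight against its gradient.
      weights = weights.each_with_index.map { |w, i| w - learning_rate * delta * inputs[i] }
      bias -= learning_rate * delta
    end
    break if error < tolerance # reached the valley floor (within tolerance)
  end

  [weights, bias, iterations]
end

# Learn logical OR from its truth table.
or_examples = [[[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 1]]
weights, bias, iterations = train_neuron(or_examples)
puts "stopped after #{iterations} iterations"
```

The `max_iterations` cap bounds training time even when the error never drops below the tolerance.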

00:12:06.040 The valley bottom represents minimal error: the point where our model most accurately represents the real data it is modeling. Neural networks are a dense subject, though, and you can go much deeper into topics like RBF networks and deep learning; there is a vast amount of material available.

00:12:56.080 Feedforward neural networks can build complex models that ultimately output true-or-false style answers, which makes them fuzzy logic functions. But since this is a programming conference, let's focus on practical implementation: what can you actually do with a neural network? To set the stage: as an avid foreign language student, I once typed a German word into Google Translate and wondered how Google managed its translations.

00:13:11.960 That question led me to realize that Google probably uses an advanced system for classifying languages, especially among languages written in the Latin alphabet, like English, German, Polish, Swedish, and Finnish, which share many characters and are therefore hard to tell apart. The first step in any machine learning problem is collecting data, because data is the foundation; without it, machine learning can't proceed. I chose the most translated book, the Bible, because it is freely accessible and provides ample data with minimal intellectual property issues. However, text isn't inherently numerical.

00:14:52.520 One option is to split the text into word stems; another is to analyze letter frequencies. I chose letter frequency: how often each letter appears in text from each language. This analysis often uncovers characteristic patterns in each language, which makes them possible to detect. It's a bit like the episode of the show 'Ghostwriter' where they cracked codes using the prevalence of letters in English.
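
A minimal sketch of the letter-frequency idea, assuming Ruby 2.7 or later (for `Enumerable#tally`): count each letter in a text and divide by the total number of letters, so the values form a distribution that sums to one. The method name is hypothetical, and matching `\p{L}` (any Unicode letter) is an assumption made here so that characters like Polish "ą" or Finnish "ä" are kept.

```ruby
# Relative letter frequencies of a text (hypothetical helper): each letter's count
# divided by the total number of letters, so the values sum to 1.
# Non-letters (digits, spaces, punctuation) are ignored.
def letter_frequencies(text)
  letters = text.downcase.scan(/\p{L}/)
  return {} if letters.empty?

  counts = letters.tally
  total = letters.size.to_f
  counts.transform_values { |count| count / total }
end

p letter_frequencies("Abba") # "a" and "b" each get 0.5
p letter_frequencies("Wszystko się zmienia")
```

For a sentence made of equal numbers of A's and B's, each letter accounts for half of the distribution.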

00:15:20.480 Each language has its own letter usage: Polish has many Y's and Z's, for example, while Finnish has numerous vowels. These traits let us build a probabilistic model that classifies languages effectively. You would typically start building the neural network in IRB, trying out various gems. I believe test-driven development is a fantastic approach to machine learning problems, because it mirrors the scientific method: you formulate a hypothesis, test it, and iterate over time.

00:16:40.720 In test-driven development for neural nets, you'd test things like the seams of your code and whether the model underfits or overfits the data. Conceptually, the model has one input node for each unique character across the languages you're analyzing, some hidden units, and one output node per language.
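
The network's shape follows from the training data. This sketch, under the assumption that each language's corpus is a single string, collects one input node per unique letter seen in any language and builds a one-hot target vector per language for the output layer. The corpora and the `network_shape` helper are hypothetical stand-ins for the Bible translations used in the talk.

```ruby
# Hypothetical helper: derive the input alphabet (one input node per unique letter
# across all corpora) and one-hot targets (one output node per language).
def network_shape(corpora)
  alphabet = corpora.values.flat_map { |text| text.downcase.scan(/\p{L}/) }.uniq.sort
  languages = corpora.keys.sort
  targets = languages.each_with_index.map do |language, i|
    [language, Array.new(languages.size) { |j| j == i ? 1.0 : 0.0 }]
  end.to_h
  { alphabet: alphabet, languages: languages, targets: targets }
end

# Hypothetical tiny corpora; the talk used Bible translations.
corpora = {
  "english" => "In the beginning",
  "german"  => "Im Anfang",
  "polish"  => "Na początku"
}
shape = network_shape(corpora)
puts "input nodes: #{shape[:alphabet].size}, output nodes: #{shape[:languages].size}"
```

Adding a language adds one output node, plus an input node for each letter that no other language uses.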

00:17:30.720 To test the seams, you would check that the inputs are valid: the character frequencies for a sentence should sum to one, forming a proper probability distribution. For example, if a sentence consisted of the characters A and B in equal numbers, each should account for 50% of the representation. Testing for underfitting and overfitting is also crucial, because mathematical models tend to do one or the other. As a model grows more complex, its training error keeps decreasing, while its error on unseen data first falls and then rises again in a roughly U-shaped curve. The key is to keep both errors low, which cross-validation lets you measure.

00:18:51.680 Cross-validation means partitioning your dataset into a training set, used to train the neural network, and a validation set, held out from training and used to measure the error on data the network hasn't seen. For the language classifier, for example, you might set a target of keeping the validation error under 5%.
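
A sketch of the train/validation split and the error measurement, assuming a random split with a fixed seed and a 20% validation fraction (both are choices made here, not details from the talk). The regular-expression `classifier` is a hypothetical stand-in for a trained network.

```ruby
# Hold out a fraction of labelled examples for validation.
# Shuffling with a fixed seed keeps the split reproducible between test runs.
def train_validation_split(examples, validation_fraction: 0.2, seed: 42)
  shuffled = examples.shuffle(random: Random.new(seed))
  validation_size = (examples.size * validation_fraction).round
  [shuffled.drop(validation_size), shuffled.take(validation_size)]
end

# Fraction of validation examples whose predicted label is wrong.
def error_rate(validation, &classifier)
  misses = validation.count { |input, label| classifier.call(input) != label }
  misses.to_f / validation.size
end

# Hypothetical labelled words and a stand-in classifier (Polish diacritics => "polish").
examples = [
  ["gęś", "polish"], ["źródło", "polish"], ["łąka", "polish"], ["zażółć", "polish"], ["mąż", "polish"],
  ["the", "english"], ["word", "english"], ["light", "english"], ["water", "english"], ["beginning", "english"]
]
classifier = ->(text) { text.match?(/[ąęłśżź]/) ? "polish" : "english" }

training, validation = train_validation_split(examples)
rate = error_rate(validation, &classifier)
puts "trained on #{training.size}, validated on #{validation.size}, error #{(rate * 100).round(1)}%"
```

In a real test, the stand-in would be replaced by the trained network, and the test would assert that `rate` stays below the 5% target.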

00:19:39.840 What matters is not the precise number but writing down your initial assumptions, so you can tell whether later changes improve the error or call for adjustments. I'd also like to highlight Occam's razor, which favors simpler explanations: if a model needs too many iterations to converge, it might not be ideal, and simpler models often give better results.

00:20:26.080 Thus, it's a good idea to set a maximum number of training iterations for your model. Let's look at some practical code. In the seam test, my goal is to make sure the characters A through Z are processed correctly: I verify that the vectors cover those characters without duplicating any letters and that the values sum to one.
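
A seam test along those lines might look like the following Minitest sketch. The `frequency_vector` method, which maps text to a 26-element vector of a-z frequencies and ignores other characters, is a hypothetical feature extractor, not code from the talk's repository; the sketch assumes Ruby 2.7+ and the bundled `minitest` gem.

```ruby
require "minitest/autorun"

# The input alphabet: one entry per letter a-z.
ALPHABET = ("a".."z").to_a.freeze

# Hypothetical feature extractor under test: relative frequency of each letter a-z.
def frequency_vector(text)
  letters = text.downcase.scan(/[a-z]/)
  return Array.new(ALPHABET.size, 0.0) if letters.empty?

  counts = letters.tally
  ALPHABET.map { |letter| counts.fetch(letter, 0) / letters.size.to_f }
end

class FrequencyVectorSeamTest < Minitest::Test
  def test_covers_a_through_z_without_duplicates
    assert_equal 26, ALPHABET.size
    assert_equal ALPHABET.uniq, ALPHABET
    assert_equal ALPHABET.size, frequency_vector("anything").size
  end

  def test_frequencies_sum_to_one
    assert_in_delta 1.0, frequency_vector("The quick brown fox").sum, 1e-9
  end

  def test_two_letter_sentence_splits_evenly
    vector = frequency_vector("ab")
    assert_in_delta 0.5, vector[ALPHABET.index("a")], 1e-9
    assert_in_delta 0.5, vector[ALPHABET.index("b")], 1e-9
  end
end
```

Because these tests touch only the input seam, they run in milliseconds and fail loudly if the feature extraction changes, before any training happens.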

00:21:02.320 For the cross-validation test, I use verses from the books of Matthew and Acts, along with a helper function that measures the network's performance by comparing its predicted output against the actual language. Part of what makes neural networks so practical is their speed at prediction time: once trained, classifying an input is just a fixed number of weighted sums, so the cost doesn't grow with the amount of training data. That takes us from O(n) to essentially O(1) with respect to the dataset.
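
The constant-time claim can be checked with simple arithmetic. In a fully connected feedforward network, one prediction takes (inputs to a layer) x (neurons in that layer) multiply-adds per layer, and nothing in that count depends on how many examples the network was trained on. The layer sizes below (26 letter inputs, 10 hidden units, 5 languages) and the helper name are hypothetical.

```ruby
# Multiply-adds needed for one prediction in a fully connected feedforward
# network: each layer contributes (inputs to that layer) x (neurons in it).
def multiply_adds_per_prediction(layer_sizes)
  layer_sizes.each_cons(2).sum { |inputs, neurons| inputs * neurons }
end

# Hypothetical 26-10-5 network: 26 * 10 + 10 * 5
puts multiply_adds_per_prediction([26, 10, 5]) # prints 310
```

Training on ten times as many verses changes the weights but not this count, so prediction cost stays the same.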

00:22:02.720 For instance, here's a demonstration of my application: typing in a German stem like "gazoon" returns a language prediction immediately. The network recognizes the language as you type, which shows how fast a trained network is.

00:22:42.640 Moving forward, I encourage you to explore my GitHub repository, where you can find the source code for the language predictor. I'm also writing a book on machine learning; details are on my website. And feel free to follow me on Twitter.

00:23:45.920 In conclusion, this is just the beginning. Neural networks and machine learning are expansive fields, but they are vital to learn about. Many of us work on Ruby on Rails applications that generate data, enabling us to craft exciting tools through machine learning. It’s an exhilarating area to delve into, and I wholeheartedly encourage each of you to explore this domain further. Thank you for your attention!

MountainWest RubyConf 2014