Machine Learning

Summarized using AI

Neural Networks with RubyFANN

Ethan Garofolo • August 11, 2013 • Earth

In this talk at the LoneStarRuby Conf 2013, Ethan Garofolo introduces the concept of neural networks (NNs) and discusses their applications using the RubyFANN library.

Key points covered in the presentation include:

- Understanding Neural Networks: The speaker explains that neural networks are function approximators that use input data to find underlying patterns or rules, emphasizing the experimental nature of fine-tuning these models.

- Components of Neural Networks: Each NN consists of input nodes, hidden neurons, and output nodes, with a focus on how these nodes interact during the training phase.

- Training Process: Neural networks learn by adjusting the weights of connections based on errors encountered during processing, using algorithms like backpropagation to refine performance over time.

- Applicability of Neural Networks: It is crucial to assess whether a neural network suits a specific problem, with a warning against the mistakes that arise from improperly encoding categorical data.

- Practical Applications: Garofolo discusses spam detection as a case study, sharing his experience with data input variables and outputs when using RubyFANN to construct a spam detector. He notes that despite some success, neural networks may not be the optimal solution compared to other methods.

- Tic-Tac-Toe Example: He also presents an example of analyzing game strategies using neural networks, contrasting different modeling approaches and highlighting the importance of data quality and learning contexts.

- Further Applications: The speaker concludes by mentioning potential uses of NNs in facial recognition, handwriting recognition, and gesture detection, encouraging experimentation with various parameters and activation functions.

Ultimately, Garofolo advises that while neural networks can be powerful, they require careful design and experimentation to effectively address real-world problems and underscores that every technical solution is fundamentally a human problem.

Neural Networks with RubyFANN
Ethan Garofolo • August 11, 2013 • Earth

Neural networks (NNs) not only sound really cool, but they can also solve some pretty interesting problems ranging from driving cars to spam detection to facial recognition.

Solving problems with NNs is challenging because implementing an NN from scratch is difficult, and knowing how to apply one is more difficult still. Fortunately, libraries such as RubyFANN exist to handle the first problem. Solving the second comes with experience.

This talk will show a few different approaches to applying NNs to problems such as spam detection and games, and will discuss other areas where NNs might be a useful solution.

Help us caption & translate this video!

http://amara.org/v/FG8g/

LoneStarRuby Conf 2013

00:00:15.200 Welcome to the talk on Neural Networks with RubyFANN. I am Ethan Garofolo, your humble presenter here today.
00:00:18.800 I own Big O Studios, a consultancy based in Round Rock, where we do web application development in a Ruby-based web framework that must not be named because this is a Ruby conference, right? I also use Node.js, I’m writing a book about it, and I do Unity 3D development as well. Sometimes, all of that work is part of the same project, and of course, I have my super secret projects that I can't talk about. But we all have those, right?
00:00:51.199 Anyway, today we’re going to talk about neural networks. We’ll touch on what they are in general, how they operate, and explore strategies for applying them to problems you might be trying to solve. We'll discuss some specific examples and also talk about other problems that could potentially be solved with neural networks. Lastly, we will have time for a Q&A.
00:01:15.680 So, I do want to ask, how many of you here have formal computer science training, like attending school for it and obtaining a degree? Okay, or at least have some degree of exposure to it. I'm glad it wasn't just all the hands because chances are, if you have a CS background, you probably got some exposure to neural networks. However, we're going to review it anyway because it's really useful.
00:01:37.280 One other caveat I want to mention is that people go to graduate school to study neural networks, often spending three to four years working toward a PhD. I have not done that, and while we might touch on some deep understanding of neural networks, we probably won't get that deep. If we tried, we could be here until LoneStarRuby 2017, and I imagine you all have better things to do in the meantime.
00:02:05.680 So, let's get into it. What exactly are neural networks? They are function approximators. Two parts there: functions and approximators. You've likely seen functions before—not just the function keyword in JavaScript or C++, but the concept of mapping a domain to a range.
00:02:19.120 For example, consider the identity function. This is something you probably saw in high school. We often leave high school thinking that all functions map numbers to other numbers. That's not necessarily the case; they only have to map domain values to range values. For example, what if my input is a LOLcat? As long as the output is the same LOLcat, that's still the identity function.
00:02:40.640 One thing we noticed in high school is that, when solving problems, you receive an input and a function and are tasked with calculating the output, or you’re given an output and need to find the input that produces it. In real-world applications, though, you don't always know what the function is, and that's where a neural network comes in. You have a set of observations and aim to find the rule that takes you from one to the other.
00:03:00.640 So, we approximate that with a neural network. Why do we call it an approximator? Because they are rarely 100% accurate; it's not an exact science, and there's a lot of experimentation involved. Scott Bellware, who spoke earlier today, made a great analogy about an engineering console in a submarine. You have an infinite array of dials, switches, and valves that you can tweak to get the machine running just the way you want it.
00:03:23.360 Neural networks function similarly, with various parameters to tweak, and it takes experimentation to find what works. It's important to acknowledge that you won’t always know whether it will work for your specific problem when you start—a challenge in itself!
00:03:41.600 Now, as for how they work, here's an example of what a neural network might look like. The circles represent what we call nodes. On the left, labeled n sub i, are the input nodes. You get to choose what those inputs are, though they must be representable as numeric values.
00:03:50.720 On the very right, you have output nodes, of which you will have one or more—it's essential to have some outputs; otherwise, what's the point of the neural network? The number of output nodes varies based on your specific problem. The nodes in the middle are what we call hidden neurons. They are called hidden because you never interact with them directly: they are neither inputs you supply nor outputs you read.
00:04:00.080 In this example, the three hidden layers shown have four neurons each, though they don’t have to have the same number; that’s up to you. What the hidden neurons do is approximate a function. You can graph functions. The more hidden neurons you have, the more complicated the graph of your function can be. Unlike in metal music where more volume is always better, for neural networks, more hidden neurons are not always better. It truly depends on your application and how much training data you have.
00:04:20.960 I've mentioned input and training. The goal is to train the network to correctly recognize data. When you first initialize a network, each connection between nodes is assigned a weight, which signifies how strongly one node's output influences the next—not the kind of weight you're probably thinking of. You must have a corpus of training data: observations paired with expected outputs.
00:04:37.680 You then take a subset of these observations and feed them into your neural network, which will propagate through and likely be wrong for a long time. However, there are algorithms that will go back and adjust those weights, one of which is called backpropagation. Backpropagation basically takes all the errors from the output layer and feeds that information back through the network to adjust the weights accordingly.
00:04:59.280 There's a complicated math theory behind this, but the gist is that the network being wrong at first is okay—that's how it learns. Much like humans, who learn by making mistakes, the process requires iterative adjustments to develop accuracy. If I were teaching a neural network the identity function, I would provide inputs of 1, returning an output of 1, then 2 for an output of 2, and so on. Ideally, over time, the network would learn that if given four, it should produce four again.
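To make that concrete, here is a minimal sketch of teaching a network the identity function, jumping ahead to the RubyFANN API that comes up later in the talk. The layer sizes and training parameters are illustrative guesses, not values from the talk:

```ruby
require 'ruby-fann'

# Eleven (input, expected output) pairs sampled from the identity function on 0..1.
samples = (0..10).map { |i| [i / 10.0] }

train = RubyFann::TrainData.new(inputs: samples, desired_outputs: samples)
net   = RubyFann::Standard.new(num_inputs: 1, hidden_neurons: [4], num_outputs: 1)

# Arguments: training data, max epochs, epochs between progress reports, target MSE.
net.train_on_data(train, 5000, 500, 0.001)

net.run([0.4]) # => a value close to [0.4] once training converges
```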
00:05:21.720 So how do you know when to apply a neural network to a problem, or whether it’s necessary? Like many answers I could provide today, it’s hard to lay out a recipe for this. A critical thing to consider is working with enumerated data. Let’s say one of your inputs is the type of food someone is selecting—Italian food, Mexican food, Korean food, etc. You need to reduce that to numeric data.
00:05:41.559 If you choose to represent those categories on a single node, you might assign Mexican food a value of zero, Italian food a value of 0.5, and so on. However, this creates a risk of introducing false information: the network treats Italian food as sitting numerically between Mexican and Korean food, implying an ordering and a notion of distance that doesn't exist among distinct categories. That false structure can mislead your network.
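To make the pitfall concrete, here is a hypothetical sketch contrasting the single-node encoding with the per-category alternative described next; the cuisine list and helper names are invented for illustration:

```ruby
CUISINES = [:mexican, :italian, :korean]

# Risky: a single node implies Italian (0.5) sits "between" Mexican (0.0)
# and Korean (1.0)—an ordering that doesn't exist.
def encode_on_one_node(cuisine)
  [CUISINES.index(cuisine) / (CUISINES.length - 1.0)]
end

# Safer: one node per category, each set to one or zero.
def encode_one_per_node(cuisine)
  CUISINES.map { |c| c == cuisine ? 1.0 : 0.0 }
end

encode_on_one_node(:italian)  # => [0.5]
encode_one_per_node(:italian) # => [0.0, 1.0, 0.0]
```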
00:06:05.440 To overcome this, one option is to give each possible value its own input node, set to either one or zero, as the sketch above shows. Another thing to be mindful of: while I worked at a company called Seven Billion People, which produced the RubyFANN gem, we had a neural network that adjusted website content based on user clickstream data. The challenge we faced was the unpredictability of neural networks.
00:06:23.520 Once you've trained a neural network—even if it's 100% accurate—what you end up with is a text file of numbers describing how many nodes there are and what the connection weights are. If someone opens that file, even its creator may be unable to explain how the network processes data. That opacity might not be ideal if you're after investment dollars, since it can be concerning for stakeholders.
00:06:43.760 Keep in mind that every technical problem we solve is ultimately a human problem, and that perspective can matter at your company’s stage.
00:07:03.920 Now let’s break this down into some specific problems. The first one I'll discuss is spam detection in emails. Those of you with experience in spam detection already know that neural networks are not the most effective solution. So, let's say we are committed to using neural networks for this—how might we approach it? This is where RubyFANN comes in.
00:07:19.679 RubyFANN is composed of two components: Ruby, which you're likely familiar with, and FANN, which stands for Fast Artificial Neural Network. FANN is a C library. You can check it out; it's amazing. RubyFANN provides Ruby bindings for this C library, written by Steven Miers, who runs Tangled Path here in town and is a good friend of mine.
00:07:29.440 To use RubyFANN, you need to install it—just like any other gem. You can run a command like 'gem install ruby-fann' or add it to your Gemfile. If you don't have FANN installed on your system, RubyFANN will handle that for you by downloading and compiling the source. If you’ve had trouble installing the Postgres gem, you’d appreciate RubyFANN taking care of its dependency issues.
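In Gemfile form, that's just:

```ruby
# Gemfile
gem 'ruby-fann'
```

Then run `bundle install` as usual.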
00:07:49.840 Now, I’ve provided some code examples in the repository. Yes, it’s called 'tic-tac-toe' because I did two examples, starting with tic-tac-toe. Let’s say you want to build a spam detector using a neural network. You may be wondering what inputs or characteristics of an email could indicate it's spam or ham. What do you guys think?
00:08:08.960 For example, the sender an email came from might be a great factor, since you could build up a history for each sender. There are numerous factors to consider here, and I ended up creating over 82 different inputs for my spam detection example.
00:08:57.440 Some of the inputs I utilized included the character count, word count, alphanumeric character count, and the ratio of alphanumeric characters to non-alphanumeric ones—links, for instance, contribute a different mix of characters than ordinary prose. I also looked at the number of words that appeared only once relative to the overall word count. To gather my data, I sourced a corpus of emails from csmining.org, which has an excellent repository for data mining.
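A hypothetical sketch of that kind of feature extraction—the helper name and exact features are illustrative, not the talk's actual code, which lives in the repository:

```ruby
# Turn raw email text into a numeric feature vector.
def email_features(text)
  words = text.split
  alnum = text.count('a-zA-Z0-9')

  [
    text.length.to_f,                          # character count
    words.length.to_f,                         # word count
    alnum.to_f,                                # alphanumeric character count
    alnum.to_f / [text.length, 1].max,         # alphanumeric ratio
    words.tally.count { |_, n| n == 1 }.to_f /
      [words.length, 1].max                    # share of words used only once
  ]
end
```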
00:09:30.240 After organizing that data, I was able to process it into a database. The relevant inputs were derived from turning text into numeric representation, and this code is available in my GitHub repository as well.
00:09:51.679 Next, we come to the actual utilization of RubyFANN. The library has a class called TrainData within the RubyFann module; you simply provide it the inputs and outputs as parallel arrays. If you're unfamiliar with parallel arrays, they are multiple arrays of the same length where the nth elements correspond to one another.
00:10:12.080 This approach allows for efficient encapsulation of your training data. To create your network, you use the Standard class, providing a hash with three key-value pairs: the number of input neurons, an array specifying how many hidden neurons you want in each layer, and the number of output neurons.
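Put together, constructing the training data and the network might look like this. The feature vectors and layer sizes are placeholders, not the talk's actual values, and the activation call reflects ruby-fann's FANN bindings as I understand them:

```ruby
require 'ruby-fann'

# Parallel arrays: inputs[i] corresponds to desired_outputs[i].
inputs          = [[0.9, 0.3, 0.1], [0.2, 0.8, 0.7]]
desired_outputs = [[1.0], [-1.0]]  # positive = ham, negative = spam

train = RubyFann::TrainData.new(
  inputs:          inputs,
  desired_outputs: desired_outputs
)

net = RubyFann::Standard.new(
  num_inputs:     3,      # one per input feature
  hidden_neurons: [6, 4], # two hidden layers of 6 and 4 neurons
  num_outputs:    1       # a single spam/ham score
)

# Assumed here so outputs can range over -1..1, matching the talk's
# positive-ham/negative-spam convention; activation functions come up later.
net.set_activation_function_output(:sigmoid_symmetric)
```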
00:10:31.440 When training the network with your data, you specify a maximum number of epochs—passes over the training data—and the error threshold at which training can stop. Keep in mind that the threshold should align with the domain's requirements; higher accuracy is more critical for a space shuttle launch than for analyzing tweets.
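Continuing the sketch above, training is a single call; the numbers are illustrative:

```ruby
# Arguments: data, max epochs, epochs between progress reports, desired MSE.
# Training stops early once the error threshold is met.
net.train_on_data(train, 1000, 100, 0.01)
```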
00:10:53.520 I wrapped this logic into a method, and training the neural network was an interesting experience to watch. Each iteration reported on performance, revealing the current accuracy levels and mistakes, and thus allowing for continuous adjustment.
00:11:14.000 Yet, I found that the network struggled to learn effectively from bad inputs or misclassified data. However, this situation presented an opportunity to analyze what went wrong and tweak various parameters for performance improvement. Examples include adjusting the number of hidden neurons or exploring other inputs.
00:11:41.440 Later, I wrapped the classification processes into a function where I applied the trained neural network to new emails, testing its accuracy against previously unutilized data. Every positive result represented ham, and negatives represented spam. I noticed that my network often hovered around zero, indicating uncertainty.
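A hypothetical wrapper matching that sign convention—`classify` is a made-up name, and `email_features` is the illustrative helper sketched earlier:

```ruby
# Positive scores mean ham, negative mean spam; values near zero are
# effectively the network shrugging.
def classify(net, email_text)
  score = net.run(email_features(email_text)).first
  score >= 0 ? :ham : :spam
end
```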
00:12:04.080 Ultimately, the neural network performed about as well as random guessing—though at least no worse, which provided me some reassurance. This reiterated that neural networks are not the ultimate solution for every problem; Bayesian filters often provide more accurate results for spam.
00:12:25.520 However, observing the process of breaking down and analyzing the problem provides valuable insight for real-world applications. Now, let’s shift gears to another example, tic-tac-toe, a game we're all likely familiar with. I realized I could model this problem with two different neural networks.
00:12:49.680 In one approach, I simply represented the board state, using negative one for squares controlled by the opponent, zero for empty squares, and positive one for squares occupied by the player. The output had nine corresponding nodes, one for each square.
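As a hypothetical sketch, assuming a network built as before but with nine inputs and nine outputs:

```ruby
# -1 = opponent's square, 0 = empty, 1 = mine; read left to right, top to bottom.
board = [ 1,  0, -1,
          0,  1,  0,
         -1,  0,  0]

# One output per square: pick the highest-scoring square that is still empty.
scores = net.run(board)
move   = scores.each_with_index
               .select { |_, i| board[i].zero? }
               .max_by { |score, _| score }
               .last

puts "Play square #{move}"
```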
00:13:08.720 The challenge with this method was that I was hand-programming the examples, making the network no more effective than my own tic-tac-toe skills. Ultimately, I theorized that the perfect player is the one who doesn't play at all, as illustrated in the film WarGames.
00:13:30.960 The second approach considered the board state along with the move taken and the outcome of the game. It exhibited remarkably exciting results. When I began training the second network, it was evident how certain variables impacted gameplay.
00:13:58.560 For instance, it would output a score for each square, favoring the center and other strong next moves. However, it struggled to learn how to block an opponent or close out a win, showcasing the limitations of training based solely on previous inputs.
00:14:19.360 While conducting tournaments, I introduced a third participant that made random choices during gameplay. If a player can't outperform random guesses, that’s indicative of a problem. Thus, I ensured my neural networks faced off against the random approach under various circumstances.
00:14:48.800 Surprisingly, the X player (the one who goes first) typically won against the random player; however, many games ended in draws, given the inherent nature of tic-tac-toe.
00:15:08.080 It was fascinating to observe that the board-state-with-results player consistently won, regardless of who moved first. This likely stemmed from the larger number of gameplay data points feeding that model, as opposed to the limited learning represented in the board-state-only model.
00:15:29.440 With enough data, the board-state-with-results player managed to drastically exceed the capabilities of the board-state-only player. I attributed this success to its richer learning dataset, which allowed a more holistic view of future moves.
00:15:48.560 This further drives home the importance of tweaking and experimenting with neural networks. I encourage you to refer to the RubyFANN documentation, which outlines various parameters, including activation functions for nodes. Each activation function influences how nodes interpret numeric inputs, and finding the optimal setup requires careful consideration.
00:16:12.320 For instance, one activation function squashes inputs into a range of negative one to one. The choice of activation function can significantly impact model performance, so it’s essential for you to experiment with different values—both at the node and layer levels.
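In ruby-fann, as I understand its FANN bindings, that tweaking looks something like this; `:sigmoid_symmetric` is the function that squashes into negative one to one:

```ruby
# Hidden and output layers can be given different activation functions.
net.set_activation_function_hidden(:sigmoid_symmetric) # outputs in -1..1
net.set_activation_function_output(:sigmoid_symmetric)

# The plain :sigmoid squashes into 0..1 instead; swapping functions is one
# of the dials worth experimenting with.
```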
00:16:33.600 Beyond these experimental processes, neural networks can be incredibly useful in facial and handwriting recognition, driving assistance systems, and gesture detection. For gesture recognition on mobile devices, for instance, you can break down the screen into a grid where each contact point from touch gestures serves as input to the network.
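A hypothetical sketch of that grid encoding, with the grid size and helper name invented for illustration:

```ruby
GRID = 8  # split the screen into an 8x8 grid => 64 input nodes

# points: [[x, y], ...] with coordinates normalized to 0..1.
def gesture_inputs(points)
  cells = Array.new(GRID * GRID, 0.0)
  points.each do |x, y|
    col = [(x * GRID).floor, GRID - 1].min
    row = [(y * GRID).floor, GRID - 1].min
    cells[row * GRID + col] = 1.0  # mark each touched cell
  end
  cells
end
```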
00:16:54.240 With that, I’d like to pass the floor back to you all. If there are any remaining questions or discussions on practical applications, I would be glad to address them. Thank you for your time today!