MountainWest RubyConf 2014

Five Machine Learning Techniques That You Can Use In Your Ruby Apps Today

Benjamin Curtis

#machine-learning

#recommendation-systems

Benjamin Curtis • March 17, 2014 • Earth

In the presentation titled "Five Machine Learning Techniques That You Can Use In Your Ruby Apps Today," Benjamin Curtis discusses the increasing relevance of machine learning (ML) techniques for Ruby application developers. The goal is to introduce a selection of practical ML strategies that can enhance the functionality of Ruby apps without requiring advanced mathematical knowledge.

Overall, the presentation serves as an accessible introduction to various machine learning techniques that Ruby developers can adopt, emphasizing practical applications and encouraging further exploration.

Machine learning is everywhere these days. Features like search, voice recognition, and recommendations have become so common that people have started to expect them. They're starting to expect the apps we build to be smarter.
Ten years ago, machine learning and data mining techniques were only available to the people dedicated enough to dig through the math. Now that's not the case.
The most common machine learning techniques are well known. Standard approaches have been developed. And, fortunately for us, many of these are available as Ruby gems. Some are even easy to implement yourself.
In this presentation we'll cover five important machine learning techniques that can be used in a wide range of applications. It will be a wide and shallow introduction for Rubyists, not mathematicians, with plenty of simple code examples.
By the end of the presentation, you won't be an expert, but you'll know about a class of tools you may not have realized were available.

00:00:25.720 All right, good morning everybody! It is a pleasure to be here with you. This has been a fantastic conference. I want to thank all the presenters who have spoken before me, and of course Mike, who has done a great job putting this all together.

00:00:34.120 I am Ben Curtis. I am from the Seattle area; I actually live in Kirkland. For those who live outside of Seattle, that may not matter, but if you do live in Seattle and I say I'm from Seattle, that's totally false. I just want to be clear about that.

00:00:45.399 I am one of the co-founders of Honeybadger.io, alongside Starr Horne and Josh Wood. If you don't know about Honeybadger, you really should check it out; it's an awesome service for Ruby developers. You can find me as @stympy on the interwebs. I'm excited to be talking about machine learning techniques today.

00:01:11.920 It's been fun to put together this material, and I hope that we can spend a few productive minutes together discussing it. I apologize in advance to any machine learning experts here: I am not one, but I enjoy this topic. I hope not to offend you with any inaccuracies, and please correct me afterward if I mislead anyone.

00:01:40.680 We're going to learn about machine learning. It's a vast topic, so I will not cover everything in detail. Instead, I want to discuss it from the perspective of a Ruby developer and look at how machine learning techniques can be applied in our applications. If that doesn't sound interesting to you, feel free to find something else to do for half an hour.

00:02:01.000 There is a plethora of information available, so once you start learning, you can continue to dig deeper. If you're intrigued by what you learn here, you can definitely look up more on Wikipedia and other resources. My main goal today is to introduce you to some essential phrases and keywords that will help you construct a productive line of research into machine learning.

00:02:27.200 Machine learning seems like a big, mysterious topic. However, once you know a few key terms, it can start to make sense and open doors to deeper understanding. We’ll explore some of these key words that will help you on your journey to finding the right patterns or algorithms relevant to your projects.

00:03:00.320 Another title for this talk could be 'Making Sense of a Bunch of Data.' To me, machine learning is essentially about helping manage the massive amounts of data we deal with daily and making smart decisions or uncovering interesting trends from that data.

00:03:07.040 Before diving deeper into the talk, I’d like to share a warm-up joke. I haven't seen any warm-up jokes yet today, so I hope I can be the first.

00:03:22.840 I love that joke! I was sharing it with my family over dinner the other night, and I had to explain what TCP and UDP were. I love my kids; they're great. They humor me, and it was a lot of fun.

00:03:35.560 Before we delve into machine learning and how to make use of the data you have, we need to recognize the importance of the data itself. Logging is crucial; there's a notable blog post by Jay Kreps, who works at LinkedIn, about setting up a logging infrastructure within a business.

00:03:54.680 He outlines how important logging is, not just for storing strings to text files that someone might sift through later. Think a little more abstractly: consider all the data flowing into your enterprise and what happens to it. Are you throwing it away because you're not tracking it, or are you finding systematic ways to log it for later analysis?

00:04:08.560 This task can be viewed as step zero of any machine learning endeavor: first, you need to have substantial data to learn from. Sometimes we forget when dealing with data that the timing of events can be as crucial as the content itself, especially when discussing recommendation systems and freshness of data.

00:04:36.200 The 'when' of things happening can be critical; without logging, you won't know when things occurred. To help you in your logging ventures, here are a few products you might consider starting with for effective logging in your environment.
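To make step zero concrete, here is a minimal sketch, not taken from the talk, of logging events as one JSON object per line with Ruby's standard `logger` and `json` libraries. The `EventLog` class, the event names, and the fields are hypothetical; the point is that every record carries a timestamp, so later you can ask when something happened as well as what happened.

```ruby
require "logger"
require "json"
require "stringio"
require "time"

# Hypothetical structured event log: one JSON object per line,
# each stamped with the time the event was recorded.
class EventLog
  def initialize(io)
    @logger = Logger.new(io)
    # Write only the message; the JSON payload carries its own timestamp.
    @logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
  end

  def record(event, **fields)
    payload = { event: event, at: Time.now.utc.iso8601(3) }.merge(fields)
    @logger.info(JSON.generate(payload))
  end
end

buffer = StringIO.new # stands in for a log file or a log shipper
log = EventLog.new(buffer)
log.record("product_viewed", user_id: 42, product_id: 7)
log.record("product_purchased", user_id: 42, product_id: 7)

# Later, the same lines can be parsed back into structured data for analysis.
events = buffer.string.lines.map { |line| JSON.parse(line) }
events.each { |e| puts "#{e["at"]} #{e["event"]} user=#{e["user_id"]}" }
```

Writing JSON lines rather than free-form strings keeps the log easy to feed into tools like the ones listed next, without writing a parser for every message format.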

00:04:54.720 I am most familiar with Logstash, which works well with Elasticsearch for storage and search and Kibana for visualization. There are other great systems, like Amazon's offerings, if you're willing to pay for them. Apache Kafka, co-created by Jay Kreps, also addresses logging, although it can be complex to get started with.

00:05:14.120 Clustering is one of the foundational ML techniques we need to explore. Given a lot of data, we want to make sense of it by organizing it into smaller segments. Machine learning involves analyzing this data and breaking it into comprehensible parts.

00:05:21.440 When we approach clustering, we want to analyze everything first and look for patterns. The best-known clustering algorithm is K-means. It's a straightforward concept: given a set of points, K-means places them into K buckets based on how close they are to one another.

00:05:44.920 Essentially, we choose K, the number of clusters we want, and K-means assigns each data point to the cluster whose centroid is nearest. A notable challenge of K-means is deciding on the value of K up front; once that is settled, it is an effective way to organize data.

00:06:06.720 K-means is an unsupervised learning technique: the computer finds patterns on its own, without labeled examples telling it what the right answer is. Let's look at how K-means works in a practical example. Imagine we have lots of data points.

00:06:29.600 We choose a K of 3, meaning we want to create three clusters. We start by randomly selecting three points as centroids. Each of the other data points is assigned to its closest centroid. After that, we adjust the centroids to the average position of the points in each cluster, and the process repeats until the clusters stabilize.
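The steps above can be sketched in plain Ruby as follows. This is a minimal illustration assuming 2-D points stored as `[x, y]` arrays; the `kmeans` and `squared_distance` helpers and the sample house coordinates are hypothetical, not code from the talk. Because the starting centroids are chosen at random, different seeds can converge to different clusterings.

```ruby
# Squared Euclidean distance between two points of equal dimension.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Minimal K-means: pick k random points as starting centroids, assign each
# point to its nearest centroid, move each centroid to the mean of its
# points, and repeat until the assignments stop changing.
def kmeans(points, k, max_iterations: 100, random: Random.new(1))
  centroids = points.sample(k, random: random)
  assignments = nil

  max_iterations.times do
    # Assignment step: index of the closest centroid for every point.
    new_assignments = points.map do |p|
      (0...k).min_by { |i| squared_distance(p, centroids[i]) }
    end
    break if new_assignments == assignments # clusters have stabilized

    assignments = new_assignments

    # Update step: move each centroid to the average position of its points.
    centroids = (0...k).map do |i|
      members = points.select.with_index { |_, j| assignments[j] == i }
      next centroids[i] if members.empty? # leave an empty cluster's centroid alone
      members.transpose.map { |coords| coords.sum / members.size.to_f }
    end
  end

  [centroids, assignments]
end

# Hypothetical house locations as [x, y] map coordinates.
houses = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
          [5.0, 5.2], [5.3, 4.9], [4.8, 5.1],
          [9.0, 1.0], [9.2, 1.3], [8.9, 0.8]]

centroids, labels = kmeans(houses, 3)
centroids.each_with_index do |c, i|
  puts "cluster #{i}: #{labels.count(i)} houses around #{c.map { |v| v.round(2) }.inspect}"
end
```

A poor random start can leave two centroids in the same group, so in practice you would run it with several seeds and keep the result with the smallest total distance from points to their centroids.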

00:06:57.960 For example, if we wanted to display a large number of houses on a map, showing thousands of individual points is not helpful. Clustering the houses into groups lets us present the information much more cleanly.

00:07:23.160 In practice, a clustering algorithm like K-means makes large datasets easier to manage and visualize, so users can navigate data-rich views without being overwhelmed.

00:07:46.000 From here, we can move to supervised learning techniques, where we give the computer labeled examples that show how the data should be treated. One interesting example of supervised learning is decision trees.

00:08:06.560 Decision trees work much like conditional statements in programming. We can teach a model to classify data by giving it input criteria along with examples of the right answers. For example, we could classify temperature readings into different health statuses based on certain thresholds.
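The talk describes decision trees as conditionals; the interesting part is that the conditions can be learned from labeled examples. Below is a minimal sketch of one common way to do that, CART-style threshold splits chosen by Gini impurity, which the talk does not name. The `build_tree` and `classify` helpers, the temperatures, and the `low`/`normal`/`fever` labels are hypothetical illustration data, not medical thresholds.

```ruby
# Gini impurity: 0.0 when all labels agree, higher when they are mixed.
def gini(labels)
  return 0.0 if labels.empty?
  1.0 - labels.tally.values.sum { |c| (c.to_f / labels.size)**2 }
end

def majority(labels)
  labels.tally.max_by { |_, count| count }.first
end

# Each internal node asks "is feature f <= threshold?", picking the split
# with the lowest weighted Gini impurity; leaves hold the majority label.
def build_tree(rows, labels, depth: 0, max_depth: 3)
  return { leaf: majority(labels) } if labels.uniq.size == 1 || depth >= max_depth

  best = nil
  rows.first.size.times do |feature|
    values = rows.map { |r| r[feature] }.uniq.sort
    # Candidate thresholds: midpoints between consecutive distinct values.
    values.each_cons(2) do |lo, hi|
      threshold = (lo + hi) / 2.0
      left, right = (0...rows.size).partition { |i| rows[i][feature] <= threshold }
      score = (left.size * gini(labels.values_at(*left)) +
               right.size * gini(labels.values_at(*right))) / rows.size.to_f
      best = [score, feature, threshold, left, right] if best.nil? || score < best[0]
    end
  end
  return { leaf: majority(labels) } if best.nil? # all rows identical

  _, feature, threshold, left, right = best
  {
    feature: feature,
    threshold: threshold,
    left: build_tree(rows.values_at(*left), labels.values_at(*left), depth: depth + 1, max_depth: max_depth),
    right: build_tree(rows.values_at(*right), labels.values_at(*right), depth: depth + 1, max_depth: max_depth)
  }
end

# Walk the tree like a chain of nested if/else statements.
def classify(tree, row)
  until tree.key?(:leaf)
    tree = row[tree[:feature]] <= tree[:threshold] ? tree[:left] : tree[:right]
  end
  tree[:leaf]
end

# Hypothetical body temperatures (degrees C) with made-up labels.
temperatures = [[35.0], [35.4], [36.6], [36.9], [37.2], [38.4], [39.0], [39.6]]
statuses = %w[low low normal normal normal fever fever fever]

tree = build_tree(temperatures, statuses)
[35.2, 36.8, 39.5].each { |t| puts "#{t} => #{classify(tree, [t])}" }
```

With this data the learned tree splits at about 37.8 and then at 36.0, which is exactly the kind of nested `if` you might otherwise write by hand.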

00:08:32.560 The power of decision trees lies in turning input data into a classification through a sequence of simple conditions. This idea extends to many applications, including writing articles, analyzing data, and even predicting outcomes.

00:08:54.960 A fascinating example is a decision-tree-style system that generates articles for news outlets. When an earthquake occurs, it can quickly compile the relevant information according to preset criteria, so an article can be published promptly.

00:09:21.760 Applications like this highlight how decision trees can respond to real-time events and automate routine tasks. Decision trees are straightforward, though, and our next topic takes classification a step further.

00:09:44.520 Classifiers, such as the Bayesian classifiers widely used in spam filters, learn patterns from a training corpus and use them to classify new data. By analyzing the features of the input text, such as the words it contains, these algorithms can sort content into categories.

00:10:00.560 Let's take sentiment analysis as an example. Many online platforms and review sites utilize classifiers to determine the general sentiment of user comments and feedback by analyzing keywords and tone.

00:10:23.360 For instance, a classifier could examine product reviews to differentiate between positive and negative sentiments through the words being used. This is a powerful tool for shaping personalized user experiences and marketing decisions.

00:10:46.080 As mentioned earlier, naive Bayes classifiers are a simple approach to text classification: they use the word frequencies observed in training data to estimate how probable each category is for a new piece of text, treating each word as independent of the others (the "naive" assumption). Because they are cheap to train and apply, businesses can use them to refine their customer engagement strategies.
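For a sense of how little code this takes, here is a self-contained sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing, applied to the review-sentiment example. It does not use any gem; the `NaiveBayes` class and the tiny training reviews are hypothetical. Scores are summed as logarithms to avoid multiplying many small probabilities together.

```ruby
# Multinomial naive Bayes text classifier with add-one (Laplace) smoothing.
class NaiveBayes
  def initialize
    @word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) } # label => {word => count}
    @doc_counts = Hash.new(0) # label => number of training documents
    @vocabulary = {}          # every word seen in training
  end

  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the label whose log prior plus summed log likelihoods is highest.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by do |label|
      counts = @word_counts[label]
      total_words = counts.values.sum
      score = Math.log(@doc_counts[label] / total_docs) # log P(label)
      words.each do |word|
        # Smoothed log P(word | label): unseen words lower the score but never zero it out.
        score += Math.log((counts[word] + 1.0) / (total_words + @vocabulary.size))
      end
      score
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

nb = NaiveBayes.new
nb.train(:positive, "I love this product, it works great")
nb.train(:positive, "Excellent quality and great value")
nb.train(:negative, "Terrible, it broke after one day")
nb.train(:negative, "Awful quality, I want a refund")

puts nb.classify("great value, I love it")     # prints "positive"
puts nb.classify("terrible quality, it broke") # prints "negative"
```

Real sentiment classifiers train on far more text, but the mechanics are the same: count words per category during training, then compare smoothed probabilities at classification time.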

00:11:09.680 Moving on to more advanced algorithms, one such technique is latent semantic indexing (LSI). LSI goes beyond matching exact words: it applies singular value decomposition to a matrix of term counts per document, so texts can be compared by the concepts their words are associated with.

00:11:34.480 By examining which words occur together, LSI can identify similar documents without anyone having to define what the words mean. This makes it useful for search and for recommending related content.
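A toy version of LSI fits in a few dozen lines, assuming whitespace-separated words and a tiny corpus. This sketch builds the term-document counts, finds the top `k` singular vectors by power iteration on A^T A (a shortcut that is fine for five documents but not for a real corpus, where you would use a proper SVD routine), and compares documents by cosine similarity in the reduced space. The helper names and the documents are hypothetical.

```ruby
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def cosine(a, b)
  denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b))
  denom.zero? ? 0.0 : dot(a, b) / denom
end

# Word-count vector for each document (the columns of the term-document matrix A).
def term_vectors(docs)
  tokens = docs.map { |d| d.downcase.split }
  vocab = tokens.flatten.uniq
  tokens.map { |words| vocab.map { |w| words.count(w).to_f } }
end

# Coordinates of each document in a k-dimensional LSI space: sigma_j * v_j[i],
# where v_j are the top right singular vectors of A and sigma_j its singular values.
# They are found as eigenvectors of A^T A by power iteration with deflation.
def lsi_vectors(docs, k, iterations: 200)
  columns = term_vectors(docs)
  n = docs.size
  gram = Array.new(n) { |i| Array.new(n) { |j| dot(columns[i], columns[j]) } } # A^T A
  coords = Array.new(n) { [] }

  k.times do
    v = Array.new(n, 1.0 / Math.sqrt(n))
    iterations.times do
      w = gram.map { |row| dot(row, v) }
      norm = Math.sqrt(dot(w, w))
      break if norm.zero?
      v = w.map { |x| x / norm }
    end
    eigenvalue = dot(v, gram.map { |row| dot(row, v) }) # Rayleigh quotient = sigma^2
    sigma = Math.sqrt([eigenvalue, 0.0].max)
    n.times { |i| coords[i] << sigma * v[i] }
    # Deflation: subtract this component so the next pass finds the next one.
    gram = gram.each_with_index.map do |row, i|
      row.each_with_index.map { |x, j| x - eigenvalue * v[i] * v[j] }
    end
  end
  coords
end

docs = ["ruby rails", "rails gem", "gem bundler", "cat dog", "dog bone"]
raw = term_vectors(docs)
lsi = lsi_vectors(docs, 2)

puts cosine(raw[0], raw[2]).round(3) # "ruby rails" vs "gem bundler": no shared words
puts cosine(lsi[0], lsi[2]).round(3) # the same pair in LSI space
puts cosine(lsi[0], lsi[3]).round(3) # "ruby rails" vs "cat dog"
```

Here "ruby rails" and "gem bundler" share no words, so their raw cosine similarity is zero, yet LSI places them close together because "rails gem" links them; the pet documents stay far away.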

00:11:56.440 In Ruby, libraries like the Classifier gem provide ready-made Bayesian classification and LSI, so Ruby developers can use these techniques in their applications without implementing them from scratch.

00:12:20.480 As we move further into practical applications, recommendation engines become central to how machine learning shows up in real-world scenarios.

00:12:43.440 Recommendation algorithms assess user preferences and suggest content based on similar user behaviors, utilizing collaborative filtering techniques. This is prominent in e-commerce and entertainment sectors, driving personalized experiences.

00:13:07.600 For example, if a user frequently buys books on Python, recommendation engines will infer connections to suggest related topics or similar titles, shaping an intuitive shopping experience.
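One of the simplest forms of collaborative filtering is item-to-item co-occurrence: recommend the items that most often appear in the same purchase histories as the one being viewed. The sketch below assumes purchase histories stored in a plain hash; `recommend_by_cooccurrence` and the book titles are hypothetical.

```ruby
# Item-to-item collaborative filtering by co-occurrence: recommend the items
# that most often appear in the same purchase histories as the given item.
def recommend_by_cooccurrence(histories, item, limit: 3)
  counts = Hash.new(0)
  histories.each_value do |items|
    next unless items.include?(item)
    (items - [item]).each { |other| counts[other] += 1 }
  end
  # Highest count first; ties broken alphabetically so results are stable.
  counts.sort_by { |other, count| [-count, other] }.first(limit).map(&:first)
end

# Hypothetical purchase histories: customer => books bought.
purchases = {
  "customer_1" => ["Python Basics", "Advanced Python", "Python Recipes"],
  "customer_2" => ["Python Basics", "Advanced Python"],
  "customer_3" => ["Python Basics", "Ruby Basics"],
  "customer_4" => ["Ruby Basics", "Ruby Patterns"]
}

p recommend_by_cooccurrence(purchases, "Python Basics")
# prints ["Advanced Python", "Python Recipes", "Ruby Basics"]
```

"Advanced Python" ranks first because two of the three customers who bought "Python Basics" also bought it; raw counts like these favor popular items, which is one reason similarity measures that normalize for set size are often preferred.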

00:13:31.320 The Jaccard index is another useful tool in machine learning. It measures how similar two collections are by dividing the number of elements they share by the total number of distinct elements in either one. This can assist in clustering and classification, providing insights into data relationships.
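Written out, the Jaccard index of two sets A and B is the size of their intersection divided by the size of their union, so it ranges from 0 (nothing shared) to 1 (identical sets). The worked numbers use two hypothetical users' sets of interests.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1

A = \{\text{ruby}, \text{rails}, \text{postgres}\}, \quad
B = \{\text{ruby}, \text{rails}, \text{redis}, \text{sidekiq}\}

J(A, B) = \frac{|\{\text{ruby}, \text{rails}\}|}
               {|\{\text{ruby}, \text{rails}, \text{postgres}, \text{redis}, \text{sidekiq}\}|}
        = \frac{2}{5} = 0.4
```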

00:13:55.520 In practical terms, the Jaccard index makes it easy to group similar sets together. For example, it can identify groups of users with shared interests so that services can adapt to their preferences.

00:14:20.000 An effective way to compute the Jaccard index at scale is to rely on databases and libraries that perform fast set operations over large datasets, which keeps queries fast.

00:14:44.720 Recommendation engines can benefit significantly from tools that use Jaccard similarity to refine their suggestions. Rather than just presenting popular items, they can tailor suggestions to groups of similar users.
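Putting the pieces together, here is a minimal sketch of user-based recommendations weighted by Jaccard similarity: each item the target user does not have yet scores the sum of the similarities of the users who do have it. The data and the `jaccard`/`recommend_for` helpers are hypothetical; at scale you would precompute similarities or push the set operations into a data store rather than loop over every user in Ruby.

```ruby
require "set"

# Jaccard index: size of the intersection divided by size of the union.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# User-based recommendations: score every item the target user lacks by
# summing the Jaccard similarity of each other user who has it.
def recommend_for(user, likes, limit: 3)
  mine = likes.fetch(user)
  scores = Hash.new(0.0)
  likes.each do |other, theirs|
    next if other == user
    similarity = jaccard(mine, theirs)
    next if similarity.zero?
    (theirs - mine).each { |item| scores[item] += similarity }
  end
  scores.sort_by { |item, score| [-score, item] }.first(limit).map(&:first)
end

# Hypothetical users and the topics they follow.
likes = {
  "ann"   => Set["ruby", "rails", "postgres"],
  "ben"   => Set["ruby", "rails", "redis", "sidekiq"],
  "cara"  => Set["ruby", "sinatra", "redis"],
  "dylan" => Set["python", "django"]
}

p recommend_for("ann", likes) # prints ["redis", "sidekiq", "sinatra"]
```

Ann shares two of five distinct interests with Ben (0.4) and one of five with Cara (0.2), so `redis`, which both of them follow, ranks first, while Dylan, with nothing in common, contributes nothing.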

00:15:07.520 Finally, remember that the learning algorithms you choose can have a substantial impact on user engagement and satisfaction. When your data grows large, consider tools built specifically for handling large quantities of information.

00:15:29.520 As we wrap up, I hope this provides a broad overview of the available machine learning techniques developers can adopt in their Ruby applications. Exploring and implementing these concepts can significantly enhance application functionality.

00:15:52.240 Thank you all for your time, and I welcome any questions or discussions you might have. I'm looking forward to seeing how we can all leverage these techniques for better software solutions!
