Ruby Video | Intelligent Ruby: Getting Started with Machine Learning

Intelligent Ruby: Getting Started with Machine Learning

#machine-learning

#neural-networks

#natural-language-processing

#recommendation-systems


Ilya Grigorik • September 18, 2010 • Earth

In the presentation "Intelligent Ruby: Getting Started with Machine Learning" at GoGaRuCo 2010, Ilya Grigorik gives a concise introduction to machine learning, emphasizing its accessibility and practical applications, particularly in Ruby.

In summary, Grigorik encourages attendees to explore what machine learning can do, stressing that while the algorithms may seem intricate, they rest on simple, actionable insights. He advocates data-driven approaches, attention to runtime constraints, and ensemble methods for better performance.

Overall, the session aims to dispel machine learning's intimidating reputation and make the field more accessible to Ruby developers and enthusiasts, while showcasing practical applications.


Machine learning is a discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data: a fancy name for a simple concept. Behind all the buzzword algorithms such as Decision Trees, Singular Value Decomposition, Bayes and Support Vector Machines lie the simple observations and principles that make them tick. In this presentation, we will take a ground-up look at how they work (in practical terms), why they work, and how you can apply them in Ruby for fun and profit.

No prior knowledge required. We will take a quick look at the foundations (representing and modeling knowledge, compression, and inference), and build up to simple but powerful examples such as clustering, recommendations, and classification, all in 30 minutes or less, believe it or not.


GoGaRuCo 2010

00:00:13.880 I have several goals for this presentation. One is to talk a little bit about the general trends in machine learning and hopefully get you excited about some of the things happening in this field. The second goal is to communicate that even though there is a lot of academic math involved in the literature of this space, it's actually very easy to get started.

00:00:30.279 A lot of it relies on very simple core insights that we can all use. Lastly, I hope to inspire you to go out and explore these ideas further.

00:00:50.640 So, first of all, what comes to mind when you hear 'machine learning'? Terminator? AI? Chess-playing programs? Roomba?

00:01:02.760 When I ask this question, a lot of people picture complicated mathematical formulas with linear algebra, calculus, and optimization theory all mashed into one field. There is some truth to that, but I think it makes things unnecessarily complicated.

00:01:31.240 The reason is that most courses in AI or machine learning, whether at a university or self-taught, focus on the algorithm itself. In general, we have inputs, a runtime (your CPU, GPU, or whatever), and an algorithm: the core insight into how your machine will learn.

00:01:56.840 Learning in general is hard; this is something we all know intrinsically. Here, you're attempting to teach a machine to learn something, which adds complexity. I believe we should broaden this picture beyond the algorithm for several reasons, and I'll explain why.

00:02:22.640 First, consider the runtime. My experience with machine learning was very theoretical. I took various courses and explored the field independently, only to find that academics often treat the runtime as merely a practical constraint.

00:02:48.000 If you survey the machine learning faculty at most universities, you will typically find mostly statisticians and optimization theorists, focused on mathematical proofs.

00:03:00.680 While this is interesting, it often overlooks the runtime. For them, the runtime is just a constraint—someone will eventually build a machine that can run it, regardless of whether it requires a terabyte of memory. However, I came to realize that I couldn't run my supposedly great recommendation algorithm on anything but the most trivial datasets.

00:03:39.799 I discovered that my local computer science department housed a fabled machine with 40 terabytes of memory and lots of CPUs. I spent two months negotiating for access, and when I finally logged in, I found that I had only 768 megabytes of memory to work with.

00:04:09.879 What I had wasn't the supercomputer I imagined, but rather a commodity cluster of 50 machines. I quickly learned that distributed systems come with their own practical constraints, which led me to explore the challenges they present.

00:04:39.560 Today's cloud computing platforms make a significant difference, but there is still relatively little research in this area. In fact, you all are ahead of most academic research in your knowledge of distributed systems.

00:04:55.640 Next is the input. Until recently, input data was scarce. For instance, in natural language processing, researchers often worked with datasets of millions of files. Today, however, we have trillions of pages on the web to work with.

00:05:58.440 The Ruby community plays a crucial role in generating terabytes of unstructured and structured data. Yet, we face a conundrum: we have the capability to process huge datasets, but we're also collecting data at an unprecedented rate. So, what does that mean for us?

00:06:44.840 This brings us to an interesting paper published in the early 2000s by a group of researchers from Microsoft titled 'More Input vs. Better Algorithms.' They wanted to determine how throwing orders of magnitude more data at existing algorithms would impact performance.

00:07:58.600 What they found was fascinating; as you increased the input size, the performance of all algorithms improved. For instance, one algorithm, referred to as learner 5, started at about 78% accuracy and performed even better with more data. This phenomenon eventually led to what's termed 'Data-Driven Learning,' emphasizing the importance of having access to large datasets.

00:08:57.840 You can even go so far as to make your data the algorithm itself. Consider a simple example: the text on the slide has no spaces, so it is hard to tell how many distinct words it contains. Splitting such text into words, known as word segmentation, is trickier than it looks and remains an ongoing challenge for many in the field.

00:10:28.200 How do we go about solving this problem? We could build a grammar model, grab a toolkit that already provides a solution, or make an educated guess. The educated guess could involve estimating the probability of each possible segmentation from how frequently its candidate words appear in known data.

00:11:38.919 For example, you could write a web scraper for Google, search for each candidate word, and use the number of results as an estimate of its frequency across the web. Such a simple, data-driven approach can yield remarkably accurate results, similar in spirit to the statistical approach behind Google's translation service.
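A minimal sketch of this data-driven approach, assuming we already have word counts. The `COUNTS` table is a tiny hypothetical stand-in for frequencies collected from a large corpus or from search hit counts, and the penalty for unseen words is an assumed heuristic. Each candidate split is scored by the product of its words' probabilities, and the best split is found by memoized recursion. This follows the common unigram segmentation approach; it is not necessarily the exact method shown in the talk.

```ruby
# Hypothetical word counts; in practice these would come from a large corpus.
COUNTS = {
  "machine" => 500, "learning" => 400, "is" => 2000, "fun" => 300,
  "a" => 3000, "chine" => 1, "earn" => 50, "in" => 2500, "g" => 1
}.freeze
TOTAL = COUNTS.values.sum.to_f

# Probability of one word. Unseen words get a tiny probability that
# shrinks with length, so long gibberish "words" are penalized (assumed heuristic).
def word_prob(word)
  count = COUNTS[word]
  count ? count / TOTAL : 1.0 / (TOTAL * 10**word.length)
end

# Returns [probability, words] for the most likely segmentation of text.
def segment(text, memo = {})
  return [1.0, []] if text.empty?
  return memo[text] if memo.key?(text)

  candidates = (1..text.length).map do |i|
    first = text[0, i]
    rest_prob, rest_words = segment(text[i..], memo)
    [word_prob(first) * rest_prob, [first] + rest_words]
  end
  memo[text] = candidates.max_by(&:first)
end

_prob, words = segment("machinelearningisfun")
puts words.join(" ") # prints "machine learning is fun"
```

With real counts the same code works unchanged; only the table grows.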

00:12:50.679 This concept demonstrates how algorithms in machine learning often revolve around simple insights. One example of this is learning through compression. Compression, which is essentially about identifying significant concepts within data, can serve as a useful metaphor for understanding machine learning.

00:14:37.880 Let's say you're tasked with predicting whether a certain fruit is tasty. You might hypothesize that the feel and color of the fruit are the determining factors. By gathering data and plotting it, you can separate the 'tasty' fruits from the 'not tasty' ones with a linear decision boundary. Learning such a boundary from labeled examples is what the perceptron algorithm does, making it a basic model for classification in machine learning.
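A minimal perceptron sketch, assuming each fruit is described by two hypothetical numeric scores (`feel` and `color`, scaled to 0..1) and labeled +1 for tasty and -1 for not tasty. The training data is made up for illustration. Whenever a point lands on the wrong side of the boundary, the weights are nudged toward it; on linearly separable data like this, the loop is guaranteed to stop.

```ruby
# Made-up training data: [feel, color] scores in 0..1, label +1 = tasty, -1 = not tasty.
FRUITS = [
  [[0.9, 0.8], 1], [[0.8, 0.9], 1], [[0.7, 0.7], 1],
  [[0.2, 0.3], -1], [[0.3, 0.1], -1], [[0.1, 0.2], -1]
].freeze

def dot(w, x)
  w.zip(x).sum { |a, b| a * b }
end

# Classic perceptron; a constant 1.0 feature plays the role of the bias.
def train_perceptron(data, max_epochs: 1000)
  weights = [0.0] * (data.first[0].size + 1)
  max_epochs.times do
    mistakes = 0
    data.each do |features, label|
      x = [1.0] + features
      next if label * dot(weights, x) > 0 # already on the correct side
      # Misclassified: shift the boundary toward the correct side of this point.
      weights = weights.zip(x).map { |w, xi| w + label * xi }
      mistakes += 1
    end
    break if mistakes.zero? # every training point is classified correctly
  end
  weights
end

def classify(weights, features)
  dot(weights, [1.0] + features) > 0 ? 1 : -1
end

weights = train_perceptron(FRUITS)
puts classify(weights, [0.8, 0.8]) # a hypothetical firm, well-colored fruit; prints 1
```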

00:16:21.320 However, perceptrons have limitations and don't work on every dataset. Suppose color is your only feature: if, for example, the tasty fruits fall in the middle of the color range and the not-tasty ones at both extremes, no single linear boundary (here, just a threshold) can separate the two classes. The key insight is that you can map your data into more dimensions, where a clear linear boundary may exist.
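To see why adding a dimension helps, here is a sketch with a single made-up color feature where tasty fruits sit in the middle of the range. No threshold on color alone separates the classes, so a perceptron on the raw feature never converges. Mapping each value x to (x, x^2), a feature map chosen by hand for this illustration, makes the classes linearly separable, and the same perceptron then succeeds. SVMs achieve a similar effect implicitly through kernel functions.

```ruby
# Made-up data: [color in 0..1, label]; tasty (+1) only in the middle of the range.
COLORS = [[0.1, -1], [0.2, -1], [0.4, 1], [0.5, 1], [0.6, 1], [0.8, -1], [0.9, -1]].freeze

def dot(w, x)
  w.zip(x).sum { |a, b| a * b }
end

# Perceptron with a constant bias feature; returns [weights, converged?].
def perceptron(points, max_epochs: 10_000)
  weights = [0.0] * (points.first[0].size + 1)
  max_epochs.times do
    mistakes = 0
    points.each do |features, label|
      x = [1.0] + features
      next if label * dot(weights, x) > 0
      weights = weights.zip(x).map { |w, xi| w + label * xi }
      mistakes += 1
    end
    return [weights, true] if mistakes.zero?
  end
  [weights, false] # gave up: no separating boundary found
end

raw    = COLORS.map { |c, label| [[c], label] }        # 1 dimension: x
lifted = COLORS.map { |c, label| [[c, c * c], label] } # 2 dimensions: (x, x^2)

puts "raw separable?    #{perceptron(raw)[1]}"    # prints false
puts "lifted separable? #{perceptron(lifted)[1]}" # prints true
```

In the lifted space a boundary such as x - x^2 = 0.2 separates the classes, which is a straight line in (x, x^2) coordinates even though it corresponds to two thresholds on the original color axis.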

00:17:36.399 This realization connects directly to support vector machines, which map data into a higher-dimensional space and find the boundary that separates positive from negative examples with the widest margin. Thankfully, Ruby developers have resources at their disposal, such as libsvm, a well-supported library (usable from Ruby through bindings) that is well suited to straightforward tasks like spam classification.
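libsvm has its own solver and kernel support, and its Ruby bindings have their own API, which is not reproduced here. As a dependency-free illustration of the margin idea, this sketch trains a linear SVM with Pegasos, a stochastic sub-gradient method (a swapped-in technique, not libsvm's algorithm). It minimizes lambda/2 * |w|^2 plus the average hinge loss, which pushes every example at least a margin of 1 away from the boundary. Assumptions: made-up two-feature "spam" vectors, no bias term (the toy data is separable by a line through the origin), an assumed `LAMBDA` value, and examples visited in order rather than at random so the run is deterministic.

```ruby
LAMBDA = 0.01 # regularization strength (assumed value)

# Hypothetical feature vectors; label +1 = spam, -1 = not spam.
EMAILS = [
  [[2.0, 1.0], 1], [[1.0, 2.0], 1], [[2.0, 2.0], 1],
  [[-2.0, -1.0], -1], [[-1.0, -2.0], -1], [[-2.0, -2.0], -1]
].freeze

def dot(w, x)
  w.zip(x).sum { |a, b| a * b }
end

# Pegasos: step size 1/(lambda * t); shrink w for the regularizer, and
# add eta * y * x whenever the example violates the margin of 1.
def train_svm(data, iterations: 10_000)
  w = [0.0] * data.first[0].size
  (1..iterations).each do |t|
    x, y = data[(t - 1) % data.size] # cycle through examples
    eta = 1.0 / (LAMBDA * t)
    violates = y * dot(w, x) < 1
    w = w.map { |wi| (1 - eta * LAMBDA) * wi }
    w = w.zip(x).map { |wi, xi| wi + eta * y * xi } if violates
  end
  w
end

def predict(w, x)
  dot(w, x) >= 0 ? 1 : -1
end

w = train_svm(EMAILS)
puts w.map { |v| v.round(3) }.inspect
```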

00:18:41.200 Another highly impactful area of machine learning is recommendation systems. In essence, you have users and objects (such as movies) that users rate. Using linear algebra, we can infer what a specific user might like from their existing ratings and predict ratings for objects they have not yet rated.
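One simple way to make such a prediction, sketched here as an illustration rather than the method from the talk, is user-based collaborative filtering: measure how similar two users are with cosine similarity over the movies both have rated, then predict a missing rating as a similarity-weighted average of other users' ratings. The users, movies, and ratings are made up.

```ruby
# Made-up ratings (1..5); a missing key means "not rated".
RATINGS = {
  "alice" => { "Alien" => 5, "Heat" => 3, "Up" => 1 },
  "bob"   => { "Alien" => 4, "Heat" => 3, "Up" => 1, "Jaws" => 5 },
  "carol" => { "Alien" => 1, "Heat" => 2, "Up" => 5, "Jaws" => 1 }
}.freeze

# Cosine similarity (a normalized dot product) over the items both users rated.
def cosine(a, b)
  shared = a.keys & b.keys
  return 0.0 if shared.empty?
  num = shared.sum { |k| a[k] * b[k] }
  den = Math.sqrt(shared.sum { |k| a[k]**2 }) * Math.sqrt(shared.sum { |k| b[k]**2 })
  num / den
end

# Predict user's rating for item as a similarity-weighted average of other users' ratings.
def predict(user, item)
  others = RATINGS.reject { |name, r| name == user || !r.key?(item) }
  weighted = others.map { |_name, r| [cosine(RATINGS[user], r), r[item]] }
  total = weighted.sum { |sim, _| sim }
  return nil if total.zero?
  weighted.sum { |sim, rating| sim * rating } / total
end

puts predict("alice", "Jaws").round(2)
```

Because all ratings are positive, cosine similarity is positive even for users with opposite tastes (carol still pulls the prediction toward her low rating). Subtracting each user's mean rating before comparing is a common refinement.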

00:20:53.600 Matrix decomposition can help us do this. Consider an image, which represents a large matrix of pixel values. With methods like singular value decomposition (SVD), we can effectively compress this information while still roughly preserving the important characteristics of the image. This is a basic yet powerful approach used in computer vision.
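The compression comes from keeping only the k largest singular values. For an m x n matrix A = U Sigma V^T with singular values sigma_1 >= sigma_2 >= ..., the truncated sum below is the best rank-k approximation of A in the least-squares (Frobenius) sense, and storing it takes k vectors u_i of length m, k vectors v_i of length n, and k scalars. The 1000 x 1000 image and k = 50 are illustrative numbers, not from the talk.

```latex
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top},
\qquad
\text{storage}(A_k) = k\,(m + n + 1) \quad \text{vs.} \quad mn

m = n = 1000,\; k = 50:\qquad 50 \times 2001 = 100{,}050 \approx 10\% \text{ of } 10^{6}
```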

00:22:27.440 Similarly, we can apply this concept to recommendation systems by finding significant features within the data. By running SVD on a matrix derived from user-object interactions, you can discover essential correlations that help improve recommendations.
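To stay dependency-free, this sketch (an illustration, not code from the talk) computes only the top singular value and vectors with power iteration on A^T A, using plain arrays. Applied to a small made-up user x movie matrix, sigma * u * v^T is the best rank-1 approximation of the ratings, and the right singular vector v acts as a single dominant "taste" factor. The matrix here is complete; real rating matrices have missing entries, which plain SVD cannot handle directly.

```ruby
def mat_vec(a, v) # computes A v
  a.map { |row| row.zip(v).sum { |x, y| x * y } }
end

def transpose_vec(a, u) # computes A^T u
  (0...a.first.size).map { |j| a.each_with_index.sum { |row, i| row[j] * u[i] } }
end

def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

# Top singular triplet [sigma, u, v] via power iteration on A^T A.
def top_singular(a, iterations: 100)
  v = normalize([1.0] * a.first.size)
  iterations.times { v = normalize(transpose_vec(a, mat_vec(a, v))) }
  av = mat_vec(a, v)
  sigma = Math.sqrt(av.sum { |x| x * x })
  [sigma, av.map { |x| x / sigma }, v]
end

# Rank-1 approximation sigma * u * v^T.
def rank_one(sigma, u, v)
  u.map { |ui| v.map { |vj| sigma * ui * vj } }
end

# Made-up ratings: rows are users, columns are movies.
ratings = [
  [5, 4, 1],
  [4, 5, 2],
  [1, 2, 5]
]
sigma, u, v = top_singular(ratings)
rank_one(sigma, u, v).each { |row| puts row.map { |x| x.round(1) }.inspect }
```

Further singular vectors can be found by subtracting the rank-1 part and repeating (deflation), though a numerical library is the practical choice for large matrices.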

00:23:38.760 Next, we have clustering, an essential machine learning task. Just like humans can visually discern clusters among data points, we can devise algorithms to do just that. The challenge is defining the 'similarity' between data points, as it greatly influences how clusters are formed.
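As one concrete example, here is a minimal k-means sketch (a common clustering algorithm, chosen here for illustration) that uses plain Euclidean distance as the similarity measure on made-up 2-D points. Starting centroids are picked farthest-first, a deterministic simplification of k-means++ seeding; the loop then alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points.

```ruby
# Made-up 2-D points forming two visually obvious groups.
POINTS = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]].freeze

def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Farthest-first seeding: start with the first point, then repeatedly add
# the point farthest from all centroids chosen so far.
def seed(points, k)
  centroids = [points.first]
  centroids << points.max_by { |pt| centroids.map { |c| distance(pt, c) }.min } while centroids.size < k
  centroids
end

def kmeans(points, k, iterations: 100)
  centroids = seed(points, k)
  labels = []
  iterations.times do
    # Assignment step: index of the nearest centroid for every point.
    labels = points.map { |pt| (0...k).min_by { |i| distance(pt, centroids[i]) } }
    # Update step: move each centroid to the mean of its members.
    updated = (0...k).map do |i|
      members = points.select.with_index { |_, idx| labels[idx] == i }
      next centroids[i] if members.empty? # keep an empty cluster's centroid in place
      members.transpose.map { |coords| coords.sum / coords.size }
    end
    break if updated == centroids # converged
    centroids = updated
  end
  [labels, centroids]
end

labels, _centroids = kmeans(POINTS, 2)
p labels # prints [0, 0, 0, 1, 1, 1]
```

Swapping `distance` for another measure changes which points count as "similar", which is exactly the design decision the talk highlights.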

00:24:48.880 For simple strings, you can often see similarities at a glance because they share common characters; the challenge is quantifying that similarity. Compression offers one way: if two strings share structure, compressing them together takes noticeably less space than compressing each one separately, and the size of that saving measures how similar they are.

00:26:07.360 Once you derive a similarity score based on compression, you can effectively cluster huge datasets by measuring degrees of similarity. This approach is powerful because, unlike domain-specific methods, it requires no prior knowledge of the data.
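This compression trick is known as normalized compression distance (NCD): NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. Values near 0 mean the strings share most of their structure; values near 1 mean compressing them together saves little. The sketch uses Ruby's built-in Zlib as the compressor and made-up documents; pairing each document with its nearest neighbor is a deliberately simple clustering step, and any clustering algorithm that accepts a distance function could use NCD instead.

```ruby
require "zlib"

# Compressed size in bytes (deflate at maximum compression).
def csize(str)
  Zlib::Deflate.deflate(str, Zlib::BEST_COMPRESSION).bytesize
end

# Normalized compression distance: near 0 for near-identical strings, near 1 for unrelated ones.
def ncd(x, y)
  cx = csize(x)
  cy = csize(y)
  (csize(x + y) - [cx, cy].min).to_f / [cx, cy].max
end

# Made-up documents: two about foxes, two lorem-ipsum variants.
DOCS = {
  fox1: "the quick brown fox jumps over the lazy dog " * 8,
  fox2: "the quick brown fox leaps over the lazy cat " * 8,
  lorem1: "lorem ipsum dolor sit amet consectetur adipiscing elit " * 8,
  lorem2: "lorem ipsum dolor sit amet sed do eiusmod tempor " * 8
}.freeze

# Pair each document with its most similar (lowest-NCD) neighbor.
DOCS.each do |name, text|
  others = DOCS.reject { |other, _| other == name }
  nearest, _ = others.min_by { |_, other_text| ncd(text, other_text) }
  puts "#{name} -> #{nearest}"
end
```

No tokenizer, vocabulary, or domain knowledge is involved: the compressor discovers the shared structure on its own.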

00:27:50.160 In summary, we've discussed runtime, data input, and algorithm performance. Many algorithms may appear complex but have simple core insights. Additionally, ensemble methods demonstrate the power of combining multiple simple models for solving complex problems.
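A toy illustration of the ensemble idea, assuming three hypothetical rule-based spam detectors that each make a mistake on a different made-up message. Combining them by majority vote gets every message right even though no single rule does. Real ensembles (bagging, boosting, or the blended models behind the Netflix Prize) are far more elaborate, but they share the idea that combining models whose errors differ can beat any one of them.

```ruby
# Made-up messages with true labels.
MESSAGES = [
  { text: "WIN CASH NOW", links: 0, spam: true },
  { text: "lunch at noon?", links: 0, spam: false },
  { text: "cheap meds, click here", links: 2, spam: true },
  { text: "meeting notes attached", links: 3, spam: false },
  { text: "click the link to join the call", links: 1, spam: false }
].freeze

# Three hypothetical weak rules; each is wrong on at least one message.
RULES = {
  many_links: ->(m) { m[:links] >= 2 },
  shouting: ->(m) { m[:text].match?(/[A-Z]{4,}/) },
  spammy_words: ->(m) { m[:text].match?(/free|cash|cheap|click/i) }
}.freeze

# Majority vote: spam if more than half of the rules say so.
def ensemble(message)
  votes = RULES.values.count { |rule| rule.call(message) }
  votes * 2 > RULES.size
end

def accuracy(classifier)
  MESSAGES.count { |m| classifier.call(m) == m[:spam] }.fdiv(MESSAGES.size)
end

RULES.each { |name, rule| puts "#{name}: #{accuracy(rule)}" }
puts "majority vote: #{accuracy(method(:ensemble))}"
```

On this made-up data the individual rules score 0.6, 0.8, and 0.8, while the majority vote scores 1.0.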

00:29:24.400 This strategy is exemplified by competitions like the Netflix Prize, where teams collaborating and combining their models led to the winning improvements. My final note is an encouragement: leverage data-driven learning, pay attention to runtime, and apply ensemble methods for better performance. Thank you all for your attention!

00:30:53.000 Unfortunately, we don’t have time for questions now, but I encourage you to catch me during the break. Thank you once again!
