Paris.rb Conf 2018

Food, Wine and Machine Learning: Teaching a Bot to Taste

Recorded in June 2018 during https://2018.rubyparis.org in Paris. More talks at https://goo.gl/8egyWi

00:00:11.450 Please welcome my friend, Mai. I think it's totally on purpose to talk about food right before lunch, so thank you very much for being so hungry.
00:00:23.779 Thank you to everyone here and the organizers for having me. I'm so excited to be here. Let's get started. First off, a little bit about myself: I used to do Ruby on Rails development back in the States, in D.C. After a while, I got tired of it and took a break to travel, study, and work in winemaking.
00:00:39.170 During the next six years, I worked in Australia, France, California, and New Zealand. Throughout this time, I missed learning new things every day, as winemaking is an agricultural vocation with a seasonal aspect. I wanted to keep learning, so I recently returned to technology. Now, I’m living in Wellington, New Zealand, working as a senior developer.
00:01:06.110 So about this talk: the reason I got into machine learning and started exploring it is quite practical. People often ask me challenging questions about the many kinds of wine, and no one knows what they should be drinking. I thought surely there must be a way to automate this. That's why I started learning about machine learning—not just because it was cool.
00:01:30.649 To take a step back, what is machine learning? You can't discuss machine learning without considering the broader domain of artificial intelligence. Artificial intelligence is basically making software do smart things. Within that, we have machine learning, in which software builds its own models from training data rather than following hard-coded rules.
00:02:00.619 Deep learning is a subset of machine learning that uses artificial neural networks with many layers and requires large amounts of training data. Overlapping with all of these is natural language processing, which interprets and processes language, whether spoken or written.
00:02:36.400 As I said, machine learning does not rely on coded rules; it creates its own models based on training data. There are two main types of machine learning: supervised learning, where your training data includes known outputs, and unsupervised learning, where the models identify hidden structures in unlabeled data. There are numerous algorithms available for various problems.
00:03:25.360 Machine learning has many advantages. Its accuracy tends to improve as you collect more data, and it can automate tasks and learn on its own. It can also be fast, customizable to your data, and scalable, depending on how you implement it.
00:03:45.950 So if it's so great, why aren't more Rubyists using it? There are reasons why people in the Ruby community might not be adopting machine learning as much. One reason is that we really like Ruby and may not want to write in Python, which is the leading language in data science with many libraries and tools.
00:04:30.790 Additionally, there’s a time constraint. Familiarizing yourself with the algorithms can be daunting, especially if you feel you lack knowledge in that area. I’m not a data scientist, but that didn't stop me from experimenting with machine learning, and it shouldn’t stop you either.
00:05:02.790 To tackle the first point about doing machine learning in Ruby, there are several resources available. There are many machine learning gems and natural language processing gems I’ll share later. Also, there's a library called PyCall that allows you to call Python libraries from Ruby. So, the lack of tools built in Ruby shouldn’t deter you.
00:05:40.000 Furthermore, there is a community called SciRuby that is working on developing more data science tools specifically for the Ruby community. Therefore, we can perform machine learning within Ruby.
00:06:02.690 If you don't want to dive into different algorithms or the details of implementation, many APIs for machine learning and natural language processing are available. There are several significant ones I won’t go into detail about today, but there are plenty of options for you to experiment with.
00:06:51.135 Now, if you want to get started with machine learning, there are a few things you need to consider. First and foremost, consider what question you want to answer. This is crucial because not every question is suitable for machine learning, and you may need to refine how you plan to answer it.
00:07:18.360 Second, it is essential to have access to quality data. What constitutes good data? It should be representative of future data, complete (with no gaps), and rich in relevant features or attributes while minimizing noise.
00:07:36.880 The general rule is that the more data you have, the better the model you can create. To summarize the machine learning process: you start with a set of data and split it into training and test datasets. Preparing these may involve data cleaning, filling in gaps, and figuring out which elements are relevant.
00:08:01.170 You use the training dataset to build your model, then use your test or cross-validation dataset to make predictions and evaluate how well they perform. You repeat and optimize this process until you reach the desired level of performance and accuracy.
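The split, train, and evaluate loop described here can be sketched in plain Ruby. This is only an illustration under stated assumptions: the labelled examples are invented, and the "model" is a deliberately naive word list rather than any algorithm named in the talk.

```ruby
# Hypothetical labelled examples: [recipe text, is it sweet?]. Invented for illustration.
DATA = [
  ["chocolate cake with sugar and cream", true],
  ["honey glazed carrots", true],
  ["caramel apple tart", true],
  ["maple syrup pancakes", true],
  ["grilled steak with pepper", false],
  ["lemon vinaigrette salad", false],
  ["anchovy and olive pizza", false],
  ["roast chicken with thyme", false]
].freeze

# Shuffle with a fixed seed so the split is reproducible, then hold out a share for testing.
def train_test_split(rows, test_ratio: 0.25, seed: 42)
  shuffled = rows.shuffle(random: Random.new(seed))
  test_size = (rows.size * test_ratio).round
  [shuffled.drop(test_size), shuffled.take(test_size)]
end

# Naive stand-in for a real algorithm: keep the words seen only in positive examples.
def train(rows)
  positive = rows.select { |_, label| label }.flat_map { |text, _| text.split }
  negative = rows.reject { |_, label| label }.flat_map { |text, _| text.split }
  (positive - negative).uniq
end

# Predict "sweet" when the text shares at least one word with the model.
def predict(model, text)
  (text.split & model).any?
end

# Share of rows whose predicted label equals the known label.
def accuracy(model, rows)
  correct = rows.count { |text, label| predict(model, text) == label }
  correct.to_f / rows.size
end

train_rows, test_rows = train_test_split(DATA)
model = train(train_rows)
puts "held-out accuracy: #{accuracy(model, test_rows)}"
```

With so few examples the held-out score is not meaningful; the point is only that the model never sees the test rows while it is being built.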
00:08:30.610 However, there are significant challenges in machine learning. People often forget that while machine learning can do many things, it doesn’t solve everything. One major challenge is that mistakes in your training data can be hard to detect, particularly if you're using customer-generated data or surveys.
00:09:06.540 Using such data to create predictions can lead to erroneous outcomes, resulting in performance below expectations. Also, achieving 100% accuracy is nearly impossible, and testing can be complicated when considering edge cases.
00:09:21.260 You must also consider whether your future data resembles the data used in training. If there are significant changes in the data characteristics, your model may not be relevant. Biases in your training data can also become magnified, complicating success measurement.
00:09:58.670 There are situations where using machine learning is not appropriate. If the rules are known, well-defined, and finite, there is no reason to set up a machine learning model and run predictions, as they may not be as accurate as if you already know the rules.
00:10:27.170 If you determine you need extremely high accuracy, machine learning may not be suitable for your situation. Additionally, if data is unavailable or challenging to obtain, machine learning may not be the solution for you.
00:10:52.520 With these challenges in mind, let’s explore a practical example: the wine bot. The goal here is to address the problem where average consumers find wine intimidating and don’t know how to select wine for their meals. To solve this, I decided to build a chatbot that would educate, entertain, and match wine with food.
00:11:26.050 Breaking that problem down, my happy wine bot will need to converse with users to answer their questions. It needs to understand the tastes and flavors of food, the tastes and flavors of wine, and also evaluate food and wine pairings together.
00:11:52.600 First, teaching it to converse was quite challenging; I spent a lot of time developing chat responses. I decided to use a natural language processing API service because it’s a complex field, and it’s best to let someone else handle that. I integrated it with both Twitter and Facebook using available gems.
00:12:14.960 Next on my list was understanding the tastes and flavors of food. This task requires considering sensory experiences. Our mouth can sense basic tastes such as sweet, salty, sour, bitter, and umami. Additionally, our sensory experience also includes physical sensations such as spiciness or temperature.
00:12:42.340 To fully understand flavors, we must also consider the role of the nose. Depending on the researcher, we can detect anywhere from 10,000 to 1 trillion smells with our noses, contributing significantly to our perception of taste.
00:13:13.510 Many factors affect how we perceive food, including ingredients, herbs, spices, and cooking methods, all of which can influence the taste. For wine matching, these factors strongly affect flavor, texture, weight, and intensity, the elements that sommeliers consider when pairing food and wine.
00:13:49.279 When teaching my bot about food, I realized that training data was vital. I determined that ingredients and cooking methods resemble recipes, and there are many recipes available. However, I needed complete and detailed recipes, so I restricted my inputs to selected reputable online recipe sites.
00:14:08.260 Additionally, I drew on my own cooking experience to label the supervised training data. With this data, I decided to create one classification model per flavor attribute, which means over 50 attributes to predict for each recipe.
00:14:36.210 The goal of this work is to predict how any given recipe will taste. For example, with a hamburger recipe, I would attribute different tastes like salty, sweet, sour, and bitter based on the ingredients and flavors involved.
00:15:00.520 This forms my training data: I classified hundreds of recipes manually to create the models. The bot then uses these models to analyze any recipe text it is given and classify the food.
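The idea of one binary model per flavor attribute can be sketched as follows. The talk does not say which algorithm was used, so this example assumes a small naive Bayes text classifier with add-one smoothing; the class name, the recipes, and their labels are all hypothetical.

```ruby
# Minimal binary naive Bayes text classifier with add-one smoothing.
# An assumed stand-in: the talk does not name the algorithm actually used.
class BinaryTextClassifier
  def initialize
    @word_counts = { true => Hash.new(0), false => Hash.new(0) }
    @doc_counts  = { true => 0, false => 0 }
    @vocabulary  = {}
  end

  # Record one labelled example (label is true or false).
  def train(text, label)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # True when the positive class has the higher log-probability score.
  def predict(text)
    log_score(text, true) > log_score(text, false)
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end

  def log_score(text, label)
    total_docs  = @doc_counts.values.sum
    total_words = @word_counts[label].values.sum
    prior = Math.log((@doc_counts[label] + 1.0) / (total_docs + 2.0))
    tokenize(text).sum(prior) do |word|
      Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocabulary.size))
    end
  end
end

# Hypothetical hand-labelled recipes, one boolean per taste attribute.
LABELLED_RECIPES = [
  { text: "beef patty, salt, cheddar, pickles, ketchup", salty: true,  sweet: false, sour: true  },
  { text: "sugar, butter, flour, vanilla, chocolate",    salty: false, sweet: true,  sour: false },
  { text: "lemon juice, vinegar, cucumber, dill",        salty: false, sweet: false, sour: true  },
  { text: "bacon, salt, soy sauce, pork belly",          salty: true,  sweet: false, sour: false },
  { text: "honey, sugar, cream, strawberries",           salty: false, sweet: true,  sour: false }
].freeze

ATTRIBUTES = %i[salty sweet sour].freeze

# Train one independent yes/no model per attribute.
MODELS = ATTRIBUTES.map do |attribute|
  model = BinaryTextClassifier.new
  LABELLED_RECIPES.each { |recipe| model.train(recipe[:text], recipe[attribute]) }
  [attribute, model]
end.to_h

# Run every attribute model over the same recipe text.
def taste_profile(recipe_text)
  MODELS.transform_values { |model| model.predict(recipe_text) }
end

p taste_profile("sugar, cream, vanilla")
```

Keeping each attribute as its own yes/no model means a wrong prediction can be traced to a single classifier and its labels.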
00:15:25.480 Teaching the bot about wine is a bit more challenging. Wine taste perception involves varied factors, including winemaking decisions and climatic influences that affect viscosity, flavor, tannins, acidity, and other aspects that sommeliers consider when pairing wine with food.
00:15:53.860 However, the challenge with teaching the bot about wine is that consistent wine data is often lacking. Wine tasting reviews vary greatly in length and detail, sometimes being incomplete. Thus, I focused on broad categories like Bordeaux or Champagne, compiling over 100 wine categories with more than 40 attributes to guide wine pairings.
00:16:30.740 Food and wine pairing involves several strategies. For example, a wine might enhance the taste of food, or vice versa, leading to a magical experience when both complement each other perfectly. Sommeliers often consider flavor intensity, matching robust red wines with heavy meats to harmonize basic tastes.
00:17:07.320 They can also identify contrasts, such as pairing herbal wines with spicy foods, while ensuring they avoid problematic combinations. For instance, a heavy, alcoholic red may not pair well with a spicy dish, as it can intensify the heat of the spices.
00:17:39.420 While considering how to teach the bot to evaluate these pairings, I found that many of the rules are already well defined. As such, I relied on my knowledge and experience, combined with the literature, to create a pairing engine that uses weighted attributes to match food and wine.
00:18:10.740 In essence, this setup operates like a big calculator that can compute the best pairings based on the attributes associated with flavors and tastes.
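A weighted-attribute "calculator" of this kind might look roughly like the sketch below. Every attribute name, score, weight, and rule here is an assumption made for illustration; the actual engine's attributes and weights are not given in the talk.

```ruby
# Hypothetical wine category profiles on a 0.0..1.0 scale.
WINES = {
  "Bordeaux"   => { body: 0.9, acidity: 0.5, sweetness: 0.1, tannin: 0.9 },
  "Beaujolais" => { body: 0.4, acidity: 0.7, sweetness: 0.1, tannin: 0.3 },
  "Champagne"  => { body: 0.3, acidity: 0.9, sweetness: 0.2, tannin: 0.0 }
}.freeze

# Hypothetical rules: a positive weight rewards a food/wine attribute pair,
# a negative weight penalizes a clash.
RULES = [
  { food: :richness,  wine: :body,      weight:  2.0 }, # match intensity
  { food: :fat,       wine: :tannin,    weight:  1.5 }, # tannin balances fat
  { food: :spiciness, wine: :tannin,    weight: -2.0 }, # heavy reds intensify heat
  { food: :sweetness, wine: :sweetness, weight:  1.5 }, # sweet dishes want sweet wines
  { food: :saltiness, wine: :acidity,   weight:  1.0 }  # acidity refreshes salty food
].freeze

# Weighted sum over all rules; missing attributes count as 0.0.
def pairing_score(food, wine)
  RULES.sum do |rule|
    rule[:weight] * food.fetch(rule[:food], 0.0) * wine.fetch(rule[:wine], 0.0)
  end
end

# Rank wine categories by score, highest first.
def best_pairings(food, limit: 2)
  WINES.sort_by { |_, profile| -pairing_score(food, profile) }
       .first(limit)
       .map(&:first)
end

# Hypothetical food profiles, as the recipe classifiers might produce them.
cassoulet = { richness: 0.9, fat: 0.8, spiciness: 0.1, sweetness: 0.0, saltiness: 0.6 }
curry     = { richness: 0.5, fat: 0.3, spiciness: 0.9, sweetness: 0.0, saltiness: 0.4 }

p best_pairings(cassoulet) # => ["Bordeaux", "Beaujolais"]
p best_pairings(curry)     # => ["Champagne", "Beaujolais"]
```

Because the rules are explicit, each recommendation can be explained by listing the terms that contributed most to its score.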
00:18:32.139 Let me show you how this looks with a quick demo. This is my Facebook page with an interactive chatbot feature. First, let’s chat about wine. I’ll use a joke as an icebreaker—'Why aren’t grapes ever lonely? Because they come in bunches.' Now let’s try suggesting a wine match.
00:19:09.460 I'll input 'cassoulet,' a dish I love. The bot generates a recommendation by finding a recipe for cassoulet and analyzing the tastes associated with it. Then, it uses my pairing engine to suggest the best wine options. It presents several red wine categories that complement the dish.
00:19:40.450 You can play with this yourself at your convenience. I hope you take away from this experiment that there are many options for integrating machine learning into your Ruby stack. It's crucial to examine the problem you're trying to solve and assess the quality of the data available to you.
00:20:08.500 So, what are you waiting for? Give it a try!
00:20:24.370 Thank you.
00:20:37.860 Thank you very much. So, what is the timing?
00:20:47.890 No, no, it’s perfect. We can take some questions.
00:20:53.250 Please feel free to come up if you have questions.
00:21:08.110 First of all, thank you for the talk. I can imagine that gathering the wine data must have been quite challenging. I have one question, though: I saw that you trained 50 separate models, one for each taste attribute. Did you use binary labeling, or did you do multi-level classifications?
00:21:23.000 So, it was a simpler classification—like, is it sweet or not sweet? Basically, I wanted to keep it straightforward to make it easier to troubleshoot any inconsistencies.
00:21:37.270 Yes, exactly. Just a binary classification made more sense to me since it helped identify where the model might be misclassifying.
00:22:02.220 Given that I’m making real-time predictions, I found the compute intensity manageable. I haven’t scaled to handle millions of users yet.
00:22:23.860 Thank you.
00:22:32.630 Please stay up because if anybody else wants to ask questions, we would like you to remain here.
00:22:50.920 Hello, thank you, and thanks for being brave enough to mix wine and science in France. What about Ruby and machine learning? Who is using it, and how does it look? Is it just wrappers around things that are done in other languages?
00:23:09.680 I'm not too familiar with many organizations using Ruby for machine learning, but honestly, I've noticed that people in the Ruby community aren't very engaged with it.
00:23:21.520 The reason Python is so prevalent is that its community has spent over a decade optimizing its libraries and scaling up its solutions, whereas Ruby lacks this level of community support.
00:23:41.790 That said, we can catch up, but we need more community involvement. Many developers leverage cloud services for temporary compute power, especially for training purposes, instead of keeping training models running constantly.
00:24:16.600 I think for some of these machine learning services, it makes sense to utilize cloud platforms for more computing resources.
00:24:32.430 In my case, I've worked with local models that I downloaded, allowing me to perform predictions without relying on API calls.
00:24:52.560 It all depends on your solution and the specific problem you want to solve.
00:25:13.780 Thank you.
00:25:31.590 Okay, let’s take one last question.
00:25:39.320 Thanks for your talk. I’m going to use your wine solution next. My question relates to the application mainly being built on a rules-based engine, correct?
00:25:56.620 Yes, absolutely. I defined rules based on the taste attributes in the models. But many modern applications leverage statistical methods and machine learning, which might lead to higher chances of hitting the best options.
00:26:22.370 Given that Ruby doesn’t have these facilities, you’ve mentioned importing from other languages. Is there a plan to integrate those algorithms directly into your application?
00:26:38.230 Actually, it’s all native Ruby now. Initially, I used API services to handle machine learning but began to experience delays. I developed a local implementation for faster predictions.
00:27:02.310 I wouldn’t want to shift the pairing engine from rules to machine learning, because I would need more empirical data for that. The taste relationships are based on the recipes rather than on previous sommelier expertise.
00:27:33.120 Overall, I might experiment more with diverse models for recipe predictions, but I feel confident in where I’ve set up my bot for wine pairings.
00:28:01.150 Does that answer your question?
00:28:05.820 Yes, thank you.
00:28:08.580 Thank you very much, Mai.