00:00:18.279
All right, people! Is it cool if I go ahead and kick this off right on time? Everybody okay with that? Looks like we have a lot of seats filled. Thank you all so much for coming down.
00:00:23.480
I'm John, and I'm going to talk about machine learning. I know some of you might otherwise be at Sandy's talk right now, so I appreciate you coming to this instead.
00:00:29.599
Sandy will be great on video. She's a wonderful person, and I know that I have to deliver at least as much value as you would have gotten out of Sandy's talk, so you've set a pretty high bar for me. I appreciate it, and I hope I won't let you all down.
00:00:42.239
So what's our goal today? Ideally, I'd like you to leave with three takeaways, but even one would be great. I want to use Ruby to answer questions about your users and your business. That’s my goal, and we're going to employ machine learning to achieve it.
00:00:58.600
If there are some chairs down here, feel free to grab them and scoot them around somewhere. This room is kind of arranged a bit funky.
00:01:13.119
I have a question for all of you, and this is going to be interactive for a bit. How many people have a users table in their Rails app?
00:01:18.400
Okay, here's a better question: How many people do not have a users table? Alright, yeah. I'm just curious, what's the primary object in your app instead of users? You said assets? That one makes sense.
00:01:33.520
So, we are talking about machine learning for fun and profit, and yes, with assets or things like that, you'll probably find the same techniques apply. But almost everybody has a users table, which is what started this discussion.
00:01:39.799
Now, what is the goal of your businesses? Anyone? Just shout it out. What is the real goal of your business? Make money! Thank you! So, we've got users and we've got profit. Who here has a plan for making money from their users? Raise your hand if that's you. Alright, you're first!
00:01:58.079
I’m going to put you on the spot. What’s your plan? How do you turn users into profit? Oh, you give loans to users, and then they pay them off? Awesome! I completely understand that business model. That is fantastic!
00:02:19.800
Does anyone work for a social network-type company in the attention economy? Yes? Okay, definitely!
00:02:31.640
I’ve done that too, and that frames the story for me. We’re probably all familiar with the concept that we have users and we want to generate profit. Everyone knows about the underwear gnomes, our friends the underwear gnomes.
00:02:43.560
There’s that hilarious part where the gnomes explain to the South Park boys that step two is a big question mark after they collect the underpants, and from that they will derive profit.
00:03:01.680
It's really strange to speculate on what types of business models you could create by collecting underpants and using machine learning on those underpants to generate profit. But we're not going to delve into that today.
00:03:14.480
Instead, we are going to figure out how to fill in that gap, how to fill in that question mark with the information that's in your users table right now that you can use to turn into money, or hopefully some form of money.
00:03:20.239
We’re going to employ a specific set of tools, most of which you’re likely already familiar with, as we discussed earlier regarding the users table.
00:03:31.480
I'm a big fan of science. I was a chemist in another life, so I appreciate scientific principles, but science can lead you down a bad path. I want to ensure that when we're thinking about the data science we’re undertaking today, we think less like the crazy scientist, like this guy from Back to the Future, and more in terms of kickass science.
00:03:45.400
Neil deGrasse Tyson is one of my favorite examples of how science can be approached.
00:03:52.079
So, we're going to utilize our users table to determine how to make a profit with data science, and we're going to endeavor to do so with a mindset that is more kickass like Neil deGrasse Tyson than crazy.
00:04:09.760
Quickly, the obligatory introduction: I'm John Paul Ashenfelter. I work here at Treehouse. I asked earlier how many Treehouse fans are here, and a few hands went up.
00:04:27.919
Before that, I was at General Assembly, so I've covered two of the big names in education. My next stop might be Dev Bootcamp so I can keep collecting education companies.
00:04:36.880
I've got Treehouse stickers for anyone who wants them up here because we do have some pretty cool branding. You can come grab Mike the Frog or check out our new boat design. I really have no clue what the boat is for Treehouse, but it’s wonderful!
00:04:57.160
More importantly, why should you care what I have to say about data science? I’ve been doing this for a long time.
00:05:08.199
This is a snapshot from 2006 when I started the data warehousing track at the MySQL conference, and I've since taught it extensively at O'Reilly's Open Source Convention.
00:05:20.240
We were discussing big databases that were in the 10 to 100 gigabyte range. That was considered huge at the time—difficult to store such large amounts of data.
00:05:32.759
So, who here has a database larger than 100 gigabytes? Just curious. Quite a few of you. How many are over a terabyte?
00:05:44.120
We even have Facebook here with their exabyte data, although I don't think any Facebook folks are present since they're all into PHP.
00:05:56.680
Data has changed significantly, and so have the tools we use to handle it.
00:06:02.600
I started working with neural networks back in grad school—and even before that, back in undergrad I was doing this with Visual Basic on MS-DOS and had to buy a math co-processor for my computer.
00:06:20.800
Running numerical simulations back then used to take hours—sometimes even days! Thankfully, a lot has changed since then. So, I've been at this for quite a while.
00:06:37.840
At the same time I started my research project, Inc. magazine published a cover story that was far more interesting than what I was working on.
00:06:49.500
Our format today is problem-solving with data: we'll take a question, apply code to some data to get results, and use those results to learn a bit about our users and, subsequently, how to generate revenue.
00:07:06.840
My session is titled "Machine Learning for Fun and Profit." Recently, I’ve been reflecting on how this should be framed more as storytelling—storytelling about your users.
00:07:23.400
I believe that stories are a much more powerful metaphor. Let's start with simple stories, just like the ones you share around the campfire—stories that make people happy, stories that teach.
00:07:36.720
So, let me ask: Who here actually knows their users? How many of you really work with your users table? Perhaps you’re in marketing or business development? Any of you really feel like you know who your users are? It’s often challenging.
00:07:53.760
I bet all of you know about your users in some broad strokes. How many people are familiar with analyzing users through tools like Google Analytics or Mixpanel?
00:08:09.280
These methods tend to portray users as homogeneous entities, and you aggregate that data. A standard Google Analytics dashboard only provides surface-level insight.
00:08:26.079
Aggregates can tell you about your average user, and we all know that nobody dreams of being the average user. We should strive for something more engaging.
00:08:39.200
People want to feel special. We need to tell better stories. Aggregates can be boring, and the SQL database administrators of the past often had a tough time—the same goes for those dealing with reports.
00:08:56.919
Aggregates can still tell interesting stories, especially when we examine changes over time—seeing your user growth, engagement patterns, or revenue generation.
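As a rough illustration of an aggregate examined over time, here is a minimal Ruby sketch that counts signups per month. The sample dates are made up, and the idea that they come from a `created_at` column on the users table is an assumption, not something shown in the talk.

```ruby
require "date"

# Counts signups per calendar month from a list of signup dates.
# Returns a hash like { "2015-01" => 2, ... } sorted by month.
def signups_by_month(dates)
  dates.map { |date| date.strftime("%Y-%m") }.tally.sort.to_h
end

# Made-up sample data standing in for users.created_at values (assumption).
sample_dates = [
  Date.new(2015, 1, 5), Date.new(2015, 1, 20),
  Date.new(2015, 2, 3),
  Date.new(2015, 3, 1), Date.new(2015, 3, 9), Date.new(2015, 3, 30)
]

signups_by_month(sample_dates).each do |month, count|
  puts "#{month}: #{count}"
end
```

Even this simple count gives a trend line rather than a single average.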
00:09:12.239
The context can make all the difference. We want to discover important aspects of your users that tell a compelling narrative.
00:09:26.320
I was thinking about the users in my database who spent good money at my company, and I wondered how many of them are female. That question led me toward storytelling in a meaningful way.
00:09:40.199
No one does storytelling quite like This American Life. They have a unique structure that captivates an audience.
00:09:47.919
They masterfully weave individual stories into a larger, meaningful narrative that takes you on an emotional journey.
00:10:02.760
Different methods of storytelling exist online, with headlines that bring you in. You may see phrases like "Seven Unbelievable Facts About Your Users—Click Here for More!" This is all part of how people want to receive data.
00:10:17.240
Understanding your users involves delving into their qualities. How do you find out more about your users? If you wanted to know about the male-female distribution, how would you discover that?
00:10:31.560
For instance, how many of you collect gender information at registration? Not many, right? What’s the traditional way to figure this out? Surveys! But what are the percentages typically?
00:10:46.079
Survey response rates are generally very low. You may think you have a representative sample when you don’t, and then the people who answered tell you little about the users who didn't.
00:11:00.920
Wouldn't it be better to have more confidence and better knowledge about your users? Descriptive data can help you segment your users into different groups.
00:11:14.240
You can use lookup tables to do this, which we'll discuss shortly. You can also perform name analysis. Most of these methods are quick to execute and yield better results.
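The lookup-table idea can be sketched in a few lines of plain Ruby. The table below is a tiny, hypothetical example. A usable one would be built from a large public name list, and the `:unknown` fallback is a design choice of this sketch rather than anything from the talk.

```ruby
# Hypothetical, tiny lookup table mapping lowercase first names to a guess.
NAME_GENDER = {
  "john"  => :male,
  "paul"  => :male,
  "mary"  => :female,
  "linda" => :female,
  "jamie" => :unknown # names used across genders stay unclassified
}.freeze

# Returns :male, :female, or :unknown for a first name.
# Whitespace and letter case are normalized before the lookup.
def guess_gender(first_name)
  key = first_name.to_s.strip.downcase
  NAME_GENDER.fetch(key, :unknown)
end

puts guess_gender("Mary")
puts guess_gender(" JOHN ")
puts guess_gender("Cedar")
```

Returning `:unknown` for names missing from the table keeps the guess honest, since a wrong label is usually worse than no label.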
00:11:27.599
If I told you I could provide you with 80% accuracy on male and female categorization based on the first name alone, who would think that's worse than a survey? I believe it’s at least as good, likely better.
00:11:39.640
Today, we're going to explore a couple of examples together that can be done without any complex gems or advanced linear algebra.
00:11:54.640
One tool we're going to look at is the 'sex machine' gem, which uses data to assign gender based on names. So let’s run through the code and see how it performs.
00:12:12.440
If you need to install the gems, feel free to follow along. I'll give an explanation of what we're doing and then we can test this out together.
00:12:28.679
We'll start by selecting the first names of all our users and then analyzing them with the sex machine gem, which lets us see how accurate it is for various names.
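A minimal sketch of that step follows, written against a stand-in detector so it runs without any gems. `StubDetector`, its table, and the `get_gender` method name are assumptions made for illustration. Check the gem's own documentation for its real interface before swapping it in.

```ruby
# Stand-in for a name-based detector (hypothetical). Anything that responds
# to get_gender(name) and returns a symbol can be passed in instead.
class StubDetector
  TABLE = { "john" => :male, "mary" => :female }.freeze

  def get_gender(name)
    TABLE.fetch(name.to_s.downcase, :unknown)
  end
end

# Counts how many first names fall into each category, skipping nil names.
def gender_breakdown(first_names, detector)
  first_names.compact.map { |name| detector.get_gender(name) }.tally
end

# In a Rails app the names might come from the users table, for example a
# first_name column (assumed); a plain array is used here.
names = ["John", "Mary", "Mary", "Cedar", nil]
p gender_breakdown(names, StubDetector.new)
```

Passing the detector in as an argument makes it easy to compare different name sources against the same list of users.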
00:12:43.360
How many of you have the gem installed so far? Alright, there should be someone near you who can help. Let’s take a minute, and while people are setting that up, let’s check some names.
00:13:02.559
For example, what is the assigned gender for names like Cedar or Justice? For those with unusual names, these results can be quite fascinating!
00:13:17.360
Next, we’ll dive into how to assign gender to users based on the information in our database.
00:13:30.040
The story so far shows how we can use data and machine learning to generate insights from our user data.
00:13:47.680
If you have access to your own users and their relevant attributes, I want you to experiment with the data on your local machine to see how effective this can be.
00:14:02.960
You can run user analysis against your data rather than relying only on what I provided as sample data.
00:14:14.320
Let’s take five minutes to experiment and then we'll reconvene. I'd like you to try checking the gender assignments for various names and see what results you obtain.
00:14:43.960
If you’d be willing, please share your findings. For instance, I often use ‘John Paul’ for my tests and I’m curious to see how it plays out with the gender assignment.
00:15:00.920
Let’s see how the results pan out. Also, if anyone has names that are commonly gender-neutral, we should check those too.
00:15:17.440
When you run this, I'd love to hear your reactions about the outcomes, especially if they contradict your expectations.
00:15:29.000
Regarding the data we have, I can provide insight into the challenges of accurately determining gender from names, particularly in diverse datasets.
00:15:41.280
Now, let's delve into how this gender assignment can translate to user insights and how we can apply them practically.
00:15:56.080
Using a served database, we can easily track user profiles and utilize this information to improve engagement across our audience.
00:16:09.440
Continuing, I want to explore the importance of geolocation services, especially locating users by their IP addresses.
00:16:27.720
Apps collecting such data can vastly improve tailored experiences, particularly for customer support based on their location.
00:16:41.520
So, how many of you currently utilize geolocation tools for user interaction or behavior tracking? Many platforms today leverage such functionality for additional insights.
00:17:02.000
For context, we at Treehouse have seen how geolocation informs our user engagement strategies.
00:17:16.000
Would it surprise you to learn about how useful this information can be for planning our support efforts?
00:17:29.600
This data aids in staffing strategies around peak usage times based on user locations, facilitating better customer experiences.
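One way to turn locations into a staffing question is to ask, for each hour of the day in UTC, how many users are inside their local working hours. The sketch below assumes each user has already been reduced to a UTC offset in hours. The offsets and the 9-to-17 window are made-up values for illustration.

```ruby
# Counts users whose local time falls inside a support window at a given
# UTC hour. utc_offsets holds one offset (in hours) per user.
def users_in_local_window(utc_offsets, utc_hour, window = 9...17)
  utc_offsets.count do |offset|
    local_hour = (utc_hour + offset) % 24
    window.cover?(local_hour)
  end
end

# Hypothetical offsets for six users (assumption, not real data).
offsets = [-8, -5, -5, 0, 1, 9]

# Find the UTC hour at which the most users are in their working day.
busiest_hour = (0..23).max_by { |hour| users_in_local_window(offsets, hour) }
puts "Busiest UTC hour: #{busiest_hour}"
puts "Users in window at 14:00 UTC: #{users_in_local_window(offsets, 14)}"
```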
00:17:46.240
Next, let’s dive into the code that utilizes geolocation services via free APIs that help track user geolocations effectively.
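The lookup itself can be sketched as follows. No specific service is named in the talk, so the JSON field names (`country_code`, `city`, `time_zone`) and the stubbed response are hypothetical. The HTTP call is passed in as a lambda so the sketch runs offline and the provider can be swapped.

```ruby
require "json"

# Extracts the fields we care about from a geolocation response body.
# The field names are a hypothetical shape; real services define their own.
def parse_location(json_body)
  data = JSON.parse(json_body)
  { country: data["country_code"], city: data["city"], time_zone: data["time_zone"] }
end

# Looks up an IP address using an injected fetcher, which takes an IP string
# and returns a JSON string. Malformed responses yield nil.
def locate(ip, fetcher)
  parse_location(fetcher.call(ip))
rescue JSON::ParserError
  nil # treat an unparseable response as "location unknown"
end

# Stub fetcher with canned data; a real one would make an HTTP request.
stub_fetcher = lambda do |_ip|
  '{"country_code":"US","city":"Portland","time_zone":"America/Los_Angeles"}'
end

p locate("203.0.113.7", stub_fetcher)
```

Most free services limit how many requests you can make, so storing the result on the user record is usually better than looking it up on every request.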
00:18:02.560
It's essential to understand the context of our user data. We’ve got users always pining for reliable support, and understanding geographical spread allows us to handle that better.
00:18:18.560
Alright, now let's code our two cases: assigning gender to users and collecting geographic data through APIs.
00:18:35.560
Remember, a lot of users in our tables can indeed look like aggregates, but we need to segment and understand those users better.
00:18:51.560
We’ve got about an hour left to dive into the remaining examples.
00:19:06.560
Now we are going to explore user segmentation through clustering. Clustering can provide insights into user behavior.
00:19:23.000
For context, K-means clustering allows you to categorize users into a fixed number of segments based on similarities among their attributes.
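To make the idea concrete, here is a minimal k-means sketch in plain Ruby. It is not the library code used in the workshop. The two features (logins per week, minutes per session) and the sample points are invented, and seeding the centroids with the first k points is a simplification that keeps the result deterministic.

```ruby
# Squared Euclidean distance between two equal-length numeric arrays.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Component-wise mean of a non-empty list of points.
def centroid_of(points)
  points.transpose.map { |column| column.sum.to_f / column.size }
end

# Plain k-means. Returns [centroids, assignments], where assignments[i] is
# the index of the centroid closest to points[i].
# Seeds are the first k points; real implementations usually choose random
# or k-means++ seeds instead.
def k_means(points, k, max_iterations = 100)
  centroids = points.first(k).map { |point| point.map(&:to_f) }
  assignments = nil

  max_iterations.times do
    new_assignments = points.map do |point|
      (0...k).min_by { |i| squared_distance(point, centroids[i]) }
    end
    break if new_assignments == assignments # nothing moved: converged

    assignments = new_assignments
    centroids = centroids.each_with_index.map do |old_centroid, i|
      members = points.each_index.select { |j| assignments[j] == i }.map { |j| points[j] }
      members.empty? ? old_centroid : centroid_of(members)
    end
  end

  [centroids, assignments]
end

# Hypothetical users as [logins_per_week, minutes_per_session].
users = [[1, 5], [2, 6], [1, 4], [20, 60], [22, 58], [21, 62]]
centroids, assignments = k_means(users, 2)
p assignments
p centroids
```

Because features on larger scales dominate the distance, it is usually worth rescaling each column before clustering real data.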
00:19:40.600
Let’s run a quick code demo on clustering algorithms to provide you with visual data representation, which can enhance how we understand end-users.
00:19:56.560
This is where we can dive into more meaningful insights about user activity—this is where the magic of machine learning comes into play.
00:20:09.680
To this end, the clustering we deploy will allow us to segment users better than using broad strokes.
00:20:28.360
By utilizing the information we have, we can categorize these users into casual, professional, or super-users. This finer-grained division can help optimize our service delivery.
00:20:45.240
The next portion of our workshop will tackle implementing clustering using Ruby's AI libraries.
00:21:05.320
Let’s see how that performs as we categorize various action metrics against clustered user trends.
00:21:20.640
Now that we have a solid understanding of data transformations, it’s time to look at the coding side of clustering.
00:21:32.920
This approach can truly bring results to the business's bottom line by harnessing user data effectively.
00:21:50.360
If you haven’t yet, please check that you understand these analytical tools and how they work with your dataset.
00:22:09.840
Next, we’ll examine collaborative filtering, a popular method for personalized recommendations in real-world applications.
00:22:26.160
Collaborative filtering takes user-item interactions to recommend items based on what similar users have liked.
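A small user-based version of this can be sketched in plain Ruby. The user names, course names, and ratings are made up. Cosine similarity with a similarity-weighted sum is one common choice among several, not necessarily the one used in the workshop.

```ruby
# Cosine similarity between two users' rating hashes ({ item => rating }).
# Items a user has not rated contribute nothing to the dot product.
def cosine_similarity(a, b)
  dot = (a.keys & b.keys).sum { |item| a[item] * b[item] }
  length = ->(ratings) { Math.sqrt(ratings.values.sum { |r| r**2 }) }
  denominator = length.call(a) * length.call(b)
  denominator.zero? ? 0.0 : dot / denominator
end

# Scores items the target user has not rated by summing other users'
# ratings, weighted by how similar each of those users is to the target.
def recommend(ratings, target, limit = 3)
  seen = ratings.fetch(target)
  scores = Hash.new(0.0)

  ratings.each do |other, their_ratings|
    next if other == target

    similarity = cosine_similarity(seen, their_ratings)
    their_ratings.each do |item, rating|
      scores[item] += similarity * rating unless seen.key?(item)
    end
  end

  scores.sort_by { |_item, score| -score }.first(limit).map(&:first)
end

# Hypothetical course ratings on a 1-5 scale.
course_ratings = {
  "ann" => { "ruby" => 5, "rails" => 4 },
  "bob" => { "ruby" => 5, "rails" => 5, "sql" => 4 },
  "cat" => { "css" => 5, "design" => 4, "ruby" => 1 }
}

p recommend(course_ratings, "ann")
```

Here "bob" rates the shared courses much like "ann" does, so his other course is weighted more heavily than anything "cat" liked.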
00:22:46.480
This methodology has been adopted by many tech giants in various forms. Next, let’s showcase how it's commonly executed.
00:23:04.440
What we aim to do is summarize the data using Singular Value Decomposition (SVD), a key technique for reducing dimensions in user-item matrices.
00:23:20.840
With SVD, we aim to deal with large, sparse matrices effectively, allowing us to analyze user interactions better.
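To give a feel for what SVD extracts, the sketch below finds only the largest singular value and its two singular vectors using power iteration. That is a simplified stand-in for a full SVD routine, which a numerical library would normally provide. The ratings matrix is made up, and treating unrated items as 0 is a simplifying assumption.

```ruby
# Multiplies a matrix (an array of rows) by a vector.
def mat_vec(matrix, vector)
  matrix.map { |row| row.zip(vector).sum { |a, b| a * b } }
end

# Euclidean length of a vector.
def vector_norm(vector)
  Math.sqrt(vector.sum { |x| x * x })
end

# Power iteration: repeatedly apply A^T A to a vector and renormalize.
# Returns [sigma, u, v], where sigma * u * v^T is the best rank-1
# approximation of the matrix. The all-equal starting vector works for
# non-negative matrices such as ratings; other inputs may need a random start.
def top_singular_triplet(matrix, iterations = 200)
  transposed = matrix.transpose
  size = transposed.size
  v = Array.new(size) { 1.0 / Math.sqrt(size) }

  iterations.times do
    w = mat_vec(transposed, mat_vec(matrix, v))
    length = vector_norm(w)
    break if length.zero?

    v = w.map { |x| x / length }
  end

  av = mat_vec(matrix, v)
  sigma = vector_norm(av)
  u = sigma.zero? ? av : av.map { |x| x / sigma }
  [sigma, u, v]
end

# Made-up ratings: rows are users, columns are items, 0 means "not rated".
ratings = [
  [5, 4, 0],
  [4, 5, 1],
  [1, 0, 5]
]

sigma, u, v = top_singular_triplet(ratings)
rank_one = u.map { |ui| v.map { |vj| (sigma * ui * vj).round(2) } }
rank_one.each { |row| p row }
```

The rank-1 reconstruction captures the single strongest taste pattern in the data. Keeping more components captures more patterns.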
00:23:38.240
Through the implementation of algorithms, you can find more tailored recommendations for users, enhancing their overall experience.
00:23:51.600
Now that we've outlined these advanced concepts in this workshop, let's run through some examples of how to practically apply them.
00:24:06.440
You can begin exploring datasets relevant to your applications to bolster your engagement efforts.
00:24:21.840
As we round out our session, the goal here was to illustrate how Ruby can be practically employed to derive actionable insights.
00:24:34.080
Let’s take a moment to delve into the tools specifically, which can supercharge what you bring back to your teams after this workshop.
00:24:48.280
We’ve discussed various levels of analysis capturing user interactions, recommendation algorithms, and clustering for segmentation.
00:25:02.320
So, moving on, the underlying message here is that there are many resources at your disposal to deepen your understanding of data science.
00:25:17.000
For those of you eager to learn more, consider diving into O’Reilly resources, especially those relevant to your technical skills.
00:25:33.560
And I highly encourage you to explore additional materials on machine learning and data science that might pertain to your needs.
00:25:52.360
The breadth of learning opportunities available is extensive, so take advantage of this by committing to a tailored resource that fits your learning style.
00:26:10.720
To conclude our workshop, I'd like to thank you all for your engagement, and I’d love to open the floor for any questions!
00:26:22.080
Feel free to reach out to me on social platforms or here after the session.
00:26:36.560
I genuinely appreciate the time you've given to this, and I hope you find immense value in applying these concepts!
00:26:50.320
Thank you so much for attending, and let’s all strive to turn our user data into insights that positively impact our businesses.