From Data to Recommendations: Building an Intelligent System with Ruby on Rails

Summarized using AI


Rashmi Nagpal • May 31, 2024 • Verona, Italy • Talk

The video "From Data to Recommendations: Building an Intelligent System with Ruby on Rails" by Rashmi Nagpal explores the construction of intelligent recommendation systems using machine learning techniques integrated within the Ruby on Rails framework. The presentation begins with foundational concepts in artificial intelligence and machine learning, highlighting how algorithms can mimic human behavior and generate data-driven insights. In particular, it covers collaborative and content-based filtering as key methodologies for building effective recommendation systems.

Key Points Discussed:

  • Introduction to AI and ML: Understanding artificial intelligence as a framework encompassing machine learning (ML) and deep learning, with examples like spam detection and handwriting recognition.
  • Building a Machine Learning Model: Demonstrating a housing price prediction model using existing data sets in Ruby on Rails to showcase linear regression and machine learning principles.
  • Understanding Recommendation Systems: Detailed exploration of collaborative and content-based filtering, explaining how these methods utilize user data and preferences to enhance engagement and conversion rates.
    • Example of Collaborative Filtering: Recommending items liked by other users with similar tastes.
    • Example of Content-Based Filtering: Recommending items whose attributes (such as genre) resemble those a user has already liked.
  • Challenges in Recommendation Systems: Discussing issues such as cold start problems, data sparsity, scalability, and the need for diverse data. Solutions include using hybrid approaches and data augmentation techniques.
  • Implementation in the Real World: Presenting real-world applications through services like Netflix and Amazon and how they leverage recommendation systems to enhance user experience and conversion rates.
  • Ethical Considerations: Emphasizing the importance of maintaining user privacy and the relevance of continual updates to the recommendation systems to adapt to changing user behaviors.
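The housing-price bullet above can be sketched in plain Ruby. This is a minimal, dependency-free illustration of single-feature linear regression trained by gradient descent, not the talk's actual gem-based code; the data and numbers are made up:

```ruby
# Single-feature linear regression (price ~ square feet) trained with
# gradient descent. Dependency-free sketch; the data is made up.
sqft   = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [200.0, 300.0, 400.0, 500.0] # in thousands of dollars

xs = sqft.map { |x| x / 1000.0 } # scale the feature so training converges

w = 0.0
b = 0.0
learning_rate = 0.1

500.times do
  grad_w = 0.0
  grad_b = 0.0
  xs.each_with_index do |x, i|
    error = (w * x + b) - prices[i] # prediction error for this example
    grad_w += error * x
    grad_b += error
  end
  w -= learning_rate * grad_w / xs.size
  b -= learning_rate * grad_b / xs.size
end

predict = ->(square_feet) { w * (square_feet / 1000.0) + b }
puts format("1000 sqft -> $%.0fk", predict.call(1000.0))
```

Running it prints a price near $200k for 1,000 square feet, since the toy data follows price = 0.2 × square feet.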

Conclusion and Takeaways:

  • Recommendation systems are integral to personalizing user experiences and optimizing product engagement.
  • Those interested in implementing such systems should focus on clean, diverse data and evolve their algorithms to stay current.
  • Engaging with existing resources and maintaining ethical standards in data handling are vital for successful deployment and user trust.

Rashmi Nagpal concludes with resources for further learning, reaffirming the importance of building intelligent systems that translate raw data into valuable insights within Ruby on Rails applications. The talk effectively combines theoretical knowledge with practical applications, delivering a comprehensive understanding of recommendation systems.

Overall, the video serves as a guide for developers and engineers looking to incorporate intelligent recommendations within their Ruby on Rails projects, highlighting the transformative potential of machine learning in web applications.

From Data to Recommendations: Building an Intelligent System with Ruby on Rails
Rashmi Nagpal • May 31, 2024 • Verona, Italy • Talk


Did you know that 75% of users are more likely to engage with a platform offering personalized recommendations? Well, with 1.2 million websites worldwide built on Ruby on Rails, let’s harness the power of this popular framework to implement intelligent recommendations effortlessly! This talk will explore the exciting realm of recommendation systems and how to build an intelligent system using Ruby on Rails. We will start by unraveling the fundamentals of recommendation systems, including collaborative filtering. With this foundation, we will integrate machine learning techniques into the Ruby on Rails framework. Throughout the talk, we will discuss various strategies for capturing and utilizing user preferences, improving recommendation accuracy, and continuously refining the system based on user feedback. By the end of this talk, you will have gained valuable insights into building an intelligent system that can transform raw data into valuable recommendations within a Ruby on Rails application!

Rashmi Nagpal is a Machine Learning Engineer @ Patchstack.

---

rubyday 2024 is the 11th edition of the Italian Ruby conference, organized by GrUSP.
The event is international, and all sessions will be in English.
📍 Verona | 📅 May 21, 2024

Join the next edition
🔗 www.rubyday.it

---

rubyday is organized by GrUSP.
We organize events, conferences and informal meetings involving Italian and international professionals.
We aim to improve the Italian web development ecosystem, both in terms of skills and opportunities, by creating greater awareness through exchange and sharing.

Subscribe to our newsletter:
✉️ [www.grusp.org/en/newsletter](http://www.grusp.org/en/newsletter)

 Follow us
 Website https://www.grusp.org/en/
 LinkedIn https://www.linkedin.com/company/grusp
 Twitter https://twitter.com/grusp
 Instagram https://www.instagram.com/grusp_
 Facebook https://www.facebook.com/GrUSP

rubyday 2024

00:00:00.440 all right so welcome everyone I'm Rashmi
00:00:03.439 Nagpal I'm a machine learning engineer by
00:00:05.640 profession and a researcher by passion
00:00:08.800 and today I'm really excited to share my
00:00:11.160 learnings with all of you on building
00:00:13.639 from data to recommendations and I'll be
00:00:16.199 covering all the basics to bring all of
00:00:18.039 us on the same page in machine learning
00:00:19.840 domain
00:00:37.399 so the agenda of the talk firstly is
00:00:39.320 going to be um you know first of all
00:00:42.120 okay
00:00:43.520 sorry I think uh okay first what is
00:00:46.640 artificial intelligence and you know
00:00:48.199 what are the building blocks around it
00:00:49.840 and how can we build these um
00:00:51.480 recommendation systems and further on
00:00:54.239 we'll discuss the overview of the
00:00:55.840 collaborative filtering that how it
00:00:57.840 plays a role while we are building
00:00:59.320 the recommendation systems and what
00:01:01.480 are the challenges slash mitigation
00:01:04.199 strategies that we can use while
00:01:05.799 whenever we are building all these kind
00:01:07.280 of techniques and tools for
00:01:08.960 recommendation system and I'll leave you
00:01:11.240 with bunch of resources towards the end
00:01:13.119 of the talk so without any further Ado
00:01:15.360 let's
00:01:16.119 begin so you know what is an artificial
00:01:18.439 intelligence it's kind of a big blob in
00:01:20.560 which you are training the algorithms
00:01:22.479 which mimic the human behavior machine
00:01:24.799 learning is a subset of AI which is you
00:01:27.680 are training or building some programs
00:01:30.159 and the model in itself is extracting
00:01:32.280 the patterns from the data set and you
00:01:34.200 know it gives you some data-driven
00:01:35.680 decisions around it so the
00:01:37.880 example over here I've given is spam
00:01:39.920 email detection and the Deep learning
00:01:42.200 it's a subset of machine learning which
00:01:44.759 implies that you are extracting the patterns
00:01:47.079 from the huge chunks of the data set and
00:01:49.399 giving you know some kind of real-time
00:01:51.520 data-driven decisions around it
00:01:53.439 handwriting recognition is an example of
00:01:55.520 the deep learning so now that I've explained
00:01:58.360 the basics let me give you a demo
00:02:01.320 around how we can build some machine
00:02:02.799 learning algorithms in Ruby itself
00:02:05.360 so what our use case is going to be is
00:02:07.719 given a housing kind of a data set which
00:02:10.200 has a lot of features what I mean by
00:02:12.599 features is there are certain columns in
00:02:14.519 the data set which has uh you know the
00:02:17.120 sales price location a neighborhood or
00:02:20.080 what is the size of the house uh you
00:02:22.080 know whether it's what are the
00:02:23.560 dimensions of the house and you have to
00:02:25.680 predict the price of the house given on
00:02:27.800 the basis of these features so let's see
00:02:30.239 how we can do
00:02:34.080 that so I've already coded like before
00:02:37.200 so here I'm using the existing packages
00:02:39.400 or the gems as you would see you
00:02:41.360 know in Ruby itself and we have
00:02:43.360 given the data set I'll show the data
00:02:44.879 set how it looks like it has like huge
00:02:47.360 um you know it's a data set which
00:02:48.760 comprised of a lot of features like the
00:02:51.560 square feet U what is the price again
00:02:54.640 neighborhood tax class at present there
00:02:57.080 are a bunch of features that we are
00:02:58.519 already being given in the data set so
00:03:01.840 how we are going to build the price
00:03:03.920 prediction model is we are using this
00:03:05.599 linear regression again it's a class or
00:03:08.000 I would say it's one of the machine
00:03:09.280 learning algorithms behind it and then
00:03:11.560 first of all we are reading the data set
00:03:13.640 then we are calling
00:03:15.959 the algorithm on top of the given data
00:03:17.760 set and we are just running it for 500
00:03:19.560 epochs and once we have trained the model
00:03:22.360 we want to test the model based upon the
00:03:24.440 1,000 square ft or the 2,000 square ft
00:03:26.879 how the model will predict the price of
00:03:28.840 that particular house
00:03:30.480 okay so let's see um
00:03:41.120 yes okay now we can see in here how the
00:03:43.799 model is training it's running till 500
00:03:45.879 epochs and every epoch you can see the cost
00:03:48.439 how it is changing all around and if you
00:03:50.400 see the dimension of the house like if
00:03:52.040 it's 1,000 square ft by 2,000 square ft
00:03:54.680 we have to pay around 466k which is a
00:03:57.120 huge amount while this is the data set
00:03:59.040 which was based in California so you
00:04:00.920 know coming from that background well
00:04:02.840 I'm pretty sure that you can easily find
00:04:04.280 this kind of houses in Italy or
00:04:06.280 maybe like a mansion in here but okay so
00:04:09.120 this is how we can train a machine
00:04:10.400 learning model and that's one of the
00:04:12.040 basic
00:04:13.239 examples now that I have explained the
00:04:15.479 basics let's
00:04:17.759 go I want to test the waters this is a
00:04:21.120 very fun activity I really want
00:04:22.880 to do with all of you so given these
00:04:25.479 images or the faces of the people I want
00:04:28.960 you to take a minute and
00:04:30.919 think think which of these faces are
00:04:33.680 real I'll just give you all a 1 minute
00:04:36.479 to think around
00:04:50.199 it okay now please raise your hand if
00:04:52.919 you think face A is a real
00:04:58.400 one okay
00:05:00.199 interesting please raise the hand if you
00:05:02.120 think face B is a real
00:05:05.199 face excellent please raise your hand if you
00:05:08.520 think face C is a real
00:05:10.479 one oh wow so 80 percent within the
00:05:14.000 audience have raised hands for all the
00:05:16.039 faces none of these faces are
00:05:20.240 real okay so these people they do not
00:05:23.199 exist on the planet Earth to say the
00:05:25.280 least I'm not sure about the extraterrestrial
00:05:27.600 planets while I'm into astronomy I'm
00:05:29.840 not sure about the Andromeda galaxy okay
00:05:32.479 so what is working behind the hoods it's
00:05:34.600 an application of the deep generative
00:05:36.479 modeling for example ChatGPT DALL-E these
00:05:39.960 are all the examples of the deep
00:05:41.560 generative modeling so what is happening
00:05:43.840 is in an algorithm you are feeding the
00:05:45.800 data instances it is synthesizing the
00:05:48.520 brand new data instances which are not
00:05:50.919 existing in the existing data that we
00:05:52.919 have trained the model on so this is an
00:05:55.360 application of the deep generative
00:05:56.639 modeling but let's see what is working
00:05:59.160 behind hoods how the algorithm or you
00:06:02.160 know the modeling is happening and
00:06:03.560 building such kind of data instances
00:06:05.319 which look like real
00:06:08.560 ones so this is a basic neural network
00:06:11.440 which is the brain behind every AI
00:06:13.759 algorithm you can see so given a input
00:06:16.680 data set you know and then of course the
00:06:19.199 data set input in a sense that it could
00:06:20.960 be numeric it could be categorical it
00:06:23.160 could be any format then you training a
00:06:25.800 machine learning model on top of it
00:06:27.759 right now consider weight sum and by an
00:06:29.960 activation function as a machine
00:06:31.880 learning as a black box which comprises
00:06:34.400 of the three stages and then it gives
00:06:36.120 you the output output could be any use
00:06:39.400 case that you're trying to build your
00:06:40.599 model on around right so this is the
00:06:43.240 basic example of the neural network now
00:06:45.720 how it works behind the hoods let's see
00:06:47.880 through an example or application so
00:06:50.479 this is my fluffy my pet dog I just
00:06:52.880 wanted to build an image recognition on
00:06:55.720 top of my fluffy wanted to see whether
00:06:57.720 the algorithm is actually detecting
00:06:59.160 whether this is an image of a dog
00:07:00.759 versus not so you can see that given an
00:07:03.319 image it has taken a small pixels there
00:07:05.599 will be like bunch of other pixels
00:07:07.240 converted to the numeric format and then
00:07:09.800 you are building a machine learning or
00:07:11.240 an image recognition algorithm on top of
00:07:13.319 it and it gives you an accuracy or I
00:07:15.720 would say a confidence that okay with
00:07:18.240 92% confidence this is an image of a dog
00:07:21.520 or maybe like 8% confident it's an image
00:07:23.919 of a cat right so that's how the machine
00:07:26.479 learning or the basic neural networks is
00:07:28.080 working behind the scenes
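The weighted sum, bias, and activation function the speaker describes can be written as a single neuron in a few lines of Ruby (a sketch with made-up inputs and weights, not code from the talk):

```ruby
# One artificial neuron: a weighted sum of the inputs plus a bias,
# passed through an activation function (sigmoid), yielding a value
# in (0, 1) that can be read as a confidence score.
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

def neuron(inputs, weights, bias)
  weighted_sum = inputs.zip(weights).sum { |i, w| i * w } + bias
  sigmoid(weighted_sum)
end

# Three numeric input features (e.g. pixel intensities scaled to 0..1)
# with hand-picked weights, purely for illustration.
confidence = neuron([0.9, 0.1, 0.4], [1.2, -0.8, 0.5], -0.3)
puts format("confidence: %.2f", confidence)
```

Stacking many such neurons in layers, and learning the weights from data instead of hand-picking them, is what turns this into the networks used for image recognition.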
00:07:31.840 now if I want to deploy image
00:07:34.599 recognition or any machine learning use
00:07:36.720 case in the real world what do the
00:07:39.160 stages look like you know what are
00:07:40.720 the building blocks behind it you can
00:07:42.879 see over here there are three main
00:07:44.479 pillars the first is a build stage next
00:07:47.039 is a deployment stage and the other one
00:07:49.000 is a monitoring phase so what happens in
00:07:51.440 the build stage is you have the data
00:07:53.840 ingestion then you are training the
00:07:55.680 model on top of it and understanding how
00:07:57.960 the model is going to test because you
00:07:59.639 want to reiterate the process right and
00:08:01.879 once you are very much satisfied with it you
00:08:03.960 go with the deployment stage package the
00:08:06.159 model just test in a you know real world
00:08:08.479 example and then you keep on monitoring
00:08:10.440 it and if the model let's say you have
00:08:12.440 trained the model back in 2019 you want
00:08:15.000 to see how the model is performing over
00:08:16.960 the years so that's why you Circle back
00:08:19.120 to the build stage and that's why you
00:08:20.400 can see an arrow which is circling back
00:08:22.080 from monitoring to the build stage so
00:08:24.240 this is how the entire pipeline of the
00:08:26.039 machine learning looks like but wait a
00:08:28.159 minute we are here to build the
00:08:30.000 recommendation systems so let's
00:08:32.760 see so what happens in the
00:08:34.680 recommendation system is let's say there
00:08:36.599 are two people who really like the pizza
00:08:38.719 and the salads then whatever the
00:08:41.919 application or the system is going to be
00:08:43.560 it will recommend you maybe
00:08:45.080 having a Coke or maybe
00:08:47.640 this kind of an item within the
00:08:49.600 entire entities that you are ordering
00:08:51.880 that could be a possibility basically it
00:08:54.040 improves the earning rate slash I would
00:08:56.480 say more and more you know e-commerce
00:08:58.320 websites are taking it in
00:09:00.959 a very common example we can see in the
00:09:02.959 day-to-day life is the Spotify example
00:09:05.279 like right recently I'm just interested
00:09:07.120 in this metal music so the other day I
00:09:09.519 was just listening to the PA of oseris
00:09:11.880 and then it started recommending me you
00:09:13.600 know what maybe you can like some other
00:09:15.200 piece of the metal band so that's how
00:09:17.560 the recommendation systems are working
00:09:19.440 in Netflix we all see the Netflix right
00:09:22.720 oh if I have seen a particular movie
00:09:24.600 then it will recommend you know there
00:09:25.800 are certain movies which you can see
00:09:27.440 which are very much similar to these
00:09:29.600 movies on the basis of
00:09:31.279 your taste or your particular genre of
00:09:33.800 the movie that you want to
00:09:35.320 watch similarly the e-commerce website
00:09:38.000 for example the Amazon so
00:09:40.560 I was just actually the other day I
00:09:42.519 was looking for Disappointing
00:09:44.120 Affirmations which is one of the famous
00:09:45.680 books on how you need to unfollow your
00:09:48.040 dreams though the title itself sounds
00:09:49.880 controversial but it's highly
00:09:51.360 recommended book then you can see that
00:09:53.640 how the Amazon behind the hood started
00:09:55.640 recommending me the books which are on
00:09:57.760 similar taste as to mine
00:09:59.839 so Amazon Spotify Netflix all these kind
00:10:04.040 of entertainment music Industries they
00:10:06.160 are always you know emphasizing or
00:10:09.120 working behind the hoods with the
00:10:10.360 recommendation
00:10:12.399 system and did you know that on an
00:10:14.800 average an intelligent recommendation
00:10:17.040 system delivers approximately 23%
00:10:19.959 conversion rates for the web
00:10:22.160 products and it was reported by
00:10:23.959 Nvidia
00:10:25.320 itself but you know we have seen so many
00:10:28.160 use cases of the recommendation
00:10:29.800 system but what is the rationale or the
00:10:32.200 logic behind how it works so a very high
00:10:36.279 level I'll give you is on the basis of
00:10:38.680 the user preferences whether it could be
00:10:40.639 the explicit or an implicit for example
00:10:43.480 you must have seen a small um you know
00:10:45.800 the drop-down box asking was this
00:10:48.120 previous video favorable was it
00:10:50.040 interesting to you so that's how all
00:10:52.399 these you know websites are collecting
00:10:54.440 users data and then they are
00:10:56.079 recommending on top of it so you are passing
00:10:59.040 in the user preferences and then in the
00:11:01.560 recommendation system in itself it is
00:11:03.680 providing you the recommendations or
00:11:05.360 whatever your preferences could be and
00:11:07.560 there are like two main categories of it
00:11:09.720 the first is a collaborative filtering
00:11:11.480 the other is a Content based filtering
00:11:13.360 I'll give the demo of both of these
00:11:14.920 techniques but first of all we need to
00:11:16.839 understand what is the collaborative
00:11:18.079 filtering and what is the content based
00:11:19.519 filtering so in the collaborative
00:11:21.480 filtering let's say me and my brother
00:11:23.680 we actually watch a particular movie
00:11:25.480 then it is going to recommend on the
00:11:27.079 basis of what our taste parts are going
00:11:28.639 to be like in a way it is collecting
00:11:30.639 like in a collaboration of the two users
00:11:33.240 whatever the two users have watched a
00:11:35.120 movie maybe they're recommending on the
00:11:36.600 similar taste on that and what happens in
00:11:39.240 the content based filtering is you know
00:11:41.560 let's say a particular user has seen a
00:11:43.880 particular movie then it is actually
00:11:47.160 checking on the basis of why this
00:11:49.000 movie this particular user has picked and
00:11:51.440 then it will recommend on the basis of
00:11:53.079 the previous history of the movies that
00:11:54.760 this particular user has seen so that is
00:11:56.959 how the content based filtering is
00:11:58.480 coming in and that it might be
00:12:00.360 recommending to some other person as
00:12:03.920 well Netflix actually uses both of these
00:12:06.880 actually they use like the hybrid
00:12:08.200 algorithm behind the hoods whenever they
00:12:09.920 are recommending any kind of
00:12:11.760 you know the movie or the shows or the
00:12:13.120 dramas or whatever we really wanted to
00:12:14.760 see so let's see first um okay so the
00:12:18.639 recommendation systems how they work
00:12:20.240 behind the hood some mathematics is
00:12:21.880 there but I'll explain on a very high
00:12:23.639 level so that becomes intuitive for all
00:12:25.519 of us to grasp and understand so given
00:12:28.440 the users and the items you can see
00:12:30.680 let's say an item could be a movie right
00:12:32.959 there are like three users Ted Carol and
00:12:35.440 Bob and all of the um you can see the
00:12:38.079 Ted Carol and Bob they all like the
00:12:39.839 movie B I'm just giving an example of
00:12:42.040 the items as the movies right now which
00:12:44.120 could be anything maybe like it could be
00:12:45.680 a food item or it could be anything else
00:12:48.760 so you want to suggest whether the
00:12:51.680 movie C is being liked by Bob versus
00:12:54.160 not so that's why we use the least squares
00:12:56.800 algorithm which is also kind of
00:12:58.399 defining the differences using you
00:13:00.839 know the cosine similarity given the
00:13:03.760 two data sets or the text vectors you
00:13:05.920 find the distance and the lesser the
00:13:07.519 distance that implies that the more
00:13:08.720 favorable the person is going to like a
00:13:10.279 particular thing so Bob will highly
00:13:12.920 likely love the movie C so that's how it
00:13:16.040 is happening behind the scenes given an
00:13:17.920 itemm and the user so you do the matrix
00:13:20.519 multiplication so this alternating least
00:13:22.800 squares algorithm is working
00:13:25.600 behind the hoods basically it's just doing
00:13:27.560 the matrix multiplication of the users
00:13:29.519 and the items and then it fills out this
00:13:31.399 kind of a matrix which is sparse at
00:13:34.079 the moment so that's how you know um
00:13:37.360 this matrix multiplication or
00:13:38.639 recommendation system works so let's see
00:13:40.880 first an example here I've given like
00:13:43.199 multiple users and all of these users
00:13:45.639 are given a certain rating and
00:13:48.720 those ratings are from the you know 1 to
00:13:50.800 five and there are like six items around
00:13:53.240 it you can see this is a sparse Matrix
00:13:56.199 user one likes the item five and the
00:13:59.120 user one again likes the item two or
00:14:01.480 three but we don't know whether the
00:14:03.480 user one likes the item four five or six
00:14:06.079 so we need to fill a lot of this
00:14:07.600 information so let's see how we can
00:14:14.720 do so this is the data that I have
00:14:18.639 and that's the table I showed in
00:14:20.560 the slide that is what our goal is you
00:14:23.079 need to understand you know for a
00:14:25.120 particular user what are the
00:14:26.480 recommendations for the user
00:14:28.720 one and what it particularly likes so
00:14:31.480 let's see what I'm doing right now
00:14:34.040 is I'm just calculating the cosine
00:14:35.600 similarities and then on top of it I'm
00:14:37.600 finding the similar users who
00:14:39.880 particularly like maybe the user one has
00:14:41.600 a similar taste as user two and on the
00:14:44.240 basis of it it will do the
00:14:45.440 recommendation so it's working the ALS
00:14:47.600 algorithm behind the
00:14:55.800 scenes so you can see for the user one
00:14:58.600 there are like two items it has
00:15:00.360 highlighted for the user which is the item
00:15:02.839 four and five and with a certain value
00:15:04.959 or a score you can see and the scores
00:15:07.000 could range from 1 to five I've not
00:15:09.079 normalized it but we can also do the
00:15:10.680 normalization around it and if you want
00:15:12.880 to see for which particular user let's
00:15:16.839 say Okay so this user one has been
00:15:19.959 mapped to the different other users like
00:15:22.079 the other users which were given as 2 3
00:15:24.639 and four similarly if you want to find
00:15:26.759 for the other users we can definitely
00:15:28.319 change you know
00:15:29.600 we are just again passing in the
00:15:30.959 recommend items a different user
00:15:32.920 altogether and we can see how the model is
00:15:34.639 going to interpret and give the user a
00:15:36.600 corresponding recommendation for this
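The user-to-user matching demoed here can be approximated in plain Ruby. This sketch hand-rolls cosine similarity over toy ratings rather than using the ALS machinery from the demo; the data and method names are illustrative, not the talk's code:

```ruby
# User-based collaborative filtering sketch: find the user whose rating
# vector is closest by cosine similarity, then suggest items they rated
# highly that our target user has not rated yet (0 = not rated).
RATINGS = {
  "user1" => { "a" => 5, "b" => 4, "c" => 0 },
  "user2" => { "a" => 5, "b" => 4, "c" => 5 },
  "user3" => { "a" => 1, "b" => 5, "c" => 2 }
}

def cosine(u, v)
  dot = u.zip(v).sum { |a, b| a * b }
  mag = ->(x) { Math.sqrt(x.sum { |a| a * a }) }
  dot / (mag.call(u) * mag.call(v))
end

def recommend_items(user)
  mine = RATINGS[user]
  # Most similar other user by cosine similarity of rating vectors.
  nearest = (RATINGS.keys - [user]).max_by do |other|
    cosine(mine.values, RATINGS[other].values)
  end
  # Items the similar user liked (rating >= 4) that we haven't rated.
  mine.keys.select { |item| mine[item].zero? && RATINGS[nearest][item] >= 4 }
end

puts recommend_items("user1").inspect
```

Here user1 and user2 have near-identical tastes, so user1 is offered item "c", which user2 rated 5; a smaller cosine distance means a more similar user, matching the "lesser the distance, more favorable" intuition from the talk.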
00:15:40.199 okay so next example I wanted to show
00:15:42.519 you for is the book's recommendation
00:15:45.279 given a book name can you recommend the
00:15:47.519 similar books on the basis of the genre
00:15:49.680 or maybe like the you know the style
00:15:51.240 around it so let's do
00:15:57.160 that so I have already some data for
00:16:00.079 this book recommendation and I am
00:16:02.120 actually passing in okay this is The
00:16:03.959 Universe I'm much more into
00:16:05.399 astronomy so I will be giving so
00:16:07.120 many examples around it so I like this
00:16:09.639 book which is The Universe and
00:16:11.759 let's see whether our algorithm is able
00:16:13.560 to give and suggest recommendations
00:16:15.600 around the similar book
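A content-based lookup like this demo can be sketched with simple tag overlap (Jaccard similarity); the titles and tags below are illustrative stand-ins, not the talk's data set:

```ruby
# Content-based filtering sketch: score every other book by how much its
# genre/keyword tags overlap with the liked book (Jaccard similarity),
# then return the closest matches.
BOOKS = {
  "The Universe"            => %w[astronomy space science],
  "A Brief History of Time" => %w[astronomy physics science],
  "Cooking for Beginners"   => %w[food recipes],
  "A Man on the Moon"       => %w[space history exploration]
}

def jaccard(a, b)
  (a & b).size.to_f / (a | b).size
end

def similar_books(title, count = 2)
  tags = BOOKS.fetch(title)
  (BOOKS.keys - [title])
    .sort_by { |other| -jaccard(tags, BOOKS[other]) }
    .first(count)
end

puts similar_books("The Universe").inspect
```

Unlike the collaborative example, this needs no other users at all: it compares item attributes only, which is why content-based methods help with the cold-start problem discussed later in the talk.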
00:16:31.639 okay so you can see that the books which
00:16:33.440 you know the recommendation
00:16:35.720 model has given are A Brief History of
00:16:37.360 Time Why the Universe Is the Way It Is A
00:16:40.399 Man on the Moon and the history of space
00:16:42.360 exploration so you can see that a
00:16:44.120 similar taste of book has been picked
00:16:45.839 up by the model itself and what is the
00:16:47.600 corresponding recommendation that it has
00:16:49.360 given right so that's how we can build
00:16:52.560 uh so the first example was of the
00:16:54.680 collaborative this one is an example of
00:16:56.759 the content based because I like a
00:16:58.399 particular content and therefore it has
00:17:00.319 given me an example you know okay maybe
00:17:02.680 the universe is the book that you like
00:17:04.160 the most and therefore you are you would
00:17:06.439 like these other possible books which is
00:17:08.240 the space trilogy or maybe the Space
00:17:10.400 Odyssey these kind of different books
00:17:12.120 around it so let's come back to our
00:17:15.799 slides there are like bunch of
00:17:17.880 challenges whenever we are building the
00:17:19.520 recommendation system the first is the
00:17:21.319 cold start problem so actually Netflix
00:17:23.640 also fa this problem when they initially
00:17:25.360 started out because they were like no
00:17:27.039 users no history you don't know whether
00:17:29.320 particular user will like this kind of a
00:17:30.799 movie versus not so that is when the
00:17:33.320 e-commerce website started facing an
00:17:35.039 issue of the cold start problem that
00:17:37.120 is that there is no existing
00:17:39.840 content or the history of the user
00:17:42.240 therefore the model needs to have it in
00:17:44.080 order to predict what could be the
00:17:45.600 future possibilities and the next is a
00:17:48.039 sparsity so there are of course a lot of
00:17:50.440 algorithms which help in dealing with
00:17:52.440 the sparsity that when there is a smaller
00:17:54.720 number of items in the data itself
00:17:57.000 how we can actually provide the ratings
00:17:59.280 and scalability is also an issue you
00:18:01.320 know when you need like a lot of
00:18:02.799 resources intensive resources because
00:18:04.720 you want to train deep neural networks
00:18:06.480 or you know all these kind of large
00:18:08.360 language models so scalability is also
00:18:10.720 one such potential reason that you know
00:18:12.640 what are the challenges in terms of
00:18:14.400 the recommendation system diversity is
00:18:16.720 equally important so what I meant by
00:18:18.440 diversity is diversity within the data
00:18:20.840 the more the diverse data is the better
00:18:22.880 the recommendation model is going to
00:18:24.440 perform if the data is you know skewed
00:18:27.000 or the distribution is not fair of
00:18:28.919 course the recommendation model is not
00:18:30.400 going to give the better prediction
00:18:32.520 values as we say garbage in garbage out
00:18:35.679 right so if you're putting the garbage
00:18:37.480 or the trash of the data inside your
00:18:39.440 recommendation model therefore the model
00:18:41.440 is going to give the trash kind of a
00:18:42.919 values so definitely you need like the
00:18:44.960 good data which is clean out all the
00:18:47.440 removing of the outliers so let's see
00:18:49.360 the ways how we can mitigate all these
00:18:51.600 challenges in the recommendation system
00:18:54.400 again just circling back to what I was
00:18:56.000 saying before you need to have a good
00:18:57.919 quality of data we don't want the data
00:19:00.840 which is skewed or highly skewed towards
00:19:02.600 one particular class so that the model
00:19:04.600 is not going to learn and predict so
00:19:06.559 that's why we need to remove and
00:19:08.039 pre-process the data remove the outliers
00:19:10.159 in the data and in order to tune our
00:19:12.760 hyper parameters around it using the
00:19:15.919 hybrid approaches not just
00:19:18.240 combining
00:19:19.840 the collaborative or the content but
00:19:21.559 the hybrid or ensembling of the models
00:19:23.799 per se and you can use a lot of data
00:19:26.919 augmentation techniques or feature
00:19:29.159 engineering the model selection also
00:19:31.360 plays like a really important role which
00:19:33.720 is kind of a brainchild how your
00:19:35.600 recommendation system is going to work
00:19:37.320 behind the hoods so model selection
00:19:39.559 plays a role the next is the Privacy
00:19:41.559 preserving techniques sometimes users
00:19:43.720 would be hesitant to share their
00:19:45.720 PII personally identifiable information
00:19:48.120 right so you can use the Privacy
00:19:49.960 preserving techniques Federated learning
00:19:51.919 is one such technique differential
00:19:53.440 privacy these are different different
00:19:54.720 techniques that you can use whenever you
00:19:56.880 are building all these kinds of
00:19:58.679 recommendation
00:20:00.559 systems in a nutshell what is the
00:20:03.200 importance you know so
00:20:05.960 what is the rationale of using the
00:20:08.320 recommendation system and why we need it
00:20:10.640 first of all we want to leverage the
00:20:13.080 data-driven insights in order to make the
00:20:15.520 better decisions whenever we are
00:20:17.080 building all these recommendation
00:20:18.520 systems and we want to optimize the
00:20:21.000 resource allocation and personalize user
00:20:23.960 experiences the seamless experience that
00:20:25.880 we face on the Netflix is all because we
00:20:28.400 want these intelligent systems to use
00:20:30.400 our data in order to make the good data
00:20:32.480 driven decisions around it and we also
00:20:34.919 need like the continuous innovation for
00:20:37.480 example I was uh the other day I was
00:20:39.559 having a chat with one of the speakers
00:20:41.640 and he suggested uh you know you need to
00:20:44.000 always enhance the existing code that
00:20:45.840 you have because that's how you optimize
00:20:48.600 the model you can't have a model which
00:20:50.200 is stale or trained on a data set
00:20:52.480 from years ago because that
00:20:54.400 model is not going to learn in the next
00:20:56.840 years right in the future so so always
00:20:59.159 innovate look for the ways in so that
00:21:01.600 you can innovate in the recommendation
00:21:03.200 systems it could be enhancing the
00:21:04.799 existing code base or just optimizing or
00:21:07.200 using some of the techniques which are
00:21:09.280 like everyday releasing out especially
00:21:11.360 in the llm space and we also need to
00:21:13.840 integrate the ethical considerations and
00:21:15.880 privacy preserving techniques which I have
00:21:17.640 mentioned before because we want to
00:21:19.640 protect the user's data and we also want
00:21:21.880 to ensure that we are responsibly using
00:21:24.880 the recommendation
00:21:26.720 systems these are like bunch of
00:21:28.799 resources which I will definitely uh
00:21:30.799 vouch out for the first is a Blog which
00:21:33.240 is a really good Blog the next is the
00:21:35.559 Deep learning course in order to
00:21:37.080 understand the basics what is deep
00:21:38.440 learning and you know how you can use
00:21:39.880 acate and it's like open course whereare
00:21:42.080 so you can definitely use it the
00:21:43.960 articles is also like code with the Json
00:21:46.080 is one of my favorite person so you
00:21:47.679 definitely should read uh those articles
00:21:50.640 thank you so much
00:21:59.039 oh