Summarized using AI

Building LLM powered Applications

Andrei Bondarev • December 06, 2023 • Montréal, Canada • Talk

In the talk titled Building LLM powered Applications, presented at the Montreal.rb Meetup in December 2023, Andrei Bondarev, a Solutions Architect and Owner at Source Labs LLC, discusses the impact of Large Language Models (LLMs) on software development and their integration into it, particularly using the Ruby programming language. The presentation highlights the rapid advancements in Generative AI throughout 2023 and their implications for building AI-powered applications. Key points include:

  • AI Hype and Opportunities: The software industry is currently experiencing a surge in interest regarding AI innovations, accelerated by the popularity of tools like ChatGPT. Bondarev notes the disconnect between emerging AI technologies and actual business applications, stressing the need to bridge this gap.

  • Generative AI's Economic Impact: Citing a McKinsey report, Bondarev notes an estimate that Generative AI could add between $2.5 trillion and $4 trillion in annual economic value, in part by enhancing workplace productivity through more efficient internal knowledge management systems.

  • Use Cases in Various Industries: Generative AI is positioned to transform customer service, marketing, and product development by enabling advanced chatbots, personalized marketing strategies, and comprehensive data analysis for development needs.

  • LLM Capabilities: Bondarev describes LLMs as neural networks excelling at tasks such as transforming unstructured data into structured formats, summarization, and classification, illustrating their utility across software development processes.

  • Challenges with LLMs: Issues such as "hallucinations" (the generation of incorrect or nonsensical outputs) and the limits of LLM training data are acknowledged. One proposed solution is Retrieval-Augmented Generation (RAG), in which external knowledge bases improve the accuracy of LLM outputs by supplying contextually relevant factual data.

  • Implementation of RAG Systems: The talk introduces a practical example of a RAG system that connects user queries to a vector search database, illustrating how to encode proprietary data and build a question-answering interface for internal corporate documentation.

  • Why Ruby? Bondarev argues for the continued relevance of Ruby in the AI landscape, emphasizing its strong community support, maintainability, and the potential to integrate advanced AI functionalities without reinventing the wheel.

  • Future Directions: The presentation concludes with a call to action for the Ruby community to engage more with AI developments and to look towards frameworks like Langchain.rb that simplify the use of LLMs in Ruby applications.

Overall, Bondarev’s talk makes a compelling case for leveraging LLMs in modern Ruby applications while addressing existing challenges and encouraging community involvement in these emerging technologies.


Montreal.rb Ruby Talk 2023/12 - Building LLM powered Applications - Andrei Bondarev - Solutions Architect / Owner at Source Labs LLC

The 2023 breakthroughs in Generative AI (Artificial Intelligence) have been taking the software development world by storm. We'll take a look at a few components of what is quickly becoming the modern stack for building LLM (Large Language Model) powered applications. Andrei will build a case for Ruby in the emerging AI trend, and show how some of the AI capabilities can be leveraged today!

Montreal.rb Meetup December 2023

00:00:01.960 I've been running a software development firm, a small company called
00:00:07.200 Source Labs, for the last six years, and we focus on building custom AI-powered
00:00:12.960 technology solutions. Depending on the project or the client size, I play
00:00:20.640 either the role of an architect, an engineering manager, or a fractional CTO
00:00:26.679 if we work with smaller venture-backed companies. So, the year of AI hype and hysteria: this
00:00:35.320 year has been taken over by a lot of hype in the AI field. With the release
00:00:41.559 of ChatGPT, companies are actively trying to develop their
00:00:47.800 capabilities in this field and add LLM-backed features to their products. There's been a massive rise in GenAI-based
00:00:56.120 startups, and I think there's a little bit of this happening, where
00:01:03.320 there are builders on the left-hand side, and everyone is super excited about
00:01:08.439 all these different technologies: the bots, the AI agents, AGI, ChatGPT, and all the
00:01:19.240 different vector databases that are popping up. And then there are real business problems and real
00:01:27.000 customers on the right-hand side that have actual problems, and the two are still a little bit
00:01:33.759 disconnected. So unless we want to have our own "another blockchain" moment, I
00:01:39.159 think we need to address this. So I found this interesting McKinsey report, and
00:01:45.880 I'll have a link towards the end of the slides; they're estimating that
00:01:52.360 generative AI's impact on productivity could be equivalent to about 2.5 to
00:01:59.280 4 trillion dollars annually in value to
00:02:04.759 the global economy. One example of how generative AI could drive a lot of value
00:02:11.200 is companies creating their own internal knowledge management systems. It's estimated that knowledge
00:02:18.800 workers spend about a fifth of their time, or one day every work week, searching for and
00:02:24.239 gathering information. This virtual AI expert would have access to
00:02:31.040 all of the corporate data and the rest of the IP, and a human could just have a conversation in natural language and
00:02:37.800 fetch the data they need much more effectively. So some of the industries
00:02:43.680 that could be transformed, and stand to gain a
00:02:49.239 lot of improvement and value from generative AI, are customer operations
00:02:54.800 and customer service. With self-service, you could build chatbots that
00:03:01.920 offer high personalization and are capable of resolving different complex
00:03:08.440 inquiries. You could also introduce customer-agent augmentation, where a customer
00:03:16.519 service agent is served relevant response suggestions, customer data,
00:03:21.720 and call transcripts in real time. When it comes to marketing and
00:03:27.879 sales, it could yield a lot of value too.
00:03:33.280 Generative AI could bring a lot of value to strategy: gathering market trend
00:03:38.400 analytics, drafting up marketing and sales comms. It would help with brand awareness,
00:03:44.840 conversion, and retention by bringing more human-like
00:03:51.560 product experiences. It's already changing
00:03:56.720 product development: when it comes to the planning phase, generative AI can
00:04:02.760 help with gathering and analyzing usage analytics and trends to produce
00:04:08.400 product requirements, and it's also already helping with the full software development life cycle,
00:04:15.760 whether it's coding, testing, or iterating. And I think there's also a larger shift
00:04:20.880 at play that hasn't fully been
00:04:26.000 realized yet, which is that I think LLMs will be a crucial part of every software
00:04:33.479 stack in the future. So large language models are artificial neural networks with
00:04:40.199 general-purpose language understanding and generation, exploding in popularity after the "Attention Is All You Need"
00:04:47.080 paper from 2017, and they excel at a lot of different tasks. They're really good
00:04:55.039 at converting unstructured data to structured data. As developers, we spend a
00:05:00.360 lot of time turning unstructured data into structured data; we love to program when
00:05:07.840 we have access to structured data, and we hate when we have to deal with unstructured data. The large language
00:05:15.320 models can already serve as that bridge. The LLMs are also really good at
00:05:21.880 summarization, so you could give one a document and have it produce a summary, whether of a blog post or a book for
00:05:29.919 that matter. They're really good at classification as well: if you need to
00:05:34.960 classify a list of blog articles, for example, deciding whether a post should
00:05:41.000 be bucketed as technology, sports, or business by topic, then LLMs are really
00:05:47.080 good at that. Text generation as well, and I definitely need a much better
00:05:52.600 graphic here. And there are still a lot of problems with building LLM
00:05:59.560 applications and working with LLMs. There are obviously hallucinations, and I'm sure everyone here has heard about
00:06:07.440 this: hallucinations are when models generate incorrect or
00:06:14.520 nonsensical text. The LLMs also aren't
00:06:21.800 continuously trained, so they have data up to a certain point. For example, GPT-4
00:06:27.199 was trained on data up through April 2023, so any world events afterwards,
00:06:35.520 it just does not know about them. And also, in some instances, the
00:06:42.080 relevant knowledge is not being leveraged. I've mentioned the example of building a
00:06:48.800 proprietary knowledge management system; well, GPT-4 is not just going to have
00:06:54.080 knowledge about all of your internal corporate documents. And to address this,
00:07:00.560 we'll talk about RAG: what retrieval-augmented generation is, and what a
00:07:06.520 retrieval-augmented generation system is. This technique was first
00:07:13.840 exposed in the 2020 paper called "Retrieval-Augmented Generation for Knowledge-Intensive Natural Language
00:07:21.520 Processing Tasks". It's a technique for enhancing the accuracy and reliability
00:07:27.319 of generative AI models with facts fetched from external sources, and the
00:07:33.400 workflow goes something like the following. I'll drive your
00:07:40.440 attention to the bottom here first: you'll see that a user is
00:07:45.720 interfacing with a system, and they would pose a question and ask, "How do I do X?"
00:07:53.800 Within the RAG system, we would then take this user question, we would
00:08:00.599 generate vector embeddings from that question, we would go to the knowledge base, typically a vector search
00:08:07.479 database, we would find relevant documents by running a similarity search,
00:08:12.680 then we would extract that knowledge and put it in the prompt
00:08:17.759 that is then sent to the LLM, and have the LLM generate or
00:08:23.000 synthesize an answer. The LLM would then answer in natural language and say, "To do this, you need to do this, this, and this,
00:08:30.000 according to these sources." So in order to
00:08:37.320 understand this concept a little bit better, we'll jump into what vector embeddings are, what similarity search is,
00:08:44.320 and what a RAG prompt looks like. So what are vector embeddings? Vector
00:08:51.080 embeddings are a machine learning technique to represent data in an
00:08:57.440 n-dimensional space. An embedding is an array of float numbers
00:09:04.519 of length n, an array of size n, and
00:09:11.240 the LLMs generate these embeddings: they assign these float
00:09:17.480 numbers and put them all in one big array of size a few hundred to a couple thousand,
00:09:27.279 and by doing that they encode the meaning in an embedding space,
00:09:32.920 also called the latent space. So for example, if we take the phrase
00:09:40.839 "this is the holiday season" and put it through an embedding model, then we will get the following vector
00:09:50.519 embedding. And I'll drive your attention here to the graphic on the right-hand
00:09:56.519 side, so you can see. As mentioned, this array
00:10:01.680 contains hundreds or thousands of float numbers, so it's impossible
00:10:08.120 to visualize, but if we were to reduce this concept to a
00:10:13.680 two-dimensional or three-dimensional space, then we can imagine it, and
00:10:20.920 this is what was done in the graphic on the right-hand side. You can see that different concepts are organized and
00:10:29.600 clustered by their meaning: any sports-related phrases or words are all
00:10:36.399 clustered on the bottom right, anything that relates to politics
00:10:41.560 or conflicts is on the left-hand side, and
00:10:46.639 so on.
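To make this concrete, here is a minimal sketch (not from the talk) of generating such an embedding with langchainrb's OpenAI wrapper, assuming an OPENAI_API_KEY environment variable; method names may differ across gem versions.

```ruby
require "langchain"

# Hedged sketch: langchainrb's OpenAI wrapper, assuming OPENAI_API_KEY is set.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# Turn a phrase into a vector embedding: an array of float numbers whose
# position in the latent space encodes the phrase's meaning.
response = llm.embed(text: "this is the holiday season")
vector = response.embedding
puts vector.length # e.g. 1536 floats for OpenAI's embedding models
```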
00:10:54.760 So when we do semantic search, what we're actually doing is we take a query, and then we identify where it is in this latent space, and we
00:11:02.399 try to find the closest elements, hence the term "similarity
00:11:07.760 search". So again, vector search, also synonymous with
00:11:13.240 similarity search or semantic search, is finding the closest data points
00:11:20.680 by their inherent meaning. And, as a technical detail,
00:11:26.160 there are different ways of calculating this distance: you can use the Manhattan distance, the
00:11:33.000 Euclidean distance, cosine similarity, or the Chebyshev distance.
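As an illustration of one of those metrics (mine, not the speaker's), cosine similarity between two embedding vectors can be computed in plain Ruby:

```ruby
# Cosine similarity: close to 1.0 means the vectors point the same way
# (similar meaning); near 0.0 means they are unrelated.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]) # => 1.0
```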
00:11:41.399 So let's summarize all of this again. The RAG pipeline follows
00:11:46.839 the following workflow: a user's question comes in, we generate
00:11:53.040 a vector embedding for that user question, we then go to, typically, a vector
00:11:59.760 database, although it could be any other kind of traditional data source, and we
00:12:04.880 run a similarity search to retrieve documents that are relevant to that question, which the LLM will then
00:12:11.600 use to answer that specific question. Then we construct a RAG prompt that
00:12:17.199 looks something like the following, and the LLM will return a response in
00:12:22.320 natural language. So the prompt here gets
00:12:27.519 converted to text and is then sent to
00:12:33.839 the LLM, whether it's GPT-4 or Claude or
00:12:38.959 one of the Google Cloud LLMs. We would put the context, and
00:12:47.360 these are the relevant documents, in here, so this gets concatenated in. We
00:12:54.399 insert the original question, and some instructions, which basically just tell the LLM something
00:13:02.399 like: be succinct when you're answering the question, or answer it in a certain
00:13:07.480 style, or keep your response to a certain length, or follow a certain format, or
00:13:14.800 respond in JSON format, and this is the JSON schema that I
00:13:22.800 expect you to respond with. And the "Answer:" at the end just signals to the LLM that
00:13:30.000 its response follows from here on out, so it completes the rest of it.
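A rough sketch of assembling such a RAG prompt in Ruby (my reconstruction of the template he describes, with a hypothetical helper name):

```ruby
# Hypothetical helper: stitch retrieved documents, the user's question, and
# instructions into one prompt string, ending with "Answer:" so the LLM
# completes from there.
def build_rag_prompt(question:, documents:)
  <<~PROMPT
    Context information is below.
    ---------------------
    #{documents.join("\n---------------------\n")}
    ---------------------
    Given the context information and not prior knowledge, answer the question.
    Be succinct.

    Question: #{question}
    Answer:
  PROMPT
end
```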
00:13:35.560 So what actually is Langchain.rb? It's a library for
00:13:42.959 leveraging LLMs to build RAG pipelines like I've described, and for vector
00:13:48.120 search and chatbots, and also for doing workflow automation with AI
00:13:54.920 agents. And we aim to lean into the Ruby and Rails
00:14:02.639 community so that it has the plug-and-play look and feel of all the
00:14:09.839 other gems and libraries that we've gotten to love in our space.
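(Aside, not shown in the talk: the library is distributed as the langchainrb gem, so pulling it in looks like any other gem.)

```ruby
# Gemfile
gem "langchainrb"
```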
00:14:18.120 So I'll show this example of a RAG pipeline, and then a very
00:14:27.120 simple demo of an AI-powered internal knowledge management system. I'll try to
00:14:33.920 full-screen it; please let me know if you can see it or not. Can you see
00:14:39.600 it? Okay. And I will just scroll through
00:14:52.880 the technical details here. Okay, a bit bigger, let me
00:15:09.920 try. I'm not sure that I can make it bigger; can you see it, or is
00:15:17.800 it okay? And I will send a link to this
00:15:24.320 presentation, so you can replay it if you want to. I
00:15:29.680 apologize, I recorded this video. You can see that we're instantiating an
00:15:34.880 instance of a vector search database here, and creating a default
00:15:40.000 schema. In this instance I have a benefits brochure that I have
00:15:46.399 stored on my local computer, and I'm going to go ahead and add that to my
00:15:51.920 vector search database, and this will be the knowledge base where data will be retrieved from.
00:15:59.560 I'll skip this importing
00:16:05.279 step. And once that has been imported, you can see that I can ask... we've basically
00:16:11.920 built a sort of Q&A system, and I can ask it questions about that specific document that I uploaded. So
00:16:20.560 the first question, and I hope you all can see this, is: what is the
00:16:26.319 company's vacation policy, how much time can I take off? And it's telling me the answer. What are all
00:16:33.360 the benefits offered? And it's stating that the benefits offered within that benefits brochure that I uploaded
00:16:40.920 include medical, dental, vision, life, short- and long-term disability, etc. What are
00:16:47.240 the parking benefits? What are the 401(k)
00:16:54.720 benefits? Etc. So this is a very simple example with one document, but you can
00:17:00.000 imagine a whole corporate corpus of knowledge uploaded
00:17:06.240 into the system, and I would be able to have a Q&A-style
00:17:12.880 conversation with the system, and it would just fetch relevant data and be able to answer different questions.
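The screencast itself isn't reproduced here, but the flow he narrates maps onto langchainrb roughly as follows. This is a hedged sketch: it assumes Qdrant as the vector store (the talk doesn't name one) and an OpenAI key, and the class and method names follow the gem's README of that era, so they may have shifted since.

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# Instantiate a vector search database client (Qdrant here, as an assumption).
client = Langchain::Vectorsearch::Qdrant.new(
  url: ENV["QDRANT_URL"],
  api_key: ENV["QDRANT_API_KEY"],
  index_name: "company_docs",
  llm: llm
)

# Create the default schema, as in the demo.
client.create_default_schema

# Import the local benefits brochure into the knowledge base.
client.add_data(paths: ["benefits_brochure.pdf"])

# Q&A against the uploaded document, RAG-style.
puts client.ask(question: "What is the company's vacation policy? How much time can I take off?")
```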
00:17:21.120 So another thing that I wanted to cover is agents. So
00:17:28.199 AI agents are autonomous, general-purpose, LLM-powered
00:17:33.760 programs, and they can be used to automate different workflows and
00:17:39.600 business processes and to execute multi-step tasks. They work best with
00:17:45.760 powerful LLMs, and they can also use tools, and tools are basically different APIs, or
00:17:52.919 internal databases, or different systems. And I'll show you an example of
00:18:00.400 an agent here as well; it utilizes an agent that's
00:18:06.960 built into Langchain.rb. So first we're just instantiating a
00:18:14.600 weather tool, and it's basically a tool that can be used to just fetch the current
00:18:23.200 weather. We have a Ruby code interpreter tool, a Google search tool,
00:18:31.400 self-explanatory, and a calculator. And we're going to use,
00:18:38.200 I mentioned that it works with powerful LLMs, but you can
00:18:44.039 use any other API as well; you're not tied to using
00:18:51.559 OpenAI. And so we're instantiating the agent here: we're calling Langchain's ReAct agent,
00:18:58.960 we're passing it the LLM that it's going to be using, which is OpenAI, and we're passing it a collection of tools, so it has access to basically
00:19:07.520 the weather API, the Google Search API, it can execute Ruby code, and it can also
00:19:13.240 use the calculator.
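A hedged reconstruction of the setup he scrolls through, using the tool and agent classes langchainrb shipped around that time (the exact names and required API keys are assumptions; check the gem's current docs):

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# The four tools from the demo; each wraps an external API or a local runtime.
tools = [
  Langchain::Tool::Weather.new(api_key: ENV["OPEN_WEATHER_API_KEY"]),
  Langchain::Tool::GoogleSearch.new(api_key: ENV["SERPAPI_API_KEY"]),
  Langchain::Tool::RubyCodeInterpreter.new,
  Langchain::Tool::Calculator.new
]

# ReAct ("reasoning and acting") agent: the LLM decides which tool to invoke
# at each step, and the library executes the tool call on the LLM's behalf.
agent = Langchain::Agent::ReActAgent.new(llm: llm, tools: tools)

agent.run(question: "Find the current weather in Boston, MA and in Washington, DC and take the average.")
```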
00:19:20.480 And you might be wondering why the LLM needs to have access to the calculator. Because actually, LLMs are
00:19:29.000 not that great at math, so for more complex calculations, utilizing the
00:19:36.159 calculator actually yields much better results. And so in this example, you can
00:19:43.640 see that I'm asking the agent to find the current weather in Boston,
00:19:49.720 Massachusetts and Washington, DC and take the average.
00:19:55.440 So the prompt was sent to the OpenAI
00:20:00.679 LLM, and then it decided that it first needed to invoke the weather
00:20:07.760 tool and find the weather for Boston,
00:20:14.640 Massachusetts. So you can see what follows is the tool execution: we make
00:20:19.880 a call on the LLM's behalf to that tool to find the weather for Boston
00:20:26.280 and send the result back. And then, in the next step, it decides that
00:20:31.799 it needs to invoke the weather tool and find the weather for Washington,
00:20:37.679 DC. So on the LLM's behalf we invoke that tool, we call that
00:20:43.640 API and find the current weather for Washington, DC and send it back to the
00:20:50.960 LLM. And then it calls the calculator with a very simple
00:20:58.799 equation, where it takes the two temperatures and splits them down the middle, which was the original
00:21:05.600 prompt. And then it responds; all the way at the bottom you can see "the
00:21:11.279 average current temperature in Boston, Massachusetts and Washington, DC is
00:21:16.760 84.2...". So the next thing we ask it
00:21:24.080 is to find the current ruble-USD exchange
00:21:33.159 rate. Sorry, I'll go back real quick. And it decides that, in order to
00:21:41.080 answer that question, it needs to do a Google search,
00:21:47.919 so it decides to call the Google Search tool. And we call the
00:21:54.600 Google Search tool on its behalf with the following query, "ruble USD
00:22:00.720 exchange rate", we send the result back, and then the LLM synthesizes that answer
00:22:07.600 into a more natural-sounding response. And lastly, we tell it to use a
00:22:16.200 Ruby program to output the sum of the Fibonacci sequence for 1 through
00:22:21.240 1,000, and we ask it not to define any Ruby methods. So the prompt is sent to the
00:22:29.720 LLM, and it decides to invoke the Ruby code interpreter, and we execute the Ruby tool
00:22:36.799 on its behalf with this one-liner that it came up
00:22:44.039 with, and we send the result back, and it tells us that the sum of the Fibonacci sequence is this number.
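(For flavor, code of the kind the interpreter tool might have produced; this is my illustration, assuming "the sum of the first 1,000 Fibonacci numbers", since the generated one-liner isn't shown in the transcript.)

```ruby
# Sum the first 1,000 Fibonacci numbers without defining any Ruby methods.
fib = Enumerator.new { |y| a, b = 0, 1; loop { y << a; a, b = b, a + b } }
puts fib.first(1000).sum
```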
00:22:56.000 Okay, so why Ruby? I find that this
00:23:03.679 is an outstanding question that comes back just about every year.
00:23:10.600 So why did we build this in Ruby? I don't think Ruby's going
00:23:16.440 anywhere. I think in this climate of economic
00:23:24.200 contraction, or stagnation, depending on who you ask, Ruby is actually an
00:23:29.480 excellent choice. I find that there's a very healthy amount of
00:23:34.919 pragmatism in the Ruby community: we tend to not reinvent the wheel, the whole notion of "there's a gem for that". We
00:23:42.559 love to solve actual business problems and move quickly, and we don't reinvent
00:23:50.240 the wheel like some of the other ecosystems. I think generally
00:23:56.919 the monoliths are back in fashion. We've done a
00:24:03.360 project for a client where we actually
00:24:08.559 consolidated and rewrote their application from about 15 Node.js
00:24:14.400 microservices into a single Rails monolith. Who would have thought we
00:24:20.120 would be going all the way back? And that was a successful project, and I think a lot
00:24:26.520 more companies are starting to do that, because they're looking at their
00:24:32.399 costs, how much they're spending on maintaining and managing this type of
00:24:37.559 complicated microservices stack, and in a lot of different instances it just doesn't make
00:24:43.679 sense. And also, Ruby is very similar to Python in that it's also
00:24:49.919 written in C. There are a lot of Python
00:24:55.760 data-science-centric libraries that are just pure
00:25:01.720 Python wrappers on top of C functions. There's no reason why we couldn't have equivalents in Ruby; I
00:25:08.600 think we should. And Python is also very similar to
00:25:15.080 Ruby in that it has the same problems with multiprocessing and
00:25:22.960 so on. So I think we ought to have
00:25:28.360 a lot of these capabilities in Ruby as well. And that is
00:25:39.399 it. Thank
00:25:56.679 you.
00:26:13.120 Any
00:26:26.600 questions? No... no audio, unfortunately.
00:26:37.640 Yes... so how do we do this without too
00:26:56.600 much...
00:27:23.840 Andrei, can you hear me? Yes. Give me a thumbs up, because we muted the speaker so we didn't have any echo. Okay, so you
00:27:30.240 can hear me all right. We can probably use the laptop
00:27:35.520 sound; if everything is really quiet, we should be fine. Go ahead and say
00:27:43.679 something. Good enough for everyone? All right, we're going to do it this way. Extremely sorry about the whole mic
00:27:49.760 situation. So, do we have any
00:27:55.720 questions? The question was about dealing with hallucinations: if you have an agent that does a task, you
00:28:04.159 have to be able to trust it; how do you deal with that? All right, so the question is about hallucinations.
00:28:11.760 So if you have anything to add about hallucinations, but also, if you create an agent, how do you deal with
00:28:18.679 hallucinations in general? If you were to automate something, or if you had to just deal with the concept in general, can you
00:28:24.279 elaborate a little bit for us? Thanks. Yeah...
00:28:29.440 sorry, there's a lot of feedback, I can hear myself. I think it's good.
00:28:37.080 Thank you for your question. So I'm going to repeat the question: I think it was,
00:28:42.440 if you build an agent and the agent hallucinates, how do you
00:28:49.320 deal with it? So I think there are a couple of things you need
00:28:56.399 to do. You definitely need to be careful about giving it access
00:29:02.320 to your infrastructure.
00:29:09.399 For example, we also have an agent that can execute SQL queries, and I would
00:29:15.799 say, if you don't create a separate role, maybe just
00:29:23.320 a read role in your database, and you have the agent execute ALTER, MODIFY, and INSERT
00:29:31.080 queries, it may not be a good idea. Also, before an agent decides
00:29:37.720 to do so, you can try to sanitize
00:29:44.240 some of the output and try to put some filters in, so
00:29:50.679 you would parse the response and, again, maybe search for those keywords, like REPLACE,
00:29:55.840 INSERT, ALTER, and just raise an error if the LLM responds with
00:30:03.519 that. In terms of inaccurate data, I think agents are
00:30:12.320 relatively reliable when it comes to very narrow tasks. So, for example, I showed a
00:30:21.480 couple of examples: one of the examples was fetching the current weather and
00:30:28.600 then doing something with it, and then the other example was fetching the exchange rate.
00:30:37.240 I'm not sure I would have the
00:30:43.440 same agent responsible for those different tasks; maybe I would create separate agents and keep
00:30:49.440 tailoring them and keep putting guard rails around them, basically closing
00:30:55.559 in on the number of different tasks that they're able to do. Because if you want to build a very
00:31:02.880 general agent that can do everything, you're going to have a bad time.
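(Illustration, not from the talk: the keyword filter he describes could look like this naive guard; a separate read-only database role remains the real safety net.)

```ruby
# Naive guard: refuse agent-generated SQL containing destructive keywords
# before it reaches the database. Hypothetical helper, illustration only.
DESTRUCTIVE_SQL = /\b(insert|update|delete|alter|drop|truncate|replace)\b/i

def assert_read_only!(sql)
  raise ArgumentError, "Refusing destructive SQL: #{sql}" if sql.match?(DESTRUCTIVE_SQL)
  sql
end
```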
00:31:19.080 I changed something; is that good now?
00:31:25.120 Yeah, I can hear it.
00:31:34.240 Lower the volume over here. Yes, on the left side, that's it.
00:31:39.760 All right, testing again. Give me your ABC. ABC. All right,
00:31:47.720 cool, I think we've got a working setup. It turns out there's a volume dial on the mic; it's not a mute button, it's
00:31:54.399 an actual volume control, so we can pass the mic around. Next question?
00:32:03.080 Thanks for the great talk. I saw in your code that you used a specific agent called the ReAct agent.
00:32:10.919 Are there many such agents, and what's the main difference between them? How do we
00:32:16.240 pick? Yeah, so currently there are two
00:32:21.279 agents: there's a SQL retriever and there's a ReAct agent. ReAct stands for
00:32:28.639 "reasoning and acting", and basically the way to think about this is that
00:32:35.559 there's a lot of research being done currently into how these LLMs could
00:32:43.200 be structured to empower these agents,
00:32:48.960 so the top people in the field are reading research papers
00:32:54.880 every day and are implementing techniques as they're
00:33:00.039 being published, and the techniques are constantly evolving.
00:33:07.240 Actually, the functionality... I'm not sure if any of you have
00:33:12.639 played around with it, the Assistants or the GPTs that OpenAI
00:33:19.360 just released about a month ago. You know, I shouldn't say that it
00:33:26.480 was innovative, because actually a lot of startups have been building
00:33:31.559 things like that for about six months now, so they were
00:33:37.960 a little bit behind in terms of these AI
00:33:43.799 agents. I think the next generation, and actually what I'm
00:33:49.840 trying to do with Langchain.rb
00:33:55.320 specifically, is to combine the different agent implementations and make
00:34:02.320 them a little bit easier to use. So, I mentioned the SQL retriever agent is
00:34:08.040 tailored for interfacing with a SQL database and then
00:34:14.520 answering questions on top of it, and I think it
00:34:20.240 could be combined with the ReAct agent, where a SQL database is just one of the tools that the agent has
00:34:26.560 access to.
00:34:34.000 Thanks. Next one? Hey there, yeah, thanks again for
00:34:41.399 the great talk. This might not be a super coherent question, but I'm interested in this use case where you
00:34:47.720 were building a question-and-answer system for internal company documentation. In the examples
00:34:53.240 you've talked about, it seems like you take all this data that's in these private internal documents, and you find
00:34:59.280 a way to mix it together with all the information that's in this large language model, and then create something
00:35:05.440 useful out of that. I was just wondering if you could talk a little more specifically, maybe in technical detail, about how you analyze the
00:35:11.880 internal private company documents and get that data to play well with what's already in the deep neural network in
00:35:18.440 the large language model? Yeah, that's a good question.
00:35:25.800 So this is why my talk was focused around
00:35:32.800 this retrieval-augmented generation technique. A lot of companies are doing
00:35:39.000 this right now. I guarantee that most of the startups that are generative AI
00:35:45.880 startups are literally doing that: they're taking some sort of data source and they're building RAG systems on top
00:35:53.200 of it.
00:36:02.599 So there's a lot there, but I'll go into the first concept,
00:36:08.960 which is taking that proprietary data, taking the company data,
00:36:14.480 and putting it into a
00:36:20.599 vector database. Basically what you're doing there
00:36:28.720 is you're chunking the data, you're splitting the data. You
00:36:34.319 have a corpus of company data, right? So it's PDFs, maybe it's
00:36:40.200 audio files, maybe it's video files, it's, I don't know, previous Slack
00:36:46.200 conversations, it's product requirements documents, it's marketing copy, it's sales
00:36:52.400 brochures, etc. So you're building this kind of knowledge system, with
00:37:00.160 the hope that if someone needs to... let's say a new
00:37:06.960 employee starts, and they have all these questions about why certain decisions were made, or what on-brand
00:37:16.079 marketing copy looks like, or how do you even write the code, and what are
00:37:22.200 our coding styles, and why does this exist as a separate microservice?
00:37:27.800 So the hope is that you would build a system, and have employees interface with the system and be
00:37:33.319 able to have those questions answered from that data, and maybe not bug other employees to get them
00:37:39.319 answered. So the first thing you would do is you would collect this data, and you would
00:37:44.440 chunk this data and put it into a vector database. And in that
00:37:53.319 process, well, I walked through what vector embeddings are, right? So
00:38:00.040 basically you're taking all this data and you're encoding it in these
00:38:06.480 massive arrays of float numbers. I'm not going to go into it, and I'm also probably not qualified
00:38:13.640 to talk about this, but there are these very complicated algorithms for how
00:38:19.680 you take a text and encode it,
00:38:25.359 assigning it an array of float numbers that encodes its meaning. So it's
00:38:30.800 an array of size 1,000-something, 1,500,
00:38:36.119 of float numbers, that encodes the meaning. And what you're doing is you're taking all that data, you're generating
00:38:43.560 vector embeddings for all that data, and you're putting it into a vector database.
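(A toy illustration of the chunking step he mentions, mine rather than the speaker's; real pipelines use smarter, structure-aware splitters such as the ones langchainrb wraps.)

```ruby
# Split a long document into overlapping fixed-size character windows so that
# each chunk can be embedded and stored in the vector database separately.
def chunk(text, size: 1000, overlap: 200)
  step = size - overlap
  (0...text.length).step(step).map { |offset| text[offset, size] }
end

chunks = chunk(File.read("benefits_brochure.txt"))
```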
00:38:49.359 There are a ton of them; a ton of vector databases have sprung up this
00:38:54.800 year. I think we're going to see which ones are here to stay, which ones are going to be acquired,
00:39:01.960 etc. And then what that
00:39:08.920 allows you to do is, now you can come in with a question and
00:39:18.720 search the vector database for documents that match your question. So if I'm
00:39:25.440 coming in and asking, why does user
00:39:31.200 authentication... why is the email service a separate Lambda
00:39:36.760 function? Why don't we just
00:39:43.960 have a subscription to MailChimp, for example? It's going to go into the vector search database, and it's going
00:39:50.280 to find all those documents that have not just the matching
00:39:56.599 keywords but also semantic similarity, similarity by meaning. It's going
00:40:03.480 to fetch all those documents, and then there's a bit of,
00:40:10.040 in some instances, it's more of an
00:40:15.960 art than a science here. So you would take those relevant
00:40:21.040 documents, you would then construct a prompt, and I mentioned this in one
00:40:26.079 of the slides, let me go back real quick: you would construct this
00:40:33.839 RAG prompt. You would take this context, concatenate it
00:40:40.280 in, and you would take your original question, which was, why does this service exist as a microservice,
00:40:47.560 etc., and you would then have the
00:40:53.200 LLM construct a response. So with the
00:40:58.359 LLM, the knowledge that it has... you're not combining, you're not
00:41:04.520 adding knowledge to its training data; you're just telling it, I want you
00:41:10.400 to answer this question with this knowledge,
00:41:17.280 kind of inside, in context. And its reasoning
00:41:25.240 abilities, its reasoning abilities, give it the power to be
00:41:31.160 able to do so: to take your question, understand what you're asking, understand what the context
00:41:38.240 represents, and then give you a coherent, a super lengthy and hopefully
00:41:47.760 helpful, answer to your question. Great
00:41:53.160 question, great answer. Yeah, great. I mean, just to summarize: is the idea sort of that
00:41:59.720 you're using the vector space to cluster your internal documents, or classify them rather, and then you're
00:42:06.720 getting the LLM to sort of figure out, to summarize the information in
00:42:11.880 the documents? So the LLM is used for the summarization strategy, kind of? Is that the one-
00:42:19.000 word or one-sentence idea? I wouldn't say summarization; I would maybe use the word
00:42:25.160 "compression". Yeah, thank you. You would compress the meaning into this array-of-float-
00:42:32.839 numbers representation. Thanks, Andrei. We have time,
00:42:38.440 I think, for a few more questions. Yeah, go ahead, Paul. Hi, thanks for the talk.
00:42:46.319 So, regarding this: does this protect against
00:42:52.319 prompt injection? And, I guess, do we care about that?
00:42:58.000 So, Langchain.rb gives you optionality to use any sort of LLM, so
00:43:05.200 you can use an off-the-shelf OpenAI LLM, you can use, as I've mentioned, a
00:43:14.440 Google PaLM LLM. Google just came out with a new one today called Gemini; they claim that it's just as good as
00:43:20.880 GPT-4, I haven't played around with it yet. Or you can also use a local LLM; for
00:43:28.920 example, there's Llama 2, which was open-sourced by Meta. You can run that
00:43:35.240 locally if you want to. You need to have certain compute
00:43:41.319 requirements to be able to run that in a performant way, but you can
00:43:47.520 absolutely do that. And what's going to happen is that these models are going to
00:43:54.559 get more efficient, they're going to get faster, they're going to get really
00:44:00.119 powerful, they're going to be open source, and they're coming to every device
00:44:05.520 that you own. So it's going to be running locally on your phone, it's going to
00:44:11.200 be small and fast, and it's going to be running on your laptop. That's
00:44:17.280 definitely going to happen. But the best open-
00:44:24.599 source models are still a little bit behind the proprietary ones. So when
00:44:30.040 the open-source models are just as powerful, you basically don't have to worry about sending your
00:44:36.400 financial statements to OpenAI accidentally.
00:44:41.960 And obviously, vector search databases: you can either pay for a managed
00:44:47.960 service or, just like any other database, you can provision them yourself. Your question about
00:44:57.319 prompt injection: prompt
00:45:02.680 injection is kind of similar to how you would sanitize... there was a question earlier about
00:45:09.599 making sure that the agent doesn't go rogue on you. There are some tricks that
00:45:15.640 you can do to protect your system from prompt injections as well, but it's not
00:45:23.040 bulletproof. I mean, people have been able to get the original prompts for GitHub Copilot
00:45:32.599 and the suite of Microsoft products that use LLMs, so it's
00:45:41.119 not... I don't think it's currently a solved problem. But yeah, really good
00:45:51.280 question. Thank you. Don't be
00:45:58.640 shy. Oh well, maybe one last
00:46:05.559 one. Hi, thank you for your talk. The question is: in Python we have access
00:46:12.400 to Hugging Face. How could it be possible to use Hugging Face repositories with
00:46:18.359 Langchain.rb? Is it something that's going to be added in the future, or
00:46:24.559 do you see it being added in the future? Yeah, so you can use some of
00:46:30.760 the Hugging Face endpoints right now, some of the APIs, but there
00:46:38.960 is something missing, which is the sentence transformers.
00:46:48.200 And I was actually looking into this the other day, and I think we can build
00:46:54.880 it in Ruby. So I don't want to promise anything, but I
00:47:00.359 think it's doable. That'd be really nice. I mean, you can download the Hugging Face models; if
00:47:07.240 you have them locally, you still need Langchain.rb to do some form of transformation
00:47:12.599 between the local model and the Langchain.rb library. So for the local models, I
00:47:19.680 actually love a tool called Ollama; I think the website is
00:47:27.599 ollama.ai, and you can just download any kind of open-
00:47:32.760 source model and run it locally, and it exposes an API endpoint, and then
00:47:38.280 you just treat it like an API endpoint running locally at a certain host and
00:47:47.119 port. And if it was downloaded using Ollama, or anything else, would it work
00:47:52.800 out of the box with Langchain.rb, or do you still need... Yeah, yes, there you go, just
00:47:58.640 download it.
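(Hedged sketch, not from the talk: langchainrb ships an Ollama wrapper, so pointing it at a locally running model looks roughly like this; verify the class and method names against the gem version you use.)

```ruby
require "langchain"

# Ollama serves downloaded open-source models over a local HTTP endpoint.
llm = Langchain::LLM::Ollama.new(url: "http://localhost:11434")

response = llm.complete(prompt: "Say hello in French")
puts response.completion
```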
00:48:04.920 And actually, I'll mention: there's a guy, Andrew Kane. I'm sure you guys have used
00:48:12.520 his libraries a lot; I mean, he's a titan in the Ruby space.
00:48:20.000 And yeah, I was playing around with his Torch.rb, which was
00:48:25.760 supposed to be a Tensor... I'm sorry, a PyTorch
00:48:30.880 port. So I played around with it a little bit, and it's really powerful,
00:48:37.480 actually. So I think there are a lot of libraries that we can build on top of
00:48:44.559 it. Definitely. You can look him up on GitHub, and he's the creator and
00:48:51.119 maintainer of at least one gem that you're using in your Gemfile.
00:48:58.040 You had a question? Yep. Yeah, I was wondering if you could expand on your
00:49:03.280 vision of Ruby and AI, I guess, because you mentioned in passing that
00:49:09.280 there was a similarity, and that there was no reason for Ruby to not become kind of a big thing, or for AI to become a
00:49:15.559 big thing in Ruby, and I was kind of curious as to where you see that going. Did you hear that
00:49:23.200 correctly? Would you mind repeating that, please? Absolutely, no problem.
00:49:28.280 So you mentioned before the connection between Ruby and AI, and that there's no reason why Ruby, just like
00:49:35.839 Python, wouldn't become the next big thing in AI, and that using Ruby for AI would become sort of a default for
00:49:41.599 Rubyists instead of defaulting to, well, let's choose Python instead. Do you have
00:49:47.680 any insights or anything about that movement? Yeah. Look, it's definitely
00:49:54.839 an uphill battle. Python just happened to be in the right place at the right time with the right libraries. So
00:50:02.200 there are a couple of those data-frame
00:50:08.839 libraries that were utilized and happened to penetrate into academia
00:50:15.440 early on, pandas and NumPy, that help you
00:50:22.599 manage data and change formats and do quick operations on top of it.
00:50:29.440 Again, the way I see it, I think we already have
00:50:34.640 equivalents; basically, Andrew Kane has built a lot of them, people just
00:50:39.799 don't use them. But there's no reason why we couldn't make them just as good as the
00:50:47.559 Python equivalents. Again, it's a very similar language, the syntax is
00:50:53.280 similar, and, you know, Python is just as slow as Ruby, if you think Ruby is
00:51:00.680 slow. And obviously there's a critical mass, a
00:51:06.799 massive cohort of data science and ML experts, that prefers to use Python, and
00:51:14.599 maybe Ruby never gets to be as hardcore in that field,
00:51:21.160 but we love the whole plug-and-play experience. So if we can
00:51:27.680 achieve 80% of the outcomes with 20% of the effort, then I think a lot of Rubyists
00:51:35.760 will be making that tradeoff. So basically, right now, you have
00:51:40.920 a large Ruby stack, or let's say you have a Ruby
00:51:47.000 monolith, and you would like to add some of these capabilities to your products, and now you're basically trying to
00:51:52.599 decide whether to spin up a separate Python Lambda or microservice to
00:51:57.960 add these capabilities. And I think the selling point for using
00:52:06.440 something like Langchain.rb is that maybe you could just add a gem to your Gemfile and get 80% of the
00:52:15.280 results. True that. It's 8 o'clock. Andrei, are you up for
00:52:21.280 one last question? Yeah, absolutely, definitely. Let's keep going then. Yeah, thank you very much for the presentation.
00:52:27.880 And yeah, you just mentioned that Ruby has not yet implemented the parts
00:52:33.000 for the Hugging Face sentence transformers. Since I believe
00:52:38.920 sentence transformers are quite important for vector databases, I'm curious if this actually affects your
00:52:45.839 development of, like, RAG
00:52:52.280 systems. So it does, which
00:52:58.920 is why I've started to look into
00:53:04.240 this. And there's obviously a
00:53:09.440 lot to do there,
00:53:16.960 and I'm also trying to be pragmatic, so the
00:53:24.240 issues people care about more are the things that I'm prioritizing.
00:53:29.440 But given how slow
00:53:34.559 adoption still is on the Ruby side, I think there's still some time. I would actually
00:53:41.640 love to see many more AI-
00:53:47.240 and ML-flavored talks at Ruby meetups and
00:53:52.880 conferences. I don't know if any of you have gone; I didn't
00:53:58.640 go this year, but there was just a RubyConf in San Diego, and I don't think they had any AI talks.
00:54:07.240 And we happen to have one of the key... well, at least a prize winner
00:54:14.880 and a presenter. But last year, anyway, there was one talk about AI.
00:54:21.960 Well, maybe one. Was it the workshop? Yes, the workshop. Yeah, the workshop.
00:54:28.760 So I just personally feel that it's a little
00:54:36.559 bit blindsided to not include any AI talks this year.
00:54:44.440 Definitely. And again, given the hype and the
00:54:49.720 hysteria in the AI field that's been unfolding this year,
00:54:55.520 you should take a bus ticket to come to Montreal, because it's a very easy segue
00:55:00.599 into next month's meetup of Montreal.rb, which focuses on Ruby and AI again, with
00:55:08.160 JS presenting, this time in the medical
00:55:13.319 field. You're not that far. Any other
00:55:20.960 questions? The mic is... yeah, maybe I can ask a question. Go for it. I'm
00:55:28.000 just curious if anyone has experience building applications with
00:55:35.599 LLMs, and what your experience has
00:55:44.720 been. Yeah, I kind of have some experience, because I'm
00:55:51.160 currently working as an AI research intern at a game company. So yeah, my
00:55:56.599 current work is primarily training a large language model for in-game characters, so it's
00:56:03.960 some experimental functionality for a lot of games. So yeah, this is
00:56:10.720 the sort of thing that's similar to the things that you've done
00:56:16.079 here. Yeah. So yeah, we've also
00:56:21.480 implemented our own version of a RAG system, as well as some classifiers to
00:56:28.559 help the model classify the questions and so on.
00:56:36.319 Got it. Thank you, thank you for sharing. Anyone
00:56:43.520 else? You should ask next month, maybe. Yeah, that's a good
00:56:50.920 point. Someone in the audience said we should ask again next month, because... how many of you are going to go
00:56:58.039 and try Langchain.rb when you get back home? Show of hands. Yeah, you'll have a lot more folks
00:57:04.720 with a lot more insights next month. Awesome. Please, please, if
00:57:11.680 there are any pains you're having or any questions, please ping me, tag me, DM me,
00:57:17.079 etc. Do you want to show your last slide again, with your contact details, in the Discord? And in the meantime, a big round
00:57:23.400 of applause for Andrei. Thank you, thank you, thank
00:57:33.319 you. Thank you so much, thank you so much. Thank you for having me. Thank you. I'll
00:57:39.520 do my best to drop in next month. Thank you. Cheers. Thank you.
00:57:44.760 All right. Yep, see you. Bye-
00:57:51.880 bye.