Summarized using AI

Building LLM powered Applications

Andrei Bondarev • December 06, 2023 • Montréal, Canada • Talk

In the talk titled Building LLM powered Applications, presented at the Montreal.rb Meetup in December 2023, Andrei Bondarev, a Solutions Architect and Owner at Source Labs LLC, discusses the impact of Large Language Models (LLMs) on software development and their integration into it, particularly using the Ruby programming language. The presentation highlights the rapid advancements in Generative AI throughout 2023 and their implications for building AI-powered applications. Key points include:

  • AI Hype and Opportunities: The software industry is currently experiencing a surge in interest regarding AI innovations, accelerated by the popularity of tools like ChatGPT. Bondarev notes the disconnect between emerging AI technologies and actual business applications, stressing the need to bridge this gap.

  • Generative AI's Economic Impact: Citing a McKinsey report, Bondarev notes an estimate that Generative AI could add between $2.5 trillion and $4 trillion in annual economic value, in part by enhancing workplace productivity through more efficient internal knowledge management systems.

  • Use Cases in Various Industries: Generative AI is positioned to transform customer service, marketing, and product development by enabling advanced chatbots, personalized marketing strategies, and comprehensive data analysis for development needs.

  • LLM Capabilities: Bondarev describes LLMs as neural networks excelling at tasks such as transforming unstructured data into structured formats, summarization, and classification, illustrating their utility across software development processes.

  • Challenges with LLMs: Issues such as "hallucinations" (the generation of incorrect or nonsensical outputs) and the limits of LLM training data are acknowledged. One proposed solution is Retrieval-Augmented Generation (RAG), in which external knowledge bases improve the accuracy of LLM outputs by supplying contextually relevant factual data.

  • Implementation of RAG Systems: The talk introduces a practical example of a RAG system that connects user queries to a vector search database, illustrating how to encode proprietary data and build a question-answering interface for internal corporate documentation.

  • Why Ruby? Bondarev argues for the continued relevance of Ruby in the AI landscape, emphasizing its strong community support, maintainability, and the potential to integrate advanced AI functionalities without reinventing the wheel.

  • Future Directions: The presentation concludes with a call to action for the Ruby community to engage more with AI developments and to look towards frameworks like Langchain.rb that simplify the use of LLMs in Ruby applications.

Overall, Bondarev’s talk makes a compelling case for leveraging LLMs in modern Ruby applications while addressing existing challenges and encouraging community involvement in these emerging technologies.


Montreal.rb Ruby Talk 2023/12 - Building LLM powered Applications - Andrei Bondarev - Solutions Architect / Owner at Source Labs LLC

The 2023 breakthroughs in Generative AI (Artificial Intelligence) have been taking the software development world by storm. We'll take a look at a few components of what is quickly becoming the modern stack for building LLM (Large Language Model) powered applications. Andrei will build a case for Ruby in the emerging AI trend, and show how some of the AI capabilities can be leveraged today!

Montreal.rb Meetup December 2023

00:00:01.960 I've been running a software development firm, a small company called
00:00:07.200 Source Labs, for the last six years, and we focus on building custom AI-powered
00:00:12.960 technology solutions. Depending on the project or the client size, I play
00:00:20.640 either the role of an architect, an engineering manager, or a fractional CTO
00:00:26.679 if we work with smaller venture-backed companies. So, the year of AI hype and hysteria: this
00:00:35.320 year has been taken over by a lot of hype in the AI field. With the release
00:00:41.559 of ChatGPT, companies are actively trying to develop their
00:00:47.800 capabilities in this field and add LLM-backed features to their products. There's been a massive rise in GenAI-based
00:00:56.120 startups, and I think there's a little bit of this happening, where
00:01:03.320 there are builders on the left-hand side, and everyone is super excited about
00:01:08.439 all these different technologies: the bots, the AI agents, AGI, ChatGPT, and all the
00:01:19.240 different vector databases that are popping up. And then there are real business problems and real
00:01:27.000 customers on the right-hand side that have actual problems, and the two are still a little bit
00:01:33.759 disconnected. So unless we want to have our own "another blockchain" moment, I
00:01:39.159 think we need to address this. So I found this interesting McKinsey report, and
00:01:45.880 I'll have a link towards the end of the slides; they're estimating that
00:01:52.360 generative AI's impact on productivity could be equivalent to about 2.5 to
00:01:59.280 4 trillion dollars annually in value to
00:02:04.759 the global economy. One example of how generative AI could drive a lot of value
00:02:11.200 is companies creating their own internal knowledge management systems. It's estimated that knowledge
00:02:18.800 workers spend about a fifth of their time, or one day every work week, searching for and
00:02:24.239 gathering information. This virtual AI expert would have access to
00:02:31.040 all of the corporate data and the rest of the IP, and a human could just have a conversation in natural language and
00:02:37.800 fetch the data they need much more effectively. So some of the industries
00:02:43.680 that could be transformed, and stand to gain a
00:02:49.239 lot of improvement and value from generative AI, are customer operations
00:02:54.800 and customer service. With self-service, you could build chatbots that
00:03:01.920 offer high personalization and are capable of resolving different complex
00:03:08.440 inquiries. You could also introduce customer-agent augmentation, where a customer
00:03:16.519 service agent is served relevant response suggestions, customer data,
00:03:21.720 and call transcripts in real time. When it comes to marketing and
00:03:27.879 sales, it could yield a lot of value too.
00:03:33.280 Generative AI could bring a lot of value to strategy: gathering market trend
00:03:38.400 analytics, drafting up marketing and sales comms. It would help with brand awareness,
00:03:44.840 conversion, and retention by bringing more human-like
00:03:51.560 product experiences. It's already changing
00:03:56.720 product development: when it comes to the planning phase, generative AI can
00:04:02.760 help with gathering and analyzing usage analytics and trends to produce
00:04:08.400 product requirements, and it's also already helping with the full software development life cycle,
00:04:15.760 whether it's coding, testing, or iterating. And I think there's also a larger shift
00:04:20.880 at play that hasn't fully been
00:04:26.000 realized yet, which is that I think LLMs will be a crucial part of every software
00:04:33.479 stack in the future. So large language models are artificial neural networks with
00:04:40.199 general-purpose language understanding and generation, exploding in popularity after the "Attention Is All You Need"
00:04:47.080 paper from 2017, and they excel at a lot of different tasks. They're really good
00:04:55.039 at converting unstructured data to structured data. As developers, we spend a
00:05:00.360 lot of time turning unstructured data into structured data; we love to program when
00:05:07.840 we have access to structured data, and we hate when we have to deal with unstructured data. The large language
00:05:15.320 models can already serve as that bridge. The LLMs are also really good at
00:05:21.880 summarization, so you could give one a document and have it produce a summary, whether of a blog post or a book for
00:05:29.919 that matter. They're really good at classification as well: if you need to
00:05:34.960 classify a list of blog articles, for example, deciding whether a post should
00:05:41.000 be bucketed as technology, sports, or business by topic, then LLMs are really
00:05:47.080 good at that. Text generation as well, and I definitely need a much better
00:05:52.600 graphic here. And there are still a lot of problems with building LLM
00:05:59.560 applications and working with LLMs. There are obviously hallucinations, and I'm sure everyone here has heard about
00:06:07.440 this: hallucinations are when models generate incorrect or
00:06:14.520 nonsensical text. The LLMs also aren't
00:06:21.800 continuously trained, so they have data up to a certain point. For example, GPT-4
00:06:27.199 was trained on data up through April 2023, so any world events afterwards,
00:06:35.520 it just does not know about them. And also, in some instances, the
00:06:42.080 relevant knowledge is not being leveraged. I've mentioned the example of building a
00:06:48.800 proprietary knowledge management system; well, GPT-4 is not just going to have
00:06:54.080 knowledge about all of your internal corporate documents. And to address this,
00:07:00.560 we'll talk about RAG: what retrieval-augmented generation is, and what a
00:07:06.520 retrieval-augmented generation system is. This technique was first
00:07:13.840 exposed in the 2020 paper called "Retrieval-Augmented Generation for Knowledge-Intensive Natural Language
00:07:21.520 Processing Tasks". It's a technique for enhancing the accuracy and reliability
00:07:27.319 of generative AI models with facts fetched from external sources, and the
00:07:33.400 workflow goes something like the following. I'll drive your
00:07:40.440 attention to the bottom here first: you'll see that a user is
00:07:45.720 interfacing with a system, and they would pose a question and ask, "How do I do X?"
00:07:53.800 Within the RAG system, we would then take this user question, we would
00:08:00.599 generate vector embeddings from that question, we would go to the knowledge base, typically a vector search
00:08:07.479 database, we would find relevant documents by running a similarity search,
00:08:12.680 then we would extract that knowledge and put it in the prompt
00:08:17.759 that is then sent to the LLM, and have the LLM generate or
00:08:23.000 synthesize an answer. The LLM would then answer in natural language and say, "To do this, you need to do this, this, and this,
00:08:30.000 according to these sources." So in order to
00:08:37.320 understand this concept a little bit better, we'll jump into what vector embeddings are, what similarity search is,
00:08:44.320 and what a RAG prompt looks like. So what are vector embeddings? Vector
00:08:51.080 embeddings are a machine learning technique to represent data in an
00:08:57.440 n-dimensional space. An embedding is an array of float numbers
00:09:04.519 of length n, an array of size n, and
00:09:11.240 the LLMs generate these embeddings: they assign these float
00:09:17.480 numbers and put them all in one big array of size a few hundred to a couple thousand,
00:09:27.279 and by doing that they encode the meaning in an embedding space,
00:09:32.920 also called the latent space. So for example, if we take the phrase
00:09:40.839 "this is the holiday season" and put it through an embedding model, then we will get the following vector
00:09:50.519 embedding. And I'll drive your attention here to the graphic on the right-hand
00:09:56.519 side, so you can see. As mentioned, this array
00:10:01.680 contains hundreds or thousands of float numbers, so it's impossible
00:10:08.120 to visualize, but if we were to reduce this concept to a
00:10:13.680 two-dimensional or three-dimensional space, then we can imagine it, and
00:10:20.920 this is what was done in the graphic on the right-hand side. You can see that different concepts are organized and
00:10:29.600 clustered by their meaning: any sports-related phrases or words are all
00:10:36.399 clustered on the bottom right, anything that relates to politics
00:10:41.560 or conflicts is on the left-hand side, and
00:10:46.639 so on.
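To make this concrete, here is a minimal sketch (not from the talk) of generating such an embedding with langchainrb's OpenAI wrapper, assuming an OPENAI_API_KEY environment variable; method names may differ across gem versions.

```ruby
require "langchain"

# Hedged sketch: langchainrb's OpenAI wrapper, assuming OPENAI_API_KEY is set.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# Turn a phrase into a vector embedding: an array of float numbers whose
# position in the latent space encodes the phrase's meaning.
response = llm.embed(text: "this is the holiday season")
vector = response.embedding
puts vector.length # e.g. 1536 floats for OpenAI's embedding models
```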
00:10:54.760 So when we do semantic search, what we're actually doing is we take a query, and then we identify where it is in this latent space, and we
00:11:02.399 try to find the closest elements, hence the term "similarity
00:11:07.760 search". So again, vector search, also synonymous with
00:11:13.240 similarity search or semantic search, is finding the closest data points
00:11:20.680 by their inherent meaning. And, as a technical detail,
00:11:26.160 there are different ways of calculating this distance: you can use the Manhattan distance, the
00:11:33.000 Euclidean distance, cosine similarity, or the Chebyshev distance.
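As an illustration of one of those metrics (mine, not the speaker's), cosine similarity between two embedding vectors can be computed in plain Ruby:

```ruby
# Cosine similarity: close to 1.0 means the vectors point the same way
# (similar meaning); near 0.0 means they are unrelated.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]) # => 1.0
```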
00:11:41.399 So let's summarize all of this again. The RAG pipeline follows
00:11:46.839 the following workflow: a user's question comes in, we generate
00:11:53.040 a vector embedding for that user question, we then go to, typically, a vector
00:11:59.760 database, although it could be any other kind of traditional data source, and we
00:12:04.880 run a similarity search to retrieve documents that are relevant to that question, which the LLM will then
00:12:11.600 use to answer that specific question. Then we construct a RAG prompt that
00:12:17.199 looks something like the following, and the LLM will return a response in
00:12:22.320 natural language. So the prompt here gets
00:12:27.519 converted to text and is then sent to
00:12:33.839 the LLM, whether it's GPT-4 or Claude or
00:12:38.959 one of the Google Cloud LLMs. We would put the context, and
00:12:47.360 these are the relevant documents, in here, so this gets concatenated in. We
00:12:54.399 insert the original question, and some instructions, which basically just tell the LLM something
00:13:02.399 like: be succinct when you're answering the question, or answer it in a certain
00:13:07.480 style, or keep your response to a certain length, or follow a certain format, or
00:13:14.800 respond in JSON format, and this is the JSON schema that I
00:13:22.800 expect you to respond with. And the "Answer:" at the end just signals to the LLM that
00:13:30.000 its response follows from here on out, so it completes the rest of it.
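A rough sketch of assembling such a RAG prompt in Ruby (my reconstruction of the template he describes, with a hypothetical helper name):

```ruby
# Hypothetical helper: stitch retrieved documents, the user's question, and
# instructions into one prompt string, ending with "Answer:" so the LLM
# completes from there.
def build_rag_prompt(question:, documents:)
  <<~PROMPT
    Context information is below.
    ---------------------
    #{documents.join("\n---------------------\n")}
    ---------------------
    Given the context information and not prior knowledge, answer the question.
    Be succinct.

    Question: #{question}
    Answer:
  PROMPT
end
```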
00:13:35.560 So what actually is Langchain.rb? It's a library for
00:13:42.959 leveraging LLMs to build RAG pipelines like I've described, and for vector
00:13:48.120 search and chatbots, and also for doing workflow automation with AI
00:13:54.920 agents. And we aim to lean into the Ruby and Rails
00:14:02.639 community so that it has the plug-and-play look and feel of all the
00:14:09.839 other gems and libraries that we've gotten to love in our space.
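(Aside, not shown in the talk: the library is distributed as the langchainrb gem, so pulling it in looks like any other gem.)

```ruby
# Gemfile
gem "langchainrb"
```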
00:14:18.120 So I'll show this example of a RAG pipeline, and then a very
00:14:27.120 simple demo of an AI-powered internal knowledge management system. I'll try to
00:14:33.920 full-screen it; please let me know if you can see it or not. Can you see
00:14:39.600 it? Okay. And I will just scroll through
00:14:52.880 the technical details here. Okay, a bit bigger, let me
00:15:09.920 try. I'm not sure that I can make it bigger; can you see it, or is
00:15:17.800 it okay? And I will send a link to this
00:15:24.320 presentation, so you can replay it if you want to. I
00:15:29.680 apologize, I recorded this video. You can see that we're instantiating an
00:15:34.880 instance of a vector search database here, and creating a default
00:15:40.000 schema. In this instance I have a benefits brochure that I have
00:15:46.399 stored on my local computer, and I'm going to go ahead and add that to my
00:15:51.920 vector search database, and this will be the knowledge base where data will be retrieved from.
00:15:59.560 I'll skip this importing
00:16:05.279 step. And once that has been imported, you can see that I can ask... we've basically
00:16:11.920 built a sort of Q&A system, and I can ask it questions about that specific document that I uploaded. So
00:16:20.560 the first question, and I hope you all can see this, is: what is the
00:16:26.319 company's vacation policy, how much time can I take off? And it's telling me the answer. What are all
00:16:33.360 the benefits offered? And it's stating that the benefits offered within that benefits brochure that I uploaded
00:16:40.920 include medical, dental, vision, life, short- and long-term disability, etc. What are
00:16:47.240 the parking benefits? What are the 401(k)
00:16:54.720 benefits? Etc. So this is a very simple example with one document, but you can
00:17:00.000 imagine a whole corporate corpus of knowledge uploaded
00:17:06.240 into the system, and I would be able to have a Q&A-style
00:17:12.880 conversation with the system, and it would just fetch relevant data and be able to answer different questions.
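The screencast itself isn't reproduced here, but the flow he narrates maps onto langchainrb roughly as follows. This is a hedged sketch: it assumes Qdrant as the vector store (the talk doesn't name one) and an OpenAI key, and the class and method names follow the gem's README of that era, so they may have shifted since.

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# Instantiate a vector search database client (Qdrant here, as an assumption).
client = Langchain::Vectorsearch::Qdrant.new(
  url: ENV["QDRANT_URL"],
  api_key: ENV["QDRANT_API_KEY"],
  index_name: "company_docs",
  llm: llm
)

# Create the default schema, as in the demo.
client.create_default_schema

# Import the local benefits brochure into the knowledge base.
client.add_data(paths: ["benefits_brochure.pdf"])

# Q&A against the uploaded document, RAG-style.
puts client.ask(question: "What is the company's vacation policy? How much time can I take off?")
```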
00:17:21.120 So another thing that I wanted to cover is agents. So
00:17:28.199 AI agents are autonomous, general-purpose, LLM-powered
00:17:33.760 programs, and they can be used to automate different workflows and
00:17:39.600 business processes and to execute multi-step tasks. They work best with
00:17:45.760 powerful LLMs, and they can also use tools, and tools are basically different APIs, or
00:17:52.919 internal databases, or different systems. And I'll show you an example of
00:18:00.400 an agent here as well; it utilizes an agent that's
00:18:06.960 built into Langchain.rb. So first we're just instantiating a
00:18:14.600 weather tool, and it's basically a tool that can be used to just fetch the current
00:18:23.200 weather. We have a Ruby code interpreter tool, a Google search tool,
00:18:31.400 self-explanatory, and a calculator. And we're going to use,
00:18:38.200 I mentioned that it works with powerful LLMs, but you can
00:18:44.039 use any other API as well; you're not tied to using
00:18:51.559 OpenAI. And so we're instantiating the agent here: we're calling Langchain's ReAct agent,
00:18:58.960 we're passing it the LLM that it's going to be using, which is OpenAI, and we're passing it a collection of tools, so it has access to basically
00:19:07.520 the weather API, the Google Search API, it can execute Ruby code, and it can also
00:19:13.240 use the calculator.
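A hedged reconstruction of the setup he scrolls through, using the tool and agent classes langchainrb shipped around that time (the exact names and required API keys are assumptions; check the gem's current docs):

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# The four tools from the demo; each wraps an external API or a local runtime.
tools = [
  Langchain::Tool::Weather.new(api_key: ENV["OPEN_WEATHER_API_KEY"]),
  Langchain::Tool::GoogleSearch.new(api_key: ENV["SERPAPI_API_KEY"]),
  Langchain::Tool::RubyCodeInterpreter.new,
  Langchain::Tool::Calculator.new
]

# ReAct ("reasoning and acting") agent: the LLM decides which tool to invoke
# at each step, and the library executes the tool call on the LLM's behalf.
agent = Langchain::Agent::ReActAgent.new(llm: llm, tools: tools)

agent.run(question: "Find the current weather in Boston, MA and in Washington, DC and take the average.")
```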
00:19:20.480 And you might be wondering why the LLM needs to have access to the calculator. Because actually, LLMs are
00:19:29.000 not that great at math, so for more complex calculations, utilizing the
00:19:36.159 calculator actually yields much better results. And so in this example, you can
00:19:43.640 see that I'm asking the agent to find the current weather in Boston,
00:19:49.720 Massachusetts and Washington, DC and take the average.
00:19:55.440 So the prompt was sent to the OpenAI
00:20:00.679 LLM, and then it decided that it first needed to invoke the weather
00:20:07.760 tool and find the weather for Boston,
00:20:14.640 Massachusetts. So you can see what follows is the tool execution: we make
00:20:19.880 a call on the LLM's behalf to that tool to find the weather for Boston
00:20:26.280 and send the result back. And then, in the next step, it decides that
00:20:31.799 it needs to invoke the weather tool and find the weather for Washington,
00:20:37.679 DC. So on the LLM's behalf we invoke that tool, we call that
00:20:43.640 API and find the current weather for Washington, DC and send it back to the
00:20:50.960 LLM. And then it calls the calculator with a very simple
00:20:58.799 equation, where it takes the two temperatures and splits them down the middle, which was the original
00:21:05.600 prompt. And then it responds; all the way at the bottom you can see "the
00:21:11.279 average current temperature in Boston, Massachusetts and Washington, DC is
00:21:16.760 84.2...". So the next thing we ask it
00:21:24.080 is to find the current ruble-USD exchange
00:21:33.159 rate. Sorry, I'll go back real quick. And it decides that, in order to
00:21:41.080 answer that question, it needs to do a Google search,
00:21:47.919 so it decides to call the Google Search tool. And we call the
00:21:54.600 Google Search tool on its behalf with the following query, "ruble USD
00:22:00.720 exchange rate", we send the result back, and then the LLM synthesizes that answer
00:22:07.600 into a more natural-sounding response. And lastly, we tell it to use a
00:22:16.200 Ruby program to output the sum of the Fibonacci sequence for 1 through
00:22:21.240 1,000, and we ask it not to define any Ruby methods. So the prompt is sent to the
00:22:29.720 LLM, and it decides to invoke the Ruby code interpreter, and we execute the Ruby tool
00:22:36.799 on its behalf with this one-liner that it came up
00:22:44.039 with, and we send the result back, and it tells us that the sum of the Fibonacci sequence is this number.
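(For flavor, code of the kind the interpreter tool might have produced; this is my illustration, assuming "the sum of the first 1,000 Fibonacci numbers", since the generated one-liner isn't shown in the transcript.)

```ruby
# Sum the first 1,000 Fibonacci numbers without defining any Ruby methods.
fib = Enumerator.new { |y| a, b = 0, 1; loop { y << a; a, b = b, a + b } }
puts fib.first(1000).sum
```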
00:22:56.000 Okay, so why Ruby? I find that this
00:23:03.679 is an outstanding question that comes back just about every year.
00:23:10.600 So why did we build this in Ruby? I don't think Ruby's going
00:23:16.440 anywhere. I think in this climate of economic
00:23:24.200 contraction, or stagnation, depending on who you ask, Ruby is actually an
00:23:29.480 excellent choice. I find that there's a very healthy amount of
00:23:34.919 pragmatism in the Ruby community: we tend to not reinvent the wheel, the whole notion of "there's a gem for that". We
00:23:42.559 love to solve actual business problems and move quickly, and we don't reinvent
00:23:50.240 the wheel like some of the other ecosystems. I think generally
00:23:56.919 the monoliths are back in fashion. We've done a
00:24:03.360 project for a client where we actually
00:24:08.559 consolidated and rewrote their application from about 15 Node.js
00:24:14.400 microservices into a single Rails monolith. Who would have thought we
00:24:20.120 would be going all the way back? And that was a successful project, and I think a lot
00:24:26.520 more companies are starting to do that, because they're looking at their
00:24:32.399 costs, how much they're spending on maintaining and managing this type of
00:24:37.559 complicated microservices stack, and in a lot of different instances it just doesn't make
00:24:43.679 sense. And also, Ruby is very similar to Python in that it's also
00:24:49.919 written in C. There are a lot of Python
00:24:55.760 data-science-centric libraries that are just pure
00:25:01.720 Python wrappers on top of C functions. There's no reason why we couldn't have equivalents in Ruby; I
00:25:08.600 think we should. And Python is also very similar to
00:25:15.080 Ruby in that it has the same problems with multiprocessing and
00:25:22.960 so on. So I think we ought to have
00:25:28.360 a lot of these capabilities in Ruby as well. And that is
00:25:39.399 it. Thank
00:25:56.679 you.
00:26:13.120 Any
00:26:26.600 questions? No... no audio, unfortunately.
00:26:37.640 Yes... so how do we do this without too
00:26:56.600 much...
00:27:23.840 Andrei, can you hear me? Yes. Give me a thumbs up, because we muted the speaker so we didn't have any echo. Okay, so you
00:27:30.240 can hear me all right. We can probably use the laptop
00:27:35.520 sound; if everything is really quiet, we should be fine. Go ahead and say
00:27:43.679 something. Good enough for everyone? All right, we're going to do it this way. Extremely sorry about the whole mic
00:27:49.760 situation. So, do we have any
00:27:55.720 questions? The question was about dealing with hallucinations: if you have an agent that does a task, you
00:28:04.159 have to be able to trust it; how do you deal with that? All right, so the question is about hallucinations.
00:28:11.760 So if you have anything to add about hallucinations, but also, if you create an agent, how do you deal with
00:28:18.679 hallucinations in general? If you were to automate something, or if you had to just deal with the concept in general, can you
00:28:24.279 elaborate a little bit for us? Thanks. Yeah...
00:28:29.440 sorry, there's a lot of feedback, I can hear myself. I think it's good.
00:28:37.080 Thank you for your question. So I'm going to repeat the question: I think it was,
00:28:42.440 if you build an agent and the agent hallucinates, how do you
00:28:49.320 deal with it? So I think there are a couple of things you need
00:28:56.399 to do. You definitely need to be careful about giving it access
00:29:02.320 to your infrastructure.
00:29:09.399 For example, we also have an agent that can execute SQL queries, and I would
00:29:15.799 say, if you don't create a separate role, maybe just
00:29:23.320 a read role in your database, and you have the agent execute ALTER, MODIFY, and INSERT
00:29:31.080 queries, it may not be a good idea. Also, before an agent decides
00:29:37.720 to do so, you can try to sanitize
00:29:44.240 some of the output and try to put some filters in, so
00:29:50.679 you would parse the response and, again, maybe search for those keywords, like REPLACE,
00:29:55.840 INSERT, ALTER, and just raise an error if the LLM responds with
00:30:03.519 that. In terms of inaccurate data, I think agents are
00:30:12.320 relatively reliable when it comes to very narrow tasks. So, for example, I showed a
00:30:21.480 couple of examples: one of the examples was fetching the current weather and
00:30:28.600 then doing something with it, and then the other example was fetching the exchange rate.
00:30:37.240 I'm not sure I would have the
00:30:43.440 same agent responsible for those different tasks; maybe I would create separate agents and keep
00:30:49.440 tailoring them and keep putting guard rails around them, basically closing
00:30:55.559 in on the number of different tasks that they're able to do. Because if you want to build a very
00:31:02.880 general agent that can do everything, you're going to have a bad time.
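(Illustration, not from the talk: the keyword filter he describes could look like this naive guard; a separate read-only database role remains the real safety net.)

```ruby
# Naive guard: refuse agent-generated SQL containing destructive keywords
# before it reaches the database. Hypothetical helper, illustration only.
DESTRUCTIVE_SQL = /\b(insert|update|delete|alter|drop|truncate|replace)\b/i

def assert_read_only!(sql)
  raise ArgumentError, "Refusing destructive SQL: #{sql}" if sql.match?(DESTRUCTIVE_SQL)
  sql
end
```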
00:31:19.080 I changed something; is that good now?
00:31:25.120 Yeah, I can hear it.
00:31:34.240 Lower the volume over here. Yes, on the left side, that's it.
00:31:39.760 All right, testing again. Give me your ABC. ABC. All right,
00:31:47.720 cool, I think we've got a working setup. It turns out there's a volume dial on the mic; it's not a mute button, it's
00:31:54.399 an actual volume control, so we can pass the mic around. Next question?
00:32:03.080 Thanks for the great talk. I saw in your code that you used a specific agent called the ReAct agent.
00:32:10.919 Are there many such agents, and what's the main difference between them? How do we
00:32:16.240 pick? Yeah, so currently there are two
00:32:21.279 agents: there's a SQL retriever and there's a ReAct agent. ReAct stands for
00:32:28.639 "reasoning and acting", and basically the way to think about this is that
00:32:35.559 there's a lot of research being done currently into how these LLMs could
00:32:43.200 be structured to empower these agents,
00:32:48.960 so the top people in the field are reading research papers
00:32:54.880 every day and are implementing techniques as they're
00:33:00.039 being published, and the techniques are constantly evolving.
00:33:07.240 Actually, the functionality... I'm not sure if any of you have
00:33:12.639 played around with it, the Assistants or the GPTs that OpenAI
00:33:19.360 just released about a month ago. You know, I shouldn't say that it
00:33:26.480 was innovative, because actually a lot of startups have been building
00:33:31.559 things like that for about six months now, so they were
00:33:37.960 a little bit behind in terms of these AI
00:33:43.799 agents. I think the next generation, and actually what I'm
00:33:49.840 trying to do with Langchain.rb
00:33:55.320 specifically, is to combine the different agent implementations and make
00:34:02.320 them a little bit easier to use. So, I mentioned the SQL retriever agent is
00:34:08.040 tailored for interfacing with a SQL database and then
00:34:14.520 answering questions on top of it, and I think it
00:34:20.240 could be combined with the ReAct agent, where a SQL database is just one of the tools that the agent has
00:34:26.560 access to.
00:34:34.000 Thanks. Next one? Hey there, yeah, thanks again for
00:34:41.399 the great talk. This might not be a super coherent question, but I'm interested in this use case where you
00:34:47.720 were building a question-and-answer system for internal company documentation. In the examples
00:34:53.240 you've talked about, it seems like you take all this data that's in these private internal documents, and you find
00:34:59.280 a way to mix it together with all the information that's in this large language model, and then create something
00:35:05.440 useful out of that. I was just wondering if you could talk a little more specifically, maybe in technical detail, about how you analyze the
00:35:11.880 internal private company documents and get that data to play well with what's already in the deep neural network in
00:35:18.440 the large language model? Yeah, that's a good question.
00:35:25.800 So this is why my talk was focused around
00:35:32.800 this retrieval-augmented generation technique. A lot of companies are doing
00:35:39.000 this right now. I guarantee that most of the startups that are generative AI
00:35:45.880 startups are literally doing that: they're taking some sort of data source and they're building RAG systems on top
00:35:53.200 of it.
00:36:02.599 So there's a lot there, but I'll go into the first concept,
00:36:08.960 which is taking that proprietary data, taking the company data,
00:36:14.480 and putting it into a
00:36:20.599 vector database. Basically what you're doing there
00:36:28.720 is you're chunking the data, you're splitting the data. You
00:36:34.319 have a corpus of company data, right? So it's PDFs, maybe it's
00:36:40.200 audio files, maybe it's video files, it's, I don't know, previous Slack
00:36:46.200 conversations, it's product requirements documents, it's marketing copy, it's sales
00:36:52.400 brochures, etc. So you're building this kind of knowledge system, with
00:37:00.160 the hope that if someone needs to... let's say a new
00:37:06.960 employee starts, and they have all these questions about why certain decisions were made, or what on-brand
00:37:16.079 marketing copy looks like, or how do you even write the code, and what are
00:37:22.200 our coding styles, and why does this exist as a separate microservice?
00:37:27.800 So the hope is that you would build a system, and have employees interface with the system and be
00:37:33.319 able to have those questions answered from that data, and maybe not bug other employees to get them
00:37:39.319 answered. So the first thing you would do is you would collect this data, and you would
00:37:44.440 chunk this data and put it into a vector database. And in that
00:37:53.319 process, well, I walked through what vector embeddings are, right? So
00:38:00.040 basically you're taking all this data and you're encoding it in these
00:38:06.480 massive arrays of float numbers. I'm not going to go into it, and I'm also probably not qualified
00:38:13.640 to talk about this, but there are these very complicated algorithms for how
00:38:19.680 you take a text and encode it,
00:38:25.359 assigning it an array of float numbers that encodes its meaning. So it's
00:38:30.800 an array of size 1,000-something, 1,500,
00:38:36.119 of float numbers, that encodes the meaning. And what you're doing is you're taking all that data, you're generating
00:38:43.560 vector embeddings for all that data, and you're putting it into a vector database.
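(A toy illustration of the chunking step he mentions, mine rather than the speaker's; real pipelines use smarter, structure-aware splitters such as the ones langchainrb wraps.)

```ruby
# Split a long document into overlapping fixed-size character windows so that
# each chunk can be embedded and stored in the vector database separately.
def chunk(text, size: 1000, overlap: 200)
  step = size - overlap
  (0...text.length).step(step).map { |offset| text[offset, size] }
end

chunks = chunk(File.read("benefits_brochure.txt"))
```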
00:38:49.359 There are a ton of them; a ton of vector databases have sprung up this
00:38:54.800 year. I think we're going to see which ones are here to stay, which ones are going to be acquired,
00:39:01.960 etc. And then what that
00:39:08.920 allows you to do is, now you can come in with a question and
00:39:18.720 search the vector database for documents that match your question. So if I'm
00:39:25.440 coming in and asking, why does user
00:39:31.200 authentication... why is the email service a separate Lambda
00:39:36.760 function? Why don't we just
00:39:43.960 have a subscription to MailChimp, for example? It's going to go into the vector search database, and it's going
00:39:50.280 to find all those documents that have not just the matching
00:39:56.599 keywords but also semantic similarity, similarity by meaning. It's going
00:40:03.480 to fetch all those documents, and then there's a bit of,
00:40:10.040 in some instances, it's more of an
00:40:15.960 art than a science here. So you would take those relevant
00:40:21.040 documents, you would then construct a prompt, and I mentioned this in one
00:40:26.079 of the slides, let me go back real quick: you would construct this
00:40:33.839 RAG prompt. You would take this context, concatenate it
00:40:40.280 in, and you would take your original question, which was, why does this service exist as a microservice,
00:40:47.560 etc., and you would then have the
00:40:53.200 LLM construct a response. So with the
00:40:58.359 LLM, the knowledge that it has... you're not combining, you're not
00:41:04.520 adding knowledge to its training data; you're just telling it, I want you
00:41:10.400 to answer this question with this knowledge,
00:41:17.280 kind of inside, in context. And its reasoning
00:41:25.240 abilities, its reasoning abilities, give it the power to be
00:41:31.160 able to do so: to take your question, understand what you're asking, understand what the context
00:41:38.240 represents, and then give you a coherent, a super lengthy and hopefully
00:41:47.760 helpful, answer to your question. Great
00:41:53.160 question, great answer. Yeah, great. I mean, just to summarize: is the idea sort of that
00:41:59.720 you're using the vector space to cluster your internal documents, or classify them rather, and then you're
00:42:06.720 getting the LLM to sort of figure out, to summarize the information in
00:42:11.880 the documents? So the LLM is used for the summarization strategy, kind of? Is that the one-
00:42:19.000 word or one-sentence idea? I wouldn't say summarization; I would maybe use the word
00:42:25.160 "compression". Yeah, thank you. You would compress the meaning into this array-of-float-
00:42:32.839 numbers representation. Thanks, Andrei. We have time,
00:42:38.440 I think, for a few more questions. Yeah, go ahead, Paul. Hi, thanks for the talk.
00:42:46.319 So, regarding this: does this protect against
00:42:52.319 prompt injection? And, I guess, do we care about that?
00:42:58.000 So, Langchain.rb gives you optionality to use any sort of LLM, so
00:43:05.200 you can use an off-the-shelf OpenAI LLM, you can use, as I've mentioned, a
00:43:14.440 Google PaLM LLM. Google just came out with a new one today called Gemini; they claim that it's just as good as
00:43:20.880 GPT-4, I haven't played around with it yet. Or you can also use a local LLM; for
00:43:28.920 example, there's Llama 2, which was open-sourced by Meta. You can run that
00:43:35.240 locally if you want to. You need to have certain compute
00:43:41.319 requirements to be able to run that in a performant way, but you can
00:43:47.520 absolutely do that. And what's going to happen is that these models are going to
00:43:54.559 get more efficient, they're going to get faster, they're going to get really
00:44:00.119 powerful, they're going to be open source, and they're coming to every device
00:44:05.520 that you own. So it's going to be running locally on your phone, it's going to
00:44:11.200 be small and fast, and it's going to be running on your laptop. That's
00:44:17.280 definitely going to happen. But the best open-
00:44:24.599 source models are still a little bit behind the proprietary ones. So when
00:44:30.040 the open-source models are just as powerful, you basically don't have to worry about sending your
00:44:36.400 financial statements to OpenAI accidentally.
00:44:41.960 And obviously, vector search databases: you can either pay for a managed
00:44:47.960 service or, just like any other database, you can provision them yourself. Your question about
00:44:57.319 prompt injection: prompt
00:45:02.680 injection is kind of similar to how you would sanitize... there was a question earlier about
00:45:09.599 making sure that the agent doesn't go rogue on you. There are some tricks that
00:45:15.640 you can do to protect your system from prompt injections as well, but it's not
00:45:23.040 bulletproof. I mean, people have been able to get the original prompts for GitHub Copilot
00:45:32.599 and the suite of Microsoft products that use LLMs, so it's
00:45:41.119 not... I don't think it's currently a solved problem. But yeah, really good
00:45:51.280 question. Thank you. Don't be
00:45:58.640 shy. Oh well, maybe one last
00:46:05.559 one. Hi, thank you for your talk. The question is: in Python we have access
00:46:12.400 to Hugging Face. How could it be possible to use Hugging Face repositories with
00:46:18.359 Langchain.rb? Is it something that's going to be added in the future, or
00:46:24.559 do you see it being added in the future? Yeah, so you can use some of
00:46:30.760 the Hugging Face endpoints right now, some of the APIs, but there
00:46:38.960 is something missing, which is the sentence transformers.
00:46:48.200 And I was actually looking into this the other day, and I think we can build
00:46:54.880 it in Ruby. So I don't want to promise anything, but I
00:47:00.359 think it's doable. That'd be really nice. I mean, you can download the Hugging Face models; if
00:47:07.240 you have them locally, you still need Langchain.rb to do some form of transformation
00:47:12.599 between the local model and the Langchain.rb library. So for the local models, I
00:47:19.680 actually love a tool called Ollama; I think the website is
00:47:27.599 ollama.ai, and you can just download any kind of open-
00:47:32.760 source model and run it locally, and it exposes an API endpoint, and then
00:47:38.280 you just treat it like an API endpoint running locally at a certain host and
00:47:47.119 port. And if it was downloaded using Ollama, or anything else, would it work
00:47:52.800 out of the box with Langchain.rb, or do you still need... Yeah, yes, there you go, just
00:47:58.640 download it.
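(Hedged sketch, not from the talk: langchainrb ships an Ollama wrapper, so pointing it at a locally running model looks roughly like this; verify the class and method names against the gem version you use.)

```ruby
require "langchain"

# Ollama serves downloaded open-source models over a local HTTP endpoint.
llm = Langchain::LLM::Ollama.new(url: "http://localhost:11434")

response = llm.complete(prompt: "Say hello in French")
puts response.completion
```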
00:48:04.920 And actually, I'll mention: there's a guy, Andrew Kane. I'm sure you guys have used
00:48:12.520 his libraries a lot; I mean, he's a titan in the Ruby space.
00:48:20.000 And yeah, I was playing around with his Torch.rb, which was
00:48:25.760 supposed to be a Tensor... I'm sorry, a PyTorch
00:48:30.880 port. So I played around with it a little bit, and it's really powerful,
00:48:37.480 actually. So I think there are a lot of libraries that we can build on top of
00:48:44.559 it. Definitely. You can look him up on GitHub, and he's the creator and
00:48:51.119 maintainer of at least one gem that you're using in your Gemfile.
00:48:58.040 You had a question? Yep. Yeah, I was wondering if you could expand on your
00:49:03.280 vision of Ruby and AI, I guess, because you mentioned in passing that
00:49:09.280 there was a similarity, and that there was no reason for Ruby to not become kind of a big thing, or for AI to become a
00:49:15.559 big thing in Ruby, and I was kind of curious as to where you see that going. Did you hear that
00:49:23.200 correctly? Would you mind repeating that, please? Absolutely, no problem.
00:49:28.280 So you mentioned before the connection between Ruby and AI, and that there's no reason why Ruby, just like
00:49:35.839 Python, wouldn't become the next big thing in AI, and that using Ruby for AI would become sort of a default for
00:49:41.599 Rubyists instead of defaulting to, well, let's choose Python instead. Do you have
00:49:47.680 any insights or anything about that movement? Yeah. Look, it's definitely
00:49:54.839 an uphill battle. Python just happened to be in the right place at the right time with the right libraries. So
00:50:02.200 there are a couple of those data-frame
00:50:08.839 libraries that were utilized and happened to penetrate into academia
00:50:15.440 early on, pandas and NumPy, that help you
00:50:22.599 manage data and change formats and do quick operations on top of it.
00:50:29.440 Again, the way I see it, I think we already have
00:50:34.640 equivalents; basically, Andrew Kane has built a lot of them, people just
00:50:39.799 don't use them. But there's no reason why we couldn't make them just as good as the
00:50:47.559 Python equivalents. Again, it's a very similar language, the syntax is
00:50:53.280 similar, and, you know, Python is just as slow as Ruby, if you think Ruby is
00:51:00.680 slow. And obviously there's a critical mass, a
00:51:06.799 massive cohort of data science and ML experts, that prefers to use Python, and
00:51:14.599 maybe Ruby never gets to be as hardcore in that field,
00:51:21.160 but we love the whole plug-and-play experience. So if we can
00:51:27.680 achieve 80% of the outcomes with 20% of the effort, then I think a lot of Rubyists
00:51:35.760 will be making that tradeoff. So basically, right now, you have
00:51:40.920 a large Ruby stack, or let's say you have a Ruby
00:51:47.000 monolith, and you would like to add some of these capabilities to your products, and now you're basically trying to
00:51:52.599 decide whether to spin up a separate Python Lambda or microservice to
00:51:57.960 add these capabilities. And I think the selling point for using
00:52:06.440 something like Langchain.rb is that maybe you could just add a gem to your Gemfile and get 80% of the
00:52:15.280 results. True that. It's 8 o'clock. Andrei, are you up for
00:52:21.280 one last question? Yeah, absolutely, definitely. Let's keep going then. Yeah, thank you very much for the presentation.
00:52:27.880 And yeah, you just mentioned that Ruby has not yet implemented the parts
00:52:33.000 for the Hugging Face sentence transformers. Since I believe
00:52:38.920 sentence transformers are quite important for vector databases, I'm curious if this actually affects your
00:52:45.839 development of, like, RAG
00:52:52.280 systems. So it does, which
00:52:58.920 is why I've started to look into
00:53:04.240 this. And there's obviously a
00:53:09.440 lot to do there,
00:53:16.960 and I'm also trying to be pragmatic, so the
00:53:24.240 issues people care about more are the things that I'm prioritizing.
00:53:29.440 But given how slow
00:53:34.559 adoption still is on the Ruby side, I think there's still some time. I would actually
00:53:41.640 love to see many more AI-
00:53:47.240 and ML-flavored talks at Ruby meetups and
00:53:52.880 conferences. I don't know if any of you have gone; I didn't
00:53:58.640 go this year, but there was just a RubyConf in San Diego, and I don't think they had any AI talks.
00:54:07.240 And we happen to have one of the key... well, at least a prize winner
00:54:14.880 and a presenter. But last year, anyway, there was one talk about AI.
00:54:21.960 Well, maybe one. Was it the workshop? Yes, the workshop. Yeah, the workshop.
00:54:28.760 So I just personally feel that it's a little
00:54:36.559 bit blindsided to not include any AI talks this year.
00:54:44.440 Definitely. And again, given the hype and the
00:54:49.720 hysteria in the AI field that's been unfolding this year,
00:54:55.520 you should take a bus ticket to come to Montreal, because it's a very easy segue
00:55:00.599 into next month's meetup of Montreal.rb, which focuses on Ruby and AI again, with
00:55:08.160 JS presenting, this time in the medical
00:55:13.319 field. You're not that far. Any other
00:55:20.960 questions? The mic is... yeah, maybe I can ask a question. Go for it. I'm
00:55:28.000 just curious if anyone has experience building applications with
00:55:35.599 LLMs, and what your experience has
00:55:44.720 been. Yeah, I kind of have some experience, because I'm
00:55:51.160 currently working as an AI research intern at a game company. So yeah, my
00:55:56.599 current work is primarily training a large language model for in-game characters, so it's
00:56:03.960 some experimental functionality for a lot of games. So yeah, this is
00:56:10.720 the sort of thing that's similar to the things that you've done
00:56:16.079 here. Yeah. So yeah, we've also
00:56:21.480 implemented our own version of a RAG system, as well as some classifiers to
00:56:28.559 help the model classify the questions and so on.
00:56:36.319 Got it. Thank you, thank you for sharing. Anyone
00:56:43.520 else? You should ask next month, maybe. Yeah, that's a good
00:56:50.920 point. Someone in the audience said we should ask again next month, because... how many of you are going to go
00:56:58.039 and try Langchain.rb when you get back home? Show of hands. Yeah, you'll have a lot more folks
00:57:04.720 with a lot more insights next month. Awesome. Please, please, if
00:57:11.680 there are any pains you're having or any questions, please ping me, tag me, DM me,
00:57:17.079 etc. Do you want to show your last slide again, with your contact details, in the Discord? And in the meantime, a big round
00:57:23.400 of applause for Andrei. Thank you, thank you, thank
00:57:33.319 you. Thank you so much, thank you so much. Thank you for having me. Thank you. I'll
00:57:39.520 do my best to drop in next month. Thank you. Cheers. Thank you.
00:57:44.760 All right. Yep, see you. Bye-
00:57:51.880 bye.