00:00:01.960
I've been running a software development firm, a small company called
00:00:07.200
Source Labs, for the last six years, and we focus on building custom AI-powered
00:00:12.960
technology solutions. Depending on the project or the client size, I play
00:00:20.640
the role of either an architect, an engineering manager, or a fractional CTO
00:00:26.679
when we work with smaller venture-backed companies. So, the age of AI hype and hysteria. This
00:00:35.320
year has been taken over by a lot of hype in the AI field. With the release
00:00:41.559
of ChatGPT, companies are actively trying to develop their
00:00:47.800
capabilities in this field and add LLM-backed features to their products, and there's been a massive rise in gen-AI-based
00:00:56.120
startups. And I think there's a bit of a disconnect happening, where
00:01:03.320
there are builders on the left-hand side, everyone super excited about
00:01:08.439
all these different technologies: the bots, the AI agents, AGI, ChatGPT, and all the
00:01:19.240
different vector databases that are popping up. And then there are real business problems and real
00:01:27.000
customers on the right-hand side that have actual problems, and the two are still somewhat
00:01:33.759
disconnected. So unless we want to have our own blockchain moment, I
00:01:39.159
think we need to address this. I found an interesting McKinsey report, which
00:01:45.880
I'll link toward the end of the slides. They estimate that
00:01:52.360
generative AI's impact on productivity could be equivalent to about $2.6 to
00:01:59.280
$4.4 trillion annually in value to
00:02:04.759
the global economy. One example of how generative AI could drive a lot of value
00:02:11.200
is companies creating their own internal knowledge management systems. It's estimated that knowledge
00:02:18.800
workers spend about a fifth of their time, or one day every work week, searching for and
00:02:24.239
gathering information. This virtual AI expert would have access to
00:02:31.040
all of the corporate data and the rest of the IP, and a human could just have a conversation in natural language and
00:02:37.800
fetch the data they need much more effectively. So, some of the industries
00:02:43.680
that could be transformed, and stand to gain a
00:02:49.239
lot of improvement and value from generative AI, are customer operations and
00:02:54.800
customer service. With self-service, you could build chatbots that
00:03:01.920
offer high personalization and are capable of resolving complex
00:03:08.440
inquiries. You could also introduce customer-agent augmentation, where a customer
00:03:16.519
service agent is served relevant response suggestions, customer data,
00:03:21.720
and call transcripts in real time. When it comes to marketing and
00:03:27.879
sales,
00:03:33.280
generative AI could bring a lot of value to strategy: gathering market trend
00:03:38.400
analytics and drafting marketing and sales communications. It would help with brand awareness,
00:03:44.840
conversion, and retention by bringing more human-like
00:03:51.560
product experiences. It's already changing
00:03:56.720
product development: in the planning phase, generative AI can
00:04:02.760
help with gathering and analyzing usage analytics and trends to produce
00:04:08.400
product requirements, and it's also already helping with the full software development life cycle,
00:04:15.760
whether that's coding, testing, or iterating. And I think there's also a larger shift
00:04:20.880
at play that hasn't fully been
00:04:26.000
realized yet: I think LLMs will be a crucial part of every software
00:04:33.479
stack in the future. Large language models are artificial neural networks with
00:04:40.199
general-purpose language understanding and generation, whose popularity exploded after the "Attention Is All You Need"
00:04:47.080
paper in 2017, and they excel at a lot of different tasks. They're really good
00:04:55.039
at converting unstructured to structured data. As developers, we spend a
00:05:00.360
lot of time turning unstructured data into structured data: we love to program when
00:05:07.840
we have access to structured data, and we hate when we have to deal with unstructured data. The large language
00:05:15.320
models can already serve as that bridge.
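As an illustration of that unstructured-to-structured bridge, here is a hedged sketch of an extraction prompt. The field names and instructions are invented for this example, and the actual call to a model is omitted; the function only builds the prompt string.

```ruby
# Hypothetical unstructured-to-structured extraction prompt.
# The schema ("name", "company", "role") is made up for illustration;
# in practice you'd send this prompt to an LLM and parse its JSON reply.
def extraction_prompt(free_text)
  <<~PROMPT
    Extract the following fields from the text below and respond
    with JSON only, using the keys "name", "company", and "role".
    If a field is missing, use null.

    Text: #{free_text}
  PROMPT
end
```

The model's JSON reply is what gives developers the structured data they prefer to program against.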
00:05:21.880
They're also really good at summarization: you could give one a document and have it produce a summary of a blog post, or a book for
00:05:29.919
that matter. They're really good at classification as well: if you need to
00:05:34.960
classify a list of blog articles, for example whether a post should
00:05:41.000
be bucketed as technology, sports, or business by topic, then LLMs are really
00:05:47.080
good at that. Text generation as well. And I definitely need a much better
00:05:52.600
graphic here. There are still a lot of problems with building LLM
00:05:59.560
applications and working with LLMs. There's obviously hallucination, and I'm sure everyone here has heard about
00:06:07.440
this: hallucinations are when models generate incorrect or
00:06:14.520
nonsensical text. The LLMs also aren't
00:06:21.800
continuously trained, so they have data up to a certain point. For example, GPT-4
00:06:27.199
was trained on data up through April 2023, so any world events afterwards
00:06:35.520
it just does not know about. And also, in some instances, the
00:06:42.080
relevant knowledge is not being leveraged. If, as in the example I mentioned, you're building a
00:06:48.800
proprietary knowledge management system, well, GPT-4 is not just going to have
00:06:54.080
knowledge about all of your internal corporate documents. To address this,
00:07:00.560
we'll talk about RAG: what retrieval-augmented generation is, and what a
00:07:06.520
retrieval-augmented generation system is. This technique was first
00:07:13.840
proposed in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive Natural Language
00:07:21.520
Processing Tasks". It's a technique for enhancing the accuracy and reliability
00:07:27.319
of generative AI models with facts fetched from external sources, and the
00:07:33.400
workflow goes something like the following. I'll draw your
00:07:40.440
attention to the bottom here first: you'll see that a user is
00:07:45.720
interfacing with the system, and they pose a question, asking "how do I do X?"
00:07:53.800
Within the RAG system, we then take this user question, we
00:08:00.599
generate vector embeddings from that question, we go to the knowledge base, typically a vector search
00:08:07.479
database, and we find relevant documents by running a similarity search.
00:08:12.680
Then we extract that knowledge and put it in the prompt
00:08:17.759
that is then sent to the LLM, and have the LLM generate, or
00:08:23.000
synthesize, an answer. The LLM answers in natural language and says "to do this you need to do this, this, and this,
00:08:30.000
according to these sources." So, in order to
00:08:37.320
understand this concept a little bit better, we'll jump into what vector embeddings are, what similarity search is,
00:08:44.320
and what a RAG prompt looks like. So what are vector embeddings? Vector
00:08:51.080
embeddings are a machine learning technique to represent data in an
00:08:57.440
n-dimensional space: an array of float numbers
00:09:04.519
of length n, an array of size n.
00:09:11.240
The models generate these embeddings, assigning float
00:09:17.480
numbers and putting them all in one big array, typically a few hundred to a few thousand elements long,
00:09:27.279
and by doing that they encode the meaning in an embedding space,
00:09:32.920
also called the latent space. For example, if we take the phrase
00:09:40.839
"this is the holiday season" and put it through an embedding model, then we get the following vector
00:09:50.519
embedding. I'll draw your attention to the graphic on the right-hand
00:09:56.519
side. As mentioned, this array
00:10:01.680
contains hundreds or thousands of float numbers, so it's impossible
00:10:08.120
to visualize. But if we reduce this concept to a
00:10:13.680
two- or three-dimensional space, then we can imagine it, and
00:10:20.920
this is what was done in the graphic on the right-hand side. You can see that different concepts are organized and
00:10:29.600
clustered by their meaning: any sports-related phrases or words are all
00:10:36.399
clustered here on the bottom right, anything that relates to politics
00:10:41.560
or conflicts is on the left-hand side,
00:10:46.639
etc. And so when we do semantic search, what we're actually doing is taking a
00:10:54.760
query, identifying where it sits in this latent space, and
00:11:02.399
trying to find the closest elements, hence the term similarity
00:11:07.760
search. So again: vector search, also synonymous with
00:11:13.240
similarity search or semantic search, is finding the closest data points
00:11:20.680
by their inherent meaning. And, as a technical detail,
00:11:26.160
there are different ways of calculating this distance: you can use the Manhattan distance, the
00:11:33.000
Euclidean distance, cosine distance, or Chebyshev distance.
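As a minimal sketch of those distance metrics, here are toy Ruby implementations on tiny hand-made vectors. Real vector databases compute these natively over embeddings with hundreds or thousands of dimensions; the 2-D vectors below are just for illustration.

```ruby
# Toy implementations of common vector-distance measures.
def manhattan(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def cosine_similarity(a, b)
  dot   = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x**2 }) * Math.sqrt(b.sum { |x| x**2 })
  dot / norms
end

# "Similarity search" in miniature: return the label of the stored
# vector closest to the query.
def nearest(query, labeled_vectors)
  labeled_vectors.min_by { |_label, vec| euclidean(query, vec) }.first
end
```

With stored vectors like `{ "sports" => [1.0, 0.0], "politics" => [0.0, 1.0] }`, a query near `[0.9, 0.2]` comes back labeled `"sports"`, mirroring the clustered-concepts graphic.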
00:11:41.399
So let's summarize all of this again. The RAG pipeline
00:11:46.839
follows this workflow: a user's question comes in, we generate
00:11:53.040
a vector embedding for that user question, we then go to, typically, a vector
00:11:59.760
database, although it could be any other kind of traditional data source, we
00:12:04.880
run a similarity search to retrieve documents relevant to that question, which the LLM will then
00:12:11.600
use to answer it. Then we construct a RAG prompt that
00:12:17.199
looks something like the following, and the LLM returns a response in
00:12:22.320
natural language. The prompt here gets
00:12:27.519
converted to text and is then sent to
00:12:33.839
the LLM, whether that's GPT-4, Claude, or
00:12:38.959
one of the Google Cloud LLMs. We put the context,
00:12:47.360
these are the relevant documents, in here, so this gets concatenated in. We
00:12:54.399
insert the original question, and some instructions, which basically just tell the LLM something
00:13:02.399
like "be succinct when you're answering the question", or "answer it in a certain
00:13:07.480
style", or "keep your response to a certain length", or "follow a certain format", or
00:13:14.800
"respond in JSON format, and this is the JSON schema that I
00:13:22.800
expect you to respond with". And the "Answer:" at the end just signals to the LLM that
00:13:30.000
its response follows from there on out, so it completes the rest of it.
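A minimal sketch of assembling such a RAG prompt in Ruby. The section labels mirror the structure just described; the exact wording of the instructions is up to you, and sending the result to an LLM is omitted.

```ruby
# Assemble a RAG prompt from retrieved documents and the user's question.
# The trailing "Answer:" signals to the LLM that its completion follows.
def build_rag_prompt(context_docs, question)
  <<~PROMPT
    Context:
    #{context_docs.join("\n---\n")}

    Question: #{question}

    Instructions: Be succinct. Answer only from the context above,
    and cite the sources you used.

    Answer:
  PROMPT
end
```

For example, `build_rag_prompt(["PTO policy: 20 days per year."], "How much vacation do I get?")` yields a prompt ready to send to GPT-4, Claude, or any other LLM.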
00:13:35.560
So what is Langchain.rb, actually? It's a library for
00:13:42.959
leveraging LLMs to build RAG pipelines like I've described, with vector
00:13:48.120
search and chatbots, and also to do workflow automation with AI
00:13:54.920
agents. We aim to lean into the Ruby and Rails
00:14:02.639
community, so that it has the plug-and-play look and feel of all the
00:14:09.839
other gems and libraries that we've gotten to love in our
00:14:18.120
space. So I'll show an example of a RAG pipeline, and then a very
00:14:27.120
simple demo of an AI-powered internal knowledge management system. I'll try to
00:14:33.920
full-screen it; please let me know if you can see it.
00:14:39.600
Okay, and I will just scroll through
00:14:52.880
the technical details here.
00:15:09.920
I'm not sure that I can make it bigger.
00:15:17.800
I will send a link to this
00:15:24.320
presentation, so you can replay it if you want to. I
00:15:29.680
apologize, I pre-recorded this video. So, you can see that we're instantiating an
00:15:34.880
instance of a vector search database here, and creating a default
00:15:40.000
schema. In this instance I have a local benefits brochure
00:15:46.399
stored on my local computer, and I'm going to go ahead and add that to my
00:15:51.920
vector search database. This will be the knowledge base that data is retrieved from,
00:15:59.560
so I'll skip this importing
00:16:05.279
step. And once that has been imported, you can see that I can ask, we've basically
00:16:11.920
built a sort of Q&A system, and I can ask it questions about the specific document that I uploaded. So
00:16:20.560
for the first question, and I hope you all can see this, I'm asking: what is the
00:16:26.319
company's vacation policy, how much time can I take off? And it tells me the answer. What are all
00:16:33.360
the benefits offered? And it states that the benefits offered within the brochure I uploaded
00:16:40.920
include medical, dental, vision, life, short- and long-term disability, etc. What are
00:16:47.240
the parking benefits? What are the 401(k)
00:16:54.720
benefits? Etc. So this is a very simple example with one document, but you can
00:17:00.000
imagine a whole corporate corpus of knowledge uploaded
00:17:06.240
into the system; I would be able to have a Q&A-style
00:17:12.880
conversation with the system, and it would just fetch relevant data and be able to answer different
00:17:21.120
questions. The other thing that I wanted to cover is agents.
00:17:28.199
AI agents are autonomous, general-purpose, LLM-powered
00:17:33.760
programs. They can be used to automate different workflows and
00:17:39.600
business processes and execute multi-step tasks. They work best with
00:17:45.760
powerful LLMs, and they can also use tools, and tools are basically different APIs,
00:17:52.919
internal databases, or other systems. I'll show you an example of
00:18:00.400
an agent here as well, one that's built into
00:18:06.960
Langchain.rb. So first we're just instantiating a
00:18:14.600
weather tool, and it's basically a tool that can be used to fetch the current
00:18:23.200
weather. We have a Ruby code interpreter tool, a Google search tool,
00:18:31.400
self-explanatory, and a calculator. And, I
00:18:38.200
mentioned that it works best with powerful LLMs, but you can
00:18:44.039
use any other API as well; you're not tied to using
00:18:51.559
OpenAI. So we're instantiating the agent here: we're calling the Langchain ReAct agent,
00:18:58.960
we're passing it the LLM that it's going to be using, which is OpenAI, and we're passing it a collection of tools, so it has access to
00:19:07.520
the weather API and the Google search API, it can execute Ruby code, and it can also
00:19:13.240
use the calculator. And you might be wondering:
00:19:20.480
why does the LLM need access to a calculator? Because the LLMs are actually
00:19:29.000
not that great at math, so for more complex calculations, utilizing the
00:19:36.159
calculator yields much better results.
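The tool-dispatch loop can be sketched as below. This is not the Langchain.rb implementation: a real ReAct agent asks the LLM which tool to invoke next, while here a scripted list of steps stands in for the LLM's decisions, and the temperatures are invented values.

```ruby
# Toy tools. The calculator uses eval for brevity; never eval
# untrusted LLM output in production.
TOOLS = {
  "calculator" => ->(expr) { eval(expr).to_f },
  "weather"    => ->(city) { { "Boston" => 82.4, "DC" => 86.0 }.fetch(city) }
}.freeze

# Each step mimics an LLM "Action: tool[input]" decision. We call the
# chosen tool on the LLM's behalf and feed the observation back.
def run_agent(steps)
  observations = []
  steps.each do |tool_name, tool_input|
    tool = TOOLS.fetch(tool_name)
    observations << tool.call(tool_input)
  end
  observations.last
end
```

For example, `run_agent([["weather", "Boston"], ["weather", "DC"], ["calculator", "(82.4 + 86.0) / 2"]])` averages the two looked-up temperatures, like the demo that follows.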
00:19:43.640
In this example, you can see that I'm asking the agent to find the current weather in Boston,
00:19:49.720
Massachusetts and Washington, DC, and take the average.
00:19:55.440
So the prompt was sent to the OpenAI
00:20:00.679
LLM, and it decided that it first needed to invoke the weather
00:20:07.760
tool and find the weather for Boston,
00:20:14.640
Massachusetts. You can see what follows is the tool execution: we make
00:20:19.880
a call on the LLM's behalf to that tool to find the weather for Boston
00:20:26.280
and send the result back. Then, in the next step, it decides that
00:20:31.799
it needs to invoke the weather tool and find the weather for Washington,
00:20:37.679
DC. So on the LLM's behalf we invoke that tool, we call that
00:20:43.640
API, find the current weather for Washington, DC, and send it back to the
00:20:50.960
LLM. Then it calls the calculator with a very simple
00:20:58.799
equation, where it takes the two temperatures and splits them down the middle, which was the original
00:21:05.600
prompt. And then it responds: all the way at the bottom you can see "the
00:21:11.279
average current temperature in Boston, Massachusetts and Washington, DC is
00:21:16.760
84.2...". So the next thing we ask it
00:21:24.080
is to find the current ruble-USD exchange
00:21:33.159
rate. Sorry, I'll go back real quick. And it decides that, in order to
00:21:41.080
answer that question, it needs to do a Google search,
00:21:47.919
so it decides to call the Google Search tool, and we call the
00:21:54.600
Google Search tool on its behalf with the following query: "ruble USD
00:22:00.720
exchange rate". We send the result back, and the LLM synthesizes that answer
00:22:07.600
into a more natural-sounding response. And lastly, we tell it to use a
00:22:16.200
Ruby program to output the sum of the Fibonacci sequence for 1 through
00:22:21.240
1,000, and we ask it not to define any Ruby methods. The prompt is sent to the
00:22:29.720
LLM, it decides to invoke the Ruby code interpreter, and we execute the Ruby tool
00:22:36.799
on its behalf with this one-liner that it came up
00:22:44.039
with. We send the result back, and it tells us that the sum of the Fibonacci sequence is this huge number.
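The actual code the agent generated isn't shown on the slide, so here is only a guess at its shape: summing the Fibonacci sequence for 1 through 1,000 without defining any methods, as the prompt requested.

```ruby
# Sum of the first 1,000 Fibonacci numbers, with no method definitions.
# (Ruby's arbitrary-precision integers handle the ~200-digit result.)
n = 1000
a, b, sum = 0, 1, 0
n.times do
  a, b = b, a + b   # advance the sequence: a becomes F(k)
  sum += a
end
```

A handy sanity check: the sum of F(1) through F(n) equals F(n+2) - 1.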
00:22:56.000
Okay, so why Ruby? I find that this
00:23:03.679
is an outstanding question that comes back just about every year.
00:23:10.600
So why did we build this in Ruby? I don't think Ruby is going
00:23:16.440
anywhere. I think in this climate of economic
00:23:24.200
contraction, or stagnation depending on who you ask, Ruby is actually an
00:23:29.480
excellent choice. I find that there's a very healthy amount of
00:23:34.919
pragmatism in the Ruby community. We tend to not reinvent the wheel, the whole notion of "there's a gem for that". We
00:23:42.559
love to solve actual business problems and move quickly, and we don't reinvent
00:23:50.240
the wheel like some of the other ecosystems. I think, generally,
00:23:56.919
monoliths are back in fashion. We've done a
00:24:03.360
project for a client where we actually
00:24:08.559
consolidated and rewrote their application from about 15 Node.js
00:24:14.400
microservices to a single Rails monolith. Who would have thought we
00:24:20.120
would be going all the way back? And that was a successful project, and I think a lot
00:24:26.520
more companies are starting to do that, because they're looking at their
00:24:32.399
costs, how much they're spending on maintaining and managing this kind of
00:24:37.559
complicated microservices stack, and in a lot of instances it just doesn't make
00:24:43.679
sense. And also, Ruby is very similar to Python in that it's also
00:24:49.919
written in C. There are a lot of Python data-
00:24:55.760
science-centric libraries that are just pure
00:25:01.720
Python wrappers on top of C functions. There's no reason why we couldn't have equivalents in Ruby; I
00:25:08.600
think we should. And Python is also very similar to
00:25:15.080
Ruby in that it has the same problems with multiprocessing,
00:25:22.960
etc. So I think we ought to have
00:25:28.360
a lot of these capabilities in Ruby as well. And that is
00:25:39.399
it. Thank
00:25:56.679
you.
00:26:26.600
Any questions?
00:27:23.840
Andrei, can you hear me? Yes. Extremely sorry about the whole mic
00:27:49.760
situation. So the question
00:27:55.720
was about dealing with hallucinations: if you have an agent that does a task, you
00:28:04.159
have to be able to trust it, so how do you deal with that? If you were to
00:28:18.679
automate something, or just deal with the concept in general, can you
00:28:24.279
elaborate a little bit for us? Thanks.
00:28:37.080
Thank you for your question. I'm going to repeat the question:
00:28:42.440
if you build an agent and the agent hallucinates, how do you
00:28:49.320
deal with it? I think there are a couple of things you need
00:28:56.399
to do. You definitely need to be careful about giving it access
00:29:02.320
to your infrastructure.
00:29:09.399
For example, we also have an agent that can execute SQL queries, and I would
00:29:15.799
say that if you don't create a separate role, maybe just
00:29:23.320
a read-only role in your database, and you have the agent execute ALTER, MODIFY, and INSERT
00:29:31.080
queries, that may not be a good idea. Also, before an agent decides
00:29:37.720
to do something, you can try to
00:29:44.240
sanitize some of the output and put some filters in place:
00:29:50.679
you would parse the response and, again, maybe search for keywords like REPLACE,
00:29:55.840
INSERT, and ALTER, and just raise an error if the LLM responds with that.
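A minimal sketch of that keyword filter: before executing SQL an LLM generated, reject anything that isn't a plain read. A separate read-only database role is the real safety net; this is just defense in depth, and the keyword list here is illustrative, not exhaustive.

```ruby
# Reject LLM-generated SQL containing write/DDL keywords.
FORBIDDEN_SQL = /\b(INSERT|UPDATE|DELETE|ALTER|DROP|REPLACE|TRUNCATE|GRANT)\b/i

def guard_sql!(sql)
  raise ArgumentError, "write statement rejected: #{sql}" if sql.match?(FORBIDDEN_SQL)
  sql
end
```

`guard_sql!("SELECT * FROM users")` passes through, while `guard_sql!("DROP TABLE users")` raises before anything reaches the database.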
00:30:03.519
In terms of inaccurate data, I think agents are
00:30:12.320
relatively reliable when it comes to very narrow tasks. For example, I showed a
00:30:21.480
couple of examples: one was fetching the current weather and
00:30:28.600
then doing something with it, and the other was fetching the exchange rate.
00:30:37.240
I'm not sure I would have the
00:30:43.440
same agent responsible for those different tasks; I would probably create separate agents and keep
00:30:49.440
tailoring them and putting guardrails around them, basically closing
00:30:55.559
in on the number of different tasks they're able to do. Because if you want to build a very
00:31:02.880
general agent that can do everything, you're going to have a bad
00:31:19.080
time.
00:31:25.120
All right, I think we've got a working setup,
00:31:54.399
so we can pass the mic around. Next question.
00:32:03.080
Thanks for the great talk. I saw in your code that there was a specific agent called the ReAct agent.
00:32:10.919
Are there many such agents, and what's the main difference between them? How do we
00:32:16.240
pick? Yeah, so currently there are two
00:32:21.279
agents: there's a SQL retriever and there's a ReAct agent. ReAct stands for
00:32:28.639
reasoning and acting, and basically the way to think about this is that
00:32:35.559
there's a lot of research being done currently into how these LLMs could
00:32:43.200
be structured to empower these agents.
00:32:48.960
The top people in the field are reading research papers in
00:32:54.880
the field every day and are implementing techniques as they're
00:33:00.039
being published, and the techniques are constantly evolving.
00:33:07.240
Actually, the functionality, I'm not sure if any of you have
00:33:12.639
played around with it, the Assistants or the GPTs that OpenAI
00:33:19.360
just released about a month ago: I shouldn't say that it
00:33:26.480
was innovative, because actually a lot of startups have been building
00:33:31.559
things like that for about six months now. So they were
00:33:37.960
a little bit behind in terms of these AI
00:33:43.799
agents. I think the next iteration, and actually what I'm
00:33:49.840
trying to do with Langchain.rb
00:33:55.320
specifically, is to combine the different agent implementations and make
00:34:02.320
them a little bit easier to use. I mentioned the SQL retriever agent is
00:34:08.040
tailored for interfacing with a SQL database and then
00:34:14.520
answering questions on top of it, and I think it
00:34:20.240
could be combined with the ReAct agent, where the SQL database is just one of the tools that the agent has
00:34:26.560
access
00:34:34.000
to. Thanks. Next one. Hey there, yeah, thanks again for
00:34:41.399
the great talk. This might not be a super coherent question, but I'm interested in this use case where you
00:34:47.720
were building a question-and-answer system for internal company documentation. In the examples
00:34:53.240
you've talked about, it seems like you take all this data that's in these private internal documents and you find
00:34:59.280
a way to mix it together with all the information that's in this large language model and then create something
00:35:05.440
useful out of that. I was just wondering if you could talk a little more specifically, maybe in technical detail, about how you analyze the
00:35:11.880
internal private company documents and get that data to play well with what's already in the deep neural network in
00:35:18.440
the large language model. Yeah, that's a good question.
00:35:25.800
So this is why my talk was focused around
00:35:32.800
this retrieval-augmented generation technique. A lot of companies are doing
00:35:39.000
this right now; I guarantee that most of the generative AI
00:35:45.880
startups are literally doing that: they're taking some sort of data source and building RAG systems on top
00:35:53.200
of it.
00:36:02.599
Basically, what you're doing:
00:36:08.960
there's a lot there, but I'll go into the first concept,
00:36:14.480
which is taking that proprietary data, the company data, and putting it into a
00:36:20.599
vector database. Basically, what you're doing there is
00:36:28.720
chunking the data, splitting the data.
00:36:34.319
You have a corpus of company data: it's PDFs, maybe it's
00:36:40.200
audio files, maybe it's video files, it's previous, I don't know, Slack
00:36:46.200
conversations, it's product requirements documents, it's marketing copy, it's sales
00:36:52.400
brochures, etc. So you're building this kind of knowledge system, with
00:37:00.160
the hope that if, let's say, a new
00:37:06.960
employee starts and they have all these questions about why certain decisions were made, or what on-brand
00:37:16.079
marketing copy looks like, or how do you even write the code, what are
00:37:22.200
our coding styles, and why does this exist as a separate microservice,
00:37:27.800
then you would build a system, have employees interface with that system, and be
00:37:33.319
able to get those questions answered, and maybe not bug other employees to have these questions
00:37:39.319
answered. So the first thing you would do is collect this data, and you would
00:37:44.440
chunk this data and put it into a vector database.
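The chunking step can be sketched as follows. Langchain.rb and similar libraries ship real text splitters; the fixed character sizes here are arbitrary, and in practice each chunk would then be embedded and written to the vector database.

```ruby
# Split a document into overlapping chunks so each piece fits the
# embedding model's input window and retains some surrounding context.
def chunk_text(text, chunk_size: 200, overlap: 50)
  step = chunk_size - overlap
  chunks = []
  (0...text.length).step(step) do |start|
    chunks << text[start, chunk_size]
  end
  chunks
end
```

Each returned chunk gets its own vector embedding, so a question can retrieve just the relevant passage of a long PDF rather than the whole file.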
00:37:53.319
In that process, I walked through what vector embeddings are, and
00:38:00.040
basically you're taking all this data and encoding it in these
00:38:06.480
massive arrays of float numbers. I'm not going to go into it, and I'm also probably not qualified
00:38:13.640
to talk about this, but there are these very complicated algorithms for how
00:38:19.680
you take a text, encode it,
00:38:25.359
and assign it an array of float numbers that encodes its meaning. So it's
00:38:30.800
an array of size 1,000-something, 1,500,
00:38:36.119
of float numbers that encodes the meaning. And what you're doing is taking all that data, generating
00:38:43.560
vector embeddings for all of it, and putting it into a vector database. There's a
00:38:49.359
ton of vector databases that have sprung up this
00:38:54.800
year; I think we're going to see which ones are here to stay, which ones are going to be acquired,
00:39:01.960
etc. And then, what that
00:39:08.920
allows you to do is, now you can come in with a question and
00:39:18.720
search the vector database for documents that match your question. So if I'm
00:39:25.440
coming in and asking: why is the email service
00:39:31.200
a separate Lambda
00:39:36.760
function? Why don't we just
00:39:43.960
have a subscription to MailChimp, for example? It's going to go into the vector database, it's going
00:39:50.280
to find all those documents that have not just the matching
00:39:56.599
keywords, but also semantic similarity, similarity by meaning. It's going
00:40:03.480
to fetch all those documents, and then,
00:40:10.040
of art here; in some instances it's more of an
00:40:15.960
art than a science. You would take those relevant
00:40:21.040
documents and then construct a prompt. I mentioned this in one
00:40:26.079
of the slides, let me go back real quick: you would construct this
00:40:33.839
RAG prompt. You take this context, concatenate it
00:40:40.280
in, and you take your original question, which was "Why does this service exist as a microservice?",
00:40:47.560
etc. And you would then have the
00:40:53.200
LLM construct a response. So the
00:40:58.359
LLM, with the knowledge that it has: you're not
00:41:04.520
adding knowledge to its training data; you're just telling it, "I want you
00:41:10.400
to answer this question with this knowledge
00:41:17.280
in context." And its reasoning
00:41:25.240
abilities give it the power to be
00:41:31.160
able to do so: to take your question, understand what you're asking, understand what the context
00:41:38.240
represents, and then give you a coherent and hopefully
00:41:47.760
helpful answer to your question. But, great
00:41:53.160
question. Great answer, yeah. I mean, just to summarize: is the idea sort of that
00:41:59.720
you're using the vector space to cluster your internal documents, or classify them rather, and then you're
00:42:06.720
getting the LLM to summarize the information in
00:42:11.880
the documents? So the LLM is used for summarization. Is that the one-
00:42:19.000
word or one-sentence idea? I wouldn't say summarization; I would maybe use the word
00:42:25.160
compression. Yeah, thank you. You would compress the meaning into this array-of-float-
00:42:32.839
numbers representation. Thanks, Andre.
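The flow described above (embed documents, retrieve by meaning, then wrap the hits and the question in a RAG prompt) can be sketched in plain Ruby. This is a toy illustration rather than Langchain.rb's actual API: hand-written three-dimensional vectors stand in for a real embedding model, and an in-memory array stands in for the vector database.

```ruby
# Toy corpus: each document carries a pre-computed "embedding" (made up here;
# a real embedding model would produce ~1,500-dimensional vectors).
DOCS = [
  { text: "The email service is a separate Lambda so it can scale and fail independently.",
    embedding: [0.9, 0.1, 0.0] },
  { text: "Deploys run nightly from the main branch.",
    embedding: [0.0, 0.2, 0.9] }
].freeze

# Cosine similarity: how closely two embeddings point in the same direction.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (mag.call(a) * mag.call(b))
end

# Fetch the k most semantically similar documents, then build the RAG prompt:
# retrieved context concatenated in, followed by the original question.
def rag_prompt(question, question_embedding, k: 1)
  context = DOCS.max_by(k) { |doc| cosine(doc[:embedding], question_embedding) }
                .map { |doc| doc[:text] }
                .join("\n")
  "Answer using only this context:\n#{context}\n\nQuestion: #{question}"
end

prompt = rag_prompt("Why is the email service a separate Lambda?", [1.0, 0.0, 0.1])
```

In a real pipeline the question's embedding would come from the same embedding model used to index the documents, and the resulting prompt would be sent to an LLM to generate the answer.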
00:42:38.440
I think, for a few more questions. Yeah, go ahead, Paul. Hi, thanks for the talk.
00:42:46.319
So, regarding this: does it protect from
00:42:52.319
prompt injection? And, I guess, do we care about that?
00:42:58.000
So, Langchain.rb gives you optionality to use any sort of LLM. So
00:43:05.200
you can use an off-the-shelf OpenAI LLM; you can use, as I've mentioned, a
00:43:14.440
Google PaLM LLM (Google just came out with a new one today, called Gemini; they claim it's just as good as
00:43:20.880
GPT-4, but I haven't played around with it yet); or you can use a local LLM. For
00:43:28.920
example, there's Llama 2, which was open-sourced by Meta; you can run that
00:43:35.240
locally if you want to. You need certain compute
00:43:41.319
requirements to be able to run it in a performant way, but you can
00:43:47.520
absolutely do that. And what's going to happen is that these models are going to
00:43:54.559
get more efficient, they're going to get faster, they're going to get really
00:44:00.119
powerful, they're going to be open source, and they're coming to every device
00:44:05.520
you own. It's going to be running locally on your phone; it's going to be
00:44:11.200
small and fast; it's going to be running on your laptop. That's
00:44:17.280
definitely going to happen. But the best open-
00:44:24.599
source models are still a little bit behind the proprietary ones. When
00:44:30.040
the open-source models are just as powerful, you basically don't have to worry about sending your
00:44:36.400
financial statements to OpenAI accidentally.
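The "optionality" point can be made concrete in plain Ruby (a sketch, not Langchain.rb's API): the same prompt can target a hosted API or a local model server just by swapping the endpoint. The request shapes below follow OpenAI's chat-completions API and Ollama's generate API; the model names and the choice of a local server on port 11434 are illustrative.

```ruby
require "json"
require "net/http"

# The same completion call, pointed at either a hosted provider (data leaves
# your machine) or a local model server (data stays on-prem).
ENDPOINTS = {
  hosted: URI("https://api.openai.com/v1/chat/completions"),
  local:  URI("http://localhost:11434/api/generate") # e.g. Ollama serving Llama 2
}.freeze

def completion_request(where, prompt)
  uri = ENDPOINTS.fetch(where)
  req = Net::HTTP::Post.new(uri, { "Content-Type" => "application/json" })
  req.body =
    if where == :hosted
      { model: "gpt-4", messages: [{ role: "user", content: prompt }] }.to_json
    else
      { model: "llama2", prompt: prompt }.to_json
    end
  req
end

# Same question, two destinations; only the local one keeps the data on-prem.
req = completion_request(:local, "Summarize our Q3 financial statements")
# Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }  # would actually send it
```

Only the endpoint and payload shape change; the calling code stays the same, which is the sense in which good-enough open models remove the worry about data leaving your infrastructure.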
00:44:41.960
And obviously, for vector search databases, you can either pay for a managed
00:44:47.960
service or, just like any other database, provision one yourself. Your question about
00:44:57.319
prompt injection: prompt
00:45:02.680
injection is kind of similar to how you would sanitize input. There was a question earlier about
00:45:09.599
making sure the agent doesn't go rogue on you; there are some tricks
00:45:15.640
you can use to protect your system from prompt injection as well, but it's not
00:45:23.040
bulletproof. People have been able to get the original prompts out of GitHub Copilot
00:45:32.599
and the suite of Microsoft products that use LLMs, so
00:45:41.119
I don't think it's currently a solved problem. But yeah, really good
00:45:51.280
question, thank you. Don't be
00:45:58.640
shy! Oh well, maybe one last
00:46:05.559
one. Hi, thank you for your talk. The question is: in Python we have access
00:46:12.400
to Hugging Face. How could it be possible to use Hugging Face repositories with
00:46:18.359
Langchain.rb? Is it something that's going to be added in the future, or
00:46:24.559
do you see it being added in the future? Yeah, so you can use some of
00:46:30.760
the Hugging Face endpoints right now, some of the APIs, but there
00:46:38.960
is something missing, which is the sentence transformers.
00:46:48.200
I was actually looking into this the other day, and I think we can build
00:46:54.880
it in Ruby. I don't want to promise anything, but I think
00:47:00.359
it's doable. That'd be really nice. I mean, you can download the Hugging Face models, but if
00:47:07.240
you have them locally, you still need Langchain to do some form of transformation
00:47:12.599
between the local model and the Langchain library. So, for local models, I
00:47:19.680
actually love a tool called Ollama; I think the website is
00:47:27.599
ollama.ai. You can just download any kind of open-
00:47:32.760
source model and run it locally, and it exposes an API endpoint, and
00:47:38.280
then you just treat it like an API endpoint running locally at a certain host and
00:47:47.119
port. And if it was downloaded using Ollama, or anything else, would it work
00:47:52.800
out of the box with Langchain, or do you still need...? Yes. There you go: just
00:47:58.640
download it. Yep. And actually, I'll mention:
00:48:04.920
there's a guy, Andrew Kane; I'm sure you've used
00:48:12.520
his libraries a lot. I mean, he's a Titan in the Ruby space.
00:48:20.000
I was playing around with his Torch.rb, which is
00:48:25.760
supposed to be a Tensor... I'm sorry, a PyTorch
00:48:30.880
port. So I played around with it a little bit, and it's really powerful,
00:48:37.480
actually. I think there are a lot of libraries we can build on top of
00:48:44.559
it, definitely. You can look him up on GitHub, ankane; he's the creator and
00:48:51.119
maintainer of at least one gem that you're using in your Gemfile.
00:48:58.040
You had a question? Yep. I was wondering if you could expand on your
00:49:03.280
vision of Ruby and AI, I guess, because you mentioned in passing that
00:49:09.280
there was a similarity, and that there was no reason for Ruby not to become a big thing in AI, or AI to become a
00:49:15.559
big thing in Ruby, and I was curious where you see that going. Did I hear that
00:49:23.200
correctly? Would you mind repeating that, please? Absolutely, no problem.
00:49:28.280
You mentioned before the connection between Ruby and AI, and that there's no reason why Ruby, just like
00:49:35.839
Python, wouldn't become the next big thing in AI, and that using Ruby for AI would become sort of a default for
00:49:41.599
Rubyists instead of defaulting to, well, let's choose Python instead. Do you have
00:49:47.680
any insights about that movement? Yeah. Look, it's definitely
00:49:54.839
an uphill battle. Python just happened to be in the right place at the right time with the right libraries.
00:50:02.200
There are a couple of those data-frame
00:50:08.839
libraries that were utilized and happened to penetrate academia
00:50:15.440
early on, pandas and NumPy, that help you
00:50:22.599
manage data, change formats, and do quick operations on top of it.
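As a small illustration of what those libraries do, the "group and aggregate" operation pandas is known for can be approximated with nothing but Ruby's Enumerable; for a real data-frame API in Ruby there are gems such as Andrew Kane's Rover. The rows below are invented sample data.

```ruby
# Toy stand-in for a pandas-style group-by/aggregate using only Enumerable,
# roughly equivalent to pandas' df.groupby("repo")["stars"].mean().
rows = [
  { repo: "rails",   stars: 5 },
  { repo: "rails",   stars: 3 },
  { repo: "sinatra", stars: 2 }
]

avg_stars = rows
  .group_by { |row| row[:repo] }
  .transform_values { |group| group.sum { |row| row[:stars] }.fdiv(group.size) }
# avg_stars => { "rails" => 4.0, "sinatra" => 2.0 }
```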
00:50:29.440
Again, the way I see it, I think we already have
00:50:34.640
equivalents; basically, Andrew Kane has built a lot of them, people just
00:50:39.799
don't use them, and there's no reason we couldn't make them just as good as the
00:50:47.559
Python equivalents. Again, it's a very similar language; the syntax is
00:50:53.280
similar; and Python is just as slow as Ruby, if you think Ruby is
00:51:00.680
slow. And obviously there's a critical mass, a
00:51:06.799
massive cohort of data science and ML experts, that prefers Python, and
00:51:14.599
maybe Ruby never gets to be as hardcore in that field.
00:51:21.160
But we love the whole plug-and-play experience, so if we can
00:51:27.680
achieve 80% of the outcomes with 20% of the effort, then I think a lot of Rubyists
00:51:35.760
will make that tradeoff. So basically, right now you have
00:51:40.920
a Ruby stack, let's say a Ruby
00:51:47.000
monolith, and you would like to add some of these capabilities to your product, and you're trying to
00:51:52.599
decide whether to spin up a separate Python Lambda or microservice to
00:51:57.960
add them. And I think the selling point for using
00:52:06.440
something like Langchain.rb is that maybe you could just add a gem to your Gemfile and get 80% of the
00:52:15.280
results. True, that. It's 8 o'clock; Andre, are you up for
00:52:21.280
one last question? Yeah, absolutely. Definitely, let's keep going then. Yeah, thank you very much for the presentation.
00:52:27.880
You just mentioned that Ruby has not yet implemented the
00:52:33.000
Hugging Face sentence-transformer parts, and since I believe a
00:52:38.920
sentence transformer is quite important for a vector database, I'm curious whether this actually affects your
00:52:45.839
development of, like, RAG
00:52:52.280
systems? So: it does, it does. And which
00:52:58.920
is why I've started to look into
00:53:04.240
this. There's obviously a
00:53:09.440
lot to do there,
00:53:16.960
and I'm also trying to be pragmatic, so the
00:53:24.240
issues I care more about are the things I'm prioritizing.
00:53:29.440
But given how slow
00:53:34.559
adoption still is on the Ruby side, I think there's still some time. I would actually
00:53:41.640
love to see many more AI-
00:53:47.240
and ML-flavored talks at Ruby meetups and
00:53:52.880
conferences. I don't know if any of you went, I
00:53:58.640
didn't go this year, but there was just a RubyConf in San Diego, and I don't think they had any AI talks.
00:54:07.240
And so we happen to have one of the key, well, at least a prize winner
00:54:14.880
and a presenter, from last year anyway. Yeah, there was one talk about AI,
00:54:21.960
well, maybe one. Was it the workshop? Yes, the workshop. Yeah, the workshop.
00:54:28.760
So I just personally feel that it's a little
00:54:36.559
bit short-sighted to not include any AI talks this year,
00:54:44.440
right? Definitely. And again, given the hype and the
00:54:49.720
hysteria in the AI field that's been unfolding this year,
00:54:55.520
you should grab a bus ticket and come to Montreal, because it's a very easy segue
00:55:00.599
into next month's Montreal.rb meetup, which focuses on Ruby and AI again, with
00:55:08.160
JS presenting, this time on the medical
00:55:13.319
field. You're not that far! Any other
00:55:20.960
questions? Maybe I can ask a question. Go for it. I'm
00:55:28.000
just curious if anyone has experience building applications with
00:55:35.599
LLMs, and what your experience has
00:55:44.720
been. Yeah, I have some sort of experience, because I'm
00:55:51.160
currently working as an AI research intern at a game company. My
00:55:56.599
current work is primarily training a large language model for in-game characters;
00:56:03.960
it's kind of an experimental functionality for a lot of games. So this is
00:56:10.720
somewhat similar to the things you've done
00:56:16.079
here. We've also
00:56:21.480
implemented our own version of a RAG system, as well as a classifier to
00:56:28.559
help the model classify the questions.
00:56:36.319
Yeah, thank you, thank you for sharing. Anyone
00:56:43.520
else? You should ask next month, maybe. Yeah, that's a good
00:56:50.920
point. Someone in the audience said we should ask again next month, because: how many of you are going to go
00:56:58.039
and try Langchain when you get back home? Show of hands. Yeah, you'll have a lot more folks
00:57:04.720
with a lot more insights next month. Awesome. Please, if
00:57:11.680
there are any pains you're having, or any questions, please ping me, tag me, DM me,
00:57:17.079
etc. You want to show your last slide again, with your contact details, in the Discord? And in the meantime, a big round
00:57:23.400
of applause for Andre!
00:57:33.319
Thank you, thank you so much. Thank you for having me. I'll
00:57:39.520
do my best to drop in next month. Thank you, cheers.
00:57:44.760
All right. Yep, see you. Bye.
00:57:51.880
Bye.