00:00:13.639
how's everyone doing this morning awesome um super excited to be
00:00:18.960
here thank you for having me this is an amazing conference I
00:00:25.320
really do enjoy Boulder this is my second time here so today I'm going to talk about building AI agents in Ruby
00:00:33.440
so how many of you work with LLMs and how many of you have
00:00:42.000
built an application that uses an LLM before okay a few maybe 10 how many
00:00:48.640
of you can explain what RAG is okay fewer so like three
00:00:55.719
people all right so in my day-to-day I run a small software
00:01:01.160
development consulting agency where we help VC-backed startups and
00:01:07.840
mid-size enterprises build custom software the rest of my time I spend
00:01:13.439
uh developing uh open source uh Solutions and uh kind of doing um applied um AI work and trying to figure
00:01:21.000
out what are the best ways to implement some of these um newer capabilities that
00:01:26.040
we've been seeing in the last couple years into our stacks um so um I've been able to collect some
00:01:34.040
of these stickers throughout and been fortunate enough to help some of these companies who remembers Spree
00:01:41.479
does anyone still run a Spree store or Solidus what an iconic library
00:01:49.799
and so I think this summarizes the GenAI impact pretty well what used to
00:01:56.399
take um half a year to to build out um now can just be done uh within uh a
00:02:03.600
couple days or a couple of weeks um and so some of these most
00:02:09.720
common uh machine learning tasks um such as uh taking unstructured data and
00:02:15.920
converting it to a structured format or summarizing large text or classifying
00:02:21.599
documents uh translation content generation named entity recognition um
00:02:27.879
are just an API call away so as these capabilities
00:02:36.360
have significantly dropped in price um there's a long tail of uh medium to
00:02:42.080
small-sized companies that are implementing these capabilities um so now uh some of these AI features are no
00:02:49.239
longer reserved for the FAANG companies and so I'd like to propose
00:02:55.560
um a view that um we're going to have this intelligent compute um engine
00:03:02.760
inside of every single stack just like every single tech stack now has databases and caches and encryption
00:03:10.360
and queues and Lambda functions and storage and it supports different
00:03:16.200
protocols you're going to have an LLM embedded into your tech stack
00:03:24.200
accomplishing different tasks and really I think we're going to have AI agents actually embedded into
00:03:32.640
every single enterprise and chipping away at different tasks and workflows of
00:03:38.239
um every single department so of course this is not the
00:03:43.480
first time that we're talking about AI agents so in the 1950s when Alan
00:03:49.280
Turing published his famous paper he introduced the concept of
00:03:55.599
intelligent machines it is the same paper in which he introduced the Turing test and then in the 70s and 80s the
00:04:04.239
community researched uh and worked on Expert systems and software agents and you might remember the kind of the
00:04:10.360
original chatbots with Alexa and Siri and we're taking another stab
00:04:16.959
at it now so every major tech company is
00:04:22.400
presenting their Vision around this um so for example Google has um its own
00:04:27.919
custom AI agent builder that's ironically called a Gem manager you can build
00:04:35.160
gems Ruby's making a comeback um so what is an AI agent well
00:04:42.479
the textbook definition is that it is an autonomous system uh that's capable of
00:04:48.720
perceiving its environment making decisions and taking actions um to
00:04:55.400
achieve specific goals so environment awareness uh decision-making and uh
00:05:01.039
action taking so these two terms assistant and agent are used
00:05:06.880
interchangeably um and I certainly use them interchangeably as well um I draw
00:05:12.039
this kind of minor distinction that the assistant is more conversational so it's a multi-turn
00:05:19.759
conversation with a human or another system and an agent is more of an
00:05:27.720
autonomous system kind of like a background job you give it some tasks and it goes off
00:05:32.800
running um so what are some of the use cases well of of course uh the number
00:05:39.120
one use case is broadly automating business processes um so and and and I
00:05:45.520
think we can tackle low-IQ tasks we can augment workers with personal
00:05:52.560
assistants we can also handle time-consuming tasks so things that humans
00:05:58.720
just can't do like summarizing uh like writing up an executive summary for a
00:06:04.560
200 Page document within within a matter of a couple seconds uh or moderating a
00:06:09.960
million images within a couple hours and so for example in in in my
00:06:15.960
small Consulting business we could be uh creating invoices from from time sheets categorizing business
00:06:22.360
expenses uh writing proposals so remixing our service offerings with the
00:06:28.400
uh the notes that have been taken the client conversations and and writing up uh proposals um writing job descriptions
00:06:36.960
Jira tickets and so when you
00:06:42.000
think about um building an AI agent uh these are kind of some of the components to uh think about so so there's a
00:06:50.000
reasoning and planning module which is provided by the LLM we need to
00:06:59.160
inject our tasks goals objectives workflows into the AI agent so it has a purpose
00:07:06.800
and we need a way to run the agent so that's the concept of triggers
00:07:12.560
and we'll look at uh some of these um in a sec um so when it comes to
00:07:19.440
reasoning and and and and planning um when we give our agent um tasks and
00:07:26.280
goals it needs to be able to make a decision based on that information it needs to be able to formulate a plan
00:07:34.360
and then for example plan reflection is the concept where it's able to recursively revisit its plan and
00:07:41.199
and make sure that it's um it's still valid so whenever people talk about um
00:07:47.800
reasoning and planning they bring up chain of thought and really all that chain of thought is is just
00:07:54.800
it's forcing the AI to explain its reasoning um
00:08:00.159
so similar to how humans when we're thinking out loud through
00:08:06.520
the problem we're much more likely to arrive at a better answer so for
00:08:13.400
example in this case if we
00:08:20.240
ask it a kind of random brain teaser and just force it to give an
00:08:26.840
answer um then it's much less likely to give us um a good answer than um
00:08:33.039
if it spends some time thinking about how to approach this question so
00:08:39.760
it's much more likely to get to a better answer this way um and when I talk about uh business
00:08:48.560
logic so I'm talking about any kind of business related tasks goals um
00:08:56.080
objectives workflows standard operating procedures so so um I'm leading up to uh
00:09:02.279
to a demo that I'm going to do in a bit but um you can imagine if we run an uh
00:09:08.720
e-commerce store for example um we have procedures for um processing new orders
00:09:16.720
or processing uh returns that will enumerate all the different steps with
00:09:24.519
some logic um within those steps
00:09:30.120
so when it comes to triggers we need to uh we need to be able to um run uh the
00:09:35.760
agent so kind of very similar to like Sidekiq right it either monitors
00:09:43.399
state changes it runs on a schedule like a crontab or it's event-driven
00:09:50.440
web hook um or you like
00:09:56.839
that um or you can manually run it um and and um AI agents uh well
00:10:06.920
really large language models are stateless so we need some
00:10:12.320
sort of mechanism to to save progress uh save the context uh save the tool
00:10:19.600
calling output to to memory and uh retrieve it later and whenever people talk about
00:10:26.600
memory they talk about RAG I'm not going to spend too much time explaining what
00:10:31.920
RAG is but I think we've largely overcomplicated what it is all throughout kind of 2023 and
00:10:40.440
partially this year as well all that it is is that you take relevant context and
00:10:46.959
you put it in the prompt that's it um you don't necessarily need to use uh a
00:10:52.040
vector database and generate embeddings and all that that's an implementation detail it's just
00:10:58.639
that you're injecting relevant context into the prompt so that the
00:11:04.040
model can use that information so it can be your proprietary data
00:11:10.399
Etc um so tool calling and function calling are are also
00:11:16.480
synonymous um and you use them to produce structured outputs so um having
00:11:24.480
the model generate a response in a predefined JSON schema
00:11:32.720
or when you need it to through intent detection utilize
00:11:39.959
external tools so think APIs and so you would use tool
00:11:46.320
calling to get data from external sources because for example those
00:11:53.360
external sources are proprietary and these foundational models
00:11:58.880
were not trained on your specific data the models are trained at a
00:12:04.360
snapshot in time so it has no concept of time and certainly
00:12:09.680
it's unable to access real-time data and when you need it to
00:12:15.959
take actions um or execute deterministic tasks so um there's a really good
00:12:23.079
article called Math is Hard by Gary Marcus
00:12:30.600
um basically showing that um these these
00:12:35.639
LLMs don't really know how to do math and it doesn't really make
00:12:41.920
sense to use them for those tasks um and so for example if if if I
00:12:49.000
ask it to um sum up two uh really large numbers it looks like the correct result
00:12:56.720
here at the bottom but but it's not uh but this this this problem has been
00:13:01.800
solved I mean we can use a tool such as the code interpreter in this case and it just produces a
00:13:08.760
program um that runs in a sandbox and uh so that's the tool um and provides the
00:13:16.000
the response back um and so really the implementation
00:13:21.160
looks uh like the following so um again I'm uh leading us to a uh the e-commerce
00:13:28.000
related demo I'm going to do so so imagine we um so the function definition
00:13:33.160
is in the OpenAPI spec and we can
00:13:39.440
declare different functions so in this case we have an inventory
00:13:45.199
management class that has a function called find_product and it accepts a
00:13:51.720
SKU and whenever within the conversation there's a message that is relevant
00:13:58.480
the model through intent detection chooses to invoke this
00:14:05.240
function so we um extract the function name extract the arguments uh send it to
00:14:11.240
to that object with that um uh that method name and
00:14:16.320
arguments um and so it looks kind of like this uh so we need a way to inject
00:14:22.639
the instructions the business logic into the AI agent and we utilize LLMs to reason and plan
00:14:30.839
uh store and retrieve from memory take actions via tools um if it's
00:14:36.480
conversational converse with with the user and then uh we need some sort of a way to uh trigger the
00:14:42.959
agent um and so um I I wanted to kind of briefly introduce a library that I've
00:14:48.800
been working on called langchainrb and it's a way of building
00:14:54.880
LLM-powered applications in Ruby it's
00:14:59.920
it's a living organism it's evolving as kind of my conceptualization changes and
00:15:06.000
and and the abstractions and and um what I really think would bring the
00:15:13.560
value as well um so um we have uh so
00:15:19.440
there's a couple of things that it provides it provides a unified interface to different kinds of LLMs so
00:15:25.120
you can um test out um a number of them really quickly swap them in and out you can build AI agents and
00:15:31.680
there's also a pipeline to uh to do rag as well um
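A minimal sketch of the earlier point that RAG boils down to "take relevant context and put it in the prompt", written in plain Ruby. This is not langchainrb's API: `DOCS`, `retrieve`, and `build_prompt` are hypothetical names, the sample documents are made up for illustration, and the keyword-overlap scoring is an assumption standing in for whatever search (full-text, vector, or otherwise) you actually use.

```ruby
# Made-up sample knowledge base (illustrative data only).
DOCS = [
  "Returns are accepted within 30 days of delivery.",
  "US orders ship with FedEx; international orders ship with DHL.",
  "T-shirts are available in sizes S through XXL."
].freeze

# Naive retrieval: rank documents by how many query words they share.
# A vector database is one possible implementation, not a requirement.
def retrieve(query, docs, limit: 2)
  terms = query.downcase.scan(/\w+/).uniq
  scored = docs.map { |doc| [doc, (doc.downcase.scan(/\w+/) & terms).size] }
  scored.select { |_, score| score.positive? }
        .max_by(limit) { |_, score| score }
        .map(&:first)
end

# The "augmentation" step: inject the retrieved context into the prompt.
def build_prompt(question, docs)
  context = retrieve(question, docs).map { |doc| "- #{doc}" }.join("\n")
  <<~PROMPT
    Answer using only the context below.

    Context:
    #{context}

    Question: #{question}
  PROMPT
end

puts build_prompt("Which carrier ships US orders?", DOCS)
```

The resulting string is what would be sent to the model; only the shipping rule ends up in the context because it is the only document sharing words with the question.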
00:15:38.360
demo so um imagine we have this uh fictional
00:15:43.880
e-commerce store called Nerds and Threads that sells comfortable nerdy t-shirts for software engineers that
00:15:51.040
work from home so who would shop there um and an e-commerce store
00:15:59.120
um has the following Services uh kind of self-explanatory what they what they do
00:16:04.399
right and you can you can imagine these are different service objects with with functions um and we're going to put an
00:16:10.959
AI agent there to run the store to facilitate the execution to run
00:16:17.160
our um standard operating procedures um so this is how I I think
00:16:25.680
about this um so the Ruby on Rails promise has always been has always been developers focus on writing business
00:16:32.279
logic and not the plumbing right so I think as Rails developers we string
00:16:39.120
Services together right and it's and it's uh um and we focus on on on these
00:16:45.720
business use cases first and foremost um and previously you you'd
00:16:51.519
have business logic in models or service objects or controllers if you're bad
00:17:00.560
and and and now I think some of this logic is going to start shifting to to
00:17:05.880
prompts and really AI agents um so we're going to set up this
00:17:13.000
assistant class into which we're going to inject the instructions on the
00:17:18.679
left-hand side we're going to use an LLM for
00:17:24.160
reasoning and planning and we're going to utilize these uh services
00:17:30.640
um so this is what one of one of the classes um looks like
00:17:37.760
um I'm not sure I can make it bigger um but it it is a customer management class
00:17:43.799
and we have two functions create_customer that takes in the name and email and find_customer that takes
00:17:50.320
an email and pulls it from a SQLite database and we're extending it
00:17:55.880
with a langchainrb tool definition module that provides that define_function DSL
00:18:02.120
that basically converts this definition into the OpenAPI
00:18:08.120
spec um I mentioned earlier okay so I think this where we go
00:18:13.760
to the demo okay so we have this Nerds and Threads AI agent and we're
00:18:20.679
going to pass it these
00:18:27.039
instructions and so um you're an AI that uh runs an
00:18:32.679
e-commerce store called Nerds and Threads that sells comfy nerdy t-shirts for software engineers that work from home you have access to the shipping
00:18:38.520
service inventory service order management Etc uh you're only responsible for processing new orders
00:18:45.240
refuse all other workflows and we write out the steps
00:18:50.480
for processing new orders so um imagine we have a uh Point
00:18:57.960
of sale system that sends the following events there's a new order event with
00:19:03.280
the customer email quantity the SKU of the item and the
00:19:15.559
address um and uh the first thing we do is we try to find the customer so we call the
00:19:22.480
uh customer management Tool uh customer is not found we try to create the
00:19:27.559
customer uh uh success customer ID we have a customer ID we find the product
00:19:34.039
we retrieve the product we get back the SKU the price and there's 10 items in
00:19:40.640
stock uh we charge the customer which is uh five items at the price of
00:19:46.559
$24.99 uh we create the order record um and because the address is in
00:19:54.000
the US uh we use FedEx
00:19:59.240
and this is a copy of the email that gets sent out
00:20:08.960
and so um and I was actually uh slowing down the execution so uh in between kind
00:20:16.240
of every turn there there was a sleep statement um so now let's say oh sorry
00:20:23.240
um so now let's say um a return order um event comes in with uh that customer
00:20:30.720
email um and the order ID I changed my mind I want to return it
00:20:37.039
um and the system takes in the return order event
00:20:43.559
and uh refuses to uh process the return um so we're going to try to see
00:20:50.360
if we can uh Implement that logic really quickly uh so we
00:20:59.799
append our return order steps uh which say uh return order step-by-step
00:21:06.159
procedures follow them in this exact sequential order uh look up the order calculate the total amount uh refund the
00:21:13.120
payment Mark the order as refunded um and actually here we're also going to say that you're only responsible for
00:21:19.600
processing new orders and uh returning orders so if we take this uh return
00:21:28.120
order event as well let's see if it is able to
00:21:42.480
order refund the
00:21:47.760
customer yep successfully return
00:21:52.840
um and oh so this was recorded in case I botched the live one
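The tool-calling step described earlier (extract the function name, extract the arguments, send them to the object) can be sketched in plain Ruby as below. This is not langchainrb's implementation: `InventoryManagement::FUNCTIONS`, `dispatch`, the SKU value, and the shape of the `tool_call` hash are all hypothetical, chosen only to mirror the find_product example from the talk.

```ruby
require "json"

# Hypothetical stand-in for the inventory service shown in the talk.
class InventoryManagement
  # Made-up catalog entry; price and stock echo the demo narration.
  PRODUCTS = {
    "TSHIRT-001" => { sku: "TSHIRT-001", price: 24.99, quantity: 10 }
  }.freeze

  # JSON-schema style description the model would be given (illustrative shape).
  FUNCTIONS = [
    {
      name: "find_product",
      description: "Find a product by its SKU",
      parameters: {
        type: "object",
        properties: { sku: { type: "string", description: "Product SKU" } },
        required: ["sku"]
      }
    }
  ].freeze

  def find_product(sku:)
    PRODUCTS.fetch(sku) { { error: "Product not found" } }
  end
end

# Route a model-issued tool call to the matching Ruby method.
# Only functions declared in FUNCTIONS may be invoked.
def dispatch(tool_call, tools)
  tool = tools.fetch(tool_call["tool"])
  name = tool_call["function"]
  allowed = tool.class::FUNCTIONS.map { |fn| fn[:name] }
  raise ArgumentError, "unknown function #{name}" unless allowed.include?(name)

  args = JSON.parse(tool_call["arguments"], symbolize_names: true)
  tool.public_send(name, **args)
end

tools = { "inventory_management" => InventoryManagement.new }
# What a parsed tool call from the model might look like (assumed shape).
call = {
  "tool" => "inventory_management",
  "function" => "find_product",
  "arguments" => '{"sku":"TSHIRT-001"}'
}
p dispatch(call, tools)
```

Checking the requested name against the declared functions before calling `public_send` keeps the model from reaching arbitrary methods on the service object.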
00:22:01.039
um and another thing we can do um which I think is a really good uh use
00:22:07.080
case is text-to-SQL so
00:22:12.120
for example we're utilizing the built-in tool in langchainrb
00:22:19.039
which is a database tool which can execute queries and
00:22:26.720
output the schema for your database connection string so for
00:22:33.279
example we say describe the
00:22:41.440
database and it should utilize the internal uh dump schema
00:22:48.559
tool um and now we can say
00:22:54.240
uh uh what is it
00:22:59.480
oh drop yeah well of course if you're going to be doing this you should create a
00:23:07.240
dedicated database role that has restricted permissions and access and you should sanitize the output so you don't
00:23:13.880
accidentally drop alter
00:23:21.360
Etc right right um and now we can say uh what it is uh let's say say how
00:23:31.760
many what see how
00:23:37.120
much what was I going to say
00:23:47.480
oh oh on
00:23:55.760
orders okay finds a customer
00:24:08.720
H total spent okay oh because I returned it yes
00:24:15.400
you're right you're right okay you're paying
00:24:20.799
attention that was a test so
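A rough sketch of the "sanitize the output" advice for text-to-SQL, in plain Ruby. `safe_select?` and `FORBIDDEN_SQL` are hypothetical names, and the keyword list is an assumption rather than a complete one. A filter like this is only a second line of defense; the dedicated database role with restricted permissions is what actually prevents damage.

```ruby
# Keywords that should never appear in a read-only, model-generated query.
# Deliberately conservative: it may also reject harmless queries that
# merely mention one of these words inside a string literal.
FORBIDDEN_SQL = /\b(drop|alter|delete|update|insert|truncate|grant|create)\b/i

def safe_select?(sql)
  statement = sql.strip.sub(/;\s*\z/, "")        # allow one trailing semicolon
  return false if statement.include?(";")        # reject stacked statements
  return false unless statement.match?(/\Aselect\b/i)

  !statement.match?(FORBIDDEN_SQL)
end

p safe_select?("SELECT SUM(total) FROM orders WHERE customer_id = 1;")
p safe_select?("SELECT 1; DROP TABLE orders")
```

The word boundaries matter: a column such as `updated_at` does not trip the `update` rule, while a bare `UPDATE` does.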
00:24:26.799
um why would you use this um so of course of course it's of course it's
00:24:32.399
far-fetched um and and it seems like e-commerce has been has been solved for
00:24:37.760
a very long time we haven't seen uh all that much Innovation um and um you can change the
00:24:46.080
requirements on the fly so um you know a CEO comes in and says well this is our
00:24:51.960
10-year anniversary of our store and to all of our loyal customers we want to offer a 10% discount on the orders
00:24:59.600
today but how do you define a loyal customer well you know they need to have spent you know $100 or more in uh in the
00:25:06.720
last six months great let's let's type it out let's test it let's ship it right because the product team is going to
00:25:11.760
tell you well not in this sprint and we have to do scaled agile and we
00:25:19.480
haven't planned it out yet etc you can inject intelligence into
00:25:25.039
your process and you can tackle complex workflows as
00:25:31.600
well um so of course you need to you need to be able to evaluate this and and
00:25:37.200
you need to um just bombard it with inputs and and and outputs and and see
00:25:44.600
how your AI agent performs you can ask it to evaluate how it
00:25:52.600
did according to certain criteria so um kind of recursively
00:25:58.520
test itself there's plenty of benchmarks on Hugging Face but of
00:26:04.840
course for for your specific use cases you should be creating these data sets on your own um and if if the um agent
00:26:15.159
reliability is not to your satisfaction then you should be reducing the number of responsibilities or tasks or kind of
00:26:22.799
the the uh shrinking the decision tree that it operates um on
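One way to picture the "bombard it with inputs and outputs" advice is a tiny evaluation harness over your own data set. The sketch below is an assumption about how that could look, not something from the talk or from langchainrb: `EvalCase`, `evaluate`, and the stub agent are hypothetical, and a real agent would replace the lambda.

```ruby
# One labeled example from your own evaluation data set.
EvalCase = Struct.new(:input, :expected, keyword_init: true)

# agent is any callable that takes an input and returns an output.
def evaluate(agent, cases)
  results = cases.map do |kase|
    actual = agent.call(kase.input)
    { input: kase.input, expected: kase.expected, actual: actual,
      pass: actual == kase.expected }
  end
  passed = results.count { |result| result[:pass] }
  { pass_rate: passed.fdiv(results.size),
    failures: results.reject { |result| result[:pass] } }
end

# Stub standing in for a real agent so the harness runs offline.
stub_agent = ->(order) { order[:country] == "US" ? "FedEx" : "DHL" }

cases = [
  EvalCase.new(input: { country: "US" }, expected: "FedEx"),
  EvalCase.new(input: { country: "DE" }, expected: "DHL")
]
p evaluate(stub_agent, cases)
```

If the pass rate is not good enough, the failures list shows which responsibilities to trim from the agent first.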
00:26:29.559
and of course you might say well Andre these things hallucinate and are just really unstable and this is kind of
00:26:37.320
the way I I I've started thinking about this so so modern software Still Still fails and it fails because because of
00:26:44.919
the dependencies that we can't control so maybe we can draw a parallel and say that well AI systems can can fail
00:26:52.399
because of inaccurate or incomplete data or bias in the data um
00:26:59.279
this is a Ruby conference so modern
00:27:04.919
software fails because it can't scale well AI systems also currently struggle with the massive
00:27:13.559
compute needs there's cloud outages that affect both sides there's cyber
00:27:19.600
attacks on on on on one side and and adversarial attacks on the other side um
00:27:25.799
we don't test our software enough um and you could kind of draw a parallel
00:27:31.480
to to this blackbox behavior um and and and of course there's unclear still unclear liability
00:27:39.039
and accountability in terms of the decisions that these AI agents make
00:27:44.360
who's responsible for it but I think these these These are engineering problems that will be
00:27:52.640
addressed um so why am I doing all of this in Ruby well I love the language I
00:27:57.960
think it's I think it's very elegant to express your ideas um and it lets me
00:28:04.080
explore different different concepts very quickly without again without thinking of of uh semicolons and and all
00:28:11.600
this uh implementation um and I I I actually
00:28:16.720
think in terms of DS/ML/AI capabilities we're not
00:28:25.320
that far from Python we have all these amazing libraries but
00:28:30.640
they exist in the snapshot in time and and of course companies need to invest in in maintaining them but I think I
00:28:37.120
think the main problem is is the problem of perception so you're not going to lose you're not going to lose your job
00:28:43.200
uh on a uh AI project because you because you picked python even even if the project fails but you might you
00:28:49.799
might just lose your job if you picked Ruby because it's a very contrarian decision so we um we looked at the
00:28:58.760
GenAI wave some of the
00:29:04.440
tasks that it offers um we looked at how we can package up and and and think
00:29:10.480
about building AI agents um I demoed this e-commerce example where it
00:29:16.320
connects to different services that you can find in any e-commerce store out there and how we can get the AI agent to
00:29:23.120
run it and I expressed my love for Ruby at the end and
00:29:40.600
you yeah was a very good question I think I think training is going to be oh um so your question was um as
00:29:48.559
you're as you're iterating on a on a on the task um when do you stop prompt
00:29:54.880
engineering and when do you start thinking about training custom models um so I think I think the the the
00:30:02.480
beauty of all this is that um you don't have to train custom models in in terms
00:30:08.279
of I think I think training custom models is is is going to be reserved to still the the the largest companies out
00:30:15.120
there kind of with the largest ROI on the line and the intention
00:30:23.760
is that these these foundational models are going to do you know are going to be able to tackle 90% of the of the of the
00:30:30.039
use cases but in and and certainly the
00:30:35.399
uh the approach should be take the most capable commercial model out there uh
00:30:41.880
prove out that your concept works and and then scale down to maybe smaller models or open source models you know
00:30:48.840
ones off of Hugging Face did I answer your question
00:30:55.559
yeah um well I I'm going to I'm I'm going to compare it to python right so
00:31:01.159
very similar to python written on top of C same kind of concurrency story
00:31:06.720
and a lot of times these Python libraries break out into the C bindings right and I'm sure everyone
00:31:14.679
is familiar with Andrew Kane's work and he's built a lot of kind of equivalents
00:31:19.760
in the Ruby World um and I just I just I just find
00:31:25.639
that we need to we need to invest in in in maintaining those libraries so
00:31:32.200
um you know I mean if if you're if you're so so the beauty of this is is
00:31:39.200
that there's this kind of concept there's a title emerging of an
00:31:45.639
AI engineer right and and it's it's basically full stack Engineers that are
00:31:52.399
much more familiar with some of these data science ML AI concepts right but
00:31:59.279
we're not necessarily going to be uh we're not necessarily going to be training models but we're going to be
00:32:06.279
um uh we're going to be doing applied AI yeah anyone
00:32:13.120
else yeah that's a very good question um sorry so I'm going to try to repeat it
00:32:18.880
um so in the demo that I did I wrote out specific constraints for the execution
00:32:26.320
flow like uh one of them was if if the address is in the US use uh FedEx if if
00:32:32.399
it's outside use uh DHL um so how do we make sure that these
00:32:38.720
constraints are respected and um uh how
00:32:45.039
do we have the guardrails to make sure that it's safe to release
00:32:51.360
into the wild right yeah um so to be frank I'm I'm still trying to figure out
00:32:57.679
like the best way to package up these these uh these workflows
00:33:03.919
into these workflows um into these systems so so there's a lot of people that are looking
00:33:10.440
at uh graph databases and kind of representing workflows as as uh as a
00:33:15.679
tree and and really just utilizing the llm like for for making decisions right
00:33:22.559
so if the node is generate shipping label right there's a
00:33:28.120
decision to be made that's a natural language decision whether the address is
00:33:33.240
in the US or in Europe so use FedEx or use DHL the
00:33:41.279
beauty of of of using an llm at at this step is that um it's able to process
00:33:49.399
like um an infinite number of different permutations right so instead of like
00:33:55.639
writing a really long if elsif else or regex right instead of enumerating
00:34:02.919
every single possible case that you could think of um in in uh instances
00:34:08.839
where the number of permutations is nearly infinite right the llm can take its best guess right um but certainly
00:34:19.399
certainly um certainly I would put some structure around like inputs and outputs
00:34:24.679
right and and uh uh these foundational model companies are introducing all
00:34:30.159
these concepts to try to put some more rigidity around this inference
00:34:38.560
like the tokens generated so you know previously the way you would generate JSON is you would
00:34:45.879
actually write out in the prompt and say please generate the JSON
00:34:50.960
that adheres to the schema and it's like well is it going to put it here or here or in the middle and are
00:34:57.240
we going to get backticks and now support for
00:35:02.599
that functionality is like a first class citizen right um and as they do more of
00:35:08.720
this it will gel much better with our systems or kind of
00:35:15.160
traditional deterministic systems yeah so um in this uh on this
00:35:23.839
slide yeah on the on the slide um and I I hope
00:35:30.839
it's big enough you could see um and certainly in the demo I showed that um we're we're uh describing its its Its
00:35:38.079
Behavior right so it's telling it we're we're telling it like what what its purpose is and uh that it it is an AI
00:35:45.400
agent that runs an e-commerce store it connects to all these Services Etc um
00:35:51.800
there's another thing here where um we have a way to connect uh like external
00:35:58.599
data sources uh as as one of the tools for uh
00:36:04.480
prompt for basically RAG right so you can give it access to let's
00:36:12.599
say um an external data source and say um the rules of this game right which
00:36:18.720
which you know whatever like uh maybe it's a rule book that's that's that's
00:36:24.160
kind of thick so we don't want to put the full thing in there but we want to let it know that it's able to
00:36:33.240
look up the rules right and then and then we can write it write out in the steps and say um and say always look up
00:36:40.920
the rules when you encounter a specific scenario right so it will call into
00:36:46.400
that service right and we do some sort of full-text search or again vector
00:36:52.240
search to look up the rules based on that query and it will inject those rules into the prompt and
00:36:59.079
then that's what's kind of called in-context learning which is when the AI
00:37:05.920
like uses this new information learns this new information um within the the
00:37:11.960
context yeah does that answer your question if we have an existing
00:37:18.560
e-commerce system and we take this beautiful system and try to put it back into it are you finding that you're
00:37:24.800
having to like how much change has to happen to that existing system to make
00:37:29.960
this work particularly if you don't have kind of a data set to double check and
00:37:35.079
back test before you start working yeah well you yeah I mean you should definitely be developing your evaluation
00:37:41.160
data set but um just like I uh kind of ironically uh
00:37:49.280
phrased it in this slide you take the original Jira tickets
00:37:55.200
that say as a as a user I would like do I would like to be able to um complete a
00:38:01.240
new order and these are the steps required um so you would you would take that kind of reshape it right and and
00:38:08.319
and try to extract some of this business logic and hand it
00:38:13.880
off to to an AI agent yeah but but the you know there's there's an uh there's
00:38:20.400
kind of an art to it which is like knowing the limitations like understanding the ripe use
00:38:28.440
cases slicing and dicing the problem correctly
00:38:34.480
so and then uh yeah do you answer yep so I was curious in this cont cont how you
00:38:42.480
thinkin so for example you asked it uh about the total sales for a user that
00:38:48.800
know and I'm not surprised for a user about what for the user that it knows I'm so I'm not surprised it gave the
00:38:54.079
correct answer for that but if you asked it for a user that it did didn't know I wouldn't be shocked if it made up an
00:39:00.839
answer you know based on the experience I've had working with LLMs so how do you
00:39:06.200
think about hallucination you know as a problem to be addressed in this context yeah yeah
00:39:12.720
um well this is this is where this is where again this is where the structured
00:39:18.240
output um is able to shape the generated answer right
00:39:26.440
because um so in in in this example where where where you say that um I ask uh you know how much how
00:39:36.880
much money this what's the LTV of of of this customer right and and the customer
00:39:42.359
doesn't exist so it tries to like hallucinate an answer because it was trained on some sort of adjacent data
00:39:48.920
right well if we expect a structured answer right and that's the way that we
00:39:55.280
interface with these systems then it's going to generate something
00:40:00.599
like um it's going to generate something like you know customer email you know
00:40:06.520
harrypotter at you know gmail.com right and we're going to connect to our
00:40:13.480
order management or customer management system we're going to try to look up that customer and it's not going to be found in the
00:40:20.599
system so I I understand it's probably like not
00:40:26.119
to the the the full satisfaction the the the answer that I gave you I also I also
00:40:31.359
think that um with time we will be able
00:40:37.240
to steer these systems in in a in a much better way in terms of like operating
00:40:43.640
within a certain with within a certain um uh context right so
00:40:49.359
like you will be able to narrow down its knowledge to specific specific areas
00:40:56.640
right so like these foundational models they know about uh Healthcare and history and and and and medicine and and
00:41:03.720
you know mechanical engineering and and I I I think we're going to be able
00:41:09.839
to narrow down what you'd call the latent space that it kind of operates in
00:41:15.880
so it will be a lot less likely to um give you some completely out of place
00:41:22.599
like outrageous answer yeah um
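The answer above leans on structured output: the model only supplies arguments, and the real lookup happens in your own system, so a customer who does not exist comes back as "not found" instead of a made-up number. A minimal sketch of that boundary in plain Ruby follows. `lifetime_value`, the `customer_email` field name, and the sample customer record are hypothetical.

```ruby
require "json"

# Made-up customer store standing in for the customer management system.
CUSTOMERS = { "ada@example.com" => { id: 1, total_spent: 124.95 } }.freeze

# model_output is the raw JSON string the model produced for this tool call.
def lifetime_value(model_output, customers)
  args = JSON.parse(model_output)
  return { error: "Expected a JSON object" } unless args.is_a?(Hash)

  email = args["customer_email"]
  return { error: "customer_email must be a string" } unless email.is_a?(String)

  # The figure comes from the system of record, never from the model.
  customer = customers[email.downcase]
  return { error: "Customer not found: #{email}" } unless customer

  { customer_email: email, total_spent: customer[:total_spent] }
rescue JSON::ParserError
  { error: "Model output was not valid JSON" }
end

p lifetime_value('{"customer_email":"ada@example.com"}', CUSTOMERS)
p lifetime_value('{"customer_email":"harrypotter@gmail.com"}', CUSTOMERS)
```

The model can still invent an email address, but the lookup fails loudly and the error string can be handed back to the agent.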
00:41:27.920
so if if uh but does does the error does the error
00:41:33.319
happen uh in the uh in the actual like shipping label service is is that what
00:41:40.240
you mean yeah so so your your service should be uh should be a uh a good
00:41:47.960
service well I'll explain just like an API right when you
00:41:55.240
interface with an API and it just returns a 500 with no body you're like
00:42:01.599
what so um you could have the shipping service and in some instances for
00:42:08.560
example uh where it tries to look up the customer and the customer record doesn't exist we just return a string that says
00:42:14.280
customer not found right so it then uses that information to
00:42:21.079
um to create the customer record right because in the in the instructions we had um create a customer record create a
00:42:29.280
customer record if if if new customer right so as long as you're returning an
00:42:35.200
error message and I mean you can you know connect to some sort of monitoring tools that your
00:42:42.240
developers are monitoring um but if you want to communicate that error to uh to
00:42:48.599
the AI agent you should you should return that you should say the shipping label failed the shipping label uh
00:42:55.440
creation failed because you didn't provide a uh shipping method right and
00:43:01.119
at that point it should autocorrect and kind of resend that order or I'm sorry
00:43:08.640
reissue that uh corrected API call yeah thank you thank you
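The last answer, returning a descriptive error so the agent can correct itself and reissue the call, can be sketched as a small loop in plain Ruby. Everything here is hypothetical: `ShippingService`, `run_tool_loop`, and the stub lambda that plays the model are illustrations, not the demo's code or langchainrb's API.

```ruby
# Hypothetical shipping service that explains failures instead of failing silently.
class ShippingService
  CARRIERS = %w[FedEx DHL].freeze

  def create_shipping_label(address:, carrier: nil)
    if carrier.nil?
      return "Error: shipping label creation failed because no shipping method " \
             "was provided. Use one of: #{CARRIERS.join(', ')}"
    end
    return "Error: unknown shipping method #{carrier}" unless CARRIERS.include?(carrier)

    "Shipping label created: #{carrier} to #{address}"
  end
end

# Feed each tool result back to the model until the call succeeds
# or the turn limit is reached. Returns every tool result in order.
def run_tool_loop(model, service, first_call, max_turns: 3)
  call = first_call
  results = []
  max_turns.times do
    result = service.create_shipping_label(**call)
    results << result
    break unless result.start_with?("Error:")

    call = model.call(call, result) # model reads the error, proposes a new call
  end
  results
end

# Stub standing in for the LLM: it "reads" the error and adds a carrier.
stub_model = ->(call, _error) { call.merge(carrier: "FedEx") }
puts run_tool_loop(stub_model, ShippingService.new, { address: "Boulder, CO, US" })
```

The turn limit is a design choice worth keeping: if the model never fixes its call, the loop stops and the errors are still available for your monitoring.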