Summarized using AI

Building AI Agents in Ruby

Andrei Bondarev • October 07, 2024 • Boulder, CO

In this talk titled "Building AI Agents in Ruby," presented by Andrei Bondarev at the Rocky Mountain Ruby 2024 conference, the focus is on integrating AI and machine learning capabilities into the Ruby programming language. The speaker emphasizes the need for the Ruby ecosystem to advance in AI, machine learning (ML), and data science (DS) to keep up with modern tech stacks.

Key points discussed include:
- Understanding Generative AI: Bondarev introduces Generative AI and its importance for application developers, highlighting its transformational effects on development processes where tasks previously taking months can now be completed in days.
- AI Agents Overview: The speaker explains what an AI agent is: a semi-autonomous system that uses large language models (LLMs) to perceive its environment, make decisions, and take actions to accomplish specific goals.
- Core Components of AI Agents: Bondarev details the components critical to building AI agents, such as reasoning and planning modules powered by LLMs, triggers for execution, and memory management through Retrieval Augmented Generation (RAG).
- Demos and Use Cases: Examples from Bondarev's consulting work are shared, demonstrating practical applications of AI agents in automating business processes, like handling invoices and customer management in an e-commerce setting. The speaker shows a demo based on a fictional e-commerce store, emphasizing how an AI agent can facilitate operations such as processing new orders and handling returns.
- Challenges and Considerations: There’s an acknowledgment of challenges like AI reliability, data bias, and the complexities in integrating these agents into existing workflows. The talk also touches on hallucination in model responses and the importance of structured outputs to mitigate incorrect conclusions from the models.
- Ruby as a Viable Language for AI: Bondarev advocates for Ruby, asserting its elegance and the potential for developers to leverage its capabilities for AI applications, suggesting that while there’s a perception barrier, Ruby can thrive in AI given the right frameworks and libraries.

The presentation concludes by underscoring Ruby's strengths as a language for AI applications and the need for concerted efforts to build an accessible and rich library ecosystem to facilitate that growth. Through Bondarev’s insights, developers are encouraged to explore AI integration into their Ruby projects, emphasizing that the ongoing evolution in AI capabilities makes this an opportune time for innovation in the Ruby community.

The Coatue AI report puts AI models at the center of every modern tech stack that application developers will build on going forward. It is not controversial to say that the Ruby ecosystem lags in its support for and adoption of AI, ML and DS libraries. If we’d like to stay relevant in the future, we need to start building the foundations now. We’ll look at what Generative AI is, what kinds of applications developers in other communities are building, and how Ruby can be used to build similar applications today. We’ll cover Retrieval Augmented Generation (RAG), vector embeddings and semantic search, prompt engineering, and what the state of the art (SOTA) in evaluating LLM output looks like today. We will also cover AI Agents, semi-autonomous general-purpose LLM-backed applications, and what they’re capable of today. We'll make the case that Ruby is a great language for building these applications because of its strengths and its incredible ecosystem. After the slides, we'll build an AI Agent in 15 minutes.

Rocky Mountain Ruby 2024

00:00:13.639 how's everyone doing this morning awesome um super excited to be
00:00:18.960 here uh thank you for uh having me uh this is an amazing conference uh I
00:00:25.320 really do enjoy Boulder this is my second time here um so today I'm going to talk about building AI agents in Ruby um
00:00:33.440 so how many of you work with uh LLMs uh and how many of you um uh have
00:00:42.000 built an application uh that uses an LLM before okay a few maybe 10 uh how many
00:00:48.640 of you can explain what RAG is okay uh less so like three
00:00:55.719 people all right um so um in my day-to-day I run a small uh software
00:01:01.160 development consulting agency where we help uh uh VC-backed uh startups and uh
00:01:07.840 mid-size enterprises uh build custom software um the rest of my time I spend
00:01:13.439 uh developing uh open source uh Solutions and uh kind of doing um applied um AI work and trying to figure
00:01:21.000 out what are the best ways to implement some of these um newer capabilities that
00:01:26.040 we've been seeing in the last couple years into our stacks um so um I've been able to collect some
00:01:34.040 of these stickers uh throughout and uh been fortunate enough to help some of these companies um who remembers Spree
00:01:41.479 um does anyone still run a Spree store or Solidus what an iconic library
00:01:49.799 um and so I think this summarizes the uh gen AI impact pretty well um what used to
00:01:56.399 take um half a year to to build out um now can just be done uh within uh a
00:02:03.600 couple days or a couple of weeks um and so some of these most
00:02:09.720 common uh machine learning tasks um such as uh taking unstructured data and
00:02:15.920 converting it to a structured format or summarizing large text or um classifying
00:02:21.599 documents uh translation content generation named entity recognition um
00:02:27.879 are just an API call away um so as these uh capabilities uh
00:02:36.360 have significantly dropped in price um there's a long tail of uh medium to
00:02:42.080 small-sized companies that are implementing these capabilities um so now uh some of these AI features are no
00:02:49.239 longer reserved for the FAANG companies um and so I'd like to propose
00:02:55.560 um a view that um we're going to have this intelligent compute um engine
00:03:02.760 inside of every single stack um just like every single tech stack now has databases and cache and encryption
00:03:10.360 and queues and Lambda functions storage um and it uh supports different
00:03:16.200 protocols um you're going to have um an LLM uh embedded uh into your tech stack
00:03:24.200 uh accomplishing different tasks um and really I think we're going to have AI agents actually uh embedded into
00:03:32.640 every single enterprise and chipping away at different tasks and workflows of
00:03:38.239 um every single department so of course this is not the
00:03:43.480 first time that we're talking about AI agents uh so in the 1950s when um Alan
00:03:49.280 Turing had published his famous um uh paper he introduced the concept of
00:03:55.599 intelligent machines it is the same paper uh in which he introduced the Turing test and then in the 70s and 80s uh the
00:04:04.239 community researched uh and worked on Expert systems and software agents and you might remember the kind of the
00:04:10.360 original chat Bots with um Alexa and Siri and we're we're taking another stab
00:04:16.959 at it now so every um major tech company is
00:04:22.400 presenting their vision around this um so for example Google has um its own
00:04:27.919 custom uh AI agent builder uh that's ironically uh called uh a Gem manager you can build
00:04:35.160 Gems Ruby's making a comeback um so what is an AI agent well
00:04:42.479 the textbook definition is that it is an autonomous system uh that's capable of
00:04:48.720 perceiving its environment making decisions and taking actions um to
00:04:55.400 achieve specific goals so environment awareness uh decision-making and uh
00:05:01.039 action taking so these two terms are used
00:05:06.880 interchangeably um and I certainly use them interchangeably as well um I draw
00:05:12.039 this kind of minor distinction that the assistant is more conversational so um it's a multi-turn
00:05:19.759 conversation with a human or another system um and an agent is more of um um an
00:05:27.720 autonomous system kind of like a background job you give it some tasks and it goes off
00:05:32.800 running um so what are some of the use cases well of of course uh the number
00:05:39.120 one use case is broadly automating business processes um so and and and I
00:05:45.520 think we can tackle low uh low IQ tasks uh we can augment workers with personal
00:05:52.560 assistants uh we can also handle uh time consuming tasks so things that humans
00:05:58.720 just can't do like summarizing uh like writing up an executive summary for a
00:06:04.560 200 Page document within within a matter of a couple seconds uh or moderating a
00:06:09.960 million images within a couple hours and so for example in in in my
00:06:15.960 small Consulting business we could be uh creating invoices from from time sheets categorizing business
00:06:22.360 expenses uh writing proposals so remixing our service offerings with the
00:06:28.400 uh the notes that have been taken the client conversations and and writing up uh proposals um writing job descriptions
00:06:36.960 uh Jira tickets um and so um when you
00:06:42.000 think about um building an AI agent uh these are kind of some of the components to uh think about so so there's a
00:06:50.000 reasoning and planning module which is uh provided by the LLM um we need to um
00:06:59.160 inject our tasks goals objectives workflows uh into the AI agent so it has a purpose
00:07:06.800 and um we need a way to run the agent uh so that's the concept of triggers
00:07:12.560 and we'll look at uh some of these um in a sec um so when it comes to
00:07:19.440 reasoning and and and and planning um when we give our agent um tasks and
00:07:26.280 goals it needs to be able to make decisions based on that information it needs to be able to formulate a plan um
00:07:34.360 and then for example plan reflection is the concept where it's able to recursively revisit its plan and
00:07:41.199 make sure that it's um it's still valid so whenever people talk about um
00:07:47.800 reasoning and planning they bring up chain of thought um and really all that chain of thought is is just
00:07:54.800 forcing the AI to explain its reasoning um
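As a rough illustration of the chain-of-thought idea above, the difference can be as small as how the prompt is worded. This is only a sketch: the helper names and the sample question are assumptions made for illustration, and no model is called.

```ruby
# Hypothetical helpers that build two prompts for the same question.
# Neither one calls a model; they only show how the wording differs.

# Forces an immediate answer with no room for reasoning.
def direct_prompt(question)
  "#{question}\nAnswer with only the final result."
end

# Asks the model to explain its reasoning before committing to an answer.
def chain_of_thought_prompt(question)
  <<~PROMPT
    #{question}
    Think through the problem step by step and explain your reasoning.
    Then give the final result on its own line, prefixed with "Answer:".
  PROMPT
end

question = "A bat and a ball cost $1.10 in total. The bat costs $1.00 more " \
           "than the ball. How much does the ball cost?"

puts direct_prompt(question)
puts "---"
puts chain_of_thought_prompt(question)
```

Asking for the final result on a separate, prefixed line keeps the answer easy to extract from the longer response.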
00:08:00.159 so similar to how humans when we're um thinking out thinking out loud through
00:08:06.520 the problem we're much more likely to arrive uh to a better answer so for
00:08:13.400 example um in this uh in this example if we
00:08:20.240 ask it a kind of random uh brain cruncher um and just force it to give an
00:08:26.840 answer um then it's much less likely to give us um a good answer than um
00:08:33.039 spending some time uh thinking about how to approach this question so
00:08:39.760 it's much more likely to get to a better answer this way um and when I talk about uh business
00:08:48.560 logic so I'm talking about any kind of business related tasks goals um
00:08:56.080 objectives workflows standard operating procedures so so um I'm leading up to uh
00:09:02.279 to a demo that I'm going to do in a bit but um you can imagine if we run an uh
00:09:08.720 e-commerce store for example um we have procedures for um processing new orders
00:09:16.720 or processing uh returns that will enumerate all the different steps with
00:09:24.519 some logic um within those steps
00:09:30.120 so when it comes to triggers we need to uh we need to be able to um run uh the
00:09:35.760 agent so kind of very similar to Sidekiq um right it either monitors um
00:09:43.399 state changes uh it runs on a schedule like a crontab uh or it's event driven
00:09:50.440 like a webhook um or the like
00:09:56.839 um or you can manually run it um and and um AI agents uh well
00:10:06.920 really uh large language models are stateless uh so we need some
00:10:12.320 sort of mechanism to save progress uh save the context uh save the tool
00:10:19.600 calling output to memory and uh retrieve it later and whenever people talk about
00:10:26.600 memory they talk about RAG um I'm not going to um spend too much time explaining what
00:10:31.920 RAG is and I think we've largely overcomplicated what it is uh all throughout kind of 2023 and um
00:10:40.440 partially this year as well all that it is is that you take relevant context and
00:10:46.959 you put it in the prompt that's it um you don't necessarily need to use uh a
00:10:52.040 vector database and generate embeddings and all that um that's the implementation detail it's just
00:10:58.639 that you're injecting relevant uh context into the prompt so that the
00:11:04.040 model can use that information so it can be your proprietary data
00:11:10.399 etc um so tool calling and function calling are also
00:11:16.480 synonymous um and you use them to produce structured outputs so um having
00:11:24.480 the model uh generate um a response in a predefined JSON schema
00:11:32.720 or when you uh need the uh intent detection to utilize
00:11:39.959 external tools so think APIs um and so you would use tool
00:11:46.320 calling to uh get data from external sources because um for example those
00:11:53.360 external sources are proprietary and these foundational models uh
00:11:58.880 were not trained on your specific data um uh the models are trained on a
00:12:04.360 snapshot in time so uh a model has no concept of time and certainly
00:12:09.680 um it's unable to access real-time data um or when you need to
00:12:15.959 take actions um or execute deterministic tasks so um there's a really good
00:12:23.079 article um called uh Math is Hard by uh Gary Marcus
00:12:30.600 um basically showing that um these
00:12:35.639 uh LLMs don't really know how to do math um and it doesn't really make
00:12:41.920 sense to use them for those tasks um and so for example if I
00:12:49.000 ask it to um sum up two uh really large numbers it looks like the correct result
00:12:56.720 here at the bottom but but it's not uh but this this this problem has been
00:13:01.800 solved I mean uh we can we can use a tool such as the uh the code interpreter in in this case and it just produces a
00:13:08.760 program um that runs in a sandbox and uh so that's the tool um and provides the
00:13:16.000 the response back um and so really the implementation
00:13:21.160 looks uh like the following so um again I'm uh leading us to the e-commerce
00:13:28.000 related demo I'm going to do so imagine we um so the function definition
00:13:33.160 is in the OpenAPI spec um and we can
00:13:39.440 declare uh um different functions so in this case we have an inventory
00:13:45.199 management um class that has a function called find product um and it accepts a
00:13:51.720 SKU and whenever within the conversation there's a message um that is relevant
00:13:58.480 the uh model uh the uh intent detection chooses to invoke this
00:14:05.240 function so we um extract the function name extract the arguments uh send it to
00:14:11.240 that object with that um uh that method name and
00:14:16.320 arguments um and so it looks kind of like this uh so we need a way to inject
00:14:22.639 uh the instructions the business logic into the AI agent and we utilize LLMs uh to reason and plan
00:14:30.839 uh store and retrieve from memory take actions via tools um if it's
00:14:36.480 conversational converse with with the user and then uh we need some sort of a way to uh trigger the
00:14:42.959 agent um and so um I wanted to kind of briefly introduce a library that I've
00:14:48.800 been uh working on called langchainrb um and it's a way of building
00:14:54.880 LLM-powered applications in Ruby um it's uh
00:14:59.920 it's a living organism it's evolving as kind of my conceptualization changes and
00:15:06.000 and and the abstractions and and um what I really think would bring the
00:15:13.560 value as well um so um we have uh so
00:15:19.440 there's a couple of things that it provides it uh provides a unified interface to different kinds of LLMs so
00:15:25.120 you can um test out um a number of them really quickly swap them in and out you can build AI agents and
00:15:31.680 there's also a pipeline to uh do RAG as well um
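Since RAG, as described earlier in the talk, boils down to putting relevant context into the prompt, a minimal sketch needs neither embeddings nor a vector database. The example below uses naive keyword overlap for retrieval. It is not how langchainrb implements its pipeline; the `DOCUMENTS` list (made-up policy snippets) and every method name are assumptions for illustration.

```ruby
# Minimal RAG sketch: pick the most relevant snippets with naive keyword
# overlap (no vector database, no embeddings) and put them in the prompt.
# DOCUMENTS and all helper names are made up for illustration.

DOCUMENTS = [
  "Returns are accepted within 30 days of delivery.",
  "Orders shipped to US addresses go out via FedEx.",
  "Orders shipped outside the US go out via DHL."
].freeze

def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Rank documents by how many distinct query words they share with the query.
def retrieve(query, documents, limit: 2)
  query_words = tokenize(query).uniq
  documents
    .map { |doc| [doc, (tokenize(doc) & query_words).size] }
    .reject { |_, score| score.zero? }
    .sort_by { |_, score| -score }
    .first(limit)
    .map(&:first)
end

# Inject the retrieved snippets into the prompt as context.
def build_prompt(question, documents)
  context = retrieve(question, documents).map { |doc| "- #{doc}" }.join("\n")
  <<~PROMPT
    Use only the context below to answer.

    Context:
    #{context}

    Question: #{question}
  PROMPT
end

puts build_prompt("Which carrier ships orders outside the US?", DOCUMENTS)
```

Swapping `retrieve` for a full-text or vector search changes the retrieval quality, not the overall shape: the prompt is still assembled from whatever context comes back.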
00:15:38.360 demo so um imagine we have this uh fictional
00:15:43.880 e-commerce store uh called uh Nerds and Threads uh that sells comfortable nerdy t-shirts for software engineers that
00:15:51.040 work from home so who would shop there um and an e-commerce store
00:15:59.120 um has the following Services uh kind of self-explanatory what they what they do
00:16:04.399 right and you can you can imagine these are different service objects with with functions um and we're going to put an
00:16:10.959 AI agent there to uh run the store to facilitate the execution to run
00:16:17.160 our um standard operating procedures um so this is how I I think
00:16:25.680 about this um so the Ruby on Rails promise has always been that developers focus on writing business
00:16:32.279 logic and not the plumbing right so I think as Rails developers we string
00:16:39.120 services together right and uh um we focus on these
00:16:45.720 business use cases first and foremost um and previously you'd
00:16:51.519 have business logic in models uh or service objects or controllers if you're bad um
00:17:00.560 and now I think some of this logic is going to start shifting to
00:17:05.880 prompts and really AI agents um so we're going to set up this
00:17:13.000 uh uh assistant class uh that we're going to inject the instructions on the
00:17:18.679 left hand side um we're going to use an LLM uh for
00:17:24.160 reasoning and planning and we're going to utilize these uh services
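The pieces described so far (instructions, an LLM for reasoning, and tools that are invoked by extracting a function name and arguments and sending them to an object) can be sketched in plain Ruby. This is not langchainrb's API: `Agent`, `FakeLLM`, the message format, and the in-memory `InventoryManagement` class are all hypothetical stand-ins so the example runs without a real model.

```ruby
require "json"

# Hypothetical tool: an in-memory stand-in for an inventory service.
class InventoryManagement
  PRODUCTS = { "A-100" => { sku: "A-100", price: 24.99, quantity: 10 } }.freeze

  def find_product(sku:)
    PRODUCTS.fetch(sku) { { error: "product not found" } }
  end
end

# Stand-in for a real model: it requests one tool call, then answers
# once a tool result is present in the conversation.
class FakeLLM
  def chat(messages)
    tool_result = messages.reverse.find { |m| m[:role] == "tool" }
    return { role: "assistant", content: "Found: #{tool_result[:content]}" } if tool_result

    { role: "assistant",
      tool_call: { name: "find_product", arguments: JSON.generate(sku: "A-100") } }
  end
end

class Agent
  def initialize(instructions:, llm:, tools:)
    @llm = llm
    @tools = tools # maps each allowed function name to the object implementing it
    @messages = [{ role: "system", content: instructions }]
  end

  # Loop until the model stops asking for tools or the turn budget runs out.
  def run(user_message, max_turns: 5)
    @messages << { role: "user", content: user_message }
    max_turns.times do
      reply = @llm.chat(@messages)
      @messages << reply
      return reply[:content] unless reply[:tool_call]

      @messages << { role: "tool", content: JSON.generate(dispatch(reply[:tool_call])) }
    end
    raise "agent did not finish within #{max_turns} turns"
  end

  private

  # Extract the function name and arguments, then send them to the tool object.
  def dispatch(tool_call)
    name = tool_call[:name]
    tool = @tools[name]
    return { error: "unknown tool #{name}" } unless tool

    arguments = JSON.parse(tool_call[:arguments], symbolize_names: true)
    tool.public_send(name, **arguments)
  end
end

agent = Agent.new(
  instructions: "You run an e-commerce store. Only process new orders.",
  llm: FakeLLM.new,
  tools: { "find_product" => InventoryManagement.new }
)
puts agent.run("New order for SKU A-100")
```

Looking the function name up in an explicit hash, rather than calling whatever name the model returns, keeps the model from invoking arbitrary methods.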
00:17:30.640 um so this is what one of one of the classes um looks like
00:17:37.760 um I'm not sure I can make it bigger um but it is a customer management class
00:17:43.799 and we have two functions uh create customer that takes in the name and email and uh find customer that
00:17:50.320 takes an email and pulls it from a SQLite database um and we're extending it
00:17:55.880 with a uh Langchain tool definition module that provides that define_function DSL
00:18:02.120 that basically converts uh this definition into an uh OpenAPI
00:18:08.120 spec um I mentioned earlier okay so I think this is where we go
00:18:13.760 to the demo um okay so we have this uh Nerds and Threads um AI agent and we're
00:18:20.679 going to uh pass these
00:18:27.039 instructions and so um you're an AI that uh runs an
00:18:32.679 e-commerce store called Nerds and Threads that sells comfy nerdy t-shirts for software engineers that work from home you have access to the shipping
00:18:38.520 service inventory service order management etc uh you're only responsible for processing new orders
00:18:45.240 refuse all other workflows um and we write out the steps
00:18:50.480 for processing new orders so um imagine we have a uh point
00:18:57.960 of sale system that sends the following events uh there's a new order event with
00:19:03.280 the customer email quantity the SKU of the item uh and the
00:19:15.559 address um and uh the first thing we do is we try to find the customer so we call the
00:19:22.480 uh customer management tool uh customer is not found we try to create the
00:19:27.559 customer uh success we have a customer ID we find the product
00:19:34.039 uh we retrieve the product we get back the SKU the price and uh there's 10 items uh in
00:19:40.640 stock uh we charge the customer which is uh five items at the price of
00:19:46.559 $24.99 uh we create the order record um and because the address is in
00:19:54.000 the US uh we use FedEx
00:19:59.240 and uh this is a copy of the email that uh gets sent out um
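For comparison, the new-order steps the agent just walked through can be written as ordinary deterministic Ruby. This is a sketch, not the demo's code: the structs, the in-memory stores, the SKU, the email address, and the stock check are assumptions, and the DHL branch follows the rule mentioned later in the Q&A. Prices are kept in cents to avoid float rounding.

```ruby
# Plain-Ruby walk-through of the new-order steps. Every name here is a
# hypothetical stand-in; prices are kept in cents to avoid float rounding.

Customer = Struct.new(:id, :email)
Product  = Struct.new(:sku, :price_cents, :quantity)

CUSTOMERS = {}
PRODUCTS  = { "TSHIRT-1" => Product.new("TSHIRT-1", 2499, 10) }

# Step 1 and 2: find the customer, or create one if none exists.
def find_or_create_customer(email)
  CUSTOMERS[email] ||= Customer.new(CUSTOMERS.size + 1, email)
end

# Final step: FedEx for US addresses, DHL for everything else.
def choose_carrier(country)
  country == "US" ? "FedEx" : "DHL"
end

def process_new_order(email:, sku:, quantity:, country:)
  customer = find_or_create_customer(email)
  product  = PRODUCTS.fetch(sku)
  raise ArgumentError, "only #{product.quantity} in stock" if quantity > product.quantity

  {
    customer_id: customer.id,
    total_cents: product.price_cents * quantity,
    carrier: choose_carrier(country)
  }
end

order = process_new_order(email: "customer@example.com", sku: "TSHIRT-1", quantity: 5, country: "US")
puts format("Charged $%.2f, shipping via %s", order[:total_cents] / 100.0, order[:carrier])
# prints "Charged $124.95, shipping via FedEx"
```

Five items at $24.99 come to $124.95. The agent version trades this fixed control flow for instructions written in natural language, which is what makes the later change (adding returns) a prompt edit rather than a code change.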
00:20:08.960 and so um and I was actually uh slowing down the execution so uh in between kind
00:20:16.240 of every turn there there was a sleep statement um so now let's say oh sorry
00:20:23.240 um so now let's say um a return order um event comes in with uh that customer
00:20:30.720 email um and the order ID I changed my mind I want to return it
00:20:37.039 um and the system takes in the return order event
00:20:43.559 and uh refuses to uh process the return um so we're going to try to see
00:20:50.360 if we can uh Implement that logic really quickly uh so we
00:20:59.799 append our return order steps uh which say uh return order step-by-step
00:21:06.159 procedures follow them in this exact sequential order uh look up the order calculate the total amount uh refund the
00:21:13.120 payment Mark the order as refunded um and actually here we're also going to say that you're only responsible for
00:21:19.600 processing new orders and uh returning orders so if we take this uh return
00:21:28.120 order event as well let's see if it is able to
00:21:42.480 order refund the
00:21:47.760 customer yep successfully return
00:21:52.840 um and oh so this was recorded in case I botched the live one
00:22:01.039 um and another thing we can do um which I think is a really good uh use
00:22:07.080 cases is like text-to-SQL um so
00:22:12.120 for example we're utilizing uh the internal uh tool in uh
00:22:19.039 langchainrb which is a database uh tool uh which can uh execute queries and um
00:22:26.720 output the schema for your uh database connection string so for
00:22:33.279 example we say describe the
00:22:41.440 database and it should utilize the internal uh dump schema
00:22:48.559 tool um and now we can say
00:22:54.240 uh uh what is it
00:22:59.480 oh drop yeah well of course um if you're going to be doing this you should create a
00:23:07.240 dedicated database role that has permissions and access and you should sanitize the output so you don't
00:23:13.880 accidentally DROP ALTER
00:23:21.360 etc right right um and now we can say uh what is it uh let's say how
00:23:31.760 many what see how
00:23:37.120 much what was I going to say
00:23:47.480 oh oh on
00:23:55.760 orders okay finds a customer
00:24:08.720 H total spent okay oh because I returned it yes
00:24:15.400 you're right you're right okay you're paying
00:24:20.799 attention that was a test so
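On the text-to-SQL caution raised above (a dedicated database role plus sanitizing what the model produces), one possible guard is to accept only a single SELECT statement before anything reaches the database. This is a deliberately conservative sketch with a made-up method name. It complements a read-only role rather than replacing one, and it will also reject harmless queries that merely mention a forbidden keyword inside a string literal.

```ruby
# Conservative guard for model-generated SQL: allow one SELECT statement only.
# This complements, and does not replace, a read-only database role.

FORBIDDEN_KEYWORDS = /\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b/i

def safe_select?(sql)
  statement = sql.strip.sub(/;\s*\z/, "")              # tolerate one trailing semicolon
  return false if statement.include?(";")              # no stacked statements
  return false unless statement.match?(/\Aselect\b/i)  # must start with SELECT

  !statement.match?(FORBIDDEN_KEYWORDS)
end

puts safe_select?("SELECT SUM(total) FROM orders WHERE customer_id = 1") # true
puts safe_select?("DROP TABLE orders")                                    # false
puts safe_select?("SELECT 1; DROP TABLE orders")                          # false
```

A keyword filter like this is easy to get around in general, which is why the database role with restricted permissions remains the real safety boundary.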
00:24:26.799 um why would you use this um so of course of course it's of course it's
00:24:32.399 far-fetched um and and it seems like e-commerce has been has been solved for
00:24:37.760 a very long time we haven't seen uh all that much Innovation um and um you can change the
00:24:46.080 requirements on the fly so um you know a CEO comes in and says well this is our
00:24:51.960 10-year anniversary of our store and to all of our loyal customers we want to offer a 10% discount on the orders
00:24:59.600 today but how do you define a loyal customer well you know they need to have spent you know $100 or more in uh in the
00:25:06.720 last six months great let's type it out let's test it let's ship it right because the product team is going to
00:25:11.760 tell you well not in this sprint uh and we have to do uh scaled agile and um we
00:25:19.480 haven't planned it out yet etc you can inject intelligence into
00:25:25.039 your process um and you can tackle complex workflows as
00:25:31.600 well um so of course you need to you need to be able to evaluate this and and
00:25:37.200 you need to um just bombard it with inputs and outputs and see
00:25:44.600 how your AI agent performs you can ask it uh to evaluate how it
00:25:52.600 did according to certain criteria so um kind of recursively
00:25:58.520 uh test itself um there's plenty of benchmarks on uh Hugging Face but of
00:26:04.840 course for your specific use cases you should be creating these data sets on your own um and if the um agent
00:26:15.159 reliability is not to your satisfaction then you should be reducing the number of responsibilities or tasks or kind of
00:26:22.799 uh shrinking the decision tree that it operates um on
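The evaluation idea above (build your own data set, run the agent over it, and look at how often it gets things right) can be sketched as a tiny harness. Everything here is an assumption for illustration: `EvalCase`, `evaluate`, the three cases, and the lambda that stands in for a real LLM-backed agent.

```ruby
# Tiny evaluation harness: run an agent over labeled cases and report accuracy.
# `agent` is any object responding to #call; the stub below stands in for a
# real LLM-backed agent, and the cases are invented for illustration.

EvalCase = Struct.new(:input, :expected)

def evaluate(agent, cases)
  failures = cases.reject { |c| agent.call(c.input) == c.expected }
  passed = cases.size - failures.size
  { total: cases.size, passed: passed, accuracy: passed.fdiv(cases.size), failures: failures }
end

# Stub agent: picks a carrier from the destination country.
stub_agent = ->(event) { event[:country] == "US" ? "FedEx" : "DHL" }

cases = [
  EvalCase.new({ country: "US" }, "FedEx"),
  EvalCase.new({ country: "DE" }, "DHL"),
  EvalCase.new({ country: "CA" }, "DHL")
]

report = evaluate(stub_agent, cases)
puts "#{report[:passed]}/#{report[:total]} passed (accuracy #{report[:accuracy].round(2)})"
```

Keeping the failing cases in the report makes it easier to see which responsibility to remove when reliability is not good enough.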
00:26:29.559 um and of course you might say well Andrei these things hallucinate and they're just really unstable and this is kind of
00:26:37.320 the way I've started thinking about this so modern software still fails and it fails because of
00:26:44.919 the dependencies that we can't control so maybe we can draw a parallel and say that well AI systems can fail
00:26:52.399 because of inaccurate or incomplete data or bias in the data um
00:26:59.279 this is a Ruby conference so modern software can't scale uh modern
00:27:04.919 software fails because it can't scale well AI systems also uh currently struggle with uh the massive uh
00:27:13.559 compute needs there's Cloud outages that affects uh both sides there's cyber
00:27:19.600 attacks on on on on one side and and adversarial attacks on the other side um
00:27:25.799 we don't test our software enough um and you could kind of draw a parallel
00:27:31.480 to to this blackbox behavior um and and and of course there's unclear still unclear liability
00:27:39.039 and accountability in terms of the decisions that these AI agents make
00:27:44.360 who's responsible for it but I think these are engineering problems that will be
00:27:52.640 addressed um so why am I doing all of this in Ruby well I love the language I
00:27:57.960 think it's I think it's very elegant to express your ideas um and it lets me
00:28:04.080 explore different different concepts very quickly without again without thinking of of uh semicolons and and all
00:28:11.600 this uh implementation um and I I I actually
00:28:16.720 think in terms of um uh DS ML uh AI capabilities we're not
00:28:25.320 that far from Python we have all these amazing libraries but
00:28:30.640 they exist in a snapshot in time and of course companies need to invest in maintaining them but I
00:28:37.120 think the main problem is the problem of perception so you're not going to lose your job
00:28:43.200 uh on an AI project because you picked Python even if the project fails but you
00:28:49.799 might just lose your job if you picked Ruby because it's a very contrarian decision so we um we looked at the
00:28:58.760 uh gen AI uh wave uh some of the uh
00:29:04.440 tasks that it offers um we looked at how we can package up and think
00:29:10.480 about building AI agents um I demoed this e-commerce example where it
00:29:16.320 connects to different services that you can find in any e-commerce store out there and how we can get the AI agent to
00:29:23.120 run it um and I expressed my uh love for Ruby at the end and um um thank
00:29:40.600 you yeah that was a very good question I think training is going to be oh um so your question was um as
00:29:48.559 you're as you're iterating on a on a on the task um when do you stop prompt
00:29:54.880 engineering and when do you start thinking about training custom models um so I think I think the the the
00:30:02.480 beauty of all this is that um you don't have to train custom models in in terms
00:30:08.279 of I think I think training custom models is is is going to be reserved to still the the the largest companies out
00:30:15.120 there kind of the the the largest Roi on the line um and and uh the the the intention
00:30:23.760 is that these these foundational models are going to do you know are going to be able to tackle 90% of the of the of the
00:30:30.039 use cases but in and and certainly the
00:30:35.399 uh the approach should be take the most capable commercial model out there uh
00:30:41.880 prove out that your concept works and then scale down to maybe smaller models or open source models you know
00:30:48.840 ones off of Hugging Face did I uh answer your question
00:30:55.559 yeah um well I'm going to compare it to Python right so
00:31:01.159 very similar to Python written on top of C same kind of concurrency story
00:31:06.720 um and uh a lot of times these Python libraries break into the C bindings right and I'm sure everyone
00:31:14.679 is familiar with Andrew Kane's work and he built a lot of kind of equivalents
00:31:19.760 in the Ruby world um and I just find
00:31:25.639 that we need to we need to invest in in in maintaining those libraries so
00:31:32.200 um you know I mean if if you're if you're so so the beauty of this is is
00:31:39.200 that um there there's this kind of concept uh there's title emerging of an
00:31:45.639 AI engineer right and and it's it's basically full stack Engineers that are
00:31:52.399 much more familiar with some of these data science um ml AI Concepts right but
00:31:59.279 we're not necessarily going to be uh we're not necessarily going to be training models but we're going to be
00:32:06.279 um uh we're going to be doing applied AI yeah anyone
00:32:13.120 else yeah that's a very good question um sorry so I'm going to try to repeat it
00:32:18.880 um so in the demo that I did I wrote out specific constraints for the execution
00:32:26.320 flow like uh one of them was if if the address is in the US use uh FedEx if if
00:32:32.399 it's outside use uh DHL um so how do we make sure that these
00:32:38.720 constraints are respected and um uh how
00:32:45.039 do we have the guard rails uh to make sure that um uh it's safe to to release
00:32:51.360 into the wild right yeah um so to be frank I'm I'm still trying to figure out
00:32:57.679 like the best way to package up these these uh these workflows
00:33:03.919 into these workflows um into these systems so so there's a lot of people that are looking
00:33:10.440 at uh graph databases and kind of representing workflows as as uh as a
00:33:15.679 tree and really just utilizing the LLM like for making decisions right
00:33:22.559 so if the node is generate shipping label right there's a
00:33:28.120 decision to be made that's a natural language decision whether the address is
00:33:33.240 in the US or uh in Europe so use uh FedEx or use DHL um the
00:33:41.279 beauty of using an LLM at this step is that um it's able to process
00:33:49.399 like um an infinite number of different permutations right so instead of like
00:33:55.639 writing a really long if elsif else or regex right instead of enumerating
00:34:02.919 every single possible case that you could think of um in in uh instances
00:34:08.839 where the number of permutations is nearly infinite right the llm can take its best guess right um but certainly
00:34:19.399 certainly um certainly I would put some structure around like inputs and outputs
00:34:24.679 right and and uh uh these foundational model companies are introducing all
00:34:30.159 these concepts to try to put some more rigidity around this uh inference
00:34:38.560 uh like tokens generated so you know previously like the way you would generate uh JSON is you would uh
00:34:45.879 actually write out in the prompt and say uh please generate the JSON
00:34:50.960 that adheres to the schema and it's like well is it going to put it here or here or in the middle and are
00:34:57.240 we going to get backticks and now um support for
00:35:02.599 that functionality is like a first class citizen right um and as they do more of
00:35:08.720 this it it it it will gel much um much better with our systems or kind of
00:35:15.160 traditional deterministic systems yeah so um in this uh on this
00:35:23.839 Yeah, on the slide, and I hope
00:35:30.839 it's big enough that you can see, and certainly in the demo I showed, we're describing its
00:35:38.079 behavior, right? So we're telling it what its purpose is, and that it is an AI
00:35:45.400 agent that runs an e-commerce store, it connects to all these services, etc.
00:35:51.800 There's another thing here where we have a way to connect external
00:35:58.599 data sources as one of the tools, for
00:36:04.480 basically RAG, right? So you can give it access to, let's
00:36:12.599 say, an external data source, say, the rules of this game, right,
00:36:18.720 whatever, maybe it's a rule book that's
00:36:24.160 kind of thick, so we don't want to put the full thing in there, but we want to let it know that it's able to
00:36:33.240 look up the rules, right? And then we can write it out in the steps and say, "always look up
00:36:40.920 the rules when you encounter a specific scenario", right? So it will call into
00:36:46.400 that service, right, and we do some sort of full-text search or, again, vector
00:36:52.240 search to look up the rules based on that query, and it will inject those rules into the prompt,
00:36:59.079 and that's kind of called in-context learning, which is when the AI
00:37:05.920 uses this new information, learns this new information, within the
00:37:11.960 context. Yeah, does that answer your question? If we have an existing
00:37:18.560 e-commerce system and we take this beautiful system and try to put it back into it, are you finding that you're
00:37:24.800 having to, like, how much change has to happen to that existing system to make
00:37:29.960 this work, particularly if you don't have kind of a data set to double check and
00:37:35.079 backtest before you start working? Yeah, well, I mean, you should definitely be developing your evaluation
00:37:41.160 data set, but just like I kind of ironically
00:37:49.280 phrased it on this slide, you take the original Jira tickets
00:37:55.200 that say, "as a user, I would like to be able to complete a
00:38:01.240 new order, and these are the steps required". So you would take that, kind of reshape it, right, and
00:38:08.319 try to extract some of this business logic, and hand it
00:38:13.880 off to an AI agent. But, you know, there's
00:38:20.400 kind of an art to it, which is knowing the limitations, understanding the ripe use
00:38:28.440 cases, slicing and dicing the problem correctly.
00:38:34.480 So, and then, yeah, do you... Yep, so I was curious, in this context, how you
00:38:42.480 think, so for example you asked it about the total sales for a user that it
00:38:48.800 knows. For the user that it knows, I'm not surprised it gave the
00:38:54.079 correct answer for that, but if you asked it for a user that it didn't know, I wouldn't be shocked if it made up an
00:39:00.839 answer, you know, based on the experience I've had working with LLMs. So how do you
00:39:06.200 think about hallucination, you know, as a problem to be addressed in this context? Yeah, yeah,
00:39:12.720 well, this is where, again, the structured
00:39:18.240 output is able to shape the generated answer, right?
00:39:26.440 Because in this example, where you say that I ask, you know, how
00:39:36.880 much money, what's the LTV of this customer, right, and the customer
00:39:42.359 doesn't exist, so it tries to hallucinate an answer because it was trained on some sort of adjacent data,
00:39:48.920 right? Well, if we expect a structured answer, right, and that's the way that we
00:39:55.280 interface with these systems, then it's going to generate something
00:40:00.599 like, you know, customer email,
00:40:06.520 Harry Potter at, you know, gmail.com, right? And we're going to connect our
00:40:13.480 order management or customer management system, we're going to try to look up that customer, and it's not going to be found in the
00:40:20.599 system. So I understand it's probably not
00:40:26.119 to your full satisfaction, the answer that I gave you. I also
00:40:31.359 think that with time we will be able
00:40:37.240 to steer these systems in a much better way in terms of operating
00:40:43.640 within a certain context, right? So
00:40:49.359 you will be able to narrow down its knowledge to specific areas,
00:40:56.640 right? So these foundational models, they know about healthcare and history and medicine and,
00:41:03.720 you know, mechanical engineering, and I think we're going to be able
00:41:09.839 to narrow down what you'd call the latent space that it kind of operates in,
00:41:15.880 so it will be a lot less likely to give you some completely out-of-place,
00:41:22.599 outrageous answer. Yeah.
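The customer lookup described in this answer could be sketched in Ruby roughly as follows. This is an illustration under assumptions, not the code from the demo: `CUSTOMERS` is a hypothetical in-memory stand-in for an order or customer management system, and `lifetime_value_for` and the `:customer_email` key are made-up names. The idea shown is that the model only supplies a structured identifier, while the number itself always comes from the system of record.

```ruby
# Hypothetical in-memory stand-in for an order/customer management system.
CUSTOMERS = {
  "jane@example.com" => { name: "Jane Doe", lifetime_value: 1250.0 }
}.freeze

# Answers a lifetime-value question only from the system of record.
# The email comes from the model's structured output; the value never does.
def lifetime_value_for(structured_output)
  email = structured_output[:customer_email].to_s.downcase
  customer = CUSTOMERS[email]

  if customer.nil?
    return { status: "not_found", message: "Customer not found: #{email}" }
  end

  { status: "ok", customer_email: email, lifetime_value: customer[:lifetime_value] }
end
```

If the model invents an email for a customer that does not exist, the lookup simply misses and the caller gets a `not_found` result instead of a made-up figure.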
00:41:27.920 So, does the error
00:41:33.319 happen in the actual shipping label service, is that what
00:41:40.240 you mean? Yeah, so your service should be a good
00:41:47.960 service. Well, I'll explain. Just like an API, right, when you
00:41:55.240 interface with an API and it just returns a 500 with no body, you're like,
00:42:01.599 "what?" So you could have the shipping service, and in some instances, for
00:42:08.560 example where it tries to look up the customer and the customer record doesn't exist, we just return a string that says
00:42:14.280 "customer not found", right? So it then uses that information to
00:42:21.079 create the customer record, right, because in the instructions we had "create a
00:42:29.280 customer record if new customer", right? So as long as you're returning an
00:42:35.200 error message, and I mean, you can connect to some sort of monitoring tools that your
00:42:42.240 developers are monitoring, but if you want to communicate that error to
00:42:48.599 the AI agent, you should return that, you should say "the shipping label
00:42:55.440 creation failed because you didn't provide a shipping method", right? And
00:43:01.119 at that point it should autocorrect and resend that order, or, I'm sorry,
00:43:08.640 reissue that corrected API call. Yeah. Thank you, thank you.