00:00:13.679
Today, we'll talk about building LLM-powered applications in Ruby. I started my career 13 years ago building Rails applications for the U.S. Department of Health and Human Services, creating public-facing applications for millions of consumers. Then I moved on to build Rails applications at USA Today. Does anyone remember Spree? Sean Schofield, who was the original author, hired me, and I worked with one of the most amazing Rails teams at the time.
00:00:26.960
So, we're going to talk about generative AI. What is generative AI? It is a type of artificial intelligence that generates text, audio, video, etc. For the purposes of this presentation, we'll focus on text. Large language models (LLMs) are deep learning artificial neural networks with general-purpose language understanding and generation. They exploded in popularity after the 2017 Google paper, "Attention Is All You Need," which introduced the underlying architecture, the Transformer.
00:00:58.760
The impact of generative AI is that we are suddenly able to deploy commercially viable systems in just a few days. Previously, it would take about a month to collect data, three months to train the models, and another three months to deploy and optimize those models to run on hardware. Now, you can use APIs to get something into a proof of concept (POC) state or production state within a couple of days.
00:01:37.560
LLMs excel at a range of tasks. They are particularly good at structuring data, which means taking unstructured data and converting it into a structured format, such as JSON or YAML, that developers can consume. They are also proficient at summarizing text; you could send them a narrative and get a summary back. Other tasks include classifying data, translating languages, generating content, and answering questions in a chatbot style.
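To make the structuring task concrete, here is a minimal Ruby sketch. It assumes a hypothetical `llm` callable that takes a prompt string and returns the model's text; `FAKE_LLM`, the prompt wording, and the `name`/`email`/`plan` keys are all illustrative stand-ins, not any particular vendor's API. The point is only that the application parses and validates the JSON before trusting it.

```ruby
require "json"

# Hypothetical stand-in for a real LLM API call so the sketch runs offline.
# A real client would send the prompt over HTTP and return the model's text.
FAKE_LLM = ->(_prompt) { '{"name": "Jane Doe", "email": "jane@example.com", "plan": "pro"}' }

# Keys the application expects in the structured result (illustrative).
REQUIRED_KEYS = %w[name email plan].freeze

# Build a prompt that asks for a JSON object with a fixed set of keys.
def build_extraction_prompt(text)
  <<~PROMPT
    Extract the customer's name, email, and plan from the text below.
    Respond with a single JSON object with the keys "name", "email", and "plan".

    Text: #{text}
  PROMPT
end

# Ask the model for JSON, then parse and validate it before using it.
def extract_customer(text, llm: FAKE_LLM)
  raw = llm.call(build_extraction_prompt(text))
  data = JSON.parse(raw)
  missing = REQUIRED_KEYS - data.keys
  raise ArgumentError, "LLM response missing keys: #{missing.join(', ')}" unless missing.empty?

  data
rescue JSON::ParserError => e
  raise ArgumentError, "LLM did not return valid JSON: #{e.message}"
end

customer = extract_customer("Hi, I'm Jane Doe (jane@example.com) and I'd like the pro plan.")
puts customer["email"] # prints jane@example.com
```

Validating the keys matters because a model can return well-formed JSON that still omits a field the rest of the code depends on.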
00:02:38.840
I would like to present a vision of where I think this technology is headed. I don't think anyone truly knows where it's going, so I encourage you to consider it holistically, without getting tied up in implementation details. I believe that generative AI will become a part of every tech stack, just like databases, cache, encryption, queues, Lambda functions, storage, and protocols. Every tech stack is going to have a running process that will rely on AI for certain functions.
00:03:06.960
The firm The2 presents this vision, in which AI becomes a crucial part of application architecture and developers build applications on top of these AI systems. For example, in every single project, we have "if-else" statements that require us to enumerate the different options. There's a finite number of options, so the output is predetermined by the input. With AI, we have the flexibility to account for more ambiguity. For any given input, AI can choose the best-suited candidate among various outcomes, even for inputs it has never encountered before.
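The contrast between enumerated branches and letting a model pick a candidate can be sketched as follows. The `ROUTES` list, both method names, and the stubbed `llm` callable are hypothetical; a real application would swap the stub for an actual model call.

```ruby
# Hypothetical set of destinations for an incoming support message.
ROUTES = %w[billing shipping returns other].freeze

# Classic approach: every case must be enumerated ahead of time.
def route_with_rules(message)
  text = message.downcase
  if text.include?("refund")
    "returns"
  elsif text.include?("invoice")
    "billing"
  else
    "other"
  end
end

# LLM approach: the model picks the best-suited candidate, and the code
# still guards against answers outside the allowed list.
def route_with_llm(message, llm:)
  prompt = <<~PROMPT
    Classify the customer message into exactly one of: #{ROUTES.join(', ')}.
    Reply with the category name only.

    Message: #{message}
  PROMPT
  answer = llm.call(prompt).strip.downcase
  ROUTES.include?(answer) ? answer : "other"
end

# Stub standing in for a real model call so the sketch runs offline.
stub_llm = ->(_prompt) { "Shipping\n" }

message = "My package never arrived"
puts route_with_rules(message)               # prints other
puts route_with_llm(message, llm: stub_llm)  # prints shipping
```

The rule-based version falls through to "other" for a message nobody anticipated, while the model-backed version can still map it to one of the known outcomes.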
00:03:50.640
For example, an AI agent could write a file, approve a document, or text my partner. This has always been the promise of Rails: Rails teams should focus mainly on writing business logic for business owners, rather than solving engineering problems or reinventing engineering solutions. In the old world before AI, business logic was often scattered throughout fat models or service objects. In the new world, I propose that some business logic will reside in prompts.
00:04:29.880
In this example, imagine a simulated e-commerce store, where an AI agent orchestrates business logic and communicates between the different systems the store might connect with. Almost every e-commerce store connects to a payment gateway via an API, an inventory management system via another API, accounting systems, and shipping services. I propose that you could write out your standard operating procedures for your business, include that in the prompt, and have the AI orchestrate the execution of handling orders, processing returns, and managing the customer loyalty program.
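As a rough illustration of putting standard operating procedures into the prompt, the sketch below builds a system prompt from a list of procedures. The procedures themselves, the method names, and the `role`/`content` message shape (a common chat-style convention) are assumptions made for the example, not details from the talk.

```ruby
# Hypothetical procedures; a real business would write its own.
STANDARD_OPERATING_PROCEDURES = [
  "Charge the payment gateway before reserving inventory.",
  "Only create a shipping label after the payment succeeds.",
  "Accept returns within 30 days and record the refund in accounting.",
  "Add loyalty points after an order ships."
].freeze

# Turn the procedures into a numbered list inside a system prompt.
def build_system_prompt(procedures)
  numbered = procedures.each_with_index.map { |step, index| "#{index + 1}. #{step}" }
  <<~PROMPT
    You are an agent that handles orders, returns, and the loyalty program
    for an e-commerce store. Follow these standard operating procedures:

    #{numbered.join("\n")}
  PROMPT
end

# Assemble chat-style messages: the procedures go in the system message,
# the customer's request goes in the user message.
def build_messages(customer_request, procedures: STANDARD_OPERATING_PROCEDURES)
  [
    { role: "system", content: build_system_prompt(procedures) },
    { role: "user", content: customer_request }
  ]
end

messages = build_messages("I'd like to return order 1042.")
puts messages.first[:content]
```

Keeping the procedures in a plain list means a change to the business rules becomes an edit to the prompt data rather than to a service object.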
00:05:05.920
This takes us to AI agents. AI agents are autonomous or semi-autonomous general-purpose LLM-powered programs that can use tools, essentially APIs and integrations, through function calling. They work best with powerful LLMs, and OpenAI's GPT-4 is currently the leader in this space, although that changes frequently. These agents can be used to automate workflows in business processes.
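Function calling can be sketched in Ruby as a registry of tools plus a dispatcher. The tool names, the stubbed handlers, and the `{"tool": ..., "arguments": ...}` JSON shape are all hypothetical; real providers define their own formats for tool calls, so treat this as an outline of the idea rather than any specific API.

```ruby
require "json"

# Hypothetical tool registry: each tool has a description the model would
# see and a handler the application runs. Handlers return stub data here.
TOOLS = {
  "check_inventory" => {
    description: "Look up how many units of a SKU are in stock.",
    handler: ->(args) { { "sku" => args.fetch("sku"), "in_stock" => 3 } }
  },
  "create_shipping_label" => {
    description: "Create a shipping label for an order.",
    handler: ->(args) { { "order_id" => args.fetch("order_id"), "label" => "created" } }
  }
}.freeze

# Parse the model's requested tool call and run the matching handler.
# Unknown tool names are rejected instead of being executed blindly.
def dispatch_tool_call(raw_json, tools: TOOLS)
  call = JSON.parse(raw_json)
  name = call.fetch("tool")
  tool = tools.fetch(name) { raise ArgumentError, "Unknown tool: #{name}" }
  tool[:handler].call(call.fetch("arguments", {}))
end

# Assumed model output asking the application to call a tool.
model_output = '{"tool": "check_inventory", "arguments": {"sku": "ABC-1"}}'
result = dispatch_tool_call(model_output)
puts result.to_json # prints {"sku":"ABC-1","in_stock":3}
```

In a full agent loop, the result would be sent back to the model so it can decide on the next step; that loop is omitted here.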
00:05:44.160
Anyone who has ever attempted to build these agents knows they can be pretty unreliable. I think about this reliability in terms of an upper triangular shape: as the number of tasks these agents are responsible for shrinks, their reliability increases. You can narrow down the set of responsibilities that AI agents will execute or the decision tree they follow.
00:06:19.600
Looking at the chart, the x-axis represents a focused agent—perhaps a single or dual task agent—on the left-hand side, while a general-purpose agent appears on the right. The upper left quadrant and bottom right quadrant could be at least proof of concept (POC) ready. If you achieve a general-purpose reliable agent, well, that would be Artificial General Intelligence (AGI).