
Yet Another Ruby DSL for LLM

#rubyconftw 2023

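Over the past year we have felt the enormous impact of LLMs, yet building applications with them remains difficult. The key question this talk tries to address is how to use Ruby's meta-programming to better process and constrain LLMs, and so build the next generation of AI applications.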

RubyConf Taiwan 2023

00:00:28.960 Hello everyone, this is Delton. I'm very happy to have the chance to share this topic at RubyConf Taiwan this year. When you see this video, I obviously failed to come to Taiwan due to some travel issues. I think it has been about four years since my last visit, and I miss you all so much. Fortunately, I've recovered from a high fever recently, so at least I won't spread any viruses during the conference. Let's proceed with this remotely.
00:00:53.239 Large language models and services like ChatGPT have become quite popular recently. If you want to build an AI application in Ruby, here's how to get started. Step one: Open the OpenAI API documentation to understand how to send requests and handle responses from their servers. Step two: Write some basic Ruby code. A Ruby gem like Faraday makes this much easier. Just send a request and process the results from OpenAI. That's all you need to build an application in Ruby. Thank you for listening!
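To make the two steps concrete, here is a minimal sketch, not the speaker's code. It uses Ruby's standard `net/http` instead of Faraday so that it runs without extra gems; the method names (`build_chat_request`, `extract_reply`, `ask`) and the default model name are assumptions chosen for illustration.

```ruby
require "json"
require "net/http"
require "uri"

OPENAI_CHAT_URL = URI("https://api.openai.com/v1/chat/completions")

# Step two, part one: build the HTTP request (nothing is sent yet).
# The model name is only an illustrative default.
def build_chat_request(prompt, api_key:, model: "gpt-3.5-turbo")
  request = Net::HTTP::Post.new(OPENAI_CHAT_URL)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(
    model: model,
    messages: [{ role: "user", content: prompt }]
  )
  request
end

# Step two, part two: pull the assistant's text out of the response body.
def extract_reply(response_body)
  JSON.parse(response_body).dig("choices", 0, "message", "content")
end

# Sends the request. This needs network access and a valid API key,
# so it is defined here but not called.
def ask(prompt, api_key: ENV.fetch("OPENAI_API_KEY"))
  request = build_chat_request(prompt, api_key: api_key)
  response = Net::HTTP.start(OPENAI_CHAT_URL.host, OPENAI_CHAT_URL.port, use_ssl: true) do |http|
    http.request(request)
  end
  extract_reply(response.body)
end
```

Calling `ask("Hello")` with `OPENAI_API_KEY` set would return the model's reply as a string, or `nil` if the response has no choices.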
00:01:23.760 However, if you've ever built such a system, you might find that your large language model is often uncontrollable. It tends to state false facts, which can damage customer confidence. If you aim to build a complicated real-world application this way, you'll discover it's almost impossible. Prompt engineering alone isn't sufficient to create complex systems such as agent-based systems or AI-driven NPCs for your games. We need to delve deeper to understand large language models and how we can exercise control over them.
00:02:49.599 Let's start with a well-known research paper, 'Attention Is All You Need,' published in 2017. This paper has been widely read in the AI industry. Although it became popular in 2017-2018, its fame has surged recently with more people, including end-users and venture capitalists, studying its content. Not everyone has an AI background, so let's discuss its key contributions. The paper introduces an encoder-decoder architecture, which is not vital for models like GPT, since some models are encoder-only and others are decoder-only. More importantly, it introduces a component called multi-head attention, which is the attention mechanism. Attention allows the model to weigh different parts of the input sequence.
00:04:20.680 The attention formula, which consists of queries, keys, and values, compares the queries with the keys to produce a probability distribution and uses that distribution to weight the values. This is important because it allows us to train a complicated language model capable of understanding and representing human language. With numerous layers and parameters, we can build large language models, but they often exhibit unstable behavior. Issues arise not only from the prompts but also from the weights changing over time, leading to confusion when building applications on top of them.
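For reference, the scaled dot-product attention formula from the paper is written below, where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the dimension of the keys. The softmax turns each row of scores into a probability distribution, which then weights the values.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```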
00:06:39.280 To build reliable applications, we need to use logical inference rather than merely relying on the probability distributions generated by large language models. A model predicts the next token by statistical likelihood; a token is just the next element in a sequence, with no inherent logic or problem-solving capability. The goal is to get output that reflects true logical reasoning rather than just statistical likelihood.
00:07:01.999 To create effective AI applications, we must understand our actual requirements and the specifications that will help us analyze how to utilize large language models properly. Many frameworks assist in developing AI applications with large language models, like LangChain for Python. Most frameworks focus on helping developers design better prompts, but we should also explore actual use cases.
00:09:18.760 Let's take a look at a demo. Imagine a security guard in a building where a visitor tries to bypass security. When asked about entering the building, the AI responds with 'I'm sorry but you are not allowed to enter.' This response is predefined. However, when I say that I'm an employee, the AI recognizes my intention and asks further questions, which I answer incorrectly, resulting in denial of entry. Adjusting my phrasing, the AI asks me to prove my employment once again, demonstrating its capacity for decision-making based on input.
00:10:39.840 Building such an application requires cleverly handling tokens. This echoes the challenges faced in the 1970s and 80s with the Atari 2600, where programmers had to manage limited resources effectively. Here, the facts should be inferred rather than guessed by the AI. Using large language models for logical inferences requires extensive background information about the tasks, leading to potential limitations due to token capacity.
00:12:25.360 When formulating propositions, we can justify them through three primary methods: appealing to authority, using inductive reasoning similar to what large language models do, or employing deductive reasoning. Inductive reasoning may give a conclusion that is, say, 90% likely to be true, whereas true logical inference should yield 100% certainty. The challenge lies in how we perform deductive reasoning with computers.
00:14:35.960 In the 1990s, researchers developed logic computers capable of performing logical calculations based on rules. An example rule states that if P implies Q, then not Q implies not P (the contrapositive). Logic programming is significant for applications requiring precise inference, such as games with intricate logic systems. Properly defining rules allows for structured querying, which is crucial for AI systems.
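As a rough illustration of rule-based deduction, here is a minimal forward-chaining sketch in plain Ruby. It is not a logic programming language and not the speaker's implementation; the `Rule` struct, the `infer` method, and the fact names (borrowed from the security guard demo) are all hypothetical.

```ruby
require "set"

# A rule: when every premise is a known fact, the conclusion becomes a fact.
Rule = Struct.new(:premises, :conclusion)

# Apply the rules repeatedly until no new fact can be derived.
def infer(facts, rules)
  known = Set.new(facts)
  loop do
    derived = rules
      .select { |rule| rule.premises.all? { |premise| known.include?(premise) } }
      .map(&:conclusion)
      .reject { |conclusion| known.include?(conclusion) }
    break if derived.empty?

    known.merge(derived)
  end
  known
end

# Hypothetical rules for the security guard scenario.
GUARD_RULES = [
  Rule.new([:claims_employee, :knows_employee_id], :is_employee),
  Rule.new([:is_employee], :may_enter)
].freeze

# Query: may the visitor enter, given what we know about them?
def may_enter?(facts)
  infer(facts, GUARD_RULES).include?(:may_enter)
end
```

With these rules, `may_enter?([:claims_employee])` is false, because the claim alone proves nothing, while `may_enter?([:claims_employee, :knows_employee_id])` is true. The answer is derived from the rules, not guessed.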
00:16:03.040 Logic programming allows us to handle varying user inputs without building overly complicated state machines. By establishing clear rules and queries, we can extract the users’ intentions, ensuring that responses align with user inputs effectively. Many AI applications, like in game development, utilize this methodology to generate infinite gameplay possibilities.
00:17:44.720 We choose Ruby for integrating these logic systems due to its friendly syntax for metaprogramming and DSL (Domain-Specific Language) support. Ruby is highly flexible for abstracting algorithms and mathematical processes, enabling developers to avoid exposing complexity to end users. We aim to build a DSL that handles user inputs, integrates with language models, and performs necessary logical operations.
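To show the kind of syntax Ruby makes possible, here is a small hypothetical DSL built with `instance_eval`. None of these names (`Agent`, `rule`, `reply`, `default_reply`, `respond`) come from the speaker's project; this is only a sketch of how rules and replies could be declared while the inference loop stays hidden from the person writing them.

```ruby
# Hypothetical DSL sketch; all class and method names are assumptions.
class Agent
  # Evaluate the block with the new agent as `self`, so that bare calls
  # such as `rule` and `reply` inside the block reach the agent.
  def self.define(&block)
    agent = new
    agent.instance_eval(&block)
    agent
  end

  def initialize
    @rules = []
    @replies = {}
    @default = nil
  end

  # Declare: when all `given` facts hold, `conclusion` holds too.
  def rule(conclusion, given:)
    @rules << [Array(given), conclusion]
  end

  # Declare the reply to use when `fact` has been established.
  def reply(fact, text)
    @replies[fact] = text
  end

  # Declare the reply to use when no other reply applies.
  def default_reply(text)
    @default = text
  end

  # Derive every fact that follows from the input, then pick a reply.
  def respond(facts)
    known = facts.dup
    changed = true
    while changed
      changed = false
      @rules.each do |premises, conclusion|
        next if known.include?(conclusion)
        next unless premises.all? { |premise| known.include?(premise) }

        known << conclusion
        changed = true
      end
    end
    matched = @replies.keys.find { |fact| known.include?(fact) }
    matched ? @replies[matched] : @default
  end
end

guard = Agent.define do
  rule :is_employee, given: [:claims_employee, :knows_employee_id]
  rule :may_enter, given: :is_employee
  reply :may_enter, "Welcome, please come in."
  default_reply "I'm sorry but you are not allowed to enter."
end
```

Here `guard.respond([:claims_employee])` returns the refusal, and `guard.respond([:claims_employee, :knows_employee_id])` returns the welcome. In a fuller system the input facts would come from the language model's reading of the user's message.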
00:19:31.280 Before deploying these applications, several performance challenges must be addressed. For requests that exceed one second, Ruby applications may experience slower performance due to the threading model. Our solution involves implementing a ticketed response system where requests are processed in the background, and users receive progress updates.
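The ticket idea can be sketched in a few lines. This is an in-memory toy, assuming a single process and plain threads; the `TicketDesk` class and its methods are hypothetical, and a production system would more likely use a job queue and a persistent store.

```ruby
require "securerandom"

# Hypothetical in-memory ticket desk: submit a slow job, get a ticket id
# back immediately, and poll for progress later.
class TicketDesk
  def initialize
    @tickets = {}
    @workers = {}
    @lock = Mutex.new
  end

  # Start the job in a background thread and return its ticket id at once.
  def submit(&job)
    id = SecureRandom.uuid
    @lock.synchronize { @tickets[id] = { status: :pending, result: nil } }
    @workers[id] = Thread.new do
      begin
        update(id, status: :running)
        update(id, status: :done, result: job.call)
      rescue StandardError => e
        update(id, status: :failed, result: e.message)
      end
    end
    id
  end

  # Return a snapshot of the ticket's current state.
  def status(id)
    @lock.synchronize { @tickets.fetch(id).dup }
  end

  # Block until the job has finished, then return its final state.
  def wait(id)
    @workers.fetch(id).join
    status(id)
  end

  private

  def update(id, **changes)
    @lock.synchronize { @tickets[id].merge!(changes) }
  end
end
```

A web handler would call `submit` with the slow language-model call, return the ticket id to the client, and let the client poll `status` until it reports `:done` or `:failed`.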
00:21:40.720 To extract user intentions from input effectively, we employ a multilingual model for tasks like cosine similarity calculations. This approach facilitates emotion detection based on user input, particularly helpful for projects involving user interactions. Similarity assessments occur across multiple languages, ensuring broader applicability in international contexts.
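Cosine similarity itself is simple to compute. The sketch below assumes the embeddings have already been produced by some multilingual model; the three-dimensional vectors, the intent names, and the `closest_intent` helper with its threshold are toy values made up for illustration.

```ruby
# Cosine similarity: dot product divided by the product of the vector norms.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Toy "embeddings" for a few intents. Real vectors would come from an
# embedding model and have hundreds of dimensions.
INTENT_EXAMPLES = {
  claims_employee: [0.9, 0.1, 0.0],
  asks_to_enter: [0.1, 0.9, 0.1],
  small_talk: [0.0, 0.2, 0.9]
}.freeze

# Return the most similar intent, or nil when nothing is similar enough.
def closest_intent(embedding, examples: INTENT_EXAMPLES, threshold: 0.7)
  intent, score = examples
    .map { |name, vector| [name, cosine_similarity(embedding, vector)] }
    .max_by { |_, similarity| similarity }
  score >= threshold ? intent : nil
end
```

Because the comparison happens in embedding space, the same reference vectors can serve inputs in different languages, provided the embedding model maps them close together.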
00:23:55.680 While many suggest using vector databases, we prefer Postgres with the pgvector extension. Using a single database enables us to maintain consistency and integrity. Handling logic programming across Ruby and Prolog can become complex due to request assumptions in threading models, leading us to implement a JSON-RPC interface for this integration.
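As a sketch of what the Ruby side of such an integration might exchange, the helpers below build and parse JSON-RPC 2.0 messages. The method name `prolog.query` and the parameter shape are hypothetical, and the transport (socket, HTTP, pipe) is left out because the talk does not specify it.

```ruby
require "json"

# Build a JSON-RPC 2.0 request as a JSON string.
def build_rpc_request(method, params, id:)
  JSON.generate(jsonrpc: "2.0", method: method, params: params, id: id)
end

# Parse a JSON-RPC 2.0 response and return its result.
# Raises if the id does not match or the server reported an error.
def parse_rpc_response(raw, expected_id:)
  message = JSON.parse(raw)
  unless message["id"] == expected_id
    raise "unexpected response id #{message['id'].inspect}"
  end
  if message.key?("error")
    error = message["error"]
    raise "RPC error #{error['code']}: #{error['message']}"
  end
  message.fetch("result")
end

# Hypothetical query sent to a logic service running in another process.
request = build_rpc_request("prolog.query", { goal: "may_enter(visitor)" }, id: 1)
```

Keeping the logic engine in its own process behind a message format like this means the Ruby threads only ever exchange plain data with it.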
00:25:21.920 The DSL we are developing encompasses AI logic. For example, a predefined emotion can dictate specific AI behaviors, like decision-making processes modeled through logic rules. By integrating the AI core within this DSL, we can ensure that outputs meet predefined formats.
00:28:23.840 This DSL allows us to create stable and complex AI applications by leveraging Ruby and large language models. We're working towards open-sourcing this project by April or May next year. We aim to minimize disruptions and provide our developers with a streamlined experience.
00:29:38.320 Thank you all for attending this talk.
00:30:04.679 Now, let's open the floor for questions.
00:30:38.960 Thank you for your attention and participation. If no one has further questions, we can conclude today's session. Thank you, everyone! I appreciate the discussions and your attendance. Let's take a lunch break, and the next session will start at 1:00 PM. Thank you!