
Introduction to Event Sourcing: How to Use It with Ruby

wroc_love.rb 2022

00:00:16.560 Hi everyone, I'm Paweł. Can you hear me well?
00:00:22.480 Alright, awesome! I work as a software developer and consultant in a software house called Visuality in Wrocław. I'm extremely interested in topics around domain-driven design, event sourcing, event-driven architecture, and everything related to events.
00:00:34.160 But enough about me. Today, I'm going to tell you about event sourcing and what you can do with it in Ruby. We'll talk about what it is, go over some basic concepts you cannot really skip when discussing event sourcing, and provide a short introduction to a toolkit that you can use in Ruby.
00:00:50.399 So let's start. Meet Mary. She's a client of our e-commerce platform and has been with us for years. She has made only one purchase, and we also know that she's into chess and interested in crime novels. How do we know this? Our e-commerce platform uses a classic relational database. We have a users table with an entry for Mary: her name is Mary Jane Parker, and her email is mary.watson. Well, it doesn't really match, but whatever. We also have the creation date, so we know when she created her account.
00:01:15.920 I've mentioned that she has made only one purchase. Thus, we have an orders table with some order items, as traditional as it gets. As for her interests, we have yet another table that joins interests and users, so we know that she has expressed interest in certain subjects. To sum it up, we have data stored in a relational database, and that information is persistent and serves as our source of truth. There were various events while she was using our application; she made some changes, but those changes were transient. We didn’t pay attention to them and only recorded the final state, which is saved in our database. So, we know who Mary is and what she has done within our system.
00:02:01.759 Now, let's meet Mary again, but this time we'll track every change she introduced to our system. Once again, she is a client and has been with us for years. At some point, she created an account with the first name Mary and the last name Watson, and now the email matches: mary.watson. Over the years, she has changed her first name and last name. Later on, she made one purchase and adjusted the quantities of items in her order. Initially, she added too much, but then she removed some items and finalized the order. As for her interests, we already know there are two. She added interests but later removed them and replaced them with something else.
00:02:48.000 If we sum up all the information, we know that she is still our customer and has been with us for years. However, we also know that she has adjusted her first and last names, possibly due to marriage or divorce. We still know that she made only one purchase, but we now also know that she is interested in a survival kit; on top of that, we can confirm she's into chess and crime novels, although she used to be into hiking and sailing. Now, the data is stored as events, and those events are persistent, serving as our source of truth. We can sum up all the events and determine the current state. However, that state is transient; we can always forget the final state and replay the events to analyze the entire history.
00:04:08.880 We can now understand who Mary is, the current state she is in, and why. By analyzing all the steps, we can reconstruct the history of what has really happened. This approach is called event sourcing, a pattern for storing data as events in an append-only log. We store events and append them at the end of our log, never modifying past events. Just as in real life, we might wish to change the past, but we cannot; we can only add new events at the end of the log. It's essential for event sourcing that these events are stored in the order in which they occurred.
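To make that concrete, here is a minimal sketch in plain Ruby (the event names are illustrative, not from the talk) of deriving the current state by folding over an event log:

```ruby
# Each event records one change; the current state is just a fold over the log.
events = [
  { type: :account_created, first_name: "Mary", last_name: "Watson" },
  { type: :last_name_changed, last_name: "Parker" }
]

current_state = events.reduce({}) do |state, event|
  state.merge(event.reject { |key, _| key == :type })
end

current_state # => { first_name: "Mary", last_name: "Parker" }
```

Forgetting the derived state costs nothing; replaying the same list always rebuilds it.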
00:05:00.800 Let's see where it could be useful to keep all the events from the past. Imagine an application where team members create documents. A writer has to mark a document as done, and someone else has to approve it. The normal path is that someone creates the document, marks it as done, and then someone approves it. However, consider the alternative where someone finishes a document, but it gets rejected. The writer has to fix the document, change it, finalize it again, and then get it approved. What if someone routinely gets rejected, or consistently rejects others? If all you have is a simple field in the documents table of your relational database that shows a document is approved, you miss that entire history.
00:06:30.240 Let's take a classic example of a shopping cart. On the left, we have two items in a cart that someone later finalized. On the right, another user added two items, added a third, removed it, and then finalized the order. Are these the same or different? They appear similar; both carts ended up containing the same products, but the information behind them is vastly different. The person who ordered on the right could have had various factors influencing their decision, such as financial issues or external pressures. This insight into client behavior is invaluable, and it offers marketing opportunities. Regarding cargo location and shipping: imagine we have a large vessel traveling overseas to Miami and need to track its location. If we only overwrite the current location whenever new information arrives, we lose crucial intermediate measurements, like the moment it passed through the Bermuda Triangle.
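As a sketch (the event names are illustrative), the two carts might be recorded like this; both replay to the same final state, but the histories differ:

```ruby
# Left cart: two items added, then finalized.
cart_left = [
  { type: :item_added, product: "chess set" },
  { type: :item_added, product: "crime novel" },
  { type: :cart_finalized }
]

# Right cart: a third item was added and removed again before finalizing.
cart_right = [
  { type: :item_added, product: "chess set" },
  { type: :item_added, product: "crime novel" },
  { type: :item_added, product: "survival kit" },
  { type: :item_removed, product: "survival kit" },
  { type: :cart_finalized }
]
# Replaying either list yields the same final cart, yet the right-hand
# history carries extra information about the customer's hesitation.
```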
00:08:11.680 The takeaway is not to lose data if you don’t have to. If there is a point in your application that users or systems can reach via two different paths, and you don't know which path was taken, you sacrifice valuable data. The long-term implications of lost data can be significant, as you may not realize its importance until years later. That's event sourcing: a pattern for storing data in an append-only log. If you want to discuss event sourcing, be aware that the learning curve is steep. You need to grasp several concepts before engaging with it meaningfully.
00:10:02.080 We will go over these concepts so you are well-informed, and we will discuss them in context, as there are many technologies and approaches to event sourcing, especially in Ruby. This discussion takes place in the context of Eventide, a Ruby toolkit designed for creating event-sourced autonomous services. However, you can apply it to any type of project. It features an event store called Message DB, which allows you to store events, and a test framework called TestBench, which can be used in any Ruby project, including Rails.
00:10:59.040 Event sourcing primarily revolves around the concept of events. You can describe an event in different ways; in this context, it is a message. When a component performs an action, it publishes an event, broadcasting that a change has occurred. An event is a message that records that something happened: money was deposited into a bank account, or a user was registered. When an event occurs, it has a corresponding cause, which is referred to as a command. A command requests that an action be executed, while an event is the outcome of that action. Both commands and events are just messages.
00:12:01.600 In the context of the Eventide toolkit, we can define messages in code as very simple Ruby classes using basic syntax to specify their attributes. This allows us to establish both commands and events as messages. For instance, consider an order entity with ID seven. You can request various actions from that entity, like adding a line item or updating a shipping address. Each command we issue to that entity is organized into something called a command stream. Streams are the fundamental unit of message organization. You do not create streams; instead, you create messages that belong to a particular stream. In this way, the command stream for an order looks like this, allowing us to track the commands associated with this specific entity.
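To make this concrete, a command and its corresponding event could be defined roughly like this (a sketch following the message examples in Eventide's documentation; the attribute names are illustrative):

```ruby
# Assumes the Eventide toolkit is loaded (eventide-postgres gem).

# A command: a request to perform an action.
class AddLineItem
  include Messaging::Message

  attribute :order_id, String
  attribute :product_id, String
  attribute :quantity, Integer
end

# An event: the recorded outcome of that action.
class LineItemAdded
  include Messaging::Message

  attribute :order_id, String
  attribute :product_id, String
  attribute :quantity, Integer
  attribute :time, String
end
```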
00:13:06.960 Once we process commands, they might produce corresponding events. Just as there is a command stream, there is an event stream. This stream is crucial because it contains all the events that have occurred, representing every change associated with the entity. In essence, this stream holds all the information about the entity.
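Under Message DB's stream-naming conventions, the streams for our order number seven would look like this (a sketch; the category name is illustrative):

```ruby
"order:command-7" # command stream: every command addressed to order 7
"order-7"         # event stream: every event that has happened to order 7
"order"           # category stream: the events of all orders combined
```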
00:13:50.720 Now that we have commands and events, we need to store them. We use Message DB for this purpose. Its key advantage is that it is implemented on top of PostgreSQL. Thus, to begin event sourcing in Ruby, you only need to have, and know, PostgreSQL. You can install Message DB, which is essentially a single-table implementation, and that simplifies things greatly. There is some additional database-level machinery, but fundamentally it revolves around one table containing all the messages.
00:14:33.200 Let's explore what's inside this table. It includes various attributes: 'global_position', which serves as a sequence number for the message, starting from zero and growing, potentially, to infinity. The 'position' indicates the sequence within a single stream. We also track the timestamp of when the message was written and the stream name, which we discussed earlier. The 'type' attribute corresponds to the message class, such as an add line item command or a line item added event. The 'data' field stores the message payload in JSON, containing all the attributes and their values. The 'metadata' field, also JSON, records how the message was created and how it relates to other streams or preceding messages. Lastly, each message has a unique ID in the form of a UUID.
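Put together, a single row of the messages table might look like this (rendered here as a Ruby hash; all values are illustrative):

```ruby
message_row = {
  id: "11111111-2222-4333-8444-555555555555",  # unique message ID (UUID)
  stream_name: "order-7",
  type: "LineItemAdded",                       # the message class
  position: 2,                                 # sequence within the order-7 stream
  global_position: 184,                        # sequence across all streams
  data: '{"order_id":"7","product_id":"42","quantity":1}',
  metadata: '{"causation_message_stream_name":"order:command-7"}',
  time: "2022-03-12T10:15:00.000Z"
}
```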
00:15:44.720 We talked about commands and events; now we need a consumer to process the commands and generate the events. A consumer can be thought of as a process, similar to a Sidekiq worker, that constantly reads messages from a single category. It processes these messages, applying the relevant business logic and producing events. Within a consumer there are handlers. Each handler is a block of business logic responsible for handling a specific type of message. For instance, one handler would take care of deposit messages while another handles withdrawal requests.
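In Eventide, such a consumer looks roughly like this (a sketch based on the consumer examples in Eventide's docs; the handler it registers is sketched after the next paragraph):

```ruby
# A consumer reads every message from one category and dispatches each
# message to the handlers registered on it.
class OrderCommandsConsumer
  include Consumer::Postgres

  handler Handler # the Handler class is sketched below
end

# Started in a dedicated, long-running process, much like a worker:
# OrderCommandsConsumer.start("order:command")
```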
00:17:03.680 To illustrate, let’s consider a command handling example. Each handler processes the appropriate message type and can publish a new message at the end of its process. Additionally, a consumer does not solely read commands; it can also respond to events. This flexibility allows modeling any type of logic, where something reacts to an event. For instance, once an order is finalized, a corresponding event would be published, prompting other components to generate invoices, notify services, or handle logistics for shipping. In this case, we describe this architecture as a publish-subscribe messaging pattern.
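A command handler in that style could look roughly like this (a sketch modeled on the handler examples in Eventide's docs; the business rule is omitted here, and the dependency wiring may differ in a real project):

```ruby
class Handler
  include Messaging::Handle
  include Messaging::StreamName

  dependency :write, Messaging::Postgres::Write
  dependency :clock, Clock::UTC

  category :order

  handle AddLineItem do |add_line_item|
    order_id = add_line_item.order_id

    # The event "follows" the command: shared attributes and message
    # metadata are copied over from the command.
    line_item_added = LineItemAdded.follow(add_line_item)
    line_item_added.time = clock.iso8601

    # Append the resulting event to the order's event stream, e.g. "order-7".
    stream_name = stream_name(order_id)
    write.(line_item_added, stream_name)
  end
end
```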
00:18:07.760 Publishers publish messages without determining the specific recipients; they send messages to a broad category without needing to know which subscribers are reading them. On the flip side, subscribers are simply notified about messages within these categories. The beauty of this system is that publishers and subscribers operate independently.
00:19:25.360 Returning to the practical side: when a handler receives, for example, an add line item command, it has to determine whether the command can be fulfilled given certain constraints, such as a limit on how many line items an order can hold at once. It does this by processing all previous events in the order they occurred. This is a feature known as projection: deriving the current state from historical events. Projections let us make decisions informed by the aggregated history of events.
00:20:44.960 In terms of implementation, projections are foundational for business logic because they let you derive the current state from past events. An entity projection applies every event that has occurred to an entity, giving a comprehensive view of it. A projection can also be persisted in a format optimized for querying, enabling efficient searches while the full history is retained in the event store. A common use case is projecting orders into a relational database, because relational databases are optimized for read operations.
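An entity projection in Eventide's style could look roughly like this (a sketch based on the EntityProjection examples in the docs; the Deposited and Withdrawn events and the account entity are assumed for illustration):

```ruby
class Projection
  include EntityProjection

  entity_name :account

  # Each apply block folds one event type into the entity's state.
  apply Deposited do |deposited|
    account.id = deposited.account_id
    account.balance ||= 0
    account.balance += deposited.amount
  end

  apply Withdrawn do |withdrawn|
    account.balance -= withdrawn.amount
  end
end
```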
00:22:31.680 To give a clear overview, imagine two accounts in a banking application. A user deposits money into the first account; we record every relevant event and project the resulting balance into the database as the current state. When designing such a system, we weigh trade-offs between how the data will be queried and how it is written. The projection becomes a structured read model that summarizes an entity, such as an order or an account, from its accumulated events.
00:23:15.840 Over time, as projections evolve, decisions arise about how they are updated and maintained. For instance, in a relational database we can define an accounts table whose balances get updated according to transactions. Each incoming event modifies the projected state, keeping the read model in sync with the event log without discarding any history. Keeping that projection code clear is part of maintaining the integrity of the system.
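As a sketch of that idea (not a specific Eventide API; AccountRecord stands in for a hypothetical ActiveRecord model over the accounts table):

```ruby
# A read-model updater: fold each account event into the relational table.
class BalanceProjection
  include Messaging::Handle

  handle Deposited do |deposited|
    record = AccountRecord.find_or_initialize_by(account_id: deposited.account_id)
    record.balance = (record.balance || 0) + deposited.amount
    record.save!
  end

  handle Withdrawn do |withdrawn|
    record = AccountRecord.find_by!(account_id: withdrawn.account_id)
    record.balance -= withdrawn.amount
    record.save!
  end
end
```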
00:24:23.680 As for how consumers and projections link together, recall the command stream we started with. Commands arrive on the command stream, a handler processes them and writes events, and a consumer of those events updates the read model, so the data flows consistently from the write side to the read side. By projecting the events into a relational database designed for that purpose, we keep read and write operations independent, enabling efficient data access.
00:25:14.800 There are other architectural considerations such as eventual consistency—where the data in the relational database may not always perfectly match the event states. Managing this inconsistency is a unique challenge of event-driven architectures but often a necessary trade-off to enable system flexibility and scalability.
00:26:05.960 In summary, we have outlined the essential aspects of event sourcing, focusing not only on the theory but also on how it can be applied practically in Ruby projects. Understanding these principles equips you to implement event sourcing effectively within your applications.
00:27:00.000 We have covered why it pays to capture every event, the fundamentals of message processing, and the tools available for creating robust event-sourced systems. If you want to move your Ruby applications toward an event-driven architecture, familiarize yourself with the Eventide toolkit; it is an invaluable resource to get started.
00:27:48.960 As we draw this presentation to a close, I want to emphasize the necessity of exploring further resources. While I've shared my insights, I encourage you to dive deeper, experiment with examples, and fully embrace the dynamic landscape of event sourcing and event-driven architectures.
00:28:40.440 I'd also like to point you toward some key figures in the field. Scott Bellware and Greg Young are great resources for expanding your knowledge. Scott's presentations offer transformative insights into Ruby and microservices, while Greg's work provides foundational knowledge about event sourcing that is both practical and enlightening.
00:29:26.920 Thank you for your attention through this presentation! I'm excited to field any questions you may have regarding event sourcing, its applications, or specific tools.
00:30:12.080 Thank you very much! That was very interesting. On one slide, you discussed projections. Could you please clarify what assigning a sequence signifies in that context? I want to ensure we both refer to the same concept.
00:31:50.240 Certainly! The sequence denotes the order of events in the stream. It essentially gives you the version of your projection, letting you determine how up to date it is.
00:32:34.960 Additionally, I have a question regarding projection in cases where changes affect multiple orders belonging to a user. How would those projections execute? Would they execute in parallel or serially, given that each projection relies on its respective event stream?
00:33:30.080 To clarify, each projection is based exclusively on a single stream. However, you could model interactions across streams, yet that requires careful design to avoid complexity.
00:34:00.640 You mentioned that we never lose data, but how do we deal with cases like GDPR compliance where we need to erase personal data from the system?
00:34:45.680 This is achievable by strategically modeling your data in streams. Keep sensitive user information categorized within a designated stream, thereby allowing complete removal if necessary without affecting the overall event history.
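As a purely illustrative sketch of that modeling idea (the stream names are hypothetical; this is a design convention, not a built-in feature):

```ruby
"user-123"         # behavioral events: orders, interests; kept forever
"userIdentity-123" # name and email events; deletable wholesale on a GDPR request
```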
00:35:42.800 It's essential to think about your data strategy before implementing your system. By deciding up front how certain data will be erased, you avoid the unintentional data loss that tends to result from an ill-conceived architecture.
00:36:24.480 One of the core challenges in the e-commerce model is drawing insights from customer behavior. As you deepen your knowledge of event sourcing, consciously consider the power of the data you gather.
00:37:06.080 It's important to strike a balance: identify how much data you must retain for accountability and how much can be archived or discarded. Trends across the whole timeline are often what give you clarity when planning future work.
00:37:47.600 Looking ahead, a question about database structure: do you think NoSQL databases would suit event sourcing better than traditional SQL databases, considering the flexibility they offer when versioning events?
00:38:29.360 That's a good question. The choice between NoSQL and SQL often hinges on what you aim to achieve with your architecture. NoSQL databases can offer more flexibility in handling events, particularly around versioning. However, the constraints and guarantees of a relational database tend to yield a more resilient event sourcing design.
00:39:00.880 In the end, you have to understand the trade-offs of each option from both a cost and a performance perspective.
00:39:46.560 One last point on projections, since they tie back to the core of your data model: establish clear paths for how data enters the system, how it is processed, and how it is ultimately recorded. Applying that discipline consistently pays off across the whole project.
00:40:11.760 Thank you for an engaging discussion! I'm looking forward to diving deeper into any of these topics.