Nathan Ladd

Event Sourcing Anti Patterns and Failures

wroc_love.rb 2018

00:00:24.010 Hello everyone, how's everyone doing today? Good? Awesome! I'm here to talk to you about everything that you're doing wrong with event sourcing, which is one of life's greatest pleasures.
00:00:30.130 I'm Nathan Ladd. You can find me at real_ntl on Twitter. As mentioned, I am a co-principal of the Eventide Project, a toolkit for building event-sourced systems as well as autonomous services.
00:00:35.380 Before I begin, I notice that on the schedule, I am the first speaker presenting about event sourcing. I hope to give a brief review of what event sourcing is and save the other speakers a bit of time. If I do a bad job with my review, they'll have to explain it again and correct me, so I hope I help them out.
00:01:02.770 Before I talk about event sourcing, I want to go back to some basic web application concepts, specifically web forms. I want to refer to web forms by a more general term: commands or command messages. This is a typical web form that your browser submits to your app server.
00:01:14.650 Your web server handles that form command. I'm calling it a command message because in the context of microservices, and for instance, in a Rails app, we are primarily handling commands and command messages. When you have a web controller, like an MVC controller in your web app, you can also think of that as a command handler. The important thing to note is that our handlers are the entry points of our code.
00:01:36.069 Around that handler, you have frameworks, machinery, and plumbing, but the handlers are where your code is first executed. I want to explain how a typical MVC plus ORM web framework handles form submission. For instance, say we have a command message to deposit funds into an account. I'll demonstrate how this is handled by your web MVC framework.
00:02:06.759 The first thing that happens, usually in your controller, is that you retrieve your model. Then, you figure out what changes to make to the model. For example, you might increase the account balance. After that, you apply those changes to the model, typically using some attributes API such as ActiveRecord's. Next, you validate the new model state and save it if the new state is valid. Finally, callbacks get triggered, which can cause other models to update and might queue background jobs, sending you right back to where you started.
00:02:33.010 This architecture is what many people tell me is much simpler than event sourcing, but I have my doubts. In event sourcing, you handle your command by projecting your entity. I'll explain what a projection is in a moment. Then you accept or reject the command, just like before, and you write an event, in this case a deposited event, to your database, the event store. After that, you're done.
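The three steps just described (project the entity, accept or reject the command, write an event) can be sketched in plain Ruby. This is a minimal illustration, not the Eventide API: `InMemoryStore`, `DepositHandler`, the stream naming, and the rule that deposits to an unopened account are rejected are all assumptions made for the example.

```ruby
# Hypothetical messages: one command and two events.
Deposit   = Struct.new(:account_id, :amount)  # command
Opened    = Struct.new(:account_id)           # event
Deposited = Struct.new(:account_id, :amount)  # event

# Entity: plain state, no knowledge of messages.
class Account
  attr_accessor :opened, :balance

  def initialize
    @opened = false
    @balance = 0
  end
end

# Stand-in for an event store: one array of events per stream name.
class InMemoryStore
  def initialize
    @streams = Hash.new { |hash, key| hash[key] = [] }
  end

  def read(stream_name)
    @streams[stream_name].dup
  end

  def write(stream_name, event)
    @streams[stream_name] << event
  end
end

class DepositHandler
  def initialize(store)
    @store = store
  end

  def call(command)
    stream_name = "account-#{command.account_id}"

    # 1. Project the entity from the events written so far.
    account = Account.new
    @store.read(stream_name).each do |event|
      case event
      when Opened    then account.opened = true
      when Deposited then account.balance += event.amount
      end
    end

    # 2. Accept or reject the command (assumed rule: account must be open).
    return :rejected unless account.opened

    # 3. Write the event. After that, the handler is done.
    @store.write(stream_name, Deposited.new(command.account_id, command.amount))
    :accepted
  end
end
```

Note that nothing is updated in place: the only effect of an accepted command is one more event appended to the stream.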
00:03:16.840 The projection I referenced is where we gather all the events that pertain to a single entity or model. If you're in the Rails context, you gather the events together and then distill them down using a projection into a single entity that represents the state derived from those events. When we've written events to a stream and want to project it, we visualize reading the stream sequentially. You always start with a blank or empty data structure for your entity and sequentially traverse each event one by one.
00:03:54.010 Let's take an example: we start with an opened event and copy some attributes. Next, we read a deposited event because that's the subsequent event in the stream, increasing the balance by $11, from $0 to $11. Then we handle a withdrawal for $10, resulting in a new balance of $1, and so on. That concludes the introduction to event sourcing. If anyone has questions, you are welcome to ask. I'm a bit early on time, but I'm here to talk about the potential pitfalls that can arise once you decide on this architectural style.
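The walk-through above maps directly onto a small fold over the stream. The sketch below uses hypothetical event and entity structs; only the sequence (opened, deposited $11, withdrawn $10) comes from the example in the talk.

```ruby
# Hypothetical event types for the example stream.
Opened    = Struct.new(:account_id, :customer_id)
Deposited = Struct.new(:amount)
Withdrawn = Struct.new(:amount)

# The entity starts out blank and is filled in by the projection.
Account = Struct.new(:id, :customer_id, :balance)

# Traverse the events one by one, in order, applying each to the entity.
def project(events)
  account = Account.new(nil, nil, 0)

  events.each do |event|
    case event
    when Opened
      # Copy attributes from the opened event.
      account.id = event.account_id
      account.customer_id = event.customer_id
    when Deposited
      account.balance += event.amount
    when Withdrawn
      account.balance -= event.amount
    end
  end

  account
end

stream = [
  Opened.new("123", "abc"),
  Deposited.new(11),  # balance goes from 0 to 11
  Withdrawn.new(10)   # balance goes from 11 to 1
]

account = project(stream)
puts account.balance  # prints 1
```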
00:04:51.110 There are many common anti-patterns and mistakes I have seen teams make, which cost time and money. Usually, these problems stem from continuing to perceive software development through the lens of an ORM. I would like to discuss four anti-patterns that can lead to issues in event sourcing. The vital point to note is that mistakes are much more costly in an event sourcing system because your database is immutable. This means that if you make a mistake while developing software using a relational database, you can change the data to correct it. However, in an immutable database, you essentially lack that option.
00:05:52.930 There are approaches and techniques for remediating bad data in an event sourcing system, but they are typically much more expensive in terms of time and cost. So, if you avoid these anti-patterns and pitfalls, it won't guarantee success, but it will improve your odds. The first anti-pattern I want to address is entities that are aware of the messaging around them. In event sourcing systems, we handle command messages and publish events, and usually, there is a correspondence between the command and the event. For example, we issue a deposit command and write a deposited event. This involves messaging.
00:06:15.240 Next, I will show you what an entity is like when that entity is unaware or hasn't been coupled to the surrounding messaging. It's easier to read and understand the code. In fact, it doesn't even need syntax highlighting, which is helpful. You can easily read the code, and it makes sense. If you have used ORMs for a long time, you might approach this entity with skepticism, wondering how it can be this simple. However, as you continue working on it, you might accumulate more baggage around the entity.
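The slide itself is not reproduced in the transcript. As a rough idea of what an entity with no messaging knowledge can look like, here is a plain Ruby sketch; the attribute and method names are assumptions, not the code shown in the talk.

```ruby
# An entity that knows nothing about commands, events, or storage.
class Account
  attr_accessor :id, :balance

  def initialize
    @balance = 0
  end

  def deposit(amount)
    self.balance += amount
  end

  def withdraw(amount)
    self.balance -= amount
  end

  def sufficient_funds?(amount)
    balance >= amount
  end
end
```

Because the methods take plain values, the class can be exercised in a test without any event store, handler, or framework.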
00:07:25.659 There's a significant point to be made: your entities should consist of your core business logic. There are implementations where your entities can remain as straightforward as this, making them more reliable and easier to work with. Messaging should be thought of as a peripheral concern that connects your entities with your user interface, front-end, other entities, the underlying operating system, and other databases. Once we introduce the responsibility of messaging into our entities, we immediately see that the cohesion of the entity is substantially reduced. For instance, the method deposit now takes a command message as a parameter, decoding or interpreting the attributes on that message.
00:08:24.760 This interpretation of what a message means is a messaging concern, and it belongs in a message handler. An entity that is coupled to event sourcing can only be used under that condition. You must always be aware of event sourcing when interpreting what this class does. In reality, the entities I build often look different. I could switch from Eventide to an ORM and keep the same entities. They are the most crucial part of your business logic, and if you analyze them closely, they serve as specifications for what an account does concerning deposits and withdrawals.
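The contrast can be sketched side by side. In the illustration below, `CoupledAccount` shows the anti-pattern (the entity decodes a message), while `Account` takes plain values and a separate `AccountProjection` module does the interpreting. All names are hypothetical and chosen for this example only.

```ruby
Deposit   = Struct.new(:account_id, :amount)  # command
Deposited = Struct.new(:account_id, :amount)  # event

# Anti-pattern: the entity receives a message and decodes its attributes,
# so it cannot be understood or reused without the messaging around it.
class CoupledAccount
  attr_reader :balance

  def initialize
    @balance = 0
  end

  def deposit(command)
    @balance += command.amount
  end
end

# Preferred: the entity only knows about amounts.
class Account
  attr_reader :balance

  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
  end
end

# Interpreting what a message means lives outside the entity.
module AccountProjection
  def self.apply(account, event)
    case event
    when Deposited then account.deposit(event.amount)
    end
    account
  end
end
```

With this split, `Account` could sit behind an ORM or an event store without changing, since only `AccountProjection` depends on the event types.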
00:09:18.810 The next anti-pattern to discuss is biasing event schemas toward views. When we display information for users, we present them with a view of information. Typically, these views contain data that lives in multiple different event streams, or some data might be in an event-sourced system while other data resides in a relational database. Building views with event stores can be problematic because our event stores are log-structured.
00:10:10.370 They can only be queried based on an entity's ID. For example, you can't query to get all accounts that have had activity in the last 90 days. Relational databases excel at these tasks, tying together information from different parts of your domain and joining them to present a unified view.
00:10:36.000 When we build event-sourced systems, we typically write events, handle those events, and build materialized views out of them. If you're familiar with Domain-Driven Design (DDD), you may refer to these as read models. I prefer the term 'materialized views' because that's what they have been called for decades. The slide shows an example of an account transactions table along with a transaction categories table, which together serve as a view we want to display for our end users. Each transaction has a category ID that can be used to join the categories table for additional information, such as identifying an $11 purchase of office supplies.
00:11:15.970 A common problem encountered when building event-sourced systems arises when trying to project an event, like a withdrawal event, into a view. The view requires a category ID column, but the event you're dealing with doesn't provide that ID, creating an immediate problem. The temptation is to change the event schema to add the category ID. However, this introduces a problematic direction of dependency where we design our events, which are part of our core domain, to satisfy our UI needs. Alterations to your event schemas are among the most expensive changes in these systems. The addition of a category ID can invalidate your existing data if the heuristic algorithms used for categorizing imports change. It is crucial that your events represent what occurred and nothing else; they should not serve as conduits for other data to make UI functionalities work.
00:12:29.090 Typically, a challenge for newcomers to event sourcing systems is figuring out how to build a whole user interface at once, encountering many obstacles in their path. One term I want to focus on is 'tie' because it represents a relational database way of thinking about how different entities relate to each other. The alternative approach that works better is to compose the view database from different events and isolate where the events originate so that they don’t all need to be in the same event or stream. Additional techniques can aggregate events, producing any schema you want, including a view database that combines a category ID with your transactions.
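Composing a view from separate events might look like the sketch below. It assumes a hypothetical `Categorized` event that originates somewhere other than the account stream, and it uses an in-memory hash in place of the relational view table; neither detail comes from the talk.

```ruby
# Events from two different origins. Withdrawn carries no category ID.
Withdrawn   = Struct.new(:transaction_id, :account_id, :amount)
Categorized = Struct.new(:transaction_id, :category_id)  # hypothetical event

# Stand-in for the account transactions view table, keyed by transaction ID.
class TransactionsView
  attr_reader :rows

  def initialize
    @rows = Hash.new { |hash, id| hash[id] = { transaction_id: id } }
  end

  # Each event fills in only the columns it knows about.
  def handle(event)
    case event
    when Withdrawn
      rows[event.transaction_id].merge!(
        account_id: event.account_id,
        amount: -event.amount
      )
    when Categorized
      rows[event.transaction_id][:category_id] = event.category_id
    end
  end
end
```

The row ends up with both the amount and the category ID, yet the `Withdrawn` schema never had to change, and the two events can arrive in either order.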
00:13:58.550 There is a variant of this anti-pattern where developers accustomed to using Rails expect to use the errors object to show validation errors on web forms, because that is how they are used to building UI. The issue with this is that errors represent a negative disposition in the UI domain. For instance, a withdrawal that causes an account to go into the negative is not an error from the perspective of a bank that charges $20 every time an account goes negative. We shouldn't disposition events as positive or negative; that's a UI concern. We must build systems that distinguish between our user interface and our backend logic.
00:14:56.180 The next anti-pattern is opaque dependencies. Opaque implies that you can't see through or control something, and this can cause significant issues in any software system. In event sourcing, opaque dependencies manifest in different ways. First, if your command handler receives a projected entity as input, the projection becomes opaque because there is no control over it. For beginners, this is terrifying because they learn about event sourcing and then can't find it in the code; it's buried within some external framework. If you want to introduce caching or change your caching strategy, you'll have limited means to do so without relying on additional settings or configuration.
00:15:57.180 Another issue arises when entities are not required. For example, while validating a unique email address, you can write a reservation event to ensure that the email is unique without requiring an entity. In certain cases, the uniqueness check resembles the Rails design but avoids race conditions. Additionally, when updating materialized views, the entity serving as a view entity may contain data from a relational database, meaning you don't always need an entity. Occasionally, telemetry handlers are set up to record or signal metrics any time your company receives a payment, which may lead to complications. Similar to an airplane’s black box that logs all the information for analysis after a crash, telemetry helps in the analysis of performance.
00:16:45.390 An alternative approach often adopted by teams excited about functional programming, involves wanting their handlers to receive commands and return events instead of directly writing events themselves. However, this method presents issues because if the handler's purpose is to process a command, it must end once the event is written. If it merely returns an event, it leaves the task incomplete. Databases operate on computers where writing to the database can fail. Transferring the responsibility for writing events to an external actor introduces uncertainty, as that actor must decide how to respond to failures, which indicates that the handler hasn’t done its job.
00:17:56.080 Furthermore, writing events involves more than simply returning them: expected versions complicate the write, and they are critical for protecting against concurrency problems. When multiple threads are processing commands for the same entity, race conditions can occur. The expected version acts as a safeguard against these concurrency issues, but you should only use it when required. For instance, if a handler sends an API call to charge a credit card, you do not want any potential failure in the process that could lead to charging customers multiple times. Therefore, it's crucial that handlers maintain control over writing messages because the circumstances and techniques for writing can vary significantly. Having control over dependencies is essential in software development, yet it's often overlooked, particularly among teams who have never attempted it.
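The expected-version check can be illustrated with a small in-memory store. This is a sketch of the general optimistic-concurrency idea, not the interface of any particular event store; the class names and the convention that an empty stream has version -1 are assumptions of this example.

```ruby
class ExpectedVersionError < StandardError; end

class InMemoryStore
  def initialize
    @streams = Hash.new { |hash, key| hash[key] = [] }
  end

  # Version of a stream: position of its last event (-1 when empty).
  def version(stream_name)
    @streams[stream_name].length - 1
  end

  # When expected_version is given, the write only succeeds if no other
  # writer has appended to the stream since that version was read.
  def write(stream_name, event, expected_version: nil)
    current = version(stream_name)

    if expected_version && expected_version != current
      raise ExpectedVersionError,
            "expected version #{expected_version}, stream is at #{current}"
    end

    @streams[stream_name] << event
    version(stream_name)
  end
end
```

A handler would read the stream's version while projecting its entity and pass it back on write. If two handlers race on the same entity, the second write raises `ExpectedVersionError` instead of silently recording an event based on stale state. Handlers that do not need the protection simply omit `expected_version`.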
00:18:59.930 We commonly rely on various tools that help control interactions, like mocking databases to bypass actual interactions, but as systems increase in complexity, our capability to revisit earlier designs diminishes sharply. In our systems, we test handlers without hitting the database, by using substitute writers that can be controlled. This leads to the necessity of examining the aggregate root class pattern, which I have seen lead to complications across various clients. This pattern bundles the responsibilities of an entity, message handler, and projection into a single class for convenience. This results in lost benefits from event sourcing, as entities should be tested in isolation, not merged into multi-responsibility classes.
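Testing a handler against a substitute writer could look roughly like this. The sketch assumes the handler receives its writer as a constructor argument; `SubstituteWriter` and `DepositHandler` are hypothetical names, and real toolkits may wire the dependency differently.

```ruby
Deposit   = Struct.new(:account_id, :amount)  # command
Deposited = Struct.new(:account_id, :amount)  # event

# Records writes in memory instead of sending them to a database.
class SubstituteWriter
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write(stream_name, event)
    @writes << [stream_name, event]
  end
end

# The handler writes the event itself, through a writer it was given.
class DepositHandler
  def initialize(writer)
    @writer = writer
  end

  def call(command)
    event = Deposited.new(command.account_id, command.amount)
    @writer.write("account-#{command.account_id}", event)
  end
end

# In a test, the substitute stands in for the real writer.
writer = SubstituteWriter.new
DepositHandler.new(writer).call(Deposit.new("123", 11))

stream_name, event = writer.writes.first
puts stream_name   # prints account-123
puts event.amount  # prints 11
```

Because the writer is a visible dependency, the test can inspect exactly what the handler wrote and to which stream, with no database involved.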
00:20:06.060 Creating and testing separate entities and projections allows for better isolation, which is more manageable. However, one major reason I observe aggregates being so popular is developers' discomfort with object-oriented programming. There is a tendency to fold multiple responsibilities into one aggregate class, which seems convenient in the short term but becomes problematic for long-term design. Additionally, many teams argue for this approach, stating that initially, boundaries aren't clear, and so they prefer to build quickly first. However, this mindset is flawed: cleaning supplies don't lead to scientific breakthroughs; coding isn't about trial and error, but understanding the end goal through engineering.
00:21:03.290 A common issue arises from the community leading to the spread of disastrous design patterns from one developer to another. We often look toward community leaders for guidance and accept their principles without adequate scrutiny, which eventually causes misguided practices based on popular conviction rather than empirical validation. Therefore, it’s essential to question techniques and methodologies critically. Although communities can foster wonderful ideas, they can also impede creative and rational thought. I have personally experienced this and appreciate having colleagues who help eliminate my misunderstandings and failures.
00:21:58.660 The penetrative nature of unhealthy community pressures makes it crucial for us to evaluate failures objectively and not solely rely on community theory or tool-based learning. When transitioning to event sourcing from traditional paradigms, it's not the tools or systems that lead to failures but rather the coder’s implementation. It’s imperative to understand the root causes of problems rather than hastily migrating to new technologies. Popular methods should undergo scrutiny and be approached with caution. I look forward to conversations around this notion; gaining insights from discussions is invaluable. Embracing complexity is a critical learning opportunity.
00:22:50.650 The final message I want to impart is to measure twice and cut once. In the past, we would often criticize the waterfall approach for taking too long upfront. But we must appreciate that in the current environment, taking an adequate amount of time for design work is essential. Spending a week to design just one service, while others may rush through minimal planning, should not be criticized as waterfall; we can afford more focused design effort. Proper design work helps virtually eliminate errors while ensuring that everyone on the team understands the intricacies of what we need to achieve before diving into the coding process.
00:23:47.180 Always be part of the design process; meaningful discussions are necessary for understanding the encapsulated ideas we’re attempting to progress. Additionally, when confused about appropriately censured projects, spend the time to engage in reflective planning instead of hastily executing implementations. This principled approach, where we contemplate the practical engineering techniques and the rationale behind such choices, is pivotal to setting the right foundation for any current and future projects. Thank you for your time! I’m @real_ntl on Twitter. Check out Eventide when you have a chance as well. It’s in development, but usable in production environments.
00:24:26.650 I appreciate all your engagement, and I am happy to answer questions. Starting with the first inquiry from the audience: 'What is an entity?' An entity can be viewed as a code structure with defined attributes, typically accompanied by specific methods—for example, handling deposits within an account. I refer to it from a coding perspective, focusing on data structures containing appropriate methods for business logic. I'll show the different aspects of entities and how they work with events.
00:34:07.370 One more question about how to operate within this realm of event sourcing was posed. When faced with complex commands and validations, especially while checking whether sufficient funds exist, we realize that these decisions should reside within the handler domain. It’s important not to mix responsibilities; for the purpose of thorough design, we’ll need to allow flexibility to iterate and understand particular areas gradually. Handling withdrawals must also stay sensible so that clarity is maintained between commands and events—an essential trait of solid architecture.