Talks

Event Sourcing for Everyone

Recorded in June 2018 during https://2018.rubyparis.org in Paris. More talks at https://goo.gl/8egyWi

Paris.rb Conf 2018

00:00:17.830 Thank you! Hi everyone, my name is Jenna.
00:00:21.470 I'm also a developer at Shopify, so come talk to me afterwards if you're interested in what living on the Rails edge is like when you're not a member of the Rails core team.
00:00:27.199 Today, I'm going to be talking about the software design pattern known as event sourcing.
00:00:30.980 This talk is aimed at anyone who has never heard of event sourcing before, or maybe you've read a few blog posts but haven't really dug into the code or how it would be implemented. Perhaps you're an expert, and some things might not be new to you, but you're looking for a mental model for how to explain these concepts to other people on your team.
00:00:44.690 We'll talk about what event sourcing is and what kinds of issues it aims to address. We will look at the key components that make up an event-sourced system, and then we'll dive into an example of event sourcing. But before we get into it, let's take a look at how we typically design our systems using a simple database-backed web application.
00:01:08.060 In our simple app, we have users who own accounts. These accounts have email addresses, and we track whether the account is active or not. We also have different plans that we offer our users, such as free or paid subscriptions, and we maintain some timestamps for various actions.
00:01:29.690 Now, let's consider the life cycle methods typically associated with a user account. When a user first signs up for our service, we create a brand new record for them in our database and set their 'is_active' boolean to true. If they then decide to upgrade their plan, we gather their banking information, likely stored elsewhere, and change their plan attribute to 'paid'. Everything is fine until their banking information becomes invalid, at which point we have to disable their account.
00:02:04.310 When they fix their banking information and register again, we retrieve their account, store the updated banking details, and set the 'is_active' boolean back to true. And if the user later decides to disable their account on purpose, we set that boolean back to false. This life cycle illustrates an important point: the very first time, we created a brand new record, but for every subsequent action we updated that original record in place, overwriting the data that was there before.
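To make the problem concrete, here is a minimal plain-Ruby sketch of that mutate-in-place approach. There is no real database here, and the attribute and class names mirror the slide rather than any actual Shopify code:

```ruby
# Traditional approach: each lifecycle method mutates the record in place.
class Account
  attr_accessor :email, :plan, :is_active

  def register(email)
    @email = email
    @plan = "free"
    @is_active = true
  end

  def change_plan(new_plan)
    @plan = new_plan          # the previous plan is overwritten and lost
  end

  def disable
    @is_active = false        # we no longer know *why* it was disabled
  end
end
```

After `disable` runs, nothing in the record tells us whether the user chose to leave or their payment failed — exactly the history that event sourcing sets out to keep.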
00:02:36.080 The historical state information can be valuable for debugging production issues or addressing customer concerns. To mitigate the loss, we implement various strategies, like adding logging and sending logs from our production environment into queryable systems, which lets us retrace what happened when problems arise. Sometimes we add more metadata or columns to our records to contextualize what occurred, or create snapshots of our objects whenever a record changes. These are decent strategies for reducing data loss, but they still fail to answer the question of why the record was changed.
00:03:06.390 Event sourcing is designed to address this issue. In an event-sourced architecture, the business logic driving the change from one state to another is what we persist. We can arrive at the current application state by replaying all these events in sequence. The event itself is treated as a first-class citizen that contains all the necessary information for transitioning from one state to another.
00:03:33.840 Events in this context are immutable and append-only, meaning if there's incorrect data, we don't just edit that record; we create a new event to reverse the change. The current state of our application is disposable since we can always rebuild it by replaying all the events. Proponents of event sourcing argue that this provides a better model for representing real-world business processes.
00:04:08.410 In reality, something happens, and the record we maintain is merely an artifact of that event. Additionally, we don't rely on a single database lock to ensure related events occur simultaneously; if one fails, we don't need to roll back everything else. Event sourcing also enables the development of a ubiquitous language, which is extremely valuable when collaborating with interdisciplinary teams. Whether working with domain experts, UX researchers, or designers, discussing your system as a series of events is often clearer than conveying it through convoluted database states made up of multiple attributes.
00:04:47.250 Before diving into the other components of an event-sourced system, I need to briefly discuss CQRS—Command Query Responsibility Segregation. Although this topic could fill an entire presentation, the main takeaway is that in some cases, it's highly beneficial to separate the read side from the write side of your application. Understanding this separation simplifies the mental model of how event sourcing operates.
00:05:38.780 The write side may prioritize data integrity and optimizations for inserts and updates, while the read side focuses solely on efficient queries. When you decouple reading from writing, the flow of data in your application can become more complex. Now, let's take a look at the data flows in typical applications to compare it with an event-sourced application.
00:06:16.820 In a typical application, a client can make a request, which gets routed to the appropriate service or object that knows how to handle it. After some validation checks, we proceed to fetch data or make changes while recording that transformation in persistent storage. Finally, we return a response to the client. On the read side, the client may query the application for information, retrieving data from persistent storage before sending it back.
00:06:35.800 Now, let's see how this looks in an event-sourced architecture. Here, I will introduce some new terms which I will define clearly. It's essential to avoid getting lost in technical jargon as it complicates the learning process. Starting with a client, the client issues something called a command, which communicates intent—what the client tries to achieve. This command will be validated against our domain model.
00:07:07.600 Our domain model contains all of our business logic and includes the nouns of our system, such as accounts, customers, and products. If the command passes validation, it becomes an event—essentially a statement of fact stating that something has happened. This event will be persisted in some sort of event store, from which we can replay all events to create projections that our client can query.
00:07:39.980 The interesting aspect of this architecture is that our domain model and the projections don’t need a one-to-one relationship with each other. We can keep all business logic, schema validations, and more in our domain model, while the projections can be tailored for what the client needs. As a result, we can have multiple projections to cater to different types of clients and queries.
00:08:18.270 Moreover, we can utilize various technologies for our projections depending on our needs. Going back to define those components: the command conveys intent and can be rejected; the domain model includes business logic; the event contains all necessary information for changing one state to another; and the projection serves as the read-only version of our current state.
00:08:57.130 Next, let’s examine a practical example, using an account class. We'll first look at what our class would typically be like in a traditional Rails app. Then, we will see how to incrementally incorporate some event sourcing patterns in our system.
00:09:07.850 If a full system rewrite isn't feasible for you but you’re interested in adopting some of the foundational concepts of event sourcing, I hope you'll find value in this. So, we start with our account class showing three key lifecycle methods: 'register', ensuring that an email address is in place before setting the 'is_active' boolean to true; 'change_plan', which incorporates the business logic needed to transition between plans; and the 'disable' method, which sets the 'is_active' boolean to false, triggering a job to send an email to users who manually disabled their account.
00:09:34.070 Let’s dive deeper into the 'register' method, where we see components of an event-sourced system emerging. The method 'register' acts as a command. At the outset, we perform some validations against our domain model, and if those checks pass, we execute actions. The foremost action we take is somewhat cosmetic—we wrap all relevant information into an event we name 'account_registered'. This name semantically encapsulates our state change and aligns with our domain experts so they aren't concerned with the implementation details.
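A hedged sketch of what this might look like in Ruby: the 'register' command validates against the domain model first, then wraps the state change in an 'AccountRegistered' event. The event name follows the talk; the specific validation and payload shape are assumptions for illustration:

```ruby
# A named event carrying the aggregate's id and the change payload.
AccountRegistered = Struct.new(:aggregate_id, :payload)

class Account
  attr_reader :id, :email, :is_active

  def initialize(id)
    @id = id
    @is_active = false
  end

  # 'register' acts as a command: validate, then emit a semantic event.
  def register(email)
    raise ArgumentError, "email required" if email.to_s.empty?
    apply(AccountRegistered.new(id, { email: email }))
  end

  private

  # Applying the event is what actually transitions the state.
  def apply(event)
    @email = event.payload[:email]
    @is_active = true
  end
end
```

The command can be rejected (the `ArgumentError`), but once the event exists it is a statement of fact.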
00:10:31.710 Once we have an event that clearly represents one granular state change, the same process applies to our other lifecycle methods, using event names like 'account_disabled' and 'plan_changed'. Subsequently, we'll want to store these events in a repository and be able to pull them out to replay when needed. To capture more information, the event object can also reference the unique identifier of the object it pertains to, alongside a payload that contains all the information necessary to make the state transition.
00:11:06.300 Next, we will create a consistent interface for these methods, making sure that regardless of the event being applied, we treat it uniformly so that we can send everything to our event store and replay it using the same code. We’ll replace the method with 'apply', signifying that we are applying our specific event to our domain model, which we can include from a base object module applicable to any class that we want to derive from events.
00:11:43.760 As for the logic behind the 'apply' method, it serves to handle the event, updating our object to the latest state based on data contained within that event. Additionally, we will have a 'publish' method to save it to the event store should it represent a new event. In our example, we are using a straightforward in-memory event store, but we want to avoid getting bogged down worrying about whether we should use Redis, Kafka, RabbitMQ, or any specific technology for this purpose.
00:12:21.310 Our keys are defined by the unique identifiers of our objects, while the values stored are simply an array of all events tied to that object. When saving a new event, we check to see if an event list already exists for that object, appending it if we find one. When rebuilding an object from its events, we instantiate a blank version and assign a unique identifier, then call our event store to fetch any relevant events.
00:12:53.230 To retrieve those events, we will utilize a method called 'rebuild', which will trigger 'apply', just as a command would, but with the 'is_new_event' boolean set to false. This way, any listeners monitoring events being fired recognize the difference between an event generated by a command and one occurring simply because we are rebuilding our objects.
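Putting the last few pieces together, here is one possible sketch of the in-memory event store, the uniform 'apply' interface, and the 'rebuild' flow with its 'is_new_event' flag. The event shapes (plain hashes) and the store's interface are illustrative, not the talk's actual code:

```ruby
# The event store: a hash keyed by aggregate id, each value an array of
# events, exactly as described in the talk.
class InMemoryEventStore
  def initialize
    @streams = Hash.new { |hash, key| hash[key] = [] }
  end

  def publish(aggregate_id, event)
    @streams[aggregate_id] << event
  end

  def events_for(aggregate_id)
    @streams[aggregate_id].dup
  end
end

# A write-side Account whose commands and rebuilds both flow through
# the same 'apply' method.
class Account
  attr_reader :id, :email, :is_active

  def initialize(id, store)
    @id = id
    @store = store
  end

  def register(email)
    apply({ name: :account_registered, email: email }, is_new_event: true)
  end

  def disable
    apply({ name: :account_disabled }, is_new_event: true)
  end

  def apply(event, is_new_event:)
    case event[:name]
    when :account_registered
      @email = event[:email]
      @is_active = true
    when :account_disabled
      @is_active = false
    end
    @store.publish(id, event) if is_new_event   # replays are not re-persisted
  end

  # Rebuild a blank object by replaying its stored events.
  def rebuild
    @store.events_for(id).each { |event| apply(event, is_new_event: false) }
    self
  end
end
```

Because `rebuild` passes `is_new_event: false`, replaying never appends duplicate events to the store, and subscribers can tell a replay apart from a fresh command.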
00:13:32.580 Next, I will discuss the concept of subscribers—these are classes or objects that want to be notified of an event occurrence. For instance, a reactor is any part of your code that performs additional functions because an event was fired, containing its business logic. A projector, on the other hand, maintains the read-only version of our current state, updating upon receiving events.
00:14:18.560 Our reactors will examine each event to ascertain whether an account was disabled and, if so, trigger a job to send an email to the user. However, notifications shouldn't be sent when we're merely rebuilding objects, so users aren't disturbed by an unexpected email during a replay. Projectors, for their part, create various read-only views based on the events they receive, and we have flexibility in choosing how often we maintain these projections.
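A minimal sketch of such a reactor, assuming events are plain hashes with a name and an aggregate id — note how the 'is_new_event' flag keeps it silent during rebuilds:

```ruby
# A reactor: runs side effects in response to fresh events only.
class AccountDisabledReactor
  attr_reader :emails_enqueued

  def initialize
    @emails_enqueued = []
  end

  def handle(event, is_new_event:)
    return unless is_new_event                        # stay quiet on rebuilds
    return unless event[:name] == :account_disabled
    @emails_enqueued << event[:aggregate_id]          # stand-in for a mail job
  end
end
```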
00:14:55.110 As we continue designing these components, we face a potential naming collision: we're attempting to perform two functions at once. When rebuilding our object from events, we might also use Active Record to create read-only projections. Consequently, there is a pressing need to distinguish between objects that represent our read side and those that represent our write side.
00:15:43.030 The simplest approach to resolving this is by assigning unique names to our classes, such as representing write side transactions with an Account object and using an empty class for Active Record to symbolize our projections. This allows us to create as many as we require, ensuring the specifics of each do not need to mirror those defined in our primary Account object but can be tailored for each distinct view.
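In a sketch of this split, a projector maintains the read-side views separately from the write-side Account object. The talk uses an empty Active Record class for the projection; a plain hash stands in for it here, and all names are illustrative:

```ruby
# A projector: folds events into read-only views tailored to clients.
class AccountProjector
  def initialize
    @views = {}   # aggregate_id => read-side view
  end

  def handle(event)
    case event[:name]
    when :account_registered
      @views[event[:aggregate_id]] = { email: event[:email], active: true }
    when :account_disabled
      @views[event[:aggregate_id]][:active] = false
    end
  end

  def view(aggregate_id)
    @views[aggregate_id]
  end
end
```

Nothing forces this view to mirror the write-side Account's attributes — a second projector could keep, say, only a count of active accounts for a dashboard.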
00:16:29.620 As we finalize the overview, let's consider how our Account object operates. The 'apply' method simply applies events, which allows us to strip the commands out of our domain model and give them dedicated classes with their own methods. Thus, a command encapsulates its initialization arguments and is validated against the state of the object at execution time.
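For example, a 'change_plan' command extracted into its own class might look like this — it captures its arguments at initialization and validates against the object's state when executed. The validation rules shown are assumptions for illustration:

```ruby
# A command object: holds its arguments, validates at execution time,
# and returns an event on success.
ChangePlan = Struct.new(:new_plan) do
  def execute(account_state)
    raise "account is disabled" unless account_state[:active]
    raise "already on #{new_plan}" if account_state[:plan] == new_plan
    { name: :plan_changed, payload: { plan: new_plan } }
  end
end
```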
00:17:15.510 This allows for simplicity in modeling the business logic as we become more precise about our operations. We will have a shared terminology with interdisciplinary teams, scaling reads and writes without coupling them. This also provides traceability and auditability as a natural byproduct of our design.
00:18:05.800 However, working with these systems is not always easy. Naming can be particularly challenging. While we may only be updating a few simple attributes, we often have extensive variables to formulate, and the implementation logic can be straightforward but still necessitates more effort to accomplish basic tasks.
00:18:35.440 Managing schema changes, for example, can become complex: if our account model or event schema changes, we face questions about how to maintain versioning. Eventual consistency will arise if you're not reading and writing from the same storage location; the data you just wrote might not be immediately accessible, complicating user expectations.
00:19:08.020 Managing side effects is another potential issue, such as ensuring we don’t send emails when rebuilding an object. Additionally, creating and accumulating significant amounts of data can overload our systems.
00:19:34.110 The title of my talk may have been a bit misleading; while event sourcing isn’t for everyone and certainly not for every application, it addresses some common concerns that surface as applications grow in complexity. As we expand, we need to decide on effective ways to manage that complexity. Thank you very much for your time!
00:20:17.770 I learned a lot while preparing for this talk and have included all the resources I gathered in the README of this repository. It also contains example code and I'll make my slides available there too.
00:21:00.060 I'll take questions afterwards. You can find me at the Shopify booth or outside. Yes, please come find me afterwards.
00:21:10.200 OK, I understand.
00:21:10.660 This topic is too complex to cover in just a few questions.
00:21:10.950 But I would love to answer questions, if possible.
00:21:11.310 So does anybody have questions about that?
00:21:14.050 Please be kind with your questions.
00:21:14.920 Hi, thank you for your talk!
00:21:17.080 I was wondering if you plan to event source all your models at Shopify, or if not, what kinds of objects are you planning to target first?
00:21:21.350 That's a great question! The first thing we considered when exploring event sourcing was restructuring our logging process. Our logs were chaotic, with numerous records passing through state machines and important business logic that could either succeed or fail, hence debugging customer issues became extremely difficult.
00:21:51.020 When we modified data in place, we often lost track of information, leading to perplexing bugs that concealed themselves until we searched for answers. The initial step we took was to refine our code architecture to conceptualize events better, thereby improving our logging structure to facilitate easier querying.
00:22:05.360 When considering which models to event source first, the worst mistake is to naively replace everything with the latest concept you learned from a brief presentation on event sourcing. Instead, focus on models that are inherently event-based or that require a clear audit trail.
00:22:28.540 Do you have any other questions? If so, I can answer them.
00:22:36.270 Responding to the last query, you asked whether it is easier to refactor and extract components when event sourcing is applied as opposed to using a traditional framework. I would say that this presentation may have been somewhat verbose—many methods and classes have been named specifically to make concepts clear at first glance.
00:23:27.810 Other demo frameworks out there employ meta-programming and naming conventions based on the idea that the command name mirrors the event name. However, I found that approach gave rise to additional complexity despite its clarity once you grasped the concepts.
00:23:37.770 As you familiarize yourself with event sourcing, the focus on business logic becomes clearer without being bogged down by event operations. Could I ask for your feedback on this aspect in particular?
00:24:02.780 Hi, thank you for the talk! Regarding the in-memory store showcased in your slides, which technology would you recommend for a production-ready solution that aligns well with Ruby? Well, it truly depends on your specific needs. RabbitMQ is a good choice if your concern lies with message brokering, thanks to its multi-language compatibility.
00:24:57.660 Redis is another excellent option for having straightforward key-value storage, and one could even utilize MySQL, as you don't need to rely on anything overly sophisticated to get started with event sourcing.
00:25:16.590 If your events are set up as Active Record models, they'll support all of your needs while allowing for easy interaction with their data.
00:25:36.660 As you build your architecture, consider adopting a consistent interface for events—it’s critical for your implementation's adaptability. This way, should MySQL become insufficient, you can easily adjust and transition without too much friction.
00:26:18.290 Regarding replaying events to recover model states in event-based designs, be prepared for potentially large volumes of data that require management.
00:26:32.840 Performance costs grow as the historical event data expands, which is where practices like snapshotting come in—taking periodic snapshots prevents sluggishness when rebuilding state.
00:26:43.840 Utilizing snapshots allows you to discard excessively old data that is less relevant but retain the necessary detailed history to replay current state changes as needed.
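A toy sketch of the snapshotting idea, assuming a deliberately trivial "state" (a running sum) so the mechanics stay visible: the store snapshots every N events, and rebuilding replays only the events after the snapshot rather than the whole history:

```ruby
class SnapshottingStore
  def initialize(interval: 2)
    @interval = interval
    @events = []
    @snapshot = { state: 0, at: 0 }   # state so far, and how many events it covers
  end

  def append(n)
    @events << n
    return unless @events.size - @snapshot[:at] >= @interval
    tail = @events[@snapshot[:at]..]
    @snapshot = { state: @snapshot[:state] + tail.sum, at: @events.size }
  end

  # Rebuild from the snapshot plus only the newer events.
  def current_state
    tail = @events[@snapshot[:at]..] || []
    @snapshot[:state] + tail.sum
  end
end
```

Once a snapshot is trusted, events older than it can be archived or compacted, keeping replays fast while the detailed recent history remains available.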
00:27:18.360 In response to your question about version management: snapshots do not ultimately resolve all issues surrounding schema changes, particularly when it comes to applying events to new model structures. As important as snapshotting is for performance, it can introduce complications when event schemas transform.
00:27:48.300 There exists a comprehensive paper titled 'The Dark Side of Event Sourcing' that elaborates on various strategies to manage those complexities.
00:28:09.850 From my perspective, there is no ultimate solution that would cover all scenarios.
00:28:20.360 However, I would encourage you to familiarize yourself with the materials I referenced, which I have included in the repo.
00:28:35.300 So, thank you very much!