Talks

Event Sourcing

@blimpyacht
Event Sourcing is a powerful way to think about domain objects and transaction processing. Rather than persisting an object in its current state, event sourcing writes an immutable log of deltas (domain events) to the database. From this set of events, an object's state can be derived, at any point in the past, simply by replaying the event history sequentially. Event sourcing is a deceptively radical idea that challenges our contemporary notions about transaction processing, while also being a mature pattern with a long history. This talk looks at how event processing is used across a spectrum of use cases, from database engines and financial systems to Google Docs hacks.

Talk given at GORUCO 2015: http://goruco.com

GoRuCo 2015

00:00:14.170 Hello, everyone! My name is Bryan Reinero, and I work as a software developer advocate at MongoDB. Another thing about me: today is my birthday! So standing here means my wish came true. Let's talk about event sourcing: what it is, and how it can be ubiquitous even though you may never have heard of it before. More importantly, why might you want to use it? Let's dive into an example: let's build a banking app.
00:00:30.850 If I’m building a banking app, I’m probably going to need to persist state in some form back on my data store. This seems like a pretty obvious problem to solve. I’m going to have users A and B, and they both have balances. The balance describes the current state of their accounts. When I want to do a transaction between the two accounts—let's say User A buys something for five dollars from User B—that’s easy. I just credit the account of User B while simultaneously taking away five dollars from User A. However, the problem is that if I’m thinking this way, I may already be doing it wrong.
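The naive "current state only" model described above might be sketched like this (a hypothetical illustration, not code from the talk). Notice that after the transfer, nothing records that it ever happened:

```ruby
# Naive approach: each transfer mutates balances in place,
# so the history of how we got here is lost.
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  # Debit this account and credit the other, overwriting both states.
  def transfer(amount, to:)
    @balance -= amount
    to.credit(amount)
  end

  def credit(amount)
    @balance += amount
  end
end

a = Account.new(20)
b = Account.new(0)
a.transfer(5, to: b)   # A buys something for $5 from B
a.balance # => 15
b.balance # => 5
```

Only the final balances survive; if either value is corrupted, there is no record to reconstruct it from.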
00:01:06.580 Why? By design, I’m throwing away useful information. The only thing I’m persisting right now is the current state of the users; I’ve lost the history of what I’ve done and how they got there. So if there’s any corruption or data loss, I can’t trace back to what happened, and I can’t recover from an attack where someone changes my state. The alternative is event sourcing, which was first proposed by Greg Young and Udi Dahan and is often used in conjunction with Command Query Responsibility Segregation, or CQRS.
00:01:52.179 Event sourcing tells us to forget about the current state. We care about the current state, but we won't persist it. Instead, we will log events that change our state. In this case, instead of updating a user’s account directly, I’m updating the transactions that would change that account. I can derive my state by replaying the event log. What’s more, if I lose the current state, I can always regenerate it by replaying the event log. I can even look back and tell you what the world looked like at any point in time, allowing for temporal queries.
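A minimal sketch of that idea (names are illustrative, not from the talk): instead of storing balances, append immutable events to a log and derive any balance by folding over the log.

```ruby
# Each event records a delta against an account, never a balance.
Event = Struct.new(:account, :delta)

class Ledger
  def initialize
    @events = []   # the append-only event log
  end

  def record(account, delta)
    @events << Event.new(account, delta)
  end

  # Derive an account's balance by replaying every event in order.
  def balance(account)
    @events.select { |e| e.account == account }.sum(&:delta)
  end
end

ledger = Ledger.new
ledger.record(:a, 20)   # A deposits $20
ledger.record(:a, -5)   # A pays B $5 ...
ledger.record(:b, 5)    # ... which credits B
ledger.balance(:a) # => 15
ledger.balance(:b) # => 5
```

The current state is now a pure function of the log: lose the derived balances and you can always recompute them.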
00:02:26.260 This is actually quite useful if I've learned something new about how users interact with my tool or website: I can go back and see how many people I could have reached in the past, provided I’ve kept that information in the log. Is this more secure? Suppose someone attacks me by sending erroneous data. No problem, because I have a log of the attack. I can replay up to the point where I detect it, and since the attack itself is in the log, I can analyze it and protect myself in the future.
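Temporal queries fall out naturally from the same replay mechanism. A hypothetical sketch (field names are illustrative): tag each event with a timestamp and replay only the prefix of the log up to the moment of interest.

```ruby
# An event with a timestamp, so the log can answer "as of" questions.
TimedEvent = Struct.new(:account, :delta, :at)

log = [
  TimedEvent.new(:a, 20, Time.utc(2015, 1, 1)),
  TimedEvent.new(:a, -5, Time.utc(2015, 3, 1)),
  TimedEvent.new(:b,  5, Time.utc(2015, 3, 1))
]

# Replay only the events at or before the given time.
def balance_as_of(log, account, time)
  log.select { |e| e.account == account && e.at <= time }.sum(&:delta)
end

balance_as_of(log, :a, Time.utc(2015, 2, 1)) # => 20 (before the purchase)
balance_as_of(log, :a, Time.utc(2015, 4, 1)) # => 15
```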
00:03:01.569 Now, for those of you who might think this is a radical new idea: it’s not untested; it's surprisingly mature. The concept dates back to ancient history. For example, the 'Res Gestae Divi Augusti', which translates to 'The Deeds of the Divine Augustus', includes Augustus’s record of payments to his legionaries. He recorded the payments themselves, rather than maintaining a current state, so he could know his accounts at any given time. This illustrates a form of event sourcing: durable storage of important information as a log of events.
00:03:49.720 For those of you who appreciate technology explained in modern contexts, you might already guess that this pattern is used in version control systems, like Git. When I change something in a Git repository, I'm logging the changes in a way that allows me to always rebuild the current snapshot or revert back to a previous state. This also includes merges and branches that can be complex but powerful, particularly in continuous integration environments.
00:04:53.800 Similarly, databases leverage event sourcing internally. In the case of MongoDB, we use it for replication. The primary node writes to what's called the oplog, a log of the database's write operations, and secondary nodes replay the oplog to stay consistent with the primary. The neat thing about the oplog is that, just as in event sourcing, I can start at any point in the past and replay forward to reach the current state.
00:05:12.220 There’s a challenge with event sourcing: if every time I want to derive my current state I have to replay the entire log from the beginning, it can be time-consuming and may blow out my cache. The solution is to take snapshots at given intervals. When I need to reach a target state, I simply go back to the last snapshot before the target and replay the event log from that point.
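The snapshotting idea might look like this (an illustrative sketch, not MongoDB's actual mechanism): persist the derived state along with its position in the log, then rebuild by replaying only the tail.

```ruby
# A snapshot pairs derived state with the log position it covers.
Snapshot = Struct.new(:balances, :upto)

# Rebuild current state from a snapshot plus the events after it,
# instead of replaying the whole log from the beginning.
def rebuild(snapshot, events)
  state = snapshot.balances.dup
  events.drop(snapshot.upto).each do |(account, delta)|
    state[account] = state.fetch(account, 0) + delta
  end
  state
end

events = [[:a, 20], [:a, -5], [:b, 5], [:a, 10]]
snap = Snapshot.new({ a: 15, b: 5 }, 3)  # taken after the first three events

rebuild(snap, events) # => { a: 25, b: 5 }
```

Only one event is replayed here instead of four; with millions of events, the savings are what make replay practical.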
00:06:17.430 This approach actually has roots in the 15th century, when a man named Luca Pacioli described double-entry bookkeeping. In 1494, he documented an accounting system in which transactions were not recorded merely as, say, 'I owe you,' but in a way that tracked both accounts payable and accounts receivable. This innovation supported the commerce and exploration of the age that followed. We can correlate this double-entry method with Command Query Responsibility Segregation (CQRS), which separates commands, the operations that write, from queries, the operations that read.
00:07:02.270 CQRS recognizes that commands (write operations) behave differently from queries (reads). Given that reads usually far outnumber writes, separating the two lets you scale them independently. If I’m asymmetrically scaling my readers, I can use eventually consistent data sources, such as MongoDB secondaries, to distribute the read load across secondary nodes. So that’s basically event sourcing and CQRS. I'm looking forward to discussing this with you further! One last thing: you can reach me at @blimpyacht. I love discussing these kinds of topics, so hit me with your best shot. Thank you!
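A bare-bones CQRS sketch, assuming a single in-process event stream (all class and method names here are illustrative): commands append to the write side's event log, while queries read from a separately maintained projection, which could lag behind in an eventually consistent system.

```ruby
# Write side: commands validate input and append events; they return no data.
class WriteSide
  attr_reader :events

  def initialize
    @events = []
  end

  def deposit(account, amount)
    raise ArgumentError, "amount must be positive" unless amount > 0
    @events << [account, amount]
  end
end

# Read side: a projector folds events into a denormalized read model.
class ReadSide
  def initialize
    @balances = Hash.new(0)
  end

  def project(event)
    account, amount = event
    @balances[account] += amount
  end

  # Query: reads the projection, never touches the event log.
  def balance(account)
    @balances[account]
  end
end

write = WriteSide.new
read  = ReadSide.new
write.deposit(:a, 20)
write.events.each { |e| read.project(e) }  # asynchronous in a real system
read.balance(:a) # => 20
```

Because the read model is derived, you can scale it out, or rebuild it entirely, without touching the authoritative event log.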