Summarized using AI

Event Sourcing, or Why ActiveRecord Must Die

Sebastian von Conrad • February 10, 2016 • Earth

In this presentation titled Event Sourcing, or Why ActiveRecord Must Die, Sebastian von Conrad discusses the limitations and dangers of using ActiveRecord as an ORM (Object-Relational Mapping) tool in Ruby applications. He argues that the principle of being able to delete or update data—common in traditional software development—leads to significant issues with data integrity and historical accuracy.

Key points include:

  • Immutable Data: In real life, past events are immutable, and any mistakes made cannot be erased. Similarly, in software development, data should be treated as an immutable append-only log of events.
  • The Dangers of Deletion: Von Conrad emphasizes that deleting data, whether intentionally or accidentally, is detrimental. He highlights that organizations frequently lose valuable data due to accidental deletions.
  • Updates are Deletions: Updating data actually involves overwriting previous data, which leads to loss of historical information. He uses the example of email address updates to illustrate the point that prior information can still hold value.
  • Event Sourcing as a Solution: Instead of updating or deleting data, event sourcing involves storing all events that occur in a system. This method allows for reconstructing the current state of data by replaying these events.
  • Historical Integrity: He draws parallels to accounting systems, which have historically utilized this event-sourcing methodology to maintain a reliable record of transactions, further reinforcing the value of this approach.
  • Challenge of Mindset Shift: Moving to an event-sourcing model requires a significant shift in how developers think about data handling. However, it is a necessary transition to improve data integrity.

Conclusions drawn from the talk suggest that by implementing event sourcing, developers can overcome the pitfalls associated with deletion and updates that ActiveRecord encourages. Instead of relying on systems that prioritize current data states, developers are encouraged to embrace methodologies that respect and retain data history, thereby enhancing the reliability and robustness of their applications. He urges the audience to consider utilizing event sourcing, especially for applications requiring historical data retention.

RubyConf AU 2016: Much has been said about the dangers of ActiveRecord, but the advice is usually limited to avoiding callbacks or not building God models. We still use it because there hasn't been a viable alternative. What makes it dangerous is not a trait unique to ActiveRecord, however: it is dangerous because it's an ORM, and all ORMs must die.

With Event Sourcing, we can build Ruby applications that are simple, scalable, extensible, and elegant without ActiveRecord or any other ORM anywhere in sight. We get a free time machine, and find out that Event Sourcing is a lot older than we might think.

00:00:00 I usually like to start my presentations with a little bit of fluff. For those of you who don't know what fluff is, it's that kind of irreverent, irrelevant part of a presentation that doesn't actually have anything to do with it. We come up here and talk about it anyway. Personally, I like to talk about the fact that I'm from Sweden and usually make a joke about IKEA or ABBA. Because when you're Swedish and living in Australia, you get to hear literally every ABBA and IKEA joke that exists.
00:00:19 People often think that we do fluff to warm up the audience, to establish rapport, or something like that, but that’s not why I do it. For me, it's all about feeling comfortable up here on stage because it's actually kind of terrifying. It's important for me to feel comfortable because I know that any mistake I make here gets recorded in that camera right there, and it also gets recorded in all of your memories. If I misspeak or mispronounce something—which I will, because I have a Swedish accent—then that’s just what happens.
00:01:01 In the real world, the mistakes we make are forever. We can't change them once we've made them. This is really the way that the world works; you could almost say that the things that happen in real life are immutable. Yes, people are starting to understand why I'm talking about IKEA and stuff. The things that happen to us are immutable; we can't change them after they've happened.
00:01:36 And really, if you stop and think about it, this talk is an append-only list of immutable sentences, one happening after the other. Once I've made one of them, I'm not gonna be able to change it. Just like life, really, is an append-only list of events. Somehow, as humans, we learn how to deal with that. Things go wrong, we change it, we fix it, we do other things to combat that problem. Except that's not how we build software, is it?
00:02:06 Because with software, we get to be revisionists. With software, we get to play whatever deity we choose. We can delete things in software, and it's awesome. We can change history; we can change what happened simply by erasing information. This is how we are taught to develop software. We learned how to do CRUD, and it wouldn't be CRUD if that last letter fell away, would it? ActiveRecord really helps us with this; it gives us all these nice ways to delete data.
00:02:47 But it also provides us with all these implicit, indirect ways to delete data, which is great. So, we delete data really because we can. This didn't happen back in the days when we recorded data on stone tablets or when we used papyrus for scrolls or any other kind of paper to record our data. We have the technological revolution to thank for our ability to erase information.
00:03:36 However, I would argue that erasing data is not a feature; it's a bug. We shouldn't be thankful for it at all. I’d much rather you give me back my stone tablet, because data is really the most important thing we do. It is at the core of what we, as software developers, do. We capture data from our users; it's what we sell, it's everything we do, it's all about the data.
00:04:10 And this data survives even when technology doesn't. There are companies that stick around for 10, 50, or even 100 years. They go through many technology changes and evolutions, yet the data—the things they sell, the people they've sold to—all that doesn't change. It stays. But we delete it. Sometimes it’s accidental; I usually like to ask people during interviews, 'What's the biggest screw-up you've ever made?' Because I like to feel their pain. The answer is usually about accidentally deleting something or dropping the production database.
00:05:05 So yes, we definitely delete data by accident, but we also do it on purpose. We delete it, but that can often come back to bite us, because what if we need that data after it’s gone? Often, deleting data can have widespread or unanticipated consequences. Try to destroy a random user in your LDAP and you'll see what I mean.
00:05:36 Even more importantly, if we agree that data has value, why would we get rid of something that has potential value? Sure, maybe not all data has value, but how do you know? How can you tell which data doesn't hold value—not just for today, but for the future as well? How can you be sure that what you're deleting will not have value later?
00:06:22 If data can have value and we can’t be certain it won’t ever have value, then that means it has potential value. Why would we delete something that has potential value? I’m not alone in thinking this way. It took me about five minutes to compile a list of gems that help you soft delete things in ActiveRecord. They work in ways such as adding another database column that holds a destroyed at timestamp. I didn’t write any of these gems, and they have tons of downloads.
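As a rough illustration of the soft-delete workaround mentioned here, the following is a minimal plain-Ruby sketch. It assumes an in-memory list standing in for a database table, and the names `SoftUser`, `SoftDeleteTable`, `soft_delete`, `active`, and `with_deleted` are hypothetical; none of them is taken from any particular gem.

```ruby
# Hypothetical row type: the destroyed_at field plays the role of the
# extra timestamp column.
SoftUser = Struct.new(:id, :email, :destroyed_at)

# Hypothetical in-memory stand-in for a table (not a real gem's API).
class SoftDeleteTable
  def initialize
    @rows = []
  end

  def insert(id, email)
    row = SoftUser.new(id, email, nil)
    @rows << row
    row
  end

  # "Deleting" only stamps the row; the data stays in the table.
  def soft_delete(id, now = Time.now)
    row = @rows.find { |r| r.id == id }
    row.destroyed_at = now if row
    row
  end

  # Default reads hide soft-deleted rows.
  def active
    @rows.select { |r| r.destroyed_at.nil? }
  end

  # Everything that was ever inserted, including soft-deleted rows.
  def with_deleted
    @rows.dup
  end
end

table = SoftDeleteTable.new
table.insert(1, "john@example.com")
table.insert(2, "jane@example.com")
table.soft_delete(1)
puts table.active.size
puts table.with_deleted.size
```

Note that stamping the row is itself an update, so this keeps the row but does nothing for values that are overwritten later.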
00:07:14 People have been saying, 'Yeah, we don’t actually want to delete any data.' Workarounds can be okay; we live in a pragmatic world, and sometimes we need to make pragmatic decisions. I would be okay with that if there wasn’t something even more sinister lurking below. I think I can convince you that destroying data is bad, but I also argue that updating data is just as bad as deleting it. When we update, we overwrite what was previously there.
00:08:02 This makes me wonder; why should we stop caring about what was there before? If a user updates their email address, does that mean the old email address is of no value to us anymore? Can we just throw it away? Sure, we might not email them anymore, but if we want to figure out why they didn’t receive an order confirmation or investigate bounce rates for particular email providers, having that email address and the history of emails sent to it would indeed be useful.
00:08:55 But we replace it. When we update something using ActiveRecord, we are deleting the data that was previously there. As a result, every update is a delete. Every time we update anything in our applications, we are deleting and destroying data. It’s worse because it is less explicit; we don’t notice we are doing it.
00:09:30 Sure, we can create changelogs, audit logs, database triggers, and all these things to track changes, but these workarounds are becoming increasingly complicated. This is the kind of discomfort that no amount of fluff at the beginning of a talk can ever hope to fix. Why should I have to go out of my way not to lose what I feel is most important to me? The more I thought about it, the more convinced I became that delete and update are the enemies.
00:10:18 I just don’t want to do it anymore. If I revoke those privileges to my database, what do I have left? Well, I have insert and select, but if I'm only ever inserting data and reading it back, does that mean I’m making it append-only? If I'm never updating anything, does that mean I’m making it immutable? And does that mean I’m actually building software the same way the world works?
00:10:55 Can I record things as they happen and never change them? About 18 months ago, I came across the term 'event sourcing.' I started reading about it; I heard about it from a colleague. The more I thought about it, the more I felt something I hadn't felt since I discovered Rails back in 2006. This is a new way to build software that will change our industry.
00:11:31 Let me explain how it works: The way that ActiveRecord works is it takes the current state of an object and stores it in a database. That's pretty much it. When the state of that object changes, we change the way we represent that object in the database. For example, if there are a lot of Johns here and one of you gets married and wants to take on your partner’s last name, when we update the user, we get an updated database record.
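To make the current-state model concrete, here is a minimal plain-Ruby sketch. It assumes a hash standing in for a users table, and the helper names `update_user` and `stored_anywhere?` as well as the sample names are hypothetical; it is not ActiveRecord code, only an approximation of what an update does to a stored row.

```ruby
# Roughly what an UPDATE does: the stored row is replaced in place.
def update_user(table, id, changes)
  table[id] = table.fetch(id).merge(changes)
end

# True if any stored row still contains the given value.
def stored_anywhere?(table, value)
  table.values.any? { |row| row.value?(value) }
end

# Hypothetical stand-in for a users table: one row per id, holding only
# the current state.
users = { 1 => { first_name: "John", last_name: "Smith" } }

update_user(users, 1, { last_name: "Jones" })

p users[1]
p stored_anywhere?(users, "Smith")
```

After the update, nothing in the table records what the last name used to be.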
00:12:06 With event sourcing, we don't do that at all. Instead, we store the events that have happened to that object over the lifetime of whatever that thing is. Then we source the current state by simply replaying all of those events in sequence. Doing the same thing with event sourcing would mean having an object created based on the signup events and the data the user provided at the time. When they change their last name, we create another event with the change they make and the new data.
00:12:43 When we want to reconstitute this object, we simply load both events, replay them, and that's how we figure out the current state. Therefore, data becomes an append-only list of immutable events. That’s what event sourcing is all about. If a mistake is made, we don't go back and change the event, even if it’s incorrect; we just write another event that fixes it, just as we do in real life.
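The replay idea described above can be sketched in a few lines of plain Ruby. This is an illustration under assumptions, not the system built at Envato: the event names `:signed_up` and `:last_name_changed`, the `UserEvent` struct, the helper methods, and the sample names are all hypothetical.

```ruby
# Hypothetical event type: what happened, plus the data captured at the time.
UserEvent = Struct.new(:type, :data)

# Apply one event to a state hash and return the new state.
def apply_user_event(state, event)
  case event.type
  when :signed_up
    state.merge(event.data)
  when :last_name_changed
    state.merge(last_name: event.data[:last_name])
  else
    state # unknown events are ignored in this sketch
  end
end

# Source the current state by replaying events in order.
def replay_user(events)
  events.reduce({}) { |state, event| apply_user_event(state, event) }
end

USER_EVENTS = [
  UserEvent.new(:signed_up, { first_name: "John", last_name: "Smith" }),
  UserEvent.new(:last_name_changed, { last_name: "Jonse" }), # a typo, recorded as it happened
  UserEvent.new(:last_name_changed, { last_name: "Jones" })  # the fix is a new event
].freeze

p replay_user(USER_EVENTS)          # current state
p replay_user(USER_EVENTS.first(1)) # state as it was right after signup
```

Replaying only a prefix of the list gives the state at an earlier point in time, and the mistaken event stays in the log next to the event that corrects it.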
00:13:28 But I was wrong. It turns out that this fancy new paradigm I thought would change the world isn't new at all. ActiveRecord has no respect for history. It definitely doesn’t respect the history of the data it’s tasked to store, and it doesn’t respect the methodologies that came before it either. To understand how to respect history, we must look toward history itself.
00:14:15 At the very core of every business sits one system that you all know, one that has been more important than any other for a long time: the accounting system. You cannot have a business without an accounting system; you have to keep track of money coming in and going out, payroll, taxes, and all that. Here’s the kicker: your accounting system is event sourced.
00:14:55 A bank account's current state is ephemeral; it’s its balance. The balance of a bank account is obtained by adding up all the debits and credits that have ever occurred. If I lost that balance, I could get it back simply by replaying all of the transactions. These transactions are what matters. A bank account is an append-only list of immutable transactions; you can’t update or delete bank account transactions.
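The bank account analogy can be written down as a small plain-Ruby sketch. The `Transaction` struct, the `balance_cents` helper, and the sample amounts are hypothetical, and amounts are kept as integer cents as a simplifying assumption.

```ruby
# Hypothetical ledger entry; amounts are integer cents to avoid float rounding.
Transaction = Struct.new(:kind, :amount_cents)

# The balance is derived, not stored: credits add, debits subtract.
def balance_cents(transactions)
  transactions.sum do |t|
    case t.kind
    when :credit then t.amount_cents
    when :debit  then -t.amount_cents
    else raise ArgumentError, "unknown kind: #{t.kind}"
    end
  end
end

LEDGER = [
  Transaction.new(:credit, 100_00),
  Transaction.new(:debit, 30_00),
  Transaction.new(:debit, 30_00),  # charged twice by mistake
  Transaction.new(:credit, 30_00)  # reversal entry; the mistake stays on record
].freeze

puts balance_cents(LEDGER)
```

If a cached balance were ever lost or doubted, calling `balance_cents` on the ledger again would recompute it from the transactions alone.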
00:15:31 Imagine the chaos if you logged into your internet banking one morning and found that your bank had changed a transaction from three years ago, changed the value, or removed it. You’d be pretty upset. How would you feel if you had to rely on the bank's developers to ensure they don’t introduce bugs that store the account balance incorrectly? You couldn't verify that.
00:16:21 So, we event source it, and honestly, that's how it’s worked for over 500 years, long before computers existed. Event sourcing is robust, scalable, auditable, trustworthy, and proven. If this is how we deal with our most important data—our money—why wouldn’t we apply the same approach to all of our other data, especially the data from our users?
00:17:07 Why are we so caught up with the idea of mapping object states into database tables? Yes, we need state, absolutely, but we don’t need to persist the current state. This, I think, is the fundamental thing that ActiveRecord gets wrong, as well as all ORMs. ActiveRecord encourages us to do the wrong thing with our data. It's teaching us to delete and update, and that is why ActiveRecord must die.
00:17:52 It’s not me; it’s you. You have to die, because without you, we can have nice things. We can keep all the same objects in code while just persisting them differently. As a result, we can record everything that happens, so that we never purposely or accidentally delete anything ever again. We’ve done this at Envato, where I work.
00:18:29 We’ve spent the last year building a system like this, and to be honest, it’s been quite challenging. It’s been a bit of a mind shift for the developers; you have to think differently about the way you treat your data. While we’ve solved many problems, we’ve also replaced them with new ones. That’s how software development works: you fix one problem, and another one comes up. There aren’t any silver bullets.
00:19:04 If you care about historical data—and most of the applications we work with do—consider event sourcing. Start thinking about what your app would look like if you revoked delete and update privileges for whatever user connects to the database. What if you could no longer execute delete or update queries? How would you build your app with only insert and select queries?
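One way to start on that thought experiment is a store whose interface has only the two remaining operations. The sketch below is a hypothetical in-memory class in plain Ruby; `AppendOnlyLog`, `append`, and `select` are illustrative names, and a real system would enforce the same restriction through database privileges rather than in application code.

```ruby
# Hypothetical in-memory event log exposing only "insert" and "select".
class AppendOnlyLog
  def initialize
    @events = []
  end

  # Insert: store a frozen copy so the recorded event cannot be changed
  # afterwards (a shallow freeze; nested objects are not frozen here).
  def append(event)
    @events << event.dup.freeze
    self
  end

  # Select: return matching events in a new array, so callers cannot
  # reorder or truncate the log through the result.
  def select(&block)
    block ? @events.select(&block) : @events.dup
  end

  def size
    @events.size
  end
end

log = AppendOnlyLog.new
log.append({ type: :email_changed, email: "old@example.com" })
log.append({ type: :email_changed, email: "new@example.com" })
puts log.select { |e| e[:type] == :email_changed }.size
```

There is deliberately no update or delete method, so any correction has to arrive as another appended event.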
00:19:51 There’s a whole other world of event sourcing out there, and I wish I had more time to tell you about it. I encourage you to look into it. Come talk to me and others—some of whom have been part of building these systems—and we’d love to share more about it. Thank you.