Balkan Ruby 2018

Beyond the current state: Time travel as answer to hard questions

Software builder, mostly Ruby and a bit of Go. Passionate about "proper" testing, clean architecture and DDD. Currently busy constructing a distributed software system with the best colleagues ever at solarisBank AG.


00:00:09.710 Hello everyone, as mentioned by Dan, guten Morgen! My name is Armin Pašalić, and I work for a company called SolarisBank. We are based in Berlin, and we are a tech startup with a banking license.
00:00:17.609 That's enough about that. Today, I would like to talk about a few concepts I have discovered in recent years while building resilient applications. These concepts have changed the way I look at code. Before I dive into this journey, I would like to show how we usually do things.
00:00:37.710 This is a representation of n-tier architecture, simplified and usually drawn as three-tier architecture. We also sometimes refer to it as a monolith. Essentially, when a client submits a request, like scheduling a meeting, the application receives the request, validates the payload, performs some business processing, and then persists the result.
00:01:01.260 Usually, the application performs another query on the persistence layer and responds with the updated state. The only thing that's typically updated in this state is the identifier that the database assigns to your record. In the end, you get a response, and this is how most of us, especially Ruby engineers, write applications.
00:01:17.100 Is there anyone here who is not familiar with this concept? Okay, that's good. However, there are ways to do it differently, and sometimes there are good reasons to. You might work in a business domain that is quite complex. Sometimes you need to scale significantly, and sometimes you just want to do some time traveling and have fun with software superpowers.
00:01:30.690 So, how can we approach our software differently? One of the first things you may notice in software development is the huge read/write disparity: typically, applications perform many more read operations than write operations. Now, let's envision how we can take advantage of this.
00:01:56.490 Imagine that you receive a new request on your API. You ask your software to schedule a meeting, and almost immediately you receive an identifier. Your request is validated, and you can tell your client to go away with the ID. There is no need to send back the full state, since the client already knows what it sent.
00:02:46.640 Later, when your client comes back and shows interest in the meeting associated with that identifier, you provide them the requested information. Essentially, we have separated our mutation requests from our query requests. This separation might seem a bit complicated, but it's highly beneficial.
00:03:06.040 By removing the extra read that used to follow every write, you can see how the flow of our application has changed. This approach comes from a concept called Command Query Separation (CQS), which Bertrand Meyer devised and explained in his book 30 years ago. He discussed this concept at the code level: a method should either change state or return data, but not both. Later, Greg Young suggested extending it to an entire system. That concept is called Command Query Responsibility Segregation (CQRS).
00:03:48.290 In essence, CQRS means separating your write side from your read side. If we take this a step further and separate them completely, we end up with two isolated systems: one for writes and one for reads. This lets you scale each side independently by deploying additional instances of that side.
00:04:17.590 With this, we have unlocked our first superpower: the ability to scale. It might not seem like a grand superpower, but it is quite significant. The crucial question we as software developers often forget to ask is: 'What is the current state?' For a long time, we have been overwriting records in typical CRUD applications without considering that every update throws away the previous data.
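At the code level, CQS says a method should either change state (a command) or return data (a query), never both. A minimal Ruby sketch of that rule; the `Calendar` class and its methods are hypothetical, chosen to match the meeting example in the talk:

```ruby
# Command Query Separation at the method level (hypothetical Calendar class).
class Calendar
  def initialize
    @meetings = []
  end

  # Command: changes state and deliberately returns nothing useful (nil).
  def schedule(title)
    @meetings << title
    nil
  end

  # Query: returns data and never changes state.
  def meetings
    @meetings.dup.freeze
  end
end

calendar = Calendar.new
calendar.schedule("Retro")
calendar.meetings # => ["Retro"]
```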
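A hedged sketch of the CQRS split, assuming a meeting-scheduling domain: the write side validates a command and returns only an identifier, while a separate read model answers queries. `MeetingCommandHandler` and `MeetingReadModel` are hypothetical names, and the two sides are wired with a direct method call for brevity; in the fully separated setup described above they would run as separate deployments connected by a message bus or event stream.

```ruby
require "securerandom"

# Write side: validates commands, records the outcome, and returns only an ID.
class MeetingCommandHandler
  def initialize(read_model)
    @read_model = read_model # stand-in for a message bus between the two sides
  end

  def schedule_meeting(title:, starts_at:)
    raise ArgumentError, "title must not be empty" if title.to_s.strip.empty?

    id = SecureRandom.uuid
    @read_model.apply(id: id, title: title, starts_at: starts_at)
    id # the client already knows what it sent; the ID is enough
  end
end

# Read side: a separate model that exists only to answer queries.
class MeetingReadModel
  def initialize
    @by_id = {}
  end

  def apply(id:, title:, starts_at:)
    @by_id[id] = { title: title, starts_at: starts_at }
  end

  def find(id)
    @by_id[id]
  end
end

read_model = MeetingReadModel.new
commands = MeetingCommandHandler.new(read_model)
id = commands.schedule_meeting(title: "Planning", starts_at: "10:00")
read_model.find(id) # returns the stored title and start time
```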
00:05:01.630 It’s important to realize that the current state is something we perceive; it does not truly exist in the real world as we might think. Moving on, the second concept I want to discuss is eventual consistency. You will encounter this as soon as you start building distributed systems.
00:05:36.290 Essentially, eventual consistency means that you can no longer rely on every read reflecting the latest write immediately and transactionally. This is acceptable more often than it sounds, because most of the time a client writing data doesn't need to check its status right away. Once it receives confirmation that the data has been accepted, it can go away.
00:06:10.220 For instance, in Germany we have a system called Elster for submitting taxes. You create an application and submit your details, then you receive a letter with your credentials days later. That is how it works, and the system is still consistent in the end. Many businesses operate in this manner, where nobody expects immediate results.
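A small illustration of eventual consistency, assuming an in-process array as a stand-in for a message queue: the write side confirms immediately, and reads only reflect the write once a projection has caught up. The class names and the `catch_up` method are hypothetical.

```ruby
# The write side only appends to a queue and returns at once; the read model
# changes when the projection drains that queue.
class AccountWriteSide
  def initialize(queue)
    @queue = queue
  end

  def deposit(account, amount)
    @queue << { account: account, amount: amount }
    :accepted # confirms the write was accepted, not that reads already see it
  end
end

class BalanceProjection
  def initialize(queue)
    @queue = queue
    @balances = Hash.new(0)
  end

  # Apply everything that has arrived so far.
  def catch_up
    until @queue.empty?
      event = @queue.shift
      @balances[event[:account]] += event[:amount]
    end
  end

  def balance(account)
    @balances[account]
  end
end

queue = []
writes = AccountWriteSide.new(queue)
reads = BalanceProjection.new(queue)

writes.deposit("alice", 100) # => :accepted
reads.balance("alice")       # => 0, the projection has not caught up yet
reads.catch_up
reads.balance("alice")       # => 100
```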
00:06:31.560 Now, having separated our system into multiple self-contained parts, we can do various interesting things. To illustrate, let’s consider the concept of event sourcing: rather than mutating the state, we can consider the current state as merely a projection of events that occurred in the past.
00:07:07.200 When you look at how databases work behind the scenes, they maintain something called a transaction log, or journal. If the database crashes, it can read the log and recover its state. If we look at state from this perspective, everything that has happened can be saved as a series of events, which can then be replayed to recreate what we regard as the current state.
00:07:41.340 For example, if for any reason we lose the current state, we can still reconstitute it from the past events. Depending on the number of events this can take time, but it is a workable solution.
00:08:30.410 Moreover, this reconstruction is flexible, as you can choose any medium to represent your data. Whether you're working with a relational database or an in-memory state, you can project your data however you see fit. This is the concept of event sourcing, as described by Greg Young.
00:09:03.790 With event sourcing, time travel becomes possible, and this is what my talk is fundamentally about. If you store your state as a series of events, you can see what your state looked like at a specific point in time by replaying the events leading up to that moment.
00:09:32.830 This capability can be invaluable for debugging: it allows you to reproduce the exact state of the system at the time a bug occurred and then replay subsequent events until you identify where things went wrong. Furthermore, you may want to change how your software operates based on its past behavior.
00:10:14.230 By analyzing past trends, you can improve your software architecture. Another superpower is scope recognition: the ability to serve future needs of your software as if you had foreseen them. Imagine a scenario where your manager asks you to send promotional mail to users who added a product to their cart this year but then removed it.
00:10:55.720 In a typical CRUD system, this might take months to implement, because the removals were never recorded. With event sourcing, you can build such features as if they had always been part of the original design, as long as they rely on data you have been collecting.
00:11:15.860 You can simply say, 'No problem, I can build a projection of all the users who performed those actions.' Now, let’s also consider total state reconstruction: the ability to throw away your projected data entirely, rebuild your state in any shape you like, and put it back into use.
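Treating current state as a projection of past events amounts to a left fold over the event log, which Ruby expresses with `Enumerable#reduce`. A sketch under assumed event names (`:meeting_scheduled` and so on); replaying the same log from an empty state is also how lost state would be reconstituted:

```ruby
# Current state as a projection (left fold) over past events.
Event = Struct.new(:type, :data, keyword_init: true)

EVENTS = [
  Event.new(type: :meeting_scheduled,   data: { id: 1, title: "Kick-off" }),
  Event.new(type: :meeting_rescheduled, data: { id: 1, at: "11:00" }),
  Event.new(type: :meeting_scheduled,   data: { id: 2, title: "Review" }),
  Event.new(type: :meeting_cancelled,   data: { id: 2 })
].freeze

# Pure function: given a state and one event, return the next state.
def apply(state, event)
  d = event.data
  case event.type
  when :meeting_scheduled   then state.merge(d[:id] => { title: d[:title] })
  when :meeting_rescheduled then state.merge(d[:id] => state.fetch(d[:id]).merge(at: d[:at]))
  when :meeting_cancelled   then state.reject { |id, _| id == d[:id] }
  else state
  end
end

# Replaying the whole log from an empty state rebuilds the current state,
# much like a database recovering from its transaction log.
def replay(events)
  events.reduce({}) { |state, event| apply(state, event) }
end

current = replay(EVENTS) # only meeting 1 remains, now at 11:00
```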
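Because the events are the source of truth, the same log can feed several projections, each shaped for its own medium. A sketch with two hypothetical projections, one producing rows keyed by ID (as you might store in a relational table) and one producing a single in-memory number:

```ruby
# One event log, several independent projections into different representations.
Event = Struct.new(:type, :meeting_id, :title, keyword_init: true)

LOG = [
  Event.new(type: :scheduled, meeting_id: 1, title: "Kick-off"),
  Event.new(type: :scheduled, meeting_id: 2, title: "Review"),
  Event.new(type: :cancelled, meeting_id: 1)
].freeze

# Projection 1: rows keyed by ID, the shape you might write to a relational table.
def table_projection(events)
  events.each_with_object({}) do |event, rows|
    case event.type
    when :scheduled then rows[event.meeting_id] = { title: event.title, status: "active" }
    when :cancelled then rows[event.meeting_id][:status] = "cancelled"
    end
  end
end

# Projection 2: a tiny in-memory statistic, e.g. for a dashboard.
def cancellation_count(events)
  events.count { |event| event.type == :cancelled }
end

table_projection(LOG)   # meeting 1 cancelled, meeting 2 active
cancellation_count(LOG) # => 1
```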
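Time travel then means replaying only the events recorded up to a chosen moment. A sketch assuming each event carries a `recorded_at` timestamp and that the log is ordered by it; the account events and amounts are made up for illustration:

```ruby
require "time"

# Each event records when it happened; the log is assumed to be in time order.
Event = Struct.new(:recorded_at, :type, :amount, keyword_init: true)

LOG = [
  Event.new(recorded_at: Time.parse("2018-01-10T09:00:00Z"), type: :deposited, amount: 100),
  Event.new(recorded_at: Time.parse("2018-02-15T09:00:00Z"), type: :withdrawn, amount: 30),
  Event.new(recorded_at: Time.parse("2018-03-20T09:00:00Z"), type: :deposited, amount: 50)
].freeze

# Replay only the events up to point_in_time to get the balance "as of" then.
def balance_as_of(events, point_in_time)
  events
    .take_while { |event| event.recorded_at <= point_in_time }
    .reduce(0) { |balance, event| event.type == :deposited ? balance + event.amount : balance - event.amount }
end

balance_as_of(LOG, Time.parse("2018-02-28T00:00:00Z")) # => 70
balance_as_of(LOG, Time.now)                            # => 120, the current state
```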
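For debugging, the same replay can run one event at a time and stop at the first event after which some invariant fails. A sketch with a hypothetical balance invariant (it must never go negative); `first_violation` is an illustrative helper, not a library API:

```ruby
# Replays events one by one and stops at the first one that breaks an invariant.
Event = Struct.new(:sequence, :type, :amount, keyword_init: true)

def apply_event(balance, event)
  event.type == :deposited ? balance + event.amount : balance - event.amount
end

# Returns [offending_event, state_after_it], or nil if the invariant always held.
def first_violation(events, initial_state = 0, &invariant)
  state = initial_state
  events.each do |event|
    state = apply_event(state, event)
    return [event, state] unless invariant.call(state)
  end
  nil
end

LOG = [
  Event.new(sequence: 1, type: :deposited, amount: 100),
  Event.new(sequence: 2, type: :withdrawn, amount: 80),
  Event.new(sequence: 3, type: :withdrawn, amount: 40),
  Event.new(sequence: 4, type: :deposited, amount: 50)
].freeze

event, state = first_violation(LOG) { |balance| balance >= 0 }
# event.sequence is 3 and state is -20: the third event overdrew the account
```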
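The promotional-mail request can be answered with a new projection over events that were already being recorded. A sketch assuming hypothetical `:item_added` and `:item_removed` cart events, each tagged with a `year` for simplicity (real events would more likely carry a full timestamp):

```ruby
require "set"

# Hypothetical cart events. Nothing was recorded with this promotion in mind,
# yet the question can be answered from the history that already exists.
Event = Struct.new(:type, :user_id, :product_id, :year, keyword_init: true)

HISTORY = [
  Event.new(type: :item_added,   user_id: "ana",  product_id: "p1", year: 2018),
  Event.new(type: :item_removed, user_id: "ana",  product_id: "p1", year: 2018),
  Event.new(type: :item_added,   user_id: "ben",  product_id: "p2", year: 2018),
  Event.new(type: :item_added,   user_id: "cleo", product_id: "p3", year: 2017),
  Event.new(type: :item_removed, user_id: "cleo", product_id: "p3", year: 2017)
].freeze

# Users who, within the given year, added a product and later removed that same product.
def added_then_removed(events, year:)
  added = Set.new
  events.each_with_object(Set.new) do |event, users|
    next unless event.year == year

    key = [event.user_id, event.product_id]
    case event.type
    when :item_added   then added << key
    when :item_removed then users << event.user_id if added.include?(key)
    end
  end
end

added_then_removed(HISTORY, year: 2018).to_a # => ["ana"]
```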
00:11:39.080 Additionally, there’s a superpower called enhanced compliance. Working in a regulated industry like banking means being under constant scrutiny from regulators. They often want to know how things are accomplished, and with an event-sourcing system, you can maintain an immutable record of all actions performed.
00:12:11.070 This transparency appeals to regulators because they can investigate records as needed and verify that everything works appropriately. Other benefits include the ability to debug from an exact state and to test without altering or deleting records, which is significantly easier than with traditional CRUD methods.
00:12:43.960 Real-time, event-by-event backups of your system are also feasible, since you can create a dedicated projector that receives every event and stores it separately.
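For compliance, the key property is that the event log is append-only. A minimal in-memory sketch that freezes each event on write and offers no update or delete operation; `AppendOnlyLog` is a hypothetical class, and a production store would need to enforce the same rule at the storage level:

```ruby
# An append-only event log: every event is frozen when written, and the class
# offers no update or delete, so history can be audited but not rewritten.
class AppendOnlyLog
  Event = Struct.new(:sequence, :type, :data, keyword_init: true)

  def initialize
    @events = []
  end

  def append(type, data)
    # Shallow freeze: objects nested inside data are not frozen by this.
    event = Event.new(sequence: @events.size + 1, type: type, data: data.dup.freeze)
    @events << event.freeze
    event
  end

  # Read-only view for auditors: a frozen copy of the full history.
  def events
    @events.dup.freeze
  end
end

log = AppendOnlyLog.new
log.append(:account_opened, { owner: "ana" })
log.append(:address_changed, { city: "Berlin" })
log.events.map(&:type) # => [:account_opened, :address_changed]
```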
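Such a backup projector can be very small: it subscribes to the event stream and writes each event out as it arrives. A sketch assuming events are plain hashes and using JSON Lines as the backup format; a `StringIO` stands in for a file or remote storage, and `BackupProjector` is a hypothetical name:

```ruby
require "json"
require "stringio"

# Writes every event it receives as one JSON line, giving an event-by-event backup.
class BackupProjector
  def initialize(io)
    @io = io
  end

  def call(event)
    @io.puts(JSON.generate(event))
    @io.flush # push each event out immediately rather than batching
  end
end

backup = StringIO.new # stand-in for a file or remote storage
projector = BackupProjector.new(backup)
projector.call({ "sequence" => 1, "type" => "meeting_scheduled" })
projector.call({ "sequence" => 2, "type" => "meeting_cancelled" })
backup.string.lines.size # => 2
```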
00:13:13.020 A couple of years ago, I learned about event sourcing, and it struck me as an incredible concept. It seems simple, yet its implications are vast. I wondered why it wasn't taught more widely. This realization prompted me to share my insights on it, hoping to inform others.
00:13:49.810 How many of you have heard about these concepts before? It's surprising that only about half of the room is familiar with them, and even fewer have actually implemented them. There are, of course, trade-offs and challenges that come with this approach, particularly the mindset shift required of software engineers who have been conditioned for decades to think differently.
00:14:27.080 Another challenge lies in hiring or training engineers familiar with these concepts. Training takes time, and hiring experienced people can be quite difficult. It is also crucial to recognize that while eventual consistency works well within your own system, integrating legacy or third-party systems can cause problems if those systems are not idempotent.
00:15:09.550 In practice, using event projectors and reactors, the components that consume events and turn them into current state or further actions, is recommended. These should be designed to be idempotent. When storing your events, use aggregate-scoped sequence numbers to guard against race conditions when multiple writers interact with the same state.
00:15:49.079 When building aggregates and entities on the command side, always populate them from events, and express every state change as a new event.
00:16:33.790 Do not rely on projections indiscriminately; they are eventually consistent and may lag behind the latest events. In greenfield projects, if your client asks for the resulting state immediately, it is advisable to return only the identifier instead. Implementing a domain transfer object can help deduplicate events in distributed systems.
00:17:09.350 Legacy systems can be integrated using the Saga pattern, which manages the state of long-running processes across systems. I realize this part may be a bit confusing, especially without my notes, but I hope these insights have been helpful.
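The two recommendations can be sketched together: an event store that rejects an append when the writer's expected per-aggregate version is stale (optimistic concurrency), and a projector that skips events it has already applied. Both classes are hypothetical, in-memory and single-threaded; a real store would need the version check and the append to happen atomically, for example via a unique index on aggregate ID and sequence number.

```ruby
class ConcurrencyError < StandardError; end

# Each aggregate has its own stream and its own sequence numbers.
class EventStore
  Event = Struct.new(:aggregate_id, :sequence, :type, keyword_init: true)

  def initialize
    @streams = Hash.new { |hash, id| hash[id] = [] }
  end

  # The writer passes the version it last saw; if another writer appended in
  # the meantime, the versions no longer match and the append is rejected.
  def append(aggregate_id, expected_version, type)
    stream = @streams[aggregate_id]
    if stream.size != expected_version
      raise ConcurrencyError, "expected version #{expected_version}, found #{stream.size}"
    end

    event = Event.new(aggregate_id: aggregate_id, sequence: stream.size + 1, type: type)
    stream << event
    event
  end
end

# Idempotent projector: remembers the highest sequence applied per aggregate and
# skips anything at or below it, so redelivered events are not counted twice.
# Assumes events of one aggregate arrive in sequence order.
class ScheduledMeetingsCounter
  attr_reader :count

  def initialize
    @count = 0
    @last_applied = Hash.new(0)
  end

  def call(event)
    return if event.sequence <= @last_applied[event.aggregate_id]

    @last_applied[event.aggregate_id] = event.sequence
    @count += 1 if event.type == :meeting_scheduled
  end
end

store = EventStore.new
scheduled = store.append("meeting-1", 0, :meeting_scheduled)
moved = store.append("meeting-1", 1, :meeting_rescheduled)
counter = ScheduledMeetingsCounter.new
[scheduled, scheduled, moved, moved].each { |event| counter.call(event) }
counter.count # => 1
```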
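On the command side, an aggregate can be rebuilt from its own events, check a command against that state, and record the result as a new event. A sketch with a hypothetical `Meeting` aggregate; the point being illustrated is that state changes only inside `apply`, driven by events:

```ruby
class InvalidCommand < StandardError; end

class Meeting
  Event = Struct.new(:type, :data, keyword_init: true)

  attr_reader :status, :uncommitted_events

  # Rehydrate: start from a blank aggregate and apply past events in order.
  def self.from_events(events)
    meeting = new
    events.each { |event| meeting.send(:apply, event) }
    meeting
  end

  def initialize
    @status = :new
    @uncommitted_events = [] # new events waiting to be appended to the store
  end

  def schedule(title)
    raise InvalidCommand, "already scheduled" unless status == :new

    record(Event.new(type: :scheduled, data: { title: title }))
  end

  def cancel
    raise InvalidCommand, "only scheduled meetings can be cancelled" unless status == :scheduled

    record(Event.new(type: :cancelled, data: {}))
  end

  private

  def record(event)
    apply(event)
    @uncommitted_events << event
  end

  # The only place where state changes, driven purely by events.
  def apply(event)
    case event.type
    when :scheduled then @status = :scheduled
    when :cancelled then @status = :cancelled
    end
  end
end

history = Meeting.new.tap { |m| m.schedule("Kick-off") }.uncommitted_events
meeting = Meeting.from_events(history) # rehydrated with status :scheduled
meeting.cancel
meeting.uncommitted_events.map(&:type) # => [:cancelled]
```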
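The talk does not say which saga variant it has in mind, so the following is a swapped-in illustration of one common form: an orchestrator runs steps in order and, if a step such as a call to a legacy system fails, runs the compensating actions of the completed steps in reverse. Step names are hypothetical; a production saga would typically also persist its progress so it can resume after a crash.

```ruby
# Each step pairs an action with a compensating action that undoes it.
Step = Struct.new(:name, :action, :compensation)

def run_saga(steps, log)
  completed = []
  steps.each do |step|
    step.action.call
    log << "done: #{step.name}"
    completed << step
  end
  :completed
rescue StandardError => e
  log << "failed: #{e.message}"
  # Undo the steps that already succeeded, most recent first.
  completed.reverse_each do |step|
    step.compensation.call
    log << "compensated: #{step.name}"
  end
  :compensated
end

log = []
steps = [
  Step.new("reserve funds", -> {}, -> {}),
  Step.new("book in legacy system", -> { raise "legacy system timed out" }, -> {})
]
run_saga(steps, log) # => :compensated
```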
00:17:46.330 Thank you very much, and as I mentioned, I work for SolarisBank, where we are hiring. Feel free to come talk to me if you want to learn more about event sourcing or my experiences. I would be happy to share further.