00:00:04.580
There is one fun fact about Łukasz. Who can guess what programming language he used two years ago?
00:00:11.340
Yes, Łukasz was a .NET developer two years ago. I'm really happy that I helped him transition to Ruby.
00:00:19.440
Today, Łukasz will talk about a real-world project where Rails Event Store was introduced, or more generally, where event-driven architecture was adopted. I hope some of you will implement something similar in your projects soon.
00:00:32.520
So, let's welcome Łukasz. Good luck!
00:00:57.719
Thanks, André. Yes, I admit, two years ago I was addicted to .NET, and I’m really grateful that he helped me transition away from it. I'm honored to be the first speaker at this conference.
00:01:08.939
As you already know, my name is Łukasz. I work at a company called Arkency on a daily basis, helping rescue legacy projects. In this talk, I will share six stories about how we work with Rails Event Store in Trezy, our cash flow management application.
00:01:26.340
Let's begin with an overview of our cash flow management application. This application assists small businesses in managing their cash flow. Users can view their bank account balances in one place, and we also handle some accounting tasks for them, so they don’t have to wait for their accountants to do it once or twice a year, which is the case for many.
00:01:39.000
The Rails Event Store is an open-source event store implementation for Ruby on Rails, established in 2014. It acts as a library for publishing, consuming, storing, and retrieving events. In this talk, I will focus on event sourcing in conjunction with event-driven architecture, domain-driven design patterns, and testing, which will play an important role in this presentation.
00:02:03.840
Is anyone here familiar with this concept? Please raise your hand. Great! Some of you are, while others may not be, but let’s move on to the first story, which is about our integration with an open banking provider.
00:02:30.239
As I mentioned, we are a cash flow management application, so we need access to your bank account data. However, we don’t create the integrations ourselves; we rely on third-party providers, known as open banking services, to send us the data. This enables us to show you the balances of all your accounts and transactions.
00:02:55.319
Based on this information, we can generate useful reports for the small businesses using Trezy.
00:03:01.080
The banking providers integrate with us through webhooks. They essentially hit our API and send us data in the payload. There are at least two approaches to process a webhook. The first is a synchronous approach: when we receive a webhook from the banking provider, we can pass the payload through the application's layers until it lands in the database.
00:03:14.819
While this method works, it raises a question: what if something fails during this journey to the database? For instance, there are many issues that could arise in the application layer, such as violating an index, changes in the database schema, or processing logic errors.
00:03:39.000
To address this, we could store the payload as a technical event, which leads us to the first relation to event sourcing. Instead of saving it in a traditional database, we store it in the event store. Once we do that, we can publish it to a queue and process it asynchronously.
00:03:50.940
This is what the process looks like: when the banking provider sends us a webhook, it first hits our REST API. We then store and publish a technical event. Afterward, a queue takes the event, and an event handler processes it, fetching the payload and storing it in the database along with any necessary calculations.
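The store-then-publish flow can be sketched in plain Ruby. This is a minimal in-memory stand-in, not the actual Rails Event Store API; the class and event names are illustrative, and in production the handler runs behind a background queue rather than inline.

```ruby
require "json"
require "securerandom"

# Minimal in-memory stand-in for the event store (illustrative names,
# not the real Rails Event Store API).
class TechnicalEventStore
  Event = Struct.new(:id, :type, :data, keyword_init: true)

  def initialize
    @events = []
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(type, &handler)
    @subscribers[type] << handler
  end

  # Store first, then publish: even if every handler fails,
  # the payload survives and can be replayed later.
  def publish(type, data)
    event = Event.new(id: SecureRandom.uuid, type: type, data: data)
    @events << event
    @subscribers[type].each { |handler| handler.call(event) }
    event
  end

  def events_of_type(type)
    @events.select { |event| event.type == type }
  end
end

store = TechnicalEventStore.new
balances = {}

# The processing handler; in production it sits behind a queue.
store.subscribe("webhook.balance_updated") do |event|
  balances[event.data["account_id"]] = event.data["balance"]
end

# What the webhook endpoint does: store + publish, then respond quickly.
payload = JSON.parse('{"account_id": "acc-1", "balance": 100}')
store.publish("webhook.balance_updated", payload)
```

The key design point is the ordering: the payload is appended to the store before any handler runs, so a handler failure never loses data.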
00:04:22.259
This approach offers several benefits. Firstly, we gain an external system audit log—we have the payload information stored as technical events that we can access whenever necessary. This is incredibly useful for debugging and troubleshooting.
00:04:39.300
Additionally, the performance improves; responses to third-party systems are much faster since we don’t process data immediately. We only store the payload, so there is minimal database interaction. Once stored, we can publish the event, which takes only milliseconds.
00:04:57.780
Later, we can process it asynchronously depending on the availability of processing units. A notable advantage is that we can scale our web server that receives the webhooks independently of the web jobs.
00:05:17.280
It's likely we will have more web jobs processing the webhooks than web server instances, and we don't have to rely on the third-party retry mechanism. Once we have stored the payload, if something goes wrong, we can always access it.
00:05:28.500
This means if there’s a bug, I can fix my code and retry as many times as needed without depending on whether the third-party provider will send me a retry payload, which might not happen for a while.
00:05:41.340
Keep this in mind; I will refer to it in the next stories. Now, let’s return to the open banking provider. We decided to move the processing of webhooks into an async process.
00:06:00.000
As with most things, a question arises: what could possibly go wrong here? Nothing is without drawbacks. While there are many benefits, one problem is that the webhooks may come out of order.
00:06:23.119
This ordering is critical. Imagine we receive a webhook with a bank account balance of 100, and then we publish it as a technical event, but the processing fails. Next, we receive another webhook for the same account with a different balance. If we successfully process the second one and then go back to the first, we might overwrite the data, which we want to avoid.
00:06:39.000
To solve this issue, it's also a good moment to introduce the context of the project we are working on. Some of you might think this is a green field, but it’s actually not. The code exists within a startup context, and this picture conveys the challenge quite effectively.
00:06:53.539
Sometimes, I feel like when I want to change one small thing, I have to be careful not to break something else, resulting in unforeseen consequences.
00:07:06.500
So, what does this mean in reality? For most concepts in the system, there is one relational data model containing the entire structure. We call this logical coupling. I won't mislead you; this is common.
00:07:21.360
While it served its purpose and earned revenue, extending that model at this point would be impractical. We began seeking a new model and, after several iterations, arrived at a refined design.
00:07:40.560
In this new model, the white boxes represent the pieces of information needed, such as those for external providers, timestamps, statuses, and, of course, IDs. There are three primary operations: opening a bank account, closing one, and performing a specific operation.
00:08:02.460
The yellow boxes depict the rules. We utilized event sourcing notation here. The first rule states that an operation can only be performed if the bank account is not closed. The second rule prevents processing webhooks with timestamps older than the current one, which protects against data inconsistency.
00:08:19.980
This is how it appears in Ruby code. Some of you who attended the workshop might recognize this. The class is quite straightforward; it includes the AggregateRoot gem, which provides the methods for applying and handling events.
00:08:34.620
This class checks whether operations can be performed while also managing the event and its publication.
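A dependency-free sketch of those two rules might look like this. The real class uses the AggregateRoot gem and publishes events; here the invariants are shown as plain Ruby with illustrative names so they stand out.

```ruby
# Simplified sketch of the bank-account aggregate's two rules
# (illustrative; the production code uses the AggregateRoot gem
# and applies events instead of mutating instance variables).
class BankAccount
  class AccountClosed < StandardError; end
  class StaleWebhook < StandardError; end

  def initialize
    @closed = false
    @last_processed_at = nil
  end

  def close(at:)
    @closed = true
    @last_processed_at = at
  end

  # Rule 1: no operations on a closed account.
  # Rule 2: reject webhooks older than the last one we processed,
  # which protects against out-of-order delivery.
  def perform_operation(at:)
    raise AccountClosed if @closed
    raise StaleWebhook if @last_processed_at && at < @last_processed_at

    @last_processed_at = at
  end
end
```

Rejecting stale timestamps is what keeps an earlier, retried webhook from overwriting a newer balance.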
00:08:46.500
However, I must confess that we had to be pragmatic here; we just didn’t have time to rewrite half the system. Our existing approach required writing to the bank account model, which forms the top layer, where we instantiate the class and execute the operation.
00:09:08.760
This means that changes to the read model occur in conjunction with the write model. It’s often not easy to adopt a perfect system in a real project.
00:09:24.580
The underlying assumption is that the stream always tells the truth. Multiple reasons may alter the state of a bank account; for instance, with a soft-delete mechanism in ActiveRecord, a bank account record can be deleted for various reasons.
00:09:39.000
However, when we check the stream and see that the bank account is closed, we know not to act on it. Next, we have a story about closing bank accounts and missing events.
00:09:57.919
You may already guess that we introduced a problem when implementing the aggregate tasked with managing operations.
00:10:13.200
When we receive a webhook and publish it as a technical event, we begin processing it, but it fails. The error indicates that we can’t save the record, which surprised me since there was nothing wrong with the code.
00:10:29.640
I reviewed the stream, and everything seemed fine; it indicated the bank account was not closed. However, upon examining the code snippet that generated the output, I noted it failed during the save operation.
00:10:46.080
This issue was only encountered in specific bank accounts, presenting a real challenge to debug and reproduce accurately.
00:11:02.640
To address this, I utilized my advanced debugging techniques and replayed the event since it was stored in my event store.
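Replaying is conceptually simple once the payload lives in the event store: read the stored events back and run the handler against them again. Here is a rough, in-memory illustration with assumed names; with Rails Event Store you would read the events from a stream via its client instead of a local array.

```ruby
# In-memory stand-in for stored technical events (illustrative names;
# in the real app these are read back from the event store).
Event = Struct.new(:type, :data, keyword_init: true)

stored_events = [
  Event.new(type: "webhook.received",
            data: { "account_id" => "acc-1", "balance" => 100 })
]

processed_balances = []
handler = ->(event) { processed_balances << event.data["balance"] }

# Replay: fix the handler first, then feed it the stored events again.
# No need to wait for the provider to resend the webhook.
stored_events.each { |event| handler.call(event) }
```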
00:11:16.500
As it turned out, the read model had been soft-deleted for some reason, but I noticed we were missing the event indicating the bank account closure.
00:11:32.520
Considering our policy stating the stream as the source of truth, this missing event should not have happened. It became clear we should have included it in the bank account aggregate to maintain a reliable event stream.
00:11:48.720
Previously, a new feature had been implemented to ignore all incoming webhooks for specific data, which posed substantial risk.
00:12:04.600
The developer neglected to incorporate the necessary changes into the aggregate, bypassing the single source of truth for this part of the logic.
00:12:22.520
While some may view this as poor design by introducing multiple verifiable sources, it was a conscious decision aimed at retaining a reliable stream.
00:12:39.000
To resolve this issue, the first part was straightforward. I wrote a test to confirm the fix. Once the problem was identified, I implemented the correction, which made it pass.
00:12:57.880
But the challenge didn't end there; I now had to backfill the historical data, a critical aspect of maintaining an effective event sourcing strategy.
00:13:16.920
This step is crucial when managing legacy systems that continue to evolve rapidly.
00:13:31.200
This dynamic environment prevents us from adhering strictly to every architectural principle. We emphasize adapting solutions to the specific context of the current business needs.
00:13:46.200
Now, let me share another story, about pending transactions; another hurdle that arose with event sourcing that I had to tackle.
00:14:02.520
Events in an async system that depends on external data from providers can be complex. Let me illustrate what pending transactions entail.
00:14:19.560
One of our banking providers notified us they could send transactions not yet fully processed by the bank. Customers could view these pending transactions early, but when they sent a new batch, we needed to clear the previous ones.
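The replace-the-batch semantics can be reduced to a tiny sketch: each new pending batch clears the previous one for that account. The names here are illustrative; the real handler writes through the persistence layer rather than a hash.

```ruby
# Illustrative sketch: a new pending batch replaces the previous
# pending transactions for the same account.
def apply_pending_batch(pending_store, account_id, new_batch)
  # Clearing first, then inserting, is what the provider's
  # contract implies: only the latest batch is pending.
  pending_store[account_id] = new_batch
end

pending = {}
apply_pending_batch(pending, "acc-1", [{ amount: 50 }, { amount: -20 }])
apply_pending_batch(pending, "acc-1", [{ amount: 75 }])
```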
00:14:35.040
After a few weeks with this feature in production, we received complaints from customers stating that they couldn’t see new transactions, and the bank account balance seemed incorrect.
00:14:53.780
This situation occurred multiple times a week, and at first, I brushed it off, thinking customers didn’t understand our role as a bridge, not a bank.
00:15:12.480
After some investigation, I pulled the recent events and discovered a delay. We realized 700 pending transactions were missing, and the resulting balance calculation was incorrect.
00:15:29.700
With newfound determination, I checked the event logs provided by the open banking provider and observed that they had indeed sent us the relevant data, which we had recorded as technical events.
00:15:44.520
However, I ran into the same save record error once more. After initial panic, I noted the payload looked healthy.
00:16:03.000
Employing my debugging repertoire once more, I replayed the event and traced the issue back to the soft-delete configuration in the ActiveRecord setup, which unexpectedly affected the pending transactions.
00:16:25.680
Here’s how I resolved it. First, I wrote a test confirming my understanding of the bug, then I fixed the issue and deployed the correction.
00:16:43.720
Moreover, unlike traditional debugging methodologies, I could replay the event without waiting for another webhook; I didn't need to wait for the bank provider to resend it.
00:17:01.120
This significantly boosted customer satisfaction as they did not experience prolonged delays.
00:17:16.560
Next, I want to cover a scenario involving transactions and how to categorize them accurately.
00:17:35.040
As I mentioned, the classification of transactions is essential for reporting and managing cash flow effectively. Customers can manually select a category for their transactions.
00:17:52.720
In Poland, for instance, expenses such as car rentals fall under specific travel-expense categories. In many cases, transactions are classified automatically by our system to save time and avoid the need for manual input.
00:18:05.520
This entire system was built as a monolith initially, and over time, the model became too large with distributed business logic causing an amalgamation of complexities.
00:18:24.000
The consequence was a lack of cohesion as business rules got scattered everywhere, especially within various event handlers.
00:18:41.760
When we realized transaction classification was crucial, we knew this part of the system needed attention, as the classifications appeared random to customers.
00:18:57.680
After deliberating with the team, we scoped a project to rewrite the classification mechanism so it would be stable and predictable.
00:19:13.600
The rewrite introduced an automatic classification feature. To ensure reliability, I started with exploratory tests to establish a point of control.
00:19:30.240
Crucially, all of this hinged on solid integration testing due to the unpredictable nature of event-driven architectures.
00:19:54.960
We developed a classification algorithm that learns from existing data. For instance, if a user manually classifies one transaction, the system can suggest the same classification for future transactions.
00:20:14.240
The code encapsulates the business rules carefully, and every logical path is covered by checks, reflecting our emphasis on quality.
00:20:30.000
We utilized a short, simple Ruby object that does not depend on an ActiveRecord setup, simplifying unit testing.
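As a rough illustration of such a plain Ruby object, here is a hypothetical classifier that remembers the user's most recent manual category per counterparty and suggests it for future transactions. The names and the matching rule are assumptions for the sketch, not our production logic; the point is that it needs no ActiveRecord and unit-tests in isolation.

```ruby
# Hypothetical classifier: suggest the category the user last chose
# for the same counterparty. Plain Ruby, no database dependency.
class TransactionClassifier
  def initialize(classified_transactions)
    # Keep the most recent manual category per counterparty
    # (later entries in the history overwrite earlier ones).
    @by_counterparty = classified_transactions.each_with_object({}) do |tx, memo|
      memo[tx.fetch(:counterparty)] = tx.fetch(:category)
    end
  end

  def suggest(counterparty:)
    @by_counterparty[counterparty]
  end
end

history = [
  { counterparty: "ACME Fuel", category: "travel_expenses" },
  { counterparty: "Coffee Co", category: "office_costs" }
]
classifier = TransactionClassifier.new(history)
```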
00:20:44.400
Following development, we aimed for complete mutation coverage, ensuring our confidence in the transition to production.
00:21:02.000
Eventually, we backfilled the historical data. We also decided to introduce a read model into the classification process, separating the read and write models.
00:21:19.720
By developing an event handler, we can maintain a simplified read structure while adapting flexibly to future requirements.
00:21:36.640
After deploying these changes, we instantly received notifications of errors related to incorrect transaction classifications, prompting us to revise our implementation.
00:21:52.640
To clarify the hierarchy in our decision-making process, we established a clear classification pipeline evaluating rule-based classifications first, followed by account classifications, ensuring that manual overrides remained within context.
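That pipeline ordering can be sketched as a simple fall-through, with assumed rule and category names: a manual override wins, then rule-based matching, then the account-level default.

```ruby
# Illustrative classification pipeline (names are assumptions):
# manual override -> rule-based match -> account-level default.
def classify(transaction, rules:, account_default:)
  # A manual override always wins.
  return transaction[:manual_category] if transaction[:manual_category]

  # Then rule-based classification on the description...
  rule = rules.find { |r| transaction[:description].include?(r[:match]) }
  return rule[:category] if rule

  # ...falling back to the account-level default.
  account_default
end

rules = [{ match: "FUEL", category: "travel_expenses" }]
```

Encoding the precedence in one place made it easy to see, and to test, which source of classification wins in each case.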
00:22:08.600
This iterative feedback helped us improve our system, allowing us to refine processes regularly.
00:22:23.520
Throughout this process, we maintained real-time visibility, adjusting how transactions moved through the pipeline to protect against categorization errors.
00:22:38.800
This feedback continues informing future refinements, demonstrating the iterative value of our evolving architecture and classification algorithm.
00:22:54.400
Once we launched the new classification method, the operational efficiency improved significantly, demonstrating the simplicity behind a well-structured event-driven architecture.
00:23:09.240
Ultimately, we adapted this system to accommodate user feedback dynamically, also tracking responses to past decisions.
00:23:27.200
The alignment of user input with historical data created a robust foundation for future data-driven decisions.
00:23:44.080
With tracking mechanisms in place, we anticipated even smoother transitions for users returning to our application.
00:24:02.880
With event streams available throughout the system, we realized that whatever is mishandled or missed is always preserved in the stored data.
00:24:19.280
To encapsulate this process in user-focused terms, we've defined specific feedback categories to identify business analysis opportunities easily.
00:24:36.040
Ultimately, a streamlined flow facilitates a thorough understanding of past actions concerning customer retention, creating numerous opportunities for improvements.
00:24:54.080
Because the full event history is retained, we are not limited to reacting in the moment; the business can always iterate on classifications based on the relevant events.
00:25:13.760
This system induces flexibility when responding to varying types of user interactions while remaining grounded in analysis. Flexibility is a significant advantage in our rapidly changing landscape.
00:25:32.740
Ultimately, the power of event sourcing becomes evident through improved decision-making facilitated by thorough analysis over historical events.
00:25:50.080
At the end of this journey, embracing an event-driven architecture requires acknowledging the inherent complexities while cherishing its advantages.
00:26:04.560
My message to you all is: don't let misconceptions about event sourcing intimidate you. Understand its role within the system and approach it step by step, recognizing that it can integrate into legacy systems just as well as new builds.
00:26:23.720
When adapting legacy projects, break down the problem, evaluate your business needs, and determine how much of the architecture should be applied and when it should take place.
00:26:38.560
I believe this is essential for guiding practical implementations. Working iteratively adds flexibility while keeping proactive measures in sight.
00:26:55.920
Ultimately, event sourcing and domain-driven design are not magic solutions to every problem, but serve specific purposes in unique contexts.
00:27:15.560
None of these patterns are silver bullets and may not resolve every concern you face; there are definitely traps to watch for.