Summarized using AI

The Saga Pattern

Robert Pankowecki • March 12, 2016 • Wrocław, Poland

In the video titled "The Saga Pattern," Robert Pankowecki discusses how an event-driven architecture evolved toward the saga pattern, reflecting on his experiences over a two-year journey. He begins by describing how his organization started publishing events in a project for its biggest client, which has since accumulated 13 million events in the database. Pankowecki emphasizes the importance of moving beyond plain request parameters to a richer model built on explicit commands and events.

Key points discussed in the video include:

- Understanding the Message-Driven Architecture:
  • The shift from plain command parameters to a message-driven architecture that distinguishes between commands (what a user wants to do) and events (what actually happened in the system).

- Example of the Ticket Booking Process:
  • A ticket order with optional postal delivery illustrates a flow of several events, including the decision whether to ask for postal details before or after payment.

- Introduction of Handlers for Event Management:
  • Once events are published, handlers react to them, for example by indexing newly registered users in Elasticsearch or by sending emails in response to events such as refunds and cancellations.

- The Role of Sagas in Managing Dependencies:
  • Examples such as conference creation and ticket ordering show commands that depend on earlier events, and how sagas coordinate workflows across bounded contexts.

- Challenges with Synchronous Handlers:
  • A failing synchronous handler can roll back the whole transaction; it is better to record the failure and let the rest of the flow continue.

- Naming Conventions and Complexity Reduction:
  • Events need clear, specific names, and service objects stay manageable when side effects and their dependencies are extracted into handlers.

- Asynchronous Logic and Serialization:
  • Asynchronous handlers must run only after the database transaction commits, and events need a serialization format that copes with value objects.

- Structured Event Processing:
  • Logging events and processing them sequentially keeps each saga's state consistent and auditable.

In conclusion, Pankowecki reflects on how the journey of implementing the saga pattern transformed their architecture and highlighted the need for proper structure, naming, and error handling. He shares a personal anecdote about the value of knowledge dissemination through books versus software, concluding by encouraging the audience to support their work through book purchases.

Main Takeaways:

- Transitioning to a message-driven architecture and embracing the saga pattern offers robust management of complex workflows.

- Proper naming and error management are essential for effective event handling in asynchronous systems.

- Sharing knowledge through books can bring in far more support than donations for open-source software.

wroclove.rb 2016

00:00:14.890 For our biggest client and biggest project, we started publishing our first events two years ago. Right now, we have 13 million of them.
00:00:22.330 Frankly, I had no idea two years ago what we were getting into. It wasn't my idea to start publishing those events. I wasn't convinced; I didn't know how much it was going to change our code base or whether it would be a good or bad thing.
00:00:34.120 I had no idea about any of this at all. I'm still not an expert, but at least I now have 13 million events in my database, so I know something. Just like GitHub, except they have a bad pipeline, and I have a good one.
00:00:58.000 The first step, which I mentioned yesterday, is looking at where we are two years later. In those two years, we had quite a journey. It’s not that once you start publishing events, you suddenly gain the ability to implement the saga pattern that I will discuss today.
00:01:28.060 It's probably not going to happen immediately, but maybe sometime later you will achieve the ability to understand and reap the benefits of it. So, it's not only about the saga pattern; it will also be about how you connect it to your architecture.
00:01:47.979 The assumption is that you have a message-driven architecture in your application, which is often referred to as event-driven architecture. However, I prefer the term "message-driven architecture" because it emphasizes that there are two kinds of messages: commands and events. These two types of messages are crucial.
00:02:20.470 Initially, when you start programming, commands are often just parameters. Sometimes, this is good enough, especially at the beginning. Over time, however, you will realize that parameters are not sufficient. You may start using form objects, and then you will discover that those don’t cut it either.
00:02:55.570 For example, you might have a large, complex form that appears to be posted with just one request to your system. But this is not just one command; it often corresponds to multiple commands—like five, ten, or even fifteen commands—because users are changing several things within our application.
00:03:22.630 Commands indicate what the user wants to do with the system. For example, as a user, I see that the price is acceptable and the seat is free, so I issue a booking command for that seat for the conference I want to attend. If everything goes as planned, and the seat is still available, our service or aggregate can accept that request and then publish an event confirming that the ticket has been booked.
00:03:41.110 Events are the occurrences that reflect what has actually happened. Generally, commands are issued by the users, who assess the state of our system and decide on actions they would like to take. However, what would happen if commands originated from the system itself? What if they came from an automated process rather than user input?
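The distinction between a command and an event can be sketched in a few lines of Ruby. This is only an illustration: `BookSeat`, `SeatBooked`, `SeatAlreadyTaken` and the `Conference` aggregate are hypothetical names, not code from the talk.

```ruby
# Hypothetical names for illustration; none of them come from the talk.
BookSeat   = Struct.new(:conference_id, :seat_label, :user_id) # command: what the user wants
SeatBooked = Struct.new(:conference_id, :seat_label, :user_id) # event: what actually happened

class SeatAlreadyTaken < StandardError; end

# A tiny aggregate that either accepts the command and publishes an event,
# or rejects the command.
class Conference
  attr_reader :published_events

  def initialize(id)
    @id = id
    @booked = {} # seat_label => user_id
    @published_events = []
  end

  def handle(command)
    raise SeatAlreadyTaken, command.seat_label if @booked.key?(command.seat_label)

    @booked[command.seat_label] = command.user_id
    event = SeatBooked.new(@id, command.seat_label, command.user_id)
    @published_events << event
    event
  end
end

conference = Conference.new(1)
conference.handle(BookSeat.new(1, "A12", 42))
```

The command can still be refused (a second `BookSeat` for seat "A12" raises `SeatAlreadyTaken`), while the event is a record of something that has already happened.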
00:04:04.870 This situation leads us to a discussion about sagas, which can be long and complex stories. They often involve ups and downs, much like a tale that Vikings would narrate to their children.
00:04:16.900 In our system for booking tickets for conferences, we offer additional services for those who don't have a mobile phone or a printer. You can choose the option to have your tickets printed and delivered to you by post. The process begins when a user places an order and adds this postal service. Once the service is requested, the user pays for it.
00:05:02.290 After payment, the user fills in postal details, which triggers an event indicating that the address has been filled out. As the user completes these details—potentially quickly if they have an autocomplete feature—a background job generates a PDF containing the tickets for them.
00:05:50.120 Here’s an interesting point: we are unsure about the order in which these actions will take place. We debated whether to collect the postal address before or after the payment. We were concerned that asking for the address before payment could reduce conversion rates, as users might hesitate to complete their orders once they see the requirement to fill out an address.
00:06:24.300 On the other hand, we worried that if we delayed asking for the address until after payment, users might be so pleased with their successful ticket purchase that they'd forget to fill out the additional form. In the end, we decided to ask for the address after payment.
00:06:52.150 In our case, while we are not entirely certain about the timing of the address input related to the PDF generation, we know that when the four specific events occur, we want a fifth action—the command to send the PDF via postal service—to take place.
00:07:19.530 So, when those four events happen for a given order, we need to trigger the fifth command. That's our saga! This has taken us two years to develop and was not present at the beginning.
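A saga of this shape ("when these four events have happened for an order, issue one command") could look roughly like the sketch below. The event and command names and the lambda used as a command bus are assumptions made for illustration, and the state is kept in memory, whereas a real saga would persist it.

```ruby
require "set"

# Hypothetical event and command names; the talk does not name them.
OrderPlacedWithPostalDelivery = Struct.new(:order_id)
OrderPaid                     = Struct.new(:order_id)
PostalAddressFilled           = Struct.new(:order_id)
TicketsPdfGenerated           = Struct.new(:order_id)
SendPdfByPost                 = Struct.new(:order_id) # the fifth step, a command

class PostalDeliverySaga
  REQUIRED = [OrderPlacedWithPostalDelivery, OrderPaid,
              PostalAddressFilled, TicketsPdfGenerated].freeze

  def initialize(command_bus)
    @command_bus = command_bus
    # Saga state: which event types have been seen for each order.
    @seen = Hash.new { |hash, order_id| hash[order_id] = Set.new }
    @completed = Set.new
  end

  # Called for every relevant event, in whatever order the events arrive.
  def call(event)
    return if @completed.include?(event.order_id)

    seen = @seen[event.order_id] << event.class
    return unless REQUIRED.all? { |klass| seen.include?(klass) }

    @completed << event.order_id
    @command_bus.call(SendPdfByPost.new(event.order_id))
  end
end

commands = []
saga = PostalDeliverySaga.new(->(command) { commands << command })

# The events deliberately arrive in a different order than the "happy path".
[TicketsPdfGenerated, OrderPlacedWithPostalDelivery, PostalAddressFilled, OrderPaid].each do |event_class|
  saga.call(event_class.new(7))
end
```

Because the saga only records which event types it has seen, it does not matter whether the address or the PDF arrives first.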
00:07:44.020 So, how did we reach this point? The first step, as I mentioned previously, was to start publishing events. We have a service that allows for user registration, and once a user is created, we publish an event stating that the user registered via email.
00:08:05.160 When we publish this event in our event store, it gets saved in the database, and if there are any handlers, those handlers are triggered. This initial step is simple and involves minimal attributes; it’s a small event.
00:08:26.250 However, we may also publish much larger events that involve numerous attributes and more complex cases with various value objects.
00:08:52.720 Sometimes, events will be quite simple, while other times they will be elaborate. Both cases are valid, and you will need both types of events in your system.
00:09:10.630 That brings us to the next phase: introducing handlers. Once you introduce handlers, you will see they effectively manage the events you publish. For instance, a simple handler can add the newly registered user to an Elasticsearch database so that customer support can find that user by searching relevant attributes.
00:09:36.900 This handler is straightforward because it only needs the event and its data; it doesn’t require any queries—it simply takes data from the event and stores it somewhere else, essentially creating a rich model in Elasticsearch.
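The publish-then-handle flow described here can be sketched with a toy in-memory event store. `EventStore`, `IndexUserForSupport` and the event attributes are hypothetical, and a plain Hash stands in for the Elasticsearch index; the real system stores events in the database.

```ruby
# A toy in-memory event store; a real one would persist events in the database.
class EventStore
  attr_reader :events

  def initialize
    @events = []
    @handlers = Hash.new { |hash, event_class| hash[event_class] = [] }
  end

  def subscribe(handler, event_class)
    @handlers[event_class] << handler
  end

  def publish(event)
    @events << event                                            # 1. save the event
    @handlers[event.class].each { |handler| handler.call(event) } # 2. trigger its handlers
  end
end

# Hypothetical event with only a few attributes.
UserRegisteredByEmail = Struct.new(:user_id, :email, :name)

# Stand-in for the Elasticsearch handler: it needs no queries,
# it only copies data from the event into another store.
class IndexUserForSupport
  attr_reader :index

  def initialize
    @index = {}
  end

  def call(event)
    @index[event.user_id] = { email: event.email, name: event.name }
  end
end

store = EventStore.new
support_index = IndexUserForSupport.new
store.subscribe(support_index, UserRegisteredByEmail)
store.publish(UserRegisteredByEmail.new(1, "jane@example.com", "Jane"))
```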
00:10:02.540 However, not all handlers are that simple. Some handlers need to send emails, which requires looking up the corresponding email address in the database. For instance, when a football club approaches us, we import the people who hold season passes, which allow them entry to every event at the club’s home stadium.
00:10:34.320 Once we have imported them, we want to welcome them to our system and give them a way to set up a password. To do that, the handler needs to find out which email address to write to.
00:11:04.960 Eventually, you may find that you can utilize a single handler class to manage multiple types of related events. For example, you may have two events that sound quite similar: one event is a ticket being refunded, and the other event represents an admin canceling a ticket.
00:11:29.430 Although both events are similar, they're treated differently depending on the bounded context in our application. The scanning process responsible for admitting access needs to be aware of both events and ensure that access is revoked in either case.
00:12:02.220 When a ticket is canceled by an admin, it results in denying the ticket holder access. However, when a ticket is refunded, additional implications arise—for instance, how the financial records must be updated due to the refunded money.
00:12:31.370 Although the refund and cancellation events share some similarities, the workflow following each one is very different because of the financial operations involved. This shows how a single handler can manage several events while still executing separate workflows based on their context.
00:13:12.840 The key point is this: once you introduce a handler that processes multiple events and needs some state to make decisions, you have arrived at a saga. Here’s another example: imagine a conference creation event. When a conference is created, or cloned from last year’s setup to streamline the process, we store specific details in the database.
00:13:53.120 For example, we might save information about who created the conference and the tenant associated with it, while also keeping track of the event ID. Then, as the first ticket is created for that specific conference, we issue a command to check the banking settings, ensuring that the organizer has filled out their payout details.
00:14:33.700 If they haven't filled in those details, we’ll email them, prompting them to do so to earn money through our system. This necessity for information is drawn directly from the data recorded as we processed earlier events.
00:15:07.940 Having this data is essential for executing subsequent commands; it creates a dependency chain that links events together. A common example from the payments domain is an order that a user places to buy tickets for a conference.
00:15:54.730 Typically, in the ticketing industry, there is a limited time frame to pay for your order—perhaps 20 minutes. After that, other users may want to purchase those tickets, so those places can't remain reserved indefinitely.
00:16:36.050 If the order expires, what happens? Suppose the user submitted their payment around the time the order expired: we have authorized the amount on their credit card, but we haven't charged it yet. Because the money is only authorized, we can issue a command to release the payment, returning the funds to the user and canceling the order.
00:17:16.360 Essentially, the saga orchestrates the flow between multiple bounded contexts in this event-driven architecture, bridging systems that otherwise act independently of each other.
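One possible reading of this example, sketched in Ruby: a saga remembers per order whether it has expired and whether a payment has been authorized, and issues a release command once both are true. All names here are hypothetical, and the exact rule is an interpretation of the transcript rather than the speaker's code.

```ruby
# Hypothetical events from two bounded contexts (orders and payments).
OrderPlaced       = Struct.new(:order_id)
OrderExpired      = Struct.new(:order_id)
PaymentAuthorized = Struct.new(:order_id, :payment_id)
# Hypothetical command sent to the payments context.
ReleasePayment    = Struct.new(:payment_id)

class ExpiredOrderPaymentSaga
  State = Struct.new(:expired, :payment_id, :released)

  def initialize(command_bus)
    @command_bus = command_bus
    @states = Hash.new { |hash, order_id| hash[order_id] = State.new(false, nil, false) }
  end

  def call(event)
    state = @states[event.order_id]
    case event
    when OrderExpired      then state.expired = true
    when PaymentAuthorized then state.payment_id = event.payment_id
    end

    # Assumed rule: an authorized payment for an expired order is released once.
    return unless state.expired && state.payment_id && !state.released

    state.released = true
    @command_bus.call(ReleasePayment.new(state.payment_id))
  end
end

commands = []
saga = ExpiredOrderPaymentSaga.new(->(command) { commands << command })

# Order 1 expires and a payment is authorized for it: the payment is released.
[OrderPlaced.new(1), OrderExpired.new(1), PaymentAuthorized.new(1, "pay_1")].each { |e| saga.call(e) }
# Order 2 is paid in time: nothing to release.
[OrderPlaced.new(2), PaymentAuthorized.new(2, "pay_2")].each { |e| saga.call(e) }
```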
00:17:53.740 Over two years, we have worked to transform our approach to sagas amid prolonged deliberations and analyses. To gain even more insight into establishing robust saga methodologies, it's critical to consider what areas to be mindful of and what potential pitfalls to avoid.
00:18:31.390 This includes the use of synchronous handlers, which are executed immediately when an event is published. Their API, which resembles the Resque API, usually consists of one method that instantiates the handler and calls it when the event is triggered.
00:19:04.400 While it is crucial to initialize the handler correctly for testing purposes, caution is needed concerning synchronous handlers. For example, if a user registers and the first handler attempts to add the user to Elasticsearch but fails due to an external issue, the entire transaction may roll back.
00:19:38.310 This scenario implies that the user never officially registered, which is problematic. With multiple bounded contexts operating independently, the failure of one handler shouldn't affect the entire flow.
00:20:20.330 If a handler fails, it’s better to record a failure and rectify that particular handler separately.
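A minimal sketch of that idea, assuming a hypothetical `ResilientDispatcher` that wraps each handler call, records failures instead of letting them propagate, and can retry them later. The talk does not show the actual mechanism.

```ruby
# Hypothetical dispatcher: one failing handler must not break the others
# or the code that published the event.
class ResilientDispatcher
  Failure = Struct.new(:handler, :event, :error)

  attr_reader :failures

  def initialize(handlers)
    @handlers = handlers
    @failures = []
  end

  def dispatch(event)
    @handlers.each { |handler| run(handler, event) }
  end

  # Re-run only the handlers that failed, e.g. after the root cause is fixed.
  def retry_failures
    pending = @failures
    @failures = []
    pending.each { |failure| run(failure.handler, failure.event) }
  end

  private

  def run(handler, event)
    handler.call(event)
  rescue StandardError => error
    @failures << Failure.new(handler, event, error) # record the failure, keep going
  end
end

search_available = false
indexed = []
emailed = []

index_user = lambda do |event|
  raise "search is down" unless search_available
  indexed << event
end
send_welcome = ->(event) { emailed << event }

dispatcher = ResilientDispatcher.new([index_user, send_welcome])
dispatcher.dispatch(:user_registered)
failures_after_dispatch = dispatcher.failures.size

search_available = true # the external issue is fixed
dispatcher.retry_failures
```

After the first `dispatch`, the welcome email has been sent even though indexing failed; after `retry_failures`, the user is indexed as well.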
00:20:52.180 When publishing events, it's also essential to ensure that they are named appropriately to reflect their purpose clearly. Instead of generic names like 'user created' or 'user updated', it would be advisable to use more explicit naming conventions, such as 'user registered by email' or 'user joined from Facebook.'
00:21:30.990 By naming events meaningfully, you greatly enhance clarity and structure within your architecture, thereby solidifying your understanding of the flow in your system.
00:22:12.210 Once you adopt this intentionality in naming, the big wins become evident as you harness the power of events and handlers more effectively.
00:22:45.260 You may realize that large service objects come with several dependencies—often more than necessary. By extracting these dependencies from the service object, you place the logic of side effects into the handlers.
00:23:18.230 Service objects can focus on just determining whether an action should proceed, while the handlers account for the various outcomes. This shift invariably reduces the complexity and size of service objects.
00:23:55.940 However, moving forward tends to mean embracing asynchronous logic. We can serialize events and handlers using mechanisms that mimic Active Record.
00:24:34.200 A published event gets stored in the event store and triggers the registered event handlers, either synchronously or asynchronously, using tools like Resque. If a handler is asynchronous, it might try to execute before the database transaction is committed.
00:25:11.270 To prevent issues arising from this, we schedule the asynchronous jobs only after the database transaction commits.
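The talk does not show how this scheduling was implemented. As a rough, framework-free sketch, a hypothetical `Transaction` object can collect callbacks and run them only when the block finishes without raising, so a job is never enqueued for data that was rolled back.

```ruby
# Hypothetical stand-in for a database transaction with after-commit callbacks.
class Transaction
  def initialize
    @after_commit = []
  end

  # Register work that must wait until the transaction has committed.
  def after_commit(&block)
    @after_commit << block
  end

  # Runs the block; callbacks fire only if it finishes without raising.
  def self.run
    transaction = new
    result = yield transaction
    transaction.send(:commit!)
    result
  end

  private

  def commit!
    @after_commit.each(&:call)
  end
end

queue = [] # stand-in for the background job queue

Transaction.run do |tx|
  # ... save the user and the event here ...
  tx.after_commit { queue << [:index_user, 42] }
end

begin
  Transaction.run do |tx|
    tx.after_commit { queue << [:index_user, 43] }
    raise "validation failed" # simulated rollback: the job is never enqueued
  end
rescue RuntimeError
  # the failed transaction leaves the queue untouched
end
```

In a Rails application, Active Record's own `after_commit` callbacks serve a similar purpose.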
00:25:37.060 We've encountered challenges with serialization, notably when using YAML. This choice was made because it allows us to serialize value objects easily. For instance, a seat within our system could be represented as a value object with attributes such as label and category.
00:25:59.630 YAML can handle many data types and provides a mechanism for our application to continue functioning smoothly, in spite of serialization complexities.
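A small round-trip shows why YAML is convenient for value objects. `Seat` and `SeatBooked` are hypothetical Structs; `YAML.safe_load` with `permitted_classes:` is used here as one way to load custom classes explicitly, which is an assumption about setup rather than what the speaker's code does.

```ruby
require "yaml"

# Hypothetical value object and event, modeled as Structs.
Seat       = Struct.new(:label, :category)
SeatBooked = Struct.new(:order_id, :seat)

event = SeatBooked.new(7, Seat.new("A12", "vip"))

# YAML tags the nested value object with its class, so it can be revived later.
serialized = YAML.dump(event)

# Loading custom classes has to be allowed explicitly; Symbol is permitted
# because Psych symbolizes Struct member names while reviving them.
restored = YAML.safe_load(serialized, permitted_classes: [SeatBooked, Seat, Symbol])
```

The price of this convenience is that stored events reference class names, so renaming or removing a value object class affects events that are already in the database.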
00:26:29.800 In the scenario of asynchronous handlers, any errors encountered are directed to separate Resque tables, enabling easy reprocessing and troubleshooting. Once the root cause has been resolved, the failed handler can be reprocessed.
00:27:02.170 This offers a strategy to enhance the robustness of our architecture during event-triggered operations.
00:27:35.770 Moreover, when it comes to processing events that trigger sagas, they can occur at any moment. However, to facilitate an orderly resolution of events and decisions, a systematic approach must be in place.
00:28:08.050 By employing logging mechanisms integrated with a sequencing logic, we can ensure events are processed sequentially. This locks the database records during operations, thus preventing simultaneous conflicting access.
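The effect of logging plus locking can be illustrated with threads. In this sketch a `Mutex` per saga stands in for the database row lock, and `SequentialSagaProcessor` is a hypothetical name; it only demonstrates that events for one saga are appended and evaluated one at a time.

```ruby
# Hypothetical processor: a Mutex per saga stands in for a database row lock.
class SequentialSagaProcessor
  SagaRecord = Struct.new(:lock, :log)

  def initialize
    @records = {}
    @registry_lock = Mutex.new
  end

  # Appends the event to the saga's log and yields the log while the lock is held,
  # so decisions are always made on a consistent view of the saga's history.
  def process(saga_id, event_name)
    record = record_for(saga_id)
    record.lock.synchronize do
      record.log << event_name
      yield record.log if block_given?
    end
  end

  def log_for(saga_id)
    record_for(saga_id).log.dup
  end

  private

  def record_for(saga_id)
    @registry_lock.synchronize { @records[saga_id] ||= SagaRecord.new(Mutex.new, []) }
  end
end

processor = SequentialSagaProcessor.new
commands = []
events = %i[order_placed order_paid address_filled pdf_generated]

# Four events for the same saga arrive concurrently.
threads = events.map do |event_name|
  Thread.new do
    processor.process(7, event_name) do |log|
      commands << [:send_pdf_by_post, 7] if (events - log).empty?
    end
  end
end
threads.each(&:join)
```

Whichever thread appends the last missing event sees the complete log and issues the command, so the command is issued exactly once, and the log remains available for auditing.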
00:28:39.670 This structured processing allows us to manage sagas effectively, ensuring that events are not only logged but also fully documented throughout the saga's lifecycle.
00:29:08.380 In conclusion, as the saga progresses, we want to highlight specific milestones while maintaining awareness of the state of our operations. This way, we can continuously audit and adapt our processes based on the progressive changes that happen in our application.
00:29:45.230 Thank you for your attention today. Let me close with a personal story: in my entire life, I've made $15 in revenue from open-source software.
00:30:21.480 This came from one single donation, and it turns out that the appreciation for my work led to that donation being made. However, in my company, we've generated approximately $50,000 from books we've written.
00:30:53.940 This experience goes to show that people are more willing to invest in knowledge than in software, mirroring a broader trend across the industry. Writing books opens many doors while sharing knowledge tends to create lasting relationships in the community.
00:31:31.970 So, if you enjoy what you’ve seen and would like to support us, I invite you to purchase one of our books. It is a great way to help us continue our work.
00:32:09.880 Thank you very much!