wroc_love.rb 2014

From ActiveRecord to Events

This video was recorded on http://wrocloverb.com. You should follow us at https://twitter.com/wrocloverb. See you next year!

Slides: http://www.slideshare.net/emadb/wroclove-rb

Emanuele Delbono with FROM ACTIVERECORD TO EVENTS
We are used to writing our Rails applications with ActiveRecord and storing the current state of our entities in the database. This kind of storage is not as lossless as we might think: we completely miss the story that brought the entities to their current state. That's why new architectures are becoming popular; these architectures don't store the state of the models but their deltas. During this session we will take a look at event-driven architectures, what problems they solve, and how they can be implemented in Ruby and Rails applications.

00:00:13.839 Good afternoon, everybody! Thank you for joining. Let's start by asking how many of you are familiar with these books.
00:00:21.320 Oh nice, it seems most of you are! This book was published in 2002, and when I was a young developer, it contained many interesting patterns for enterprise applications.
00:00:30.080 One of the most interesting patterns inside this book is the Domain Model pattern.
00:00:37.040 This pattern teaches us that we should model our applications using objects that contain both behavior and data.
00:00:42.280 Until then, my applications were filled with anemic entities—objects that contained only data but no behavior.
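A minimal sketch of the difference between an anemic entity and a domain model object. The `Invoice` class, its methods, and its rules are hypothetical illustrations, not code from the talk.

```ruby
# Anemic entity: data only; every rule lives in the calling code.
AnemicInvoice = Struct.new(:items, :paid)

# Domain model: the same data plus the behavior that guards it.
class Invoice
  def initialize
    @items = []
    @paid = false
  end

  def add_item(description, price)
    raise ArgumentError, "price must be positive" unless price.positive?
    raise "cannot change a paid invoice" if @paid
    @items << { description: description, price: price }
  end

  def total
    @items.sum { |item| item[:price] }
  end

  def pay!
    raise "cannot pay an empty invoice" if @items.empty?
    @paid = true
  end

  def paid?
    @paid
  end
end
```

With the anemic version, nothing stops a caller from appending items to a paid invoice; with the domain model, the object itself refuses.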
00:00:49.280 After reading this book, I started to model an application, trying to apply the patterns I understood from it.
00:01:03.239 I began a new large application where I tried to define my domain models: a Customer can have an Invoice, which in turn can have items. A customer also has an Address, which is tied to the City, Price, Contract, and so on.
00:01:14.200 After a few iterations, I had created a very large domain model where all the objects were interconnected. It seemed awesome to me because I could start from the City and navigate through the Address to the Customer and retrieve all the contracts signed in a particular city.
00:01:32.479 I felt like I could do a lot of things with this extensive graph of interconnected objects.
00:01:39.200 However, the first problems arose when I tried to persist this graph to a database. At that time, the application was written in C# and I had to use an ORM like Hibernate to persist these object models to the database, which was quite painful. The problem was that this domain model was a monolith, with all the application's concerns bundled together in the same piece of code, making it difficult to separate them out into different concerns.
00:02:16.680 Around 2003 or 2004, a pivotal book was published. Who here is familiar with it? It’s a difficult book to understand; I had to read it three times to fully grasp all the concepts it contains. If you read it after the book on enterprise application architecture, you will see how it explains where I went wrong in my past designs. This book clarifies how the Domain Model pattern should be correctly applied in real-world applications.
00:02:56.400 During this talk, I want to show you two different approaches for our Rails applications. My name is Emanuele Delbono, and I come from Italy. I work as a software developer at a small company called CodicePlastico. Although I’m not a full-time Ruby developer, I enjoy developing in Ruby for fun in the evenings.
00:03:35.360 Let’s kick things off with the Lasagna architecture—does everyone here know about this architecture pattern? Being Italian, I know how to cook lasagna very well!
00:03:46.760 A Rails application can be likened to a lasagna; it has the view on top. The View communicates with the Controller, which ultimately interacts with the Model to persist data to the database. What's wrong with this arrangement? Actually, nothing—if the application is relatively simple, this architecture works just fine. However, if the application grows larger, and the model portion expands with numerous model classes, persisting data to the database becomes complex. This is why we use an ORM—in the case of Rails, that ORM is ActiveRecord.
00:04:31.800 Martin Fowler describes the Active Record pattern as an object that wraps a row in a database table. The concept is simple; you have one row in a database table corresponding to one object in your application. But if the model changes, the database changes, leading to tight coupling between the database structure and the domain model. In many cases, we have one class tied to one table, which means that our applications often reflect our database structure in memory.
00:05:14.840 This coupling creates problems when it comes to queries, as we face issues such as the N+1 problem during lazy loading. Moreover, we violate the Single Responsibility Principle because our models mix concerns related to both data persistence and domain logic. Therefore, our models fulfill two roles. In our Rails applications, we tend to think only in terms of the database; when we scaffold or design our models, our focus is almost exclusively database-centric.
00:06:37.160 Another problematic aspect is that we use the same model for both GET and POST operations—reading and writing data. For instance, during a GET operation, we might fetch all products from the database to return them to the view, but during creation, we employ the same model to create a new product in the database. As one prominent architect mentioned, a single model cannot adequately serve purposes for reporting, searching, and transactional behavior, yet that's exactly what we attempt to do.
00:07:38.880 This leads to different constraints between reads and writes. Reads are generally simpler; we just need to query the database and present the data to the view—sometimes performing minor formatting or internationalization. In contrast, writes are more complex as they require validation, authorization, and business logic to persist data.
00:08:13.799 The perspective of our users is also essential; they don't care about the underlying database mechanics. When a user clicks a button, they are interested in activating a new contract or performing a business operation. Such actions might involve sending emails, creating documents, or initiating workflows. However, developers—especially in Rails applications—tend to remain entrenched in INSERTs, READs, and CRUD operations.
00:09:01.600 My first piece of advice is to stop thinking in CRUD. We need to shift our mindset away from inserts, reads, deletes, and updates in the database because we’re building applications for users that require business operations.
00:09:36.360 This brings me to the first architectural pattern I will present today: Command Query Responsibility Segregation (CQRS). The idea is to treat reading and writing as two distinct concerns, resulting in two separate stacks, one for reading and another for writing. One possible implementation looks like this: we have a presentation layer, such as our controllers, that sends commands to the green area.
00:10:11.040 Each command, which could be creating a contract or adding an item to a basket, is processed by a Handler.
00:10:18.200 The complex business logic is handled here, and the data is written to the appropriate write database. This database is then denormalized into a read database by a denormalizing layer that prepares the data to be read as efficiently as possible. The result is a very lightweight query service, designed solely for querying data in an optimal manner.
00:10:50.800 Here, we can employ one table for each view, where each view in our web application corresponds to a table. This means that reading operations become straightforward; we can use a simple SELECT * query or specify the required fields, with no joins or subqueries involved. Simplifying our queries makes reading data easier.
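The two stacks described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the hashes stand in for the write and read databases, and every class and field name (`AddToBasketHandler`, `BasketSummaryDenormalizer`, `BasketSummaryQuery`, `item_count`) is hypothetical.

```ruby
# Command: a plain message that names a business operation.
AddToBasket = Struct.new(:basket_id, :product, :quantity)

# Write side: validates the command and updates the write store.
class AddToBasketHandler
  def initialize(write_store, denormalizer)
    @write_store = write_store
    @denormalizer = denormalizer
  end

  def call(command)
    raise ArgumentError, "quantity must be positive" unless command.quantity.positive?
    lines = (@write_store[command.basket_id] ||= Hash.new(0))
    lines[command.product] += command.quantity
    @denormalizer.basket_changed(command.basket_id, lines)
  end
end

# Denormalizer: reshapes write-side data into one flat row per view.
class BasketSummaryDenormalizer
  def initialize(read_table)
    @read_table = read_table
  end

  def basket_changed(basket_id, lines)
    @read_table[basket_id] = {
      basket_id: basket_id,
      item_count: lines.values.sum,
      products: lines.keys.sort.join(", "),
    }
  end
end

# Read side: a thin query service with no joins and no business logic.
class BasketSummaryQuery
  def initialize(read_table)
    @read_table = read_table
  end

  def find(basket_id)
    @read_table[basket_id]
  end
end
```

The query service only ever looks up a row that was prepared in advance, which is the in-memory equivalent of a SELECT against a table built for one view.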
00:11:26.520 While this first step is conceptually straightforward, its implementation can be complex, particularly since we need to denormalize data and ensure that the two databases remain synchronized to avoid dirty reads.
00:12:09.320 Next, we often encounter a situation where data is stored in tables where each row represents the state of an object. For example, a basket might include two items, each represented in a table. The state of our object is defined by this table.
00:12:56.480 However, this presents a challenge because we only see the present state, missing the history that led to this status. It’s similar to receiving a bank statement that only contains totals with no details regarding how we got there—losing sight of all transactions.
00:13:24.320 Using this analogy, we might miss how our state changed yesterday, last week, and so forth. If we employ the same approach in our applications, we can shift our thinking towards events. Each change made to our system by a user can be treated as an event.
00:14:00.680 If we persist the events, we can reconstruct the state starting from them. For example, if a user adds an item, removes another, and adds yet another, we will know that their current basket contains those specified items, along with the historical actions of addition and deletion.
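As a rough sketch of that idea, the current basket can be computed by folding over the stored events. The event shape (`BasketEvent` with `type` and `product`) and the product names are assumptions made for the example.

```ruby
# Hypothetical event shape: what happened and to which product.
BasketEvent = Struct.new(:type, :product)

# Fold the event history, oldest first, into the current basket contents.
def rebuild_basket(events)
  events.each_with_object([]) do |event, items|
    case event.type
    when :item_added
      items << event.product
    when :item_removed
      index = items.index(event.product)
      items.delete_at(index) if index
    end
  end
end

history = [
  BasketEvent.new(:item_added, "book"),
  BasketEvent.new(:item_added, "pen"),
  BasketEvent.new(:item_removed, "book"),
  BasketEvent.new(:item_added, "lamp"),
]

current_items = rebuild_basket(history) # ["pen", "lamp"]

# The history still remembers what was removed, which the state alone does not.
removed = history.select { |event| event.type == :item_removed }.map(&:product) # ["book"]
```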
00:14:31.680 From a marketing perspective, being able to see what a customer adds to their basket and later removes can provide useful insights—helping us to strategize offers or discounts for items they show interest in.
00:15:02.680 This kind of architecture is known as Event Sourcing. As Martin Fowler noted, it captures all changes to an application state as a sequence of events. Instead of storing the actual current state of our objects, we record only the events.
00:15:41.720 This opens opportunities for various solutions, including the ability to travel back in time. We can reconstruct the state of our basket from yesterday, two weeks ago, or even a year back. Furthermore, we can simulate new events that may happen in the future, essentially projecting different scenarios.
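A small sketch of both ideas, time travel and what-if projection, assuming each event carries a timestamp and that the history is stored oldest first. `TimedEvent`, `basket_at`, and the dates are hypothetical.

```ruby
# Hypothetical event shape with the time at which it occurred.
TimedEvent = Struct.new(:type, :product, :occurred_at)

# Replay only the events that had already happened at `moment`.
# Assumes `events` is ordered oldest first.
def basket_at(events, moment)
  past = events.take_while { |event| event.occurred_at <= moment }
  past.each_with_object([]) do |event, items|
    case event.type
    when :item_added   then items << event.product
    when :item_removed then items.delete(event.product) # removes every matching entry
    end
  end
end

history = [
  TimedEvent.new(:item_added, "book", Time.utc(2014, 3, 1)),
  TimedEvent.new(:item_added, "pen", Time.utc(2014, 3, 10)),
  TimedEvent.new(:item_removed, "book", Time.utc(2014, 3, 14)),
]

# Time travel: the basket as it was on 12 March.
earlier = basket_at(history, Time.utc(2014, 3, 12)) # ["book", "pen"]

# What-if: append events that have not happened and replay the combined list.
what_if = [TimedEvent.new(:item_removed, "pen", Time.utc(2014, 3, 20))]
projected = basket_at(history + what_if, Time.utc(2014, 3, 31)) # []
```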
00:16:22.360 The general architecture of event sourcing looks somewhat like this. It resembles CQRS and is a bit more complex. In this implementation, the presentation layer sends commands to a Handler, utilizing a message bus (like RabbitMQ or Redis) to decouple the Rails application from the command handling operations. We can have multiple instances handling groups of events.
00:17:03.200 The Handler pulls information from the event store, reconstructs the current state by reapplying past events, and calls methods using the Domain Model pattern. Notably, every method call produces new events that reflect changes.
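The handler cycle described here (load events, rebuild the aggregate, call a method, collect new events) might look roughly like this. It is a sketch, not the speaker's code: the in-memory store, the `Basket` rules, and all method names are assumptions.

```ruby
# In-memory stand-in for the event store: aggregate id => ordered events.
class InMemoryEventStore
  def initialize
    @streams = Hash.new { |hash, key| hash[key] = [] }
  end

  def load(aggregate_id)
    @streams[aggregate_id].dup
  end

  def append(aggregate_id, events)
    @streams[aggregate_id].concat(events)
  end
end

# Aggregate: its state changes only by applying events.
class Basket
  attr_reader :uncommitted_events

  def initialize(history = [])
    @items = []
    @uncommitted_events = []
    history.each { |event| apply(event) }
  end

  # Business operation: checks a rule, then records what happened as an event.
  def add_item(product)
    raise ArgumentError, "#{product} is already in the basket" if @items.include?(product)
    event = { type: :item_added, product: product }
    apply(event)
    @uncommitted_events << event
  end

  private

  def apply(event)
    @items << event[:product] if event[:type] == :item_added
  end
end

# Handler: load history, rebuild the aggregate, run the operation, store new events.
class AddToBasketHandler
  def initialize(event_store)
    @event_store = event_store
  end

  def call(basket_id:, product:)
    basket = Basket.new(@event_store.load(basket_id))
    basket.add_item(product)
    @event_store.append(basket_id, basket.uncommitted_events)
    basket.uncommitted_events
  end
end
```

Note that `Basket` exposes no reader for its items: the duplicate check works only because the state was rebuilt from the stored events.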
00:17:49.840 These events are captured by a denormalizer, which prepares the data for reading, while the query service reads from the database to expose data to the user. We essentially have two stores: the event store and the database used for reads.
00:18:35.280 Benefits of this architecture include encapsulation. We can model our domain without having to worry about storage since we only store events. Our objects typically do not have accessor methods; they contain methods that perform operations based on state. As such, our Domain Model becomes fully encapsulated.
00:19:31.680 With this architecture, managing storage becomes simple. In the forthcoming examples, I will use MongoDB for storing events. The overall performance and scaling are enhanced because we can spin up more instances of our handlers to process more commands in a given timeframe.
00:20:38.480 Additionally, testing becomes more straightforward since the units we test are the Domain Model objects themselves. This allows us to easily verify that the expected events are raised correctly. We can collect a considerable amount of information in our event store for future use, even retaining data that is not actively utilized but may become relevant later.
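A sketch of what such a test can look like: past events go in, an operation is called, and the new events come out for comparison. The `Contract` aggregate and the `new_events_for` helper are hypothetical and use no test framework.

```ruby
# Hypothetical aggregate: a contract that can be activated only once.
class Contract
  attr_reader :uncommitted_events

  def initialize(history = [])
    @active = false
    @uncommitted_events = []
    history.each { |event| apply(event) }
  end

  def activate
    raise "contract is already active" if @active
    event = { type: :contract_activated }
    apply(event)
    @uncommitted_events << event
  end

  private

  def apply(event)
    @active = true if event[:type] == :contract_activated
  end
end

# Tiny given/when/then helper: past events in, new events out.
def new_events_for(given:)
  contract = Contract.new(given)
  yield contract
  contract.uncommitted_events
end

# Given a signed contract, when it is activated, then one event is raised.
raised = new_events_for(given: [{ type: :contract_signed }]) { |contract| contract.activate }
# raised == [{ type: :contract_activated }]
```

No database or message bus is involved, so the test exercises only the domain object.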
00:21:39.440 Integrating with other services also becomes easier, as we can use events for integration rather than relying solely on the database. For example, when needing to communicate with an invoicing system, we can create a consumer that listens for relevant events and executes necessary operations.
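Integration through events can be sketched with an in-process publish/subscribe object standing in for a real message bus. `EventBus`, `InvoicingConsumer`, and the `:order_placed` event with its fields are assumptions made for this example.

```ruby
# In-process stand-in for a message bus: event type => subscribers.
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_type, &handler)
    @subscribers[event_type] << handler
  end

  def publish(event)
    @subscribers[event[:type]].each { |handler| handler.call(event) }
  end
end

# Hypothetical consumer that belongs to the invoicing system.
class InvoicingConsumer
  attr_reader :invoices

  def initialize(bus)
    @invoices = []
    bus.subscribe(:order_placed) { |event| create_invoice(event) }
  end

  private

  def create_invoice(event)
    @invoices << { order_id: event[:order_id], amount: event[:total] }
  end
end
```

The publishing side never references the invoicing code; it only publishes events, and the consumer picks out the ones it cares about.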
00:22:06.400 While there are numerous advantages, some cons do exist. Establishing this full-fledged infrastructure can be quite complex, especially for simple applications consisting of a limited number of classes. The costs associated with setting up messaging systems, two distinct databases (one for querying and another for event storage), can outweigh the benefits in less complex applications.
00:23:06.640 Handling long-lived objects can also be tricky, as they need to be reconstructed from extensive event histories, which can become time-consuming and costly.
00:23:50.840 To mitigate this, we often take periodic snapshots that capture the state of long-lived objects. We can then rebuild the entire state tree starting from the snapshot, replaying only the events recorded after it, which keeps processing times manageable.
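A minimal sketch of snapshotting, assuming a snapshot stores the state together with the number of events it already includes. `Snapshot`, `rebuild`, and `take_snapshot` are hypothetical names.

```ruby
# Hypothetical snapshot: the state plus the number of events it already includes.
Snapshot = Struct.new(:items, :version)

def apply_event(items, event)
  case event[:type]
  when :item_added   then items + [event[:product]]
  when :item_removed then items - [event[:product]]
  else items
  end
end

# Rebuild from the snapshot, replaying only the events recorded after it.
def rebuild(events, snapshot = Snapshot.new([], 0))
  events.drop(snapshot.version).reduce(snapshot.items) do |items, event|
    apply_event(items, event)
  end
end

def take_snapshot(events)
  Snapshot.new(rebuild(events), events.size)
end

events = [
  { type: :item_added, product: "book" },
  { type: :item_added, product: "pen" },
  { type: :item_removed, product: "book" },
  { type: :item_added, product: "lamp" },
]

snapshot = take_snapshot(events.first(3))  # items ["pen"], version 3
from_snapshot = rebuild(events, snapshot)  # replays only the fourth event
from_scratch = rebuild(events)             # replays all four events
```

Both calls produce the same basket; the snapshot only shortens the replay.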
00:24:17.440 Lastly, the lack of a traditional database means that we need specialized tools to manage events. Unlike a conventional database where we can easily perform updates through SQL queries, we must build our own scripts and methods for dealing with events in the context of our application.
00:25:03.840 I have attempted to build this infrastructure into a sample Ruby on Rails application. It has been in production for nearly a year, and I aim to convert an existing C#-based application into this architecture using Ruby. I believe Ruby, being an expressive and compact language, allows for simpler development than static languages like C# or Java.
00:25:51.440 The mini application I created focuses on basket management, using Redis for communication, MongoDB for event storage, and SQLite as the read database, which is queried with plain SQL. The design follows Domain-Driven Design principles, using plain Ruby objects without infrastructure overhead.
00:26:36.880 As an example, when the controller receives a POST request to add an item to the basket, it simply sends a command of type AddToBasket with the correct parameters.
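The controller side of this step could be sketched as follows. This is not the application's actual code: the in-memory bus stands in for Redis, `BasketItemsEndpoint` stands in for a Rails controller, and the JSON message layout is an assumption.

```ruby
require "json"

# Command: a plain message naming the business operation and its parameters.
AddToBasket = Struct.new(:basket_id, :product_id, :quantity) do
  def to_message
    JSON.generate({ type: "AddToBasket", payload: to_h })
  end
end

# In-memory stand-in for the message bus (Redis in the talk's setup).
class InMemoryBus
  def initialize
    @messages = []
  end

  def publish(message)
    @messages.push(message)
  end

  def next_message
    @messages.shift
  end
end

# Hypothetical controller-like object: it builds the command and hands it off.
class BasketItemsEndpoint
  def initialize(bus)
    @bus = bus
  end

  def create(params)
    command = AddToBasket.new(
      params.fetch(:basket_id),
      params.fetch(:product_id),
      params.fetch(:quantity, 1)
    )
    @bus.publish(command.to_message)
    :accepted # the basket itself is updated later by the handler
  end
end
```

The endpoint contains no business logic and touches no database; it only translates the request into a command and returns.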
00:27:07.440 The Handler retrieves the current state of objects from the event store and applies the necessary changes. The command processing results in raising an event that informs our system of a change in the basket's state.
00:27:39.440 Once the basket processes the add item event, it collects uncommitted events before persisting them in the event store and notifying interested parties through the publishing of new events.
00:28:24.760 The event store is essentially a MongoDB collection containing details of all the events related to the basket's state changes, along with their corresponding metadata.
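To make the idea of events plus metadata concrete, here is a sketch of what one stored event document might contain. The array stands in for the MongoDB collection, and every field name (`event_id`, `aggregate_id`, `version`, `recorded_at`) is an assumption rather than the application's real schema.

```ruby
require "securerandom"

# In-memory stand-in for a collection of event documents.
class EventCollection
  def initialize
    @documents = []
  end

  # Wraps the event in hypothetical metadata fields before storing it.
  def append(aggregate_id, type, payload)
    document = {
      event_id: SecureRandom.uuid,
      aggregate_id: aggregate_id,
      version: events_for(aggregate_id).size + 1,
      type: type,
      payload: payload,
      recorded_at: Time.now.utc,
    }
    @documents << document
    document
  end

  # All events of one aggregate, in the order they were recorded.
  def events_for(aggregate_id)
    @documents
      .select { |document| document[:aggregate_id] == aggregate_id }
      .sort_by { |document| document[:version] }
  end
end
```

Documents are only ever appended; nothing in this sketch updates or deletes an existing event.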
00:29:06.720 The denormalizer listens for those events and updates the read database (SQLite) so that the data is readily available for display to the user.
00:29:55.480 The process we just discussed encompasses commands, event storage, handling commands, feeding into a denormalizer, and finally presenting those states back to users. It might seem lengthy, but much of this is infrastructure code, which simplifies the addition of features as the infrastructure stabilizes.
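The step that turns events into read-side rows might be sketched like this. A hash stands in for the SQLite table, and the row layout and class name are assumptions for illustration.

```ruby
# Keeps one read-side row per basket line, updated as events arrive.
# The hash stands in for a table in the read database.
class BasketViewDenormalizer
  def initialize
    @rows = {}
  end

  def handle(event)
    key = [event[:basket_id], event[:product]]
    case event[:type]
    when :item_added
      row = (@rows[key] ||= { basket_id: event[:basket_id], product: event[:product], quantity: 0 })
      row[:quantity] += 1
    when :item_removed
      row = @rows[key]
      return unless row
      row[:quantity] -= 1
      @rows.delete(key) if row[:quantity].zero?
    end
  end

  # What the query service would run: a plain lookup, no joins.
  def rows_for(basket_id)
    @rows.values.select { |row| row[:basket_id] == basket_id }
  end
end
```

Because the rows are derived entirely from events, the read table can be dropped and rebuilt at any time by replaying the event store through `handle`.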
00:30:42.080 Lastly, transitioning from thinking in CRUD to focusing on business operations is key. Users do not care about your database; we must understand the differences between reads and writes and separate the pertinent operations.
00:31:18.720 Domain models should be constructed with plain old Ruby objects, without the complexities associated with ActiveRecord or other base classes. CQRS and Event Sourcing prove valuable in complex enterprise applications, and Ruby offers expressive constructs that can enhance development efficiency.
00:32:12.640 You can find the code for the simple application I developed online. I welcome your feedback to improve the quality of the code. Feel free to approach me with questions or suggestions as we can explore the code together.
00:33:15.440 Thank you!
00:34:08.760 Questions? Hi! I find this architecture quite interesting, but I would like to know how you deal with atomicity and data consistency.
00:34:14.200 One thing I didn’t mention is that our system is eventually consistent. When a user clicks 'add to basket', it takes some time for the action to reflect accurately in the state of the basket.
00:34:27.040 The atomicity of operations is guaranteed by the Handler and the aggregate models, thus ensuring that actions are performed correctly.
00:35:00.080 If errors occur, we manage them by notifying users at the top of the interface that an operation failed.
00:35:31.040 The important part is that we never rewrite history; once an event has been written, it remains as part of the historical record of actions.
00:36:19.680 We can modify the event store when necessary, rerun all relevant events, and ultimately rebuild the state.
00:37:01.920 As we use a denormalized database design, data duplication is common. Still, it's essential to maintain the truth of our application's state through events.
00:37:45.120 I encourage open conversation—please feel free to ask about any other topics or clarifications.
00:38:19.800 Thank you once again!