RailsConf 2019

ActiveRecord, the Repository Pattern, and You

by Craig Buchek

In the talk titled "ActiveRecord, the Repository Pattern, and You," Craig Buchek discusses the complexities and limitations of ActiveRecord, the ORM (Object-Relational Mapping) layer used in Ruby on Rails. He highlights the dual functionality of ActiveRecord, which combines domain modeling and database access, advocating for a clearer separation to adhere to the Single Responsibility Principle (SRP).

Key points discussed include:
- Impedance Mismatch: Buchek explains how Ruby objects and SQL databases operate on different principles, leading to complications when attempting to map them efficiently.
- ActiveRecord's Size and Complexity: He notes that ActiveRecord encompasses about 40% of Rails and contains many instance and class methods, which can overwhelm developers, especially in larger projects.
- Separation of Concerns: The speaker argues that ActiveRecord's design conflates persistence logic with domain logic, making it harder to maintain and test.
- Alternatives to ActiveRecord: Buchek explores various alternatives like Sequel and ROM (Ruby Object Mapper) but finds them either too complex or incompatible with Rails operations.
- Repository Pattern: He explains the repository pattern as a more structured approach, emphasizing the separation of domain objects from database interactions, improving testability and maintainability.

Throughout his talk, Buchek shares insights from his experience and struggles with ActiveRecord, ultimately revealing his development of the "active record repository" gem, designed to facilitate a more modular use of ActiveRecord by preserving its features while decoupling the domain model from persistence concerns. His conclusion stresses that while ActiveRecord is powerful and prevalent, adopting a repository pattern can lead to cleaner, more manageable code, particularly as applications grow in size and complexity.

The session concludes with an invitation for attendees to contribute to his project, sharing links to the repository and his other works. The overall takeaway is that while ActiveRecord is widely adopted, there are practical architectural strategies developers can employ to improve their Rails applications, particularly through the use of repositories to maintain clarity and function separation.

00:00:20.840 All right, I had a minor tragedy: I lost my dongle for my clicker, so I'm a little bit late, but let's get started.
00:00:24.330 Welcome! Today, we're going to dig into ActiveRecord with a couple of detours.
00:00:30.509 In the lower right, there's a link to the slides if you want to follow along. If you press 'P', you can see some presenter notes.
00:00:37.500 I also have some links and notes that I won't cover in the talk. My Twitter handle is in the lower left corner, feel free to tweet at me or about me using the hashtag #RailsConf.
00:00:50.089 We'll focus on some major issues with ActiveRecord, look at some alternatives, discuss why you might not want to use those alternatives, and then talk about a potential solution and a pattern that I think we should all be using.
00:01:04.979 Who here uses ActiveRecord? Okay, most of us! Who hasn’t used ActiveRecord? I don’t see any hands for that.
00:01:19.289 Who’s used a different Ruby ORM? A couple of hands. How about a non-Ruby ORM in a different language? A couple more hands.
00:01:27.720 Who loves ActiveRecord? A few hands. Who hates ActiveRecord? Anyone else have a love-hate relationship like I do?
00:01:41.130 All right, actually, more with that.
00:01:47.550 ActiveRecord is the 800-pound gorilla, and odds are if you're going to work on Rails, if you get hired to work on Rails, you're going to be using ActiveRecord.
00:01:57.239 First, I want to make sure that everyone knows what an ORM is. Ruby deals with objects, obviously, and SQL databases deal with relations. It's actually something called relational algebra that they work with. Sounds pretty cool!
00:02:18.360 I'm not sure there's any mathematical foundation for NoSQL databases.
00:02:28.500 So, an ORM maps these two sides together; it maps between objects and relations. Note that there's an impedance mismatch between the two sides. What works well on one side might not work well on the other.
00:02:33.269 Some straightforward data structures can’t be mapped one-to-one. One canonical example of this is a tree structure, which is really easy to do in OOP, but there are several different ways to represent it in relational algebra, making it difficult to map between those two.
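To make the tree example concrete, here is a minimal plain-Ruby sketch (not taken from the slides) of one relational encoding, the adjacency list, where each row stores the id of its parent. The `Node` struct and the `to_rows` helper are hypothetical names chosen for illustration.

```ruby
# A tree is natural in OOP: each node simply holds its children.
Node = Struct.new(:name, :children)

tree = Node.new("root", [
  Node.new("a", [Node.new("a1", [])]),
  Node.new("b", [])
])

# One relational encoding (adjacency list): flatten the tree into rows,
# each carrying an id and the id of its parent (nil for the root).
def to_rows(node, parent_id = nil, rows = [])
  id = rows.size + 1
  rows << { id: id, parent_id: parent_id, name: node.name }
  node.children.each { |child| to_rows(child, id, rows) }
  rows
end

to_rows(tree).each { |row| p row }
```

Other encodings, such as nested sets or materialized paths, store the same tree differently and trade write cost for read cost, which is part of why the mapping is not one-to-one.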
00:02:46.290 I gave a talk at RubyConf 2015, where I went more in-depth on what an ORM is. I sort of found the essence by building one in 400 lines.
00:03:07.140 So, Rails' ActiveRecord is based on the ActiveRecord pattern. Here’s Martin Fowler's definition: I’m not sure if he's the first one to come up with it, but he documented it in the Patterns of Enterprise Application Architecture.
00:03:19.200 Note that he lists three separate things here: wrapping the data of a database table, encapsulating database access, and adding domain logic.
00:03:30.569 You could argue that wrapping and encapsulating are pretty much the same thing, but domain logic is clearly a separate concern.
00:03:38.280 Having that in there indicates that we might be violating the Single Responsibility Principle (SRP). So here’s a UML class diagram of the ActiveRecord pattern.
00:03:46.260 Note that there are two different kinds of things going on: 'find' and 'save' deal with persistent storage, while 'name', 'age', and 'address' deal with domain logic.
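The diagram itself is not part of this transcript. As a rough sketch of what it describes, assuming an in-memory hash as a stand-in for the database table, a class following the Active Record pattern might look like the following. The `Person` class, the `TABLE` constant, and the `adult?` method are hypothetical and only illustrate the two kinds of responsibility living in one class.

```ruby
# Hypothetical sketch of the Active Record pattern in plain Ruby.
# An in-memory hash stands in for the database table.
class Person
  TABLE = {} # id => attribute hash; a stand-in for a real table

  attr_accessor :id, :name, :age, :address

  def initialize(name:, age:, address:, id: nil)
    @id, @name, @age, @address = id, name, age, address
  end

  # Persistence concern: look a row up and wrap it in an object.
  def self.find(id)
    row = TABLE[id]
    row && new(id: id, **row)
  end

  # Persistence concern: the object writes itself to storage.
  def save
    self.id ||= TABLE.size + 1
    TABLE[id] = { name: name, age: age, address: address }
    true
  end

  # Domain concern: business logic lives in the same class.
  def adult?
    age >= 18
  end
end
```

Both `save` and `adult?` live on the same object, so a change to how rows are stored and a change to a business rule both land in this one class.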
00:03:49.619 The biggest problem I have with ActiveRecord is that it encourages bad engineering project habits, mostly because it violates the Single Responsibility Principle. It co-mingles persistence with domain logic. Separation of concerns is important, just like Rails separates the MVC (Model-View-Controller) into separate concerns.
00:04:09.980 We should probably be doing the same with the model itself. As your project gets bigger, ActiveRecord's flaws become more apparent.
00:04:24.570 I find that when I get to about 12 to 20 model classes, it starts to hurt, whereas if you’re below that, it probably doesn’t really matter to you.
00:04:47.550 ActiveRecord is big; it's about 40% of Rails, and that size is another symptom of the Single Responsibility Principle possibly being violated. It tries to do too much in one place and conflates multiple concerns.
00:05:14.180 I am showing some stats here from Rails 5.2.3. A model with one field came in with over 200 instance methods and 600 class methods. For comparison, the Object class has about 86 methods, while String and Array have about 250 methods.
00:05:40.280 So, the number of instance methods is pretty high. You're not going to remember many, if not most, of those methods. Granted, some of those are dynamic, but you know if it has a field, it's going to have certain related things.
00:06:01.360 The number of class methods is really concerning. Class methods have several issues, which I think I will cover later.
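How the numbers on the slide were gathered is not stated, so the following is only one plausible way to take such a measurement in plain Ruby. The `method_counts` helper is a hypothetical name; in a Rails console you could pass it a model class instead of the core classes used here.

```ruby
# Count the public and protected instance methods of a class, and the
# methods callable on the class object itself.
# Exact numbers vary by Ruby version and by what has been loaded.
def method_counts(klass)
  {
    instance_methods: klass.instance_methods.size,
    class_methods: klass.methods.size
  }
end

[Object, String, Array].each do |klass|
  puts "#{klass}: #{method_counts(klass)}"
end
```

Note that `klass.methods` includes everything inherited from `Class`, `Module`, and `Object`, so the class-level figure is never zero even for an empty class.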
00:06:03.890 All right, another frustrating thing I found about ActiveRecord is that relationships or associations are defined in the model, like your 'has_many' and 'belongs_to', but attributes are defined in the database schema.
00:06:06.999 I think that's a terrible abuse of the DRY (Don't Repeat Yourself) principle. DRY states there should be only one place to look for any piece of information. I feel like attributes and relations are similar kinds of things where you should look in one place for both.
00:06:16.110 Putting related things in different places seems really counter to the DRY guidance, so you have to look in two places for all the details about a model. This is a case where there’s too much magic for me.
00:06:38.120 There are some workarounds like model annotations, and there is an Atom package to show a toggle displaying the model's attributes from the schema, but unfortunately that's currently broken for me.
00:07:05.219 The attributes API actually came out in Rails 4.2, but it wasn't publicized until Rails 5. We don't have to use it, though, and hardly anyone does.
00:07:20.412 Does anyone here actually use annotations in their ActiveRecord models? Decent number. All right, that's good.
00:07:28.350 I also released a couple of gems, actually, to let you define attributes in ActiveRecord models before the attributes API was available. One was called Virtus. Has anyone seen this talk?
00:07:45.150 It's Uncle Bob's "Architecture: The Lost Years." A couple of you. Was anyone lucky enough to have been there?
00:08:01.349 Well, another person was there. I'm the person that asked him a question: can you show us some code? He said you have to figure it out yourself.
00:08:19.350 So, you can't hear me in the video. It's a seminal talk. Oddly enough, it was only given once at a regional conference. I’m not sure why. Maybe because he didn’t provide those details and I called him out. I don’t know, probably not. He doesn't usually seem affected by that.
00:08:50.970 Since that talk, and probably even earlier, I've struggled to find a way to implement all those architectural suggestions.
00:09:01.480 My last project used the Interactors gem to handle that part. It's actually on the chart there, in that diagram.
00:09:10.110 It separates the Rails controller from the business logic, according to this talk. The fact that our app is a web app is sort of incidental.
00:09:19.260 So, we should have the business logic separate from that incidental delivery mechanism.
00:09:23.700 Interactors gives us that, and there’s a pretty good Interactor gem that works well for that part.
00:09:35.380 But I've never found a great way to split entities from the database. In the previous slide, you can see entities on the right side and an entity gateway to the database.
00:09:59.470 This is the quest that I’ll be talking about today.
00:10:01.960 So, after almost ten years from that talk, Uncle Bob wrote a book on the topic called Clean Architecture. It's a pretty good book, but it really doesn't help me with this problem.
00:10:10.750 It doesn't get into the details. Uncle Bob also has a blog article called 'Clean Architecture' that provides a succinct explanation.
00:10:14.290 The first stop on my quest is Sequel. I will not pronounce SQL as Sequel because then I would get really confused.
00:10:30.490 This is the biggest surprise I found when I did research for an earlier related talk. This was written by Jeremy Evans, and I don’t see him here today.
00:10:59.440 Sequel has tons of plugins and leverages a lot of database features, especially for Postgres. It supports almost any SQL database you can think of.
00:11:16.560 I really like the documentation.
00:11:19.410 Sequel has two different APIs that you can use: the dataset API and the model API.
00:11:38.880 Here’s the code used to set up Sequel for the next couple of slides. Not a lot going on here.
00:11:42.160 Pretty much just creating a table since the Sequel syntax is really nice. Here’s the dataset; look at line four. Note that the block lets you use bare column names, which is pretty cool.
00:12:06.960 ActiveRecord does not let you do that, although there is a gem that adds that sort of feature. The problem is it doesn't stay synced up with ActiveRecord as well as I'd like, and I've run into a few other bugs and problems.
00:12:22.670 The dataset is enumerable, with each element as a hash-like object. You can see that being used in line five. I haven't come across anything that Sequel can't do, which is pretty cool.
00:12:44.800 It just doesn't fit the pattern that I'm looking for.
00:12:53.820 So here’s the higher-level API: the Sequel model, that's using objects instead of just a hash-like object. You’d probably be more likely to use this layer in Rails as we like to apply object-oriented programming.
00:13:08.560 Like ActiveRecord, attributes are derived from the database schema, and, also like ActiveRecord, relationships have to be specified manually. Again, that’s something that frustrates me.
00:13:24.680 I really like Sequel. I wish ActiveRecord was more like Sequel, actually, but Sequel doesn’t solve the problem I’m trying to address.
00:13:39.610 So the next stop on my journey is ROM: the Ruby Object Mapper. This was originally implemented as a Data Mapper. Does anyone remember the Data Mapper library? Some people, decent amount.
00:14:07.280 Originally, this was Data Mapper 2, and in 2013 they renamed it to ROM. In 2014 they moved away from object-relational mapping altogether, so it's not really technically an ORM.
00:14:20.060 It just maps the data and not the objects. Most of the work was done by Piotr Solnica. I'm not going to try to pronounce it in his language.
00:14:38.700 It's similar in spirit and partly inspired by Elixir's Ecto. Anyone use Ecto? All right, good number.
00:14:55.360 So you guys might find this a little more palatable than I do.
00:15:07.780 Piotr Solnica also formerly wrote Virtus, a really nice attribute declaration library. ROM is a bit complex to use; it has commands, relations, and mappers, and you have to buy into this completely different paradigm and mindset.
00:15:25.850 ROM's developers are responsible for the dry-rb libraries, which we'll actually talk about a little more. They're really good: small, independent, low-level, composable libraries.
00:15:41.650 Some of the leaders of this movement towards functional programming and immutability in Ruby are part of the dry-rb and the ROM team. I find them to be a bit too focused on the low-level details.
00:15:59.160 I think that's why it takes them a long time to get their product out. But once it's out, it’s really high-quality code.
00:16:14.820 ROM relation looks pretty straightforward. We have a model class called User, which is empty, and then we have a Users class, which is a ROM relation, and we define the attributes in a schema block.
00:16:37.210 Then we have the associations in a sub block of that. I kind of like that; that's nice.
00:16:52.960 We could tell ROM to pull the schema from the database; we would replace the schema block with schema(infer: true). But then we wouldn't see all the attributes in the code, and declaring them explicitly seems to be the preferred way.
00:17:01.750 To save an object, we start with the relation; we create a relation object and pass it to a change set. This feels really familiar.
00:17:14.440 With that changeset we specify whether it's a create or an update (I don't know what happens if you get it wrong), and we pass all the attributes as a hash.
00:17:29.589 So like I said, we’re not really dealing with objects. You’d have to convert your object to a hash if you're dealing with a Ruby object; and then we have to explicitly commit the changes.
00:17:47.940 I think that might be a nice feature, not sure. I found ROM to be really complex. Here’s an overview diagram of their architecture. Honestly, I can't follow everything that’s going on there.
00:18:02.999 I want to like ROM, but I find it too complex and confusing, and I couldn't actually get things set up right to run the code that I showed you in those previous examples.
00:18:15.130 The last stop of my quest is the model layer in Hanami. Hanami is a full web framework and an alternative to Rails. I’ve liked everything I’ve seen; if I had a choice, I might choose Hanami instead of Rails for some side projects.
00:18:37.710 Hanami supports SQL through Sequel, memory, and file adapters. It follows a domain-driven design architecture. I will talk about that a little more.
00:18:59.370 So, it has entities which are models without persistence or validations. It has a repository, which is mostly like the class methods in ActiveRecord model classes.
00:19:15.790 So, things like create, update, persist, delete, all, find, first, and last.
00:19:34.690 It also has a mapper, which is a declaration of how to map between the database and the entities.
00:19:43.250 Here’s the start of a Hanami model. We inherit from Hanami entity, which surprisingly adds only four methods; at least the last time I looked.
00:19:57.440 It adds ID, ID equals, initialize and then a class method called attributes, which we’re using there.
00:20:12.180 The initializer takes a hash of attributes and sets the entity's attributes from its keys. Attribute types come from the dry-types library.
00:20:40.720 So, those types like Types::Int and Types::String are dry-types. We could again let the model pull the schema from the database like ActiveRecord, but I don't think that's that common.
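The Hanami slide is not reproduced in this transcript, and the following is not Hanami's code. It is a plain-Ruby imitation, under the assumption that an entity base class only needs the four additions listed above: `id`, `id=`, `initialize`, and a class-level `attributes` declaration. `SketchEntity` and `Article` are hypothetical names, and type coercion (the dry-types part) is deliberately left out.

```ruby
# Hypothetical plain-Ruby imitation of a small entity base class.
# It adds only: id, id=, initialize, and a class method `attributes`.
class SketchEntity
  attr_accessor :id

  # Declare attribute names; defines a reader and a writer for each.
  def self.attributes(*names)
    @attribute_names = names
    attr_accessor(*names)
  end

  def self.attribute_names
    @attribute_names || []
  end

  # Accept a hash and assign only the declared attributes (plus id).
  def initialize(attrs = {})
    @id = attrs[:id]
    self.class.attribute_names.each do |name|
      public_send("#{name}=", attrs[name])
    end
  end
end

class Article < SketchEntity
  attributes :title, :author
end

article = Article.new(title: "Repositories", author: "Craig")
puts article.title
```

Nothing in this class knows how an article is stored, which is the property the talk cares about.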
00:20:59.530 Persistence is done by the repository class. Note that things like where and order are private. We can only use them within that query on line 3. Queries are analogous to ActiveRecord scopes.
00:21:15.870 Here, we instantiate an article, create it through the article repository, and then we can find it by author.
00:21:30.930 I think Hanami is my favorite Ruby ORM. If I had a choice, I’d probably use it over Rails, but it's not a very realistic option.
00:21:43.990 It requires everyone on your team to learn something new. If it's just you, that's not a big deal; but if you have a team— I think we have eight developers on my team—it's probably not going to work.
00:22:05.979 Also, you probably wouldn’t want to use it on a project that already has hundreds of models that’s been around for eight or ten years.
00:22:29.350 There’s not much documentation on using it with Rails, nor with the other ORM frameworks.
00:22:49.320 A lot of Rails add-ons assume you're using ActiveRecord, and they may or may not work with another ORM.
00:23:06.429 So, Hanami model implements the repository pattern, which represents a collection of domain objects.
00:23:17.750 In a lot of ways, we can treat the database as an in-memory collection and abstract that even more than we do with ActiveRecord.
00:23:36.000 We do have something similar in ActiveRecord with class methods like create, where, find, and all. When you create a scope, that’s also sort of the repository pattern, but again, it's stuck in class methods.
00:24:05.800 Those have serious limitations—they look more like procedural code than object-oriented code.
00:24:24.190 They indicate that you’ve missed an abstraction, they limit your polymorphism, and they're hard to test and refactor.
00:24:52.320 There’s a good article on Code Climate that talks about all the problems with class methods, and if you check the presenter notes, there’s a link to it.
00:25:14.700 Here’s the UML class diagram for the repository pattern. Note the arrows: the domain model is not dependent on anything.
00:25:31.920 There’s a clear separation of concerns. The domain model class handles the business logic, and the repository class handles persistence.
00:25:48.149 We could end up with more than one repository for a given model. Maybe you want to do sharding, soft delete things in a separate database, or read/write segregation.
00:26:09.290 Perhaps you want your tests to use a different database backing or an in-memory backing.
00:26:24.360 You might see this repository pattern with a third class, which is the mapper class that handles the coercion between database fields and object attributes.
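As a minimal sketch of the three roles just described (domain model, repository, mapper), here is a plain-Ruby version with an in-memory backing. All class names, the `full_name` column, and the method names are assumptions made for illustration; they are not taken from the slides or from any library.

```ruby
# Domain model: business logic only, no knowledge of storage.
class Customer
  attr_accessor :id, :name, :age

  def initialize(name:, age:, id: nil)
    @id, @name, @age = id, name, age
  end

  def adult?
    age >= 18
  end
end

# Mapper: converts between stored rows (hashes) and domain objects.
module CustomerMapper
  def self.to_row(customer)
    { "full_name" => customer.name, "age" => customer.age }
  end

  def self.to_entity(id, row)
    Customer.new(id: id, name: row["full_name"], age: row["age"])
  end
end

# Repository: a collection-like interface over some storage.
# This one keeps rows in memory, which is handy for tests; another class
# with the same methods could talk to a real database instead.
class InMemoryCustomerRepository
  def initialize(mapper: CustomerMapper)
    @rows = {}
    @next_id = 1
    @mapper = mapper
  end

  def save(customer)
    customer.id ||= next_id!
    @rows[customer.id] = @mapper.to_row(customer)
    customer
  end

  def find(id)
    row = @rows[id]
    row && @mapper.to_entity(id, row)
  end

  def all
    @rows.map { |id, row| @mapper.to_entity(id, row) }
  end

  private

  def next_id!
    id = @next_id
    @next_id += 1
    id
  end
end

repo = InMemoryCustomerRepository.new
saved = repo.save(Customer.new(name: "Ann", age: 30))
puts repo.find(saved.id).name
```

The dependency direction matches the diagram: the repository and mapper know about `Customer`, while `Customer` knows about neither.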
00:26:41.430 So, I’ve spent several years looking for a way to have my cake and eat it too. I want to keep using ActiveRecord, but I want to separate my domain model from the database persistence.
00:27:12.250 One Saturday morning I was lying in bed a little late and thinking about it again. Don’t ask! I don’t know why I need to think about those things in bed, but I came up with a solution that I thought could work.
00:27:34.950 In Rails 3, they split ActiveRecord into several modules, and I thought I could use those various modules that ActiveRecord uses and split them into the two sides.
00:27:56.930 The funny thing is, I think I misremembered that; I think it was ActionController that got modularized and broken into pieces.
00:28:12.920 ActiveModel did get pulled out of ActiveRecord at that time, but I don’t think they were really meant to be used separately.
00:28:37.210 So, it wasn't quite as easy to make this work as I expected. All the modules have a lot of interdependencies, and there’s no real documentation on how to use each module and what their dependencies look like.
00:29:00.289 But it turned out that the domain model is just most of ActiveModel. So, I ended up calling that ActiveModel Entity, although I originally called it ActiveRecord Entity.
00:29:16.540 The repository side is still mostly ActiveRecord, so I'm going to show the difference between using standard ActiveRecord and using ActiveRecord Repository, which is the gem I'm working on.
00:29:42.200 So, here’s a typical ActiveRecord model. This should be pretty familiar to you. We have associations like 'belongs_to', 'has_many', validations, and scopes.
00:30:01.120 And then there are some fields that we don’t know about just by looking at the code, unfortunately.
00:30:23.800 Here’s the same thing using ActiveRecord Repository: instead of subclassing, I’m including a module.
00:30:50.070 This is an interesting little pattern that I found. The module is actually dynamically generated through the call to ActiveRecord or ActiveModel's entity method.
00:31:06.980 So, we can pass parameters, and I'll talk about that some more when I discuss the implementation.
00:31:28.020 Let’s see, the module we’re mixing in is ActiveModel Entity. The term 'entity' comes from Eric Evans' domain-driven design. An entity is an object with an identity.
00:31:46.830 So, we could have two items with the same attributes but different IDs, and those would be considered different entities. If we have two items of the same type that have the same ID, they’d be considered the same thing.
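The identity rule can be sketched in a few lines of plain Ruby. This is an illustration of the domain-driven design idea only, not the gem's implementation; the `Item` class is a hypothetical name.

```ruby
# Hypothetical sketch: an entity compares by type and identity, not attributes.
class Item
  attr_reader :id, :name

  def initialize(id:, name:)
    @id, @name = id, name
  end

  # Same object, or same class with the same non-nil id, means "same entity".
  def ==(other)
    return true if equal?(other)

    other.class == self.class && !id.nil? && other.id == id
  end
  alias eql? ==

  # Keep hash consistent with == so entities behave well as Hash keys.
  def hash
    [self.class, id].hash
  end
end

a = Item.new(id: 1, name: "widget")
b = Item.new(id: 2, name: "widget")  # same attributes, different identity
c = Item.new(id: 1, name: "renamed") # different attributes, same identity

puts a == b # false
puts a == c # true
```

Treating entities without an id as unequal to each other is a design choice of this sketch; an unsaved entity has no identity yet to compare.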
00:32:19.880 There’s actually something called an identity map in ActiveRecord. The other major difference is that we declare attributes here: their names and types, which fixes my second biggest gripe.
00:32:38.930 We still have the 'belongs_to' and 'has_many', but we don't have the scopes. Any instance method we add would be defined here as well.
00:32:59.340 Here’s the repository for that same class. Again, we're including a module instead of subclassing. We can pass parameters; we can pass the model class we’re working with.
00:33:19.580 By default, I’m taking the term User Repository, knocking the 'Repository' off, and assuming it's 'User'. We can specify the database table name if it can’t be derived.
00:33:35.950 We can also specify a primary key, and we could specify a mapping of database column names to entity attribute names.
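The default naming rule described above can be sketched in plain Ruby as follows. The `entity_class` method is a hypothetical name, not the gem's API, and the sketch covers only the class-name default, not the table name, primary key, or column mapping options.

```ruby
class User; end

class UserRepository
  # Default: strip the "Repository" suffix from the class name and look
  # up the constant with the remaining name.
  def self.entity_class
    Object.const_get(name.sub(/Repository\z/, ""))
  end
end

puts UserRepository.entity_class
```

A real implementation would also need a way to override the guess, since the convention breaks as soon as the names do not line up.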
00:33:54.290 The scope is on this repository side because it deals with the entire collection, not any individual object.
00:34:13.200 Here’s the typical controller with Rails and ActiveRecord. We tell the User model to save itself on line 4, and that will return false if it failed to save.
00:34:29.960 Here’s the same thing with my ActiveRecord Repository gem: only two lines have changed. Line four explicitly tests to see if the model is valid.
00:34:52.280 Then we deal with that; on line five, we tell the repository to save the model object instead of asking the model object to save itself.
00:35:01.370 There’s one caveat: if you have a uniqueness validation that can't be determined until you hit the database, you're actually going to have to catch an exception on the save.
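The controller slides are not included in this transcript. The flow they describe (validate the entity, ask the repository to save it, and rescue a uniqueness failure raised by storage) can be sketched without Rails like this. Every name here, including `DuplicateRecord`, `Account`, `AccountRepository`, and `create_account`, is a hypothetical stand-in.

```ruby
# Hypothetical stand-ins; none of these names come from the gem.
class DuplicateRecord < StandardError; end

class Account
  attr_accessor :id, :email

  def initialize(email:)
    @email = email
  end

  # Validation that needs no database access.
  def valid?
    !email.to_s.empty?
  end
end

class AccountRepository
  def initialize
    @emails = {}
  end

  # Uniqueness can only be checked against storage, so it raises here.
  def save(account)
    raise DuplicateRecord, "email taken" if @emails.value?(account.email)

    account.id ||= @emails.size + 1
    @emails[account.id] = account.email
    account
  end
end

# Controller-style flow: validate first, then ask the repository to save.
def create_account(repository, email)
  account = Account.new(email: email)
  return :invalid unless account.valid?

  repository.save(account)
  :created
rescue DuplicateRecord
  :duplicate
end
```

In a real controller the three symbols would correspond to rendering the form again, redirecting, or showing a uniqueness error.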
00:35:28.520 Here’s a bit of the implementation of the entity module. I talked about that pattern, the parameterized module pattern.
00:35:51.150 Unfortunately, I think this is the simplest implementation possible, but basically we create a list of modules that may vary depending on what we passed.
00:36:12.020 Then we create a module composed of those modules. The self-composed model module is not important to understand, but it’s important to note that we’re taking several modules and composing them together.
00:36:30.720 This, as I said, allows us to pass parameters.
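The implementation slide is not in the transcript, so here is an independent sketch of the general technique: a method that builds an anonymous module out of other modules, so that `include` can take arguments. `ComposedEntity`, `Greeting`, `Naming`, and `Visitor` are hypothetical names, and this is not the gem's code.

```ruby
module Greeting
  def greet
    "#{self.class.greeting_word}, #{name}"
  end
end

module Naming
  attr_accessor :name
end

# A method with a capitalized name, so the call site reads like a module.
def ComposedEntity(greeting: "Hello", nameable: true)
  parts = [Greeting]
  parts << Naming if nameable # the list of modules depends on the arguments

  Module.new do
    # Compose the chosen modules into this anonymous module.
    parts.each { |part| include part }

    # Hook: when the composed module is included, configure the host class.
    define_singleton_method(:included) do |base|
      base.define_singleton_method(:greeting_word) { greeting }
    end
  end
end

class Visitor
  include ComposedEntity(greeting: "Welcome")
end

visitor = Visitor.new
visitor.name = "Ann"
puts visitor.greet
```

Because the module is generated per call, two classes can include differently configured versions without affecting each other.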
00:36:51.160 I previously called that ActiveRecord Entity, but we’re not using anything from ActiveRecord, so I changed that.
00:37:16.030 You can see that we are just including and extending ActiveModel modules; that’s hard to say.
00:37:38.580 So here’s the repository side again. I’m using the same parameterized module pattern, but I haven’t implemented anything on this side yet.
00:37:55.580 This side is all ActiveRecord, plus some custom code. We’re mostly ensuring that ActiveRecord still works despite the parts I’ve taken away.
00:38:13.839 It still utilizes most of ActiveRecord, but not quite all.
00:38:32.370 We’ve got some helper methods that are calling ActiveRecord. This method lets you call UserRepository.save and pass it a user object.
00:38:44.310 This one is a bit tricky: we have to create an ActiveRecord model object temporarily to save and then update the entity's ID when we save to indicate that the entity has been persisted.
00:39:17.000 This is an implementation of ActiveModel's persisted? method, which I think is required to be an ActiveModel citizen for Rails.
00:39:45.220 There are quite a few challenges, more than I expected, probably due to the fact that I misremembered which things got modularized.
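The save path described above (build a temporary record, write it, copy the generated id back to the entity, and let the id drive the persisted check) can be imitated in plain Ruby. `TemporaryRecord` is a hypothetical stand-in for the temporary ActiveRecord object, and the hash `TABLE` stands in for the database; only the create case is sketched.

```ruby
class Profile
  attr_accessor :id, :nickname

  def initialize(nickname:)
    @nickname = nickname
  end

  # An entity counts as persisted once storage has given it an id.
  def persisted?
    !id.nil?
  end
end

# Stand-in for the temporary record object used only to perform the write.
class TemporaryRecord
  attr_reader :id, :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def save!(table)
    @id = table.size + 1
    table[@id] = attributes
    self
  end
end

class ProfileRepository
  TABLE = {}

  def self.save(profile)
    record = TemporaryRecord.new({ nickname: profile.nickname }).save!(TABLE)
    profile.id = record.id # copy the generated id back onto the entity
    profile
  end
end
```

The entity never sees the temporary record; it only learns its new id.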
00:40:01.120 It didn't occur to me for a while how to separate the modules. It turned out that the entity side is all ActiveModel.
00:40:31.040 The repository side is all ActiveRecord. We're not subclassing ActiveRecord, and that turned out to be really tricky.
00:40:53.340 I spent hours trying to fix this. ActiveRecord uses that to figure some things out and includes info about the connection to the database.
00:41:10.540 I also had to tell ActiveRecord that the repository class is not an abstract class.
00:41:26.730 Currently, I’m fighting with ActiveRecord relations and getting an error that doesn’t seem to be related to the code I added.
00:41:44.520 This makes it really hard to troubleshoot.
00:41:56.180 So, I still have a lot of work to do to make this usable. Please do not use this in production.
00:42:10.750 I’m not going to use it in production. I’m not sure I’ll even get to that point, but it was fun and interesting to learn.
00:42:27.859 Maybe I can make it work. The main part is testing how the relations work, like cascading deletions, loading, or auto-loading—all the relations—and mapping them to objects.
00:42:50.289 We could automatically create migrations because we have all the data we need in the model class, all declared there.
00:43:10.230 I think the only thing I’m missing right now is indexing. Data Mapper actually had that option.
00:43:28.890 If you’re into migrations, go see Matt Duszynski's talk on migrations right after this.
00:43:47.509 A teammate and colleague of mine covers a lot of gotchas with migrations. I plan to look at those gotchas if I do get to automating migrations.
00:44:01.420 That’s in the next time slot over in room F.
00:44:32.640 I need some help from all of you. If you're interested, please go star the repo on GitHub so that I know if people are interested.
00:44:52.059 The more people are interested, the more likely I am to complete the project.
00:45:06.170 I’m easy to find on the internet or in person. I’ve got the Weedmaps t-shirt on today; I made the repo and the talk easy to find.
00:45:28.010 I have links for everything on the last slide: links that kind of link back to each other.
00:45:52.400 I’d like to thank you all for coming and watching, especially my co-workers who watched the previous talk and provided some valuable feedback.
00:46:03.270 If you liked listening to me, I do a podcast on Agile called 'This Agile Life'. We do it semi-sporadically.
00:46:15.120 I’m not always on, but we have resurrected it and are recording podcasts again.
00:46:33.740 A big thank you to my employer, Weedmaps, for sponsoring this talk.
00:46:36.880 There’s about 20 of us here; most of us have t-shirts on, and we are hiring big-time.
00:46:56.420 Come see us at our booth—we'll have t-shirts; I’m not sure exactly which ones yet.
00:47:15.020 The source of the presentation is on GitHub in my presentations repo. Easy to find.
00:47:35.360 There’s the link to the ActiveRecord Repository gem; you can also find that on my GitHub page, near the top.
00:47:42.869 Feel free to stop by in the hall if you have any questions.