Ernie Miller

Summarized using AI

An Intervention for ActiveRecord

Ernie Miller • June 25, 2013 • Portland, OR

In this talk titled "An Intervention for ActiveRecord," Ernie Miller addresses the complexities and shortcomings of ActiveRecord, a key component of the Rails framework. He emphasizes that while ActiveRecord offers simplicity and convenience for database interactions, it also harbors numerous issues that can confuse both new and seasoned developers.

Key points discussed include:
- Understanding ActiveRecord: ActiveRecord is more than just a design pattern; it's a comprehensive library that encapsulates database access while adding domain logic to data.
- Complexity and Size: With over 15,000 lines of code, ActiveRecord's size and the multitude of its methods make it daunting. Many developers lack the deep understanding necessary to utilize it effectively, leading to common pitfalls.
- Magic and Conventions: Miller highlights the 'magic' involved in ActiveRecord, which can be beneficial but often leads to unexpected behaviors and inconsistencies. He uses humor to describe 'douchebag magic,' where certain functionalities can deceive users without offering real benefits.
- Confusing Behaviors: Through various examples, he explains how certain methods like initialize or association methods (e.g., has_many :through) operate in ways that might not align with user expectations.
- Testing and Validations: He warns that tests may not be reliable indicators of functionality, referencing a personal experience where a well-passed test led to broader issues. He advocates for using inverse_of to manage associations effectively to improve validation integrity.
- Callbacks Caution: Miller discusses how the callbacks feature in ActiveRecord can lead to callback hell, complicating the codebase further.
- Legacy and Community Involvement: He concludes with a call to action for developers to contribute more to ActiveRecord, whether by writing documentation, improving tests, or engaging in issue triaging to better the library for future generations.

Ultimately, Miller aims to inspire a more profound understanding of ActiveRecord and encourage developers to contribute positively to its development, enhancing the experience of all Rails users.

Let's be honest: ActiveRecord's got issues, and it's not going to deal with them on its own. It needs our help.
Don't think so? Let's take a closer look together. We'll examine the myriad of perils and pitfalls that await newbie and veteran alike, ranging from intentionally inconsistent behavior to subtle oddities arising from implementation details.
Of course, as with any intervention, we're only doing this because we care. At the very least, you'll learn something you didn't know about ActiveRecord, that helps you avoid these gotchas in your applications. But I hope that you'll leave inspired to contribute to ActiveRecord, engage in discussion about its direction with the core team, and therefore improve the lives of your fellow Rails developers.
WARNING: We will be reading the ActiveRecord code in this talk. Not for the faint of heart.

RailsConf 2013

00:00:15.980 Hello, everybody! How's it going? Great! Let's get started. I have a lot of ground to cover and not a lot of time to do it.
00:00:22.470 My name is Ernie. I work for LivingSocial, and I've written a whole bunch of ActiveRecord-related gems that you may have used in the past. I've spent a lot of time working on integrating with ActiveRecord in various ways.
00:00:32.730 I've written Squeel, Ransack, MetaWhere, MetaSearch, Valium, and several other gems. As I mentioned, I work for LivingSocial, and we are hiring! I would certainly recommend you swing by and talk to me either after this talk or sometime for the remainder of the week if you're interested.
00:00:48.480 This slide is totally not for you; this is my mom. She thinks I'm awesome and is psyching me up for the talk.
00:01:01.739 Now, I don't know how many of you actually read the talk descriptions, but there is one very important warning. You may not have been aware that there is going to be a lot of ActiveRecord source code in this talk—seriously, a lot—and we're going to be moving very quickly.
00:01:12.930 Things might get a little crazy. In fact, I would go so far as to say that my goal is not simply to teach you about ActiveRecord internals today, but to fit as much crazy as possible into the next 40 minutes or so. So, I’m warning you upfront: there is time to escape. If you leave now, I won’t be offended. If you leave in five minutes, I’ll be a little hurt, but I think I’ll get over it.
00:01:32.100 So, what is ActiveRecord? Well, first off, it's a design pattern. As you're probably aware, Martin Fowler describes it as an object that wraps a row in a database table, encapsulating the database access and adding domain logic on that data. He emphasizes the simplicity present in the ActiveRecord pattern and how easy it is to understand.
00:01:56.280 ActiveRecord is also a library. Here's a simple example of using ActiveRecord, which has just a few attributes: name, email, and timestamps—ID, of course. Once you have that model in place, you have 57 ancestors to that model, most of which are modules included in the ActiveRecord::Base subclass.
00:02:09.869 You get over 500 methods on the class and over 300 methods on the instances of that class. ActiveRecord, the library, also has almost 16,000 lines of code. So, for comparison, ActiveRecord has the primary advantage of simplicity; it's easy to build ActiveRecords, and they're easy to understand.
00:02:29.819 That was the pattern; this is the library. You get the idea. I actually caught some flak for spelling ActiveRecord without a space. The official Rails core team-approved way of spelling it is 'Active Record' with a space.
00:02:37.380 However, I think you'll have to forgive me for saying that I don't believe that ActiveRecord is the same as Active Record. Don't get me wrong; ActiveRecord was one of the first things I saw in Ruby on Rails that made me say, 'Wow, this is amazing magic! I don’t even know how this works.' It sparked my initial interest in Ruby and Rails.
00:03:02.040 How many of you had a similar opinion when you first interacted with ActiveRecord? Almost every hand in the room, right? So, I love ActiveRecord, but— and this is a big 'but'—ActiveRecord has issues. It has a lot of issues. In fact, look at how many issues ActiveRecord has.
00:03:42.959 You want to hear my theory on this? I have a theory about why ActiveRecord has so many issues. It’s not just the code size; I think it's because nobody understands ActiveRecord. You get issues when people write code that may not really be issues; they just don’t understand how the library works. Because nobody understands ActiveRecord, there's no one who can actually say, 'Hey, this issue is totally bogus.' So they sit around and complain.
00:04:23.120 I know most of you in the room are probably saying, 'That's right! Nobody understands ActiveRecord. I sure am glad I'm the exception.' If your face is not up here, you are not the exception. That's forgivable, right? Here's a comparison of all the various libraries that are part of Rails.
00:04:50.660 I took the liberty of splitting Action Pack into Action Controller, Action View, and Action Dispatch. ActiveRecord, barring ActiveSupport, is easily twice or more the code size of just about any other part of Rails. In fact, if you throw out ActiveSupport, it’s more than the code size of everything else left.
00:05:09.360 From the documentation, one of the design philosophies behind ActiveRecord is that magic is not inherently a bad word. I agree: magic is not necessarily a bad word, but there’s a kind of magic you experience at your kids' birthday party, where a magician pulls a rabbit out of a hat, and you're amazed and delighted.
00:05:45.600 Then there's this other kind of magic, which I refer to as 'douchebag magic.' This is the kind of magic that takes your delicious orange soda, turns it into Cheetos, and then puts that soda in your friend's mouth, making them spit it out. That's what douchebag magic does: it’s the unexpected stuff that surprises you, and it's not even for your benefit.
00:06:05.360 The trick with magic is that it requires conventions, and conventions require opinions. Believe it or not, not all opinions are created equal. The thing about it is that you have to make trade-offs whenever you exercise an opinion in something, and those trade-offs will determine how other people interact with your library.
00:06:21.050 I have a quick two-step process for how to create a learning curve if you're writing any kind of API: step one, find things that your users already understand; step two, ignore those things. For instance, you might think you know how to initialize an object. If you initialize an object as a subclass of something else, you're used to saying you want to provide some defaults to that object.
00:06:52.230 You might say that if I initialize a user and nobody has a name, then I'll go ahead and give it 'name not provided' as a value so that shows up in my view or whatever. You try it, and it works: you create a user, and you see 'name not provided.' That's what you expect, right? But then you find it from the database, and you get nil, and you're like, 'What the heck?!' As it turns out, this is because the Rails way to handle this situation is to define an after_initialize callback.
00:07:13.970 You learn this as a Rails newbie, and you say, 'Alright, I guess I'll take that.' It works, and it does, but have you ever wondered why that happens? Some trade-offs were made. ActiveRecord::Base has a standard initialize method—it's lengthy, but it does what it needs to do. Below the initialize method, however, is a method called init_with, and it has some bonus magic.
00:07:36.060 If you find a record out of a database and select additional attributes from the table you're selecting from, your instance of class user can have methods that match the names of the columns you just selected. That means you can have two instances of class user that have completely different methods based on what was selected.
00:08:06.620 The only place that init_with is called in the entirety of ActiveRecord is in persistence.rb, in a method called instantiate. What we do is we bypass the normal allocate-initialize pattern we would do with a normal call of new against an object, and instead, we allocate and call init_with, providing attributes and the columns.
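To make the two construction paths concrete, here is a minimal plain-Ruby sketch. It is not ActiveRecord's source: `TinyRecord`, its `instantiate` method, and the attribute hash are stand-ins invented for illustration. Only the allocate-then-`init_with` shape comes from the description above.

```ruby
# Plain-Ruby stand-in (hypothetical, not ActiveRecord code) showing why a
# default set in #initialize is missing on records loaded from the database.
class TinyRecord
  attr_reader :attributes

  # Path 1: TinyRecord.new runs allocate and then initialize.
  def initialize(attributes = {})
    @attributes = { "name" => "name not provided" }.merge(attributes)
  end

  # Path 2: used for rows coming back from a query. initialize never runs.
  def init_with(coder)
    @attributes = coder.fetch("attributes")
    self
  end

  # Mirrors the allocate + init_with shape described in the talk.
  def self.instantiate(row)
    allocate.init_with("attributes" => row)
  end

  def name
    attributes["name"]
  end
end

built  = TinyRecord.new
loaded = TinyRecord.instantiate("id" => 1, "name" => nil)

puts built.name.inspect   # the default applied by initialize
puts loaded.name.inspect  # nil, because initialize was bypassed
```

Because the second path never calls `initialize`, any default placed there only applies to objects built with `new`, which matches the surprise described earlier in the talk.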
00:08:31.310 Now, instantiate is called in many places, but the most obvious is within find_by_sql. When you find records, this is how they become hydrated into ActiveRecord objects. It's also used whenever you load associated records from an eager loading query or something like that.
00:09:00.589 The trick to this stuff is that you need to understand enough of the code to see the trade-offs being made and maybe question them. For instance, I wondered what would happen if we had a build method on the class intended for creating new records, and new just behaved like new, allowing us to overwrite initialize.
00:09:42.440 You have to kind of understand the trade-offs they made. Certainly, they knew the trade-off was allowing everyone to create a new object automatically, meaning it would be an unforeseen object. Associations are totally simple until they are not, so let's take a pop quiz here.
00:10:03.500 Let's say we're developing the next awesome blogging engine—because the world needs another one of those—and we define this assigned posts method. The intent is that we can assign one or more posts that may already exist to the user, ensuring we don’t assign posts to another user without their approval.
00:10:19.200 So we add those posts to that individual and ask whether the user ID of the post matches the user ID of the person. What's your answer? Hope you have it ready. The answer is: I have absolutely no idea!
00:10:38.490 I’ll give you bonus points if you say, 'I’m not even testing the right attribute.' Here's why: depending on whether the user is persisted or not, you may or may not auto-persist the posts at the time you assign things.
00:11:06.710 Now, I understand the intention is to simplify things by automatically persisting the already-persisted objects, but you also lose the chance to modify those objects once they've been added, and the behavior becomes inconsistent and unpredictable.
00:11:24.300 Another fun aspect of associations is that you can use them in joins. You're probably familiar with this syntax: you have a posts association and also a published_posts association that adds some conditions. You can say User.joins(:posts) and apply conditions on the posts table, but when you try to do the same with a join on published_posts, you can't.
00:12:03.860 Why can't you? It’s because of this little guy right here called the predicate builder. Whenever you build a where clause, it gets invoked and builds from a hash. This means that at the point of instantiation, it creates a table from the key.
00:12:33.960 What you’re really saying is that you need to know the table name, which is 'posts.' That's fine if you only have one join of posts, but if you start joining posts and then maybe published posts, or if you have a parent-child relationship and you're joining the parents multiple times, you have no way to know what the table name is at the outset.
00:13:03.560 There are ways around this, though. In Squeel, for example, I actually defer creating these tables until we run the query, accounting for any subsequent joins. However, there are drawbacks to that, and we'll discuss those later.
00:13:19.650 So then, maybe you learn that you want to have multiple roles on a post; maybe you have an author and an editor. You learn how to use 'has many through,' creating a rich join model that has roles, and you validate the presence of those roles.
00:13:54.550 Right away, you say, 'Okay, I'm going to create a new post by setting the current user,' and it fails. You say, 'Oh well, it failed to save. Oh duh! I need to set a role, because I’m validating that.' So you decide to create the association, thinking the join model should exist because it’s getting persisted in the database.
00:14:28.540 And you are wrong; it doesn't. So you are left with the laborious task of building the join model first and then creating the post associated with that, finally saving the join model.
00:14:59.180 That sounds okay, but something feels off about it. Then some genius comes along and tells you that you can add additional associations with authors or editors or whatever using conditions to create your associations.
00:15:14.770 You can now associate those records through that association, and you would get the required behavior. Great! But this assumes you know all the roles you might want to assign. Most people stop there and think, 'That’s good enough.' They end up with like 20 different macros at the top of the class for every possible role.
00:15:46.000 So I got curious about why this was happening and started digging in. I discovered that the 'has many through' association, which inherits from the 'has many' association, is not actually creating its own build-through record.
00:16:25.370 So I added the missing 'concat_records' method, which until then was simply the one inherited from the 'has many' association, and had it build the through record. Seems pretty sane, right? I ran the tests, they passed, submitted a pull request, and it got merged.
00:16:49.900 Then it got cherry-picked for release 3.2.9, and everything seemed awesome. How many people know what happened next? Everything was terrible! If you upgraded to 3.2.9, it was a security release, and if your associations broke, you can come see me later for a hug.
00:17:15.370 What I learned from this is that tests lie a lot, and we can't always trust them. Even though all the tests passed, the experience was quite unpleasant for many people.
00:17:53.930 So what I ended up discovering was that a lot of folks who were doing their validations like this had validations on the join model itself, validating those belongs_to relationships by checking the ID.
00:18:11.850 So when they created a new post, they would add the contributor and hit save, and nothing would happen. It didn't work, so we dug into it and discovered it was because the contribution was not valid.
00:18:29.330 It's not valid because, at that point, the post has not been saved yet, so it does not have an ID, and therefore we can't set the ID.
00:18:45.420 How many of you know what 'inverse_of' does on your associations? Fewer than I even hoped for. 'inverse_of' is the oldest ActiveRecord feature that you aren't using, and you should totally use it.
00:19:00.900 Here's an example: user = User.first. You find the first user and then that user's first contribution. You might think that comparing the user on that contribution with the user you started from would give you object equality. No, you wouldn't get object equality; that would make sense.
00:19:22.090 What you actually have is that after loading the contribution, when you access the user association on that contribution, you're hitting the database again. You're round-tripping to the database, bringing back a brand new instance of the user.
00:19:44.600 This means the object equality doesn’t match. It's pretty easy to fix this—this has been around for quite a while—and it's so useful that it was backported to 3.x. If you stopped before that version, you can still use 'inverse_of'.
00:20:05.540 Here's how it works: in the particular case of your post and your user, you define what the inverse is. When you load the association, before it gets returned to you, it sets the inverse to the same in-memory instance of the object you're concerned with.
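The identity problem and its fix can be mimicked without a database. The sketch below is an assumption-laden stand-in, not ActiveRecord: `SketchStore`, `SketchOwner`, and `SketchContribution` are hypothetical names, and `find_owner` plays the role of the extra round trip that builds a brand new object.

```ruby
# Hypothetical stand-ins, not ActiveRecord classes.
module SketchStore
  @queries = 0

  class << self
    attr_reader :queries

    # Simulates a database round trip: always returns a fresh object.
    def find_owner(id)
      @queries += 1
      SketchOwner.new(id)
    end
  end
end

class SketchOwner
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

class SketchContribution
  # inverse_instance is pre-set when the association declares an inverse.
  def initialize(owner_id, inverse_instance = nil)
    @owner_id = owner_id
    @owner = inverse_instance
  end

  # Loads the owner lazily unless the inverse was already assigned.
  def owner
    @owner ||= SketchStore.find_owner(@owner_id)
  end
end

owner   = SketchOwner.new(1)
plain   = SketchContribution.new(owner.id)         # no inverse declared
inverse = SketchContribution.new(owner.id, owner)  # inverse declared

puts plain.owner.equal?(owner)    # a different in-memory object
puts inverse.owner.equal?(owner)  # the very same object
puts SketchStore.queries          # only the plain lookup hit the store
```

With the inverse assigned up front, reading the owner costs no lookup and returns the object already in memory, which is what makes validating the presence of the object itself practical.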
00:20:31.440 So that contribution, when it has many contributions, will have the post or the user set whenever you load it. Thus, you can start validating the presence of the actual user object and post. While 3.2.9 and onward would have worked for you, what I think is essential about this is relatively deep.
00:20:56.710 Validations live in Ruby; therefore, they should be concerned with Ruby objects. Association IDs are an implementation detail that nobody in this room needs to worry about. You should care about the presence of the object.
00:21:25.870 In fact, if you start thinking about it differently: how do you know the record you are actually joining to is really present? Just because it has an ID value, you don’t know; some random gremlin could have deleted that record from the table.
00:21:50.230 You might think, 'But wait, what will happen? What about my read-heavy scales?' You'll incur a hit every time you validate. That's true; you will load the associated record, but you won't incur that hit every single time you load associations from your associated records.
00:22:22.780 You say your user has many posts: you load a post, and now you have the user available without needing to round-trip to the database again. Trust me, you'll endure more pain from not validating things than an occasional extra hit to the database during validation.
00:22:54.230 Speaking of validations, my favorite validation is 'validates_uniqueness_of'. The thing I love about it most is that it actually doesn't validate uniqueness of. You don't have to take my word for it.
00:23:19.680 I actually got the chance to have dinner with DHH last night, and I told him that this is my absolute favorite comment in all of the ActiveRecord source code. In the uniqueness validation comments, it says it doesn’t really validate uniqueness; it’s worded in a way so that you might not notice.
00:23:47.670 But I’m telling you now, it doesn’t. validates_uniqueness_of, you had one job, seriously.
00:24:01.210 More helpfully, it does explain that you should probably go ahead and add a unique index to your database table. That would appear like this.
00:24:32.600 People often say, 'Oh, I get nice validation error messages now. I can see that my email was already taken.' Here's the catch: when you're doing that, you're running a round trip to the database to find out whether or not it’s potentially unique.
00:24:51.430 This is another one of those instances where, if you think about my deep thoughts on validations, you’d say the email shouldn't need to have knowledge of whether or not it’s unique among other emails. That’s the database's job; the database handles relations, and the email is just an object.
00:25:15.930 Here’s what you could do instead: if you override save for the instance, you can call super and rescue a handy little error that ActiveRecord raises specifically when your record is not unique. You can add the same validation error message you'd normally have, and you don’t need to round-trip to the database twice to do it.
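The shape of that override can be sketched in plain Ruby. Everything here is a stand-in so the example runs without a database: `SketchNotUnique` plays the role of the uniqueness error (in ActiveRecord that would, as far as I know, be `ActiveRecord::RecordNotUnique`), `SketchTable` plays a table with a unique index, and the `errors` hash is a simplification of the real errors object.

```ruby
require "set"

# Hypothetical stand-in for the error raised on a unique index violation.
class SketchNotUnique < StandardError; end

# Plays the role of a table with a unique index on email.
class SketchTable
  def initialize
    @emails = Set.new
  end

  def insert(email)
    # Set#add? returns nil when the value is already present.
    raise SketchNotUnique, "duplicate email" unless @emails.add?(email)
    true
  end
end

# Stand-in for the base class that owns the real save.
class SketchPersistence
  def initialize(table, email)
    @table = table
    @email = email
  end

  def save
    @table.insert(@email)
  end
end

class SketchAccount < SketchPersistence
  def errors
    @errors ||= Hash.new { |hash, key| hash[key] = [] }
  end

  # One write and no SELECT beforehand: let the index decide, then
  # translate the failure into the usual validation message.
  def save
    super
  rescue SketchNotUnique
    errors[:email] << "has already been taken"
    false
  end
end

table  = SketchTable.new
first  = SketchAccount.new(table, "ernie@example.com")
second = SketchAccount.new(table, "ernie@example.com")

puts first.save             # the insert succeeds
puts second.save            # the index rejects the duplicate
p second.errors[:email]     # the familiar message
```

The design choice is that the database remains the single authority on uniqueness, so there is no window between a check and the write in which a duplicate can slip through.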
00:25:35.930 It works, and it does validate uniqueness, which is probably the point in the first place. So you might think, 'That’s great, but what if I have multiple uniqueness constraints, or what if I have a scope or whatever?' Well, I’ve got you covered there, too.
00:26:06.480 This takes a little more effort. I'm going to put this code up anyway so that you can investigate it further. I don’t have enough time to delve too deeply into it; assuming you're using PostgreSQL, you have a relatively sane error message that comes back when you run into a uniqueness constraint violation.
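As a rough sketch of that idea, the snippet below pulls the column names out of text shaped like PostgreSQL's unique-violation message. The sample message, index name, and the `violated_columns` helper are made up for illustration and are not the code shown in the talk. It assumes an English-language server message and plain column names in the index.

```ruby
# Sample text in the shape PostgreSQL uses for unique violations. The index
# name and values are invented for illustration.
SAMPLE_PG_ERROR = <<~MSG
  ERROR:  duplicate key value violates unique constraint "index_users_on_email_and_account_id"
  DETAIL:  Key (email, account_id)=(ernie@example.com, 42) already exists.
MSG

# Returns the column names listed in the DETAIL line, or [] if none found.
# Hypothetical helper: expression indexes such as lower(email) contain
# nested parentheses and would need a more careful parser.
def violated_columns(message)
  match = message.match(/Key \((?<columns>[^)]*)\)=/)
  return [] unless match

  match[:columns].split(",").map(&:strip)
end

p violated_columns(SAMPLE_PG_ERROR)
```

Each returned name can then be used as the attribute on which to add the usual "has already been taken" error.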
00:26:38.840 You can parse that out and see what columns are the issue. I mentioned earlier that we had some problems with Squeel while I was trying to defer handling of association conditions.
00:27:06.200 This issue arises because of Relation#merge. How many of you use Relation#merge? The rest of you are lying; you just don't realize it. You're using it all the time, and here’s why it matters.
00:27:29.490 Whenever you merge hashes, we know that we replace the left-hand value with the right-hand one if there’s a conflict. In our infinite wisdom, we decided that when merging relations, if there are two equality conditions on the same attribute, we're replacing the one on the left.
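The parallel with hashes can be shown directly. The first part below is ordinary `Hash#merge`. The second part is a hypothetical stand-in, not ActiveRecord's merge code: conditions are modeled as `[attribute, operator, value]` triples, and `merge_conditions` mimics only the rule described above.

```ruby
# Hash#merge: on a key conflict the right-hand value wins.
left_hash  = { role: "author", published: true }
right_hash = { role: "editor" }
p left_hash.merge(right_hash)

# Hypothetical stand-in for merging two lists of conditions. An equality
# condition on the left is dropped when the right side has an equality
# condition on the same attribute. Everything else is kept as-is.
def merge_conditions(left, right)
  overridden = right.select { |_attr, op, _value| op == :eq }.map(&:first)
  kept = left.reject { |attr, op, _value| op == :eq && overridden.include?(attr) }
  kept + right
end

left_conditions  = [[:role, :eq, "author"], [:views, :gt, 10]]
right_conditions = [[:role, :eq, "editor"]]

p merge_conditions(left_conditions, right_conditions)
```

In the sketch, the left side's `role = "author"` condition silently disappears, which is harmless for a hash but surprising when each side was meant to narrow the same query.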
00:27:51.550 This behavior can lead to problems because of how merges operate. Merges are everywhere, making it relatively miserable. I've added rage comments to Squeel and tried many different variations of this method.
00:28:19.140 One key offender is the default scope. How many of you have used it? How many of you have regretted using default scope? I want this thing gone in the worst way possible.
00:28:45.740 It introduces implicit behavior; it could easily be made explicit by just reading the scopes you’re chaining. If you read through the ActiveRecord source code, the number of workarounds for default scope will make your head spin.
00:29:08.820 Here’s an example: whenever you execute a query against the database (which happens when you convert your relation to an array), you first check the default scope. If your relation is default scoped, it returns a default scoped version of the relation; if not, it returns itself.
00:29:45.840 That’s why you have this check where you say, 'If the default scope is equal to self, do your thing, otherwise, load the version with the default scope.' It combines queries with the default scope.
00:30:07.850 And from this, you get a bunch of merges. It depends on how many default scopes may exist on a model. Honestly, I’d like to see default scope reduced drastically.
00:30:31.580 Scopes are supposed to narrow your database query. But they don't do that effectively. In Rails 4 and onward, you can actually chain multiple scopes without needing to override anything.
00:31:01.100 However, default scope remains unique, leading to a substantial number of undocumented and unwanted behaviors.
00:31:29.610 Calculations are those things you do with numbers. Let's pretend pagination didn't come with a billion different solutions already. Suppose we want to be able to have a chainable method off of a given relation that tells us the number of pages and the page size needed to display query results.
00:32:07.890 What’s wrong with this code? The problem is that if you use this method, you might get a message back because any relation with group values will return a hash of key-value pairs and counts instead of an integer value.
00:32:35.590 The perform_calculation method does anything from max to min, sum to count. If there are any group values, we execute the group calculation, which might look entirely different and creates new magic.
00:33:06.450 If the thing you're grouping is only one string or symbol and matches a belongs-to association on the counted object, it finds that association and holds it. It creates a lookup dictionary matching their ID to the actual ActiveRecord base object.
00:33:40.640 This means, by the end of it all, you're left with a hash that has an ActiveRecord base object on the left and on the value side, how many of that object you have.
00:34:01.450 In short, I’m not really sure why we don’t have a separate method to achieve this already. If I want a hash, I’ll ask for a hash; don’t do that on my behalf. I want consistent return values from any method I call.
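The inconsistent return value is easy to reproduce in miniature. `SketchRelation` below is a hypothetical stand-in with just enough behavior to show the two shapes of `count`. It is not ActiveRecord's implementation, and `page_count` is an invented helper in the spirit of the pagination example.

```ruby
# Hypothetical stand-in for a relation, not ActiveRecord's implementation.
class SketchRelation
  def initialize(rows, group_key = nil)
    @rows = rows
    @group_key = group_key
  end

  # Returns a new relation that remembers the grouping key.
  def group(key)
    SketchRelation.new(@rows, key)
  end

  # Integer without a group, Hash of key => count with one.
  def count
    return @rows.size unless @group_key

    @rows.group_by { |row| row[@group_key] }.transform_values(&:size)
  end

  # Naive helper that assumes count always returns an Integer.
  def page_count(per_page)
    (count / per_page.to_f).ceil
  end
end

posts = SketchRelation.new([{ user_id: 1 }, { user_id: 1 }, { user_id: 2 }])

p posts.count                  # an Integer
p posts.page_count(2)          # works on the ungrouped relation
p posts.group(:user_id).count  # a Hash, same method name

begin
  posts.group(:user_id).page_count(2)
rescue NoMethodError => e
  puts "grouped relation broke page_count: #{e.class}"
end
```

A chainable helper has no way to know which shape it will receive unless it inspects the relation first, which is the argument for a separate, explicitly named method that returns the hash.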
00:34:27.060 Let's talk about callbacks. Trust me, you're gonna be sorry if you use them. Don’t say I didn’t warn you! How many of you have found yourselves in callback hell? About half the room at least. The rest of you haven’t yet, but it’ll happen.
00:35:04.670 In the comments for the callbacks file, it states that those twelve callbacks give you immense power. That sounds inspiring, but let’s get real, that’s marketing fluff.
00:35:39.430 So, let’s take a look at how callbacks are implemented. Hang on to your hats! You’re familiar with 'before_save'—this is a simplified version. We'll just stick with before_save for now.
00:36:02.650 There are three different ways of saying the same thing: we can give it a class or an instance of an object that responds to before_save, a symbol that represents a method name, or a block that accepts the record we're currently handling.
00:36:27.820 You can even give a string, even though it’s been deprecated and doesn’t issue a deprecation warning.
00:37:05.020 The callbacks themselves are part of a class called CallbackChain, stored in a variable labeled '_safe' due to importance.
00:37:20.130 When we compile a callback chain, we start with some strings and reverse the order due to execution requirements. The callbacks will apply to the calls made surrounding them.
00:37:46.720 If we haven't halted, we compute the result and make sure to join everything together, but callbacks are pretty simple to understand.
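For readers who want the general idea without the generated code, here is a deliberately simplified sketch of a before_save chain. It is not how ActiveSupport implements callbacks (the talk describes a chain compiled from strings, while this version just iterates). `SketchCallbacks`, `SketchPost`, and `AuditLogger` are hypothetical names. The halting rule assumes the Rails 3.x/4.0 convention in which a before callback returning false stops the chain.

```ruby
# Hypothetical miniature of a before_save chain, not ActiveSupport's code.
class SketchCallbacks
  def initialize
    @chain = []
  end

  # Accepts a Symbol (method name on the record), an object responding to
  # before_save, or a block that receives the record.
  def before_save(callable = nil, &block)
    @chain << (block || callable)
    self
  end

  # Runs each callback in order. A result of false halts the chain and the
  # save body (the block given to run) is skipped.
  def run(record)
    @chain.each do |callback|
      result =
        case callback
        when Symbol then record.send(callback)
        when Proc   then callback.call(record)
        else callback.before_save(record)
        end
      return false if result == false
    end
    yield
  end
end

class SketchPost
  attr_accessor :title
  attr_reader :log

  def initialize(title)
    @title = title
    @log = []
  end

  def normalize_title
    self.title = title.strip
    log << :symbol
  end
end

class AuditLogger
  def before_save(record)
    record.log << :object
  end
end

callbacks = SketchCallbacks.new
callbacks.before_save(:normalize_title)
callbacks.before_save(AuditLogger.new)
callbacks.before_save { |record| record.log << :block; !record.title.empty? }

post = SketchPost.new("  An Intervention  ")
puts callbacks.run(post) { post.log << :saved; true }
p post.log

blank = SketchPost.new("   ")
puts callbacks.run(blank) { blank.log << :saved; true }
p blank.log
```

Even at this size, the order of registration and the meaning of each return value decide whether the save happens, which hints at how chains become hard to reason about as they grow.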
00:38:14.300 Here’s the neat part: if you look at the actual implementation of callbacks for the most common operations (like save, update, and create), it’s pretty straightforward.
00:38:37.750 The point is that we can use inheritance to redefine 'before_save' in a more understandable way. It's about making sure callbacks compile to something sensible.
00:39:05.210 We can perform checks to halt if needed, making it more robust. I’d like to rephrase the ActiveRecord callbacks documentation to be a reflection of good Ruby practices.
00:39:25.490 You might be tempted to say that ActiveRecord works well enough and that none of this is a problem, but changes will be hard and could lead to issues. Let's not let this legacy nature hinder us. ActiveRecord is what we're leaving for the next generation of Rails developers, and it deserves better.
00:39:51.880 New users deserve better, and I know we can do better. I hope you leave this talk inspired to write documentation that accurately reflects behavior, write tests to prevent future issues, or triage some of the 230+ active issues related to ActiveRecord.
00:40:12.630 If you begin to understand how ActiveRecord works, you’ll be in a better position to help reduce these issues. More importantly, you can engage in discussions to improve this beloved library. I'd like to see that happen starting now. Thanks!