00:00:10.730
All right, welcome to 'Up And Down Again: A Migration's Tale'. Together, we're going to journey into the inner workings of Rails schema management. My name is Derek Prior.
00:00:16.590
If you have feedback or questions about this talk as it's ongoing, you can tweet me @DerekPryor, email me at [email protected], or afterwards, come up and introduce yourself. I’d love to talk to as many of you as possible.
00:00:29.099
I currently work as the development director at Thoughtbot, where I've been for five years. I want to take a quick moment to thank Thoughtbot for the remarkable opportunity to engage in activities like this and contribute to open source and our local communities.
00:00:40.290
A lot of people use our gems and are familiar with Thoughtbot, but they aren’t quite sure what we actually do. At Thoughtbot, we work with our clients to build great digital products. If you want to learn more about what it means to collaborate with us as a client, or what it means to work with me as a co-worker at any of our locations, please come see me afterwards, or send me an email or tweet.
00:01:02.910
So, we're here today to talk about schema management in Rails. The first version with what we’d recognize as migration support came in Rails 0.10.1, which was released towards the end of 2005. It’s actually the first open-source implementation of any conventional schema management system that allowed developers to progress their database schema alongside the code.
00:01:16.619
That pattern became so immediately useful that it’s hard to imagine a web framework that doesn’t allow for such functionality these days. So, I really wanted to dive in to see what makes it tick. Our agenda for today: First, we're going to look at what makes up the migration DSL. Then, we’ll explore what happens when we apply or revert a migration, how schema Ruby gets generated and what it is used for, and finally, where some of this falls short and how we might overcome those issues.
00:01:55.979
To dive into the anatomy of a migration, here's a common migration to create a table, specifically our posts table, which will have several columns with some restrictions. My intrigue really starts right here on line one with a relatively recent addition to migrations, which I think came in Rails 5. The reason why it’s interesting is that it looks like we’re inheriting from an instance of an array, which I can assure you is not code you’ve ever written anywhere else in Ruby or Rails.
00:02:21.459
To understand this, we need to crack open the migration class itself. We see that it defines the brackets method; so we're not actually indexing into an array, but calling this method with the version we passed to it. That method hands off to compatibility.find, which isn't an ActiveRecord.find call, but rather a call to the compatibility module method find.
00:02:37.660
What that does is take the version you pass in, turn it into a string, do a bit of error handling, and then return a constant. This mechanism is a way to turn a shorthand into a much longer form. The brackets are shorter and may be nicer to type, but I suspect this is primarily for aesthetic reasons, as how many of you are actually typing an entire migration class by hand?
00:03:00.760
What does the class actually look like under the hood? This is it—it’s empty! So we’ve gone a long way already, and we’re essentially inheriting from Migration directly. Why don’t we just inherit from Migration like we used to back in Rails 4 and earlier? The answer is that Rails won’t allow us. The inherited callback defined after we define the v5.2 constant effectively checks the direct superclass and rejects direct inheritance from it, so it's not supported anymore.
00:03:40.980
Why was this change made? It might help to answer that by looking at a slightly different example where, instead of inheriting from v5.2, we inherit from v5.1. If we examine its compatibility class, we can see it allows us to see how the behavior of migrations changes between releases. It encapsulates how a migration behaved up to and including that particular version, allowing us to understand the differences.
00:04:11.200
For instance, it shows that v5.2 changed how Postgres handles change_column, and similar changes can be found for MySQL. If we need to find out how those migrations behaved, we can dive into the code or use Git to trace changes over time. All this work was done in the name of having stable migrations, which is vital because it ensures that the meaning of our migrations doesn’t change when we upgrade Rails versions.
00:05:06.270
It used to be the case that if you had to revert an old migration after updating Rails, or for whatever reason, reapply an old migration, you might end up with a schema that differed from what the migration originally intended. This change was introduced to fix that problem.
00:05:31.230
Next, let’s discuss the change method. We’ll cover the semantics of change a bit later, but let’s dive into the body. The create_table statement is known internally as a schema statement. If you check the documentation for schema statements, you’ll find various commands like create_table, add_column, add_index, and more. You may also encounter some commands that might be unfamiliar, such as change_column_null and change_column_default.
00:06:03.129
These statements pack a considerable amount of meaning, and with each recent version of Rails, additional support has been added for more complex operations. For example, consider a single line adding a non-nullable user ID to the posts table. This statement effectively performs multiple operations: it adds the user ID column, sets it as non-nullable, adds an index for efficient lookups, and establishes a foreign key to ensure referential integrity.
00:06:34.060
In summary, when we run this migration using rake db:migrate, it generates SQL that varies depending on the database system. For instance, while MySQL produces one DDL statement, PostgreSQL may generate several statements, as indexing and foreign key constraints are handled separately. Nevertheless, as Rails developers, we shouldn't need to concern ourselves with these differences; it all happens behind the scenes.
00:07:10.020
The way we generate differing SQL from the same Ruby code is through what is known as the adapter pattern. This pattern allows us to adapt the interface of a class so that clients with incompatible interfaces can still work together. It provides a consistent interface through which elements like the ActiveRecord class can interact with various database systems.
00:08:05.000
We start with an abstract adapter that defines how Rails should interact with various database systems, which is implemented by the specific adapters for each database such as PostgreSQL and MySQL. This pattern is prevalent throughout Rails, from Action Cable to Active Job, and is crucial in any application that interacts with external dependencies.
00:09:03.240
Each connection adapter has various responsibilities, such as reflecting on the schema to determine tables, columns, and indices. For instance, when we create a string column in MySQL, it automatically assumes a default length, whereas in PostgreSQL, it recognizes it as character varying. These adapters ensure that the specific characteristics of each database system are translated appropriately into the Ruby code.
00:10:14.970
Now, how does Rails know which migrations to run during rake db:migrate? It all begins with the file name generated when creating a migration. The integer value on the left is a version, and the part on the right is the name you provide for your migration class. Rails relies on this versioning system to track which migrations have already been applied.
00:10:46.440
When running rake db:migrate, Rails selects all versions from the schema_migrations table, which it creates on demand when migrations are first run. It then compares this list with the available migration files, and any that exist on the file system but not in the migrations table are the ones that need to be executed.
00:12:20.580
After successfully applying a migration, Rails records this by inserting an entry into the schema_migrations table to prevent that migration from being run again. You can use the rails db:migrate:status command to view the status of your migrations, which reveals the cross-reference between the file system and the schema_migrations table.
00:13:38.670
If there's a status of 'down', it indicates that those migrations will run the next time you execute rake db:migrate. This output can reveal interesting results if performed on a production instance, especially if a migration exists in the system but not in the codebase.
00:14:41.700
What about rolling back migrations? If a migration is defined with separate up and down methods, the rollback process resembles what we’ve discussed previously. If the migration uses the change method, the idea is Rails should be able to infer the reverse operation from the change itself.
00:15:45.840
However, the reality is that there’s no such thing as a free lunch. What's actually paying for this ability is the command recorder. The command recorder captures all the commands that a migration runs and allows Rails to invert these operations during a rollback.
00:17:04.650
Each schema statement we want to invert necessitates a corresponding invert_ method that defines what that operation’s rollback would be. For instance, dropping a column can only be reversed if the type of that column is specified during its creation. Therefore, if we remove a column without specifying its type, we run the risk of creating an irreversible migration error.
00:18:47.790
After rolling back a migration, Rails will run the commands recorded in reverse order. However, if a column you want to restore had any constraints or default values originally set, you need to explicitly define these when writing the migration.
00:19:50.310
Let’s talk briefly about schema.rb. This file looks something like this and represents the entire current state of your database. You can load it into a fresh database using rake db:schema:load, which runs the contents as if they were a migration.
00:20:10.170
It serves as a reference for existing columns when writing queries, helping to determine if indexes are defined correctly.
00:20:30.180
The active_record/schema_dumper is responsible for creating schema.rb. This process is initiated by calling the dump method, which is broken into four main parts that reflect in the generated schema.
00:21:21.390
This includes collecting the header, defining ActiveRecord extensions, fetching the list of tables and columns, as well as foreign keys. The goal is to write executable Ruby code that can replicate the schema consistently.
00:22:13.230
The schema file also contains a comment block at the top claiming to be the authoritative source for your database schema. However, the actual database should be regarded as the authoritative source.
00:23:05.970
It's worth noting that running all migrations from scratch can be flawed and inefficient. This emphasizes the importance of using ‘rake db:schema:load’ for new instances to ensure accuracy without the overhead of all previous migrations.
00:24:06.640
Let's address the idea of migration rot. This occurs when the meaning of schema statements can change over time, or when migrations depend on the current state of your application’s code.
00:25:08.520
Another issue arises when migrations become overly reliant on external dependencies, leading to difficulties if those dependencies change. To mitigate migration rot, it’s prudent to minimize dependencies outside the state of the database when defining migrations.
00:26:38.150
There are known limitations to the migration DSL. For example, Rails added support for foreign keys as late as version 4.2, which is surprising given that foreign keys have been a database staple for decades.
00:27:39.420
Similarly, features like expression indexes and database functions are still not natively supported. While you can execute any SQL you require, those aspects won’t appear in your schema.rb file, creating a disconnect between code and database state.
00:28:50.650
That said, there have been gems like Foreigner and SchemaPlus that sought to bridge these gaps. Recently, I collaborated with a friend, Caleb Thompson, on a gem called Scenic, which adds support for database views in a Rails-friendly manner. This includes essential components like schema statements, a command recorder for down migrations, and modifications to the schema dumper.
00:30:09.480
Implementing support for views put me in a unique position to tackle some of the challenges Rails faces in extending schema functionalities. However, this journey has led me to question whether we should rely on Ruby migrations when databases could inherently manage such operations far better with structured SQL, and if there’s potential for Ruby migrations to evolve into something more comprehensive.
00:32:16.950
In preparing for this talk, I’ve found much value in Ruby migrations, with their promise of abstractions making tasks easier. And though rails developers may need to thoroughly understand SQL, they shouldn’t have to deal with the complexity of DDL SQL.
00:34:11.520
Should you find yourself writing migrations and forgetting critical details, using Ruby's migrations alongside robust schema management can pave the way for a smoother development process. Thank you for your time today! If you have questions or feedback, I'm eager to chat. You can also check out my podcast, which I host every week where I discuss similar technology topics. If you're interested in learning more about Thoughtbot, let me know. Thanks again!