RailsConf 2018

Up And Down Again: A Migration's Tale

Up And Down Again: A Migration's Tale

by Derek Prior

In the video "Up And Down Again: A Migration's Tale," Derek Prior delves into the intricacies of schema management in Ruby on Rails, particularly focusing on the migration system. The talk encompasses the history of migrations since their introduction in Rails 0.10.1, highlighting how they enable developers to manage database schema alongside code changes. Derek outlines the essential components of migrations, such as the migration DSL (Domain-Specific Language), and explains the process of applying and rolling back migrations.

Key Points Discussed:

  • Migration DSL: An introduction to the syntax and design of migrations in Rails, with a focus on the inheritance model and versioning of migrations, enhancing stability and clarity.
  • Applying and Reverting Migrations: An exploration of how Rails processes migration commands, the use of command records, and the necessity of making migrations reversible.
  • Schema Dumping: Overview of how the schema.rb file is generated, and its role as the authoritative source of the database schema, with notes on its limitations and the ongoing debate regarding reliance on migrations.
  • Common Shortcomings: Discussion of issues such as migration rot due to outdated dependencies and the limited support for certain database features within the Rails migration DSL.
  • Enhancing Migration Functionality: Presentation of solutions like the Scenic gem, which extends Rails migrations to support database views, foreign keys, and more.

Significant Examples:

  • Derek provides examples of how migrations can be structured, emphasizing the inclusion of types in column removals to maintain reversibility. He showcases how active record models can be employed in migrations, which aids in preserving the integrity of code and schema.
  • The importance of the adapter pattern is illustrated, demonstrating how Rails interacts with different databases (such as PostgreSQL and MySQL) to create a uniform experience for developers.

Conclusions and Takeaways:

  • A sound understanding of migrations not only aids in avoiding pitfalls but also helps developers harness the full potential of database interactions within Rails applications.
  • Migrations should be composed to minimize dependencies on external factors, with an emphasis on maintaining consistency across upgrades and changes in schema.
  • While Ruby migrations provide a higher-level abstraction that's beneficial for developers, one must also be open to structured SQL for more advanced functionalities.

Overall, the talk equips the audience with essential knowledge for better schema management practices in Rails, providing tools and approaches to mitigate common challenges faced during database schema evolution.

00:00:10.730 All right, welcome to 'Up And Down Again: A Migration's Tale'. Together, we're going to journey into the inner workings of Rails schema management. My name is Derek Prior.
00:00:16.590 If you have feedback or questions about this talk as it's ongoing, you can tweet me @DerekPryor, email me at [email protected], or afterwards, come up and introduce yourself. I’d love to talk to as many of you as possible.
00:00:29.099 I currently work as the development director at Thoughtbot, where I've been for five years. I want to take a quick moment to thank Thoughtbot for the remarkable opportunity to engage in activities like this and contribute to open source and our local communities.
00:00:40.290 A lot of people use our gems and are familiar with Thoughtbot, but they aren’t quite sure what we actually do. At Thoughtbot, we work with our clients to build great digital products. If you want to learn more about what it means to collaborate with us as a client, or what it means to work with me as a co-worker at any of our locations, please come see me afterwards, or send me an email or tweet.
00:01:02.910 So, we're here today to talk about schema management in Rails. The first version with what we’d recognize as migration support came in Rails 0.10.1, which was released towards the end of 2005. It’s actually the first open-source implementation of any conventional schema management system that allowed developers to progress their database schema alongside the code.
00:01:16.619 That pattern became so immediately useful that it’s hard to imagine a web framework that doesn’t allow for such functionality these days. So, I really wanted to dive in to see what makes it tick. Our agenda for today: First, we're going to look at what makes up the migration DSL. Then, we’ll explore what happens when we apply or revert a migration, how schema Ruby gets generated and what it is used for, and finally, where some of this falls short and how we might overcome those issues.
00:01:55.979 To dive into the anatomy of a migration, here's a common migration to create a table, specifically our posts table, which will have several columns with some restrictions. My intrigue really starts right here on line one with a relatively recent addition to migrations, which I think came in Rails 5. The reason why it’s interesting is that it looks like we’re inheriting from an instance of an array, which I can assure you is not code you’ve ever written anywhere else in Ruby or Rails.
00:02:21.459 To understand this, we need to crack open the migration class itself. We see that it defines the brackets method; so we're not actually indexing into an array, but calling this method with the version we passed to it. That method hands off to compatibility.find, which isn't an ActiveRecord.find call, but rather a call to the compatibility module method find.
00:02:37.660 What that does is take the version you pass in, turn it into a string, do a bit of error handling, and then return a constant. This mechanism is a way to turn a shorthand into a much longer form. The brackets are shorter and may be nicer to type, but I suspect this is primarily for aesthetic reasons, as how many of you are actually typing an entire migration class by hand?
00:03:00.760 What does the class actually look like under the hood? This is it—it’s empty! So we’ve gone a long way already, and we’re essentially inheriting from Migration directly. Why don’t we just inherit from Migration like we used to back in Rails 4 and earlier? The answer is that Rails won’t allow us. The inherited callback defined after we define the v5.2 constant effectively checks the direct superclass and rejects direct inheritance from it, so it's not supported anymore.
00:03:40.980 Why was this change made? It might help to answer that by looking at a slightly different example where, instead of inheriting from v5.2, we inherit from v5.1. If we examine its compatibility class, we can see it allows us to see how the behavior of migrations changes between releases. It encapsulates how a migration behaved up to and including that particular version, allowing us to understand the differences.
00:04:11.200 For instance, it shows that v5.2 changed how Postgres handles change_column, and similar changes can be found for MySQL. If we need to find out how those migrations behaved, we can dive into the code or use Git to trace changes over time. All this work was done in the name of having stable migrations, which is vital because it ensures that the meaning of our migrations doesn’t change when we upgrade Rails versions.
00:05:06.270 It used to be the case that if you had to revert an old migration after updating Rails, or for whatever reason, reapply an old migration, you might end up with a schema that differed from what the migration originally intended. This change was introduced to fix that problem.
00:05:31.230 Next, let’s discuss the change method. We’ll cover the semantics of change a bit later, but let’s dive into the body. The create_table statement is known internally as a schema statement. If you check the documentation for schema statements, you’ll find various commands like create_table, add_column, add_index, and more. You may also encounter some commands that might be unfamiliar, such as change_column_null and change_column_default.
00:06:03.129 These statements pack a considerable amount of meaning, and with each recent version of Rails, additional support has been added for more complex operations. For example, consider a single line adding a non-nullable user ID to the posts table. This statement effectively performs multiple operations: it adds the user ID column, sets it as non-nullable, adds an index for efficient lookups, and establishes a foreign key to ensure referential integrity.
00:06:34.060 In summary, when we run this migration using rake db:migrate, it generates SQL that varies depending on the database system. For instance, while MySQL produces one DDL statement, PostgreSQL may generate several statements, as indexing and foreign key constraints are handled separately. Nevertheless, as Rails developers, we shouldn't need to concern ourselves with these differences; it all happens behind the scenes.
00:07:10.020 The way we generate differing SQL from the same Ruby code is through what is known as the adapter pattern. This pattern allows us to adapt the interface of a class so that clients with incompatible interfaces can still work together. It provides a consistent interface through which elements like the ActiveRecord class can interact with various database systems.
00:08:05.000 We start with an abstract adapter that defines how Rails should interact with various database systems, which is implemented by the specific adapters for each database such as PostgreSQL and MySQL. This pattern is prevalent throughout Rails, from Action Cable to Active Job, and is crucial in any application that interacts with external dependencies.
00:09:03.240 Each connection adapter has various responsibilities, such as reflecting on the schema to determine tables, columns, and indices. For instance, when we create a string column in MySQL, it automatically assumes a default length, whereas in PostgreSQL, it recognizes it as character varying. These adapters ensure that the specific characteristics of each database system are translated appropriately into the Ruby code.
00:10:14.970 Now, how does Rails know which migrations to run during rake db:migrate? It all begins with the file name generated when creating a migration. The integer value on the left is a version, and the part on the right is the name you provide for your migration class. Rails relies on this versioning system to track which migrations have already been applied.
00:10:46.440 When running rake db:migrate, Rails selects all versions from the schema_migrations table, which it creates on demand when migrations are first run. It then compares this list with the available migration files, and any that exist on the file system but not in the migrations table are the ones that need to be executed.
00:12:20.580 After successfully applying a migration, Rails records this by inserting an entry into the schema_migrations table to prevent that migration from being run again. You can use the rails db:migrate:status command to view the status of your migrations, which reveals the cross-reference between the file system and the schema_migrations table.
00:13:38.670 If there's a status of 'down', it indicates that those migrations will run the next time you execute rake db:migrate. This output can reveal interesting results if performed on a production instance, especially if a migration exists in the system but not in the codebase.
00:14:41.700 What about rolling back migrations? If a migration is defined with separate up and down methods, the rollback process resembles what we’ve discussed previously. If the migration uses the change method, the idea is Rails should be able to infer the reverse operation from the change itself.
00:15:45.840 However, the reality is that there’s no such thing as a free lunch. What's actually paying for this ability is the command recorder. The command recorder captures all the commands that a migration runs and allows Rails to invert these operations during a rollback.
00:17:04.650 Each schema statement we want to invert necessitates a corresponding invert_ method that defines what that operation’s rollback would be. For instance, dropping a column can only be reversed if the type of that column is specified during its creation. Therefore, if we remove a column without specifying its type, we run the risk of creating an irreversible migration error.
00:18:47.790 After rolling back a migration, Rails will run the commands recorded in reverse order. However, if a column you want to restore had any constraints or default values originally set, you need to explicitly define these when writing the migration.
00:19:50.310 Let’s talk briefly about schema.rb. This file looks something like this and represents the entire current state of your database. You can load it into a fresh database using rake db:schema:load, which runs the contents as if they were a migration.
00:20:10.170 It serves as a reference for existing columns when writing queries, helping to determine if indexes are defined correctly.
00:20:30.180 The active_record/schema_dumper is responsible for creating schema.rb. This process is initiated by calling the dump method, which is broken into four main parts that reflect in the generated schema.
00:21:21.390 This includes collecting the header, defining ActiveRecord extensions, fetching the list of tables and columns, as well as foreign keys. The goal is to write executable Ruby code that can replicate the schema consistently.
00:22:13.230 The schema file also contains a comment block at the top claiming to be the authoritative source for your database schema. However, the actual database should be regarded as the authoritative source.
00:23:05.970 It's worth noting that running all migrations from scratch can be flawed and inefficient. This emphasizes the importance of using ‘rake db:schema:load’ for new instances to ensure accuracy without the overhead of all previous migrations.
00:24:06.640 Let's address the idea of migration rot. This occurs when the meaning of schema statements can change over time, or when migrations depend on the current state of your application’s code.
00:25:08.520 Another issue arises when migrations become overly reliant on external dependencies, leading to difficulties if those dependencies change. To mitigate migration rot, it’s prudent to minimize dependencies outside the state of the database when defining migrations.
00:26:38.150 There are known limitations to the migration DSL. For example, Rails added support for foreign keys as late as version 4.2, which is surprising given that foreign keys have been a database staple for decades.
00:27:39.420 Similarly, features like expression indexes and database functions are still not natively supported. While you can execute any SQL you require, those aspects won’t appear in your schema.rb file, creating a disconnect between code and database state.
00:28:50.650 That said, there have been gems like Foreigner and SchemaPlus that sought to bridge these gaps. Recently, I collaborated with a friend, Caleb Thompson, on a gem called Scenic, which adds support for database views in a Rails-friendly manner. This includes essential components like schema statements, a command recorder for down migrations, and modifications to the schema dumper.
00:30:09.480 Implementing support for views put me in a unique position to tackle some of the challenges Rails faces in extending schema functionalities. However, this journey has led me to question whether we should rely on Ruby migrations when databases could inherently manage such operations far better with structured SQL, and if there’s potential for Ruby migrations to evolve into something more comprehensive.
00:32:16.950 In preparing for this talk, I’ve found much value in Ruby migrations, with their promise of abstractions making tasks easier. And though rails developers may need to thoroughly understand SQL, they shouldn’t have to deal with the complexity of DDL SQL.
00:34:11.520 Should you find yourself writing migrations and forgetting critical details, using Ruby's migrations alongside robust schema management can pave the way for a smoother development process. Thank you for your time today! If you have questions or feedback, I'm eager to chat. You can also check out my podcast, which I host every week where I discuss similar technology topics. If you're interested in learning more about Thoughtbot, let me know. Thanks again!