Scalability

Summarized using AI

Keynote: Technically, a Talk

Eileen M. Uchitelle • April 24, 2020 • Couch Edition (online)

In the RailsConf 2020 keynote titled 'Technically, a Talk,' Eileen Uchitelle, a Staff Software Engineer at GitHub, delves into the intricacies of managing multiple databases within the Ruby on Rails framework. She begins by humorously addressing expectations for a highly technical presentation and transitions into a detailed exploration of multiple database support in Rails.

Key points discussed in the talk include:

- Complexity of Rails’ Database Management: Despite critiques of its complexity, this foundation allows applications to maintain simplicity while leveraging advanced features.

- Development Journey: Eileen highlights the extensive effort put into implementing multiple database functionality, noting over a hundred pull requests leading to the necessary improvements required for database stability.

- Connection Management: The talk explains the various methods for managing connections across multiple databases, including horizontal sharding and functional partitioning, alongside the role of read replicas to alleviate database load.

- Demonstration of Usage: Eileen provides a practical demo using a recipes application, where she sets up multiple databases, illustrates connection configuration, and shows how to perform migrations and switch connections effectively.

- Architectural Enhancements: Significant changes to internal APIs and database configuration management were introduced, transitioning from hash-based configurations to more structured database configuration objects to enhance readability and usability in complex setups.

- Focus on User Experience: The development process prioritized creating a smooth user experience, where public API constructs were built before internal changes were made, emphasizing the importance of real-world application needs.

- Community and Collaboration: Eileen reflects on the collaborative nature of Rails development, expressing gratitude to contributors and the community for their support in driving these advancements.

In conclusion, Eileen's talk serves as both a technical guide and a heartfelt tribute to the Rails community, underscoring the importance of continuous improvement in the framework to support growing applications. The key takeaway is the acknowledgment that the framework is designed not only for performance but also for programmer happiness, emphasizing the belief that everyone contributes to the success of the Rails ecosystem.

Keynote: Technically, a Talk
Eileen M. Uchitelle • April 24, 2020 • Couch Edition (online)

Keynote: Technically, a Talk by Eileen M. Uchitelle

"Peer deep into Rails’ database handling and you may find the code overly complex, hard to follow, and full of technical debt. On the surface you’re right - it is complex, but that complexity represents the strong foundation that keeps your applications simple and focused on your product code. In this talk we’ll look at how to use multiple databases, the beauty (and horror) of Rails connection management, and why we built this feature for you.
"
__________

Eileen M. Uchitelle is a Staff Software Engineer on the Ruby Architecture Team at GitHub and a member of the Rails Core team. She's an avid contributor to open source focusing on the Ruby on Rails framework and its dependencies. Eileen is passionate about scalability, performance, and making open source communities more sustainable and welcoming.

RailsConf 2020 CE

00:00:08.880 Well, hello there, everyone! I'm Eileen Uchitelle, and I'm a Staff Engineer on GitHub's Ruby Architecture Team. Our team works to improve code stability and resilience by looking for ways to improve the Ruby language and Rails framework to support GitHub for the long term. I'm also a member of the Rails Core team, responsible for deciding the future of the Rails framework, including when new releases should come out and what features we want to build and support. If you want to get in touch with me, you can find me online at the handle 'Eileen Codes.' Welcome to my online RailsConf keynote called "Technically, a Talk." This title is a little bit of a joke. When I was first asked to keynote at RailsConf, I heard that Evan Phoenix wanted a really technical keynote. He asked for a talk that was so technical that people would want to leave the room. Well, jokes on him; there’s no room for you to leave now!
00:00:31.899 Today, we’re going to talk about multiple databases. We're going to deep dive into the design and architecture of the APIs and how connection management works internally. When I first started working on this talk, I mapped out every single pull request and commit that I was involved in that implemented multiple database support. I was impressed to find that, over time, we had amassed over a hundred pull requests that added the functionality that Rails needed to support database stability and growth. We worked to improve migrations, improve database configurations, and build and enhance Rails tasks. We created easy-to-use APIs for establishing connections and developed an auto-switching API. As I look at this timeline, I don’t just see years of work; instead, I see how much I care about this framework and this community.
00:01:19.660 While this is technically a talk, it’s also so much more than that. This talk, as well as this timeline, is my love letter to Rails and to this community. Two and a half years ago, I started working on bringing multiple database support to Rails. There were a few third-party gems that provided solutions, and at GitHub, we had developed our own internal code for handling multiple databases. From my experience working at GitHub, I knew that supporting multiple databases in Rails wasn’t just a nice-to-have; it was a requirement for Rails' continued success as a modern web framework. We must continue to improve the foundation that Rails provides for applications.
00:01:56.979 Companies that were built on Rails mature and grow on Rails as well. GitHub is one of Rails' many success stories. We weren't successful in spite of Rails; we thrived because of it and because of the changes we made to continue using it. There were many good reasons for us to upstream our database code to Rails; it meant that we could delete hundreds of thousands of lines of code from GitHub. This would make our code less complex, more modern, and more resilient. More than that, it meant that we were finally sharing what we had built for Rails with the rest of the community. Today, I’d like to share this work with you.
00:02:34.600 First, we'll talk about what multiple databases are, then we’ll get into how to use them in your application, and then we'll deep dive into how the architecture and design works. Let’s get started! When I talk about multiple databases, I’m referring to the ability for your Rails application to connect to, write to, and select from more than one database in a single environment. In a standard new Rails application, there is one database per environment. The database for the current environment traditionally holds all the tables your application needs. Often, as an application grows, the database will become too large to effectively handle the data and traffic it needs to support.
00:03:10.390 When that happens, there are many ways to alleviate the pressure on your database. I'm not a database administrator, so I'm not going to discuss all the ways to partition your data. Instead, in this talk, I’m going to focus on the three ways that we now support in Rails. One way to manage your data is through horizontal sharding, also known as horizontal partitioning or simply sharding. Horizontal sharding involves taking your database and splitting the tables up by row. With horizontal sharding, each of the partitions of your database contains the same schema but different data. This approach helps limit the number of rows in each shard or partition, which reduces the number of rows that need to be indexed and selected from.
00:03:57.340 Another way of splitting your data to alleviate pressure on your database is through functional partitioning. In this method, tables are split by function or need rather than by row. Functional partitioning means splitting whole tables into separate servers rather than a set of rows. Each partition has a different schema, which focuses on moving high-volume write tables out of your main data cluster into their own clusters to reduce the load on the primary database. Both of these types of database partitioning can also support read replicas—readable copies of the primary writer database—to reduce pressure on the write database for select queries. The goal is to send as many queries as possible to the replicas. However, when a write happens, there’s often a slight delay called replication lag before the inserted data is copied to the replicas. In a well-functioning system, the replication lag is usually less than a second.
00:05:15.070 Now that we have an understanding of what multiple databases are, we can take a look at how to use them in our applications. In this demo, we have a recipes application that has one primary database containing tables such as users and a second database named meals that contains recipes. The meals database has three shards: the default shard containing the shard keys to look up data from the other two shards, and the two shards that contain our data. Both the primary database and meal shards have replica databases. Now that we know the structure of our databases, let’s look at how to set them up in our application.
00:05:45.820 First, we’ll take a look at our database UML. Here we see that our database schema has multiple entries under the development key. We have primary, primary replica, meals default, meals default replica, meals one shard, meals one shard replica, meals two shard, and meals two shard replica. This matches the database diagram we just looked at. Now that we have a database team to set up, let's create some databases. When we run DB create, we can see that Rails creates the databases for all of the writer entries under development and test. Since replicas use the same name as the writers, Rails won’t create them.
00:06:14.440 Next, we need to set up our connections. First, we open app/models/application_record.rb, and in this file, we’re going to add connects to for our writing and reading roles. 'Primary' is for writing and 'primary replica' is for reading. Pretty easy, right? Next, we need to create a new file for our meals database. You can call it whatever you want, but I like to keep the application record naming conventions, so we're going to call it meal_application_record.rb. New connection classes need to be in an abstract class so Rails knows that it’s not an Active Record model. If you’re working with shards, use the shards keyword argument instead of database. Each shard’s writing and reading configurations will then be specified.
00:07:01.660 Now let's create some models. First, we’ll create a scaffold for users for our primary database. This is the same process as creating a scaffold for any Rails application. Now we can see that it created the migration in the db/migrate directory. We’re only going to run migrations for this primary database by running db:migrate:primary. This is basically the same thing you would do for any application, except that we’re using the name tasks here because we want to specifically avoid running migrations for the meals database. Now, we’re going to create a scaffold for our meals database called dinner_recipe. The dinner_recipe has a title and a recipe, but you can also see that we’re going to pass a database argument, which informs the generator that we want to create migrations in a different directory. We need to choose one of our meals configurations as an argument, but it doesn’t matter which one.
00:07:56.689 We can see that Rails created the migration in the db/meals/migrate directory. Before we run the migration, we need to update our model to use the meal_application_record model so that all queries use the correct connection. Lastly, we're going to run rails db:migrate, which will migrate all three of the databases at once. So we’ve run our migration for dinner recipes three times for each shard. Now that all of our models and data are set up, let’s look at how to switch connections in our application. Rails ships with a method called connected_to, which is used to switch the current connection context. By default, Rails is set to the writing connection, and the default pool key is also set to default. We can change the connection context by passing the role 'reading' to connected_to.
00:08:47.049 This method will switch our role to reading. Next, we’ll do the same thing with the shard by passing the shard key. We will output the current role to see that it has switched to reading, and then we will output the current pool key to ensure it has also switched. This can be used together or separately, depending on what you need for your application. In addition to the ability to manually switch connections, Rails also has an auto-switcher that will automatically switch connections for you. All we have to do is put three lines in our application configuration. Now we're going to start the server. I have added logging to show us when we are on writing or reading roles so we can see that we’re on the reading role because of GET requests. We’re going to create a user and we’re on the writing role; we are still on the writing role. Now we are back to reading. What’s happening here is that after we performed that write, Rails will keep us on the writing connection for the delay that was set in our application configuration.
00:09:44.180 And that’s it; it’s pretty simple, right? That’s how to use multiple databases in your application. Now that we've seen how to use these in your app, let’s take a look at the design and architecture of the public APIs. I gave hope we had written our own code for handling multiple databases, which monkey patched and extended Active Record. Once I had a handle on what was broken in Rails and needed to be fixed, I built APIs in Rails inspired by our code at GitHub. I changed the APIs a fair bit to better match Rails style, but effectively, the internal code remained the same. This approach to building functionality is unique because I built the public-facing code first and fixed the internals second.
00:10:04.730 This may sound backwards, but it allowed me to architect changes based on real-world applications rather than a perfect but unnecessary use case. In Rails, we always focus on building easy-to-use, simple, and functional APIs that are 100% centered on user experience. Our goal is to optimize for your happiness. Rails internals take on complexity so that you can focus on building your application. One of the first areas we improved regarding the user experience from multiple databases was in the migration code. As you saw earlier in our demo, we can use the Rails generators with a database argument. That argument looks up the migration paths set in your database configurations for sharding. We always want to use one directory and migrate that multiple times so that you don't have to copy migrations after generating them.
00:10:46.510 We’re not going to deep dive into any internal changes for migrations that I made, as I discussed it two years ago at RailsConf 2018. The migration internals haven't changed since that talk, so if you’re interested in that work, you can watch it on Confreaks or YouTube. I would rather spend this time discussing some of the other fun features we have developed since 2018. One of our biggest changes to Rails to support multiple databases was to the database configurations. One of the first significant architectural decisions was to turn the internal database configuration hashes generated from your database configuration into objects. Hashes have served us well in applications using a two-level configuration: Rails would parse the database YAML and turn it into a two-level hash.
00:11:44.660 We could easily find the right configuration to use because it always corresponded to the current environment. If we were in development, there was only one configuration that could possibly be selected for development. Because of this, many of the internals of Rails assumed the configuration to connect could always be found by looking up the hash associated with the environment key. When Rails introduced a three-level configuration to support multiple databases, it increased the complexity of the generated hash and broke many of Rails' internal assumptions. When an app boots and it has to connect to a database, Rails includes a rel tie that establishes a connection to the configuration for that environment. However, if we have two or more configurations per environment, how does Rails know which one to connect to? By default, we can't just assume non-replicas, because in a multiple database application, there can be more than one writing database per environment.
00:12:35.300 In addition, how do we know we have a three-level configuration and how do we know to use that configuration to create the name tasks for each of the database entries? Constantly parsing hashes to figure out where we are in the stack is possible but error-prone and brittle, introducing unnecessary complexity. We wondered, what if we could parse the hash just once and know everything we needed about all of the configurations after the app was booted? We decided to introduce a new database configuration object. The idea was that we could parse the configurations once, turn every hash into an object and be able to more easily select and enumerate all those objects as they were created.
00:13:20.150 Once the hashes became objects, we knew everything we needed to know about the configurations, such as what environment they belong to, what their names are, and what their connection requirements are. We don't need to keep re-parsing the hash to keep track of where we are in the configuration levels. The database configuration objects look like this: the hash configuration object has environment identifiers, name, and a hash configuration that corresponds to the first, second, and third levels of the YAML, respectively. If you're using a URL configuration, Rails will generate a URL config instead of a hash config. It’s super similar to the hash configuration with all the same keys, except for the URL config that includes an instance variable for the original URL.
00:14:11.030 Database configuration objects are generated by Rails when the application is booted in an Active Record initializer. This configurations method instantiates a new database configurations object, which loops through all the hashes and turns them into objects. We aren’t going to deep dive into the build configs method here because, frankly, it’s not that interesting. I deleted ten slides showing how this works because they were boring. Maybe I’ll write a blog post about it if you find it interesting. The result returned to Active Record configurations is an object containing an array of configurations. These objects make it easier for us to iterate, enumerate, and select the correct configurations from the list without parsing a three-level hash multiple times.
00:14:58.930 Now that we have the configuration objects, we can ask Active Record about your favorite configurations using the configs_for method. This is similar to using a hash accessor and will return a list of configuration objects for the production environment. We can also specify configurations for an environment and a name. For instance, 'production' and 'meals_one' will return a matching configuration object for those parameters. By default, replicas are excluded from the configurations returned by configs_for, since in general you just don’t need to operate on them. Their database names should be the same as the writer, so we generally ignore them for the rake tasks. If you need to get replicas from the configurations list for any reason, you can pass include_replicas with true, and the replica configurations will be returned along with the primary ones.
00:15:51.520 These database configuration objects make it easier for Rails internals to interact with the database configurations and also provide a cleaner interface for applications to interact with them as needed. Once we had the objects working correctly for creating the rake tasks and configurations, we started working on the internal API. We wanted to ensure that all the internal methods in Rails also use the configuration objects. Since Rails was still using the configuration hash in the connection pool and other places internally, we needed to identify every place we were accessing values in the hash and add reader methods for those. This was required because we didn’t want to change the hashes to configurations and have to access the hash directly.
00:16:40.740 Instead, we wanted to be able to get all the information Rails needed from the configurations on the objects themselves. In short, we needed to be able to ask DB config directly for its database rather than reaching into the hash. To accomplish this, we looked at all the places we were using the configuration hash and figured out which keys were values that Rails needed to access. We ensured that Rails should never have to reach into the hash stored on the database config for values; it should always be able to ask the object directly for the data it needs. For each value that Rails needed, we added reader methods to the database config objects. This allowed us to eliminate all of the hash accesses in Active Record and turn them into real accessors.
00:17:30.110 Here’s an example of an internal change we made in Rails: by moving to use objects everywhere internally, we ended up with cleaner code and a single place to set defaults. The database configuration objects have been really useful for improving the internal code in Rails and were instrumental in some important features like generating the Rails tasks. Active Record’s database tasks are crucial for having a good development and testing experience. Without them, you would need to run a lot of manual SQL queries just to set up your databases. Even if all of Rails' internals supported multiple databases, the feature would have been a failure if we didn’t support rake tasks. Rake tasks make using Rails applications enjoyable because they take care of all the hard setup work for you.
00:18:19.550 The database objects were instrumental in creating the Rails tasks because they allowed us to enumerate, find, and select the database configurations needed to generate the tasks. The first thing we did to support multiple databases in Rails tasks was to update the existing tasks like create, drop, and migrate to operate on all the configured databases for an environment. To do this, we updated the tasks to use the database configuration objects. For example, in the migrate task, we first look up all the configurations for the current environment and return an array of the database config objects. Since configs_for doesn’t return replicas by default, we know that all the DB config objects here are for primary databases that need to be migrated.
00:19:06.520 If we return replicas, they would raise errors, because replicas usually have read-only users and are not able to run any create or migrate tasks. We then establish a new connection using the corresponding database config object. Since Rails tasks don’t boot the application and don’t know what models to use to run the migrations, we need to establish a connection so we’re connected to the correct database during these tasks.
00:19:55.470 Once connected, Rails calls migrate. Lastly, we establish a connection to the original configuration to ensure Active Record Base isn't connected to the wrong database when migrations are completed. In addition to supporting all databases with the original database tasks, we added individual tasks for each database name. Creating these named tasks turned out to be a lot more complex than ensuring the default tasks operated on all databases.
00:20:19.210 First, you'll notice that we call a method named 'for_each' before loading the configuration. The configuration objects are generated when load_config is called, but we need to access the names before we load the environment. This is because database configurations allow applications to add environment-specific configurations into the YAML file. If we were to load the database YAML before the environment and parse it, as Rails does, we’d get an exception. If we loaded the config to generate the tasks, Rails task would take a dramatic performance hit.
00:20:40.970 This creates a chicken-and-egg situation. We needed a way to parse the YAML without evaluating any ERB. After a lot of painful PRs, we ended up creating a special YAML loader in the Railties specifically for the rake tasks. This loader parses the ERB with a dummy ERB class, which serves only to replace any ERB in the YAML with an empty string, providing us database configurations that are not suitable for connecting to a database but giving us just enough information to have the environment and configuration name needed to generate Rails tasks. Once we call load_config, we can look up the actual configuration objects because the environment will be loaded. We use the dummy ones to create the names, then load the config and retrieve the real ones. This approach avoids a performance hit while generating the task names, yet gives us everything we need for multiple databases.
00:21:58.690 The rest of this task is similar to the main migrate task, so we won’t look at it closely. While the majority of Rails tasks are supported, there are a few that we don’t support. DB migrate_up, DB migrate_down, and DB rollback tasks cannot support multiple databases because operating on all targeted databases within an environment doesn’t make sense for these tasks. If you say to roll back two steps, should all the databases roll back, or just the one that was most recently migrated? We had a hard time answering those questions, so instead of implementing a potentially confusing solution, we decided to not implement these tasks at all.
00:22:40.350 If you call these tasks in particular in a multiple database application, they will raise an error that recommends using the named tasks instead. This gives your application the functionality it needs without implementing a confusing solution to the main tasks. One of my favorite features of multiple databases is the connection APIs. These simple APIs for establishing connections and swapping connection contexts are what powers multiple databases. The connects_to API enables you to establish more than one connection on your models. This is very similar to establish_connection, except that you need to specify the type of connection and the database configuration at the same time for all of your connections.
00:23:27.520 In a single database application, you don't need to call establish_connection in your application code because Rails does it for you in an Active Record Railtie when the app is booted. In an application with multiple databases, we need to tell Rails that there are more connections that need to be established. Depending on the types of connections we need, there are two ways of establishing connections with connects_to. If you’re using functional partitioning with a single primary and a single replica, you should pass a hash to connects_to. The keys in this hash represent the role names you want your connection to have. By default, Rails expects you to set your primaries to writing and your replicas to reading. There are ways to change this, but it’s not recommended.
00:24:26.850 The second part of the hash represents the database configuration names you want to use. These must match the second level of your database configuration exactly to set up the connections properly. Connects_to should only ever be called on abstract classes. Database clients have a limit to the number of open connections you can have, so you want to ensure you’re establishing each connection only once. Always set up your database connections in an abstract class and inherit models that need to connect to that database from the parent class. You can also connect to multiple shards and roles in a model with the shards keyword argument. Default shard one and shard two represent the shard keys; you must name one of the shards 'default,' otherwise Rails will automatically create one, and you may end up with duplicate connections.
00:25:21.480 All the other shards can have any name you choose. The default shard should be the shard that contains the references to the other shards so that you know which shard to look up your data from. Just like the database argument, the hash for each shard represents the role and database configuration name, respectively. Once you have an active connection, your application’s models will need to switch between roles and shards, which we provide that functionality via the connected_to API. This API enables you to switch between shards or roles in a model, controller, scripts, or wherever you need to change the connection context. Let’s look at how we can use the connected_to API to select data from your replicas.
00:26:39.520 This method will swap the reading role so that you can select data from the reading connection pools. The connection itself is chosen by the model that we're loading, such as user or dinner_recipe. We know that user inherits from application_record, a standard Active Record, while dinner_recipe inherits from meal_application_record, which will use that to select the connection from the pool. The connected_to API is used to switch the context in which you’re finding, deciding from which pool you are selecting data. If you attempt to write data to the reading role, Rails will raise a read-only error. This protection is built in regardless of whether your replicas have read-only users; it’s simply a safeguard we provide.
00:27:38.230 Additionally, connected_to can take a prevent_writes argument, which can raise a read-only error on any role for you. This might be useful if your app is just getting started with multiple databases and you want to ensure you're not writing when you mean to read before you’ve set up your data replicas in production. This is also useful if you’re using a role name that is something other than 'reading' and want to ensure writes are blocked. Switching shards with connected_to is as simple as passing the shard key with a shard name that you want to switch to. The shard key behaves similarly to the role key by looking up connections from the pool based on the context set in connected_to. If you only pass a shard with no role, Rails assumes you intend to use the writing connection.
00:28:28.480 Switching to the reading connection is just as easy as calling 'role' and 'shard' together with connected_to. If you switch to shard two and attempt to load a model without a connection established for shard two, you will receive a 'connection not established' error. This is because the user does not have a connection associated with shard two. If we establish that connection in application_record with connects_to, we would be able to find that one in the pool. Let’s take a look at how these connection contexts work internally in Rails. In a single database application, we have a single connection handler. By default, Rails will always have a writing role; the writing handler looks for connections by class name.
00:29:24.680 In Rails 6 and below, the class name lookup for Active Record::Base was set to primary, but in version 6, we changed it to match the class name since all of the other connections are looked up by class name, and we wanted consistency. The class name is mapped to an object called pool_config. The pool_config holds a reference to the connection specification name, which points back to the class cache key, the DB config, which is the entire database configuration object that belongs to the pool, and the schema cache. Lastly, the pool_config points to the connection pool itself. When you call a model in one of the meals databases like dinner_recipe, Rails looks up the connection based on the parent class name, which for dinner_recipe is meal_application_record. The class references the pool_config which then references the connection pool.
00:30:47.230 When you switch connections from writing to reading, Rails is able to look up the connection the same way as it does for the writing handler. All it's doing is switching the context and using the same lookup code to find the pool based on the parent class name. Since Rails knows how to find the connection based on the class name, it simply switches from reading to reading context and vice-versa. However, you may realize that this cannot support shards if we're always keying connections by class name; adding a second or third shard with the same class name would make it impossible to look up the correct connection when we swap shards.
00:31:37.270 How can we look up the correct connection if we do not have a unique identifier? This particular problem was a blocker for adding sharding support. We fixed this by creating an intermediary called pool_manager. Instead of pointing the handler to the class and the pool directly to the pool_config, we changed Active Record to have the class point to a new object called pool_manager, which is keyed by the shard key we pass into connects_to. For non-sharded connections, we simply continue using the default shard key. This is crucial because Rails will set one, thus allowing us to maintain behavior for non-sharded applications without breaking any APIs. This is how we managed to enable multiple connections on shards for a single class by having a different pool_manager for each pool_config belonging to meal_application_record.
00:32:38.030 This intermediary allowed a public API for sharding to be implemented, and we were able to solve this without breaking a single public API. Don’t worry if you didn’t follow this too closely. You don’t need to know the internal architecture of connection management to be able to use multiple databases in your application. As you saw in my demo earlier, one of the enhancements in Rails is a middleware for automatic connection swapping. Currently, only role swapping is supported in this middleware, but sharding support is expected soon.
00:33:19.630 The automatic role swapping middleware switches the connection context based on the request type and how recent the last write was. This middleware is designed to guarantee read-your-own-write, meaning that if you write data, we guarantee you'll be able to read it because we send your requests to the primary until it's safe to send them to the replicas. It doesn’t guarantee immediate reads for any user who didn’t perform the data update. The middleware can be activated through Rails configuration. By default, Rails will use a two-second delay before switching read requests back to the replica after a write to account for replication lag. You can change this to any number or calculation that suits your application. For more advanced requirements, any of the classes in the middleware can be overwritten, as they were specifically designed for that purpose.
00:34:11.780 Internally, the database selector middleware is initialized by an app resolver class, context class, and options. The resolver class determines which database role to switch to, and the context class is the context in which we switch. By default, the context for switching is stored in a session. When a request begins, the select_database method is called. In this method, Rails checks whether the request is a reading request. A reading request is defined as a GET or HEAD request. If it is, we call resolver_read; otherwise, we call resolver_write.
00:34:48.040 The resolver_read method checks whether it’s safe to read from the replicas or not. Read_from_primary checks whether the time since the last write is acceptable. This is calculated based on the context timestamp, which is, by default, the session’s last write timestamp, subtracted from the current time. We ensure that this value is less than or equal to the delay set by the middleware options, which defaults to two seconds. If the time since the last write is not okay, we read from the primary, bypassing the writing role and ensuring writes are prevented. If the time since last write is acceptable, we call read_from_replica, which uses the reading role.
00:35:58.060 Lastly, if we are performing a put, patch, post, delete, or any other kind of request that writes to the database, we call write_to_primary. The reading role and writing role are set while prevent_writes is set to false. The write_to_primary method also updates the context's last write timestamp, which is used to determine when the last write occurred so that we can read from the primary conditionally. The primary conditional also determines whether it’s safe to read from the replicas or not.
00:36:42.330 As I mentioned before, the middleware is designed to be extensible and can be changed or overwritten to meet your application's needs. If you prefer to use a cookie for your context, you can implement your cookie class in your application and then update the Rails configuration to use that new class. The middleware serves as a guideline for implementing automatic swapping in Rails. It’s purposefully concise, designed to not do everything your application may need because replication lag and other database traffic issues are unique to your application. We created something that provides guidance for automatic swapping without strictly enforcing the use of Rails defaults.
00:37:33.160 Before we get into the last portion of this talk, I want to take a moment to thank a few folks. First, I want to thank Ruby Central for inviting me to keynote RailsConf this year. You all do an amazing job, and I appreciate you. While we can't be physically in a room together, I’m glad you provided a way for me to share this work with everyone. Next, I'd like to thank my husband, Dave Uchitelle, who has spent countless hours listening to me go on and on about multiple databases, this talk, and Rails in general. He inspires me to excel and has always been supportive of my goals.
00:38:37.150 Next, I want to thank Searles, who has helped me with many talks over the years, especially this one. He’s the person I call when I can't get a talk to work, and every single time, it turns out better because of his help. I’d like to extend my gratitude to DHH for creating this framework that not only provided me with a fulfilling career but also an amazing community and some of my closest friends. I wouldn’t be here keynoting RailsConf and discussing this work without his influence.
00:39:29.260 It was Aaron's idea to turn the database hashes into objects, and this feature turned out so much better because of that. Thanks to John for his incredible partnership. We started collaborating to fix a bug in your configs one day, and we just never stopped. Many of the features and improvements rolled out in late 2019 and early this year were the result of work John and I did together. It was John's idea to implement the pool manager for sharding, and without him, we probably wouldn’t have been able to implement sharding successfully.
00:40:03.080 Lastly, I want to thank all of you watching at home from your couch. While it’s sad that we can’t be in a physical room together, I'm grateful that I still get to share this experience and talk with you. I hope that you have enjoyed it so far. When I first began working on multiple databases, I had no idea how much work it would entail. I could hope we had rolled out multiple database setup years prior and had monkey patched Active Record, adding custom methods to establish connections.
00:40:30.990 We wrote custom Rails tasks and had layers of hacks to make sharding work and around filter for auto swapping connections. Since I hadn’t written any of the functionality we had at GitHub, I couldn't immediately upstream our hacks; I needed to identify what Rails was missing and what features we actually needed. Just because it worked at GitHub didn't mean it was necessary or appropriate for implementation in Rails. I began with a brand new application to see what the user experience in Rails was truly like.
00:41:11.769 It didn’t take long to discover that even basic functionality was missing, and I quickly found that migrations were completely broken. There was no clear way to change the connection any migration was running on. There was a three-level config supported by Rails but the feature wasn’t parsing the configurations correctly, causing any application using this feature to fail to boot. There were no Rails tasks for creating, deleting, migrating, or operating on more than one database per environment. Beyond missing the basics that make Rails enjoyable, we didn’t have the capacity to establish more than one connection per class, nor was there support for roles, shards, or replicas.
00:41:56.950 Lastly, there was no API or pattern for an auto-switching API. In situations like this, it’s easy to feel overwhelmed, and I’d be lying to you if I said I never felt frustrated, angry, or unsure of the best way to fix something. I wasn’t the first person involved in Rails that tried to add first-class support for multiple databases. There are numerous gems that do similar work, and we had tried over the years to make it a priority on Rails core.
00:42:42.640 However, we often get hung up on the perfect API or how to fix the internals without breaking public APIs. I knew this was going to be a big project, so instead of attempting to fix everything at once, I broke the problem into features that needed to be developed to make multiple databases in Rails not just functional but also a positive user experience.
00:43:30.160 First, we fixed migrations to recognize migration paths, then we transformed configurations into objects, followed by making Rails tasks functional. Next, we focused on the connection APIs, improved many internals, and eventually developed a sharding API. Then we implement a pattern for automatic connection switching. There is still a lot of work left to do, but we’ve accomplished a significant amount in two and a half years.
00:44:05.640 If we zoom in on the timeline, we can see that from January 2018 to June 2019, the majority of the work focused on the public API. We heavily prioritized the application and user experience to guarantee a smooth and user-friendly feature. If we had rewritten the internals without knowledge of the user experience or API, we would either have forced the API to conform to the work we'd done or constantly revised our internal code.
00:44:58.740 For the latter half of 2019 into 2020, we concentrated mostly on private, internal APIs. Much of this effort was to rectify bugs or inconsistencies in non-public-facing code. We removed undocumented classes, rewrote the paid tracker to use a more modern Active Support fork tracker, and transitioned to hash lookups using the database configuration objects. Our approach of fixing the public API first and then working inward is somewhat unique.
00:45:40.170 As software engineers, we aim to anticipate everyone’s needs and future use cases, seeking to address all the broken internal components first. We aspire to build the most resilient and flawless software, requiring a robust foundation. But what if I told you Rails’ foundation is already strong enough? It would not have been possible for me to build these public APIs if it weren’t true. If that were not the case, it would have been unfeasible for anyone to monkey patch Rails for multiple databases or build their own gems.
00:46:17.780 Rails' sturdy foundation is why applications that start on Rails can grow on Rails. As long as we keep improving that foundation, Rails keeps your applications simple and less complex by absorbing that complexity for you. We built Rails as a solid foundation to support your applications as they evolve from Rails new into applications that can handle millions of requests and store terabytes of data. The majority of Rails’ functionality derives from real applications running in production.
00:47:06.720 This provides stability for Rails. We build Rails to support your applications; we don't build Rails for perfect imaginary use cases. Building for perfect use cases falls short when you need to back public APIs widely used in open source. We are not developing for ourselves or for our company’s products; we are building for everyone. It would have been selfish of me to rewrite connection management to be what I imagined would be perfect without considering your needs.
00:47:56.020 When creating these open-source tools, one must consider what public APIs might break and what use cases you may not have accounted for. You need to be flexible enough to pivot when your plans don’t pan out but not so flexible that features never get shipped. Striking this balance is critical for maintaining work in open source. When we develop features for Rails, we are not doing it for our ego or our own needs.
00:48:48.680 We built multiple databases for you. Rails is genuinely unique. Even if you believe Rails has too many features, or if you disagree with the APIs or dislike our management of the community, you’d be hard-pressed to find any framework that cares more about you than Rails. We literally optimize for programmer happiness. How many languages or frameworks can claim they genuinely care about you? Without this, Rails would just be another framework—most frameworks care only about bytes and widgets while Rails genuinely cares about you.
00:49:35.250 Not in the sense that it’s sentient, but it was designed with your experience in mind, considering your applications, your company, and your needs. I typically ask the audience, in past talks I've asked you to look for technical debt you can address to Rails, or I’ve encouraged you to upgrade your applications so that we can continue to support and utilize Rails long-term.
00:50:04.030 For this talk, I only want you to understand how much it means to me that you are all part of this community. I am immensely grateful to all of you. This timeline and this talk isn’t just a visual representation of the effort that went into multiple databases; it’s a visual representation of my love for Rails and this community.
00:50:35.360 While this RailsConf keynote was technically a talk, it’s actually a love letter. Regardless of your role, be it user, bug reporter, or contributor, multiple databases are a success because of you. Your success is my success. Thank you for allowing me to share this work with you.
Explore all talks recorded at RailsConf 2020 CE
+26