Summarized using AI

Technically, a Talk

Eileen M. Uchitelle • September 16, 2020 • online

In her talk "Technically, a Talk" at rubyday 2020, Eileen Uchitelle, a staff engineer at GitHub, delves into the complexities and advancements of Rails' support for multiple databases. The session is framed as a love letter to Rails, where she emphasizes the importance of leveraging proper database management for robust application performance. Here's a breakdown of the key points discussed:

  • Introduction to Multiple Databases: Eileen introduces multiple databases as a core feature in Rails, explaining its necessity for scalability and growth in modern applications.
  • Implementation and Configuration: She details how developers can configure their applications to interact with multiple databases, including horizontal sharding and functional partitioning, each serving different purposes for alleviating database loads.
  • Rails Connection Management: A significant focus is placed on Rails' connection management. Eileen highlights the introduction of APIs designed to establish connections seamlessly and switch between them as required, providing both functional and performance benefits.
  • Database Configuration Objects: The session discusses new architectural approaches, particularly the transition from traditional configuration hashes to database configuration objects. This shift enhances Rails’ internal processes and simplifies interaction with database tasks.
  • Migrations and Rails Tasks: Eileen explores how Rails has evolved to support database migrations across multiple databases. This includes adjustments to tasks like creation, dropping, and migrating, ensuring a smooth user experience.
  • Automatic Connection Swapping: The middleware for automatic connection context switching based on request types is introduced, guaranteeing that users can read their own writes immediately while managing read and write loads effectively.
  • Collaboration and Community Impact: Eileen reflects on the collaborative efforts behind these features and expresses her admiration for the Rails community, emphasizing the user-centric approach in development.
  • Conclusion and Call to Action: She concludes with a heartfelt invitation for developers to consider how they can contribute to Rails, reinforcing the message that the framework ultimately exists to support its users and their applications. The journey toward enhancing Rails for multi-database support showcases the team's commitment to making the complexities of database management accessible and manageable for developers.

Overall, this talk not only provides technical insights but also reinforces the community spirit and commitment to continuous improvement within the Rails ecosystem.

Peer deep into Rails' database handling and you may find the code overly complex, hard to follow, and full of technical debt. On the surface you're right - it is complex, but that complexity represents the strong foundation that keeps your applications simple and focused on your product code. In this talk we'll look at how to use multiple databases, the beauty (and horror) of Rails connection management, and why we built this feature for you.

rubyday 2020 - Virtual edition, September 16th 2020. https://2020.rubyday.it/

Next edition: https://2021.rubyday.it/

00:00:48.239 Here we are for our next talk. We already discussed databases in the morning, comparing Active Record with a potential alternative. I wonder how many of you have dug into the innards of Active Record; there's plenty to understand in there. So let's welcome Eileen for our next live talk. Hello, Eileen! Nice to see you again! Hi, it's great to see you too! It's really weird to be giving a talk live from home. I know, right? It feels very intimate and comfy, but it also feels like people are looking at me from my house.
00:01:20.560 Yeah, totally. Okay, I can start now, right? Yeah, right on! If you want to, see you later. Awesome! Hi everybody! Ciao, buongiorno! I hope I didn't totally butcher that pronunciation. I just started learning Italian when I was first asked to keynote Ruby Day, and I challenged myself to learn some. However, I don't know enough to give a whole talk in Italian. I hope to visit Italy one day to see if I understand everything that I've learned.
00:02:20.560 I'm Eileen Uchitelle, and I'm a staff engineer at GitHub. I work on improving Rails and Ruby to help GitHub meet our scaling, reliability, and usability goals. I'm also a member of the Rails core team, which is responsible for deciding the future of the Rails framework, including when new releases should come out and what features we want to build and support.
00:02:40.400 If you want to get in touch with me, you can find me online at the handle @EileenCodes. Welcome to my talk titled "Technically, a Talk." This title is a bit of a joke; it's a play on how technical this talk is, but you'll also see that it's so much more than just a talk. Today, we're going to deep dive into how to use multiple databases, the design and architecture of the APIs, and how connection management works internally.
00:02:46.720 When I began working on this talk, I mapped out every single pull request and commit I was involved in that implemented multiple database support. I was impressed to find that over time, we amassed over 100 pull requests that added the functionality Rails needed to support database scalability and growth. We did work to improve migrations, configurations, and Rails tasks, and we built easy-to-use APIs for establishing multiple connections. Additionally, we created an auto-switching middleware.
00:03:18.400 As I review this timeline, I don't just see years of work; I see how much I care about this framework and our community. So while this is technically a talk, it's also so much more than that. This talk and this timeline represent my love letter to Rails and the community. I started working on multiple database support two and a half years ago. Before that, there were a few third-party gems that provided solutions, and at GitHub, we had written our own internal code for handling multiple databases.
00:03:55.200 From my experience at GitHub, I knew supporting multiple databases in Rails wasn't just a nice-to-have; it was a requirement for Rails' continued success as a modern web framework. We must keep improving the foundation that Rails provides for applications so that companies built on Rails can mature and grow with it for years to come. GitHub is one of Rails' many success stories; we weren't successful in spite of Rails; we thrived because of it. The changes we made allowed us to continue using Rails as our company grew.
00:04:36.720 There were many good reasons for us to upstream our database code to Rails. It meant we could delete thousands of lines of code from GitHub, making our code less complex, more modern, and more resilient. More importantly, it meant that we were finally sharing what we had built for Rails with the rest of the community. Today, I'm going to share that work with you. We'll first talk about what I mean when I say multiple databases, then look at how to set up your application, and finally dive into how the architecture and design work.
00:05:05.039 When I talk about multiple databases, I'm referring to the ability of your Rails application to connect to, write to, and select from more than one database in a single environment. In a standard new Rails application, there is one database per environment. The database for the current environment traditionally holds all of your tables that your application needs. As an application grows, the database can become too large to handle the necessary data and traffic effectively.
00:05:30.240 When that happens, there are many ways to alleviate pressure on your database. In this talk, I will focus on three ways we now support in Rails. One way to break up your data is by using horizontal sharding, which is also known as horizontal partitioning or multi-tenant sharding. Horizontal sharding involves horizontally splitting your tables by row. With horizontal sharding, each partition of your database contains the same schema but different data. This approach limits the number of rows in each shard or partition, thereby reducing the indexing and selection workload.
00:06:14.960 Another way of partitioning your data to alleviate pressure on your database is with functional partitioning. In this method, tables are split by function or need rather than by row. Functional partitioning means splitting whole tables into separate servers or clusters. Each partition may have a unique or different schema, and this method focuses on moving high-volume write tables out of your main database cluster into their own clusters to relieve pressure on the main database.
00:07:05.120 Both of these types of database partitioning can also support read replicas. Replicas are copies of the primary write database, used to reduce pressure on the write database for read queries. The goal is to send as many read queries, like selects, as possible to the replicas. Now that we understand what multiple databases are, we can look at how to set them up in your application. If you watched my RailsConf keynote, this was initially going to include a demo, but I cut it due to time constraints.
00:07:54.720 In the following slides, we have a recipes application. The application has one primary database with tables such as users, and a second database called meals that contains recipes. The meals database has three shards. The default shard contains the shard keys that are used to look up data from the other two shards, which contain all of that content. Both the primary and meals databases have replica databases.
00:08:47.360 To implement this database setup, we need to include the databases in our configuration file. We do this by adding a new tier to our database.yml and then adding entries for each of the writers and their replicas. This configuration example has been simplified to fit on the slide. Once we have our configurations, we can run the standard Rails database tasks such as create, drop, and migrate; all of these will work. The only thing left to do is set up our connection classes.
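As a rough sketch of the three-tier configuration described above, the YAML below is parsed with Ruby's standard library so the structure is visible. The slide itself is not reproduced in the transcript, so the entry names (primary, meals, meals_shard_one, and so on), the database names, and the adapter are all assumptions for illustration.

```ruby
require "yaml"

# Hypothetical config/database.yml for the recipes app:
# environment -> named entry -> connection settings.
DATABASE_YAML = <<~YAML
  production:
    primary:
      adapter: mysql2
      database: recipes_primary
    primary_replica:
      adapter: mysql2
      database: recipes_primary
      replica: true
    meals:
      adapter: mysql2
      database: recipes_meals
      migrations_paths: db/meals_migrate
    meals_replica:
      adapter: mysql2
      database: recipes_meals
      replica: true
    meals_shard_one:
      adapter: mysql2
      database: recipes_meals_shard_one
      migrations_paths: db/meals_migrate
    meals_shard_one_replica:
      adapter: mysql2
      database: recipes_meals_shard_one
      replica: true
    meals_shard_two:
      adapter: mysql2
      database: recipes_meals_shard_two
      migrations_paths: db/meals_migrate
    meals_shard_two_replica:
      adapter: mysql2
      database: recipes_meals_shard_two
      replica: true
YAML

config = YAML.safe_load(DATABASE_YAML)

# Entries marked replica: true are skipped by tasks such as create and migrate.
writers = config["production"].reject { |_name, settings| settings["replica"] }
puts writers.keys.inspect
```

The shard entries share one migrations path here because every shard has the same schema.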
00:09:18.560 The primary connection for our application is used for models that inherit from ApplicationRecord. We need to update the existing ApplicationRecord to be aware of both connections by using the connects_to API. Additionally, we need to implement a new abstract class, MealApplicationRecord, for our sharded models. Now, any model that inherits from MealApplicationRecord will use these databases instead of the primary database. That's all your application needs to use multiple databases.
00:09:50.480 We haven't yet explored how to switch shards or roles, but even without that, your application is already set up to read and write from more than one database per environment. We'll look at how to actually switch connections later on, but first, I'd like to address the architectural changes needed for Rails to support multiple databases. At GitHub, we initially wrote our own code for handling multiple databases, which involved monkey-patching and extending Active Record.
00:10:22.720 Once I understood what was broken in Rails and what needed fixing, I built APIs in Rails that were inspired by our code at GitHub. I altered the APIs significantly to better align with Rails' style, but effectively, the internal code behaves the same. This approach to building functionality is somewhat unique because I developed the public-facing code first and addressed internal issues second. This might sound backwards, but it allowed me to architect changes based on real-world applications instead of striving for a perfect but unnecessary use case.
00:11:05.120 In Rails, we focus on building easy-to-use, simple, and functional APIs that prioritize user experience. Our goal is to optimize for your happiness. The internal complexity is managed by Rails so you can concentrate on building your application. One of the first improvements we made for the user experience with multiple databases was in migrations. The Rails migrations and various tasks now understand how to handle multiple databases.
00:11:54.720 The scaffold, model, and migration generate commands can accept a database argument. This argument will look up the migrations paths you set in your database configuration. For sharding, we want to use a single directory and migrate it multiple times, so we can set each of the shard entries to use the same migrations path. We won't dive into the internal changes regarding migrations that allowed this to work because I previously discussed that at RailsConf 2018. The migration internals haven't changed since then, so if you're interested, you can find that talk on Confreaks or YouTube.
00:12:30.880 On the other hand, one of our most significant architectural shifts in Rails was turning database configurations into objects internally. Previously, database configurations were simply nested hashes which made sense and served us well when applications used just a two-level configuration. Rails would parse the database YAML and convert it into a two-level hash; it was straightforward to find the right configuration as it always corresponded to the current environment.
00:13:12.800 For instance, if we're in development, there's only one configuration that can possibly be selected. However, when Rails introduced a three-level configuration to support multiple databases per environment, it increased the complexity of the generated hash, breaking many of Rails' internal assumptions. When an application boots, it must connect to a database; therefore, Active Record includes a Railtie that establishes a connection to the configuration for the current environment. But if we have two or more configurations per environment, how does Rails know which one to connect to?
00:14:00.960 Since there's a possibility of having multiple writing databases per environment, this is a question we must answer. In addition, how can Rails comprehend that we have a three-level configuration and use that configuration to create named Rails tasks for each of the database entries? Constantly parsing hashes to determine our position in the stack is feasible but prone to error and adds unnecessary complexity. We wondered if we could parse the hash just once and know everything we needed about all configurations after the application booted.
00:15:10.720 This led us to develop a new database configuration object. The idea was that we could parse the configurations once, allowing for easier selection and enumeration of the objects created. Once the hashes were converted to objects, we were able to access all required details about the configurations, such as their associated environment, name, and connection requirements. The database configuration objects include an environment name, a name, and a configuration hash, representing the first, second, and third levels of the YAML respectively.
00:15:55.439 If you're using a URL configuration, Rails generates a UrlConfig instead of a HashConfig. The only difference is that the UrlConfig includes the original URL that was passed for database configuration. Database configuration objects are instantiated by Rails when the application boots: in an Active Record initializer, the configurations method creates a new DatabaseConfigurations object that loops through all the hashes, turning them into objects.
00:16:32.160 We won't delve into the build_configs method because it isn't particularly interesting. What results is an ActiveRecord::DatabaseConfigurations object that contains an array of configurations, making it simpler for us to iterate, enumerate, and select the appropriate configurations from the list without having to parse a three-level hash repeatedly. Once we have these configuration objects, we can query for all of the configurations for an environment using the configs_for method.
00:17:11.680 This is quite similar to using a hash and will return a list of configuration objects for the production environment. We can also get a single configuration by environment and name. For example, passing the environment production and the name meals will return the corresponding configuration object. By default, replicas are excluded from the configurations returned by configs_for because, generally, you don't need to perform any operations on the replicas: you're not going to create, drop, or migrate them, since their database names are the same as their writers'.
00:18:00.640 Instead, replicas should point at the same database as the writer. In production your databases will already be created; locally, using separate databases would require you to implement something that synchronizes data between the primary and the replicas, which can be quite challenging. If there's a reason you need to retrieve replicas from the configurations list, you can pass include_replicas: true to configs_for.
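The lookup behaviour described above (configs_for in Rails) can be modelled in a few lines of plain Ruby. This is a simplified stand-in, not the Rails implementation: the DbConfig struct, the DbConfigurations class, and the sample database names are assumptions made for the sketch. Only the idea comes from the talk: parse the three-level hash once, then select objects by environment and name, hiding replicas unless asked.

```ruby
# One object per third-level entry: environment name, entry name, settings hash.
DbConfig = Struct.new(:env_name, :name, :configuration_hash) do
  def replica?
    configuration_hash[:replica] == true
  end

  # Reader method, so callers never dig into the hash themselves.
  def database
    configuration_hash[:database]
  end
end

class DbConfigurations
  def initialize(three_level_hash)
    # Walk the nested hash once and flatten it into a list of objects.
    @configs = three_level_hash.flat_map do |env_name, entries|
      entries.map { |name, hash| DbConfig.new(env_name.to_s, name.to_s, hash) }
    end
  end

  # All configs for an environment, or a single one when a name is given.
  # Replicas are left out unless include_replicas is true.
  def configs_for(env_name:, name: nil, include_replicas: false)
    found = @configs.select { |config| config.env_name == env_name }
    found = found.reject(&:replica?) unless include_replicas
    return found if name.nil?

    found.find { |config| config.name == name }
  end
end

configurations = DbConfigurations.new(
  production: {
    primary:         { database: "recipes_primary" },
    primary_replica: { database: "recipes_primary", replica: true },
    meals:           { database: "recipes_meals" }
  }
)

puts configurations.configs_for(env_name: "production").map(&:name).inspect
```

Because the objects are built once, later callers can enumerate or filter them without knowing how deep the original hash was.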
00:18:41.680 These database configuration objects facilitate Rails' interaction with the database configurations. Once we've correctly implemented these objects for generating Rails tasks and the configurations, we started working on the internal APIs. We ensured that all internal methods in Rails utilized these database configuration objects to streamline how Rails interacted with the database configurations. The goal was to eliminate situations where Rails would reach into the configuration hash.
00:19:30.720 The objects encapsulate the hash; we never want to reach into the hash directly when the object can return the required value. We didn't want to convert all the hashes to configuration objects and then still need to access the hash directly; we want to obtain all the needed information through the objects themselves. In short, we needed to be able to ask the database configuration directly for its settings.
00:20:25.360 To accomplish this, we reviewed where we accessed the configuration hash to identify which keys or values Rails required, and which were only relevant to the database client. Essentially, Rails should never have to reach into the hash stored within the database config for values; it should always be able to request data directly from the object itself. To facilitate this, we added reader methods to the database configuration objects. This allowed us to remove all the hash accessors in Active Record and replace them with method calls on the object.
00:21:21.439 So, as an example of an internal change in Rails, switching from hashes to objects resulted in cleaner code and a single place to set defaults. The database configuration objects have been particularly beneficial in enhancing Rails' internal code and played a vital role in features like generating the Rails tasks. Active Record's database tasks are crucial for ensuring a good development and testing experience. Without these, you would need to execute numerous manual SQL queries just to set up your database.
00:22:14.960 Even though all internal support for multiple databases exists, the feature would be a failure without supporting Rails tasks. These tasks make using Rails applications enjoyable as they handle much of the work for you. The database objects were instrumental in creating these tasks because they allowed for enumeration, finding, and selecting the database configurations necessary for generating these tasks.
00:23:11.680 To support multiple databases within the Rails tasks, the first step was to update existing tasks to operate on all databases associated with an environment. To facilitate this, we modified the tasks to utilize the database configuration objects. As an example, the migrate task was revised so that it looks up the configurations for the environment and gets back an array of database configs. Since configs_for does not return replicas by default, we can assume that all these objects are writer databases that need migrating.
00:24:03.520 If replicas were included, migrating them would raise errors, since replicas should have a read-only user. Following this, we establish a new connection using the database config object. Since Rails tasks don't boot the application and are unaware of which models need migration, we must ensure that we connect to the correct database while executing these tasks. Once connected, Rails calls migrate, and afterwards we re-establish the original connection so that ActiveRecord::Base isn't left connected to the last database that was migrated.
00:25:46.240 In addition to supporting all the original database tasks, we introduced specific tasks for each individual database. While creating these tasks seems straightforward, it turned out to be much more complicated than I anticipated. When looking at the code that generates the named migrate tasks, you will notice that we use a method named for_each. The configuration objects are generated when load_config is called, but we needed to access the names (such as primary, meals1, meals2) before the configurations are loaded.
00:26:31.680 We needed to have access to these names before the environment was loaded because database configurations allow applications to add ERB and other environment-specific configurations into the YAML file. If we were to load the database YAML before the environment, an exception would occur. If we instead loaded the whole environment just to generate the tasks, Rails would take a significant performance hit. Thus, we needed to devise a method to parse the YAML without evaluating any ERB or loading the entire application.
00:27:24.160 To achieve this, we created a special YAML loader in Railties, which parses the ERB with a special class called DummyERB. This class has a sole responsibility: to replace any ERB in the YAML with an empty string. Consequently, we receive database configurations that, while unsuitable for directly connecting to your databases, provide sufficient information to extract the environment name and configuration name for generating named tasks.
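The idea behind DummyERB can be sketched without Rails. The snippet below is a stand-in, not the Rails class: it blanks out ERB tags with a regular expression (an assumption made to keep the example short), then reads only the names needed to build task names. The strip_erb helper and the sample YAML are hypothetical.

```ruby
require "yaml"

# Blank out every ERB tag instead of evaluating it, so no environment
# (ENV values, credentials, Rails itself) is needed to read the file.
def strip_erb(source)
  source.gsub(/<%.*?%>/m, "")
end

RAW_YAML = <<~'YAML'
  production:
    primary:
      database: recipes_primary
      password: <%= ENV.fetch("PRIMARY_DB_PASSWORD") %>
    meals:
      database: recipes_meals
      password: <%= Rails.application.credentials.meals_password %>
YAML

# The result cannot be used to connect (the passwords are gone), but the
# environment and entry names survive, which is all task generation needs.
config = YAML.safe_load(strip_erb(RAW_YAML))

task_names = config.flat_map do |_env_name, entries|
  entries.keys.map { |name| "db:migrate:#{name}" }
end

puts task_names.inspect
```

Evaluating the real ERB is deferred until the environment is loaded, at which point the full configuration objects are available.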
00:28:31.920 After invoking load_config, we can reference the actual configuration objects since the environment will already be loaded. This approach helps us avoid performance hits while generating task names, and yet we still acquire all we need for multiple databases. One of my favorite features of the multiple database work I accomplished is the connection APIs; these simple APIs enable the establishment of connections and the swapping of connection contexts, effectively powering multiple databases.
00:29:40.799 The connects_to API allows you to set up more than one connection on your models. This method is quite similar to establish_connection, except that you need to specify both the type of connection and the database configuration simultaneously. In a single database application, you don't have to call establish_connection in your application code; Rails does that for you upon booting. If you're using multiple databases, we need to inform Rails that there are additional connections that need to be established.
00:30:36.960 Depending on the types of connections required, there are two ways to establish connections with connects_to. If you're utilizing functional partitioning with a single primary and a single replica, you should pass a database hash to connects_to. The keys in this hash are the role names for your connections; by default, Rails expects writing for your primaries and reading for your replicas.
00:31:40.080 The values in the hash are the names of your database configurations, which must match the second tier of your database.yml exactly. connects_to should only be called on abstract classes; database clients have a finite number of open connections, so we want to ensure that each connection is only established once. Always set up your connections in an abstract class and have the models that need to talk to that database inherit from it.
00:32:57.360 You can also connect multiple shards and roles; however, the shards keyword argument and the database argument do not work together; you must use one or the other. In this example, default, shard_one, and shard_two are the shard keys. You must label one of the shards as default; if you don't, Rails will generate one for you, leading to duplicate connections. All other shards can be given any names, but the default shard should contain references to the other shards so you know which shard to look up your data in.
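The two call shapes described above can be written out as follows. To keep the sketch runnable without Rails, the classes inherit from a small stand-in (StandInBase, an assumption) that only records what connects_to was given; in a real application they would inherit from ActiveRecord::Base, and connects_to would establish the connections. The configuration names (primary, meals_shard_one, and so on) are assumed to match entries in database.yml.

```ruby
# Stand-in for ActiveRecord::Base: records connects_to arguments, nothing more.
class StandInBase
  class << self
    attr_accessor :abstract_class

    def connects_to(database: {}, shards: {})
      if database.any? && shards.any?
        raise ArgumentError, "pass database: or shards:, not both"
      end

      # Non-sharded connections are filed under the :default shard key.
      @connections = shards.any? ? shards : { default: database }
    end

    # Models see the connections declared on their abstract parent class.
    def connections
      @connections || (superclass.connections if superclass.respond_to?(:connections))
    end
  end
end

class ApplicationRecord < StandInBase
  self.abstract_class = true
  # role name => entry name in the second tier of database.yml
  connects_to database: { writing: :primary, reading: :primary_replica }
end

class MealApplicationRecord < StandInBase
  self.abstract_class = true
  connects_to shards: {
    default:   { writing: :meals,           reading: :meals_replica },
    shard_one: { writing: :meals_shard_one, reading: :meals_shard_one_replica },
    shard_two: { writing: :meals_shard_two, reading: :meals_shard_two_replica }
  }
end

class User < ApplicationRecord; end
class DinnerRecipe < MealApplicationRecord; end

puts DinnerRecipe.connections.keys.inspect
```

Declaring the connections once on the abstract class is what keeps each pool from being established more than once, however many models inherit from it.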
00:33:49.680 Similar to the database argument, the hash for each shard reflects the respective role and database configuration names. After establishing an active connection, your application models will need the capability to switch between roles and shards, which is facilitated through the connected_to API. This API allows you to swap between shards or roles, whether in a model, controller, or script; anytime you require a change of the connection context.
00:34:41.639 For instance, if you want to read data from your replicas, you can switch the role to reading. This configuration will now select data from the reading pools. The connection itself is determined by the model being loaded, such as User or DinnerRecipe. Rails knows that User inherits from ApplicationRecord and DinnerRecipe from MealApplicationRecord, allowing it to pinpoint the correct connection from the reading pools. If you attempt to write data from the reading role, Rails will raise a read-only error—all these safeguards apply regardless of whether replicas have a read-only user.
00:35:36.960 Moreover, connected_to can accept a prevent_writes argument, which induces a read-only error for any role. This feature may prove beneficial if your app is just beginning its use of multiple databases, ensuring you don’t mistakenly write when you intend to read before your replicas are fully operational in production. This is likewise handy if your role name differs from reading but you still wish to block writes.
00:36:32.720 Switching shards with connected_to is as straightforward as providing the shard key with the shard name you want to switch to. The shard key behaves similarly to the role key, looking up connections from the pool based on the context set in connected_to. If you pass only the shard without a role, Rails will assume you want the writing connection. To switch to the reading connection, you combine the role and shard together in connected_to.
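The block-scoped behaviour of connected_to can be illustrated with a toy model in plain Ruby. This is not the Rails implementation: the ConnectionContext module, its write! helper, and ReadOnlyError are assumptions for the sketch, and the model is not thread-safe. It follows the rules stated in the talk: a shard without a role means writing, the reading role blocks writes, and prevent_writes blocks writes for any role.

```ruby
class ReadOnlyError < StandardError; end

# Toy connection context: a stack of (role, shard, prevent_writes) entries.
module ConnectionContext
  Context = Struct.new(:role, :shard, :prevent_writes)
  @stack = [Context.new(:writing, :default, false)]

  class << self
    def current
      @stack.last
    end

    # Swap the context for the duration of the block, then restore it.
    def connected_to(role: :writing, shard: :default, prevent_writes: false)
      @stack.push(Context.new(role, shard, prevent_writes || role == :reading))
      yield
    ensure
      @stack.pop
    end

    # Hypothetical stand-in for any query that modifies data.
    def write!(description)
      raise ReadOnlyError, "write blocked: #{description}" if current.prevent_writes

      "wrote #{description} via #{current.role}/#{current.shard}"
    end
  end
end

# Shard only: the writing role is assumed.
puts ConnectionContext.connected_to(shard: :shard_one) { ConnectionContext.write!("recipe") }

# Role and shard together select the replica for that shard; writes are refused.
begin
  ConnectionContext.connected_to(role: :reading, shard: :shard_one) do
    ConnectionContext.write!("recipe")
  end
rescue ReadOnlyError => error
  puts error.message
end
```

In a Rails application the equivalent call would look like ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) { DinnerRecipe.first }, with the model's parent class deciding which pool is used.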
00:37:29.440 Now, let's delve into how these connection contexts operate internally in Rails. In a single database application, we use a single connection handler. By default, Rails always has a writing role. The writing handler retrieves connections by class name, and as we look at ActiveRecord::Base, we note that this class points to the pool config, which holds the connection specification name. This serves as an index back to the class cache key associated with ActiveRecord::Base.
00:38:30.880 Under this, the db config is the database configuration object belonging to the pool, along with the schema cache. Lastly, the pool config links to the connection pool, which is another object housing additional details. When you call a model from one of the meal databases, like DinnerRecipe, Rails identifies the connection based on its parent class name, which is MealApplicationRecord. Similarly to ActiveRecord::Base, that class maps to the pool config associated with the connection pool.
00:39:38.440 Upon switching connections from the writing to reading roles, Rails can locate the connection following the same process it applied for the writing handler. We first switch to the reading handler, enabling Rails to look up the connection from the pool relying on the parent class name. Given that Rails understands how to find the connection based on the class name, it becomes straightforward to retrieve connections from the pool once we modify the context.
00:40:26.720 As we examine this structure, you might find yourself pondering how this method supports shards, considering that connections are keyed out on the class name. If we were to add a second and third shard with the same class name, it would create challenges in identifying the correct connection when we switch shards. This is the very issue that blocked the addition of sharding to Rails.
00:41:29.040 To address this, we chose to create a new intermediary object called the pool manager. Instead of the handler mapping the class name directly to the pool config, the class name now points to a pool manager. The pool manager holds the shard keys passed into connects_to, and each shard key points to its own pool config. For standard non-sharded connections, we always use a key called default, which lets us maintain backward compatibility while enabling multiple connections for shards within a single class.
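The lookup chain with a pool manager in the middle can be sketched as nested hashes. This is a toy model under stated assumptions, not the Rails internals: the class and method names are hypothetical, and plain strings stand in for connection pools. The chain is handler (one per role), then class name, then pool manager, then shard key, then pool.

```ruby
# Holds one pool per shard key for a single connection class.
class PoolManager
  def initialize
    @pools_by_shard = {}
  end

  def set_pool(shard, pool)
    @pools_by_shard[shard] = pool
  end

  def get_pool(shard)
    @pools_by_shard.fetch(shard)
  end
end

# One handler per role; maps a connection class name to its pool manager.
class ConnectionHandler
  def initialize
    @managers_by_owner = Hash.new { |hash, owner| hash[owner] = PoolManager.new }
  end

  # Non-sharded connections always land under the :default key.
  def establish(owner_name, pool, shard: :default)
    @managers_by_owner[owner_name].set_pool(shard, pool)
  end

  def retrieve_pool(owner_name, shard: :default)
    @managers_by_owner.fetch(owner_name).get_pool(shard)
  end
end

handlers = { writing: ConnectionHandler.new, reading: ConnectionHandler.new }

handlers[:writing].establish("ApplicationRecord", "pool(primary)")
handlers[:writing].establish("MealApplicationRecord", "pool(meals)")
handlers[:writing].establish("MealApplicationRecord", "pool(meals_shard_one)", shard: :shard_one)
handlers[:reading].establish("MealApplicationRecord", "pool(meals_shard_one_replica)", shard: :shard_one)

# Same class name, different shards: the pool manager tells them apart.
puts handlers[:writing].retrieve_pool("MealApplicationRecord")
puts handlers[:writing].retrieve_pool("MealApplicationRecord", shard: :shard_one)
```

Callers that never mention a shard still work, because their lookups fall through to the default key.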
00:42:32.960 This intermediary mechanism enables us to utilize a public API for sharding without altering any existing public APIs your applications might be using. Don’t worry if you find the internal architecture a little difficult to follow; you do not need to grasp every detail to use multiple databases in your application.
00:43:36.880 Another feature provided by multiple databases in Rails is the middleware for automatic connection swapping. At present, only role swapping is supported, with shard support anticipated soon; however, that development has not yet materialized. This automatic role swapping middleware switches the connection context based on the type of request and how recent the last write was. The middleware guarantees read-your-own-write, meaning that if you write data, you will be able to read it because the requests are sent to the primary until it's safe to direct them to the replica.
00:44:44.880 The middleware does not guarantee that other users, who did not make the write, will see the new data immediately. The middleware is activated through Rails configuration, which by default sets a two-second delay after a write before read requests are sent back to the replica, to account for replication lag. You can adjust this delay to fit your application's needs. Furthermore, any of the classes within the middleware can be changed or replaced, as they are designed to be overridden. The database selector is initialized with the app, a resolver class, a context class, and options.
00:45:51.680 The resolver class determines which database role to switch to, while the context class holds the information used to decide when to switch. By default, the context for switching is stored in the session. When a request starts, the select_database method is called, which checks if the request is a read request, namely a HEAD or GET request. If it is, we call the resolver's read method; otherwise, we call its write method. The resolver's read method checks whether it is okay to read from the replica.
00:46:56.640 This check looks at whether enough time has elapsed since the last write. The elapsed time is calculated by subtracting the timestamp stored in the session from the current time, then checking that it is at least the delay set in the middleware options, which defaults to two seconds. If the last write was too recent, we read from the primary, or writing, database by passing the writing role to connected_to with writes prevented. If enough time has passed since the last write, we read from the replica using the reading role.
00:47:54.240 On the other hand, if we have a non-reading request, that is a PUT, PATCH, POST, or DELETE, we call write_to_primary, which uses the writing role with prevent_writes set to false. This method also updates the context's last write timestamp, which is used to determine when the last write occurred and whether it's safe to read from the replicas. As I mentioned earlier, the middleware is designed to be adaptable, enabling you to change or overwrite it according to your application's requirements.
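The decision the middleware makes on each request can be sketched in plain Ruby. This is a toy version under stated assumptions, not the Rails classes: SessionContext, Resolver, and resolve are hypothetical names, the session is a plain Hash, and the clock is passed in so the logic can be exercised without waiting. Each call returns the role and write guard that would be handed to connected_to.

```ruby
# Keeps the last write time in a session-like Hash.
class SessionContext
  def initialize(session)
    @session = session
  end

  def last_write_timestamp
    @session[:last_write] || Time.at(0)
  end

  def update_last_write_timestamp(now)
    @session[:last_write] = now
  end
end

class Resolver
  def initialize(context, delay: 2)
    @context = context
    @delay = delay # seconds to keep reading from the primary after a write
  end

  # Read requests: use the replica only once the delay has passed.
  def read(now: Time.now)
    if now - @context.last_write_timestamp >= @delay
      { role: :reading, prevent_writes: true }
    else
      { role: :writing, prevent_writes: true }
    end
  end

  # Write requests: always the primary, and remember when the write happened.
  def write(now: Time.now)
    @context.update_last_write_timestamp(now)
    { role: :writing, prevent_writes: false }
  end
end

# GET and HEAD are treated as reads; every other method is a write.
def resolve(resolver, http_method, now: Time.now)
  %w[GET HEAD].include?(http_method) ? resolver.read(now: now) : resolver.write(now: now)
end

resolver = Resolver.new(SessionContext.new({}))
start = Time.at(1_000)
puts resolve(resolver, "POST", now: start).inspect
puts resolve(resolver, "GET", now: start + 1).inspect
puts resolve(resolver, "GET", now: start + 3).inspect
```

Storing the timestamp per session is why only the user who made the write is guaranteed to read it back; a different context class, such as one backed by a cookie, would only need to provide the same two methods.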
00:48:56.160 For instance, if you prefer to use a cookie for your context, you can write your own cookie class within your application and change the Rails configuration to use your new class in the middleware. The middleware serves as a guide for implementing automatic connection swapping in Rails. It's intentionally small and doesn't cover every requirement your application might have, because replication lag and other database traffic patterns are unique to your application.
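As a rough sketch of what such a cookie class might look like: the interface below (a `call` class method plus `last_write_timestamp` and `update_last_write_timestamp`) is an assumption modeled on how the talk describes the default session context, and `CookieContext`, `COOKIE_KEY`, and `FakeRequest` are hypothetical names. A plain hash stands in for the cookie store; a real application would need to write through its response cookie jar.

```ruby
# Hypothetical cookie-backed context. The interface mirrors what the talk
# describes for the default session context and is an assumption.
class CookieContext
  COOKIE_KEY = "last_write"

  # Built once per request from the request object.
  def self.call(request)
    new(request.cookies)
  end

  def initialize(cookies)
    @cookies = cookies
  end

  # Time of the last write, stored as milliseconds since the epoch.
  # Falls back to the epoch when no write has been recorded.
  def last_write_timestamp
    raw = @cookies[COOKIE_KEY]
    raw ? Time.at(Integer(raw) / 1000.0) : Time.at(0)
  end

  def update_last_write_timestamp(now = Time.now)
    @cookies[COOKIE_KEY] = (now.to_f * 1000).round.to_s
  end
end

# Stand-in for a real request object, for trying the class outside Rails.
FakeRequest = Struct.new(:cookies)

request = FakeRequest.new({})
context = CookieContext.call(request)
context.update_last_write_timestamp
```

In a Rails 6 application, the class would then be named in `config.active_record.database_resolver_context` so that the middleware uses it in place of the session context.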
00:49:56.760 Before delving into the final portion of this talk, I want to take a moment to express my gratitude to several individuals. First, I want to thank the organizers for inviting me to speak at RubyDay and for providing a platform to share our presentations. Next, I'd like to acknowledge the two individuals who contributed immensely to these features with their tender love and support: John and Aaron. Interestingly, both of them are also speaking today.
00:50:47.960 Both played vital roles in building these features. For instance, the idea of database configurations as objects was actually Aaron's contribution. I also attribute the DummyERB class to him, while John helped me resolve numerous bugs I encountered in connection management. Additionally, the concept of the pool manager for sharding was his idea. Finally, I'd like to express my gratitude to all of you tuning in from home. While it's unfortunate we can't share a room together, I'm still glad to have this opportunity to speak with you.
00:51:46.320 When I first embarked on the journey of working with multiple databases, I had no idea how significant the effort would be. At GitHub, we had rolled our own multiple database setup years ago, which involved monkey-patching Active Record, devising custom methods for establishing connections, and writing our tailored Rails tasks. We had a collection of hacks to enable migrations, connection swapping, and an around filter for automatically swapping connections.
00:52:53.040 Since I wasn't responsible for the functionality we built at GitHub, I couldn't directly upstream our hacks. Instead, I needed to determine what Rails was lacking and what features we genuinely required. The existence of those features in GitHub didn't necessarily indicate that they were needed in Rails. I implemented most of the multiple databases work in a brand new application to properly assess the user experience, and it didn't take long to ascertain that even basic functionality was missing.
00:53:39.440 For example, I quickly found that migrations were entirely broken. There was no ability to modify the database connection while a migration was executing. We had introduced three-level config support for multiple databases per environment, but the feature wasn’t parsing the configurations accurately, causing applications to fail to boot. Furthermore, Rails lacked tasks for creating, dropping, migrating, or operating on more than one database per environment. Beyond simply missing essential features, there was no way to establish multiple connections per class.
00:54:31.440 There wasn't support for roles, shards, or replicas, nor was there any API or mechanism for automatic switching. In such situations, it’s easy to feel overwhelmed. I would be dishonest if I said I never faced frustration, confusion, or uncertainty regarding the best way to address an issue. And I wasn’t the first individual to attempt to incorporate first-class support for multiple databases into Rails; various gems provided similar functionalities.
00:55:14.960 Over the years, we attempted to prioritize this within Rails core. Yet, we often became bogged down by the pursuit of a perfect API and reconciling the internals without disrupting public APIs. I understood that this would be a substantial project. To mitigate that, I decomposed the issue into the features necessary to not only make multiple databases in Rails functional, but to create a positive user experience.
00:56:19.760 First, we needed migrations to know their paths and to carry the context of their connections. We transformed configurations into object representations, enabling the functionality of Rails tasks. We developed connection APIs, fixed numerous internal aspects, and eventually constructed a sharding API. Following that, an API pattern for automatic swapping was established. While there remains considerable work to be done, we've accomplished a lot in the past two and a half years.
00:57:06.080 If we examine the timeline closely and explore each of the pull requests, we can observe that from January 2018 to June 2019, the majority of enhancements made to Rails were focused on the public API. I paid substantial attention to applications and user experience to guarantee a smooth and accessible feature rollout. Had I rewritten the internals without a clear understanding of user experience or APIs, we would have either had to force the API to accommodate the work done or continually revisit the internal code.
00:57:58.240 Throughout the latter half of 2019 into this year, our efforts were concentrated primarily on private APIs and internals. Much of this work involved remedying bugs or inconsistencies within the non-public-facing code. We discarded undocumented classes, modernized pitch-hacker, and transitioned hash lookups to use the database configuration objects. My approach, tackling the public API first and the internals afterward, definitely stands out.
00:58:57.760 As software engineers, we often aspire to foresee each user need and future requirement while striving to mend every flaw first. We anticipate creating software that is not only resilient but also perfect. However, from time to time, it’s essential to acknowledge that Rails already has an adequate foundation. It was feasible for me to develop these public APIs first because the foundation was indeed strong enough. Rails’ foundation is why applications initiated on Rails can grow on Rails.
00:59:43.920 If we continue our efforts to enhance this foundation, Rails will continue helping your applications stay simple while it absorbs the complexity. Rails was developed with a robust foundation to accommodate your applications as they evolve from a brand new Rails app into one that handles millions of requests and stores terabytes of data. The majority of Rails’ functionality has been extracted from real applications running in production.
01:00:38.720 This is what renders Rails so stable. We build Rails to support your applications, not for imaginary perfect use cases. Constructing tools for perfect use cases doesn’t work when it involves supporting public, widely-used APIs. When we create open-source tools, we don’t build for ourselves, our companies, or our products; we build for everyone, including all of you.
01:01:31.680 It would be selfish for me to rewrite connection management according to my ideals without considering your needs. When developing open-source functionality, it’s crucial to think about what public APIs could potentially break and what scenarios might have been overlooked. We must remain flexible enough to adapt when plans don't yield expected outcomes without being so malleable that features never ship due to interminable specifications.
01:02:25.920 Finding this equilibrium is challenging yet pivotal when sustaining an open-source framework. When creating features for Rails, we aren’t doing it for ourselves or our egos or our needs; we create Rails for you. We developed multiple databases for you. Rails is uniquely positioned. Even if you feel that Rails has excessive features or you disagree with the APIs, or are unhappy with how the community operates, you would find it challenging to locate any other framework that values you more than Rails.
01:03:13.360 We genuinely optimize for your programming happiness. How many languages or frameworks worldwide can assert that they profoundly care about you? Were it not for this, Rails would simply be another framework among many. These other frameworks typically focus on bytes and widgets and other impersonal metrics. In contrast, Rails is built with your experience in mind, by people who care about your applications, your company, and you.
01:04:03.040 When I deliver presentations, I usually request something from the audience. In prior talks, I’ve urged you to evaluate your applications for technical debt that you may contribute upstream to Rails, or I’ve prompted you to upgrade your application, ensuring we continue to support Rails long-term. However, for this talk, I simply want you to recognize how meaningful it is to me that all of you are part of Rails and this community. I am sincerely grateful for each of you.
01:04:52.960 This timeline and this talk aren’t merely visual representations of the effort invested in multiple databases; they depict my deep affection for Rails and its community. While this may have been technically a talk, it’s actually a love letter. No matter what role you play—user, bug reporter, contributor—multiple databases are a triumph because Rails is a success due to you.
01:05:21.680 Your success is also my success. Thank you for allowing me to share this work with you. Okay, we're done. I’m uncertain about what’s next here, so please don’t worry about it.
01:05:51.560 Uh, I believe you can see in the chat how much people appreciate and love the work you've accomplished along with your colleagues. We are immensely grateful. I think I can speak for everyone when I say that one of the reasons we can still utilize Ruby and Rails in our daily jobs is due to your dedicated efforts and deep consideration.
01:06:35.679 What strikes me most is how precisely you have adhered to this direction—looking from the outside inward while keeping the users in mind. We are the users, and that strategy results in a superb experience when working with Rails. So thank you once more! We have a few questions, if you're open to them.
01:07:28.960 Yep! There was a question regarding whether multiple databases as a feature are available only in Rails 6 and later. I assume that is the case?
01:08:37.360 Yes, that's correct. It's exclusively available in Rails 6 or later.
01:08:51.640 That's just a consequence of when it was developed; we don't back-port features to previously released versions. Rails 5.2 had already been released when I started this work. Migration contexts work in 5.2, but nothing else does.
01:09:32.080 I believe the question was about whether there are tools that could potentially assist people using older versions of Rails to implement this feature. Do you have any useful information on that?
01:10:27.440 My recommendation is to upgrade and use the feature, because Rails 5.2 will soon no longer be supported. I recognize upgrades can be difficult; I’ve undertaken numerous upgrades. If anyone is interested in the GitHub upgrade experience, I haven’t discussed that since last year.
01:11:02.680 Although I understand that upgrades can be challenging, delaying them only complicates matters.
01:11:22.920 There are gems available; however, the issue is that the gems aren't compatible with how we designed the Rails features. Therefore, even if you use the Octopus gem, transitioning over to Rails' multiple databases might prove to be tricky. I don’t know anyone who has successfully made that conversion.
01:12:29.680 Okay, thank you! So if you need the feature, it's best to upgrade rather than attempt to implement something from a different gem. Otherwise, you could be creating a mess for yourself.
01:12:45.520 There were two other questions. One asked if it’s possible for Active Record to automatically use the replica for select queries and the primary for other queries, as an alternative to the connection swapping middleware.
01:14:01.680 We did consider that, but the challenge is that there's no means for Active Record to assess replication lag. The reason we chose to implement this method based on request type is simply that it’s the most effective solution.
01:15:02.560 This is how it functions at GitHub, and we also utilize GTID. We might also upstream GTID support—but that’s a separate layer that we are still figuring out. Therefore, on the query level, there's no way for Active Record to accurately determine if it’s safe to read from the replicas.
01:15:46.480 And should something happen to your replicas resulting in a five-second lag, we don’t want to direct everyone to the replicas without an option to disable it. So using request types is a better approach because it lets us make guarantees. This is how it works at GitHub: if you write to the database, say by opening an issue, we guarantee that you will see that issue immediately. If there’s a brief lag, other users won’t be able to see that issue until it has been replicated.
01:16:51.760 For comparison, usually, that lag is so minimal that no one even recognizes that it has occurred.
01:17:29.680 The second question pertains to whether this feature is suitable for multi-tenancy.
01:18:10.040 Yes, the sharding aspect corresponds to multi-tenancy. There are so many terms for it that I couldn’t cover them all in the talk, but I’m confident people will understand.
01:18:45.840 The sharding support is still somewhat new and doesn't yet offer automatic switching for that purpose. Consequently, you will need to utilize custom around filters. It’s not extremely complex; Zendesk actually has a multiple database gem that already features something like this that we might borrow from. However, we don't utilize the sharding API at GitHub, which complicates our ability to continue enhancing it without feedback from users.
01:19:36.240 If you’re employing it and you have insights to share, please let me know.
01:20:14.800 Okay, thank you immensely! Until next time!
01:20:20.960 No, thank you! Thanks, everybody!