00:00:16.279
Let's begin by considering traditional Rails.
00:00:20.279
A single-column primary key can be thought of as your favorite bicycle—simple, effective, and perfect for day-to-day commutes.
00:00:25.400
Much like that bicycle, an auto-incremented integer ID is sufficient for many use cases. But what if you're planning a road trip that requires a more capable vehicle to get you there? This is where Rails 7.1 steps in with its native support for composite primary keys.
00:00:36.800
Much like having an SUV in your garage, it isn't a tool for your everyday use, but it becomes irreplaceable when dealing with complex relational data.
00:00:50.920
Hi everyone, I'm Nikita, and I'm based in Ottawa. I'm a Ruby on Rails developer, an open-source contributor, and a member of the Rails Triage team. I work at Shopify and am part of the Ruby on Rails infrastructure team, where I led the effort to introduce native support for composite primary keys.
00:01:08.080
We have a booth here at the conference, so come chat with us if you have any problems with Rails, open issues, or questions about this presentation.
00:01:27.840
In this talk, I would like to cover the fundamentals of composite primary keys and how they can be used in multitenant applications. We will then draw some parallels with sharding, specifically tenant-based sharding of multitenant applications. Finally, we will conclude with a look at Active Record 7.1's support for composite primary keys.
00:01:35.200
Let's begin with the fundamentals. Here, I'd like to discuss the definition of composite primary keys, provide a real-world use case, and consider when and why you might use them in your application. Before I begin, I want to underscore that being able to use composite primary keys does not mean you should use them in every case.
00:01:52.280
The decision to use composite primary keys should be driven by your database schema design. Once you decide to implement them at the database level, you can be confident that Rails will support them.
00:02:14.360
A composite primary key is a primary key made up of two or more columns that together uniquely identify a row in a table. Composite primary keys play a crucial role in database schema design, especially when representing complex relationships that cannot be represented with a single-column primary key.
00:02:38.280
For example, consider a student courses table whose rows are identified by the student ID and the ID of the course the student is taking. Now, let's discuss real-world examples of entities that can potentially be represented by a combination of their properties.
00:02:57.560
For instance, a car entity can be identified by a combination of make, model, and year, and a book by a combination of author and title. However, while these examples highlight the utility of composite primary keys, it doesn't mean you must model these concepts this way in your actual application.
00:03:17.120
The decision should be based on the specifics of the data you're working with and your schema design. Composite primary keys are merely a tool in your toolbox, and like any tool, they should be used appropriately.
00:03:35.200
Now, let's turn our attention to a much more practical use case for composite primary keys: join tables. Join tables are a natural fit for composite primary keys. By using a composite primary key made up of the two foreign keys to the related entities, we ensure that each row uniquely represents a single relationship.
00:03:47.200
For example, consider a blogging system where we typically have a blog posts table and a tags table. To represent the relationship between blog posts and tags, we create a join table, which represents a many-to-many relationship: a blog post can have many tags, and a single tag can be associated with many blog posts.
00:04:03.840
In this table, the combination of tag ID and blog post ID uniquely represents an association, making it a perfect use case for a composite primary key. When defining the composite primary key, we have to choose whether to place the tag ID or the blog post ID first.
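A minimal plain-Ruby sketch of that uniqueness guarantee, assuming a hypothetical in-memory `BlogPostsTags` class in place of a real join table (no Active Record involved): each ID may repeat freely on its own, but each exact pair may appear only once.

```ruby
require "set"

# Hypothetical in-memory stand-in for a blog_posts_tags join table whose
# composite primary key is [tag_id, blog_post_id].
class BlogPostsTags
  def initialize
    @keys = Set.new
  end

  # Records one tag/post association. Set#add? returns nil when the pair is
  # already present, which we treat as a primary key violation.
  def add(tag_id:, blog_post_id:)
    key = [tag_id, blog_post_id]
    raise ArgumentError, "duplicate key #{key.inspect}" unless @keys.add?(key)

    key
  end

  def size
    @keys.size
  end
end

joins = BlogPostsTags.new
joins.add(tag_id: 1, blog_post_id: 10)
joins.add(tag_id: 1, blog_post_id: 11) # same tag, different post: allowed
joins.add(tag_id: 2, blog_post_id: 10) # same post, different tag: allowed

begin
  joins.add(tag_id: 1, blog_post_id: 10) # exact pair again: rejected
rescue ArgumentError => e
  puts e.message # prints "duplicate key [1, 10]"
end
```

In a real database the primary key constraint enforces this; the sketch only mirrors the rule.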
00:04:31.720
Choosing one over the other requires considering the types of queries you plan on making. If you frequently query blog posts associated with a single tag, you will likely favor putting the tag ID first. Conversely, if you are typically interested in the tags associated with a specific blog post, you will favor putting the blog post ID first.
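To see why the leading column matters, here is a rough sketch in plain Ruby. It assumes the join table's rows are kept sorted by [tag_id, blog_post_id], the way a clustered index keeps them; the helper names are hypothetical. Rows for one tag form a contiguous run that binary search can find, while rows for one blog post are scattered.

```ruby
# Rows of a hypothetical blog_posts_tags join table, sorted by the
# composite primary key [tag_id, blog_post_id].
rows = [[3, 7], [1, 5], [2, 5], [1, 9], [3, 5], [2, 8]].sort

# tag_id is the leftmost key column, so one tag's rows are contiguous:
# binary-search for the first one, then read until the tag changes.
def post_ids_for_tag(rows, tag_id)
  start = rows.bsearch_index { |(t, _)| t >= tag_id }
  return [] if start.nil?

  rows[start..].take_while { |(t, _)| t == tag_id }.map { |(_, post_id)| post_id }
end

# A lookup by blog_post_id alone cannot use that ordering: the matching
# rows are spread out, so every row has to be examined.
def tag_ids_for_post(rows, post_id)
  rows.select { |(_, p)| p == post_id }.map { |(tag_id, _)| tag_id }
end

p post_ids_for_tag(rows, 1) # => [5, 9]
p tag_ids_for_post(rows, 5) # => [1, 2, 3]
```

Swapping the column order would flip which of the two lookups gets the cheap, contiguous path.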
00:04:50.080
To recap, join tables that represent many-to-many relationships are often the most natural fit for composite primary keys. However, not all concepts are suitable for composite primary keys in a relational database.
00:05:14.960
The use of composite primary keys should be driven by database schema design and not just by the fact that Rails has begun to support them. Next, let's discuss composite primary keys in multitenant applications.
00:05:39.320
In this section, we will examine Shopify’s monolith as an example of a multitenant application and observe how changing primary keys to composite primary keys on certain tables impacts performance. We will also explore the trade-offs we encountered when we switched certain tables to composite primary keys.
00:05:56.400
Before that, I'd like to remind you of one of the implicit Rails conventions: by default, every table automatically gets an auto-incremented integer ID column. This means that in a typical Rails application, most data is stored in the order in which rows were created.
00:06:10.920
For most tables in most applications, this makes a lot of sense and simplifies the understanding for application developers. However, in the case of Shopify, there is an additional organizational layer: every record in almost every table belongs to a particular tenant, meaning every record is semantically associated with a specific online store.
00:06:36.720
This association is facilitated through the addition of a tenant key column, named shop ID, in every tenant-related table. On this slide, records are color-coded based on their association with a particular shop.
00:07:03.240
In some cases, the pattern of row access in a table can differ significantly from the pattern of record insertion. At Shopify, records were inserted with a sequential, single-column primary key, so records belonging to many different shops are inserted into the database very quickly, one after another.
00:07:18.680
However, selecting records that belong to the same shop becomes complicated, because records from different shops are intermingled. Switching from a single-column primary key to a composite one that leads with the shop ID alters the data layout on disk.
00:07:36.640
It groups records by the shop they belong to, which simplifies selecting them. Although the intricacies of this data layout are outside the scope of this presentation, let me illustrate the concept with a simplified analogy.
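A toy model of the two layouts, in plain Ruby, may make this concrete. It assumes a hypothetical page size of four rows and twelve rows inserted round-robin by three shops, and counts how many pages must be read to fetch one shop's rows under each ordering; real storage engines are far more involved.

```ruby
PAGE_SIZE = 4 # rows per page; an arbitrary illustrative value

# Hypothetical rows inserted round-robin by three shops, as they might
# arrive in a busy multitenant app. Each row is { id:, shop_id: }.
rows = (1..12).map { |id| { id: id, shop_id: (id % 3) + 1 } }

# Number of pages holding at least one row for the given shop, when rows
# are stored in the given order.
def pages_touched(ordered_rows, shop_id)
  ordered_rows.each_slice(PAGE_SIZE).count do |page|
    page.any? { |row| row[:shop_id] == shop_id }
  end
end

by_id = rows.sort_by { |row| row[:id] }                             # single-column key
by_shop_then_id = rows.sort_by { |row| [row[:shop_id], row[:id]] }  # composite key

puts pages_touched(by_id, 2)           # => 3
puts pages_touched(by_shop_then_id, 2) # => 1
```

With the single-column key, shop 2's four rows land on every page; with [shop_id, id] they share one page.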
00:08:05.160
Imagine a bookshelf where books are arranged in order of purchase. Adding books is straightforward, as you simply place each new book in the next free slot. This is similar to appending records to a table with a single-column auto-incremented primary key.
00:08:31.760
However, if you need to find books by the same author, that becomes much more complicated. Even if you have a secondary index indicating each book's location, you'd still need to go through multiple shelves to retrieve them one-by-one, resulting in a time-consuming process.
00:09:06.800
Now let's consider a different approach, where books are organized by author. Selecting books becomes straightforward: you just locate the shelf for that author, and all of their books will be on the same shelf.
00:09:22.440
While locating books is simplified, adding a new book takes more time since you need to find the appropriate section for that author. Let's extend this analogy further.
00:09:38.960
Imagine books are grouped by authors and tightly packed to save space. This organization complicates adding new books even more, as you not only need to find the correct section but may also need to rearrange books to make space.
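The insertion side of the trade-off can be sketched the same way. Assuming a sorted Ruby array stands in for rows ordered by their primary key, the hypothetical `insert_sorted` helper below reports how many existing entries end up after the new one, a rough proxy for the rearranging a tightly packed index has to do.

```ruby
# Inserts key into an already sorted array and returns how many existing
# entries now sit after it (a crude stand-in for "books that had to move").
# Arrays are not Comparable in Ruby, so the comparison goes through <=>.
def insert_sorted(sorted, key)
  index = sorted.bsearch_index { |other| (other <=> key) >= 0 } || sorted.size
  sorted.insert(index, key)
  sorted.size - index - 1
end

# Hypothetical existing rows as [shop_id, id] pairs.
existing = [[1, 3], [2, 1], [3, 2], [1, 6], [2, 4], [3, 5]]

by_id = existing.map { |(_, id)| id }.sort # single-column key order
by_shop_then_id = existing.sort            # composite [shop_id, id] order

new_row = [1, 7] # shop 1 creates a row that receives the next id, 7

moved_by_id = insert_sorted(by_id, new_row.last)
moved_by_shop = insert_sorted(by_shop_then_id, new_row)

puts moved_by_id   # => 0 (appended at the end)
puts moved_by_shop # => 4 (lands inside shop 1's section)
```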
00:10:11.120
The same trade-off applies to database tables. When we applied this principle to Shopify's monolith, we observed a drastic improvement in query times: the most commonly used queries became five to six times faster.
00:10:41.760
Slow query logs reflected an 80% reduction in slow queries for tables that switched to composite primary keys. The remaining slow queries became ten times faster, and the overall time taken by slow queries also dropped.
00:11:05.440
In some extreme cases, performance jumped significantly; however, these were often for unique, low-volume queries. One notable downside was a degradation in insert performance, which we observed to be roughly ten times slower.
00:11:29.440
However, this trade-off was worthwhile as we found that most data is queried far more frequently than it is inserted. If your system has a high insertion rate, you may want to reconsider using composite primary keys for that particular table.
00:11:58.320
Let's recap: database tables in a multitenant application can benefit from composite primary keys when the tenant key is included in the composite primary key. Switching to a composite key will likely speed up selects and updates, albeit at the cost of slower insertion rates.
00:12:19.200
For applications where the speed of data insertion is critical, composite primary keys may not be optimal. Moving on, let's look at the similarities between composite primary keys and tenant-based sharding.
00:12:38.960
Tenant-based sharding is a strategy employed to scale multitenant applications like Shopify. In this approach, data is partitioned into shards based on the tenant ID, spreading it across multiple database instances.
00:12:59.040
It's crucial that data belonging to the same tenant resides on the same shard. In this presentation, we will cover two high-level designs of sharding.
00:13:20.080
The first design implies that the sharding solution operates within the application itself. We've already heard about one example at this conference: Solid Cache uses this approach to determine which shard to read data from based on a hash of the key.
00:13:45.520
The second design assumes that the sharding solution is a standalone service that the application connects to as if it were a relational database. The sharding solution itself decides which shard to read data from or write data to.
00:14:03.440
In this presentation, we will focus on the standalone solution. It imposes an essential requirement on the application: the tenant key (the sharding key) must always be present in SQL queries, so that they can be routed to a single shard instead of being scattered across all shards.
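That requirement can be sketched as a tiny router. Everything here is hypothetical: `TenantRouter` is not any real sharding product's API, queries are reduced to a Hash of WHERE conditions, and CRC32 is used only because it gives a stable hash; real standalone solutions use their own routing schemes.

```ruby
require "zlib"

# Hypothetical router mimicking what a standalone sharding layer expects:
# every query must carry the tenant key so it can go to exactly one shard.
class TenantRouter
  class ScatterQueryError < StandardError; end

  def initialize(shard_count)
    @shard_count = shard_count
  end

  # Maps a tenant to a shard. CRC32 is stable across processes, unlike
  # String#hash, so the same shop always lands on the same shard.
  def shard_for(shop_id)
    Zlib.crc32(shop_id.to_s) % @shard_count
  end

  # conditions: the column => value pairs from a query's WHERE clause.
  def route(conditions)
    shop_id = conditions[:shop_id]
    raise ScatterQueryError, "query would hit all #{@shard_count} shards" if shop_id.nil?

    shard_for(shop_id)
  end
end

router = TenantRouter.new(4)
shard = router.route(shop_id: 42, id: 1001)
same = router.route(shop_id: 42, id: 2002)
puts shard == same # prints "true": one tenant's rows live on one shard
```

A query without `shop_id` raises instead of silently fanning out, which is the behavior the standalone design pushes the application toward.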
00:14:21.440
You might have noticed that composite primary keys and tenant-based sharding share similar SQL requirements. They both require two or more columns to be present in SQL, whether those columns represent a unique association or a tenant key.
00:14:41.760
Thus, implementing a composite primary key may serve as a foundation for tenant-based sharding. While it's not strictly necessary, including a tenant key in the primary key allows you to route data on a per-tenant basis.
00:15:04.720
This effectively separates data belonging to different tenants. Implementing composite primary keys that include the tenant key can simplify future transitions to tenant-based sharding.
00:15:25.200
One such example of a standalone sharding solution is the one used by GitHub and Shopify, which I will not be able to cover in depth. Let's recap the similarities between models using composite primary keys and tenant-based sharding.
00:15:48.360
Both require multiple columns to be present in queries. A composite primary key that includes the tenant key may lay a foundation for sharding, but adopting one is not necessarily a step toward a sharding strategy.
00:16:00.800
Now, let's discuss Active Record's support for composite primary keys. In this section, I will describe how to define a model with a composite primary key, work with that model, and define a virtual primary key.
00:16:33.040
We will also cover some limitations and recommendations for using composite primary keys. For our example, let's imagine a model called travel_route, which can be uniquely identified by its origin and destination.
00:16:54.560
In this table, both columns will be part of the composite primary key. We will define this primary key using the `primary_key` option of `create_table`. The model itself can remain empty, as Rails derives the primary key from the schema.
00:17:05.240
As a result, the model's `primary_key` returns all of the key columns, and the `id` method returns the full identifier as an array containing the values of each composite primary key column.
00:17:40.720
The `find` method accepts the full identifier for your model, while methods like `reload`, `update`, and `delete` use every part of the composite primary key when performing queries.
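In Rails 7.1 the schema side is `create_table :travel_routes, primary_key: [:origin, :destination]`, after which `TravelRoute.primary_key` returns `["origin", "destination"]` and a lookup looks like `TravelRoute.find(["Ottawa", "New York"])` (the city values are made up). The plain-Ruby sketch below, built around a hypothetical `where_for_identifier` helper that is not a Rails API, shows the idea behind such a lookup: every key column is paired with its value, and a partial identifier is rejected.

```ruby
# Hypothetical helper (not a Rails API): builds the WHERE clause that a
# lookup by full composite identifier needs, one equality per key column.
def where_for_identifier(primary_key, identifier)
  unless identifier.is_a?(Array) && identifier.size == primary_key.size
    raise ArgumentError, "expected #{primary_key.size} values, got #{identifier.inspect}"
  end

  clause = primary_key.map { |column| "#{column} = ?" }.join(" AND ")
  [clause, *identifier] # SQL fragment followed by its bind values
end

primary_key = %w[origin destination]
p where_for_identifier(primary_key, ["Ottawa", "New York"])
# => ["origin = ? AND destination = ?", "Ottawa", "New York"]
```

The SQL Active Record actually generates may be formatted differently; the point is that no single column is enough to address a row.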
00:18:01.520
There is also a more specific use case, which is particularly common for tables that began with a single-column primary key and are now considering migrating to a composite one.
00:18:22.920
For example, consider a comments table with an auto-incremented `id` column as well as a `blog_id` column. For performance reasons, we might decide to create a composite primary key from these two columns.
00:18:47.760
This change makes our comment model behave like our earlier model: the `id` method returns a full identifier consisting of the blog ID and the auto-incremented ID populated on creation.
00:19:04.240
However, this conflicts with how we read the value of the id column itself, because Rails treats `id` as the model's identifier rather than as a plain accessor for the id column.
00:19:28.200
Our recommendation is to avoid naming a column ID as part of your composite primary key. Furthermore, it’s advisable to refrain from including an ID column in your table unless you intend to use it as your sole primary key.
00:19:47.760
To address this issue, Rails 7.1 introduces a new method, `id_value`, which returns the value of the id column regardless of whether it is part of the composite primary key or where it sits within it.
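With a Rails 7.1 model whose primary key is `[:blog_id, :id]`, `comment.id` returns both values and `comment.id_value` returns only the id column. The plain-Ruby class below is a hypothetical stand-in (not Active Record) that mimics just those two readers, to make the distinction concrete.

```ruby
# Plain-Ruby stand-in (hypothetical, not Active Record) for a comment whose
# composite primary key is [blog_id, id].
class CommentRecord
  PRIMARY_KEY = %i[blog_id id].freeze

  def initialize(attributes)
    @attributes = attributes
  end

  # Like Active Record with a composite key: `id` is the model identifier,
  # so it returns every primary key value, in key order.
  def id
    PRIMARY_KEY.map { |column| @attributes.fetch(column) }
  end

  # Reads the raw id column, whatever its role in the primary key.
  def id_value
    @attributes.fetch(:id)
  end
end

comment = CommentRecord.new(blog_id: 3, id: 58, body: "Nice post")
p comment.id       # => [3, 58]
p comment.id_value # => 58
```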
00:20:08.720
When it comes to bulk loading records with composite primary keys, Rails extends the `where` method to support a query-by-tuple syntax: a hash whose key is an array of column names and whose value is an array of tuples, one per composite identifier.
00:20:31.760
For our travel_route model, the key lists the origin and destination columns, and the value lists the identifier of each route to load.
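In Rails 7.1 this reads as `TravelRoute.where([:origin, :destination] => [["Ottawa", "New York"], ["Toronto", "Montreal"]])` (values made up). The sketch below uses a hypothetical `tuple_where` helper, not the Active Record implementation, to show one portable way such a condition can expand into SQL: one AND-group per tuple, joined with OR. The exact SQL Rails emits may differ.

```ruby
# Hypothetical helper (not Active Record): expands
# { [col_a, col_b] => [[a1, b1], [a2, b2]] } into SQL with bind placeholders.
def tuple_where(condition)
  columns, tuples = condition.first
  raise ArgumentError, "no tuples given" if tuples.empty?

  tuples.each do |tuple|
    next if tuple.size == columns.size

    raise ArgumentError, "tuple #{tuple.inspect} does not match #{columns.inspect}"
  end

  group = "(#{columns.map { |column| "#{column} = ?" }.join(' AND ')})"
  sql = Array.new(tuples.size, group).join(" OR ")
  [sql, *tuples.flatten(1)] # SQL followed by the bind values, tuple by tuple
end

condition = { %i[origin destination] => [["Ottawa", "New York"], ["Toronto", "Montreal"]] }
sql, *binds = tuple_where(condition)
puts sql   # prints "(origin = ? AND destination = ?) OR (origin = ? AND destination = ?)"
p binds    # => ["Ottawa", "New York", "Toronto", "Montreal"]
```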
00:20:53.440
The new syntax is simple and should feel familiar. When associating a model with a composite primary key, Rails 7.1 introduces a `query_constraints` option on associations that serves as a composite foreign key.
00:21:16.240
Expanding our example, let's create a travel_route_reviews table with route origin and route destination columns that together serve as a composite foreign key. To associate each review with its travel route, we define a `belongs_to` association.
00:21:40.120
By passing both columns as an array to the `query_constraints` option, we instruct the association to use them as a composite foreign key. The route's composite primary key is derived automatically, so the association works like any other Rails association.
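In Rails 7.1 the association would be declared along the lines of `belongs_to :travel_route, query_constraints: [:route_origin, :route_destination]`, where the column names are assumptions based on the description above. The plain-Ruby sketch below uses hypothetical `Struct` stand-ins rather than Active Record models to show what that composite foreign key does: the review's two columns, read in key order, form the tuple that addresses its route.

```ruby
# Plain Struct stand-ins (hypothetical, not Active Record models).
TravelRouteRow = Struct.new(:origin, :destination, keyword_init: true)
ReviewRow = Struct.new(:route_origin, :route_destination, :rating, keyword_init: true)

# The review's foreign key columns, in the same order as the route's
# composite primary key [origin, destination].
FOREIGN_KEY = %i[route_origin route_destination].freeze

# Index routes by their composite primary key.
routes = [
  TravelRouteRow.new(origin: "Ottawa", destination: "New York"),
  TravelRouteRow.new(origin: "Toronto", destination: "Montreal")
].to_h { |route| [[route.origin, route.destination], route] }

# Resolves a review's route by reading every foreign key column, in order,
# and looking up the resulting tuple; returns nil when no route matches.
def travel_route_for(review, routes)
  routes[FOREIGN_KEY.map { |column| review[column] }]
end

review = ReviewRow.new(route_origin: "Ottawa", route_destination: "New York", rating: 5)
route = travel_route_for(review, routes)
p [route.origin, route.destination] # => ["Ottawa", "New York"]
```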
00:22:00.960
Now, if your application is not ready to change the database schema to use a composite primary key, consider this example of a simplified Shopify setup with orders and line items.
00:22:26.960
The models are quite simple: one order has many line items, and each line item belongs to an order. Every table has a shop ID that acts as the tenant key, determining the association to an online store.
00:22:51.240
To avoid changing the database schema, Rails 7.1 introduces `query_constraints`, a virtual primary key. Declared as a macro on the model, it specifies the columns to include in every query, letting developers mimic the behavior of a composite primary key.
00:23:20.280
This slide illustrates that the shop ID is included in every query made through the association, from either side. Notably, the associations need no special configuration, as Rails can derive the query constraints when the necessary conditions are met.
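In Rails 7.1 the setup described here amounts to declaring `query_constraints :shop_id, :id` on both the order and line item models, alongside their ordinary `has_many` and `belongs_to` associations. The plain-Ruby sketch below uses hypothetical helpers (not Active Record APIs) and Hashes for records to show the effect: the tenant key travels with the foreign key in both directions. The SQL Rails generates may be formatted differently.

```ruby
# Hypothetical helpers (not Active Record APIs). Both models are assumed to
# declare the virtual primary key [shop_id, id].

# Conditions for loading an order's line items: tenant key plus foreign key.
def line_items_conditions(order)
  { shop_id: order.fetch(:shop_id), order_id: order.fetch(:id) }
end

# Conditions for loading a line item's order, from the other side.
def order_conditions(line_item)
  { shop_id: line_item.fetch(:shop_id), id: line_item.fetch(:order_id) }
end

# Renders conditions as a parameterized SELECT, preserving column order.
def to_sql(table, conditions)
  predicates = conditions.keys.map { |column| "#{table}.#{column} = ?" }
  ["SELECT * FROM #{table} WHERE #{predicates.join(' AND ')}", *conditions.values]
end

order = { shop_id: 7, id: 1001 }
line_item = { shop_id: 7, id: 5, order_id: 1001 }

p to_sql("line_items", line_items_conditions(order))
# => ["SELECT * FROM line_items WHERE line_items.shop_id = ? AND line_items.order_id = ?", 7, 1001]
p to_sql("orders", order_conditions(line_item))
# => ["SELECT * FROM orders WHERE orders.shop_id = ? AND orders.id = ?", 7, 1001]
```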
00:23:47.160
Active Record 7.1 provides native support for composite primary keys. It's advisable to avoid using an ID column for anything other than a single-column primary key.
00:24:02.400
However, if you do use it or are not ready to alter your database schema, you can adopt query constraints as a virtual primary key, which is particularly useful in a multitenant environment.
00:24:18.720
As we approach the conclusion, I want to emphasize that you should adopt composite primary keys based on your database needs. Make sure they suit your database design, and Rails will support them.
00:24:29.520
To recap the key concepts, consider using composite primary keys when a separate single-column identifier is unnecessary, such as in join tables, or in multitenant applications.
00:24:46.720
Be prepared to trade off insertion rates in favor of faster selects and updates. You should also consider using query constraints, a virtual primary key, in a multitenant application.
00:25:02.160
Query constraints include the tenant key in SQL queries, whether for data separation, preparing for sharding, or redefining the columns used in queries.
00:25:29.120
As a general reminder, if you run into any bugs with the feature, please report them on the Rails GitHub issue tracker, and for discussions, proposals, or ideas, feel free to use Discord or Discourse.
00:25:50.480
I would like to thank all the fantastic people who helped me develop this feature, bounce ideas, contribute documentation, and share insights. Thank you for your time, and thank you for listening.