Implementing Native Composite Primary Key Support in Rails 7.1

Nikita Vasilevsky

@nvasilevski

#ruby-on-rails

#activerecord

#composite-primary-keys

#multitenancy

#database-sharding

#database-performance

#rails-7-1

by Nikita Vasilevsky

In the presentation titled Implementing Native Composite Primary Key Support in Rails 7.1, Nikita Vasilevsky, a developer at Shopify and member of the Rails triage team, explores the introduction of composite primary keys in Rails version 7.1. He illustrates how composite primary keys, which consist of two or more columns, uniquely identify rows in a table and represent complex relationships that a single-column primary key cannot. While a single-column primary key is effective for many applications, composite primary keys become essential for complex relational database scenarios.

Key Points Discussed:
- The Definition of Composite Primary Keys:
  - Composite primary keys consist of multiple columns that uniquely identify a record.
  - Example use cases include join tables, such as one linking blog posts to tags in a many-to-many relationship, where the combination of foreign keys serves as the composite primary key.
- Adaptation in Multitenant Applications:
  - Benefits for applications like Shopify, where each record is tied to a tenant (e.g., a specific online store).
  - Switching from single-column to composite keys can speed up queries, though inserts may become slower.
- Examples and Performance Insights:
  - At Shopify, the most commonly used queries on converted tables ran five to six times faster, while inserts were roughly ten times slower.
- Sharding and Composite Keys:
  - Composite primary keys can also facilitate tenant-based sharding by ensuring tenant data remains grouped, thus simplifying data management across multiple shards.
- Active Record Support:
  - Introduction of new Active Record features in Rails 7.1 to define and utilize composite primary keys, including query constraints and virtual primary keys.

Conclusions and Takeaways:
- Composite primary keys should be adopted based on the specific needs of the database design rather than used by default.
- They are beneficial in join tables and multitenant applications, where they can enhance query performance but may lead to slower insert operations.
- Developers are encouraged to utilize features introduced in Rails 7.1 to streamline their applications while being mindful of the data access patterns in their domain.

This talk provides a comprehensive overview of the significance of the 'ID' concept in Rails and how composite primary keys can lead to more efficient data handling when designed correctly. By making informed decisions regarding schema design, developers can leverage the new capabilities offered by Rails 7.1 effectively. Nikita encourages attendees to report issues and engage with the community for further developments.

00:00:16.279 Let's begin by considering traditional Rails.

00:00:20.279 A single-column primary key can be thought of as your favorite bicycle—simple, effective, and perfect for day-to-day commutes.

00:00:25.400 Much like an incremented integer ID, this is sufficient for many use cases. But what if you're planning a road trip that requires a more capable vehicle to get you there? This is where Rails 7.1 steps in with its native support for composite primary keys.

00:00:36.800 Much like having an SUV in your garage, it isn't a tool for your everyday use, but it becomes irreplaceable when dealing with complex relational data.

00:00:50.920 Hi everyone, I'm Nikita, and I'm based in Ottawa. I'm a Ruby on Rails developer, an open-source contributor, and a member of the Rails Triage team. I work at Shopify and am part of the Ruby on Rails infrastructure team, where I led the effort to introduce native support for composite primary keys.

00:01:08.080 We have a booth here at the conference, so come chat with us if you have problems with Rails, open issues, or questions about this presentation.

00:01:27.840 In this talk, I would like to cover the fundamentals of composite primary keys, and how they can be used in multitenant applications. We will draw some parallels when it comes to sharding and implementing sharding in multitenant applications. Finally, we will conclude with a look at the Active Record 7.1 support for composite primary keys.

00:01:35.200 Let's begin with the fundamentals. Here, I'd like to discuss the definition of composite primary keys, provide a real-world use case, and consider when and why you might use them in your application. Before I begin, I want to underscore that having the capability to utilize composite primary keys does not necessitate their use for every case.

00:01:52.280 The decision to use composite primary keys should be driven by your database schema design. Once the decision is made to implement them at the database level, you can be assured that Rails will support it.

00:02:14.360 A composite primary key is a type of primary key made up of two or more columns that uniquely identify a row in a table. They play a crucial role in database schema designs, especially when it comes to representing complex relationships that cannot be represented with a single-column primary key.

00:02:38.280 For example, let's consider a student courses table identified by student ID and the course that the student is taking. Now, let's discuss real-world examples of entities that can potentially be represented by a combination of their properties.

00:02:57.560 For instance, a car entity can be identified by a combination of make, model, and year. A book can be identified by a combination of author and title. However, it's essential to recognize that while these examples highlight the utility of composite primary keys, it doesn't mean you must model these concepts in your actual application.

00:03:17.120 The decision should be based on the specifics of the data you're working on and your schema design. Composite primary keys are merely a tool in your toolbox, and like any tool, they should be used appropriately.

00:03:35.200 Now, let's turn our attention to a much more practical use case for a composite primary key: join tables. Join tables are a natural fit for composite primary keys. By using a composite primary key made up of two foreign keys to two related entities, we can ensure that we uniquely represent a single relationship.

00:03:47.200 For example, consider a blogging system where we typically have a blog posts table and a tags table. To represent the relationship between blog posts and tags, we will create a join table, which represents a many-to-many relationship, where a blog post can have many tags, and a single tag can be associated with many blog posts.

00:04:03.840 In this table, the combination of tag ID and blog post ID uniquely represents an association, making it a perfect use case for a composite primary key. When defining a composite primary key, we have two choices regarding whether to place the tag ID upfront or the blog post ID.

00:04:31.720 Choosing one over the other necessitates considering the types of queries you plan on making. If you're frequently querying blog posts associated with a single tag, you will likely favor having the tag ID upfront. Conversely, if you are typically interested in tags associated with a specific blog post, you will favor having the blog post ID upfront.
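As a rough sketch of the join table described above, a Rails 7.1 migration can declare the composite primary key by passing an array to `primary_key`. The table name, the column names, and the decision to put `tag_id` first are assumptions made for illustration; they are not taken from the slides.

```ruby
# Hypothetical migration; all names are illustrative.
class CreateBlogPostsTags < ActiveRecord::Migration[7.1]
  def change
    # An array passed to primary_key defines a composite primary key.
    # tag_id comes first here, which favors "posts for a given tag" lookups;
    # swap the order if "tags for a given post" is the more common query.
    create_table :blog_posts_tags, primary_key: [:tag_id, :blog_post_id] do |t|
      t.bigint :tag_id, null: false
      t.bigint :blog_post_id, null: false
    end
  end
end
```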

00:04:50.080 To recap, join tables are often the most natural fit for composite primary keys when they represent many-to-many relationships. However, not all concepts are suitable for composite primary keys in a relational database.

00:05:14.960 The use of composite primary keys should be driven by database schema design and not just by the fact that Rails has begun to support them. Next, let's discuss composite primary keys in multitenant applications.

00:05:39.320 In this section, we will examine Shopify’s monolith as an example of a multitenant application and observe how changing primary keys to composite primary keys on certain tables impacts performance. We will also explore the trade-offs we encountered when we switched certain tables to composite primary keys.

00:05:56.400 Before that, I'd like to remind you about one of the implicit Rails conventions: all tables by default automatically have an auto-incremented integer ID column. This means for a typical Rails application that most data is stored in the order the rows were created.

00:06:10.920 For most tables in most applications, this makes a lot of sense and simplifies the understanding for application developers. However, in the case of Shopify, there is an additional organizational layer: every record in almost every table belongs to a particular tenant, meaning every record is semantically associated with a specific online store.

00:06:36.720 This association is facilitated through the addition of a tenant key column, named shop ID, in every tenant-related table. On this slide, records are color-coded based on their association with a particular shop.

00:07:03.240 In some cases, the pattern of row access in a table can significantly differ from the pattern of record insertion. At Shopify, records are inserted using a sequential flat single-column primary key. Records belonging to multiple shops are inserted into the database very quickly.

00:07:18.680 However, selecting the records that belong to a single shop is harder, because those records are intermingled with records from other shops. Switching from a single-column primary key to a composite one alters the data layout on disk.

00:07:36.640 It groups the records by their association with specific shops, which simplifies selecting them. Although the intricacies of this data layout are outside the scope of this presentation, let me illustrate the concept with a simplified analogy.

00:08:05.160 Imagine a bookshelf where books are arranged in order of purchase. Adding books is straightforward, as you can simply find the right slot and insert them. This is similar to appending records to a table with a single-column auto-incremented primary key.

00:08:31.760 However, if you need to find books by the same author, that becomes much more complicated. Even if you have a secondary index indicating each book's location, you'd still need to go through multiple shelves to retrieve them one-by-one, resulting in a time-consuming process.

00:09:06.800 Now let's consider a different approach where books are organized by author. Selecting books becomes straightforward: you just locate the shelf for that author, and all of that author's books will be there.

00:09:22.440 While locating books is simplified, adding a new book takes more time since you need to find the appropriate section for that author. Let's extend this analogy further.

00:09:38.960 Imagine books are grouped by authors and tightly packed to save space. This organization complicates adding new books even more, as you not only need to find the correct section but may also need to rearrange books to make space.
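The bookshelf analogy can be mimicked with a toy model in plain Ruby. This is only a sketch under simplifying assumptions: rows are stored in primary key order and read in fixed-size "pages", and `Row`, `PAGE_SIZE`, and `pages_touched` are hypothetical names invented for the illustration. It does not describe how any particular database engine lays out data.

```ruby
# A toy model of on-disk layout: rows are stored in primary key order and
# read in fixed-size "pages". All names and sizes here are hypothetical.
PAGE_SIZE = 4
Row = Struct.new(:id, :shop_id)

# Twelve rows in creation order, interleaved across three shops.
ROWS = (1..12).map { |id| Row.new(id, (id - 1) % 3 + 1) }

# Counts how many pages hold at least one row for the given shop when the
# rows are laid out in the order defined by the key block.
def pages_touched(rows, shop_id, &key_fn)
  rows.sort_by(&key_fn)
      .each_slice(PAGE_SIZE)
      .count { |page| page.any? { |row| row.shop_id == shop_id } }
end

single_column = pages_touched(ROWS, 2) { |row| row.id }
composite     = pages_touched(ROWS, 2) { |row| [row.shop_id, row.id] }

puts "pages read with primary key (id):          #{single_column}"
puts "pages read with primary key (shop_id, id): #{composite}"
```

In this toy setup, fetching one shop's rows touches every page when rows are ordered by `id` alone, but a single page when they are ordered by `[shop_id, id]`. The cost of keeping rows grouped at insert time is not modeled here.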

00:10:11.120 This situation applies to tables in databases. When we applied this principle to Shopify's monolith, we observed a drastic improvement in query times. The most commonly used queries performed five to six times better.

00:10:41.760 Slow query logs reflected an 80% reduction in slow queries for tables that switched to composite primary keys. The remaining slow queries became ten times faster, and the overall time taken by slow queries also dropped.

00:11:05.440 In some extreme cases, performance jumped significantly; however, these were often for unique, low-volume queries. One notable downside was a degradation in insert performance, which we observed to be roughly ten times slower.

00:11:29.440 However, this trade-off was worthwhile as we found that most data is queried far more frequently than it is inserted. If your system has a high insertion rate, you may want to reconsider using composite primary keys for that particular table.

00:11:58.320 Let's recap: database tables in a multitenant application can benefit from using composite primary keys when the tenant key is included in the composite primary key. Switching to a composite key will likely enhance performance for bulk selects and updates, albeit at the cost of slower insertion rates.

00:12:19.200 For applications where the speed of data insertion is critical, using composite primary keys may not be optimal. Moving forward, let's reveal the similarities between composite primary keys and tenant-based sharding.

00:12:38.960 Tenant-based sharding is a strategy employed to scale multitenant applications like Shopify. In this approach, data is partitioned by shards based on the tenant ID, resulting in data spread across multiple database instances.

00:12:59.040 It's crucial that data belonging to the same tenant resides on the same shard. In this presentation, we will cover two high-level designs of sharding.

00:13:20.080 The first design implies that the sharding solution operates within the application itself. We've already heard an example during this conference; for instance, Solid Cache uses this approach to determine which shard to read data from based on a hash key.

00:13:45.520 The second design assumes that the sharding solution is a standalone application, which the application connects to as if it were a regular relational database. The sharding solution itself decides which shard to query or write data to.

00:14:03.440 In this presentation, we will focus on the standalone solution. This solution imposes an essential requirement on the application: it expects the tenant key (the sharding key) to always be present in SQL queries, ensuring they avoid scattering across all shards.
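The requirement that the tenant key always be present can be illustrated with a minimal routing sketch in plain Ruby. Everything here is an assumption made for illustration: the shard names, the `shard_for` helper, and the modulo-based placement are hypothetical and are not how the sharding solution mentioned in the talk works.

```ruby
# Hypothetical shard list and routing rule, for illustration only.
SHARD_NAMES = %w[shard_0 shard_1 shard_2].freeze

# Picks a shard from the query conditions. A query without the tenant key
# cannot be routed to a single shard, so this sketch rejects it outright.
def shard_for(conditions)
  shop_id = conditions.fetch(:shop_id) do
    raise ArgumentError, "query must include the tenant key (shop_id)"
  end
  SHARD_NAMES[shop_id % SHARD_NAMES.size]
end

puts shard_for(shop_id: 7, id: 1001)
```

A real router might instead fan a query without the tenant key out to every shard, which is the scattering the talk warns about.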

00:14:21.440 You might have noticed that composite primary keys and tenant-based sharding share similar SQL requirements. They both require two or more columns to be present in SQL, whether those columns represent a unique association or a tenant key.

00:14:41.760 Thus, implementing a composite primary key may serve as a foundation for tenant-based sharding. While it's not strictly necessary, including a tenant key in the primary key allows you to route data on a per-tenant basis.

00:15:04.720 This effectively separates data belonging to different tenants. Implementing composite primary keys that include the tenant key can simplify future transitions to tenant-based sharding.

00:15:25.200 One such example of a standalone sharding solution is the one used by GitHub and Shopify, which I will not be able to cover in depth. Let's recap the similarities between models using composite primary keys and tenant-based sharding.

00:15:48.360 Both require multiple columns to be present in queries. Composite primary keys that include the tenant key can serve as a foundation for tenant-based sharding, though they are not necessarily a step toward a sharding strategy.

00:16:00.800 Now, let's discuss Active Record's support for composite primary keys. In this section, I will describe how to define a model with a composite primary key, work with that model, and define a virtual primary key.

00:16:33.040 We will also reveal some limitations and recommendations for using composite primary keys. For our example, let's imagine a model called travel_route. This travel route can be uniquely identified by its origin and destination.

00:16:54.560 In this table, both columns will be part of the composite primary key. We will define this primary key using the primary key option in the create table command. The model itself can remain empty, as Rails will derive the primary key from the schema.

00:17:05.240 With this in place, the model's `primary_key` returns all of the key columns, and the `id` method returns the full identifier as an array containing the values of the respective composite primary key columns.

00:17:40.720 The find method will start accepting the full identifier for your model, while methods like reload, update, and delete will utilize all parts of the composite primary key when performing queries.
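A minimal sketch of the travel route example, assuming a Rails 7.1 application. The string column types and the city names are assumptions added for illustration.

```ruby
# Hypothetical migration for the travel_routes table described in the talk.
class CreateTravelRoutes < ActiveRecord::Migration[7.1]
  def change
    create_table :travel_routes, primary_key: [:origin, :destination] do |t|
      t.string :origin
      t.string :destination
    end
  end
end

# The model needs no extra configuration; the key is read from the schema.
class TravelRoute < ApplicationRecord
end

TravelRoute.primary_key # => ["origin", "destination"]

route = TravelRoute.create!(origin: "Ottawa", destination: "London")
route.id # => ["Ottawa", "London"]

# find takes the full identifier as a single array.
TravelRoute.find(["Ottawa", "London"])
```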

00:18:01.520 One case deserves special attention: a composite primary key that includes an ID column. This is particularly common for tables that began with a single-column primary key and are considering migrating to a composite one.

00:18:22.920 For example, consider a comments table with an auto-incremented ID column, but also a blog ID. For performance reasons, we might decide to create a composite primary key with two columns.

00:18:47.760 This change makes our comment model behave similarly to our earlier models, where the ID method will return a full identifier that consists of the blog ID and the auto-incremented ID populated upon creation.

00:19:04.240 However, this can conflict with the way we fetch the ID value for the model, as Rails treats the ID concept as a model identifier instead of merely an ID column accessor.

00:19:28.200 Our recommendation is to avoid naming a column ID as part of your composite primary key. Furthermore, it’s advisable to refrain from including an ID column in your table unless you intend to use it as your sole primary key.

00:19:47.760 To address this issue, Rails 7.1 introduces a new attribute called `id_value`. This provides a way to access the value of the ID column regardless of its use or position within the composite primary key.
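The difference between the two readers can be sketched as follows, assuming a comments table whose primary key is `[:blog_id, :id]`. The record values are made up for illustration.

```ruby
# Assumes a comments table whose primary key is [:blog_id, :id].
class Comment < ApplicationRecord
end

comment = Comment.find([1, 42]) # full identifier: blog_id 1, id 42

comment.id       # => [1, 42], the full composite identifier
comment.id_value # => 42, the raw value of the id column
```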

00:20:08.720 When it comes to bulk loading records with composite primary keys, Rails extends the `where` method to support a query-by-tuple syntax: a hash whose key is the list of columns and whose value is a list of tuples of composite identifiers.

00:20:31.760 For our travel_route model, it would look like this, with the hash containing origin, destination, and unique identifiers of each route.
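A sketch of the tuple syntax for the travel route model. The city names are illustrative, and the exact call shown on the slide may have differed.

```ruby
# The hash key lists the columns; each inner array is one (origin, destination)
# tuple to match. The relation loads every route matching any of the tuples.
TravelRoute.where(
  [:origin, :destination] => [
    ["Ottawa", "London"],
    ["Ottawa", "New York"]
  ]
)
```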

00:20:53.440 The new syntax is simple, and everyone should find it familiar. For associations to a model with a composite primary key, Rails 7.1 introduces a `query_constraints` option on associations, which serves as a composite foreign key.

00:21:16.240 Expanding our example, let's create a travel_route_reviews table with route origin and destination columns that serve as a composite foreign key. To define an association to the travel route, we will establish a `belongs_to` association.

00:21:40.120 By passing query constraints in an array, we effectively instruct the association to utilize both columns as composite foreign keys. The composite primary key of the route will be derived automatically, allowing for straightforward Rails associations.
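A sketch of such an association as it would look in Rails 7.1. The foreign key column names `travel_route_origin` and `travel_route_destination` are assumptions; the talk only says the table has route origin and destination columns.

```ruby
# Hypothetical schema: travel_route_reviews has travel_route_origin and
# travel_route_destination columns pointing at travel_routes.
class TravelRouteReview < ApplicationRecord
  belongs_to :travel_route,
             query_constraints: [:travel_route_origin, :travel_route_destination]
end
```

Newer Rails guides document `foreign_key:` with an array for the same purpose, so it is worth checking the guide for the version in use.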

00:22:00.960 Now, if your application is not ready to change the database schema to use a composite primary key, consider this example of a simplified Shopify setup with orders and line items.

00:22:26.960 The models are quite simple: one order has many line items, and each line item belongs to an order. Every table has a shop ID that acts as the tenant key, determining the association to an online store.

00:22:51.240 To avoid changing the database schema, Rails 7.1 introduces a virtual primary key through `query_constraints`, a model-level macro that lets developers mimic the behavior of a composite primary key by specifying the columns to include in every query.

00:23:20.280 This slide illustrates that the shop ID is included in every query made by either association from both sides. Notably, we've maintained the associations without needing special configuration, as Rails can derive the query constraints when the necessary conditions are met.
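A sketch of the orders and line items setup using the `query_constraints` macro, assuming both tables keep their single-column `id` primary key and carry a `shop_id` column. The exact model code on the slide is not reproduced in this transcript, so treat this as an approximation.

```ruby
class Order < ApplicationRecord
  # Virtual primary key: shop_id is added to queries that identify a record.
  query_constraints :shop_id, :id

  has_many :line_items
end

class LineItem < ApplicationRecord
  query_constraints :shop_id, :id

  belongs_to :order
end
```

As the talk describes it, the associations need no extra options here because Rails 7.1 derives their query constraints from the models. That inference is specific to the behavior presented for 7.1, so verify it against the documentation for the Rails version in use.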

00:23:47.160 Active Record 7.1 provides native support for composite primary keys. It's advisable to avoid using an ID column for anything other than a single-column primary key.

00:24:02.400 However, if you do use it or are not ready to alter your database schema, you can adopt query constraints as a virtual primary key, which is particularly useful in a multitenant environment.

00:24:18.720 As we approach the conclusion, I want to emphasize that you should adopt composite primary keys based on your database needs. Ensure its suitability for your database, and Rails will support it.

00:24:29.520 To recap key concepts, consider using composite primary keys when a separate single-column identifier is unnecessary, such as in join tables or in multitenant applications.

00:24:46.720 Be prepared to trade off insertion rates in favor of faster query selects and updates. You should also contemplate using query constraints or a virtual primary key in a multitenant application.

00:25:02.160 This includes the tenant key in SQL queries, whether for data separation, preparing for sharding, or redefining the columns used in queries.

00:25:29.120 As a general reminder, if you run into any bugs with the feature, please report them to the GitHub issue tracker, and for discussion, proposals, or ideas, feel free to use Discord or Discourse.

00:25:50.480 I would like to thank all the fantastic people who helped me develop this feature, bounce ideas, contribute documentation, and share insights. Thank you for your time, and thank you for listening.

Rails World 2023