Implementing Native Composite Primary Key Support in Rails 7.1

00:00:10.519 All righty, Nikita is a Ruby on Rails developer and an open-source contributor. He enjoys cycling and camping and is excited that the summer is almost here. Let's give him a warm welcome!

00:00:27.480 Hi everyone! This is the last talk of the day before the keynote. I'm hoping to keep you entertained. There will be things on the slide that are moving, and I'll also be sharing something I learned or did. So, let's begin!

00:00:34.040 This presentation introduces the native support for composite primary keys in Rails 7.1. This feature allows you to manage complex data relationships. This talk will explore what this feature entails, its practical applications, and how it is used in real-world applications, such as at Shopify.

00:00:49.960 A little bit about myself: I'm Nikita, a Ruby developer, and I occasionally contribute to open-source Rails projects. I'm part of the Rails Staff team and live in Ottawa, Canada, not far from here. We're participating in the Hack Day tomorrow, so please come and discuss any topics you have in mind, specifically composite primary keys or tenant-based sharding.

00:01:10.040 In this talk, we'll start by defining the term 'composite primary keys' and discussing their impact on database design. We'll look into some real-world examples where composite primary keys can be effectively utilized. Next, we'll explore the role of composite primary keys in multitenant applications, using the Shopify monolith as an example of a tenant-based application.

00:01:26.479 We will draw parallels between composite primary keys and tenant-based sharding, which is a strategy to scale multitenant applications. Finally, we'll examine how Rails 7.1 supports composite primary keys and what tooling it provides for developers.

00:01:40.399 Let's start with some fundamentals. We'll cover the definition of composite primary keys and look at some examples to learn when and why we might use them in our applications. Before we begin, I'd like to mention that having the capability to utilize composite primary keys doesn't mean they must be used in every scenario.

00:01:58.840 The decision to implement composite primary keys should be driven purely by your database design. However, when such design decisions are made, be assured Rails is there to support it.

00:02:06.799 A composite primary key is a primary key made up of two or more columns, which together guarantee the uniqueness of a row in a table. Composite primary keys play a crucial role in database design, especially when representing models and relationships that cannot be accurately captured with a single-column key. Let's go through some real-world examples of entities that can be identified by a combination of their properties.

00:02:31.160 For example, in a car entity, a combination of make, model, and year can uniquely identify a car. For a book entity, a combination of author and title can distinguish each unique item.

00:02:37.159 Now, let’s consider a practical example where we can effectively implement composite primary keys. Join tables are a natural fit for composite primary keys. By using a composite primary key made up of two foreign keys to the two related entities, we can ensure the uniqueness of each relationship.

00:02:59.640 Let's consider a blogging system. In this system, we have a post table and a tags table. A single post can have multiple tags, and a single tag can be associated with multiple posts. To represent this many-to-many relationship, we would typically have a post_tags join table. The primary key for this post_tags table could be a composite primary key made up of post_id and tag_id, ensuring that each combination of post and tag is unique.

00:03:28.200 When choosing the order of columns in the composite primary key, such as tag_id, post_id versus post_id, tag_id, it's important to consider the types of queries you'll be running most frequently. In some database engines, this will impact performance on certain operations. In this section, we've learned that join tables are the most common use case for composite primary keys.
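Why column order matters can be illustrated with a plain Ruby sketch. It models the key as a sorted array of hypothetical (post_id, tag_id) pairs, which is only a rough stand-in for how a real database engine stores an index.

```ruby
# Hypothetical (post_id, tag_id) pairs kept sorted, like a composite key index.
index = [[3, 30], [1, 20], [2, 10], [1, 10], [3, 20]].sort

# Leading column: a binary search jumps straight to the first entry for post 3,
# and the matching entries sit next to each other.
start = index.bsearch_index { |post_id, _tag_id| post_id >= 3 } || index.size
tags_for_post = index[start..].take_while { |post_id, _tag_id| post_id == 3 }.map(&:last)

# Trailing column: entries for tag 20 are scattered, so every entry is checked.
posts_for_tag = index.select { |_post_id, tag_id| tag_id == 20 }.map(&:first)

p tags_for_post # => [20, 30]
p posts_for_tag # => [1, 3]
```

If lookups by tag dominate, the same reasoning would favor putting tag_id first.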

00:03:51.680 It’s also essential to recognize that composite primary keys aren't suitable for every single table. The use of composite primary keys should be driven by database schema design.

00:04:04.120 Let's now discuss composite primary keys in multitenant applications. In this section, I'd like to talk about Shopify as an example of a multitenant application and how altering primary keys on tables can improve the performance of certain operations. We will also explore the trade-offs of using composite primary keys.

00:04:35.840 One of the Rails conventions is that all tables have an automatically incrementing integer primary key called 'id.' This means that for a typical Rails application, most data is stored on disk strictly in the order the rows were created. For most tables in a Rails application, this approach works just fine and is easy for application developers to understand.

00:04:56.800 In the case of Shopify, however, there is an additional layer of organization. Every record in almost every table at Shopify semantically belongs to a specific tenant, called a shop. This association is facilitated through the inclusion of a tenant key in every tenant-related table, known as shop_id.

00:05:15.680 In the majority of these cases, the pattern of row access can significantly differ from the pattern of insertion. This is particularly true for the Shopify monolith due to its multitenant architecture, where a single database instance contains records from multiple shops.

00:05:40.040 With a simple auto-incrementing primary key, table insertion results in a mix of records for various shops. However, when it comes to querying data, most queries typically focus on records from a single shop at a time. By prepending the tenant key to the composite primary key, we can alter the data layout on the disk, making it easier to select records belonging to the same tenant as they are now grouped together.

00:06:06.240 To illustrate why querying records in bulk by tenant becomes more optimal when we include the tenant key in our primary key, let's use a bookshelf analogy. Imagine a bookshelf where books are stored in the order they were purchased. Inserting a new book in this case is straightforward: you simply find an empty slot and place the book there. This is similar to adding a new record to the end of a table in a database.

00:06:37.079 However, if you're asked to find all books by a specific author, you would have to go through each book one by one, checking the author's name. Even if you have a secondary index available that tells you the exact location of the book, you'll still have to take time to pick them all from different shelves. This process can be time-consuming and inefficient.

00:07:08.480 Now, imagine a different scenario where the same bookshelf is organized with books grouped by authors. In this case, finding all books by a specific author becomes much quicker and easier. You simply look at the section of the bookshelf dedicated to that author, and all the books you’re looking for are right there.

00:07:19.040 However, when the bookshelf is organized by authors, inserting a new book requires a bit more work. You first need to find the section for this author before placing the book on the shelf. If the books on the shelf are tightly packed together, leaving no space between different author sections, inserting a new book becomes even more challenging.

00:08:18.300 You cannot simply add the book to the end of the section; instead, you need to make more space within the section to keep all the books by the same author together. This might involve moving several books around, which can also be time-consuming and inefficient. While the bookshelf analogy doesn't fully represent the complexity and diversity of actual database storage engines, it simplifies understanding how data organization can impact various operations.
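The bookshelf trade-off can be approximated in plain Ruby. The sketch below uses made-up rows and a made-up page size, and it treats the table as a sorted array cut into fixed-size pages, which is a simplification of any real storage engine.

```ruby
PAGE_SIZE = 4 # hypothetical number of rows per page

# Hypothetical rows: ids follow insertion order, shops are interleaved.
rows = (1..24).map { |id| { id: id, shop_id: (id % 3) + 1 } }

# Counts how many fixed-size pages hold at least one row for the given shop.
def pages_touched(layout, shop_id, page_size)
  layout.each_slice(page_size).count do |page|
    page.any? { |row| row[:shop_id] == shop_id }
  end
end

by_id          = rows.sort_by { |row| row[:id] }                  # primary key: id
by_shop_and_id = rows.sort_by { |row| [row[:shop_id], row[:id]] } # primary key: shop_id, id

p pages_touched(by_id, 1, PAGE_SIZE)          # => 6
p pages_touched(by_shop_and_id, 1, PAGE_SIZE) # => 2

# Inserting the next row for shop 1 lands in the middle of the grouped layout,
# so every row after that position has to make room for it.
new_key = [1, 25]
insert_at = by_shop_and_id.index { |row| ([row[:shop_id], row[:id]] <=> new_key) > 0 } ||
            by_shop_and_id.size
rows_shifted = by_shop_and_id.size - insert_at

p rows_shifted # => 16
```

Reading one shop's rows touches fewer pages in the grouped layout, while the insert that would have been a simple append in the id-ordered layout now displaces other rows.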

00:08:59.680 As we discussed, the choice of data layout on disk affects the performance of record insertion, deletion, and updates. Almost five years ago, we made the decision to switch one of our largest tables at Shopify, the orders table, to a composite primary key. This was purely a database decision, made before Rails supported it. As a result, we saw a five-fold improvement in query performance.

00:09:47.920 However, there was one notable downside that is worth mentioning. We observed roughly a tenfold degradation in insert performance by switching to a composite primary key. Since most data is queried and updated more often than inserted, this tradeoff was acceptable in our case. Nonetheless, if insert performance is crucial for your application, using composite primary keys may not be a suitable approach.

00:10:11.600 To recap, database tables in multitenant applications often benefit from the use of composite primary keys. By including the tenant key as part of the primary key, composite primary keys may enhance queries and updates but at the cost of a slower insertion rate.

00:10:39.360 Now, let's talk about similarities with tenant-based sharding. Tenant-based sharding is a strategy employed to scale multitenant applications. In this approach, data is partitioned based on the tenant ID, resulting in application data being spread across multiple shards.

00:11:06.520 Crucially, all data belonging to the same tenant must live on the same shard. For this presentation, I would like to discuss two different designs for implementing sharding in a database system. On the left, you'll see what I would call an integrated sharding solution, where the sharding solution is part of the application. This setup offers flexibility and control but adds complexity to the application code.

00:11:54.800 On the right side is a standalone sharding solution, where the sharding solution operates separately from the application, and the application connects to it as if it were a regular database. For the purpose of this presentation, we'll focus on the standalone sharding solution.

00:12:25.200 One example is a service that we discussed during the conference, where we realized that PostgreSQL partitioning might also be a good example that puts forward similar requirements. Speaking of requirements, the standalone sharding solution does impose one key requirement on the application: the need to pass the tenant key in every SQL query.

00:12:58.960 This is similar to how a composite primary key requires every query to include all parts of the key. This approach ensures the sharding solution can correctly route each query to the appropriate shard based on the tenant key. Implementing composite primary keys can be seen as foundational, even though it's not a necessary step towards implementing tenant-based sharding.
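The routing requirement can be sketched in plain Ruby. The shard count, the modulo rule, and the shard_for helper are all hypothetical; a real sharding layer would more likely use a lookup table or its own routing rules.

```ruby
SHARD_COUNT = 4 # hypothetical number of shards

class MissingTenantKey < StandardError; end

# Routes a query to a shard using only its tenant key.
# A query without shop_id cannot be routed, so it is rejected.
def shard_for(conditions)
  shop_id = conditions.fetch(:shop_id) do
    raise MissingTenantKey, "query must include shop_id"
  end
  shop_id % SHARD_COUNT
end

p shard_for(shop_id: 42, id: 1001) # => 2
p shard_for(shop_id: 42, id: 2002) # => 2 (same tenant, same shard)

begin
  shard_for(id: 1001)
rescue MissingTenantKey => e
  puts e.message
end
```

Every row of shop 42 resolves to the same shard regardless of its id, which is why the tenant key has to travel with each query.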

00:13:34.240 By including a tenant key in the primary key, you are already referring to your data on a per-tenant basis, which is a core concept in sharding. This can make the transition to a sharding strategy easier and more seamless when the time comes.

00:14:08.520 Let's recap: models using composite primary keys and tenant-sharded models both include multiple columns in the queries. Composite primary keys could be a foundational step towards a full sharding strategy but are not necessarily required.

00:14:46.599 In the next section, I'd like to discuss what Active Record provides to support composite primary keys. This includes defining a composite primary key in your Rails model, working with a composite primary key model, defining a virtual primary key, and discussing limitations and best practices.

00:15:40.720 Let's imagine a table called travel_routes, where each route is uniquely identified by its origin and destination. To create this table, we will use the create_table method and pass the composite primary key as an array to the primary_key option. In our TravelRoute model, the primary key will be a composite key made up of both origin and destination.

00:16:05.280 This means the identifier of each record isn't a single value but a combination of two values. When we update, delete, or reload a travel route record, Rails will construct an SQL query targeting the record by using both origin and destination.

00:16:32.959 There is a special case that commonly occurs when an application decides to migrate an existing table to a composite primary key. For instance, consider a table initially using a single-column ID as a primary key.

00:16:55.920 When we transition this table to a composite primary key that includes a tenant key, such as in this example where blog_id becomes part of the composite primary key alongside the existing columns, it can cause some confusion, especially when accessing the id column.

00:17:23.480 If you try to access the comment ID, instead of receiving a single integer value, you would receive an array representing the composite primary key. To handle this situation, we might need to explicitly separate the values of the composite primary key by extracting them into both tenant key and ID.

00:17:54.120 This issue arises because Rails treats id as the record's identifier rather than as an accessor for the id column. Therefore, having a column named id as part of your composite primary key complicates obtaining the value of the id column. To address this, Rails introduces a new attribute called id_value.

00:18:32.863 On this slide, we can see how the id_value method retrieves the value of the id column, regardless of the shape of our primary key. Let's now look at how to bulk load records identified by composite primary keys.

00:19:09.720 Rails 7.1 extends the 'where' query method to support what we call 'query by tuple' syntax. This is where you can query composite primary key records in bulk by passing a hash, with the key being an array of column names and the value being an array of tuples, each holding the composite primary key values of one record.

00:19:46.560 This structure allows us to specify multiple combinations of origins and destinations that we want to query. When it comes to associating models with a composite primary key, Rails 7.1 adds a query_constraints option to the association, serving as a composite foreign key.

00:20:06.880 In this case, the travel route review model will use the route's origin and destination as the composite foreign key that maps to the origin and destination of the travel route model. As we've seen previously, this allows Rails to craft SQL queries that address associated records by both their origin and destination.

00:20:54.560 As previously pointed out, using composite primary keys is not necessary for supporting tenant-based sharding. In this slide, we'll create two related tables: orders and line items, which are expected to be used within a multitenant architecture.

00:21:04.560 Each table includes a shop_id column serving as a tenant key, which is crucial for data partitioning and ensures data isolation between tenants. Active Record 7.1 introduces a concept of a virtual primary key called query_constraints, which we can use as a macro, specifying all columns we would like to include when querying, updating, and deleting records.

00:21:39.680 As we saw previously, this macro allows the model to behave as though it uses a composite primary key without modifying the schema. You might have noticed we didn't need any special setup apart from the macro on the model; this is because Rails can derive query constraints on associations.

00:21:54.560 Rails can derive association query constraints when the model's query constraints include two columns, one of which is the primary key of the model and the other the tenant key. This makes it easy to manage tenant keys along with IDs. In conclusion, Active Record 7.1 natively supports composite primary keys, including the case where the id column is part of the composite primary key.

00:22:30.720 The complexity introduced by this can be mitigated by using query_constraints as a virtual primary key. This is especially useful for tenant-based sharding strategies.

00:22:54.720 Please use composite primary keys based on your database needs. Ensure they are suitable, and remember that Rails will support them. Consider using composite primary keys when having a separate single-column identifier is unnecessary, as in join tables, and in multitenant applications with many bulk reads and updates scoped to a single tenant.

00:23:07.560 You should also consider using query_constraints to ensure you include your tenant key in all your SQL queries. As a final reminder, the Rails Core Team reserves the GitHub issue tracker for bug discussions, while proposals are better suited for Discord or Discourse.

00:23:58.000 I’d like to thank all these great people who helped me review and implement parts of this feature and continue to improve it. Thank you!