00:00:21.080
Thank you so much for being here. I'm really excited to speak to you all on Day One of RailsConf 2023. We all made it to the end of the day, well done. Today, I'm going to share the story of how we migrated from MySQL2 to Trilogy in Shopify's largest Rails codebase.
00:00:36.480
Before we get started, let me introduce myself. My name is Adrianna Chang, and I am a senior software engineer at Shopify. Shopify is an e-commerce company where we offer a variety of products for online and in-person retail. I currently work on a team called Scalable Rails Apps. Our job is to ensure that Shopify’s Rails applications are easy to build, maintain, and refactor. We make sure that our apps use Rails the right way and that Rails continues to evolve to meet Shopify's needs.
00:01:16.080
I am based in Ottawa, Ontario, Canada, and it is finally starting to get warm here, which I'm very happy about. I started working with Rails at Shopify in 2016, and I am currently a member of the Rails Triage Team. I am really passionate about open source software development and making the Ruby and Rails communities more diverse and inclusive.
00:01:34.740
Speaking of diversity, I currently help run the WB.rb meetup. WB.rb is a Ruby community for women and non-binary individuals. We hold virtual meetups once a month, and if you're interested, please come talk to me about that later. Outside of work, I enjoy lifting heavy things and spending time outdoors, especially with my dog. I am a proud dog mom to Jasper, a Rottweiler who often hangs upside down like a baby.
00:01:53.399
But we’re not here to talk about my dog today, as much as I would like to. Instead, we’re going to discuss Trilogy. If you’ve never heard of Trilogy before, don’t worry; we will begin with an overview of the Trilogy database client. I'll talk about some of the advantages of Trilogy and a few caveats to switching.
00:02:13.500
Next, we’ll get our hands dirty with some code for Trilogy. We’ll discuss the MySQL protocol and I’ll show you how we implemented a feature in Trilogy that was necessary for our Rails monolith. Finally, we’ll go over the changes we made to our codebase to switch from MySQL2 to Trilogy, and how we deployed this large change to production. I’ll share the results we saw after switching, and there’s going to be a special surprise announcement at the end, so stay tuned.
00:02:57.000
As we tackle this topic today, we’ll use the analogy of climbing a mountain to discuss the journey we embarked on when transitioning from MySQL2 to Trilogy at Shopify. Our goal was to run Trilogy in production, which we can think of as our summit. However, the path was not very clear. We were uncertain about what incompatibilities might exist between MySQL2 and Trilogy, or what risks we needed to consider. Where do we even start?
00:03:41.640
It became apparent that before we could start our trek up the mountain, we needed to understand the lay of the land. So that's where we’ll begin our adventure today. Let's get started with talking about the Trilogy database client. Trilogy is a MySQL database client with Ruby bindings. A client is any program that can communicate with a MySQL server and issue commands.
00:04:13.080
Trilogy was open-sourced by GitHub in 2022. Matthew Draper from GitHub wrote a detailed blog post about it last year, which is linked at the bottom of the slide. Please give it a read if you’re interested in Trilogy. What’s cool about Trilogy is that it uses a custom implementation of the MySQL network protocol.
00:04:35.520
What does this mean? It means that Trilogy defines its own low-level API for communicating with a MySQL server. Most client programs rely on libmysqlclient, a C library distributed with MySQL, to connect and interact with a server. For instance, MySQL2, a very popular database client gem with Ruby bindings, must link against the MySQL client library. However, Trilogy does not depend on this.
00:05:02.040
Trilogy supports the most frequently used parts of the protocol, including the authentication handshake, which we will discuss more later, and the query and command execution. Overall, Trilogy is designed for flexibility, performance, and ease of embedding.
00:05:39.000
So, why might you want to switch to Trilogy? As we discussed, Trilogy has minimal dependencies required for compilation. It does not require the libmysqlclient or an equivalent C library, which leads to a few advantages: first, it simplifies installation, as you won't need to install and link against the MySQL client library or deal with compiler flags. It’s quite easy to get Trilogy up and running.
00:06:01.039
Additionally, it helps eliminate client-server version mismatch issues. Using different MySQL client versions compared to your MySQL server version can lead to problems, but removing libmysqlclient from the equation simplifies things significantly. This also minimizes the number of times data has to be copied when handling network packets, improving memory usage.
00:06:34.740
Trilogy is also designed to perform efficiently within the context of the Ruby virtual machine (VM). It is mindful of its use of dynamic memory allocation, and the API is designed to utilize non-blocking operations and I/O callbacks whenever possible.
00:07:02.040
However, switching to Trilogy does come with some caveats. First, there are still a few incompatibilities with MySQL8 related to features such as authentication plugins. If your application is already on MySQL8, this might be something to consider. Secondly, Trilogy may not be as feature-complete as MySQL2 and libmysqlclient, which means it might not be suitable for applications with a lot of custom database configurations.
00:07:35.100
Moreover, since Trilogy was freshly open-sourced, its documentation is a work in progress and may not be as extensive as other client library options. For those looking for an open source project to contribute to, however, consider checking out Trilogy.
00:08:11.430
So, we’re all at RailsConf, so let’s talk about Trilogy and Rails. We know that Active Record is Rails’s Object-Relational Mapping (ORM) library, or the library responsible for all the database interactions. GitHub released an Active Record adapter for Trilogy alongside the client code, but it is external to Rails, meaning it needs to be installed separately. However, setting up Trilogy in your Rails application is super simple.
00:08:54.300
In your Gemfile, you will need to install both the database client gem and the gem for the Active Record library. Then, in your database.yml or by using the DATABASE_URL environment variable, you just set the adapter to Trilogy. That’s pretty much all it takes.
00:09:31.560
Now, why did Shopify decide to make the switch to Trilogy? We aimed for a few things: First, a better developer experience, since we were experiencing a lot of compilation headaches with MySQL2 and its dependency on libmysqlclient, particularly on macOS. Our hope was that Trilogy would greatly simplify the process of booting a Rails app locally with a MySQL database. We were also looking for improved speed and query performance, as Trilogy claimed to be efficient within the context of the Ruby VM.
00:10:23.400
Furthermore, we were eager to move to a library with a strong sense of maintainership. Trilogy has several active contributors, and we had Shopify engineers who previously worked on the library at GitHub and could offer their support throughout the process. We had even loftier targets in mind: Aaron Patterson and Eileen Uchitelle were the original advocates for open-sourcing Trilogy during their time at GitHub. As part of Shopify’s Ruby and Rails team, they believed that if we adopted Trilogy, we could make a case to upstream the adapter to Rails.
00:11:09.000
This would ensure even stronger maintainership for Trilogy and prevent it from deviating from Rails standards. Therefore, we felt really excited about the impact we could have by adopting Trilogy in our monolith and potentially making it a Rails community standard.
00:11:59.160
Now that we’ve established a foundation, let’s take those first steps up the mountain by looking at a new feature we needed to add to the Trilogy client in order to facilitate our adoption. Before discussing the feature itself, we need to go over the MySQL client-server protocol. In essence, this means understanding how a client communicates with a MySQL database.
00:12:32.520
The server listens for connections on a TCP/IP port or through a local socket. When a client connects to the server, it must first go through an authentication phase to issue commands. This phase typically works like this: First, the server checks if the client's host is allowed to connect. If it is, the server sends back a handshake or greeting packet. The client then replies with an authentication packet containing a credential string, which includes the username, a scrambled password if applicable, and optionally an initial database to connect to.
00:13:20.540
The server verifies this information and responds with either an OK packet (indicating successful authentication) or an error if something went wrong. If the authentication is successful, the client can then begin issuing queries to the server in what is known as the query phase.
00:14:08.580
During this phase, the client sends a command packet to the server, which contains a command code to specify the intended command. Various commands can be issued; for example, a COM_PING command asks the server to respond with an OK packet if it is alive and reachable, while a COM_QUERY command requests the server to execute a given query.
00:14:34.560
For instance, a COM_QUIT command tells the server that the client wants to close the connection. I have linked to the MySQL documentation if you’re interested in seeing other types of commands that can be sent to the server.
00:15:17.520
Returning to MySQL protocol, the server will respond with various types of packets, which can cover a range of outcomes: an OK packet or an end of file, indicating success; an error if something went wrong; or a result set for query commands.
00:15:59.880
It’s also important to discuss protocol capabilities and flags. During the authentication phase, the client and server exchange information about their capabilities via a bitmask. For instance, a client connecting with DB implies that the initial database can be specified during authentication, while CLIENT_SSL indicates the client will employ SSL encryption for the session.
00:16:35.460
Another useful flag is CLIENT_MULTI_STATEMENTS, which shows that the client may send more than one statement per query. Multi-statement support is a feature we needed to implement in Trilogy, and we’ll dive deeper into it shortly.
00:17:12.979
Those of you who have used MySQL before might know that it supports the execution of a single string containing multiple SQL statements, separated by semicolons. For example, a client can pass a whole string to execute, and the server will handle it if the client supports multi-statement queries. To facilitate this, the client must communicate its multi-statement capabilities during the authentication phase.
00:17:58.740
By default, the multi-statement flag is not set. However, the client can also enable multi-statements after establishing a connection using a COM_SET_OPTION command. This flag indicates that multi-statements are supported.
00:18:33.780
So, it turns out when we evaluated Trilogy, we found that it did not implement multi-statement support. You might wonder why this is significant. For Rails applications, being able to send multi-statement queries can reduce database round trips for performance reasons. However, at Shopify, we were particularly interested in multi-statement support for bulk fixture inserts. Fixtures are Rails's default way of preparing and reusing test data.
00:19:11.520
Using multi-statements allows us to insert multiple fixtures into the database at once, which is more efficient. At Shopify, we have about a thousand fixture sets, some containing hundreds of individual fixtures. So, we were eager to leverage the multi-statement capabilities.
00:19:57.060
To see how the MySQL2 adapter utilizes multi-statements for bulk fixture insertion, the MySQL2 adapter implements a method called multi_statements_enabled that verifies if multi-statements are enabled by checking the adapter configuration. It also has a with_multi_statements method that guarantees a block runs in a multi-statement context; if multi-statements are already enabled, it yields the block directly. If not, it will set the multi-statement option on the server for the block's duration.
00:20:37.200
Now, focusing on the AbstractAdapter class, this class implements the insert_fixture_set method shared by adapters, requiring them to implement with_multi_statements to provide multi-statement functionality, which can be overwritten in a concrete adapter class to utilize this behavior. For instance, execute_batch in the MySQL2 adapter combines the fixture insertion statements into the fewest possible SQL strings, building up chunks of SQL until reaching the Max Allowed Packet value.
00:21:34.620
Max Allowed Packet refers to the maximum number of bytes permitted in a statement sent to the server. When utilizing multi-statement queries in MySQL, all results must be processed before the connection becomes usable again. Abandoned results discard all results to enable the connection to continue being functional without processing the elements.
00:22:13.920
We’ve now outlined how fixtures leverage multi-statement capabilities through the MySQL2 adapter. Let’s shift our focus to implementing multi-statements in Trilogy. Below, you’ll see the header file for Trilogy’s low-level protocol API. Don’t be alarmed if the syntax looks unfamiliar; the critical part is that Trilogy defines an enum with all client protocol flags. The value for the multi-statements flag was already encoded in the library, but it just wasn't being utilized.
00:23:00.640
As previously noted, multi-statement capabilities can be set both during connection initialization and afterwards by sending a COM_SET_OPTION command. Regarding the first requirement, enabling multi-statements when initializing new clients, I want to give a huge shout out to Part, who is present in the room today. He has contributed significantly to this work.
00:23:41.520
When calling Trilogy.new, the client constructor processes an options hash, looking for the multi-statement key. If it’s set, we then add the multi-statements flag to the capabilities bitmask. Now, regarding handling multi-statement results, we have split the API into two parts that mirror the MySQL2 implementation. The first function is more_results_exist, which checks if there are more results to a multi-result set. The second function, next_result, fetches the next result from the query response.
00:24:28.860
Both rely on a 'more results exist' flag included in the response packet from the server. This flag indicates whether additional results can be read on the result set. On the Ruby side, we can use the API as follows: if multi-statements are enabled and we send a multi-statement query, we can continuously call next_result to process additional results as long as more results exist.
00:25:17.200
At this point, we’ve addressed the initial requirement, but we still need to enable multi-statements option after the connection has already been set up. This change required us to modify the Trilogy C API since at the time, there was no way to send a COM_SET_OPTION command from the Trilogy client.
00:26:02.280
To resolve this, we had to build a 'set option' API into Trilogy itself and, as previously mentioned, Ruby interfaces with the Trilogy C API through a C extension. With these adjustments in mind, commands and clients are split into send and receive functions, allowing callers to wait for I/O readiness.
00:26:41.560
The set option command was then separate into two parts: a send function and a receipt function. The send function builds a set option packet, and we will take a closer look at this method shortly. Essentially, this method constructs a packet for the COM_SET_OPTION command using the server option being set. Given that MySQL’s protocol restricts client options to two—multi-statement on and multi-statement off—this method is relatively straightforward.
00:27:26.520
Now, we move on to the receive function, which serves to read the server's response packet into our connection buffer, handling the outcome. We are almost finished connecting everything; we now have a method to set server options on C side, so we must repeat this process from Ruby.
00:28:07.680
Thus, we will expose a set_server_option method on the Ruby side that calls the Trilogy set option send method we discussed earlier with the specified option. Once the COM_SET_OPTION command is sent, we wait for the server to respond via Trilogy set option received. The remainder of this function is about handling response checks and returning true for success or managing timeouts and errors.
00:28:47.520
Now that we've laid out the groundwork, we can initialize a client without the multi-statement option. However, we can later invoke the set server option method and enable multi-statement capabilities before we send a multi-statement query to the server.
00:29:37.200
Now, shifting back to Rails, we will apply the newly added multi-statement behavior in the Active Record adapter. The code for the Trilogy adapter is quite similar to the MySQL2 adapter. We can check whether multi-statements are enabled by inspecting the configuration. Then, we can use our newly created server option method to enable multi-statement capabilities for our block.
00:30:16.740
Here’s a refresher on how fixture insertion typically works in Rails: the AbstractAdapter implements the insert_fixtures_set method, and adapters should activate with_multi_statements to enable the relevant behavior. The execute_batch method can then be overridden to leverage the multi-statement capability.
00:30:53.880
The Trilogy adapter’s execute_batch function closely resembles the MySQL2 implementation; the main distinction is that Trilogy does not implement abandoned results. There are reasons elaborated in the repo for this, so if you're curious, check those out. Instead of relying on abandoned results, we had to iterate through the result set to restore the connection to a usable state.
00:31:36.720
Having examined how we implemented the multi-statement features in the Trilogy database client and the Active Record adapter, let's return to our journey up the mountain. After building multi-statement support in Trilogy, we were prepared to adopt it in our monolith. Spoiler alert: aside from some minor API changes, migrating from MySQL2 to Trilogy was relatively straightforward—Trilogy is regarded as a drop-in replacement.
00:32:12.420
If you're contemplating moving to Trilogy in an existing Rails application using MySQL2, here are the steps we undertook and some considerations to keep in mind. Step one is to configure your application to use Trilogy. We required the installation of both the Active Record adapter gem and the gem for the Trilogy client. We were utilizing edge versions of both, but feel free to point to the current stable release if you prefer.
00:33:10.860
In our case, we were on edge because we were deploying some changes and wanted to traffic those changes on our branch. You must also set the adapter to 'Trilogy' in the database.yml file or specify the DATABASE_URL environment variable, based on your database setup. That’s all it typically takes to get Trilogy configured and running in your application!
00:34:00.120
The next step is to address API differences when interacting with the database client or client results directly. Most of the time, we navigate via Active Record APIs, so it should be uncommon to directly engage with the client. However, if there are instances where you do so, you’ll need to be aware of some subtle API changes. For instance, consider a method executing custom SQL on the database connection that iterates over results.
00:34:55.020
With MySQL2, calling `execute` with the query returns a MySQL2 result object, allowing enumeration using each_as_hash. For Trilogy, you instead receive a Trilogy result object, which requires you to enumerate it using each_hash.
00:35:31.440
Here’s another straightforward example: MySQL2 has a query_options method that presents the options passed in when initializing the client. We used this method to extract the host specified for the current connection as our cache key. Trilogy, on the other hand, has a query_flags method reflecting options at query time, including casting options. However, this query_flags concept is distinct from MySQL2's query_options, as Trilogy’s query_flags lack a connection options correlation.
00:36:18.120
We had to add a method to Trilogy to expose options passed during connect time without naming it query_options to avoid confusion with the existing query_flags method. Instead, we opted for connect_options; if previously using query_options on the MySQL2 client, you will now need to use connect_options with Trilogy.
00:37:04.680
Finally, we needed to address discrepancies in error classes. This is relatively simple: if you were rescuing specific MySQL2 error classes, you need to change these to use the corresponding Trilogy error class instead, which is called TrilogyError. Trilogy handles comparable errors to MySQL2; for example, both libraries have errors for timeouts and connection errors, allowing for easy mapping.
00:37:48.840
One key difference exists in connection-related error handling between MySQL2 and Trilogy. In MySQL2, if the database is down, it raises a connection_not_established error. This is exemplified by a test in our monolith that asserts connection not established when simulating the database being down. In contrast, with Trilogy, it opts to close the connection client-side when unable to interact with the server. Consequently, if the database is down, Trilogy will raise a Trilogy_closed connection error.
00:38:38.520
The Trilogy adapter translates these closed connection errors to Active Record's connection_failed error, so your Rails application will receive it as an ActiveRecordConnectionFailed error instead of ActiveRecordConnectionNotEstablished. If testing database-related issues, ensure to update your test assertions accordingly when shifting to Trilogy. Remember to validate and rescue for both connection_not_established and connection_failed.
00:39:22.560
At this point, our CI was green, and the summit was in sight! We felt excited to test in production, yet cautious about deploying such a significant change. We had to formulate a strategy for a safe rollout.
00:40:03.900
Before delving into our deployment process for Trilogy to production, let’s expand our perspective and discuss the Rails application we were migrating. Most of Shopify's e-commerce software resides in a Rails app called Shopify Core, a modular monolith. To give you some context regarding the load our database infrastructure supports, over Black Friday and Cyber Monday 2022, our MySQL fleet handled up to 14 million queries per second at peak.
00:40:47.520
This is a simplified view of our database infrastructure, and it will be relevant later on, so stay tuned. Shopify Core operates as a horizontally-partitioned Rails application, with a fleet of MySQL instances running on Google Cloud Platform managed by Chef.
00:41:37.200
Our MySQL instances are distributed across several regions, and we utilize ProxySQL. ProxySQL is a protocol-aware proxy that sits between MySQL clients and servers, essentially multiplexing around a hundred thousand client connections from our Rails application to approximately 2,000 backend connections to our MySQL fleet.
00:42:24.180
Now, let’s discuss how we deployed Trilogy to production. We began by running the Trilogy database client with one percent of production traffic. To achieve this, we exported an environment variable for one percent of our pods, configuring the database adapter in our database.yaml according to whether this environment variable was set.
00:43:10.620
The experimental group was the environment variable we exported, setting it to Trilogy client for one percent of our pods. This adjustment required us to support both MySQL2 and Trilogy in our application code, meaning wherever we directly interacted with the client API, we had to check the connection class and accommodate the different APIs.
00:43:55.680
One question we had was how to manage our tests. We didn’t want a separate CI pipeline for Trilogy, considering our extensive CI suite would be too costly. We also aimed to maintain MySQL2 test coverage since it would still represent the majority of production usage. Initially, we debated running a half-and-half CI worker setup—one half with Trilogy and the other with MySQL2—however, we were concerned a failing test on one worker might get re-enqueued with another worker, allowing it to pass under a different adapter, making debugging difficult.
00:44:44.520
Ultimately, we decided to maintain all tests with MySQL2 while running our Trilogy branch in CI daily until we were ready for a 100% deployment.
00:45:44.760
The one percent rollout proceeded smoothly without any issues arising. Feeling confident, we ramped it up to trial Trilogy with fifty percent of our production traffic.
00:46:23.760
However, shortly after deployment, we began to witness elevated error rates. We observed a substantial spike in the number of client connections on our primary shard, which was alarming, leading to a service disruption. As a result, we had to implement a revert.
00:47:09.840
While the revert was being deployed, we collaborated with Shopify’s incident response team to investigate the issue. A member of our database platform team helpfully suggested a potential cause.
00:47:49.680
Returning to ProxySQL, it manages a pool of backend connections that are shared among client connections from the application. However, when two clients feature differing protocol capabilities, ProxySQL needs to execute a COM_CHANGE_USER command to set the new connection options every time it shifts backend connections.
00:48:33.120
So, if Trilogy and MySQL2 had different protocol capabilities, the frequent COM_CHANGE_USER commands increased load on our primary shard, causing the database overload and subsequent errors.
00:49:11.160
To validate this theory and assess the differences between Trilogy and MySQL2’s connection options, we were able to query ProxySQL's connection list, which provided insight into the client flags set across different connections.
00:49:59.760
We discovered that MySQL2 enabled the CLIENT_MULTI_STATEMENTS flag by default, while Trilogy did not. This flag indicates that the client can send multi-statement queries and receive multiple result sets; however, multi-results only signal that a client can handle multiple results. The inclusion of CLIENT_MULTI_STATEMENTS implies multi-results, but not vice versa.
00:50:39.060
The fix was comparatively straightforward: we needed to add that multi-results flag into the default bitmask applied to the client connection, resolving the issue.
00:51:17.460
Finally, we were prepared to trial Trilogy again—this time at 100% capacity on a single node.
00:51:55.920
We initiated the rollout for 100% on one of our European clusters, double-checking that the protocol capability flags matched between Trilogy and MySQL2 to allow ProxySQL the capacity to share backend connections without conjunction.
00:52:41.840
Everything appeared in order. We gathered metrics to monitor performance. This graph depicts request times; MySQL2 averaged 3.46 milliseconds per request while Trilogy averaged 2.7 milliseconds, roughly a 22% improvement. Likewise, when analyzing the query time, MySQL2 required around 1.43 milliseconds for each query, while Trilogy executed its queries at approximately 1.24 milliseconds, marking about a 17% faster response.
00:53:27.480
Encouraged by these outcomes, we continued to run Trilogy at 100% on one of our European clusters for 24 hours, after which we decided to proceed with an overall rollout.
00:54:14.040
This marked the moment at which we implemented all of our testing adjustments; our complete test suite would now engage with Trilogy as opposed to MySQL2. The rollout was met with success and encountered no obstacles along the way.
00:55:00.660
That brings us to the conclusion of our journey regarding the migration from MySQL2 to Trilogy at Shopify. There were challenges throughout, mainly due to the complexity and magnitude of the application being migrated. For the most part, as I mentioned, Trilogy should serve as a drop-in replacement for MySQL2.
00:55:45.840
Yet the payoff of reaching our summit was definitely worthwhile. At the outset, I noted how difficult it can be to embark on such a journey because the path is unclear. I hope this information allows you to navigate through a similar transition, learning from our experiences along the way.
00:56:32.160
Now, onto the special surprise announcement: we successfully managed to upstream Trilogy to Rails. With both GitHub and Shopify running Trilogy in production, we advocated for its inclusion as an alternative to MySQL2 in Rails 7.1.
00:57:27.120
Now, developers will not need to install a separate dependency for the Trilogy adapter as it will be shipped with Rails instead.
00:58:19.680
As next steps, you might want to give Trilogy a try in a new Rails application, or you can explore migrating an existing application from MySQL2 to Trilogy using the strategies we discussed today. Additionally, consider contributing to the Trilogy client or the adapter now integrated into Rails. Or just showcase your new MySQL knowledge to colleagues.
00:58:58.200
Thank you so much for attending my talk. I'd also like to extend a heartfelt thank you to all the incredible individuals who assisted in Shopify’s journey to adopt Trilogy; we could not have accomplished this without their efforts. If you'd like to discuss Trilogy further, feel free to approach me or tweet at me. My social media handles are displayed on the slide.