00:00:08.280
I'm Mustafa. Thank you for being here. I'm an engineer on the storage team at Instacart. I'm particularly interested in anything related to storage, but I mainly focus on connection pools and connection pooling.
00:00:14.320
I'm also a big fan of Niagara Falls. I got to visit the American side several times, and I hear the Canadian side has the better view. So, I'm excited to visit that. However, I'm actually here to tell you about Instacart: our story with Rails and Postgres, and some of the things that we learned along the way.
00:00:30.039
Instacart was started about 12 years ago with the idea that you should be able to get your groceries delivered to your doorstep within an hour. We began with Rails 3.1 and one instance of Postgres 9.1. Today, we are running tens of Rails applications and hundreds of Postgres instances. You can tell from the number of Rails applications that we have a monolith hidden somewhere among them. But yes, we have a lot of Rails applications.
00:01:00.280
My talk today is about going beyond a single database, but I want to spend some time appreciating the nice and beautiful world of a single instance of Postgres. When you start adding instances to your system, you're introducing complexity: you need to think about load balancing, routing and switching, and eventually sharding. There's a lot of performance you can squeeze out of a single modern Postgres instance first, for example with connection pooling, or simply by scaling up the database with more CPU cores and bigger machines.
00:01:39.560
You can also improve your caching, indexing, and query refactoring, among many other low-hanging fruits you can pick. Before making the jump to a more complex setup, make sure you have examined these avenues and have considered your query patterns. That consideration will inform your decision about which direction to take. Query patterns will be a recurring theme in this talk: always try to find ways to understand your query patterns. We used PgHero extensively; it's written in Ruby and provides a lot of information about your database, including your top queries.
00:02:26.959
It's built on pg_stat_statements and can tell you whether you have more reads or writes, what your top queries are, and so on. Generally speaking, your route to scaling Postgres is going to be nonlinear. Instacart went from a single instance to replicas and then to multiple primaries. For you it might be different; you might have a clear tenant separation from the start, making sharding a sensible approach right away. So make sure the patterns guide these decisions.
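Returning to PgHero for a moment: it ships as a mountable Rails engine and reads from pg_stat_statements. A minimal sketch of that kind of setup, not Instacart's actual configuration:

```ruby
# config/routes.rb -- PgHero is a mountable Rails engine:
mount PgHero::Engine, at: "pghero"

# Under the hood it reads from pg_stat_statements, which you can also
# query directly, e.g. the ten most expensive query shapes
# (the column is total_time on Postgres 12 and older):
ActiveRecord::Base.connection.select_all(<<~SQL).to_a
  SELECT query, calls, total_exec_time
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10
SQL
```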
00:02:57.000
Before we leave this slide, I must acknowledge that this is a poorly generated AI image; it's your legally mandated image for the presentation. I'm not going to do more of those, but since it’s 2024, we have to have them. Back to Instacart. Like I said, we started with one primary and then added a replica. It made sense for us to add a replica, and when you add a replica, one of the first sources of complexity you'll face is routing.
00:03:20.720
Deciding where a query should go, to the primary or to a replica, is harder than it sounds. The second complication you'll encounter is replication lag, because data does not move instantaneously from the primary to the replica; you need to manage that and take it into consideration. So let's talk about routing. In technical terminology it's often referred to as switching. You might think it's easy because Rails solved this problem years ago, letting you route your queries with a simple API, but remember that we started Instacart around 12 years ago on Rails 3.1, which had no native support for that kind of functionality.
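For context, this is roughly what that native support looks like in modern Rails (6.0 and later); the role names and the `Order` model are illustrative:

```ruby
# Each role maps to an entry in config/database.yml
# ("primary" and "primary_replica" are illustrative names).
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Queries inside this block are routed to the reading role:
ActiveRecord::Base.connected_to(role: :reading) do
  Order.where(status: "delivered").count
end
```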
00:04:16.760
When we began thinking about adding a replica, this feature was still not part of Rails, so we turned to Makara. The makara is actually the mythical creature at the bottom of the slide; the one I mean is the gem created by the folks at TaskRabbit. Its description is concise and to the point, it clearly describes what it does, and it's a very clever piece of code. In basic Rails, for each entry in your database.yml file, Rails will create an Active Record connection pool.
00:05:05.759
Even if you're using shards, each primary gets its own connection pool. If you're on Postgres, the pool is filled with Postgres adapter objects; you can verify this by checking your console, and you'll see the connection pools there. These adapters act as an interface between Active Record and Postgres, and each one contains the Postgres connection object that speaks the Postgres protocol. What Makara did was a bit different: instead of a single adapter, its adapter wrapped multiple adapters, each one talking to one instance in your cluster, roughly as in the sketch below.
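A heavily simplified sketch of that wrapping idea, written as illustrative Ruby rather than Makara's actual implementation:

```ruby
# Illustrative only: one proxy object holds sub-adapters for every
# instance in the cluster and routes each method call by its name.
class MultiAdapterProxy
  WRITE_METHODS = %i[insert update delete execute].freeze

  def initialize(primary:, replicas:)
    @primary  = primary   # adapter for the primary
    @replicas = replicas  # adapters, one per replica
  end

  def method_missing(name, *args, &block)
    target = WRITE_METHODS.include?(name) ? @primary : @replicas.sample
    target.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @primary.respond_to?(name, include_private) || super
  end
end
```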
00:06:03.680
The adapter itself, the one Active Record talks to, receives all the method calls and decides where to route them based on the methods and their parameters. This gave Makara visibility over the entire cluster and allowed it to handle load balancing, failover, and automatic switching. We used Makara for a long time, but recently began phasing it out for several reasons. When you check out a connection from the Active Record pool, Rails performs a health check, and Makara, being overly eager, health checked every connection inside the adapter.
00:06:49.680
In total, four health checks were sent for each checkout, which is excessive. On a good day this worked, but we faced several outages, particularly with the primary. In those cases, the health check would take longer, pushing the entire request past its deadline and causing a timeout. Notice that the request didn't even need the primary; it only needed a replica, so the primary being unhealthy should not have affected it at all.
00:07:50.600
That was one problem we faced. The second was the fallback behavior: when there is no explicit routing rule for a query, everything simply goes to the primary. That is the safe choice, but it again makes you dependent on the primary being functional, which was also unacceptable for us. The third problem was that Makara had an extensive surface area, wrapping roughly 400 adapter methods, and every time we tried to fix edge cases it became exceptionally challenging to work with. Our solution was to completely rewrite it.
00:08:13.700
Our new approach is lazy, meaning that if you don't need to connect to the primary, then you simply don't. This is precisely what Rails does today; if you're using vanilla Rails without any of these fancy add-ons, this is the behavior you get. However, we didn't have that when we started. Additionally, instead of wrapping the Postgres adapter, we wrapped the Postgres connection object, which has a much smaller, manageable surface area of methods, small enough that we could implement every one of them explicitly.
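A minimal sketch of that shape, with hypothetical class and method names rather than our production code:

```ruby
require "pg"

# Wrap the raw PG::Connection objects instead of the Rails adapter.
# PG::Connection exposes a small set of query methods (exec,
# exec_params, ...), so each one can be implemented explicitly.
class RoutedConnection
  def initialize(primary_conninfo, replica_conninfos)
    @primary_conninfo  = primary_conninfo
    @replica_conninfos = replica_conninfos
    @primary  = nil   # opened lazily: nothing touches the primary
    @replicas = {}    # unless a query actually needs it
  end

  def exec(sql)
    connection_for(sql).exec(sql)
  end

  def exec_params(sql, params)
    connection_for(sql).exec_params(sql, params)
  end

  private

  def connection_for(sql)
    sql.match?(/\A\s*SELECT/i) ? replica : primary
  end

  def primary
    @primary ||= PG.connect(@primary_conninfo)
  end

  def replica
    conninfo = @replica_conninfos.sample
    @replicas[conninfo] ||= PG.connect(conninfo)
  end
end
```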
00:09:59.480
Even though we moved away from Makara, we inherited a lot of its routing rules, including the auto-switching rules, and we find them extremely useful for scaling our systems. The first automatic switching rule is to send any read query outside of a transaction to a replica by default. This might not be the safest option, but it has proven beneficial: for one, it reduces the load on your primary.
00:10:35.880
If reads defaulted to the primary instead, then two years down the line, when the primary becomes overloaded, you would have no way of knowing which of those queries were sent to the primary by mistake and which were deliberate developer choices. Defaulting reads to replicas has been incredibly helpful for us. Writes, on the other hand, always go to the primary, which also means we do not open transactions against our replicas. This works well because our replicas sit behind a transaction-mode connection pooler.
00:11:15.760
Transaction-mode connection poolers favor queries run outside of transactions because it lets the pooler optimize: it packs queries from multiple clients onto a server connection back to back, avoiding idle time. That is what Postgres thrives on, and it lets you use fewer connections efficiently. Another aspect most people are aware of is replication lag. When you write to the primary and then read from a replica, there is a chance the data you just wrote has not arrived there yet.
00:12:01.360
To mitigate this, we keep reads on the primary after writes: once a request has executed a write transaction, its subsequent queries are also directed to the primary. This protects us from replication lag. Rails supports this kind of behavior out of the box, but we have not used it, since our current strategy has been sufficient.
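For reference, this is the built-in Rails mechanism being alluded to, which we do not use: for a short window after a write, the middleware keeps that session's reads on the writer.

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Send a session's reads to the writer for 2 seconds after a write.
  config.active_record.database_selector = { delay: 2.seconds }
  config.active_record.database_resolver =
    ActiveRecord::Middleware::DatabaseSelector::Resolver
  config.active_record.database_resolver_context =
    ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
end
```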
00:12:59.840
Finally, if you need consistent reads, use the primary, and open a transaction if you need transactional guarantees, but do so in moderation: as you push more load back onto the primary, you lose the benefits of having replicas at all. Adding replicas might seem like a straightforward solution, but it is worth stepping back and asking whether you have picked the low-hanging fruit and whether you truly understand your query patterns before proceeding with such an addition.
00:13:26.680
For us, adding more replicas was effective because we were using Makara, which managed the connections well. With vanilla Rails, you need to rely on your infrastructure to help with this, which is entirely feasible with connection poolers. Always run a connection pooler in your production environment.
00:14:04.880
Starting from version 1.17, PgBouncer lets you configure multiple hosts for load balancing. When PgBouncer creates a new server connection for the pool, it rotates through the listed hosts in round-robin fashion, so you end up with a mosaic of connections within your pool. In the event of a failover, however, things can become problematic. The newer contender for load balancing is PgCat. Full disclosure: I am one of its maintainers, which might color my opinions. We use PgCat at Instacart; it's written in Rust and has first-class support for load balancing.
00:15:06.080
PgCat offers two load-balancing strategies: random and least outstanding connections. We have observed that random recovers better than round robin during failovers. To use it, you create a pool for your replicas and point the replica URL in your Rails configuration at that pool.
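A sketch of what that wiring might look like in config/database.yml; host names, ports, and database names are illustrative:

```yaml
# The app connects to the pooler; the pooler load balances across
# the replicas behind it.
production:
  primary:
    adapter: postgresql
    host: primary-db.internal
    port: 5432
    database: app_production
  primary_replica:
    adapter: postgresql
    host: pgcat.internal     # PgCat pool fronting the read replicas
    port: 6432
    database: app_production
    replica: true
```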
00:15:47.200
However, regardless of the load balancing solution you choose, certain challenges commonly arise. On a good day, you would expect perfect load balancing across your instances. You may have three equally sized Postgres instances and desire to see traffic evenly distributed among them. On a bad day, if one replica fails, you would want to observe traffic shifting to the operational instances while seeing minimal increases in application latency. Ideally, a replica failover should be a seamless event, one that does not disrupt service or require paging anyone.
00:16:24.320
After a failover, the load balancing should recover. This was particularly troublesome for us with PgBouncer, which, to be fair, does not claim to be a load balancer; we would often have to restart PgBouncer to regain an even balance. So observe your load patterns, failover events, and recovery times to make sure your application isn't overwhelmed with errors.
00:17:19.600
Now, let's discuss failover, which is when one database becomes inoperative and traffic must be diverted to another. Consider how this plays out with standard connection pooling in Postgres. When a client tries to check out a connection from a pool whose connections are all busy or overloaded, the client eventually hits a timeout and receives an error.
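From the application's side, a saturated or stuck pool surfaces as a checkout timeout; an illustrative Rails-level sketch:

```ruby
# If every connection in the Active Record pool is busy (for example,
# stuck talking to a dead replica), a checkout waits up to
# checkout_timeout and then raises instead of hanging forever.
begin
  ActiveRecord::Base.connection_pool.with_connection do |conn|
    conn.execute("SELECT 1")
  end
rescue ActiveRecord::ConnectionTimeoutError => e
  Rails.logger.warn("no connection available within checkout_timeout: #{e.message}")
end
```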
00:18:27.679
In PgCat, if a replica starts lagging or failing, that is a signal that the replica should be avoided, and traffic shifts away from it. PgBouncer, on the other hand, merely disconnects clients with query timeout errors. Makara's approach was to ban a replica once it detected a disconnection.
00:19:15.760
It's important to realize that the response to a failing replica relies on these timeout settings. If your connection timeout or query timeout is set excessively high, you risk indefinite wait times for a connection that won't come. Thus, it's crucial to maintain appropriate timeout settings to ensure normal operations.
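An illustrative example of where those knobs live on the Rails side; the values are placeholders, not recommendations:

```yaml
# config/database.yml -- none of these should be left effectively unbounded.
production:
  primary_replica:
    adapter: postgresql
    replica: true
    connect_timeout: 2          # seconds to establish a new connection
    checkout_timeout: 5         # seconds to wait for a connection from the AR pool
    variables:
      statement_timeout: 5000   # ms before Postgres cancels a running query
```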
00:20:01.640
That wraps up the discussion of the life of replicas. We also split workloads across different primary databases. However, when COVID-19 struck, we faced significant operational challenges, and we had to fast-track sharding our largest primary.
00:20:46.720
Sharding is the operation of splitting your records across different instances so that no single database becomes a bottleneck. The process can be labor-intensive: it requires a careful, orchestrated cutover, with replication set up between the old unsharded system and the new sharded system, both of which must stay consistent during the transition to prevent data loss.
00:21:57.599
Implementing the cutover typically requires temporarily blocking writes to the old system while monitoring replication status closely to ensure data accuracy. You may also need to prepare reverse replication in case issues arise and you have to roll back. Ultimately, you can route traffic to your new sharded system with minimal downtime.
00:23:10.080
There are two primary types of sharding: vertical and horizontal. Vertical sharding moves entire tables to their own instances, while horizontal sharding splits the rows of a single table based on a sharding key. The sharding key determines which instance a given row maps to, and you want to avoid operations that span multiple databases, like cross-shard joins.
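For context, modern Rails (6.1 and later) exposes horizontal sharding directly; a minimal sketch with illustrative shard names and a toy sharding function:

```ruby
# Each shard is an entry in config/database.yml, and the sharding key
# decides which shard a block of work connects to.
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    shard_one: { writing: :shard_one, reading: :shard_one_replica },
    shard_two: { writing: :shard_two, reading: :shard_two_replica }
  }
end

user_id = 42
shard   = user_id.even? ? :shard_one : :shard_two  # toy sharding function
ActiveRecord::Base.connected_to(role: :writing, shard: shard) do
  Order.where(user_id: user_id).count
end
```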
00:24:48.080
pg_query is a tool we lean on heavily here; it parses queries with the actual Postgres parser, making it easy to determine which tables are involved in a query, which in turn makes it easier to keep traffic segmented between shards. Transactions that nest across shards, on the other hand, are not viable: an inner transaction can successfully commit on one database while the outer transaction fails on another, which complicates state management.
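Going back to pg_query, a small example of what it returns; the exact output shape may vary by version:

```ruby
require "pg_query"

# pg_query uses the actual Postgres parser, so you can reliably see
# which tables a statement touches before deciding where to send it.
result = PgQuery.parse("SELECT * FROM orders JOIN items ON items.order_id = orders.id")
result.tables
# => ["orders", "items"]
```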
00:26:10.960
As we conclude our discussion, remember that the sharding landscape is still evolving. A key takeaway is to ensure your sharding key is present in most of your workloads to avoid complications down the line. Let’s return to the beautiful AI-generated image as a closing thought.