Keeping Throughput High on Green Saturday

by Alex Reiff

The video titled "Keeping Throughput High on Green Saturday," presented by Alex Reiff at RailsConf 2019, focuses on the strategies employed by Weedmaps to maintain high throughput on one of its busiest days, "Green Saturday," which aligns with the cannabis celebration on April 20th. This presentation outlines how lessons learned from previous years' traffic spikes led to systematic improvements in API performance.

Key Points Discussed:
- Overview of Weedmaps: The largest technology company dedicated to the cannabis industry, offering various software solutions for different market verticals.
- Discovery API: The core focus of the presentation is the discovery API, which retrieves content for Weedmaps’ website and apps and interfaces with Elasticsearch to provide quick responses.
- Traffic Spike on Green Saturday: Each year, Weedmaps experiences its highest traffic on April 20th. In 2016, the company's systems struggled under the load, prompting improvements to their infrastructure and process management.
- Improvements Over the Years: Improvements started from splitting database reads/writes to implementing containerization and optimizing Elasticsearch performance, leading up to the 2019 strategy.
- Micro Caching Implementation: Alex introduced micro caching, a technique where web responses are cached for a short period to enhance performance, along with using NGINX for proxy caching.
- Geolocation Data Handling: Focused on optimizing how geolocation data is processed, including discussions on rounding coordinates to reduce request variability and improve caching effectiveness.
- Caching Strategy Development: Developed a caching strategy to replace synchronous queries with background job processing, enhancing user experience.
- Performance Testing: Emphasis on benchmarking and verifying performance changes to catch potential slowdowns before they impact users.
- Achievements on Green Saturday 2019: On that day, Weedmaps processed 100,000 requests per minute with an average latency of 80 milliseconds and a 9% cache hit rate, demonstrating the effectiveness of their optimizations.

Conclusions and Takeaways:
- Efficient caching policy and system architecture lead to improved application performance.
- Optimizing data handling and reducing external requests can significantly enhance user experience during peak loads.
- Continuous testing and post-implementation performance verification are vital to maintaining high throughput.

Overall, the presentation provides valuable insights into leveraging infrastructure and application optimizations for handling increased user traffic during key business events.

00:00:27.200 Welcome to room 101 II. My name is Alex Reiff and I am elated to be here! This is just my second time coming to RailsConf and my first time speaking at any conference, so it's super exciting. Please go easy on me.

00:00:35.010 I am from a little company called Weedmaps. You may have heard of us; we are the largest technology company solely focused on the cannabis industry. We have solutions for retailers, growers, wholesalers, and consumers. Pretty much every market vertical in the cannabis industry has some kind of software-as-a-service from us.

00:00:41.670 Today, I'm going to talk about a big day at Weedmaps and how we kept throughput high on Green Saturday. My team focuses on what we call our Discovery API. It's a read-only API that serves most of the content on our front end, so the website and our native apps hit the Discovery API to get their content. Its data source is Elasticsearch, which gives us quick reads and a simple horizontal scaling mechanism when pressure gets too high on the cluster.

00:01:01.440 We feed Elasticsearch from a queue pipeline we've implemented called Dabit. Dabit is a layer on top of RabbitMQ. Messages are put on the queue by our core application, which is mostly used by business owners working in the admin panel and CMS. This core application hosts most of the business logic and maintains the source data in Postgres.

00:01:17.160 If you want to hear more about Postgres and how we do it, come to Matt Zimsky and Craig Booch's talk tomorrow, as they've got stuff about Postgres, Active Record, and all that good stuff. So, data makes its way from core on to RabbitMQ, then to the Discovery API, into Elasticsearch, and then people surrounded by a cloud of smoke get that data.

00:01:27.469 If that setup sounds interesting to you at all and you want to come talk about it more, come see us at our booth in the exhibit hall. We're definitely hiring! Now, the Discovery API has three main areas of focus. Firstly, it determines where I am. Secondly, it determines the retailers and delivery services, along with cannabis doctors near me. Finally, it shows me their products and the best deals I can find on them.

00:01:40.140 In short, when people are searching for green, they are using the services that my team is responsible for. There is one particular day when all kinds of people are searching for green. If you've read the conference program, you probably read about Green Saturday, and you might be wondering if that is a real thing. Well, kind of.

00:02:05.610 This is the second to last weekend in April. A lot is going on on Friday. We observe the first night of Passover, witness the miracle of the burning bush, and remember that on Easter Sunday we commemorate Jesus Christ ascending to a higher plane. But on that Saturday, two days before Earth Day, there is another holiday, which I'm calling Green Saturday.

00:02:20.280 Now, I have an asterisk there because it's not always on Saturday, but it definitely always falls on this date, April 20th. You may know it colloquially as 4/20, the day we celebrate the cannabis movement, all the progress made in access and criminal justice reform, and a day when people consume a lot of cannabis.

00:02:35.130 To fulfill that load, they need to find it, and to find it, they come to Weedmaps. So on April 20th, Weedmaps experiences our highest traffic spike of the year, and that spike grows consistently year over year.

00:02:47.940 I've been at Weedmaps since 2016. You can tell by my limited edition early run 'Get High' shirt; you can't get those anymore. So, 2016 was my first year at Weedmaps, and when 4/20 rolled around, I was expecting a party. Unfortunately, that party was over GoToMeeting because our database on the core application was deadlocked within a few hours of the start of business. Needless to say, that was a dark day in Weedmaps history.

00:03:11.250 From there, we became serious about hardening our systems. We brought DevOps in-house and started a hiring spree that is still ongoing today. Every year since then, we have made improvements. In 2017, we split our Postgres reads and writes in our core application because our end-user traffic was very, very read-heavy, which took pressure off our master node.

00:03:29.000 No more deadlocks! Good stuff! By 2017, we also had a new v2 API backed by Elasticsearch. It was used by a few clients for a few routes and not much else, but it did help ease some pressure off the core application. Despite a little shakiness, we stayed up in 2017, which was a good year. By 2018, California had gone recreational, and we needed to scale significantly to manage that.

00:03:57.400 We containerized our applications and put them in Docker, using a system called Rancher to orchestrate Docker containers. This allows us to scale quickly and change configurations as needed. By 2018, most of the traffic was directed towards this v2 API, which had evolved into the Discovery API. However, it was still using the original small Elasticsearch cluster we had set up back in the day, and it was encountering stability issues.

00:04:20.270 So we decided to upgrade to the latest version of Elasticsearch at the time, version six, and more importantly, we thoroughly tuned our index configuration to best match the data we had. 2018 4/20 was a Friday, and it was one of the quieter Fridays at Weedmaps.

00:04:43.000 Fast forward to 2019: it's been three years at Weedmaps, my white shirt is now threadbare and faded, and our traffic baseline has doubled compared to a year ago. We had a few high-traffic events, and although things were a little uncertain, we pondered what low-hanging infrastructure changes we could make. What could we address? What could we tweak in our DevOps?

00:05:12.080 Obviously, Kubernetes, but that is not this talk. Instead, we got serious about managing our requests. Here is a view of the Weedmaps homepage, where you can see two main routes that power the main content: /location and /brands. The /location route receives the bulk of traffic and has three main phases.

00:05:28.710 1. It determines the user's location based on geolocation coordinates provided by the device. 2. It maps that location to one or more sales regions, depending on what services are legal and available in the area. 3. Finally, it pulls up the businesses' advertisements in those sales regions.

00:05:47.310 It's debatable whether a single API should serve all of that information, and it's an ongoing push and pull with our frontend clients, something backend engineers can relate to. However, generally, this data does not change with extremely high frequency, so we considered caching lists, all while ensuring a good experience for business owners looking to update their pages.

00:06:11.670 For example, they may have a new logo or perhaps they are trading under a new name, or they might want to quickly display some new advertising. The compromise we found is using micro-caching. Micro-caching is a strategy where you cache the entire web response for a brief period. In this case, 'micro' refers to your time to live, not the cache payload.

00:06:27.240 For organizations like us, with heavy requests, the cached payload can be substantial. You might have heard about microdosing, which is taking a small dose of something that lasts for a long time. This is the opposite: with micro-caching, we take a large payload and hold it for a short amount of time.

00:06:44.010 We love using Nginx at Weedmaps, and we have it enabled at many layers of our application stack. Nginx makes this caching very easy. As you can see in line two, we use the `proxy_cache_path` directive, which by default stores the cache on the file system, and we specify a key zone. In this case, I am calling it 'my_cache' and I say I can hold 150 million cache keys.

00:07:02.280 There are other configurations as well; here I'm indicating that I only want to retain 200 megabytes before I begin purging old caches. Scrolling down to line eight, you’ll find our location block where we proxy to Puma or Unicorn, or whatever app server is used. Slightly below this is where we set our proxy settings.

00:07:18.880 We only wish to capture status 200 response codes and retain them for half a minute—30 seconds. The proxy cache key is where I define the unique aspects of my request. In our case, this is pretty complex. We have some routes based on GeoIP and various device parameters, but this is just a simplified example using the standard Nginx variables: hostname, request URI, and authorization header.
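To make the behaviour concrete outside of Nginx, here is a minimal Ruby sketch of the same micro-caching idea: whole responses are stored for 30 seconds, only status 200 is kept, and the key is built from host, request URI, and authorization header. The `MicroCache` and `Response` names and the in-memory hash are assumptions for illustration; this is not the Weedmaps configuration.

```ruby
# Minimal in-memory micro-cache: whole responses are kept for a short TTL.
# MicroCache, Response, and the key layout are illustrative names only.
Response = Struct.new(:status, :body)

class MicroCache
  def initialize(ttl_seconds: 30, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {} # key => [expires_at, response]
  end

  # Mirrors the simplified cache key from the talk: host, request URI, auth header.
  def key_for(host:, request_uri:, authorization: nil)
    [host, request_uri, authorization.to_s].join("|")
  end

  # Returns [response, :hit] or [response, :miss]; only 200 responses are stored.
  def fetch(key)
    expires_at, cached = @store[key]
    return [cached, :hit] if cached && @clock.call < expires_at

    response = yield # cache miss or expired entry: call the app server
    @store[key] = [@clock.call + @ttl, response] if response.status == 200
    [response, :miss]
  end
end

cache = MicroCache.new(ttl_seconds: 30)
key = cache.key_for(host: "api.example.test", request_uri: "/location?latitude=44.98")
2.times do
  _response, outcome = cache.fetch(key) { Response.new(200, "large payload") }
  puts outcome
end
```

Because the authorization header is part of the key, two users with different credentials never share an entry, while anonymous requests for the same URI do.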

00:07:36.860 So, when users are logged in, their unique content is cached under their own key and is never served to anyone else. But we are running in Docker and using a local file system, so we have many instances of this local cache, and they cannot be shared, meaning our cache hit rate will not be great.

00:07:55.580 Enter OpenResty! OpenResty builds upon Nginx, adding some gourmet features. It embeds Lua scripting with a low-level API, alongside libraries to simplify developing in that language.

00:08:06.140 It also comes with a series of Nginx modules allowing it to communicate with non-HTTP upstreams like Memcache. Just like we configure our Rails cache instance for Memcache to store our cache values across different application instances, we can achieve the same with Nginx using OpenResty.

00:08:19.500 After rolling out OpenResty and adding micro-caching, the results were still not great—we achieved about a five or six percent cache hit ratio. We don't have many types of requests, but the variation is quite significant. This is primarily because we rely on mobile devices to send a user's coordinates, which are often far too precise.

00:08:39.720 Mobile devices can log coordinates with up to 12 decimal places, which is highly unnecessary for most applications. To further illustrate, here's a chart that I borrowed from Wikipedia, illustrating distance correlated to decimal precision.

00:08:58.600 As you can see, past the fifth decimal place we are at sub-meter precision, which is probably more than enough accuracy! For example, the location of our Weedmaps afterparty is at the Aria on First Street, just a few blocks from the Convention Center. I encourage you to join us tomorrow!
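The relationship in that chart can be approximated with one line of arithmetic. The sketch below assumes roughly 111.32 km per degree, which holds for latitude and for longitude at the equator; longitude distances shrink toward the poles, so treat the numbers as upper bounds rather than exact figures.

```ruby
METERS_PER_DEGREE = 111_320.0 # approximate length of one degree at the equator

# Distance represented by the last digit when a coordinate keeps `places` decimals.
def meters_for_decimal_places(places)
  METERS_PER_DEGREE / (10**places)
end

(0..6).each do |places|
  puts format("%d decimal places ~ %.2f m", places, meters_for_decimal_places(places))
end
```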

00:09:15.580 Google tells me that this coordinate is approximately latitude 44.98454 and longitude -93.268504. That's quite a mix of numbers! Perhaps we can drop it down to four decimal places and still remain within the same block.

00:09:32.420 So, what if we reduce it even to two decimal places? That would place us a few blocks away. One decimal place, however, might be too far. Perhaps we can find a sweet spot between two and five, depending on the use case.

00:09:49.800 In our application, we have a `Dry::Struct` that models coordinates. Props to the dry-rb team; we value and utilize a lot of their tools in the Discovery API. We can easily call a rounding method within our struct to set the coordinate precision.

00:10:06.070 We extract the coordinates from the parameters, round them accordingly, and pass them into the query. However, if we do this in Rails, we are already behind the micro-cache, so it doesn’t enhance our cache rates.
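As a rough illustration of that rounding step, here is a plain-Ruby stand-in for the coordinate struct. It uses the standard library's `Struct` rather than dry-struct so it runs on its own; the `Coordinates` name, the `rounded` method, the parameter names, and the default of three decimal places are all assumptions, not the Discovery API's actual code.

```ruby
# Plain-Ruby stand-in for the coordinate struct described above.
Coordinates = Struct.new(:latitude, :longitude) do
  # Returns a new Coordinates rounded to the given number of decimal places.
  def rounded(places = 3)
    self.class.new(latitude.round(places), longitude.round(places))
  end
end

# Example request params (hypothetical names), as strings the way a controller sees them.
params = { "latitude" => "44.98454", "longitude" => "-93.268504" }

coords = Coordinates.new(Float(params["latitude"]), Float(params["longitude"])).rounded(3)
puts coords.to_a.inspect
```

Rounding here still helps any cache that sits behind Rails, such as a cached region lookup, even though it comes too late to affect a cache in front of the application.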

00:10:23.860 But as we have our micro-cache set up in Nginx, we began considering re-writing the request URI in a way that would allow it to leverage the caching more effectively. That's precisely what we implemented using a Lua plug-in at the API gateway that manages all our API services.
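The talk's plug-in was written in Lua; the Ruby sketch below only illustrates the idea of rewriting the request URI before the cache key is computed. The `normalize_coordinates` function and the `latitude`/`longitude` parameter names are hypothetical.

```ruby
require "uri"

# Rewrites coordinate query parameters to a fixed precision so that nearby
# requests collapse onto the same cache key. Parameter names are assumed.
def normalize_coordinates(request_uri, places: 3, keys: %w[latitude longitude])
  uri = URI.parse(request_uri)
  return request_uri if uri.query.nil?

  pairs = URI.decode_www_form(uri.query).map do |name, value|
    number = keys.include?(name) ? Float(value, exception: false) : nil
    [name, number ? number.round(places).to_s : value] # leave non-numeric values alone
  end
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end

first = normalize_coordinates("/location?latitude=44.98454321&longitude=-93.26850412")
second = normalize_coordinates("/location?latitude=44.98461&longitude=-93.26871")
puts first
puts first == second
```

Two devices standing a few meters apart now produce an identical URI, so the second request can be answered from the micro-cache.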

00:10:41.520 So, the Discovery API, which I was talking about, operates through this Elixir service, proxying requests using Kong, which interestingly is built on OpenResty. Kong’s strength is its dynamic routing and a robust plugin architecture, operating similarly to Rails middleware with hooks to modify the request and response at different handling stages.

00:10:58.300 Now, I must confess, Lua isn't all that fun. So if anyone feels inspired, it would be fantastic if someone could create Crystal bindings for Nginx. So, going back—before rolling out our plug-in, we were seeing cache rates around 5-6%.

00:11:12.960 After implementing the plug-in and enabling it on our Discovery API routes, our cache hit rate improved significantly to 9-10%. I'll take that! So now, Nginx is handling a few thousand requests, giving Rails a bit of relief.

00:11:30.160 However, we can’t take our foot off the pedal just yet, as we still need to process thousands more location requests. It was time for us to examine our route controller. We utilize New Relic and Twilio as our application monitoring tool to acquire detailed traces of transactions.

00:11:48.900 It identified that our regional query consistently ranked as the slowest operation. We then enabled Elasticsearch's slow query logs, which confirmed this suspicion.

00:12:00.100 Of the three main components addressed within the route—the geolocation, region determination, and pulling in advertisements—we chose to focus on region determination.

00:12:20.150 Fortunately, our sales team does not frequently modify region boundaries once established. Therefore, given a coordinate, it is improbable that the region will alter minute-to-minute or even day-to-day.

00:12:31.080 Thus, we can afford to cache region information a bit longer. There is no need for micro-caching here; this case calls for about a ten-minute cache duration.

00:12:47.940 Here’s that rounding snippet again; we extract the coordinates, pass them to the struct for rounding, and use `Rails.cache.fetch`, passing our cache key along with a time to live of ten minutes.

00:13:03.850 Realistically, a duration of ten hours might have sufficed, but we opted for a ten-minute expiration.

00:13:18.580 If the cache key isn’t present or the previous value has expired, we execute the query again, establishing what is termed a read-through cache—a standard approach most people associate with caching.
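A read-through cache of this kind can be sketched in a few lines. The class below imitates the `fetch(key, expires_in:)` shape of `Rails.cache.fetch` so that it runs without Rails; `ReadThroughCache`, `find_region`, and the key format are illustrative assumptions.

```ruby
# Tiny read-through cache with a fetch(key, expires_in:) shape similar to
# Rails.cache.fetch. ReadThroughCache and find_region are illustrative names.
class ReadThroughCache
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  def fetch(key, expires_in:)
    entry = @entries[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield # cache miss or expired: run the query synchronously
    @entries[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

cache = ReadThroughCache.new
# Stands in for the Elasticsearch region query.
find_region = ->(lat, lng) { { name: "example-region", lat: lat, lng: lng } }

region = cache.fetch("region/44.985/-93.269", expires_in: 600) do
  find_region.call(44.985, -93.269)
end
puts region[:name]
```

The caller that arrives just after expiry pays the full cost of the query, which is the weakness the next strategy addresses.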

00:13:35.930 Yet, there exists an alternative cache strategy: consider a write-behind cache. When the cache key expires, instead of updating during the fetch and returning new data, the old cache value is returned while a background worker is queued to retrieve and store new data.

00:13:48.470 This was something we implemented ourselves, as illustrated in the set method on line seven. Traditionally, when caching, you pass the time-to-live as a parameter to the cache store. However, in this instance, we store our duration as an expiration value inside the cache payload.

00:14:05.680 You can see on lines eight and nine where we set up our cache payload. Upon fetching the cache, we first check the expiration.

00:14:20.340 If we are still before the expiration date, we return the payload. However, if we have passed that date, we still return the stale payload and trigger a refresh. That refresh queues a background worker to fetch our API data.

00:14:38.600 As seen on line 27, we employ a simple Active Job responsible for saving data back to the cache. Our additional goal is to shift any upstream latency spike away from the end user during synchronous web requests.
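The slide code is not reproduced in this transcript, so the following is only a sketch of the strategy as described: the expiration is stored inside the payload, an expired entry is still returned, and a refresh is queued for a background worker. An in-process array stands in for the job queue, and every name (`RefreshBehindCache`, `work_off`, `jobs`) is hypothetical.

```ruby
# Sketch of the stale-then-refresh cache described above. The expiration lives
# inside the payload, so an "expired" entry is still readable. All names here
# (RefreshBehindCache, the jobs array) are illustrative.
class RefreshBehindCache
  attr_reader :jobs

  def initialize(ttl:, clock: -> { Time.now.to_f })
    @ttl = ttl
    @clock = clock
    @store = {}
    @jobs = [] # stands in for a background job queue
  end

  def set(key, value)
    @store[key] = { expires_at: @clock.call + @ttl, value: value }
    value
  end

  # The block is the slow upstream call; it only runs inline on a cold cache.
  def fetch(key, &loader)
    entry = @store[key]
    return set(key, loader.call) if entry.nil?

    refresh_later(key, loader) if @clock.call >= entry[:expires_at]
    entry[:value] # possibly stale, but returned immediately
  end

  # Runs queued refreshes, as a worker process would.
  def work_off
    @jobs.shift.call until @jobs.empty?
  end

  private

  def refresh_later(key, loader)
    @jobs << -> { set(key, loader.call) }
  end
end
```

A production version would also need to avoid queueing a duplicate refresh for every request that arrives while an entry is stale; this sketch leaves that out for brevity.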

00:14:57.640 Instead of users encountering a slow API, it is the Sidekiq worker that runs slower.

00:15:17.720 In this scenario, our upstream service would be Elasticsearch, where we store all our region data.

00:15:36.280 Several conditions can cause latency spikes with Elasticsearch. One issue stems from bursts of writes during high read loads. Imagine on 4/20 at noon; shops usually open at 8 AM and rapidly run out of stock.

00:15:58.470 To rectify this, they hit the sync button on their POS systems, effectively writing thousands of menu items at the same time. A few requests might see slight spikes in latency, but there's more.

00:16:12.540 This issue is exacerbated by persistently running costly read queries. Elasticsearch permits document joins, establishing parent-child document relationships. We used this approach to separate region document metadata from geometry.

00:16:27.130 While it was convenient for delta updates, expressing a query could be quite convoluted. You had to query for a region that has a child document whose geometry contains a specific coordinate.

00:16:43.200 Admittedly, it’s cumbersome if you've worked with the Elasticsearch DSL; it often involves deeply nested structures, which isn't an ideal experience.

00:17:01.850 So, what did we do? We re-evaluated our indexing strategy. We consolidated the two documents into a single structure. Using Elasticsearch's Painless scripting language, we created tools to maintain those delta updates.
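To show the difference in query shape, here are the two request bodies written as Ruby hashes in Elasticsearch's query DSL. `has_child` and `geo_shape` are real query types, but the `geometry` child type, the `shape` field name, and the overall index layout are assumptions made for this sketch; the actual Weedmaps mappings are not shown in the talk transcript.

```ruby
# Both hashes are request bodies in Elasticsearch's query DSL. The index layout,
# the "geometry" child type, and the "shape" field name are assumptions.
def point(lat, lng)
  { type: "point", coordinates: [lng, lat] } # GeoJSON order: longitude first
end

def contains_point(lat, lng)
  { geo_shape: { shape: { shape: point(lat, lng), relation: "intersects" } } }
end

# Before: region metadata is the parent, geometry is a child document.
def joined_region_query(lat, lng)
  { query: { has_child: { type: "geometry", query: contains_point(lat, lng) } } }
end

# After: metadata and geometry live in one document, so no join is needed.
def flat_region_query(lat, lng)
  { query: { bool: { filter: [contains_point(lat, lng)] } } }
end

puts joined_region_query(44.985, -93.269).inspect
puts flat_region_query(44.985, -93.269).inspect
```

The flat query runs the geometry filter directly against the region document, which avoids resolving the join at query time.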

00:17:18.530 Post implementation, we found outstanding performance results. Our peak response time before the upgrade was 45 milliseconds for the location route, dropping down to about 30 milliseconds after deploying our changes.

00:17:38.090 That’s a 33% improvement—precisely what we were hoping for! Awesome! So, are there any other areas we could similarly enhance?

00:17:58.700 The answer is yes! Recall the other route on the homepage for brands and categories. Previously, the homepage displayed brand logos instead of product cards, necessitating joining documents, which we have now avoided.

00:18:18.860 Having learned our lesson, we collaborated with our product team and devised a strategy to link the old and new queries, effectively generating both responses while eliminating the joins.

00:18:36.920 Set against the backdrop of our earlier improvements, we anticipated excellent results. But alas, the outcome was unexpected. The big spike, which we first perceived as Elasticsearch’s fault, was in fact a side effect of our code.

00:18:54.020 When we began using the non-joined query results to build legacy responses, we ended up parsing that Elasticsearch response twice over, effectively doubling CPU time.

00:19:11.520 So, how did we resolve this? Simple: memoization was the answer. But we quickly recognized the root problem: we neglected to test the performance of the changes.
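A minimal sketch of that kind of memoization, assuming a wrapper object that builds both the new and the legacy response from one Elasticsearch body. `SearchResponse`, its methods, and the JSON shape are hypothetical; the `parse_count` counter exists only to make the effect visible.

```ruby
require "json"

# Wraps a raw response body and parses it at most once. SearchResponse and its
# methods are illustrative, not the Discovery API's actual classes.
class SearchResponse
  attr_reader :parse_count

  def initialize(raw_body)
    @raw_body = raw_body
    @parse_count = 0
  end

  # Memoized: the expensive parse runs on first use and is reused afterwards.
  def parsed
    @parsed ||= begin
      @parse_count += 1
      JSON.parse(@raw_body)
    end
  end

  def brand_names
    parsed.fetch("hits").map { |hit| hit.fetch("name") }
  end

  def legacy_payload
    { "brands" => parsed.fetch("hits").map { |hit| { "title" => hit.fetch("name") } } }
  end
end

response = SearchResponse.new('{"hits":[{"name":"A"},{"name":"B"}]}')
response.brand_names
response.legacy_payload
puts response.parse_count
```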

00:19:27.940 We made assumptions on how the modifications would affect performance without validating them. Once we conducted proper tests, we confirmed our latency returned to expected levels.

00:19:43.000 So, with improvements to two routes in place, we were ready for Green Saturday on 4/20! Our excitement was palpable! Yet, the most critical performance test is how production performs on that big day. Anyone curious about our results?

00:20:03.150 During peak times, we served 100,000 requests per minute through our Kong API gateway. A whopping 53% of those requests traversed the Discovery API, with 20% pertaining to location requests.

00:20:23.700 All our meticulous micro-tuning truly paid off, resulting in a respectable 9% cache hit rate—equating to 5,000 requests sourced from the cache. Thank you, Nginx, for contributing to this success!
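The figures quoted above fit together if the 9% hit rate is read as applying to Discovery API traffic, which is an interpretation rather than something the talk states outright. A quick check of the arithmetic:

```ruby
requests_per_minute = 100_000
discovery_share_pct = 53 # share of gateway traffic going to the Discovery API
cache_hit_pct = 9        # assumed to apply to Discovery API requests

discovery_requests = requests_per_minute * discovery_share_pct / 100
cached_requests = discovery_requests * cache_hit_pct / 100

puts discovery_requests
puts cached_requests
```

That works out to 4,770 requests per minute, which rounds to the 5,000 mentioned in the talk.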

00:20:39.980 This traffic was nearly triple our typical Saturday throughput, with latency held below 100 milliseconds and averaging around 80 milliseconds during peak times.

00:20:54.540 Indeed, our uptime during this remarkable day was 100%.

00:21:09.000 So, to recap, what strategies contributed to our victory? We cached a significant amount of data at various service layers. You should evaluate whether your users always require the latest information straight from your source of truth.

00:21:24.960 If your application runs a public website, there are likely several scenarios where caching your web responses, even temporarily, is viable.

00:21:42.930 Consider tuning user inputs to align better with your application’s context. The various sensors and signals available in our smartphones can provide an overly precise set of data.

00:21:59.920 Make sure to limit any external requests influencing your user’s response times. It’s disheartening to witness spikes in latency on your performance monitor, only to trace them back to external services.

00:22:18.810 When possible, redirect these external requests into background jobs to persist data for your API service, or leverage cron jobs to refresh data at regular intervals.

00:22:35.060 In short, consider what data is critical to your users and how frequently they require it.

00:22:53.110 Please ensure your schema aligns with your queries. This principle applies to any database you may use: Elasticsearch, Postgres, MySQL, or others. Often as your application evolves, your initial index configuration may need to be reassessed.

00:23:06.720 You may discover new fields requiring filtering or unexpected joins emerging in your data operations.

00:23:22.580 Always revisit the latest documentation for your database, as updates often clarify existing functionalities, reveal new features, and highlight previously obscure gotchas.

00:23:42.600 Don't get discouraged! Lastly, it's critical to benchmark when implementing improvements within your application.

00:23:59.320 It’s human nature to rush into production upon resolving issues for expedited benefits, but investing time to validate performance tests is essential.

00:24:18.800 Many tools can simulate load, such as Apache Benchmark, JMeter, or those we employ at Weedmaps.

00:24:35.940 Run tests against your master branch to establish baselines, then compare them with results from your feature branches. This way, you can catch potential issues before they impact application performance.
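A baseline comparison does not need heavy tooling to get started. The sketch below uses Ruby's standard `Benchmark` module to time a block and flag a candidate that is noticeably slower than the baseline; the helper names and the 10% tolerance are assumptions, and a real load test would exercise the deployed service rather than an in-process block.

```ruby
require "benchmark"

# Times the given block over several runs and returns the median in seconds.
def median_runtime(runs: 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sort[runs / 2]
end

# Flags the candidate when it is slower than the baseline by more than `tolerance`.
def regression?(baseline_seconds, candidate_seconds, tolerance: 0.10)
  candidate_seconds > baseline_seconds * (1 + tolerance)
end

baseline = median_runtime { 10_000.times { |i| i * i } }    # stands in for the master branch
candidate = median_runtime { 10_000.times { |i| i * i } }   # stands in for the feature branch
puts format("baseline %.6fs, candidate %.6fs", baseline, candidate)
```

Using the median rather than a single run reduces the influence of one unusually slow or fast sample.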

00:24:53.110 To summarize, the key to achieving high throughput lies in granting your Rails applications and databases a breather. Take advantage of the services available to you; it is likely that your proxy layers or ElastiCache stores are underutilized.