Forget Scaling: Focus on Performance

00:00:12.259 Thank you. Hello. As Richard said, I'm Terence Lee, known as hone02 on Twitter. I wear this blue hat and hold a bunch of titles.

00:00:20.220 The first title is leader of the League of Blue Hats. I wear blue hats; here's Jessica Suttles in a blue hat, and here's Chris Kelly from New Relic, who photoshopped the hat onto himself because it's a pretty cool league.

00:00:33.719 I'm from Austin, Texas, where I'm known as the Chief Taco Officer. I believe tacos are the best thing in Austin. If you're from LA and think the tacos there are amazing, you really need to come to Austin. If you're ever in Austin and I'm in town, I'll personally take you out for tacos.

00:00:39.300 I work on Bundler, as Richard mentioned, and I also work for a small company called Heroku. Developer happiness is one of the things our platform optimizes for.

00:00:49.740 You've probably seen how deploying works: you type 'git push heroku master' and your app is deployed to the platform. Right now, we're running a little over three million apps.

00:01:02.780 I'm part of the Ruby task force, the Ruby product team whose goal is to deliver the best Ruby experience we can on our platform. We maintain the Ruby buildpack, contribute to open-source projects such as Bundler and Rails, and help out in the community.

00:01:13.119 If you experience problems, please come talk to us. We would be more than happy to listen to your issues. We'll have a booth today and tomorrow, and we also have Ruby task force meetings.

00:01:25.080 I'm also lucky enough to have the creator of Ruby as a co-worker, which I'm really happy about. But enough introductions: today, I'm going to talk about performance.

00:01:36.600 So let's forget about scaling for a bit. A well-prepared app is much easier to scale: if we think about performance up front, the app will already be set up for scaling when the time comes.

00:01:45.480 We'll start by focusing on the performance of a single app running in a single process. A nice side effect is that this also makes the app easier to scale.

00:01:52.379 The first thing I want to address is the difference between speed and throughput. When I talk about speed, I mean response time: how fast your application returns a page for a single request.

00:02:08.119 Throughput, on the other hand, is the rate at which work gets done; for networks, it's usually measured in bytes per second.

00:02:14.319 For web servers, throughput is usually measured as the number of requests processed per minute. Both speed and throughput matter, and they are closely related.

00:02:25.080 The faster you return a response, the more throughput you're likely to get. It's simple arithmetic: if each request takes 100 milliseconds, you'll process more requests per minute than if each takes 200 milliseconds.
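A quick way to see the relationship: if a single worker is always busy, it completes at most 60,000 ms divided by the response time requests per minute. This is a minimal sketch that ignores queueing, idle time, and network overhead, so treat the result as an upper bound rather than a prediction; the `concurrency` parameter is an illustrative assumption for servers that handle several requests at once.

```python
def requests_per_minute(response_time_ms: float, concurrency: int = 1) -> float:
    """Upper bound on requests per minute for `concurrency` workers that each
    handle one request at a time and are never idle."""
    if response_time_ms <= 0:
        raise ValueError("response_time_ms must be positive")
    ms_per_minute = 60_000
    return concurrency * ms_per_minute / response_time_ms


print(requests_per_minute(100))  # 600.0
print(requests_per_minute(200))  # 300.0
```

Halving the response time from 200 ms to 100 ms doubles the ceiling from 300 to 600 requests per minute.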

00:02:36.360 There are two sides to consider: the client side and the server side. In the cycle of a single page request, there's the request and the response, followed by client-side processing once the server returns its response.

00:02:45.480 We'll first focus on the server-side aspect. It's crucial not to let your web server slow you down.

00:02:54.300 For a long time we didn't offer much guidance on choosing a Ruby web server, but that has changed recently. Broadly, there are two types of Ruby web servers to choose from: single-request servers and concurrent servers.

00:03:08.639 A single-request server can respond to only one request at a time, no matter how many arrive. Think of it as a queue: requests pile up while the server processes the first one.

00:03:19.301 Examples of single-request servers include Thin and WEBrick. For simplicity, I won't discuss the threaded modes of WEBrick or Thin, since I haven't had success with either in production.

00:03:25.740 In contrast, concurrent-request servers can process multiple requests simultaneously. In this case, when additional requests come in, the server can handle them while still processing the original request.

00:03:36.480 The benefit of a concurrent web server is higher throughput, and users spend less time waiting in the queue.
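The queueing effect can be illustrated with a toy model. It assumes, purely for illustration, that six requests arrive at the same moment and that each takes 100 ms to process; `workers` is the number of requests the server can handle at once (1 for a single-request server).

```python
def completion_times(num_requests: int, service_ms: int, workers: int) -> list[int]:
    """When each request finishes, if all arrive at t=0 and are served
    first come, first served by `workers` parallel slots."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Request i waits for i // workers earlier "rounds" before it starts.
    return [(i // workers + 1) * service_ms for i in range(num_requests)]


single = completion_times(6, 100, workers=1)
concurrent = completion_times(6, 100, workers=3)
print(single)                              # [100, 200, 300, 400, 500, 600]
print(sum(single) / len(single))           # 350.0
print(concurrent)                          # [100, 100, 100, 200, 200, 200]
print(sum(concurrent) / len(concurrent))   # 150.0
```

Each request still takes 100 ms to process; the difference is how long it sits in the queue first. On the single-request server, the last request waits 500 ms before it even starts.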

00:03:45.480 When discussing concurrency in Ruby, we mainly distinguish between thread-based and process-based concurrency. Threading means a single process runs multiple threads.

00:03:52.379 The mental model is that a single Ruby process hands each request to a thread. Picture a pool of six worker threads, each handling one request at a time.

00:04:01.560 The advantage of having a threaded web server is the lower memory footprint due to shared memory among threads, as they do not need to duplicate resources.

00:04:10.560 The downside is MRI's global VM lock (GVL), which allows only one thread to execute Ruby code at a time. For web servers this matters less than it might seem, because much of a request's time is spent waiting on I/O.

00:04:20.100 When a thread blocks on I/O, it goes to sleep and releases the lock, allowing another thread to continue processing requests.
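Python's reference interpreter, CPython, has a global interpreter lock much like MRI's GVL, so it makes a convenient stand-in to show why threads still help with I/O-bound work. In this sketch `time.sleep` plays the role of a database or HTTP call (an assumption for illustration); like blocking I/O, it releases the lock while waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    # Simulated I/O wait; CPython releases its global lock during sleep,
    # so other threads can run in the meantime.
    time.sleep(0.1)
    return request_id


def serve(num_requests: int, threads: int) -> float:
    """Handle `num_requests` with a pool of `threads`; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(handle_request, range(num_requests)))
    assert results == list(range(num_requests))
    return time.perf_counter() - start


print(f"1 thread:  {serve(6, 1):.2f}s")
print(f"6 threads: {serve(6, 6):.2f}s")
```

With one thread the six waits happen back to back (roughly 0.6 s); with six threads they overlap (roughly 0.1 s). If `handle_request` did CPU-bound work instead of sleeping, the lock would serialize it and the extra threads would not help.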

00:04:29.220 Using JRuby and Rubinius can circumvent the global VM lock, making them useful for CPU-intensive work, as they don't have this limitation.

00:04:35.220 The main drawback is that your code has to be thread-safe, which is not trivial for many existing Rails apps.

00:04:45.300 Popular multi-threading servers like Puma have had mixed success in production environments.

00:04:54.300 Alternatively, running multiple processes lets each request run in a separate process, which eliminates thread-safety concerns.

00:05:01.920 With this approach, a master process spawns child processes to handle requests, as in a Mongrel cluster. Requests are handed to the child processes, which return the responses.

00:05:13.800 The key benefit is that you don't have to worry about thread safety, so you can run your existing Rails app without changes.

00:05:19.800 With processes, you also get true parallelism: the global VM lock applies within a single process, so it doesn't stop separate processes from running at the same time.
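A process-based server can be sketched with Python's `multiprocessing`, using the `fork` start method so that, loosely like Unicorn, a parent process forks a fixed set of worker processes up front. This is a Unix-only illustration, not how Unicorn is implemented: Unicorn's workers accept connections from a shared listening socket, whereas here the pool simply hands out work items.

```python
import multiprocessing as mp
import os


def handle_request(n: int) -> tuple[int, int]:
    # CPU-bound work. Each worker is a separate OS process with its own
    # interpreter and its own lock, so workers can run truly in parallel.
    total = sum(i * i for i in range(n))
    return os.getpid(), total


if __name__ == "__main__":
    ctx = mp.get_context("fork")  # fork, as Unicorn does (Unix only)
    with ctx.Pool(processes=4) as pool:  # parent acts as master; 4 forked workers
        results = pool.map(handle_request, [200_000] * 8)
    worker_pids = {pid for pid, _ in results}
    print(f"8 requests served by {len(worker_pids)} worker process(es)")
    print(f"master handled none itself: {os.getpid() not in worker_pids}")
```

Nothing in `handle_request` has to be thread-safe, because each worker handles one request at a time in its own memory. The cost is memory: every worker is a separate process, although fork's copy-on-write lets them share pages that haven't changed.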

00:05:29.620 The downside is higher memory consumption, since each process is independent and holds its own memory. Ruby 2.1 has brought some improvements in memory efficiency.

00:05:39.840 For multi-process servers, Unicorn is the most common choice. It forks a configurable number of worker processes to service requests.

00:05:48.840 At Heroku, we recommend Unicorn for most Rails apps due to its ability to handle concurrent requests effectively without worrying about thread safety.

00:05:56.460 To illustrate how much concurrency you can get from a single Heroku Dyno running Unicorn, consider CodeTriage. This Rails 3.2 app runs seven Unicorn workers on a single Dyno.

00:06:06.660 Another example is the Bundler API, which serves gem dependency information during 'bundle install'. Because it's a small Sinatra app with a smaller memory footprint, it runs 16 Unicorn workers on a 2X Dyno.

00:06:14.760 With these setups, a single Dyno can process as many as 16 requests at the same time, and we never have to worry about thread safety.

00:06:20.040 Now, if you're considering how many workers your own app can run, remember that increasing the number of Unicorn workers raises throughput.

00:06:30.840 However, every additional worker also uses more memory. In your Unicorn configuration file, you set the number of workers with the 'worker_processes' line.

00:06:38.220 A standard 1X Dyno has 512 MB of memory, while the recently launched 2X Dyno has 1 GB, giving you more headroom for additional worker processes.

00:06:47.340 My first recommendation is to enable runtime metrics logging, which writes each Dyno's memory and CPU usage to your logs.

00:06:52.920 If you visit log2viz.herokuapp.com, you'll find a dashboard that shows real-time requests per minute, memory usage, response times, and overall web Dyno activity.

00:07:00.060 It doesn't show historical data, only what comes in while you're watching, but it's invaluable for finding each Dyno's maximum memory usage.

00:07:09.120 The goal is to run as many Unicorn workers as possible while keeping total RAM usage under the Dyno's limit; going over it causes swapping, which badly hurts performance.
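The sizing rule boils down to simple arithmetic. This sketch assumes you've measured a worker's peak memory under load (for example from the runtime metrics logs); the 130 MB per worker and the 64 MB reserved for the master process and other overhead are made-up numbers for illustration.

```python
def max_workers(dyno_memory_mb: int, peak_worker_mb: int, reserved_mb: int = 0) -> int:
    """Largest number of workers whose combined peak memory, plus memory
    reserved for the master process and other overhead, fits in the Dyno."""
    if peak_worker_mb <= 0:
        raise ValueError("peak_worker_mb must be positive")
    usable_mb = dyno_memory_mb - reserved_mb
    return max(usable_mb // peak_worker_mb, 0)


print(max_workers(512, 130, reserved_mb=64))   # 3  (1X Dyno: 512 MB)
print(max_workers(1024, 130, reserved_mb=64))  # 7  (2X Dyno: 1 GB)
```

The result is a starting value for the `worker_processes` line; if memory still climbs toward the limit under load, step the number back down.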

00:07:18.960 Most importantly, measure under load. Use tools such as Blitz.io or Apache Bench to stress-test your endpoints and watch peak memory usage.

00:07:26.940 In summary, for optimal performance, choose a concurrent web server to increase the throughput of your application.

00:07:36.540 Next, let's discuss backend speed. Most of you probably have a database behind your Rails app: find your slow queries and consider adding indexes to speed them up.

00:07:42.780 In Rails, you can set an auto-explain threshold: any query that takes longer than the threshold has its EXPLAIN output written to the log for analysis.
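In Rails 3.2 the threshold is set with `config.active_record.auto_explain_threshold_in_seconds`. The same idea, timing each query and logging its plan only when it is slow, can be sketched with Python's standard-library `sqlite3` module. `execute_with_auto_explain` is a hypothetical helper, and SQLite's `EXPLAIN QUERY PLAN` stands in for your database's own EXPLAIN.

```python
import sqlite3
import time


def execute_with_auto_explain(conn, sql, params=(), threshold_s=0.5, log=print):
    """Run a query; if it takes at least `threshold_s` seconds, log its plan."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed >= threshold_s:
        log(f"slow query ({elapsed:.3f}s): {sql}")
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
            log(f"  plan: {row[-1]}")  # last column is the readable detail
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany("INSERT INTO events (kind) VALUES (?)", [("click",)] * 1000)
# A threshold of 0 logs every query, which is handy for trying this out.
execute_with_auto_explain(
    conn, "SELECT COUNT(*) FROM events WHERE kind = ?", ("click",), threshold_s=0
)
```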

00:07:49.740 If you're on Heroku, you're likely using Heroku Postgres, which is run by our Data team. New databases run PostgreSQL 9.2 by default.

00:07:55.260 One especially useful tool is the pg_stat_statements extension, which tracks your queries and the resources they use. It isn't enabled automatically, so be sure to turn it on.

00:08:05.520 Once it's enabled, it records statistics for every query, and you can run a SQL query against it to list your slowest queries by total time spent.

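00:08:14.040 pg_stat_statements normalizes queries, replacing literal values with placeholders, so the same query run with different parameters is grouped into one entry. For each entry you get the total execution time, the number of calls (and therefore the average time per call), and other key metrics.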
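Normalization is what makes this ranking useful: a thousand lookups by different IDs show up as one entry instead of a thousand. The sketch below imitates it on a hypothetical list of `(sql, duration_ms)` samples. It uses a regular expression to replace literals, which is much cruder than what pg_stat_statements actually does (it works from the parsed query), so treat it as an illustration only.

```python
import re
from collections import defaultdict

# Single-quoted strings (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> str:
    """Replace literals with '?' so queries differing only in values group together."""
    return _LITERAL.sub("?", sql)


def top_queries(samples, limit=5):
    """Group (sql, duration_ms) samples by normalized text, slowest total first."""
    stats = defaultdict(lambda: [0, 0.0])  # normalized sql -> [calls, total_ms]
    for sql, duration_ms in samples:
        entry = stats[normalize(sql)]
        entry[0] += 1
        entry[1] += duration_ms
    ranked = sorted(stats.items(), key=lambda item: item[1][1], reverse=True)
    return [
        {"query": query, "calls": calls, "total_ms": total, "mean_ms": total / calls}
        for query, (calls, total) in ranked[:limit]
    ]


samples = [
    ("SELECT * FROM users WHERE id = 1", 2.0),
    ("SELECT * FROM users WHERE id = 42", 3.0),
    ("SELECT * FROM posts WHERE title = 'hello'", 40.0),
]
for row in top_queries(samples):
    print(row)
```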

00:08:23.700 From there, run 'EXPLAIN ANALYZE' on the slow queries to see whether any of them are doing sequential (full table) scans, and add indexes where they are.
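The before-and-after effect of an index can be seen with SQLite, which ships with Python. SQLite's `EXPLAIN QUERY PLAN` stands in for PostgreSQL's `EXPLAIN ANALYZE` here: it shows the plan without running the query or reporting timings, and its wording differs (Postgres reports a "Seq Scan" or an "Index Scan"). The table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT * FROM users WHERE email = ?"
PARAMS = ("user500@example.com",)


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, PARAMS)  # a full-table SCAN of users
conn.execute("CREATE INDEX index_users_on_email ON users (email)")
after = plan(QUERY, PARAMS)   # a SEARCH using index_users_on_email
print(before)
print(after)
```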

00:08:30.060 Monitoring performance is also essential. An often-mentioned quote is: 'There are no performance problems, only visibility problems.' If you can't identify what slows your app down, fixing it becomes nearly impossible.

00:08:38.700 Many of you know New Relic, which provides detailed insight into app performance. On Heroku, you can get the standard plan for free.

00:08:49.080 Another tool we use heavily internally is Librato, which collects metrics and graphs them. You have to instrument your app to send it data, which we do extensively.

00:08:55.620 The Metrics gem simplifies collecting and reporting performance data: you connect it to your Librato account and get the 95th percentile, maximum, mean, and a range of graphs.

00:09:03.540 The dashboard view helps you visualize performance metrics, keeping you informed of what is happening in your application so you can identify potential issues.

00:09:09.720 The 95th percentile is especially useful because averages don't always tell the full story; it’s vital to ensure that performance is consistent.

00:09:20.460 Users expect consistent response times, so even if most responses are fast, the slowest ones are what cause dissatisfaction. That's why it's essential to optimize for the slowest cases.
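A small example shows how the mean can hide a slow tail. The response times are invented for illustration, and the percentile uses the simple nearest-rank definition; monitoring services may interpolate between samples instead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in ms: 90 fast requests and 10 very slow ones.
times = [50] * 90 + [2000] * 10
print(sum(times) / len(times))  # 245.0
print(percentile(times, 50))    # 50
print(percentile(times, 95))    # 2000
```

A 245 ms mean looks acceptable, but the 95th percentile shows that one request in ten takes two full seconds.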

00:09:29.880 I also want to mention the Skylight product. It provides a histogram of response times, allowing you to evaluate performance distributions and detect slow performance issues efficiently.

00:09:37.320 Monitoring caching performance can also be informative, revealing the differences in response times when caching is implemented versus when it is not.

00:09:46.260 It's vital to instrument your application accurately. Like testing, performance measurement and monitoring should be integrated into any features you launch.

00:09:57.600 To improve request times, split the work between web and worker processes: web processes should handle requests, while worker processes handle tasks that don't need to happen within the request.

00:10:06.600 For example, a registration email should be sent from a worker rather than making the user wait for it; the same goes for slow API calls and lengthy calculations.
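The web/worker split can be sketched with an in-process queue and a background thread. This is only a stand-in: in a real app the queue would live outside the web process (for example in Redis or your database, via a job library) and the worker would run as a separate worker Dyno. `register` and `send_welcome_email` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()
sent = []


def send_welcome_email(address):
    # Hypothetical stand-in for a slow SMTP or third-party API call.
    sent.append(address)


def worker():
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel value: stop the worker
                return
            send_welcome_email(job["email"])
        finally:
            jobs.task_done()


def register(email):
    # The web request only enqueues the job, then responds right away.
    jobs.put({"email": email})
    return {"status": 201, "email": email}


thread = threading.Thread(target=worker)
thread.start()
print(register("new.user@example.com"))  # {'status': 201, 'email': 'new.user@example.com'}
jobs.put(None)  # ask the worker to stop once queued jobs are done
thread.join()
print(sent)  # ['new.user@example.com']
```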

00:10:12.700 Slow web requests lead to lower throughput and diminished performance, so optimizing those areas is crucial.

00:10:19.020 Once the backend is optimized and your app still hits capacity on a single Dyno, what do you do? There are two options: scale up or scale out.

00:10:27.800 Scaling up, or vertical scaling, involves purchasing larger hardware to improve processing speed, memory, and resource availability.

00:10:36.780 Scaling out, or horizontal scaling, means acquiring multiple machines and distributing loads across them. Vertical scaling is simpler but may hit physical limitations.

00:10:43.140 Scaling up is easy because it usually requires no code changes or complex inter-process communication: you just buy a bigger machine and move your app to it.

00:10:52.020 Scaling out is more challenging but theoretically provides unlimited scalability as you keep adding more machines and allow them to communicate.

00:10:58.860 On Heroku, web Dynos are ephemeral, meaning any one of them can go away without disrupting your app.

00:11:04.800 Users familiar with our platform know about the 'heroku ps:scale' command, which defines the number of web processes you wish to run.

00:11:10.740 For scaling vertically, the newly introduced 2X Dyno is meant to increase available RAM and headroom.

00:11:16.380 When approaching how to scale your database, one common observation is that many applications tend to be read-heavy.

00:11:24.240 Heroku allows the creation of database followers, which splits read traffic from a master database to enhance performance.

00:11:30.600 To provision a follower, you just add another Heroku Postgres add-on. Our Bundler API project mostly performs reads, and followers let us handle that load efficiently.

00:11:39.420 Reading from followers lets you scale horizontally, adding as many followers as needed while distributing read requests.
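Spreading reads across followers needs something that decides where each query goes. A minimal round-robin router might look like the sketch below; `ReplicaRouter` is hypothetical, not a Heroku or Rails API, and connections are represented by plain labels.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across followers round-robin."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary, followers):
        self.primary = primary
        self._followers = itertools.cycle(followers) if followers else None

    def route(self, sql):
        is_write = sql.lstrip().upper().startswith(self.WRITE_PREFIXES)
        if is_write or self._followers is None:
            return self.primary
        return next(self._followers)


router = ReplicaRouter("primary", ["follower-1", "follower-2"])
print(router.route("SELECT * FROM gems"))           # follower-1
print(router.route("SELECT * FROM versions"))       # follower-2
print(router.route("INSERT INTO gems VALUES (1)"))  # primary
print(router.route("SELECT 1"))                     # follower-1
```

Followers replicate asynchronously and can lag slightly behind the primary, so a read that must see a write made earlier in the same request may need to go to the primary.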

00:11:45.540 Be mindful that each PostgreSQL database has a maximum limit of 500 connections under a dedicated plan. Each Unicorn worker needs at least one connection to the database.

00:11:55.020 If using Puma, one connection per thread is essential, ensuring all threads can act concurrently without waiting for connections to become available.
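It's worth doing the connection arithmetic before adding Dynos. In this sketch the Dyno, process, and thread counts are hypothetical, and only web processes are counted; worker Dynos, consoles, and other clients need connections too.

```python
def connections_needed(dynos: int, processes_per_dyno: int, threads_per_process: int = 1) -> int:
    """Minimum connections if every worker thread holds its own connection."""
    return dynos * processes_per_dyno * threads_per_process


LIMIT = 500  # per-database limit on a dedicated plan, as cited in the talk

unicorn = connections_needed(dynos=40, processes_per_dyno=7)
puma = connections_needed(dynos=40, processes_per_dyno=2, threads_per_process=8)
print(unicorn, unicorn <= LIMIT)  # 280 True
print(puma, puma <= LIMIT)        # 640 False
```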

00:12:03.180 If your setup approaches this 500 connection limit, feel free to come speak with us at our booth for guidance.

00:12:11.100 Now, having established a fast server and database, let's shift our focus to improving frontend performance.

00:12:17.058 The client side covers how the browser processes data and loads the assets it needs. Unless your app serves only JSON, your assets need to be optimized for a good user experience.

00:12:27.240 Ruby web servers are typically slow at serving assets. Shrinking your assets through minification and compression, such as gzip, speeds up load times.
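To get a feel for what compression buys, Python's `gzip` module can compress a made-up, highly repetitive stylesheet. In a real Rails app you wouldn't do this by hand; compression is typically handled by the asset pipeline, a front-end web server, or the CDN.

```python
import gzip

# Hypothetical, very repetitive CSS; real assets compress less dramatically.
css = ("body { margin: 0; padding: 0; }\n" * 500).encode("utf-8")
compressed = gzip.compress(css, compresslevel=9)

print(f"original: {len(css)} bytes, gzipped: {len(compressed)} bytes")
# Browsers that send "Accept-Encoding: gzip" receive the compressed bytes
# with a "Content-Encoding: gzip" header and decompress them transparently.
assert gzip.decompress(compressed) == css
```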

00:12:36.420 Another factor affecting response time is geographical distance. For example, a server on the West Coast will respond faster to users on the East Coast than to those in Africa.

00:12:42.720 Considering this, implementing a Content Delivery Network (CDN) can significantly mitigate response latency by serving static assets closer to users.

00:12:53.880 CDN providers include Akamai, CloudFront, and Cloudflare. Heroku currently offers an add-on, CDN Sumo, in beta for seamless integration.

00:13:04.200 Configuring a CDN in a Rails application is straightforward: set config.action_controller.asset_host to the CDN's URL, and Rails will generate asset links that point to the CDN, so browsers fetch assets from there.

00:13:11.400 The first time an asset is requested, the CDN fetches it from your server if it isn't already cached, stores it, and serves the cached copy for subsequent requests.

00:13:23.760 After deploying, you might want to request all your assets once to warm up the CDN, so it can serve them quickly from the start.
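The pull-through behavior and the warm-up step can be modeled with a toy cache. `PullThroughCDN` and the asset paths are hypothetical; real CDNs also honor expiry headers and often cache separately at each edge location, so a warm-up request from one place may only warm the edges near it.

```python
class PullThroughCDN:
    """Toy edge cache: on a miss, fetch from the origin and keep a copy."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self.origin_hits = 0

    def get(self, path):
        if path not in self._cache:  # cache miss: go to the origin once
            self.origin_hits += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def warm(self, paths):
        # Request each asset once after a deploy so early visitors are
        # less likely to hit a cold cache.
        for path in paths:
            self.get(path)


origin = {"/assets/application.css": b"body{margin:0}", "/assets/application.js": b"alert(1)"}
cdn = PullThroughCDN(origin.__getitem__)
cdn.warm(origin.keys())
cdn.get("/assets/application.css")
cdn.get("/assets/application.css")
print(cdn.origin_hits)  # 2: one per asset, both during warm-up
```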

00:13:31.680 Finally, decide how long assets should be cached by setting expiration headers on your Rails assets. Since most assets rarely change, you can mark them public with a long max-age.

00:13:38.160 However, when your JavaScript or CSS changes, clients need the new version rather than the cached one. Rails' asset fingerprinting handles this conveniently.

00:13:48.600 With fingerprinting enabled, Rails adds a digest of each asset's contents to its filename. Whenever a file changes, it gets a new fingerprint, and therefore a new URL, so clients request the updated asset.
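The mechanism can be sketched in a few lines: derive a digest from the file's contents, put it in the filename, and serve that URL with a long-lived public cache header. This sketch uses SHA-256 and a one-year max-age as assumptions; the exact digest algorithm and filename format Rails produces depend on the asset pipeline version.

```python
import hashlib
from pathlib import PurePosixPath

ONE_YEAR_S = 365 * 24 * 60 * 60


def fingerprinted_name(path: str, content: bytes) -> str:
    """application.css -> application-<digest>.css, digest taken from the content."""
    digest = hashlib.sha256(content).hexdigest()
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


def asset_headers() -> dict:
    # Safe to cache for a long time: changed content means a new filename.
    return {"Cache-Control": f"public, max-age={ONE_YEAR_S}"}


v1 = fingerprinted_name("assets/application.css", b"body { color: black; }")
v2 = fingerprinted_name("assets/application.css", b"body { color: navy; }")
print(v1)
print(v2)
print(asset_headers())  # {'Cache-Control': 'public, max-age=31536000'}
```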

00:13:56.640 With minimal configuration, you can set up your Rails app to leverage CDNs effectively, resulting in faster asset delivery.

00:14:05.520 With regard to latency, Heroku now allows spinning up apps in the Europe region for better response times for European clients.

00:14:12.420 To replicate an app in Europe, use the 'heroku fork' command, which copies your app's latest release along with its add-ons and databases.

00:14:19.500 This feature streamlines the process of testing and launching apps in different geographical regions.

00:14:27.600 In conclusion, by architecting your app well, you can postpone dealing with scaling issues such as shared state, and your app will be ready to scale as it evolves.

00:14:34.800 Using a concurrent web server like Unicorn simplifies handling multiple requests while improving throughput. Monitoring tools provide valuable insights that drive tuning and optimizations.

00:14:43.440 Use your database's introspection tools to find missing indexes, add followers for read traffic, and make sure your database can handle its load.

00:14:52.020 By utilizing CDNs for asset delivery and relocating app servers closer to your user base, you create a responsive application that scales effectively.

00:14:59.460 Thank you for your time. Are there any questions?

00:15:05.460 Thanks, everyone.

Key Points Discussed: