00:00:12.259
Thank you. Hello. As Richard said, I'm Terence Lee, known as @hone02 on Twitter. I wear this blue hat and hold a bunch of titles.
00:00:20.220
The first title is leader of the League of Blue Hats. I wear blue hats, and this is Jessica Settles, who has a picture wearing a blue hat. This is Chris Kelly from New Relic, who photoshopped the hat on himself because it's a pretty cool League.
00:00:33.719
I am from Austin, Texas. There, I'm known as the Chief Taco Officer. I believe tacos are the best thing in Austin. If you're from LA and think tacos are amazing there, you really need to come to Austin. If you're in Austin, I will personally take you out for tacos if I'm in town since I think they're pretty awesome.
00:00:39.300
I work on Bundler, like Richard mentioned, and I also work for a small company called Heroku. We optimize for developer happiness, which is one of the goals of our platform.
00:00:49.740
I'm sure you've seen 'git push heroku master' to deploy. You just type 'git push heroku master' and it deploys your app to the platform. Right now, we are running a little over three million apps on the platform.
00:01:02.780
I am part of the Ruby task force, which is the Ruby product team aimed at delivering the best Ruby experience we can on our platform. We're focused on maintaining the build pack, working on community open-source projects like Bundler, Rails, and more, as well as helping out in the community.
00:01:13.119
If you experience problems, please come talk to us. We would be more than happy to listen to your issues. We'll have a booth today and tomorrow, and we also have Ruby task force meetings.
00:01:25.080
I also work with a really cool guy, a co-worker of mine who created Ruby, and I'm really happy about that. But enough with the introductions—today, I'm going to talk about performance.
00:01:36.600
So let's forget about scaling for a little bit. Any app that is well prepared can be scaled much more easily. By thinking about some of this upfront, we can concentrate more on scaling later, as our app will be set up to accommodate that.
00:01:45.480
In this talk, we will focus on the performance of a single app on a single process initially. The side effect of this is that it makes the app easier to scale.
00:01:52.379
The first thing I want to address is the difference between speed versus throughput when discussing performance. When I talk about speed, I'm referring to response times—how fast your application returns a page for a single request.
00:02:08.119
Throughput, on the other hand, concerns the rate of performance in terms of bytes per second, especially regarding networks.
00:02:14.319
For web servers, throughput is often illustrated through a number of requests per minute that can be processed. Both aspects are essential, and they are interrelated.
00:02:25.080
The faster you can return a response, the more throughput you're likely to get. It's simple logic: if your website processes each request in 100 milliseconds, you'll process more requests in a minute than if each one takes 200 milliseconds.
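The arithmetic behind that claim can be sketched in a few lines of Ruby (a toy model: it assumes a single process handling one request at a time, with no queuing or network overhead):

```ruby
# Theoretical ceiling for one single-request process:
# requests per minute = 60,000 ms / response time in ms.
def requests_per_minute(response_time_ms)
  (60_000.0 / response_time_ms).floor
end

puts requests_per_minute(100) # => 600 requests/minute
puts requests_per_minute(200) # => 300 requests/minute
```

Halving the response time doubles the theoretical throughput of each process, which is why speed and throughput are so tightly linked.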
00:02:36.360
There are two aspects to consider: the client-side and the server-side. When looking at the cycle of a single page request to a server, you'll have the request and response, followed by client-side processing once the server returns a response.
00:02:45.480
We'll first focus on the server-side aspect. It's crucial not to let your web server slow you down.
00:02:54.300
Until recently, we offered little guidance on choosing a Ruby web server, but that has changed. Here we discuss two types of Ruby web servers you might choose from: single-request servers and concurrent servers.
00:03:08.639
A single-request server can only respond to one request at a time, regardless of how many requests come in. Consider it as a queue system where requests pile up while the server processes the first one.
00:03:19.301
Examples of single-request servers include Thin and WEBrick. For simplicity, we won't discuss threaded WEBrick or Thin's threaded mode, since I haven't had success running either in production.
00:03:25.740
In contrast, concurrent-request servers can process multiple requests simultaneously. In this case, when additional requests come in, the server can handle them while still processing the original request.
00:03:36.480
The benefit of using a concurrent web server is that it achieves greater throughput, as more users will experience reduced waiting times.
00:03:45.480
When discussing concurrency in Ruby, we primarily differentiate between threading and process concurrency. Threading means one process uses multiple threads.
00:03:52.379
The mental model here is that a single Ruby process spins up a thread for each request. Imagine a thread pool of six thread workers managing one request at a time.
00:04:01.560
The advantage of having a threaded web server is the lower memory footprint due to shared memory among threads, as they do not need to duplicate resources.
00:04:10.560
However, the downside is that MRI's global VM lock (GVL) restricts you to executing Ruby code in one thread at a time. Many web servers mitigate this, since much of a request's time is spent waiting on I/O.
00:04:20.100
With I/O blocking, a thread goes to sleep, allowing another thread to continue processing requests.
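This behavior is easy to demonstrate: in MRI, blocking calls like `sleep` (standing in for a blocked socket or database read) release the GVL, so two threads can wait concurrently. A small sketch:

```ruby
require "benchmark"

# Two simulated I/O waits, run back to back.
serial = Benchmark.realtime do
  2.times { sleep 0.2 }
end

# The same two waits in separate threads: while one thread sleeps,
# the other runs, because sleeping releases the GVL.
threaded = Benchmark.realtime do
  threads = 2.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
# The threaded version finishes in roughly one sleep's time, not two.
```

This is why threaded servers help I/O-bound Rails apps even on MRI, despite the GVL.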
00:04:29.220
Using JRuby and Rubinius can circumvent the global VM lock, making them useful for CPU-intensive work, as they don't have this limitation.
00:04:35.220
The main drawback is that your code has to be thread-safe, which is not trivial for many existing Rails apps.
00:04:45.300
Popular multi-threading servers like Puma have had mixed success in production environments.
00:04:54.300
Alternatively, running multiple processes lets each request be handled by a separate process, eliminating thread-safety concerns.
00:05:01.920
With this approach, a master process can spawn child processes to handle requests, such as using a Mongrel cluster. Requests are sent to the child processes and responses are returned accordingly.
00:05:13.800
The key benefit here is not worrying about thread safety issues, allowing you to keep your existing Rails app without extra overhead.
00:05:19.800
When using processes, you benefit from true parallelization since the global VM lock doesn't impede concurrent processing.
00:05:29.620
However, the downside is higher memory consumption, since each independent process carries its own copy of the app. Ruby 2.0's copy-on-write-friendly garbage collector improved memory efficiency here somewhat.
00:05:39.840
For multi-process servers, Unicorn is the most commonly used option. It forks a configurable number of child processes to service requests.
00:05:48.840
At Heroku, we recommend Unicorn for most Rails apps due to its ability to handle concurrent requests effectively without worrying about thread safety.
00:05:56.460
To illustrate the degree of concurrency achievable from a single Heroku Dyno running Unicorn, consider the application Code Triage. This Rails 3.2 app can accommodate up to seven Unicorn workers on a single Dyno.
00:06:06.660
Another example is the Bundler API, which backs 'bundle install.' Since it's a small Sinatra app with a smaller memory footprint, it can run 16 Unicorn workers on a 2X Dyno.
00:06:14.760
With these setups, we're processing up to 16 requests per Dyno at all times without needing to fret over thread safety.
00:06:20.040
Now, if you're considering how many workers your own app can run, remember that increasing the number of Unicorn workers raises throughput.
00:06:30.840
However, each Unicorn worker consumes more memory. Within the Unicorn configuration file, you can adjust the number of worker processes by modifying the 'worker_processes' line.
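A minimal `config/unicorn.rb` along these lines is a common starting point (the specific values here are illustrative, not a recommendation; tune `worker_processes` against your measured memory ceiling):

```ruby
# config/unicorn.rb -- a minimal sketch.
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 30        # kill workers stuck longer than 30 seconds
preload_app true  # load the app before forking, for copy-on-write savings

before_fork do |server, worker|
  # Close the parent's database connection so forked workers
  # don't share a socket.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each worker opens its own database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```

Reading the worker count from an environment variable like `WEB_CONCURRENCY` lets you change concurrency without a deploy.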
00:06:38.220
Traditionally, single Dynos have around half a gig of memory, while the recent launch of 2X Dynos grants an entire gig, providing you with more headroom to increase worker processes.
00:06:47.340
My first recommendation is to enable the log-runtime-metrics labs feature (`heroku labs:enable log-runtime-metrics`), which outputs memory and CPU usage details for each Dyno in your logs.
00:06:52.920
If you visit log2viz.herokuapp.com, you'll find a dashboard that provides real-time metrics: requests per minute, memory consumption, response time, and general web Dyno activity based on your worker configuration.
00:07:00.060
This feature doesn't retroactively log information, but it’s invaluable for tracking the maximum memory usage of each Dyno.
00:07:09.120
Maximizing the number of Unicorn workers while keeping RAM usage within total limits is crucial to avoid swapping, which leads to poor performance.
00:07:18.960
Most importantly, tests should be conducted under load. Utilize tools such as Blitz.io or Apache Bench to stress-test various endpoints and observe maximum memory usage under load.
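With Apache Bench, for example, a single command can drive concurrent load at one endpoint while you watch memory usage in your logs (the URL here is a placeholder):

```shell
# 1,000 total requests, 50 in flight at a time.
ab -n 1000 -c 50 https://your-app.herokuapp.com/
```

Run this against your heaviest endpoints, not just the home page, since memory high-water marks usually come from the worst case.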
00:07:26.940
In summary, for optimal performance, choose a concurrent web server to increase the throughput of your application.
00:07:36.540
Next, let’s discuss backend speed. Most of you likely have a database connected to your Rails app, and you should analyze your slow queries and consider adding indexes to optimize their performance.
00:07:42.780
In Rails 3.2, you can set `config.active_record.auto_explain_threshold_in_seconds` to automatically log an EXPLAIN for any query taking an excessive amount of time, which can then be analyzed.
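The setting is a one-liner in your environment config (0.5 seconds is just an illustrative threshold):

```ruby
# config/environments/production.rb (Rails 3.2-era setting)
# Automatically log an EXPLAIN for any query slower than half a second.
config.active_record.auto_explain_threshold_in_seconds = 0.5
```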
00:07:49.740
Should you be using Heroku, you're likely on the Postgres service managed by the Data team. New PostgreSQL databases will operate on version 9.2 by default.
00:07:55.260
Interestingly, the `pg_stat_statements` extension helps track your database queries and their resource usage. Make sure to enable this extension, as it won't be activated automatically.
00:08:05.520
Once enabled, it begins logging all your queries, and you can execute the SQL statement to view the slowest queries by the time taken to process them.
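A query along these lines surfaces the worst offenders (a sketch; column names follow the PostgreSQL 9.2-era `pg_stat_statements` view, where `total_time` is in milliseconds):

```sql
-- Enable the extension once per database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten most expensive normalized queries by total execution time.
SELECT query,
       calls,
       total_time,
       total_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```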
00:08:14.040
PostgreSQL normalizes queries to help identify slow queries and gives you total execution time, average time taken, and other essential metrics.
00:08:23.700
From this, you should utilize 'EXPLAIN ANALYZE' to detect if any queries are performing table scans and index them to enhance performance.
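In practice that workflow looks something like this (the `users` table and `email` column are hypothetical names for illustration):

```sql
-- A sequential scan over a large table suggests a missing index.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'me@example.com';
-- Seq Scan on users ...

CREATE INDEX index_users_on_email ON users (email);

-- Re-running the EXPLAIN should now show an Index Scan instead.
```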
00:08:30.060
Monitoring performance is also essential. An often-mentioned quote is: 'There are no performance problems, only visibility problems.' If you can't identify what slows your app down, fixing it becomes nearly impossible.
00:08:38.700
New Relic is a familiar tool among many of you, as it provides detailed insights into app performance. For Heroku applications, you can get the standard plan for free.
00:08:49.080
Another internal tool we use a lot is Librato, which collects data and presents it through graphs. It requires you to instrument your app to gather metrics, which we do extensively.
00:08:55.620
The Metrics gem simplifies the collection and reporting of performance data, allowing you to connect it to your Librato account and get metrics like the 95th percentile, maximum, mean, and various graphs.
00:09:03.540
The dashboard view helps you visualize performance metrics, keeping you informed of what is happening in your application so you can identify potential issues.
00:09:09.720
The 95th percentile is especially useful because averages don't always tell the full story; it’s vital to ensure that performance is consistent.
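A tiny worked example shows how an average hides tail latency; nine fast responses and one slow one look "fine" on average:

```ruby
# Nine 100ms responses and one 2-second outlier.
response_times_ms = [100] * 9 + [2000]

mean = response_times_ms.sum / response_times_ms.size.to_f
p95  = response_times_ms.sort[(response_times_ms.size * 0.95).ceil - 1]

puts "mean: #{mean} ms" # 290.0 -- looks tolerable
puts "p95:  #{p95} ms"  # 2000  -- reveals the slow tail
```

One in ten users is waiting two seconds, but the mean alone would never tell you that.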
00:09:20.460
Users rely on consistent response times; even if most responses are fast, the slowest ones cause dissatisfaction. Thus, optimizing for the slowest case is essential.
00:09:29.880
I also want to mention the Skylight product. It provides a histogram of response times, allowing you to evaluate performance distributions and detect slow performance issues efficiently.
00:09:37.320
Monitoring caching performance can also be informative, revealing the differences in response times when caching is implemented versus when it is not.
00:09:46.260
It's vital to instrument your application accurately. Like testing, performance measurement and monitoring should be integrated into any features you launch.
00:09:57.600
Understanding how to improve request times involves breaking down your web and worker processes. The web processes should handle requests, while worker processes manage tasks external to typical web interactions.
00:10:06.600
For example, a registration email should be sent through a worker instead of halting a user's experience during API calls or lengthy calculations.
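A minimal in-process sketch of that web/worker split: the request enqueues a job and responds immediately, while a worker handles the slow part in the background. (`JOB_QUEUE` and the email wording are hypothetical names for illustration; in production you'd reach for a real queue like Resque, Sidekiq, or Delayed Job.)

```ruby
JOB_QUEUE = Queue.new
SENT      = Queue.new

# The "worker process": pulls jobs and does the slow work.
worker = Thread.new do
  while (job = JOB_QUEUE.pop)
    sleep 0.05 # stands in for a slow SMTP call or API request
    SENT << "welcome email to #{job[:email]}"
  end
end

# Inside the web request cycle: enqueue and respond right away.
JOB_QUEUE << { email: "user@example.com" }
puts "202 Accepted" # the user never waits on the email

sent = SENT.pop     # the worker finishes in the background
puts sent
JOB_QUEUE.close
worker.join
```

The request's response time no longer includes the email delivery, which keeps throughput up.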
00:10:12.700
Slow web requests lead to lower throughput and diminished performance, so optimizing those areas is crucial.
00:10:19.020
Even once the backend is fully optimized, your app will eventually hit capacity on a single Dyno, which leads to the next question: what do you do? There are two options: scale up or scale out.
00:10:27.800
Scaling up, or vertical scaling, involves purchasing larger hardware to improve processing speed, memory, and resource availability.
00:10:36.780
Scaling out, or horizontal scaling, means acquiring multiple machines and distributing loads across them. Vertical scaling is simpler but may hit physical limitations.
00:10:43.140
Scaling up is easy, as it usually doesn't necessitate any code changes or complex process communications—just buy a bigger machine and migrate your app.
00:10:52.020
Scaling out is more challenging but theoretically provides unlimited scalability as you keep adding more machines and allow them to communicate.
00:10:58.860
On Heroku, we utilize ephemeral web machines, meaning they can go away without disrupting operational continuity.
00:11:04.800
Users familiar with our platform know about the 'heroku ps:scale' command, which defines the number of web processes you wish to run.
00:11:10.740
For scaling vertically, the newly introduced 2X Dyno is meant to increase available RAM and headroom.
00:11:16.380
When approaching how to scale your database, one common observation is that many applications tend to be read-heavy.
00:11:24.240
Heroku allows the creation of database followers, which splits read traffic from a master database to enhance performance.
00:11:30.600
To provision a follower, you only need to add a new Heroku add-on. In our project Bundler API, we primarily perform reads and manage our load efficiently.
00:11:39.420
Reading from followers lets you scale horizontally, adding as many followers as needed while distributing read requests.
00:11:45.540
Be mindful that each PostgreSQL database has a maximum limit of 500 connections under a dedicated plan. Each Unicorn worker needs at least one connection to the database.
00:11:55.020
If using Puma, one connection per thread is essential, ensuring all threads can act concurrently without waiting for connections to become available.
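The connection budget is simple multiplication, and it's worth doing before you scale out (the numbers here are illustrative):

```ruby
# Each Unicorn worker (or Puma thread) holds at least one Postgres
# connection; dedicated plans cap the database at 500.
CONNECTION_LIMIT = 500

def connections_needed(dynos:, workers_per_dyno:)
  dynos * workers_per_dyno
end

puts connections_needed(dynos: 20, workers_per_dyno: 7) # 140 -- fine
puts connections_needed(dynos: 80, workers_per_dyno: 7) # 560 -- over the cap
```

If the product of dynos and workers approaches the limit, you need followers, pooling, or fewer workers per dyno.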
00:12:03.180
If your setup approaches this 500 connection limit, feel free to come speak with us at our booth for guidance.
00:12:11.100
Now, having established a fast server and database, let's shift our focus to improving frontend performance.
00:12:17.058
The client-side encompasses how data is processed in the browser and the loading of necessary assets. If your app isn't solely serving JSON, assets must be optimized for user experience.
00:12:27.240
Ruby web servers are typically slow at serving static assets. Reducing the size of your assets through minification and compression, like gzip, can speed up load times.
00:12:36.420
Another factor affecting response time is geographical distance. For example, a server on the West Coast will respond faster to users on the East Coast than to those in Africa.
00:12:42.720
Considering this, implementing a Content Delivery Network (CDN) can significantly mitigate response latency by serving static assets closer to users.
00:12:53.880
CDN providers include Akamai, CloudFront, and Cloudflare. Heroku currently offers an add-on, CDN Sumo, in beta for seamless integration.
00:13:04.200
Configuring a CDN within your Rails application is quite straightforward. Set `config.action_controller.asset_host` to the CDN's URL, and Rails will reference assets from there.
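The configuration is a single line (the hostname here is a placeholder for whatever your CDN provider assigns you):

```ruby
# config/environments/production.rb
# Point asset URLs at the CDN; "dxxxxxxxx.cloudfront.net" is a placeholder.
config.action_controller.asset_host = "https://dxxxxxxxx.cloudfront.net"
```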
00:13:11.400
When the first request for an asset is made, the CDN fetches it from your server if it is not cached, stores it, and serves it for following requests.
00:13:23.760
After deploying, you might want to quickly fetch all your assets to warm up the CDN so it can deliver content faster.
00:13:31.680
Finally, consider how long to retain cached assets by setting expiration headers on your Rails assets. You can mark them public with a long max-age, since most assets don't change frequently.
00:13:38.160
However, JavaScript and CSS changes should prompt clearing that cache. This can be conveniently managed with asset fingerprinting provided by Rails.
00:13:48.600
By enabling fingerprinting, Rails generates a hash for each asset. Every time there’s a modification, a new fingerprint is created, prompting the client to request the updated asset.
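In a Rails 3.2-era app, both pieces are environment settings (a year-long max-age is a common choice for fingerprinted assets, not a requirement):

```ruby
# config/environments/production.rb (Rails 3.2-era settings)
# Fingerprinted assets are effectively immutable, so clients can
# cache them for a long time; a changed file gets a new fingerprint
# and therefore a new URL.
config.assets.digest = true
config.static_cache_control = "public, max-age=31536000" # one year
```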
00:13:56.640
With minimal configuration, you can set up your Rails app to leverage CDNs effectively, resulting in faster asset delivery.
00:14:05.520
With regard to latency, Heroku now allows spinning up apps in the Europe region for better response times for European clients.
00:14:12.420
To replicate an app to Europe, use the Heroku Fork command, which clones the last release of your app, along with its add-ons and databases.
00:14:19.500
This feature streamlines the process of testing and launching apps in different geographical regions.
00:14:27.600
In conclusion, by architecting your app correctly, you can postpone dealing with scaling issues related to state, allowing for greater scalability as your app evolves.
00:14:34.800
Using a concurrent web server like Unicorn simplifies handling multiple requests while improving throughput. Monitoring tools provide valuable insights that drive tuning and optimizations.
00:14:43.440
Leverage powerful database introspections for adding indexes, employing followers, and ensuring your database can handle its load.
00:14:52.020
By utilizing CDNs for asset delivery and relocating app servers closer to your user base, you create a responsive application that scales effectively.
00:14:59.460
Thank you for your time. Are there any questions?
00:15:05.460
Thanks, everyone.