00:00:15.599
Hi everyone, I'm Ben Hamill. You can find my Twitter, website, and email address online. I work at a company called Return Path.
00:00:20.640
We are the global leader in email intelligence, which may sound impressive, but essentially, we focus on making email more effective and engaging. Specifically, I work on a product called Context IO, which is an HTTP API for email.
00:00:31.760
You may have spoken to Tony at our sponsor booth. By the way, we're hiring! If you're interested, feel free to talk to me afterward or reach out to Tony.
00:00:49.440
Today, I want to tell you a story about my first experience with measuring and improving performance in Ruby, particularly concerning user experience with performance. Initially, I found this project intimidating, but it turned out to be much more manageable than I expected. My goal is to convey that this experience doesn't have to be daunting.
00:01:14.240
In this talk, we'll cover three main topics. First, I'll discuss the state of the codebase when I started. Then, I'll share some tricks I discovered to create the illusion of speed without actually making the code faster. Finally, we'll explore how to achieve real speed improvements.
00:01:36.079
The API mentioned earlier will remain unnamed throughout this presentation, partially to protect the identity of the innocent and also because it's related to a project we're not ready to disclose yet. However, I can share some insights about this particular API. First, all exchanges with it are XML-based, which presents its own set of challenges.
00:02:03.119
When you make HTTP requests with XML bodies, you receive XML responses as well. The external API recognizes two kinds of entities: documents, whose bodies are primarily text, and change sets, which list actions such as adding, deleting, or modifying documents.
00:02:31.599
For example, if someone moves a document to a new folder, you'll get a change notification. If they receive a new item in their account, that falls under an addition, and naturally, if a document is deleted, that's recorded as such. The change document is quite lightweight, but if you want to see the details, you need to fetch the document itself.
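The talk never shows the actual XML, so the element and attribute names below are invented for illustration, but a change document in this spirit can be parsed with Ruby's stdlib REXML along these lines:

```ruby
require "rexml/document"

# Hypothetical change-document XML; the real schema is not disclosed in
# the talk, so these element and attribute names are made up.
xml = <<~XML
  <changes>
    <change action="add"    document_id="doc-101"/>
    <change action="modify" document_id="doc-072"/>
    <change action="delete" document_id="doc-013"/>
  </changes>
XML

doc = REXML::Document.new(xml)

# Turn each <change> element into a small Ruby hash we can dispatch on.
changes = doc.get_elements("changes/change").map do |el|
  { action: el.attributes["action"].to_sym, id: el.attributes["document_id"] }
end

changes.each do |change|
  puts "#{change[:action]}: #{change[:id]}"
end
```

The lightweight change entries tell you *what* happened; fetching the document body itself is a separate, heavier request.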
00:02:57.840
In my application, we perform what we call a 'sync.' This means we synchronize our understanding of the account with the foreign API to reflect what is accurate on their server. We make a request, fetch the change document, and for each command in the change document, we retrieve the body of the message, conduct some analysis, and repeat this for all entries in the change document.
00:03:21.200
After processing all entries, we timestamp the result and conclude our sync process. This is where the challenge arises. The sync process stores the data, and there is an iPhone app being developed by team members that relies on this data.
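Putting those pieces together, the sync flow described above might be sketched like this. The `api` and `store` objects and all their method names are invented stand-ins, not the real client:

```ruby
# Sketch of the sync loop: fetch the change document, fetch each entry's
# body, analyze it, then timestamp the completed sync.
class Syncer
  def initialize(api, store)
    @api   = api     # fetches change documents and document bodies
    @store = store   # persists analysis results and the sync timestamp
  end

  def sync(account)
    changes = @api.fetch_changes(account)       # one request for the change document
    changes.each do |change|
      body = @api.fetch_document(change[:id])   # one request per entry
      @store.save(change[:id], analyze(body))
    end
    @store.record_sync_time(account, Time.now)  # mark the sync as finished
  end

  private

  def analyze(body)
    { length: body.length }  # placeholder for the real (Bayesian) analysis
  end
end
```

The key structural point is that every entry in the change document costs one extra HTTP round trip, which is what the rest of the talk ends up attacking.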
00:03:46.959
When a new user downloads the app and completes sign-up, it triggers a sync followed by post-processing. When the post-processing finds something interesting, the app updates to display that information to the user. The determination of 'interesting' is carried out by a Bayesian classifier, but for our current discussion, what matters is that there is a delay between completing the sync and displaying results, which can be frustrating for users.
00:04:26.800
In my experience, if the wait time exceeds five seconds, users become impatient. They want to see results quickly, and prolonged wait times can lead to a poor user experience. Our goal here is to avoid situations where users feel like they are waiting indefinitely.
00:05:00.400
To address this, we first need to identify which specific parts of the process are slow. While we know the entire operation feels slow, pinpointing the exact bottleneck is crucial. We had an intuition that the issue lay outside our codebase since, naturally, no one believes their written code is the issue.
00:05:43.200
However, it's essential to validate these assumptions through measurement. Guessing and then dedicating time to optimizing code without evidence can result in wasted effort. Therefore, we began measuring.
00:06:01.360
I started using various tools to investigate the performance. The first was 'top,' a command-line tool available on Linux. While preparing these slides, I ran it on one of our alpha servers to gather data.
00:06:28.160
This tool provides extensive output about system resource usage, which I've summarized. One key metric is the amount of free memory: if that number is low, the machine is under memory pressure; if it's high, RAM isn't the constraint. Fortunately, it turned out that we had plenty of free memory.
00:06:58.400
Additionally, I observed the CPU usage of various processes. Each line represents an active process, and the CPU percentage is summed across all cores, so it can exceed 100 on a multi-core machine. In our case, the CPU numbers were relatively low, indicating that our application was not CPU-bound.
00:07:35.040
Next, I explored 'iostat,' another tool that provides insights into disk reads and writes, which is vital for understanding input/output wait times. Using this tool, I gathered data on how frequently our system was waiting on disk operations.
00:08:07.599
Among other things, iostat shows the percentage of time the CPU spends waiting on disk I/O. In our case, this metric indicated that the disk was not the bottleneck. With these insights, I turned my focus toward network activity, suspecting it might be responsible for the delays we were experiencing.
00:08:43.360
This led me to investigate 'perftools.rb,' a Ruby gem that ports Google's perftools CPU profiler to Ruby. After integrating this gem into our application on the alpha server, I could start tracking where the time was actually going.
00:09:20.960
The tool allows you to collect profiling data to analyze which methods hold the most execution time. When I examined the output, I noticed that a significant amount of time was spent on network-related actions, especially 'io_select,' which indicated that the process was mostly waiting for external responses.
00:09:52.440
After further exploration, I determined that retrieving the body of a document took about half a second, and when processing a considerable number of documents, this time added up very quickly, leading to an overall slowdown.
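A quick way to confirm where that time goes is stdlib Benchmark. In the sketch below the half-second HTTP fetch is simulated with a scaled-down sleep; the method name is invented:

```ruby
require "benchmark"

# Simulate the per-document fetch; the real call was an HTTP round trip
# that took roughly half a second each.
def fetch_document_body(_id)
  sleep 0.05  # stand-in for network latency (scaled down for the example)
  "<document>...</document>"
end

ids = (1..5).to_a
elapsed = Benchmark.realtime do
  ids.each { |id| fetch_document_body(id) }
end

puts format("fetched %d documents sequentially in %.2fs", ids.size, elapsed)
# Sequential cost grows linearly with the number of documents, which is
# why a large account made the whole sync feel slow.
```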
00:10:28.920
By this point, I had spent roughly three days investigating performance improvements. My inherent developer instinct sparked my interest in threading—specifically, how I could leverage it to enhance performance further. However, I realized that learning how to implement threads effectively would demand significant time investment.
00:11:04.640
I was considering whether there was something simpler I could apply immediately that would provide meaningful improvement without delving deep into threading concepts. Understanding that users don't recognize when a sync starts and ends helped clarify what truly matters: providing feedback to users quickly.
00:11:48.560
This realization led me to devise three strategies to create the perception of speed. First, I reorganized the change document processing order to prioritize more recent changes since users typically care more about the latest activity rather than older data.
00:12:22.560
Second, I eliminated unnecessary API calls, particularly for delete actions, since nothing is gained by requesting a document that no longer exists. Third, I reduced calls for updates where possible, such as updating metadata without fetching the entire document.
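Those three tweaks can be sketched together in a few lines. The change hashes and the predicate below are illustrative, not the real data model:

```ruby
# Hypothetical predicate: pretend the change document tells us whether
# only metadata (and not the body) changed.
def metadata_only_update?(change)
  change.fetch(:metadata_only, false)
end

changes = [
  { action: :add,    id: "d1", at: Time.utc(2013, 1, 1) },
  { action: :delete, id: "d2", at: Time.utc(2013, 3, 1) },
  { action: :modify, id: "d3", at: Time.utc(2013, 2, 1) },
]

fetched = []

# 1. Newest first: users care most about recent activity.
changes.sort_by { |c| c[:at] }.reverse.each do |change|
  case change[:action]
  when :delete
    # 2. No fetch needed: the document is gone, just drop our copy.
    next
  when :modify
    # 3. If only metadata changed, update it without refetching the body.
    next if metadata_only_update?(change)
    fetched << change[:id]
  else
    fetched << change[:id]
  end
end

puts "fetched bodies for: #{fetched.inspect}"
```

None of this makes any single request faster; it just gets useful results in front of the user sooner and skips requests that were never needed.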
00:12:41.760
Finally, I leveraged persistent HTTP connections with the Faraday gem, which made a significant difference by preventing repetitive connection setups and teardowns, thus improving the overall experience.
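To see why connection reuse matters, the toy client below just counts connection setups; nothing here performs real HTTP, it only models paying a handshake per request versus once per session:

```ruby
# Toy illustration of why persistent connections help: count how many
# TCP setup/teardown cycles each approach pays. No real networking here.
class CountingClient
  attr_reader :connects

  def initialize(persistent:)
    @persistent = persistent
    @connects   = 0
    @connected  = false
  end

  def get(_path)
    unless @persistent && @connected
      @connects += 1        # simulate a fresh TCP (and TLS) handshake
      @connected = true
    end
    @connected = false unless @persistent  # non-persistent: tear down
    :response
  end
end

naive      = CountingClient.new(persistent: false)
persistent = CountingClient.new(persistent: true)

100.times { |i| naive.get("/documents/#{i}") }
100.times { |i| persistent.get("/documents/#{i}") }

puts "without reuse: #{naive.connects} connects"    # 100
puts "with reuse:    #{persistent.connects} connect" # 1
```

With hundreds of small requests per sync, amortizing that setup cost across one connection adds up quickly.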
00:13:11.080
After implementing these strategies, I observed some remarkable improvements. Using time benchmarks, I noted a significant decrease in feedback time for users from 15 minutes down to just 2.5 minutes. I felt accomplished, thinking I had maximized the benefits from my changes within just one day.
00:13:37.920
My enthusiasm was tempered when I realized that while significant progress had been made, the time was still too long for a good user experience. Reflecting on this led me back to the potential of using threads for greater performance gains.
00:14:24.560
Before diving into threading, I had to grasp the implications of the Global Interpreter Lock (GIL) that restricts multithreading capabilities in MRI Ruby. Understanding the nature of parallelization and which tasks could benefit from threads helped me frame my approach to problem-solving.
00:15:02.080
The key takeaway is that while certain tasks are limited by the GIL and require time to process sequentially, tasks involving system calls, like IO operations, can run concurrently, providing significant performance improvements.
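This is easy to demonstrate in plain MRI. Here `sleep` stands in for a blocking network read; like blocking IO, it releases the lock, so the waits overlap across threads:

```ruby
require "benchmark"

# Simulated blocking IO: sleep releases the lock just as a blocking
# socket read does, so these "fetches" can overlap across threads.
def slow_fetch(_id)
  sleep 0.2
end

ids = (1..4).to_a

sequential = Benchmark.realtime { ids.each { |id| slow_fetch(id) } }
threaded   = Benchmark.realtime do
  ids.map { |id| Thread.new { slow_fetch(id) } }.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
# Sequential takes ~0.8s (4 x 0.2s); threaded takes ~0.2s because all
# four waits overlap.
```

Since our sync spent most of its time waiting on the network, it was exactly the kind of workload threads help with in MRI.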
00:15:43.680
This realization made threading worthwhile. I implemented a persistent thread pool, maintaining threads throughout the life of the application, allowing for efficient task management without repeatedly spinning up and shutting down new threads for each task.
00:16:29.600
While implementing this persistent thread pool did introduce some complexity, it also allowed me to reuse threads efficiently for processing work items, maintaining performance without excessive overhead. The subsequent testing showed a marked improvement in elapsed sync times.
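A minimal pool in that spirit can be built on `Thread::Queue`: workers live for the life of the pool and block on the queue until work arrives. This is a sketch, not the code from the talk:

```ruby
# Minimal persistent thread pool: workers pull jobs off a shared Queue,
# so no thread is spun up or torn down per work item.
class ThreadPool
  def initialize(size)
    @queue   = Queue.new
    @workers = size.times.map do
      Thread.new do
        while (job = @queue.pop)       # blocks until a job arrives
          break if job == :shutdown    # sentinel tells the worker to exit
          job.call
        end
      end
    end
  end

  def schedule(&block)
    @queue << block
  end

  def shutdown
    @workers.size.times { @queue << :shutdown }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = ThreadPool.new(4)
10.times { |i| pool.schedule { results << i * i } }
pool.shutdown
puts "processed #{results.size} jobs"  # processed 10 jobs
```

In the real application, each scheduled job would be one document fetch, so up to four (or however many workers you choose) HTTP waits overlap at once.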
00:17:11.440
After about a week of implementing threads, I saw sync time drop to 17 seconds, an encouraging step toward a good user experience. While this new time was still not perfect, it represented a considerable optimization.
00:17:38.240
This process underscored that there is always room for improvement, and although further refinements could be pursued, we had to keep the trade-off between performance and development effort in mind.
00:18:18.560
To summarize everything: when facing application performance challenges, the first step is identifying which specific areas contribute most to the slowdown. Tools like top, iostat, and perftools.rb can provide valuable insights.
00:18:53.680
Once you've determined the bottlenecks, consider if you can create perceived speed improvements by optimizing feedback mechanisms or input-output management. If all else fails, you can turn to threads, adapting them thoughtfully to your application workload to enhance performance.
00:19:18.080
Thank you for your time, and I welcome any questions you may have.