I Made a Slow Thing Fast

LoneStarRuby Conf 2013

Ben Hamill

In the talk titled "I Made a Slow Thing Fast," Ben Hamill recounts how he improved the performance of a Ruby application whose users were left waiting on a slow external API. This session, presented at the LoneStarRuby Conf 2013, highlights the complexity of dealing with performance issues, particularly the difference between perceived speed and actual speed improvements.

Key Points Discussed:
- Initial Codebase and Challenges: Hamill describes the initial state of the codebase as intimidating but manageable, focusing on improving the user experience during synchronization with a slow external API.
- Measuring Performance: He emphasizes the importance of measurement and investigation to identify performance bottlenecks rather than making assumptions about where the problems lie. Tools like 'top' and 'iostat' ruled out memory, CPU, and disk as bottlenecks, and 'perftools.rb' showed that most of the time was spent waiting on network responses.
- Creating the Illusion of Speed: Hamill shares strategies for enhancing the perceived performance without making significant changes to the codebase. These include prioritizing recent changes, eliminating unnecessary API calls, and optimizing update-processing to provide quicker feedback to users.
- Implementing Threading for Performance: As a next step towards real performance improvement, Hamill explores multithreading with MRI Ruby's Global Interpreter Lock (GIL) in mind. He successfully implemented a persistent thread pool to manage tasks more efficiently, which resulted in further reductions in synchronization times.
- Results and Reflection: The changes cut user feedback time from 15 minutes to 2.5 minutes after one day of perceived-speed work, and then to 17 seconds after about a week of implementing threads. Hamill notes that 17 seconds was still not ideal, but it was a significant improvement.

Conclusions and Takeaways:
- Identify specific bottlenecks using appropriate tools to inform optimization steps.
- Consider optimizing user feedback and input-output management as immediate ways to boost perceived speed.
- When necessary, thoughtfully incorporate threading to tackle performance challenges, keeping in mind the balance between complexity and efficiency.

Ben Hamill's insightful narrative offers newcomers valuable lessons in performance optimization and the critical balance between perception and reality in application responsiveness.

00:00:15.599 Hi everyone, I'm Ben Hamill. You can find my Twitter, website, and email address online. I work at a company called Return Path.

00:00:20.640 We are the global leader in email intelligence, which may sound impressive, but essentially, we focus on making email more effective and engaging. Specifically, I work on a product called Context IO, which is an HTTP API for email.

00:00:31.760 You may have spoken to Tony at our sponsor booth. By the way, we're hiring! If you're interested, feel free to talk to me afterward or reach out to Tony.

00:00:49.440 Today, I want to tell you a story about my first experience with measuring and improving performance in Ruby, particularly concerning user experience with performance. Initially, I found this project intimidating, but it turned out to be much more manageable than I expected. My goal is to convey that this experience doesn't have to be daunting.

00:01:14.240 In this talk, we'll cover three main topics. First, I'll discuss the state of the codebase when I started. Then, I'll share some tricks I discovered to create the illusion of speed without actually making the code faster. Finally, we'll explore how to achieve real speed improvements.

00:01:36.079 The API mentioned earlier will remain unnamed throughout this presentation, partially to protect the identity of the innocent and also because it's related to a project we're not ready to disclose yet. However, I can share some insights about this particular API. First, all exchanges with it are XML-based, which presents its own set of challenges.

00:02:03.119 When you make HTTP requests with XML bodies, you receive responses in XML as well. The external API recognizes two kinds of entities: documents, which consist of a body primarily composed of text, and a set of changes, which include actions such as adding, deleting, or modifying documents.

00:02:31.599 For example, if someone moves a document to a new folder, you'll get a change notification. If they receive a new item in their account, that falls under an addition, and naturally, if a document is deleted, that's recorded as such. The change document is quite lightweight, but if you want to see the details, you need to fetch the document itself.

00:02:57.840 In my application, we perform what we call a 'sync.' This means we synchronize our understanding of the account with the foreign API to reflect what is accurate on their server. We make a request, fetch the change document, and for each command in the change document, we retrieve the body of the message, conduct some analysis, and repeat this for all entries in the change document.

00:03:21.200 After processing all entries, we timestamp the result and conclude our sync process. This is where the challenge arises. The sync process stores the data, and there is an iPhone app being developed by team members that relies on this data.
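The sync loop described above can be sketched roughly as follows. This is a minimal illustration, not the talk's actual code: `FakeApi`, `fetch_changes`, `fetch_body`, and `analyze` are hypothetical names, and the real API exchanges XML over HTTP rather than returning in-memory hashes.

```ruby
# Hypothetical stand-in for the external API; the real one is XML over HTTP.
class FakeApi
  def initialize(bodies)
    @bodies = bodies
  end

  # One lightweight entry per change, with no document body attached.
  def fetch_changes
    @bodies.keys.map { |id| { id: id, action: :add } }
  end

  # The slow call in the real system: one round trip per document.
  def fetch_body(id)
    @bodies.fetch(id)
  end
end

# Placeholder analysis step (assumption): count the words in the body.
def analyze(body)
  body.split.size
end

# Naive sync: fetch the change list, then fetch and analyze each body in turn,
# and finally timestamp the result.
def sync(api)
  results = api.fetch_changes.map do |change|
    [change[:id], analyze(api.fetch_body(change[:id]))]
  end
  { results: results.to_h, synced_at: Time.now }
end

api = FakeApi.new('doc-1' => 'hello world', 'doc-2' => 'one two three')
outcome = sync(api)
puts outcome[:results].inspect
```

Because every entry triggers its own `fetch_body` call, the total sync time grows with the number of entries in the change document.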

00:03:46.959 When a new user downloads the app and completes sign-up, it triggers a sync followed by post-processing. When I find something interesting, the app updates to display that information to the user. The determination of 'interesting' is carried out by a Bayesian classifier, but for our current discussion, it's important to know that there is a delay between completing the sync and displaying results, which can be frustrating for users.

00:04:26.800 In my experience, if the wait time exceeds five seconds, users become impatient. They want to see results quickly, and prolonged wait times can lead to a poor user experience. Our goal here is to avoid situations where users feel like they are waiting indefinitely.

00:05:00.400 To address this, we first need to identify which specific parts of the process are slow. While we know the entire operation feels slow, pinpointing the exact bottleneck is crucial. We had an intuition that the issue lay outside our codebase since, naturally, no one believes the code they wrote is the problem.

00:05:43.200 However, it's essential to validate these assumptions through measurement. Guessing and then dedicating time to optimizing code without evidence can result in wasted effort. Therefore, we began measuring.

00:06:01.360 I started using various tools to investigate the performance. The first was 'top,' a command-line tool available on Linux. To make this slide, I ran it on an alpha server to gather data.

00:06:28.160 This tool provides extensive output about system resource usage, which I summarized. One of the key metrics shows the amount of free memory available. A low value indicates high RAM usage while a high value suggests normal usage. Fortunately, it turned out that we had plenty of free memory.

00:06:58.400 Additionally, I observed the CPU usage of various processes. Each line represents an active process, with its CPU percentage reflecting usage across all cores. In our case, the CPU numbers were relatively low, indicating that our application was not being constrained by CPU resources.

00:07:35.040 Next, I explored 'iostat,' another tool that provides insights into disk reads and writes, which is vital for understanding input/output wait times. Using this tool, I gathered data on how frequently our system was waiting on disk operations.

00:08:07.599 Among other things, 'iostat' shows the percentage of time spent waiting for disk activity. In our case, this metric indicated that the disk was not the bottleneck. With memory, CPU, and disk ruled out, I turned my focus toward network activity, suspecting it might be responsible for the delays we were experiencing.

00:08:43.360 This led me to investigate 'perftools.rb,' a Ruby gem that brings Google's perftools, a profiler originally built for C and C++ programs, to Ruby. After integrating this gem into our application on the alpha server, I could start tracking method calls and resource usage effectively.

00:09:20.960 The tool allows you to collect profiling data to analyze which methods account for the most execution time. When I examined the output, I noticed that a significant amount of time was spent on network-related actions, especially 'io_select,' which indicated that the process was mostly waiting for external responses.

00:09:52.440 After further exploration, I determined that retrieving the body of a document took about half a second, and when processing a considerable number of documents, this time added up very quickly, leading to an overall slowdown.
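To see how quickly half a second per document adds up, here is a back-of-the-envelope calculation. The document counts are assumptions chosen for illustration; the only figure taken from the talk is the approximate per-fetch cost.

```ruby
SECONDS_PER_FETCH = 0.5 # approximate per-document cost reported above

# Total time when documents are fetched strictly one after another.
def sequential_sync_seconds(document_count, per_fetch = SECONDS_PER_FETCH)
  document_count * per_fetch
end

[100, 1_000, 1_800].each do |count| # illustrative counts, not measured data
  minutes = sequential_sync_seconds(count) / 60.0
  puts format('%5d documents: %.1f minutes', count, minutes)
end
```

At this rate, 1,800 sequential fetches take 900 seconds, or 15 minutes, so the per-document round trip dominates everything else in the sync.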

00:10:28.920 By this point, I had spent roughly three days investigating performance improvements. My inherent developer instinct sparked my interest in threading—specifically, how I could leverage it to enhance performance further. However, I realized that learning how to implement threads effectively would demand significant time investment.

00:11:04.640 I was considering whether there was something simpler I could apply immediately that would provide meaningful improvement without delving deep into threading concepts. Understanding that users don't recognize when a sync starts and ends helped clarify what truly matters: providing feedback to users quickly.

00:11:48.560 This realization led me to devise three strategies to create the perception of speed. First, I reorganized the change document processing order to prioritize more recent changes since users typically care more about the latest activity rather than older data.

00:12:22.560 Second, I eliminated unnecessary API calls, particularly for delete actions, since nothing is gained by requesting a document that no longer exists. Third, I optimized how updates were processed by reducing calls where possible, such as updating metadata without needing to fetch the entire document.
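These ordering and call-skipping ideas can be sketched as a small planning step that runs before any document is fetched. This is an illustration under assumptions, not the code from the talk: the `:id`, `:action`, and `:changed_at` keys, the action names, and the rule that only `:add` needs a body fetch are all hypothetical.

```ruby
# Assumption: only newly added documents need their body fetched. Deletes have
# nothing left to fetch, and moves only change metadata we already hold.
ACTIONS_NEEDING_BODY = [:add].freeze

# Returns the changes newest first, each marked with whether it requires a
# round trip to the API for the document body.
def plan_sync(changes)
  changes
    .sort_by { |change| change[:changed_at] }
    .reverse
    .map do |change|
      change.merge(fetch_body: ACTIONS_NEEDING_BODY.include?(change[:action]))
    end
end

changes = [
  { id: 'a', action: :add,    changed_at: 1 },
  { id: 'b', action: :delete, changed_at: 3 },
  { id: 'c', action: :move,   changed_at: 2 }
]

plan_sync(changes).each do |change|
  puts "#{change[:id]}: #{change[:action]} (fetch body: #{change[:fetch_body]})"
end
```

Processing the plan in this order means the newest activity reaches the user first, and only the entries marked `fetch_body: true` cost a slow request.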

00:12:41.760 Beyond those three strategies, I also leveraged persistent HTTP connections with the Faraday gem, which made a significant difference by avoiding a connection setup and teardown for every request, thus improving the overall experience.
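The talk used Faraday for this. The sketch below shows the same idea with Ruby's standard `Net::HTTP` instead, since it needs no extra gems. `fetch_bodies`, the host, and the paths are hypothetical, and whether the connection is actually reused also depends on the server supporting keep-alive.

```ruby
require 'net/http'

# Fetch each path over a connection that is already open, so the TCP (and TLS)
# handshake is paid once rather than once per document.
def fetch_bodies(http, paths)
  paths.map { |path| http.get(path).body }
end

# Usage sketch (not run here; the host and paths are placeholders).
# Net::HTTP.start with a block keeps the connection open for the whole block
# and closes it when the block returns:
#
#   Net::HTTP.start('api.example.com', 443, use_ssl: true) do |http|
#     bodies = fetch_bodies(http, ['/documents/1', '/documents/2'])
#   end
```

Passing the open connection in as an argument keeps the fetching logic independent of how the connection was created.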

00:13:11.080 After implementing these strategies, I observed some remarkable improvements. Using time benchmarks, I noted a significant decrease in feedback time for users from 15 minutes down to just 2.5 minutes. I felt accomplished, thinking I had maximized the benefits from my changes within just one day.

00:13:37.920 My enthusiasm was tempered when I realized that while significant progress had been made, the time was still too long for a good user experience. Reflecting on this led me back to the potential of using threads for greater performance gains.

00:14:24.560 Before diving into threading, I had to grasp the implications of the Global Interpreter Lock (GIL), which allows only one thread at a time to execute Ruby code in MRI Ruby. Understanding the nature of parallelization and which tasks could benefit from threads helped me frame my approach to problem-solving.

00:15:02.080 The key takeaway is that while certain tasks are limited by the GIL and require time to process sequentially, tasks involving system calls, like IO operations, can run concurrently, providing significant performance improvements.
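A small experiment makes this point concrete. In the sketch below, `sleep` stands in for waiting on a network response; like blocking I/O, it lets other Ruby threads run while one thread waits. `slow_fetch` and `timed` are hypothetical helpers, and the 0.2-second delay is an arbitrary choice.

```ruby
# Simulated I/O-bound fetch: the thread spends its time waiting, not computing.
def slow_fetch(id)
  sleep 0.2
  "body-#{id}"
end

# Run the block and return its result along with the elapsed seconds.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

ids = [1, 2, 3, 4]

# One fetch after another: the waits add up.
_, sequential_time = timed { ids.map { |id| slow_fetch(id) } }

# One thread per fetch: the waits overlap. Thread#value joins and returns
# the block's result.
_, threaded_time = timed do
  ids.map { |id| Thread.new { slow_fetch(id) } }.map(&:value)
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential_time, threaded_time)
```

The threaded run should finish in roughly the time of a single fetch. A CPU-bound task would not see the same benefit under the GIL.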

00:15:43.680 This realization made threading worthwhile. I implemented a persistent thread pool, maintaining threads throughout the life of the application, allowing for efficient task management without repeatedly spinning up and shutting down new threads for each task.

00:16:29.600 While implementing this persistent thread pool did introduce some complexity, it also allowed me to reuse threads efficiently for processing work items, maintaining performance without excessive overhead. The subsequent testing showed a marked improvement in elapsed sync times.
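A persistent thread pool of this kind can be sketched with Ruby's thread-safe `Queue`. This is a minimal illustration rather than the implementation from the talk: `WorkerPool`, its method names, and the pool size are assumptions, and error handling inside jobs is left out for brevity.

```ruby
require 'thread'

# A fixed set of worker threads that live until shutdown and pull jobs from a
# shared, thread-safe queue.
class WorkerPool
  STOP = :stop # sentinel telling a worker to exit

  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job (or the sentinel) is available.
        while (job = @jobs.pop) != STOP
          job.call
        end
      end
    end
  end

  # Enqueue a block of work; the next idle worker picks it up.
  def submit(&job)
    @jobs << job
  end

  # Let queued jobs finish, then stop every worker and wait for it to exit.
  def shutdown
    @workers.size.times { @jobs << STOP }
    @workers.each(&:join)
  end
end

pool = WorkerPool.new(4)
results = Queue.new # thread-safe, so workers can push without a mutex

10.times do |id|
  pool.submit do
    sleep 0.05 # stands in for a slow document fetch
    results << id
  end
end

pool.shutdown
puts "processed #{results.size} documents"
```

Because the workers are created once and reused, submitting a job costs only a queue push rather than a thread start and stop. An exception raised inside a job would end that worker here, so a real pool would need to rescue and report errors.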

00:17:11.440 After about a week of implementing threads, I saw a reduction in sync time to 17 seconds, an encouraging step toward a better user experience. While this new time was still not perfect, it represented a considerable optimization.

00:17:38.240 This process underscored that there is always room for improvement, and although further refinements could be pursued, we had to keep the trade-off between performance and development effort in mind.

00:18:18.560 To summarize everything, when facing application performance challenges, the initial step involves identifying the specific areas that contribute most to slowdowns. Tools like top, iostat, and perftools.rb can provide valuable insights.

00:18:53.680 Once you've determined the bottlenecks, consider if you can create perceived speed improvements by optimizing feedback mechanisms or input-output management. If all else fails, you can turn to threads, adapting them thoughtfully to your application workload to enhance performance.

00:19:18.080 Thank you for your time, and I welcome any questions you may have.
