Rocky Mountain Ruby 2012

Algorithms

by Gordon Diggs

In this talk from Rocky Mountain Ruby 2012, Gordon Diggs walks through a single algorithm from a side project: cataloging his growing record collection. Diggs works at Paperless Post in New York City and collects records in his free time, and he found the collection hard to keep track of. To tackle this, he built a Sinatra application that catalogs the records and generates pie charts showing the frequency of every value in every column.

Initially, Diggs implemented a solution using MapReduce-style code, but performance issues became evident, as processing 9,000 records took an unacceptably long 23 and a half seconds. This prompted him to rethink his approach to algorithm design and performance optimization.

Key points of his solution include:

  • Caching Results: Acknowledging the bottleneck caused by repeated calculations, he first attempted to cache results, updating them only when changes were made. However, the volume of records necessitated a more scalable solution.
  • Switching to Postgres: Recognizing the limitations of his initial Ruby implementation, Diggs migrated his app's backend to Postgres. He highlighted the advantages of SQL, particularly its efficiency in counting and grouping operations, which directly addressed his performance needs.
  • Simplified SQL Query: The core of his final solution was a streamlined SQL statement capable of selecting, counting, grouping, and ordering data effectively. After refining the data, Diggs achieved significant performance improvements, with query execution time dropping to about half a second for the same dataset of 9,000 items.

Diggs concluded that while Ruby excels at many tasks, SQL handles certain operations, such as counting and grouping, better. He encouraged developers who are unfamiliar with SQL to learn some, and to write it where it is the right tool for the job.

00:00:06.319 Cool, so today I want to talk about algorithms. More specifically, I want to discuss an algorithm. My name is Gordon Diggs, and I work for Paperless Post in New York City. However, this talk isn't about that. In my free time, I collect records.
00:00:20.160 I've started acquiring a lot of them, and it's hard to keep track of everything. So I created a Sinatra app that allows me to catalog all my records. This app generates pie charts that illustrate the frequency of every value in every column, which is important information to have about your record collection.
00:00:39.360 Initially, I implemented a solution that should look familiar to anyone who has ever written MapReduce code or concurrent code. However, while it was a neat solution, it turned out to be quite slow. With 9,000 items, that algorithm took about 23 and a half seconds to run.
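The transcript does not reproduce the original code, so the following is only a sketch of what a MapReduce-style frequency count might look like in Ruby. The sample records, the column names, and the method names are all assumptions made for illustration.

```ruby
# Hypothetical sample data; the talk does not show the real schema.
SAMPLE_RECORDS = [
  { artist: "Miles Davis",   label: "Columbia", year: 1959 },
  { artist: "John Coltrane", label: "Impulse!", year: 1965 },
  { artist: "Miles Davis",   label: "Columbia", year: 1970 }
].freeze

# Map step: emit one [column, value] pair for every field of every record.
def map_pairs(records)
  records.flat_map(&:to_a)
end

# Reduce step: fold the pairs into { column => { value => count } }.
def reduce_counts(pairs)
  pairs.each_with_object(Hash.new { |h, k| h[k] = Hash.new(0) }) do |(column, value), counts|
    counts[column][value] += 1
  end
end

# Frequency of every value in every column, computed entirely in Ruby.
def column_frequencies(records)
  reduce_counts(map_pairs(records))
end

SAMPLE_FREQUENCIES = column_frequencies(SAMPLE_RECORDS)
```

Every record has to be loaded into Ruby and every field visited on each run, which is one plausible reason an approach like this slows down as the collection grows.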
00:01:05.360 So I had to consider my options. The first step I took was to reduce the need for recalculating the results by caching them, only recalculating when something changed. The problem with this approach is that I manage a lot of records, which meant I needed to find another way to achieve better performance.
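The caching step described here can be sketched as a small wrapper that recomputes only after the data changes. This is an illustration under stated assumptions, not the code from the talk: the class name `CachedFrequencies`, its methods, and the `computations` counter are hypothetical.

```ruby
# Hypothetical wrapper: caches an expensive computation and recomputes it
# only after the underlying collection has changed.
class CachedFrequencies
  attr_reader :computations

  def initialize(records)
    @records = records.dup
    @cache = nil
    @computations = 0 # counts how often the expensive path actually runs
  end

  # Adding a record invalidates the cached result.
  def add(record)
    @records << record
    @cache = nil
  end

  # Returns { column => { value => count } }, reusing the cache when valid.
  def frequencies
    @cache ||= compute
  end

  private

  # The expensive part: visits every field of every record.
  def compute
    @computations += 1
    @records.each_with_object(Hash.new { |h, k| h[k] = Hash.new(0) }) do |record, counts|
      record.each { |column, value| counts[column][value] += 1 }
    end
  end
end
```

Repeated calls to `frequencies` are cheap, but every call to `add` forces a full recomputation on the next read, so a collection that changes often gets little benefit from the cache.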
00:01:20.000 The next thing I did was switch to Postgres. SQL has powerful capabilities like counting and grouping, which are precisely what I needed to improve my solution.
00:01:30.799 Here’s the new solution: at its core is just one SQL statement that selects certain columns, counts, groups, and orders them. After doing some data cleanup, I found that with the same 9,000 items, this process now takes about half a second.
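The statement itself is not included in the transcript. As a rough sketch, a query that selects, counts, groups, and orders might be built from Ruby as shown below. The table name `records`, the column names, and the helper `frequency_sql` are assumptions, not details from the talk.

```ruby
# Hypothetical table and column names; the talk does not show the schema.
RECORDS_TABLE = "records"
ALLOWED_COLUMNS = %w[artist label genre].freeze

# Builds one statement that selects a column, counts its values,
# groups by that column, and orders the result by frequency.
# Identifiers cannot be passed as bind parameters, so the column name
# is checked against a whitelist before being interpolated.
def frequency_sql(column)
  unless ALLOWED_COLUMNS.include?(column.to_s)
    raise ArgumentError, "unknown column: #{column}"
  end

  <<~SQL
    SELECT #{column}, COUNT(*) AS frequency
    FROM #{RECORDS_TABLE}
    GROUP BY #{column}
    ORDER BY frequency DESC
  SQL
end
```

With the pg gem, something like `PG.connect(dbname: "collection").exec(frequency_sql(:label))` would run the query, where the database name is made up for this example. The counting then happens inside Postgres, and only one row per distinct value comes back to Ruby.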
00:01:54.479 So why is this the case? SQL is inherently more efficient at these operations than our Ruby implementations. Because of this, I can run the query whenever I want, and it scales much better. Therefore, when I have 9,000 records, I no longer have to wait 30 seconds for my page to load; I can generate graphs all day.
00:02:11.680 The takeaway from this experience is that while Ruby excels at many tasks, there are certain operations that SQL performs better. So, don’t hesitate to learn a bit of SQL if you’re not familiar with it, or to dive in and write some SQL. Thank you!