Supercharge Your Workers with Storm

In his talk at RailsConf 2014, Carl Lerche introduces Apache Storm, a distributed real-time computation system designed to enhance background job processing, specifically in scenarios where using a database for coordination leads to significant performance issues. Given the increasing demand for processing large data volumes quickly and reliably, Lerche articulates the value of adopting Storm in modern applications.

Key points covered in the presentation include:

- Overview of Storm: Lerche describes Storm as a powerful distributed worker system, explaining its benefits of distribution and fault tolerance alongside its operational complexities, such as the need to manage a ZooKeeper cluster, Nimbus processes, and worker processes.

- Use Case - Twitter Trending Topics: Lerche uses the example of processing Twitter hashtags to illustrate Storm’s capabilities. He explains how hashtags can be tracked and their trend rates calculated using an exponentially weighted moving average, which requires efficient data processing in real time.

- Operational Overhead: The presentation highlights the operational challenges and overhead of scaling systems with traditional tools like Sidekiq or Redis. Lerche shows how in-memory caching can improve performance before introducing Storm to better manage background tasks.

- Core Concepts of Storm: He details Storm's fundamental abstractions, which include streams, tuples, spouts, and states, emphasizing how they facilitate data flow and processing across different systems.

- Building a Data Processing Topology: Lerche outlines how to implement a data processing pipeline using Storm's API and establish a topology for processing tweets, extracting hashtags, and aggregating counts efficiently.

- Handling Failures: He discusses the inevitability of failures in distributed systems and how Storm manages message processing guarantees such as at-least-once processing, explaining its approach to failure recovery and how it prevents message loss.

- Final Thoughts and Impact: Lerche concludes with a recap of Storm’s capabilities, emphasizing its power in handling stateful jobs, complex data processing flows, and its overall value for developers looking for scalable solutions to data handling problems.
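
The trending-topics example in the list above can be sketched in a few lines of plain Python. This is an illustration only and does not use Storm's API or reproduce the code from the talk: the function names, the hashtag regular expression, the fixed-interval batching, and the smoothing factor `alpha = 0.5` are all assumptions made for the sketch. It extracts hashtags from a batch of tweets, counts them, and folds each interval's counts into an exponentially weighted moving average.

```python
import re
from collections import Counter

# Hypothetical pattern: a '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(tweet):
    """Return the lowercased hashtags found in one tweet."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


def count_hashtags(tweets):
    """Count hashtag occurrences across one interval's batch of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


def ewma_update(previous, sample, alpha):
    """One EWMA step: weight the new sample by alpha, the history by 1 - alpha."""
    return alpha * sample + (1.0 - alpha) * previous


def update_trends(trends, interval_counts, alpha=0.5):
    """Fold one interval's counts into the per-hashtag trend rates.

    Hashtags absent from this interval get a sample of 0, so their
    rate decays instead of staying frozen.
    """
    tags = set(trends) | set(interval_counts)
    return {
        tag: ewma_update(trends.get(tag, 0.0), interval_counts.get(tag, 0), alpha)
        for tag in tags
    }


# Usage: two intervals of (made-up) tweets.
trends = {}
batches = [
    ["#RailsConf is on", "Loving #railsconf and #storm"],
    ["#storm again"],
]
for batch in batches:
    trends = update_trends(trends, count_hashtags(batch))
```

In a single process this state lives in one dictionary. The coordination problem the talk addresses appears once the counting is spread across many workers that would otherwise share this state through a database.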

Overall, Lerche's talk provides an in-depth exploration of Apache Storm's features and benefits for developers, reinforcing the importance of real-time data processing in today's applications.

Carl Lerche • May 06, 2014 • Chicago, IL • Talk

If you have ever needed to scale background jobs, you might have noticed that using the database to share information between tasks incurs a serious performance penalty. Trying to process large amounts of data in real time using the database for coordination will quickly bring your servers to their knees.

This talk will show you how to use Apache Storm, a distributed, realtime computation system, to solve these types of problems in a fast, consistent, and fault-tolerant way.

Carl is currently working at Tilde, a small company he co-founded, where he spends most of his day hacking on Skylight and drinking too much coffee. In the past, he has worked on a number of Ruby open source projects, such as Bundler and Ruby on Rails.

RailsConf 2014