Building a real-time analytics engine in JRuby

The video titled "Building a real-time analytics engine in JRuby" features speaker David Dahl from Burt, a company based in Gothenburg, Sweden, that specializes in online advertising analytics. In this session at wroc_love.rb 2013, Dahl discusses how Burt successfully built a real-time analytics engine using JRuby, highlighting various strategies and tools employed to manage high-volume data processing effectively on AWS.

Key points from the presentation include:

Initial Challenges: Burt's journey began with traditional methods of logging data into databases, which failed to meet demands during major campaigns, leading to the realization that pre-calculation and a distributed system were necessary.
Transitioning to JRuby: After facing limitations with MRI Ruby, Dahl's team switched to JRuby to leverage Java's capabilities for better threading and performance, while still using Ruby’s developer-friendly syntax.
System Architecture: The system architecture is designed around a distributed model where each application or service processes specific tasks, allowing for real-time data tracking and metric generation.
Use of RabbitMQ: RabbitMQ emerged as a crucial tool in their architecture for handling inter-process communication and service isolation, significantly aiding in buffering operations and avoiding data loss.
Database Strategy: Dahl shares insights on selecting appropriate databases like MongoDB and Cassandra, emphasizing that while MongoDB is effective, it can also be problematic if misapplied.
Concurrency Management: The presentation underscores the importance of Java’s concurrency tools, such as blocking queues and atomic variables, to manage multithreading complexities in JRuby applications.
Performance Over Optimization: The focus shifted from maximizing performance to scalable solutions, allowing Burt to utilize AWS’s Elastic MapReduce while accepting the inherent trade-offs of using JRuby over Java.
Lessons Learned: Dahl highlights the significance of making operations idempotent and utilizing a well-defined acknowledgment strategy for data processing to ensure system reliability and data integrity.

The presentation concludes with a Q&A segment where Dahl addresses specific inquiries about the performance comparisons between JRuby and Java, as well as RabbitMQ's role in their architecture. Overall, the talk is a comprehensive overview of leveraging JRuby in building a robust real-time analytics engine, showcasing how combining Ruby's flexibility with Java’s performance can yield effective results in web-based data analytics.

Building a real-time analytics engine in JRuby
David Dahl • April 10, 2013 • Wrocław, Poland

This video was recorded on http://wrocloverb.com. You should follow us at https://twitter.com/wrocloverb. See you next year!

When most people hear big data real time analytics, they think huge enterprise systems and tools like Java, Oracle and so on. Thats not the case at Burt. We're building a real time ad analytics engine in Ruby, and we're handling tens of thousands of requests per second so far. Thanks to JRuby we can have the best of two worlds, the ease of developing with Ruby, and the robustness of the Java echosystem. In this talk you'll discover how we built a massive crunching pipeline using only virtual servers on AWS and free tools like JRuby, MongoDB, RabbitMQ, Redis, Cassandra and Nginx.

wroclove.rb 2013