RailsConf 2017: Processing Streaming Data at a Large Scale with Kafka by Thijs Cadier

Using a standard Rails stack is great, but when you want to process streams of data at a large scale you'll hit the stack's limitations. What if you want to build an analytics system on a global scale and want to stay within the Ruby world you know and love? In this talk we'll see how we can leverage Kafka to build and painlessly scale an analytics pipeline. We'll talk about Kafka's unique properties that make this possible, and we'll go through a full demo application step by step. At the end of the talk you'll have a good idea of when and how to get started with Kafka yourself.
Summary
The video titled *Processing Streaming Data at a Large Scale with Kafka* features Thijs Cadier presenting at RailsConf 2017. In this talk, Cadier explores the challenges of processing streaming data using a standard Rails stack, highlighting its limitations in scalability. He introduces Kafka as a solution for building an efficient analytics pipeline, elaborating on its unique properties that allow for scalable stream processing.

Key points discussed include:

- **Definition and Challenges of Streaming Data**: Cadier defines streaming data and discusses common issues such as database locking and the difficulties of handling concurrent updates from multiple sources.
- **Database Performance Limitations**: He illustrates the performance bottlenecks faced when updating databases directly with each incoming log line, especially at high scales.
- **Sharding and Load Balancing**: He describes attempts to shard data across multiple databases and the complications that arise from needing to query across these shards.
- **Introduction to Kafka**: Cadier introduces Kafka as a distributed messaging system that allows for effective load balancing, routing, and failover capabilities, which are essential for processing large-scale streaming data.
- **Key Concepts of Kafka**: He breaks down Kafka's architecture, explaining fundamental components like topics, partitions, brokers, and consumers, emphasizing how they work together to ensure data is processed reliably and efficiently.
- **Building an Analytics Pipeline**: Cadier walks through a practical example of setting up an analytics system using Kafka. He shares how logs are ingested, pre-processed, and aggregated through various Kafka topics to ultimately update a database with country visit statistics.
- **Demo of Kafka Implementation**: The presentation includes a demonstration of the system in action, showing how incoming data is processed and how consumers handle scaling and partition assignment automatically.
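The routing behavior underlying several of these points (records with the same key always landing on the same partition, so one consumer sees all events for that key in order) can be sketched in plain Ruby. This is a simplified illustration, not part of any Kafka client: Kafka's default partitioner actually uses murmur2 hashing, and the `Partitioner` class and `"visitor-42"` key here are hypothetical, with CRC32 standing in as the hash function.

```ruby
require "zlib"

# Simplified model of Kafka's key-based partitioning: a record's key is
# hashed, and the hash modulo the partition count picks the partition.
# (Kafka's default partitioner uses murmur2; CRC32 is a stand-in here.)
class Partitioner
  def initialize(num_partitions)
    @num_partitions = num_partitions
  end

  # The same key always maps to the same partition, which is what lets
  # a single consumer see all events for one key, in order.
  def partition_for(key)
    Zlib.crc32(key) % @num_partitions
  end
end

partitioner = Partitioner.new(6)
first  = partitioner.partition_for("visitor-42")
second = partitioner.partition_for("visitor-42")
puts "visitor-42 -> partition #{first}, stable: #{first == second}"
```

Because the mapping is deterministic, adding consumers to a group never splits one key's events across consumers; rebalancing only reassigns whole partitions.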
In conclusion, the presentation highlights the robust nature of Kafka in managing streaming data efficiently, allowing developers to scale applications effectively while minimizing downtime and processing overhead. The audience is encouraged to explore Kafka further for their own streaming data needs, as Cadier provides practical insights and guidance on implementation in a Ruby environment.
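As a rough illustration of the aggregation stage described in the summary (rolling pre-processed pageview events up into per-country visit counts), here is a self-contained Ruby sketch. It models the consumer loop with plain hashes; the `CountryVisitAggregator` class and event shape are illustrative assumptions, since in a real pipeline the events would arrive via a Kafka consumer and the counts would be flushed to a database.

```ruby
# Illustrative sketch of the aggregation stage: consume pageview events
# (as they might arrive from a Kafka topic) and keep a running visit
# count per country. A real implementation would read from a Kafka
# consumer group and periodically write the counts to a database.
class CountryVisitAggregator
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Process one event; each event is a hash with a "country" field.
  def consume(event)
    @counts[event["country"]] += 1
  end
end

aggregator = CountryVisitAggregator.new
events = [
  { "country" => "NL" },
  { "country" => "US" },
  { "country" => "NL" },
]
events.each { |e| aggregator.consume(e) }
p aggregator.counts  # running visit counts per country
```

Partitioning the upstream topic by country would let several such aggregators run in parallel, each owning a disjoint set of countries.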