Talks

Simplifying Logs, Events and Streams: Kafka + Rails

EuRuKo 2016

00:00:04.060 And yeah, now it's time for Terence to come to the stage. He is Heroku's Ruby lead, specifically the lead of the Ruby task force. He has worked on some open source projects you might have heard of, like Ruby the language and Ruby on Rails.
00:00:09.469 You know, small things like Bundler and Resque. He also enjoys Friday hugs like the rest of us and believes in bringing people together for Ruby.
00:00:15.980 So let's give a warm welcome to Terence as he takes the stage.
00:00:52.210 Good afternoon! Oh god, let me turn this on. Nice.
00:00:59.800 Happy Friday! As I mentioned earlier, I enjoy Friday hugs. If you've seen me and talked to me before, you've probably seen me do one of these.
00:01:07.159 This is actually my second time here, so I’m excited to be back at another one. I have this photo I took back then, and I want to continue this tradition.
00:01:18.640 I know you all just sat in your chairs, but if you could all stand up. If you're not familiar with Friday hugs, we basically take a photo of people hugging the camera.
00:01:31.069 So I’d like you all to join me in doing a Friday hug. I collect these as memorabilia of this event.
00:02:08.720 (pause for photo)
00:02:22.030 Thank you for entertaining me with that! My name is Terence; I go by hone02 on Twitter.
00:02:29.099 I'm mostly known for my blue hat—not really for my work in open source, but I do have some blue hat stickers. So if you’d like some swag, definitely come by and talk to me. I work at Heroku on Ruby experience stuff with Richard Schneeman, who some of you might know for his work with Rails.
00:02:51.310 If you deploy Ruby apps on Heroku, you're using some of the stuff that we work on. So if you have any problems with that, come talk to us.
00:03:10.720 I brought a few shirts and stickers, along with some iron-on patches for Heroku, but it's pretty limited, so come by and say hi.
00:03:15.849 I live in Austin, Texas, where we have some fantastic tacos. If you're ever visiting town, definitely reach out to me! I’d be more than happy to take you out for some tacos.
00:03:35.669 I organize a conference called 'Keep it Weird,' which is also in Austin, and one of my favorite things about the Ruby community is that we can engage in fun and quirky activities like Friday hugs or Ruby karaoke.
00:03:49.269 For those of you who are not familiar with Ruby karaoke, it is exactly what it sounds like—you just get together and sing with a bunch of Rubyists.
00:04:01.930 Some of my favorite moments on the conference circuit have been going around with people like Charlie Nutter and watching him sing Shakira. He is actually an amazing singer! I talked to someone yesterday at the speaker dinner about possibly getting a group together tomorrow after the conference ends, so if you’re around and you like to sing, we’ll try to get a crew together.
00:04:21.240 Now that I've finished with the introductions, let's talk about Kafka.
00:04:27.509 The agenda for today is to discuss Kafka, some design constraints, and terminology. We will also show you how you can use Kafka with Ruby.
00:04:33.460 We'll dive into a concrete use case scenario with code examples, and we’ll explore various patterns of how Kafka is used in real-world production applications at scale.
00:04:47.650 There won’t be any code samples in the latter part since it’s more general information about Kafka’s applications. I remember about a year ago, I heard of Kafka, but I had no real idea what it was—just that it was a buzzword.
00:05:07.860 One of the first things I did was go to the official documentation (Kafka is an Apache project). The introduction states that Kafka is "a distributed, partitioned, replicated commit log service" that "provides the functionality of a messaging system, but with a unique design."
00:05:32.080 I remember reading this and having no idea what it meant—just a bunch of buzzword salad.
00:05:37.630 So if I were to summarize it in just four words, I would say it's a distributed pub/sub messaging system.
00:05:51.250 There's a lot of terminology that’s similar to things you might be familiar with if you’ve used Redis, RabbitMQ, or AMQP, which all have similar messaging functionalities.
00:06:10.960 But Kafka has unique design constraints. It needs to be fast, scalable, and durable with respect to data loss and retention.
00:06:24.610 When we talk about performance, I remember a colleague of mine, Tom Crayford, who works on the Heroku Kafka team, saying that a small Kafka cluster can handle hundreds of thousands to millions of messages per second.
00:06:38.289 To give you a reference, with AMQP, you're talking about tens of thousands of messages per second on a single instance.
00:06:49.390 So with this small Kafka cluster, we are discussing an order or two of magnitude greater throughput, but this comes with certain design constraints that we'll explore.
00:07:04.020 I don't know how many of you are familiar with Skylight, the Rails performance monitoring tool done by the folks at Tilde.
00:07:16.870 Kafka is a core part of their architecture that enables them to deliver their product.
00:07:29.620 When you're in the business of performance monitoring, collecting a ton of data and metrics about your application's performance is essential.
00:07:41.200 They generate histograms based on those metrics, and Kafka serves as the backbone to handle all this streaming data before it's stored in Cassandra for processing.
00:07:55.240 One other nice aspect of Kafka is that it's one of the few pieces of their architecture that doesn't go down.
00:08:01.770 You wouldn’t want to use a product where, if the service goes down, you lose data or get some misleading representation of the histogram.
00:08:20.200 In Kafka, even if the service that calculates your histogram goes down during the process, Kafka saves positional data.
00:08:33.760 When it comes back, if it hasn’t committed that position due to incomplete processing, it will just start from where it left off.
00:08:39.220 So with tools like Skylight, even if their service goes down, you won’t lose stored data.
00:08:55.450 This is a significant reason why Kafka fits this particular use case.
00:09:01.350 If we step back and look at the terminology we will be discussing when designing and building with Kafka, it's primarily similar to producer-consumer systems that you're used to.
00:09:16.430 For example, if you're using Sidekiq in Rails, you're familiar with the pattern where parts of your app feed work into a data store, like enqueueing an email job for each new user registration. In this analogy, the producers are the components creating that work.
00:09:40.510 Redis plays the role of the Kafka cluster, holding all this data, while the consumers are the Sidekiq workers reading from the data store.
00:09:46.870 In Kafka, things you're sending into the Kafka cluster are referred to as messages. These messages are stored as byte arrays, meaning you can store various types of data.
00:10:05.019 By default, you can use strings, but you can also utilize structured data types like JSON or MessagePack, ensuring that the serialization type used is supported for later deserialization.
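To make the serialization point concrete: a Kafka message value is just bytes, so whatever format the producer writes, the consumer must parse the same way. A minimal round-trip sketch using JSON from Ruby's standard library:

```ruby
require "json"

# A Kafka message value is an opaque byte string. If the producer
# serializes a hash to JSON, the consumer must parse JSON back out;
# Kafka itself never interprets the payload.
order = { "id" => 42, "total_cents" => 1999 }

payload = JSON.generate(order)   # what the producer would send
decoded = JSON.parse(payload)    # what the consumer would do

raise "round-trip failed" unless decoded == order
```

The same shape applies to MessagePack or any other format: both sides simply have to agree on it.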
00:10:27.600 We create a large stream of messages due to high throughput; these messages are organized into concepts called topics.
00:10:39.760 Topics represent a feed of messages and are split into multiple partitions, allowing for replication and distribution across your cluster.
00:10:51.930 A single topic can be divided into partitions 0, 1, and 2, and within each of these partitions, the order of messages is guaranteed.
00:11:11.800 Since Kafka is a commit log, it allows only appending messages, ensuring that you can’t reorder them once stored.
00:11:25.880 This design grants nice guarantees about data retrieval. You can easily find a particular message by looking at the offset from the start, allowing you to replay any data based on these offset positions.
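The commit-log model can be sketched in plain Ruby: an append-only array where the index is the offset, so replaying from any position is just reading from that index onward. This is a toy stand-in for illustration, not Kafka's actual storage engine:

```ruby
# Toy commit log: append-only, and each message's offset is its array index.
class CommitLog
  def initialize
    @messages = []
  end

  # Appending is the only write operation; returns the new message's offset.
  def append(message)
    @messages << message
    @messages.size - 1
  end

  # Replay everything from a given offset, in the original order.
  def read_from(offset)
    @messages[offset..-1] || []
  end
end

log = CommitLog.new
log.append("a")   # offset 0
log.append("b")   # offset 1
log.append("c")   # offset 2
log.read_from(1)  # => ["b", "c"]
```

Because messages are never reordered or rewritten, a consumer that remembers its last offset can always resume or replay from there.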
00:11:39.760 The Skylight team uses this approach for calculating histograms.
00:11:44.540 Messages in Kafka can be distributed across partitions, usually spread evenly, but you can assign a key to a message to ensure that all messages with the same key go to a specific partition.
00:11:57.140 This allows for optimizations since you know that all messages for, say, user ID 12 will go to one particular consumer.
00:12:10.970 On the consumer side, we also have this concept of consumer groups, allowing you to scale the consumption of work.
00:12:29.000 In the picture, we have two consumer groups, A and B, each connected to a Kafka cluster with four partitions. Both groups can subscribe to the same topics.
00:12:42.680 Within a given consumer group, each partition is split evenly amongst the consumers. Consumer group A, for instance, might have one consumer taking partitions 0 and 3, while another consumer handles partitions 1 and 2.
00:12:57.400 Scaling the consumers in a group causes Kafka to automatically redistribute the partitions evenly among the added consumers.
00:13:10.000 This makes it easy to scale a Kafka cluster to handle higher throughput.
00:13:27.290 Additionally, if consumer six drops in consumer group B, a heartbeat system detects this and stops assigning that consumer any partitions, passing those on to other consumers.
00:13:42.640 Kafka makes this scalability simple and built into its architecture.
00:13:50.889 When discussing distributed systems, we must also consider how messages are delivered, particularly in light of the CAP theorem.
00:14:05.200 Most people seek to guarantee that a message is delivered exactly once, but that is challenging in a distributed context. Kafka provides options, defaulting to at least once delivery.
00:14:19.150 This means that when subscribed to a topic, messages are delivered at least once, but if faults occur, it may deliver some messages more than once.
00:14:36.600 You must implement duplicate detection to ensure each message is only processed once. In contrast, at-most-once delivery guarantees a message is never delivered twice, but it does not ensure every message arrives.
00:14:54.880 Most applications typically adopt the at least once delivery model and include their duplication detection.
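Under at-least-once delivery, the consumer-side duplicate detection can be as simple as remembering which message IDs were already handled and skipping repeats. A minimal in-memory sketch; a production system would persist the seen set somewhere durable, such as Redis or a database:

```ruby
require "set"

# Idempotent consumer sketch: process each message id at most once,
# even if Kafka redelivers it.
class DedupingConsumer
  attr_reader :processed

  def initialize
    @seen = Set.new
    @processed = []
  end

  def handle(id, body)
    return if @seen.include?(id)  # duplicate redelivery: skip it
    @seen << id
    @processed << body            # stand-in for the real work
  end
end

consumer = DedupingConsumer.new
consumer.handle(1, "charge card")
consumer.handle(1, "charge card")  # redelivered, ignored
consumer.handle(2, "send email")
consumer.processed  # => ["charge card", "send email"]
```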
00:15:12.640 Using Kafka’s messages and topics, you can guarantee that messages are written in order for a given partition and retrieve those messages in the same order.
00:15:30.750 And since topics are distributed and replicated across your cluster, if any nodes fail, you still have access to the messages.
00:15:49.290 Now that we have an understanding of Kafka, let's discuss how to use it within Ruby.
00:16:02.360 One of the oldest Kafka libraries available is JRuby Kafka, which simply wraps around the official Kafka libraries written in Scala.
00:16:16.740 The downside is you may need to be comfortable with reading Java API documentation. For those using MRI Ruby apps, converting to JRuby might be challenging and potentially infeasible if using native C extensions.
00:16:35.800 Fortunately, Zendesk has been working on the Ruby Kafka library, enabling development in both JRuby and MRI while leveraging Kafka.
00:16:53.330 Older libraries such as Poseidon only support older Kafka versions, while Kafka 0.9 introduced many new APIs around consuming messages with consumer groups.
00:17:10.890 We'll focus on Ruby Kafka because it supports version 0.9 APIs, which offer improved functionalities.
00:17:30.740 To create a simple message producer for Kafka, we must first establish a connection to it. This looks similar to a Redis connection.
00:17:52.020 Since we're dealing with a cluster, you pass in a set of seed brokers, which are the initial instances to connect to. Once connected, Kafka can provide the rest of the cluster’s network topology.
00:18:03.030 To produce a message, you can instantiate a synchronous producer that sends messages to the cluster, specifying the topic and optionally a partition if desired.
00:18:27.540 If the message delivery fails, an exception is raised. You can decide how to handle this failure, implementing either an at least once or an at most once delivery mechanism.
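To make the failure handling concrete: retrying on a delivery exception gives at-least-once behavior, while swallowing it gives at-most-once. Here is a sketch with a stand-in producer object; in real ruby-kafka the failing call would be deliver_messages raising a delivery error:

```ruby
# Stand-in for a Kafka producer whose delivery fails a few times
# before succeeding (simulating transient broker errors).
class FlakyProducer
  attr_reader :delivered

  def initialize(failures)
    @failures = failures
    @delivered = []
  end

  def deliver(message)
    if @failures > 0
      @failures -= 1
      raise "delivery failed"
    end
    @delivered << message
  end
end

# At-least-once: keep retrying until delivery succeeds. Note a duplicate
# can slip through if a send actually landed but the acknowledgment was lost.
def deliver_at_least_once(producer, message, max_attempts: 5)
  attempts = 0
  begin
    producer.deliver(message)
  rescue => e
    attempts += 1
    retry if attempts < max_attempts
    raise e
  end
end

producer = FlakyProducer.new(2)
deliver_at_least_once(producer, "order-42")
producer.delivered  # => ["order-42"]
```

Rescuing and dropping the message instead of retrying would give you at-most-once semantics with the same API.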
00:18:43.020 For Rails applications producing messages, asynchronous producers are recommended to avoid blocking web requests during message delivery.
00:19:00.290 You can specify thresholds on the number of messages in the buffer or how long to wait before sending messages to Kafka.
00:19:19.700 For example, you can take a simple Ruby hash, serialize it to JSON, and send that to the Kafka cluster.
00:19:37.690 On the consumer side, we call JSON.parse on the message values to process them.
00:19:54.130 In Rails, you can set up your Kafka connection and initialize the producer within an initializer. You’ll pass in the Rails logger to keep track of Kafka producer logs.
00:20:07.440 It’s important to ensure that the Kafka producer shuts down properly to avoid leaking connections to the Kafka cluster.
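A sketch of what such an initializer might look like with the ruby-kafka gem. The broker hostnames and threshold values are placeholders, and the exact options should be checked against the ruby-kafka documentation:

```ruby
# config/initializers/kafka.rb (sketch; broker hosts are placeholders)
require "kafka"

kafka = Kafka.new(
  seed_brokers: ["kafka1.example.com:9092", "kafka2.example.com:9092"],
  client_id: "my-rails-app",
  logger: Rails.logger  # reuse the Rails logger for Kafka client logs
)

# Async producer: buffers messages and flushes once 100 are queued
# or every 10 seconds, so web requests aren't blocked on delivery.
$kafka_producer = kafka.async_producer(
  delivery_threshold: 100,
  delivery_interval: 10
)

# Flush the buffer and close connections when the process exits, so we
# neither leak connections to the cluster nor drop buffered messages.
at_exit { $kafka_producer.shutdown }
```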
00:20:30.210 Now, let’s dive into an example within the orders controller, creating orders and sending them into Kafka.
00:20:48.590 You can convert an ActiveRecord object to JSON and send that to Kafka. If desired, associate an order with a user ID to ensure all messages for that order go to the same partition.
00:21:06.150 Now, we need to process the messages inside Kafka, which is where the new APIs from version 0.9 come in.
00:21:23.480 These APIs allow us to create consumer groups, which are essential for scaling message consumption.
00:21:43.480 You would instantiate Kafka like before but call consumer, passing in a unique group ID to differentiate the type of work being done.
00:21:55.290 You can subscribe the consumer to multiple topics and process messages as they arrive. Each message retrieval is a blocking call.
00:22:09.540 The messages can contain information like value (body), key, and partition, which allows for flexible data handling.
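A sketch of that 0.9-style consumer-group API with ruby-kafka. Hostnames, the group ID, and the topic name are placeholders, and since each_message blocks, this would normally run in its own worker process:

```ruby
require "json"
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1.example.com:9092"])

# Consumers sharing a group_id split the topic's partitions between them;
# starting more processes with the same id scales consumption out.
consumer = kafka.consumer(group_id: "order-processor")
consumer.subscribe("orders")

# Blocks, yielding each message as it arrives. Committed offsets live in
# Kafka, so a restarted consumer resumes where it left off.
consumer.each_message do |message|
  order = JSON.parse(message.value)
  puts "partition=#{message.partition} offset=#{message.offset} key=#{message.key}"
  # ... do the actual work with `order` here ...
end
```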
00:22:26.820 Kafka's version 0.9 also supports SSL connections, which is crucial when Kafka is not on the same machine as your Rails app.
00:22:45.860 Now, let’s walk through a demo app focused on metrics management based on web traffic.
00:23:01.360 We will calculate metrics like response time, service time taken by requests, status codes, and use Kafka to manage this high-throughput data.
00:23:12.310 The architecture involves a main app that receives web requests, logs pertinent routing data, and sends it to the metrics app for storage.
00:23:31.790 For this example, I'm using a Rails application named CodeTriage, created to help people get involved in open-source contributions. Users can follow repositories and get reminders about open issues.
00:23:50.200 CodeTriage is hosted on Heroku, so we can gather its router logs and build a Ruby app that produces messages into Kafka.
00:24:05.460 Once messages are in Kafka, we can create another Ruby app to consume them, aggregate data, and store it in Redis, allowing various applications to access it.
00:24:16.920 Next, let’s examine how we retrieve information from Heroku's router logs, which capture incoming requests.
00:24:28.650 Each log entry provides various data, including the requested path, connection time, service time, and status code, vital for accumulating metrics.
00:24:46.120 On Heroku, we can add an HTTPS drain to send all logging data to a specified endpoint.
00:25:05.000 In this case, we will host an app on Heroku that takes the log data and sends it to Kafka for processing.
00:25:21.080 The POST request body will be a collection of these log messages. The headers indicate how many messages are included in the body.
00:25:37.650 Note that Heroku's logging system is not entirely compliant with the RFC 5424 Syslog specifications.
00:25:54.000 However, various libraries and tools exist in Ruby to parse this data effectively.
00:26:06.130 Now, let’s take a look at an app that processes this log data.
00:26:19.370 This Sinatra app will handle POST requests, returning a 202 status to acknowledge receipt of messages.
00:26:38.140 Since the Kafka producer is not thread-safe, we use a connection pool so that each producer instance is only ever used by one thread at a time.
00:26:53.820 Within the `process_messages` method, chunk the incoming body into an array of messages to send to Kafka.
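The chunking step can be sketched in plain Ruby. Heroku log drains frame each syslog line with an octet count in the style of RFC 6587: a byte length, a space, then that many bytes of message, with frames packed back to back. Splitting the POST body into individual messages looks roughly like this (a simplified sketch with no error handling for malformed input):

```ruby
# Split an octet-counted syslog body ("<len> <message><len> <message>...")
# into an array of individual message strings.
def parse_frames(body)
  frames = []
  until body.empty?
    length, rest = body.split(/ /, 2)        # leading byte count
    frames << rest.byteslice(0, length.to_i)
    body = rest.byteslice(length.to_i..-1).to_s
  end
  frames
end

body = "11 hello world7 goodbye"
parse_frames(body)  # => ["hello world", "goodbye"]
# Each frame would then be produced to Kafka as one message.
```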
00:27:08.750 By using an asynchronous producer, there's no need to call deliver_messages yourself; the producer flushes messages to Kafka based on the configured thresholds.
00:27:20.000 Kafka is open source, so you can set up a cluster yourself, but running and operating it as a service can be challenging.
00:27:37.390 Running a cluster involves configuring Zookeeper for synchronization across nodes, and if you're from the Ruby ecosystem, you may not be familiar with JVM technology.
00:27:54.420 Heroku is launching a Kafka service soon to alleviate the setup challenges, allowing you to create it as an add-on and have everything managed for you.
00:28:12.320 Once your Kafka cluster is up, you can access it using various commands to inspect performance, topics, and messages.
00:28:41.000 When developing locally, Kafka auto-creates topics; however, we recommend disabling this feature in production environments.
00:29:01.000 To prevent typos or other issues, explicitly create topics you plan to use with Kafka.
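For example, with the command-line tools that ship with Kafka (paths, counts, and the ZooKeeper address are placeholders for your installation; in the 0.9-era tooling, topics are managed through ZooKeeper):

```shell
# Create the topic explicitly instead of relying on auto-creation.
kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic router \
  --partitions 3 \
  --replication-factor 3

# List existing topics to confirm it was created.
kafka-topics.sh --list --zookeeper localhost:2181
```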
00:29:20.950 Kafka CLI allows you to monitor the Kafka cluster's activity, topics, and message flow, which is crucial for performance monitoring.
00:29:30.800 Now, on the consumer side, we can create a metrics consumer to monitor the previously created router topic.
00:29:50.750 By utilizing a unique group ID for the consumer, it will start pulling messages from Kafka, processing each one as they come in.
00:30:06.750 This consumer can initiate connections to Redis where this metric data will be stored.
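The aggregation step itself can be sketched without Redis: tally status codes and collect service times per batch of router messages, and the resulting totals are what you would then write into Redis keys. A toy in-memory version for illustration:

```ruby
# Toy metrics aggregator: counts status codes and records service times.
# In the real app these totals would be written to Redis instead.
class MetricsAggregator
  attr_reader :status_counts, :service_times

  def initialize
    @status_counts = Hash.new(0)
    @service_times = []
  end

  # Each router log message carries fields like status and service time.
  def record(status:, service_ms:)
    @status_counts[status] += 1
    @service_times << service_ms
  end

  def average_service_ms
    return 0.0 if @service_times.empty?
    @service_times.sum.to_f / @service_times.size
  end
end

agg = MetricsAggregator.new
agg.record(status: 200, service_ms: 12)
agg.record(status: 200, service_ms: 28)
agg.record(status: 500, service_ms: 40)
agg.status_counts  # => {200 => 2, 500 => 1}
```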
00:30:23.690 Once in Redis, you can analyze, display, and utilize the data as needed for various applications.
00:30:39.420 Kafka’s structure allows for easy aggregation and playback of data. Separate consumer groups can process the same streams of data in different ways.
00:30:54.840 For example, one consumer can track metrics, while another replays web traffic against a staging app based on the same logs.
00:31:09.320 I have made all this code available on GitHub, where you can explore and experiment with these functionalities.
00:31:23.720 In the GitHub repository, you can learn how to set up everything using Docker, allowing you to manage Kafka and Redis without local installations.
00:31:41.490 Now, let's look at some other potential use cases for Kafka.
00:31:59.920 Using Kafka as part of a messaging system is a great application, especially in high-volume financial transactions where data loss is unacceptable.
00:32:17.180 Kafka’s durability ensures that throughput remains consistent and allows low latency, perfect for requirements where data management is critical.
00:32:33.940 Kafka was originally created by LinkedIn to track user activity. You could create activity topics to monitor various organizational aspects.
00:32:49.080 In the eCommerce space, companies like Shopify probably leverage Kafka for activity tracking. At Heroku, we also use Kafka to power our metrics and resource usage displays.
00:33:05.660 Gathering information from millions of apps and processing it for efficient performance monitoring is crucial, and Kafka handles this well.
00:33:24.850 Finally, the Heroku API Event Bus is an excellent example of Kafka in application. When actions are taken via API calls—like scaling up or down—it's important to manage and order those messages.
00:33:38.140 Different teams rely on this event bus for insights, ensuring they maintain operations without needing to synchronize with one another.
00:33:53.390 I hope this talk has provided insights into the capabilities of Kafka and potential applications within the Rails ecosystem.
00:34:10.030 As businesses shift to rely more on durability, scalability, and speed, utilizing technologies like Kafka becomes increasingly relevant.
00:34:22.360 The issues you might have faced scaling with Redis could be remedied with Kafka's design.
00:34:36.340 That will conclude my talk, and since the organizers have informed me that I'm out of time, thank you for your attention!
00:34:49.210 I will be available until Sunday, so if you have any questions, please feel free to come and talk to me.
00:35:05.020 Thank you, Terence!