Karafka - Place Where Ruby, Rails and Kafka Meet Together

wroc_love.rb 2017

00:00:11.690 Hi everyone! Okay, everything is working great.
00:00:17.449 About the panels, it's kind of funny that we have a Ruby conference and instead of having Ruby questions, we'll be talking about an excellent JavaScript framework. But that's okay.
00:00:30.540 Yes, let's start. My name is Maciej Mensfeld.
00:00:36.719 As mentioned, I'm a software engineer. For the past ten years, I've been working commercially mostly with Ruby, specifically Ruby on Rails, but also with plain Ruby.
00:00:51.530 I'm the creator of the Karafka framework, and I've spent the past year building a platform called Code, which provides comprehensive insights into employee programmers' evaluations.
00:01:05.939 If you have any questions, you can drop me an email. I also run a blog primarily about Ruby, and here’s my Twitter account.
00:01:14.759 Please notify me if I speak too fast. If you have any questions, just tell me. If you want me to explain something better, don't hesitate to let me know. I think it's always better to have a bit of interaction between us, so feel free to ask questions. Otherwise, you may forget them by the end of the talk.
00:01:36.840 Here is the agenda: We'll talk a bit about Kafka itself. Then I will give a short introduction to what Karafka is and how it can help you with Rails-based applications, along with some interesting use cases. I hope you find them fascinating. So, let's begin.
00:02:01.829 First, how many of you have heard about Kafka? Awesome! And how many of you use Kafka on a daily basis? It's a great technology.
00:02:24.530 Just to remind you, Kafka is a distributed streaming platform and a messaging system. Kafka is tricky because you can refer to it by many names based on how you use it. It's designed to handle a lot of data and to be extendable without any downtime. It was designed with broadcasting in mind, which is one of the greatest features of Kafka.
00:02:54.959 It allows you to build event-driven systems and can act as a backbone for larger organizations, making it usable with multiple applications and projects simultaneously. It scales exceptionally well. Many major players, like LinkedIn, utilize Kafka to manage vast amounts of data. For instance, LinkedIn transfers around 600 gigabytes of data per day using Kafka; that’s a lot!
00:03:39.720 Shopify is another example; they use Kafka effectively. The initial design is really simple: you have a Kafka cluster, and you can send any type of message, be it binary, JSON, or string messages into it. You can take messages out, and you can even stream data directly from the database using connectors. This means you can instruct your PostgreSQL database, for example, to send any data changes to Kafka.
00:04:22.620 The whole Kafka concept revolves around topics, which you can think of as namespaces where messages go. Topics are divided into partitions, depending on your case. You don’t need to use partitions; a single partition will work quite well. However, if you need to scale due to large amounts of data, you can create partitions.
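To make the topic and partition idea concrete, here is a minimal in-memory sketch. It is not a Kafka client: the `Topic` class, its `publish` method, and the CRC32-based partition choice are assumptions made for illustration only.

```ruby
require "zlib"

# Hypothetical in-memory model of a topic; not a Kafka client API.
class Topic
  attr_reader :name, :partitions

  def initialize(name, partition_count: 1)
    @name = name
    @partitions = Array.new(partition_count) { [] }
  end

  # Messages with the same key always land in the same partition.
  # Returns the index of the partition that received the message.
  def publish(payload, key:)
    index = Zlib.crc32(key) % @partitions.size
    @partitions[index] << payload
    index
  end
end

# A single partition is enough for small workloads.
single = Topic.new("events")
single.publish("signup", key: "user-1")

# More partitions spread the load; one key still maps to one partition.
scaled = Topic.new("events", partition_count: 3)
first  = scaled.publish("login",  key: "user-1")
second = scaled.publish("logout", key: "user-1")
puts first == second
```

Keeping one key in one partition is what preserves the order of messages for that key when the topic is split up.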
00:05:05.550 You can create consumer groups to ensure that only one process will receive a given message from Kafka. This is really useful when you have multiple applications, as each application can act as a separate consumer group.
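The consumer group behaviour can be sketched with a small in-memory model. The `Broker` class and its methods are hypothetical, and the round-robin choice inside a group is a simplification: real Kafka assigns whole partitions to group members.

```ruby
# Hypothetical in-memory broker; names are illustrative only.
class Broker
  def initialize
    @log = []  # append-only list of messages
    @groups = Hash.new { |hash, group_id| hash[group_id] = [] }
  end

  def join(group_id, consumer_name)
    @groups[group_id] << consumer_name
  end

  def publish(message)
    @log << message
  end

  # Every group sees every message; inside a group exactly one member gets it.
  def deliveries
    @groups.flat_map do |group_id, members|
      @log.each_with_index.map do |message, offset|
        [group_id, members[offset % members.size], message]
      end
    end
  end
end

broker = Broker.new
broker.join("billing", "billing-1")
broker.join("billing", "billing-2")
broker.join("analytics", "analytics-1")
broker.publish("order_created")
broker.publish("order_paid")

broker.deliveries.each { |delivery| puts delivery.inspect }
```

Here the two billing processes share the work, while the analytics application independently receives both messages.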
00:05:41.910 Now, moving on to Ruby: while it may not be the fastest language, it works really well with Kafka. There are a few implementations of Kafka clients for both writing to and reading from it, and which one you use will depend heavily on your use case.
00:06:07.669 The first and oldest is Kafka-rb, which is merely a wrapper around the native Java client for Kafka. Its biggest advantage is its support from the Kafka team, ensuring it won’t be outdated and will be maintained. However, it does not work with MRI, meaning that if you have an MRI application, you will need to rewrite your application in JRuby, stepping into the Java world.
00:06:44.010 There used to be a library called Apple Cider, which was great, but it's no longer maintained. People still use it, although it’s not advisable to use unmaintained software. I don't recommend it, as it has many downsides and only a few advantages. The newest maintained option is Ruby-Kafka, which covers almost the entire API of Kafka, specifically version 0.10. It’s also the default driver for the Karafka framework.
00:07:38.570 There is a tool called Phobos, built on top of Ruby-Kafka, which is a micro-framework designed to simplify Kafka-based application development. It allows developers to have a Rails-like feel while working with Kafka. Kafka works differently than standard HTTP requests, but Karafka lets you keep that familiar development experience.
00:08:49.529 Developers should focus on business logic instead of getting caught up in low-level details of working with Kafka. Karafka, as I mentioned, is designed to enable faster Kafka-based application development and to manage event processing more efficiently than with traditional HTTP APIs.
00:09:08.610 As I was saying, there are too many things developers need to understand to work directly with Kafka using Ruby-Kafka or any other driver. When adding a new team member, you expect them to do their job without spending weeks figuring out how your software works.
00:10:38.310 Many of you have heard about Kafka, yet few regularly use it. This raises the question of whether it’s worth the effort; perhaps standard HTTP APIs are sufficient for most cases. I believe the biggest disadvantage of HTTP at the architectural level is its lack of broadcasting.
00:11:06.640 Producers and consumers of messages must be aware of each other, which complicates things. If I make an HTTP request, I need to know where I'm sending the message. In contrast, using a single broker means you can send messages without needing to know who will receive them.
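The difference can be sketched with a tiny publish/subscribe bus. This is an illustrative assumption, not Kafka itself: the `EventBus` class is hypothetical and delivers synchronously in-process, but it shows that the publisher only names a topic and never references its receivers.

```ruby
# Hypothetical in-process bus standing in for a message broker.
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  # Receivers register themselves; the publisher never sees this list.
  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  # Broadcasts the message to every handler registered for the topic.
  def publish(topic, message)
    @subscribers[topic].each { |handler| handler.call(message) }
    nil
  end
end

bus = EventBus.new
received = []
bus.subscribe("user_created") { |message| received << [:mailer, message] }
bus.subscribe("user_created") { |message| received << [:analytics, message] }

# The producer knows only the topic name.
bus.publish("user_created", "user-1")
puts received.inspect
```

Adding a third receiver later requires no change on the publishing side, which is the property HTTP point-to-point calls lack.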
00:11:50.200 We constantly operate asynchronously in life; for example, we send emails and don't wait idly for replies. Thus, I question why our applications don’t function similarly. Why do we constrain ourselves to synchronous operations, waiting for answers before proceeding?
00:12:38.029 Karafka was designed to address these issues. Its architecture builds on Ruby-Kafka, Celluloid, and Sidekiq, and it keeps a Rails-like feel for larger applications. A smaller application can follow a simpler pattern, with a single file holding the code and configuration.
00:13:32.770 Karafka emulates some aspects of Rails with its routing engine, application controller, application worker, CLI, and responders, which operate similarly to Rack responders. When a message is received, it can be processed inline or through Sidekiq depending on your setup.
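The routing idea described above can be sketched roughly as follows. This is not Karafka's actual routing DSL: the `Router` class, its `topic` and `dispatch` methods, and the `inline` flag are hypothetical names, and a plain Array stands in for a Sidekiq queue.

```ruby
# Hypothetical router; this is not Karafka's actual routing DSL.
class Router
  Route = Struct.new(:controller, :inline)

  def initialize
    @routes = {}
  end

  # Maps a topic name to the controller class that handles it.
  def topic(name, controller:, inline: true)
    @routes[name] = Route.new(controller, inline)
  end

  # Runs the controller immediately, or pushes the job onto a queue
  # (a plain Array standing in for a background worker such as Sidekiq).
  def dispatch(name, payload, queue: [])
    route = @routes.fetch(name)
    return route.controller.new.perform(payload) if route.inline

    queue << [route.controller, payload]
    :enqueued
  end
end

class LogsController
  def perform(payload)
    "logged: #{payload}"
  end
end

router = Router.new
router.topic("logs", controller: LogsController)
router.topic("reports", controller: LogsController, inline: false)

jobs = []
puts router.dispatch("logs", "disk full")
puts router.dispatch("reports", "weekly", queue: jobs)
```

The inline route does the work in the consuming process, while the second route only records the job so that a separate worker can pick it up.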
00:14:30.010 Karafka is distributed as a gem, which you can install by adding it to a Gemfile. It also offers extended options beyond simple routing to accommodate different data types, allowing for more complex message handling.
00:15:34.640 Kafka's design allows for a high level of versatility with message types and configurations. You can dynamically create groups, ensuring that each message is sent to just one application out of many. This makes systems easier to manage under the hood.
00:16:53.140 Within the controller, the primary action is called 'perform,' where the code processes incoming messages from Kafka. The parameters are typically JSON formatted, so if you transition from HTTP APIs to Kafka, maintaining functionality at this level is simple.
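As a rough illustration of that controller shape, here is a sketch in plain Ruby. The base class, the constructor signature, and the example payload are assumptions for illustration; only the `perform` action name and the JSON parameters come from the talk, and Karafka's real controller API may differ.

```ruby
require "json"

# Hypothetical base class; Karafka's real controller API may differ.
class ApplicationController
  attr_reader :params

  # Parses the raw message body so actions can work with a Hash,
  # much like params in an HTTP controller.
  def initialize(raw_message)
    @params = JSON.parse(raw_message)
  end
end

class UserCreatedController < ApplicationController
  def perform
    "Welcome, #{params.fetch('name')}!"
  end
end

message = '{"name": "Ada", "plan": "free"}'
puts UserCreatedController.new(message).perform
```

Because the action only sees a params Hash, logic written for an HTTP endpoint can often be moved over with few changes.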
00:18:02.650 You can reuse your existing code and have it work without significant changes. Karafka also mimics Rails controller features such as before filters, which let you check a message before it is processed.
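A before filter of this kind could look roughly like the sketch below. The `before_perform` macro, the `call` method, and the `:skipped` return value are hypothetical names chosen for illustration, not Karafka's API.

```ruby
# Hypothetical filter mechanism; not Karafka's actual implementation.
class FilteredController
  class << self
    # Filters are stored per class (they are not inherited in this sketch).
    def filters
      @filters ||= []
    end

    def before_perform(method_name)
      filters << method_name
    end
  end

  attr_reader :params

  def initialize(params)
    @params = params
  end

  # Runs filters in order; a filter returning false halts processing.
  def call
    self.class.filters.each do |name|
      return :skipped if send(name) == false
    end
    perform
  end
end

class PaymentsController < FilteredController
  before_perform :positive_amount?

  def perform
    "charged #{params['amount']}"
  end

  private

  def positive_amount?
    params["amount"].to_i > 0
  end
end

puts PaymentsController.new({ "amount" => 10 }).call
puts PaymentsController.new({ "amount" => 0 }).call
```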
00:19:06.500 With Kafka, you can handle multiple outputs for messages, making it particularly beneficial for complex systems built on Karafka, allowing for easy event reaction without needing to know the event's source.
00:20:40.490 Performance depends heavily on how you use it. Karafka is designed for asynchronous processing without added complexity, enabling you to manage operations efficiently.
00:21:26.880 With Kafka, you can scale through threads, partitions, and multiple processes, enhancing processing speed as needed. Sidekiq aids in managing re-entry and retries for failed jobs, integrating smoothly with Rails.
00:22:48.180 As companies grow from using a simple model to more complex architectures, they often face challenges in maintaining scalability and feature delivery. Kafka provides a pathway for a gradual transition to microservices, allowing you to introduce new features incrementally.
00:23:59.580 You can start sending crucial data to Kafka without changing everything at once. If new requirements arise, setting up a new Kafka application can isolate new functionalities while maintaining the existing architecture.
00:25:45.940 Karafka includes a simple wrapper called WaterDrop, which improves message handling by working around issues present in Ruby-Kafka. It simplifies sending messages, letting you choose the format you want, be it JSON, binary, or XML.
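The idea of choosing a format at send time can be sketched like this. It does not use WaterDrop's real API: the `MessageSender` class, its `send_message` method, and the `outbox` array (standing in for the Kafka cluster) are assumptions for illustration, and XML is left out to keep the sketch dependency-free.

```ruby
require "json"

# Hypothetical sender; WaterDrop's real API may look different.
class MessageSender
  SERIALIZERS = {
    json:   ->(data) { JSON.generate(data) },
    binary: ->(data) { Marshal.dump(data) },
    string: ->(data) { data.to_s }
  }.freeze

  attr_reader :outbox

  def initialize
    @outbox = []  # stands in for the Kafka cluster
  end

  # Serializes the data in the requested format and records it for the topic.
  def send_message(topic, data, format: :json)
    payload = SERIALIZERS.fetch(format).call(data)
    @outbox << [topic, payload]
    payload
  end
end

sender = MessageSender.new
puts sender.send_message("users", { "id" => 1 })
sender.send_message("metrics", [1, 2, 3], format: :binary)
puts sender.outbox.size
```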
00:27:25.850 More practical examples include Shopify using Kafka for system performance monitoring and my startup, Code, which processes large volumes of commits and evaluations daily, efficiently utilizing Kafka as a communication backbone.
00:29:04.000 Kafka allows for a dynamic environment where applications can be taken offline for maintenance without losing data. This flexibility is crucial in maintaining operational efficiency.
00:30:11.580 Kafka can also assist in migrating monolithic applications by allowing for the gradual extraction of functionalities into independent systems, thus simplifying complex architectures.
00:31:23.600 If you're interested in contributing, we're on GitHub and welcome stars to encourage community use and awareness of the framework.
00:32:05.370 Thank you! Are there any questions?
00:32:13.720 The first question was why I chose Kafka as the broker over alternatives like RabbitMQ. When evaluating the options, I chose Kafka primarily for its broadcasting capabilities.
00:34:54.300 While considering such tools, Kafka consistently proved user-friendly in terms of monitoring and scalability. Sidekiq was chosen for task management due to its superb handling of retries and UI.
00:36:01.000 Operational support for Kafka was discussed; in case of crashes, data remains safe and can be processed post-recovery. It enables efficient recovery strategies, leveraging offsets for seamless operations.
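Offset-based recovery can be illustrated with a small in-memory log. The `OffsetLog` class and its `poll` method are hypothetical; real Kafka tracks committed offsets per consumer group and partition, which this sketch collapses into one counter per group.

```ruby
# Hypothetical in-memory log with per-group committed offsets.
class OffsetLog
  def initialize
    @messages = []
    @committed = Hash.new(0)  # group id => next offset to read
  end

  def append(message)
    @messages << message
  end

  # Returns everything the group has not processed yet
  # and commits the new offset.
  def poll(group_id)
    pending = @messages.drop(@committed[group_id])
    @committed[group_id] = @messages.size
    pending
  end
end

log = OffsetLog.new
log.append("event-1")
puts log.poll("reports").inspect

# The consumer goes offline; messages keep accumulating in the log.
log.append("event-2")
log.append("event-3")

# After recovery it resumes from its last committed offset.
puts log.poll("reports").inspect
```

Nothing is lost while the consumer is down because the log keeps the messages and the group only remembers how far it has read.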
00:38:34.660 Thank you for your engagement and great questions!