Distributed Systems
Karafka - Ruby Framework for Event Driven Architecture

Summarized using AI

Karafka - Ruby Framework for Event Driven Architecture

Maciej Mensfeld • May 31, 2018 • Sendai, Miyagi, Japan

In the presentation titled "Karafka - Ruby Framework for Event Driven Architecture" at RubyKaigi 2018, speaker Maciej Mensfeld discusses the implementation and benefits of the Karafka framework, which is designed for event-driven architectures using Kafka. The session emphasizes how Karafka simplifies application development, allowing developers to focus on core business logic rather than the complexities of Kafka itself.

Key points discussed include:

- Introduction to Event-Driven Architecture (EDA): EDA focuses on sending and consuming events to reflect the state and history of the system. Immutability is vital in maintaining data integrity.

- Sagas: A technique for managing complex business transactions using long-running processes. Sagas handle events as they happen rather than as single snapshots.

- Benefits of EDA:

- Reflects reality by preserving the whole history of events rather than a current state snapshot.

- Enables the analysis of data for insights like customer behavior patterns.

- Avoids direct communication between applications, promoting scalability and fault tolerance.

- Drawbacks of EDA:

- Asynchronicity can complicate debugging and lead to data inconsistencies.

- Events cannot be easily removed or modified once created; they require additional event generation for corrections.

- Overview of Kafka: Kafka serves as a distributed streaming platform that facilitates high-throughput messaging, allowing applications to send and receive messages easily. Key concepts include producers and consumers, message immutability, and efficient data processing via partitions.

- Karafka Library: Karafka is aimed at simplifying Kafka integration in Ruby applications. It manages connections, threading, and routing, allowing developers to implement event-driven microservices without diving into Kafka’s complexities. The framework can handle high message volumes and integrates well with existing Ruby applications, emphasizing flexibility and performance.

- Castle Example: Mensfeld shares how his company, Castle, uses Karafka to monitor user behavior and detect anomalies in large data sets, processing about 100 million events daily while maintaining user safety.

In conclusion, Karafka enables developers to leverage the power of event-driven architecture seamlessly while maintaining a focus on delivering business value. The talk encourages exploration of EDA and the practical applications of Karafka within the Ruby ecosystem.

Karafka - Ruby Framework for Event Driven Architecture
Maciej Mensfeld • May 31, 2018 • Sendai, Miyagi, Japan

Karafka allows you to capture everything that happens in your systems in large scale, providing you with a seamless and stable core for consuming and processing this data, without having to focus on things that are not your business domain. Have you ever tried to pipe data from one application to another, transform it and send it back? Have you ever wanted to decouple for existing code base and make it much more resilient and flexible? Come and learn about Karafka, where it fits in your existing projects and how to use it as a messages backbone for a modern, distributed and scalable ecosystem.

RubyKaigi 2018 https://rubykaigi.org/2018/presentations/maciejmensfeld

RubyKaigi 2018

00:00:01.250 All right, this is working. That's good, and this is working. That's even better.
00:00:07.309 Ohayo gozaimasu, thank you for coming.
00:00:12.450 I don't know why I have a borrow top of it... Oh yeah, now it's bigger, now it's not.
00:00:17.910 Good. Yeah, I'm Maciej Mensfeld.
00:00:24.380 I'm a Ruby programmer, Software Architect, whatever you call it. It doesn't really matter, does it?
00:00:29.519 I do open source, and I do it a lot.
00:00:34.590 You can find my code in many projects, not only the one that I will talk about. You can find my code also in Dry-RB,
00:00:41.879 libraries in Trailblazer, and a couple of other places.
00:00:49.170 I work at Castle, and that will be my proof-of-concept example of how using Kafka can work efficiently.
00:00:55.820 It took me 24 hours to get here, so I have a small jet lag.
00:01:01.620 I live in Krakow. I had to go from Krakow to Warsaw, then from Warsaw to Tokyo, and from Tokyo here.
00:01:11.310 But it's fun, and I hope you had a nice experience traveling here as I did.
00:01:17.340 I'm also an organizer of the Krakow Ruby User Group, which is one of the biggest Ruby user local groups in Europe.
00:01:23.250 We have over 1200 members, and we meet on a monthly basis, always with two technical talks and a lot of beer.
00:01:30.329 Feel free, you're invited always.
00:01:35.520 We meet with around 100 to 150 people. We'll be talking about beer, you know, the usual stuff.
00:01:41.340 This is my first time at RubyKaigi, and my first time in Japan.
00:01:49.380 I was joking that I’m a Kafka practitioner and I love Ruby, but I've missed it.
00:01:58.079 Funny fact: I actually own a tree in Japan. I took part in a contest and I won a tree somewhere here.
00:02:04.680 So, in this place in a forest, there is a tree with my name on it for the next 12 years.
00:02:12.060 It's super cool! My wife and I run a blog mostly about Ruby stuff and Kafka-related things.
00:02:17.670 If you have any questions here or just want to email me, I'll be going to every single after-party and event that happens during RubyKaigi.
00:02:31.650 So if you want to talk with me about virtually anything, just buy me a beer, and it'll be fine, I promise.
00:02:41.430 But mostly I do this stuff, and yeah, I tend to get nervous. I have jet lag.
00:02:46.709 So if I speak too fast, just throw something at me.
00:02:52.739 If you have any questions, please just ask. I hope you get something interesting out of my talk. I think it's interesting, so that's good.
00:03:07.590 I'll go through some basics. I need to keep it on time. I will detail some small things about Kafka: how it works and how it has changed over the past 5 years.
00:03:29.519 What is Karafka and what is the mindset behind it? It's a bit different than doing typical Ruby on Rails stuff, but I will get to that.
00:03:34.620 Here are some usual, boring definitions just to remind us about event-driven architecture.
00:03:43.680 Event-driven architecture is super simple: you just send events and consume them down.
00:03:50.250 Immutability is crucial. If you create something, just preserve it, even in human design.
00:03:57.630 How can I explain this easily? I want to mention Anton, who will have a talk today. Please go to his talk; he will explain this better than I do.
00:04:09.180 He has some great animations. To just summarize, you create events, and you don't remove them.
00:04:18.390 One more complex thing that I really love about event-driven architecture is sagas, or process managers.
00:04:27.440 It's a technique for tackling complex business transactions in a more manageable way. You build up entities that are responsible for handling long-running transactions.
00:04:35.450 And by transactions, I don't mean database transactions; I mean business logic transactions.
00:04:43.680 A simple example of event-driven shopping carts... usually... when people build it...
00:04:44.960 If I ask you to build me a shopping website in Ruby on Rails, you would just build a shopping cart.
00:04:51.380 Let's say it includes items like books. Each time you add a new book, you would just increment the counter by +1.
00:04:59.540 But that's not how it works with event-driven design shopping carts.
00:05:05.600 You just add events that something happened or that something changed; you materialize it if you need the current state.
00:05:11.930 So you just push it: you have the current representation of reality, but behind the scenes, you have the overall history.
00:05:19.130 If you want to learn more about that, go to Anton's talk. I will be talking more in the Ruby context and in the Kafka context.
00:05:26.360 There are certain benefits to event-driven architecture. The first one is that it reflects reality. Even if I were to ask someone here, 'How are you?' you wouldn't tell me, 'I'm fine now,' but you would tell me about your day.
00:05:43.600 You would say things like, 'I woke up, had a headache, didn't eat anything, and I got stressed because I have a presentation.' You give the whole picture. The same occurs in event-driven architecture: you get the whole picture.
00:05:55.080 The problem is we don't build our software that way. I don't know why, but people tend to have this snapshot of reality as it is now, rather than how it was and how it is now.
00:06:05.600 We don't care about history that much, obviously. Here, we have some logs for production; we have statistics on how the system performs.
00:06:10.610 But why don't we apply the same logic to our business logic? Truth be told, data is the new oil.
00:06:14.350 You may not be aware of the future things you could do with the data you have now. If you throw it away, you won't be able to do anything with it.
00:06:22.309 You may end up wishing you still had it. Even if you don't use it, store it. Storage isn't that expensive.
00:06:30.890 Let's return to our shopping cart example: if you maintained only the current state, you wouldn't be able to ask yourself some useful questions.
00:06:35.780 When you start doing machine learning, you may be able to detect correlations between products that people were willing to buy together.
00:06:44.960 If you have the knowledge, you can ask yourself why it takes so long for customers to buy something on your website.
00:06:50.024 There must be a problem because they started the whole process five hours ago and are finishing it now, so something is suspicious.
00:06:56.320 But if you only see successful transactions, you can't answer that question. You might have five times more customers buying if you tracked that.
00:07:04.820 Another benefit of event-driven architecture is that thanks to this approach, you don't have to communicate between applications directly.
00:07:10.100 That’s super useful because if you communicate directly, it creates many problems.
00:07:16.130 You have to maintain things on both ends, and you can’t just shut one down for a moment. If you need to shut it down for an hour or a day, you might forget to do that.
00:07:24.000 Thanks to immutability, it’s great for performance, availability, and scaling.
00:07:31.540 You can decompose your problems, challenges, features, and functionalities, and maintain them separately.
00:07:40.210 I know microservices aren't always the best; long-lived huge monoliths, but within a monolithic ecosystem, you can extract really small functionalities.
00:07:50.100 These will provide you certain benefits without the overhead of having them in the main code base.
00:07:54.850 The quality is much better, believe me on that. If not, get a beer after the talk.
00:08:04.400 But there are downsides, and they are pretty serious. One of them is the asynchronicity, which makes debugging a bit harder.
00:08:15.720 They aren't all bad, but it can still happen somewhere in the background.
00:08:24.460 If you have events that trigger others, you might end up creating huge loops; this causes a lot of noise in troubleshooting.
00:08:34.020 Eventual consistency is also a problem because there are systems that don't work well with it.
00:08:42.270 Most applications I build are fine with eventual consistency, but if you cannot manage it, you might do better without it or stick to transaction-based systems.
00:08:53.590 Kafka provides transactions, but it's still eventual consistency, and your data may remain inconsistent for a while.
00:09:02.650 For example, when you shut down an app for maintenance and later restart it, the materialized view will not reflect what you have in other systems until you replay all the data.
00:09:12.900 The problem with immutability and not removing data is that if you want to correct something, you're required to generate more events.
00:09:20.740 You can't just go into a database and patch something up; you need to trigger events to update the current state.
00:09:29.960 If you start altering the database directly, you'll create inconsistencies in your data.
00:09:37.300 This doesn’t apply if you’re behind an aggregation route. Then it’s fine.
00:09:44.400 If you have something simple to do and you have a month to build a prototype, please don't even try setting up Kafka for production.
00:09:52.240 Just use Rails with Active Record; build stuff to validate your business. It works.
00:09:58.910 Think about it; if it doesn’t work, you save yourself a lot of time.
00:10:02.750 How many of you have heard about Kafka? Please raise your hands.
00:10:06.990 Yeah, about half the people? Okay, I will just give you an overview.
00:10:12.370 Kafka is defined as a distributed streaming platform and a high-throughput distributed messaging system.
00:10:20.000 It provides broadcasting, and it’s much easier to set up than it was a couple of years ago.
00:10:28.190 There's documentation available, so I won't read it since you can access it yourself.
00:10:35.740 Think of Kafka as a messaging app for applications, similar to how Slack is for teams.
00:10:42.300 Each application sends messages to channels and reads from them.
00:10:50.380 Kafka is widely used, with companies like LinkedIn and Shopify utilizing it.
00:10:57.810 Kafka works with producers and consumers. A single application can act as both.
00:11:05.300 This means you can produce messages and also consume them.
00:11:12.850 Sometimes you just have applications that consume messages without sending them back.
00:11:17.500 Those are the best because you don't have to worry about them. Just process the data.
00:11:24.330 There are connectors that act as both producers and consumers, allowing you to send every single change from your database to Kafka.
00:11:31.070 Kafka allows you to utilize this data directly.
00:11:40.000 There are stream processors that transform streams of data. Kafka is about topics.
00:11:47.930 Topics can have multiple partitions, and within each partition, the order is always maintained.
00:11:56.280 You append data to the end of the topic partition.
00:12:02.080 Consumer groups share a common name, and this concept is used for load balancing.
00:12:09.320 Kafka guarantees that a single message will be delivered to at most one consumer within a consumer group.
00:12:19.220 Messages in Kafka are immutable; you set a written offset, which means they might be lost.
00:12:25.080 This is important in the GDPR context because in Europe, you are required by law to remove all private data if requested.
00:12:35.950 With Kafka, you can't just remove a single message from within the partition.
00:12:42.700 If something happened, you can’t change it unless you reside in certain countries.
00:12:50.900 Messages are versatile; you can use strings, JSON, Apache Avro format, or anything else that suits your existing system.
00:12:58.650 It's great for scaling: the more partitions you have, the more data you can process simultaneously.
00:13:06.470 Each partition must fit on a single server; I’ve never encountered this issue.
00:13:11.900 Order is crucial, and strong ordering with partition keys is an essential feature.
00:13:17.610 It gives you extreme flexibility and freedom in doing in-memory computation.
00:13:27.950 You don't have to store data in a database; it can simply exist in memory, in the Ruby process.
00:13:34.070 And if you kill the process, just spin up a new one to confirm the data.
00:13:42.170 This is useful because memory is faster than hard drives.
00:13:49.550 Using keys allows you to tell Kafka to place all messages with the same key on the same partition.
00:13:56.140 For instance, you can send all events related to a single user interaction using their ID as a key.
00:14:01.850 Kafka will automatically pick a partition, ensuring that all user data always goes to the same partition.
00:14:08.470 As a result, a single consumer can continuously process that data, with no risk of switching the partitions.
00:14:15.240 This functionality is useful for load balancing.
00:14:22.200 If you don't have enough computing power, just spin up another process. It only takes around 20 megabytes.
00:14:30.270 Kafka's fault tolerance means that if one process goes down, it will rebalance the workload evenly.
00:14:37.420 That’s extremely useful and helpful for testing new ideas.
00:14:44.890 You can spin up a new consumer with a new consumer group to see how it behaves and whether it meets expectations.
00:14:52.230 If the business logic is invalid, you can shut it down without worry.
00:15:00.530 If it meets expectations, you can replace processes one after another.
00:15:08.520 This mechanism also allows you to rewind and replay services.
00:15:19.230 It's saved me multiple times because my business logic sometimes doesn’t work as expected.
00:15:27.740 However, this approach allows me to go back 12 hours in history and reprocess all data.”
00:15:35.480 Let’s talk about Ruby and the libraries available for working with Kafka.
00:15:43.700 There are currently three libraries.
00:15:46.550 There’s a low-level library for JRuby: it’s Kafka for JRuby. It’s posted on GitHub.
00:15:54.040 You can only use it with JRuby. You need to read the Java docs, though.
00:16:00.630 I mention it, but please don’t use it if you’re building something else.
00:16:06.220 Kafka is maintained by Zendesk, which is great.
00:16:13.090 From time to time, I help fix bugs.
00:16:19.680 The majority of low-level functionalities come from Ruby Kafka, which is solid.
00:16:26.710 It provides everything you need to start working with Kafka.
00:16:35.720 On the high-level side, we have Karafka, along with a couple of other smaller frameworks.
00:16:42.940 Karafka was built to simplify application development, so you don't have to worry about the details.
00:16:50.150 Just focus on your business and you don't have to dive into Kafka's inner workings.
00:16:56.480 I wanted to introduce everyone to Kafka, so I built Karafka to help teams and developers work faster.
00:17:02.770 The alternative would be introducing all the intricacies of Kafka to people.
00:17:08.590 That way, they can easily hook their applications into it.
00:17:15.060 Developers in a company shouldn't concern themselves with threading signals, load balancing, connection management, and so on.
00:17:22.040 Instead, their primary focus should be on delivering business value.
00:17:29.370 In a company, we didn't solely rely on REST and HTTP; we utilized Kafka.
00:17:39.140 We didn't have broadcasting, and broadcasting is a key feature of Kafka.
00:17:47.470 If you had asked me what the most important feature of Kafka is, I'd say it's broadcasting.
00:17:55.460 There’s no batch processing with HTTP; you send a single request.
00:18:04.910 With Kafka, you're almost forced to think in streams and batches, which is extremely performant.
00:18:13.830 You can replace microservices; if Ruby isn't sufficient for a task, you can implement a different technology.
00:18:22.170 Spin it up as part of a consumer group and test it out if messages are processed properly.
00:18:30.960 You can replace one process after another within a single consumer group.
00:18:39.210 This ability to scale allows Ruby to work together alongside other technologies.
00:18:46.590 You can transition between them in real-time production.
00:18:52.530 The world is indeed about messaging. I’m talking to you, and I’m not waiting for your feedback after each sentence.
00:19:03.900 With HTTP, you send a request and await a response. It’s embedded in how we think about programming.
00:19:12.390 Most applications do things synchronously - we want answers immediately.
00:19:20.940 Yet the reality is that the world operates as a stream.
00:19:27.200 For some reason, we don't model our software that way, and that’s somewhat understandable.
00:19:35.680 For example, when someone opens a website, they wait. However, why apply that same logic to back-end processing?
00:19:44.710 When we execute actions, we shouldn't have to wait!
00:19:53.150 Instead, we could delegate processing and receive responses later.
00:20:01.050 With these thoughts, we built a system around this model; we utilized existing tools to make it happen.
00:20:08.710 As I mentioned, there are great resources in Ruby such as Karafka and other tools from the Ruby ecosystem.
00:20:16.040 Concurrent Ruby helps manage multi-threading, allowing Kafka to operate within multiple threads.
00:20:22.790 The main thread processes data while other threads handle different tasks.
00:20:30.400 The framework is straightforward and designed with various components.
00:20:36.790 It includes routers for consumer group description and event processing consumers.
00:20:43.120 There are also responders for sending messages and a CLI for management.
00:20:50.750 The lifecycle of a message or message bus is quite simple.
00:21:00.000 Messages are fetched from Kafka, routed, and sent through responders.
00:21:05.300 Installation requires adding Karafka to your app. You can clone an example from our repository.
00:21:11.960 Karafka requires a server instance, meaning it needs to run as a separate process.
00:21:19.060 I know one person from Japan who runs it inside a Puma process, but I can't understand why you'd want to do that.
00:21:29.950 There are great features that make Life easier, with validations at every step.
00:21:38.330 If you attempt to set a premature shutdown timeout, Karafka won't let you; its validations help prevent bad configurations.
00:21:46.040 The default configuration is stable and works in most cases.
00:21:53.020 If you want to use it, just provide the app name and the location of your Kafka brokers.
00:22:00.440 You should use multiple brokers to avoid single points of failure.
00:22:09.860 Karafka’s routing engine is straightforward; it fetches data from specified topics.
00:22:15.090 No magic routing needed: we prioritize explicit consumer definitions for clarity.
00:22:20.740 Just tell it which topic to fetch from and which consumer to handle it.
00:22:28.260 For batch consumption, just implement that. It's the default setting.
00:22:36.930 If you're processing JSON data or any specific type of information, define your consumer names and topics.”
00:22:44.980 You can utilize a parent batch for processes where you can work on single messages within that parent.
00:22:52.360 If you're using Active Record for instance, you can directly look up records and create new ones.
00:22:59.200 But if you're just gathering information together after validations, you can adjust that.
00:23:06.440 Notice that responses aren't like in Rails; instead, use responders to help send messages.
00:23:11.970 Responders can set requirements using DSL to ensure reactions are triggered on certain topics.
00:23:18.720 In doing so, if you neglect certain requirements, you'll receive error notifications.
00:23:25.500 This feature proves beneficial when designing data pipelines!
00:23:31.900 You can pull data from one topic, process it, and write it into a different topic.
00:23:39.580 Kafka monitors production processes, making it easier to track flow between multiple applications.
00:23:46.660 This CLI is helpful in figuring out how data flows between applications or topics.
00:23:54.220 If you only consume data without making changes, you significantly lower risks.
00:24:02.760 These types of applications are generally very safe to implement.
00:24:11.430 As for performance, approximately 55,000 messages per second can be processed from a single process.
00:24:20.530 In situations where it's not embedded in Rails, it works similarly.
00:24:28.250 Messages can be sent in under a millisecond, and sending data in batches improves performance.
00:24:38.000 In cases of statistical data, it might be around one-tenth of a millisecond.
00:24:45.750 Two important processes in Karafka are fetching and consuming data.
00:24:53.480 Fetching involves getting data from Kafka into your Ruby process memory.
00:25:00.140 Consuming means applying your business logic to the already fetched messages.
00:25:07.950 You can fetch messages in batches or one by one.
00:25:15.280 If you fetch them in batches, restarting the process will confirm all messages again.
00:25:23.540 This means you have to manage reentrant calls, but that’s necessary.
00:25:30.330 If you're paranoid about processing errors, you can consume message by message.
00:25:39.690 This might slow things down, though.
00:25:46.410 There's a cool feature of lazy evaluation for the params, meaning messages aren't parsed until needed.
00:25:52.700 This lets you reject older messages based on their metadata, such as timestamps.
00:25:59.000 Consumers can be persistent; you can flush data every 1,000 messages, for instance.
00:26:08.930 This approach greatly aids in in-memory computation, only processing results.
00:26:17.490 Karafka is also great for managing sagas and multi-batch transactions.
00:26:26.420 With Karafka, committing occurs per individual topic partition.
00:26:35.740 For sagas, you can wait for all events across various topics before executing the logic.
00:26:44.740 The drive monitor feature is beneficial for validation.
00:26:56.700 You can subscribe to simple listeners just as easily as you do for production traffic.
00:27:03.410 Deploy it wherever you want—it's capable of running on Heroku, Capistrano, Docker, or anywhere else.
00:27:10.460 It also provides topic mapping because some providers, like Heroku, require you to have topic prefixes.
00:27:18.080 This means your name leaks into the business logic, which isn't ideal if you decide to switch providers.
00:27:25.180 You can save your topic prefixes in a mapper, which is another helpful feature.
00:27:31.020 You can also integrate it with Rails and any other web framework; it’s simple.
00:27:39.320 It serves as a good proof of concept.
00:27:46.740 Currently, I can't move to talk about the example I wanted to.
00:27:57.450 However, I’ll skip that since time is tight.
00:28:08.150 What Castle does is protect your system from account mitigation.
00:28:15.590 This means that even if a user's data, such as login passwords, gets leaked, we can detect it.
00:28:26.300 There are a lot of breaches like this every year.
00:28:33.300 But what we can fake is user behavior; we can track how users interact with the software.
00:28:41.090 Building models based on their actions helps us identify anomalies.
00:28:47.600 Even if a user has valid credentials, if their behavior doesn't match expectations, that's a red flag.”
00:28:56.320 We manage a huge amount of data, analyzing user interactions to detect irregularities.
00:29:03.690 We process around 100 million events daily, with 25 million active users.
00:29:12.640 We target around 50 million active users in total, monitoring thousands of login attempts.
00:29:23.900 Thank you very much.
Explore all talks recorded at RubyKaigi 2018
+62