Event Streaming on Rails

by Brad Urani

In the presentation titled Event Streaming on Rails at RailsConf 2022, Brad Urani, Principal Engineer at Procore, explores the concept of event streaming as a solution for scaling large Rails applications. Urani discusses how companies can transition from monolithic Rails applications to microservice architectures through effective use of event streaming technologies, primarily using Apache Kafka. The core topics addressed in the talk include:

Communication Challenges in Large Applications: Urani outlines the difficulties in making large Rails applications communicate, especially when trying to split a monolith into smaller services and facilitating data replication across different geographic regions.
Introduction to Apache Kafka: The speaker explains Kafka as a distributed event streaming platform, highlighting its capabilities, including durability, guaranteed delivery, and message ordering, which is crucial for maintaining data consistency across systems.
Integration with Ruby on Rails: Practical insights are provided on how to implement Kafka within Rails using the Ruby Kafka gem. Urani describes both synchronous and asynchronous message publishing methods, emphasizing the trade-offs between speed and message durability.
Common Use Cases for Kafka:
- Service-to-service communication, where Kafka allows for decoupled services that can operate independently, improving fault tolerance.
- Multicasting events for diverse processes, such as notifying various downstream services when a user signs up.
- Ensuring consistency between heterogeneous data stores like PostgreSQL, Elasticsearch, and reporting databases using event sourcing and change data capture techniques.
Transactional Outbox Pattern: Urani discusses the dual-write problem inherent in standard database operations and proposes the transactional outbox pattern as a solution. This technique involves writing to an 'outbox' table within a single transaction to ensure that the data published to Kafka aligns with the database entries, providing a reliable mechanism to maintain consistency.
Operational Considerations: The necessity for operational strategies like Kafka Connect to simplify the integration of various data sinks and sources was also discussed, particularly for maintaining historical data.

In conclusion, Urani illustrates that adopting event streaming with Kafka not only boosts scalability and performance of Rails applications but also enhances their reliability through strong guarantees on message delivery and ordering. By leveraging these technologies, developers can create a more resilient architecture, paving the way for future growth and adaptability within their applications.

00:00:00.900 Welcome to the presentation.

00:00:12.440 I'm excited to be back at RailsConf. This is my first conference since 2019, and it's really nice to see everyone here. How many of you missed this?

00:00:24.060 My name is Brad Urani, and I'm a principal engineer at Procore. I live in Austin, Texas.

00:00:31.560 At Procore, we build software for the construction industry. If you're involved in building a skyscraper, a shopping mall, or a subway system, you might be using our software. It's a large suite of enterprise tools, and our tagline is, "We build the software that builds the world."

00:00:43.020 We have a strong team of around 200 to 300 Rails developers. We grew from a small shop, starting on Rails 0.9 over 16 years ago. Now, we might have one of the world's largest Rails applications. I know Shopify has a significant app, but ours is quite large too.

00:01:04.619 We have over 43,000 Ruby files, more than a thousand database tables, and somewhere between 250 and 300 developers working on it daily. With this growing complexity comes some challenges. While this model of a giant Rails app has served us well, it can't scale forever. At some point, we need to explore different architectures.

00:01:30.840 That raises important questions: How do we split the world's largest monolith? How does our existing architecture communicate with new services? It's not obvious. How do we implement multi-region strategies, so our AWS resources can communicate across North America and Europe? Additionally, how do we extract components like search and reporting that often cause significant performance issues?

00:01:49.320 These questions lead us to rethink our architecture. At Procore, we are addressing these problems through the development of a distributed systems strategy, which is fundamental for our service-to-service communication. Our solution allows for efficient data replication across zones and supports self-service creation of services that can publish and consume Kafka streams.

00:02:27.420 Kafka streams come in various iterations. You might hear about an event bus, a transaction log, or stream processing. At the core of this is Apache Kafka, a distributed log created by Jay Kreps at LinkedIn. While Kafka sits as the dominant option with the most integrated tools and libraries, alternatives like Amazon Kinesis and Google Cloud Pub/Sub exist.

00:03:12.060 Kafka allows for a publisher to send data to a topic, which acts as a single stream. It then enables a consumer to read from that topic. I'll delve into how we can read and write to Kafka from Ruby.

00:03:43.920 There's a Ruby gem called Ruby Kafka, maintained by the folks at Zendesk, which simplifies the process. To get started, you instantiate a Kafka client and configure it with broker names. Then you deliver a message specifying the intended topic and any applicable partitioning.

00:04:58.020 Kafka also supports asynchronous message delivery, where messages are queued in memory and processed by a background thread, allowing for non-blocking requests in your Rails application. However, it’s essential to note that this async mode lacks durability—messages could be lost if your server crashes. I'll discuss what to consider in terms of production consistency throughout this talk.

00:05:47.880 Kafka topics are divided into partitions to enhance scalability. Each consumer processes a batch of messages, ensuring order within a message subset based on a chosen partition key, such as user ID. This enables parallel processing by allowing multiple consumers to work on different partitions.

00:06:49.020 Reading from Kafka in Ruby again uses the Ruby Kafka gem, configuring connection settings like brokers. This involves a continuous loop subscribing to a topic and processing batches of messages. Managing exceptions in this part is critical for maintaining order guarantees.

00:07:24.060 You can make any Rails app a Kafka consumer by creating a class that initializes your Rails autoloaders, thereby using your existing model classes, skipping over the web server layer.

00:08:00.840 Kafka offers some unique guarantees compared to other messaging systems. It ensures at least once delivery, meaning that if a consumer fails to process a message initially, it will retry until successful. This is crucial for retaining the integrity of messages.

00:08:52.920 Furthermore, Kafka preserves a guaranteed order of messages within a partition, which is vital for consistent data flow. With this, we can effectively utilize a Rails app writing to a database while making sure downstream processes maintain the same updated data.

00:09:47.700 Multicasting allows multiple consumers to listen to a single topic, thus enhancing flexibility in the system. Kafka's retention mechanism permits re-consumption of messages, as long as they are retained according to a set retention schedule.

00:10:38.760 If you're looking to implement Kafka, various options exist, such as AWS's managed service MSK, Heroku, and Confluent, which offers extensive tooling for Kafka implementations.

00:11:35.880 The classic question that emerges is: What is the best way for one Rails app to communicate with another? While an HTTP POST could seem simple, it presents faults when dealing with errors, such as unavailability of the downstream service. The latency involved with synchronous communication is also something to consider.

00:12:36.300 In contrast, Kafka allows for asynchronous interactions, enabling quick write operations without waiting for downstream acknowledgment.

00:13:23.040 However, one must consider that Kafka messages can be retried, leading to potential duplicates. Therefore, implementing idempotent design in your consumer logic is necessary to avoid duplicate inserts.

00:13:59.100 Let’s delve into some use cases. The first use case illustrates service-to-service communication upon user registration. With Kafka, various actions such as sending welcome emails, adding search records, and others can occur independently, benefitting from the retry mechanism.

00:14:48.360 For scenarios requiring heterogeneity, Kafka ensures data consistency across different storage solutions. If a Rails app writes to Kafka, data flows consistently into different systems, maintaining the order.

00:15:24.600 Another concept to explore is Kafka Connect, which facilitates simpler production and consumption of data without the need to build consumer and producer code from scratch.

00:16:00.300 Companies increasingly recognize the value of having a data lake, where Kafka can back up topics to cloud storage, making it easier to retrieve historical data.

00:17:00.840 Use cases for data replay involve reprocessing periods like the entirety of previous years, facilitating better decision-making in machine learning models and indexing.

00:18:06.420 The concept of event sourcing decouples database writes from the application logic, pushing writes first to Kafka and then processing to databases, thus ensuring integrity.

00:18:54.720 This pattern promotes consistency, especially when dealing with complex applications involving multiple states. It also allows separate management of read and write models, aiding in efficiency.

00:19:55.140 Region-to-region replication is another pivotal use case, facilitating the sharing of records such as customer details across geographical boundaries while maintaining consistency.

00:20:55.080 Methods like Kafka Connect can help replicate topics across regions using built-in tools like Mirror Maker, enhancing service availability.

00:21:48.420 A critical consideration is dual writes when synchronously writing to Kafka and a database. This leads to inconsistencies, and leveraging Change Data Capture via tools like Debezium helps with this.

00:22:48.540 By tracking database transaction logs and coordinating them with Kafka, we can maintain data integrity and prevent dual write issues.

00:23:55.740 Transactional outbox patterns also assist in decoupling streams from database writes, ensuring all processes are treated as a single transaction.

00:24:18.840 Publishing methods can vary. Whether you're looking for synchronous safety or asynchronous speed, ensure the choice fits the needs of your application.

00:25:13.560 Summarily, streaming provides a globally distributed, eventually consistent means to asynchronously send messages across services in a reliable and swift manner.

00:26:18.600 Within a message are headers, a partition key, and data. Choosing a serialization format impacts your efficiency, so explore options like Avro or Protocol Buffers compared to JSON.

00:27:20.760 Additionally, tools like Apache Flink are on the horizon, offering real-time stream processing capabilities, allowing for enhanced message handling.

00:28:15.120 Thank you for listening. I’m Brad Urani from Procore, and I appreciate your time. Although I have run out of time for questions, feel free to approach me afterward.