Building Stream Processing Applications with Ruby & Meroxa

by Ali Hamidi

In this video titled 'Building Stream Processing Applications with Ruby & Meroxa,' Ali Hamidi, CTO and co-founder of Meroxa, explores stream processing using the newly introduced Turbine framework for Ruby. The talk focuses on the growing demand for real-time data processing applications and on the challenges developers face with traditional, Java-centric stream processing tools. Ali explains the concept of stream processing, which involves handling continuous sequences of events in scenarios such as analytics, disaster recovery, and data enrichment.

Key Points Discussed:
- Introduction to Stream Processing: Ali defines stream processing as the computation or transformation of continuous streams of events, distinct from batch processing.
- Challenges with Traditional Tools: Existing stream processing tools are largely Java-based, which makes them hard to adopt for non-Java developers, who must also learn unfamiliar concepts such as delivery semantics, ordering guarantees, and duplicate handling.
- Overview of Turbine Framework: Turbine for Ruby aims to simplify stream processing by allowing developers to write custom logic in Ruby without additional domain-specific languages (DSLs). It is designed to be idiomatic to Ruby developers.
- Demo of Turbine Application: Ali showcases a live demonstration of a Turbine app that enriches data using the Clearbit API, illustrating how the framework can process and manipulate data effectively.
- Developer Experience and Future Development: Ali emphasizes Meroxa's commitment to enhancing the developer experience, discussing plans for native stateful processing, support for stream joins, and improved CI/CD integration.
- Feedback and Developer Preview: Turbine for Ruby is in early developer preview, and Meroxa is seeking user feedback to refine its features. Developers are encouraged to participate and describe their needs.

The session concludes with a Q&A segment, addressing how Meroxa differentiates itself from traditional serverless platforms, with a focus on managing full data pipelines and continuous applications.

Overall, the presentation not only introduces a powerful tool for Ruby developers but also encourages a community-driven approach to adapting and enhancing stream processing capabilities in Ruby applications.

00:00:00 Ready for takeoff.

00:00:18 So welcome! My name is Ali, and I'm going to talk about stream processing with Ruby.

00:00:22 Specifically, I’ll be introducing Turbine RB.

00:00:29 Here’s a quick agenda of what we’re going to cover so you know what to expect.

00:00:36 Let me start off with a bit about myself: who I am and why you should trust me.

00:00:43 I'm the CTO and one of the two co-founders of Meroxa, and the other co-founder is right here.

00:00:49 Previously, before starting Meroxa, I was a lead engineer at Heroku on the Heroku data team, mainly working on Heroku's Kafka offering. My team managed thousands of Kafka clusters for tens of thousands of customers.

00:01:05 Before that, I built a system at a targeted advertising company that queried over 2 billion user profiles in real time.

00:01:12 Even earlier, I created analytics pipelines for mobile apps, processing over 100,000 events per second.

00:01:19 Basically, I've been working in and around the data space for quite a while.

00:01:34 So, what is stream processing, and why should you care? Specifically, by stream processing, I mean taking an unbounded, continuous sequence of events and applying some form of computation or transformation to them.

00:01:40 I'm intentionally avoiding the term 'real-time' since there is no generally accepted definition for it, but essentially it's not batch processing in any sense.
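
To make the definition above concrete, here is a minimal sketch in plain Ruby, not Turbine code. The event shape (`id`, `type`) is an assumption made up for illustration. A lazy enumerator stands in for an unbounded source, and each transformation is applied one event at a time as events arrive, rather than over a finished batch.

```ruby
# An unbounded source: this enumerator never runs out of events.
# The event fields (:id, :type) are hypothetical.
events = Enumerator.new do |y|
  id = 0
  loop do
    id += 1
    y << { id: id, type: id.even? ? "click" : "view" }
  end
end

# Lazy evaluation means each event flows through the whole chain
# before the next one is pulled from the source.
clicks = events.lazy
               .select { |e| e[:type] == "click" }
               .map { |e| e.merge(processed: true) }

# Only pull as many events as we need; the source itself is endless.
first_three = clicks.first(3)
first_three.each { |e| puts e.inspect }
```

Without `.lazy`, the `select` call would try to consume the entire infinite source and never return, which is the practical difference between batch-style and stream-style evaluation.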

00:01:55 Some common examples of stream processing include filtering events, enriching data by augmenting each event with additional information, aggregating data to perform calculations, joining data sets similar to SQL joins, and routing events to different destinations.
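
The operations listed above can be sketched on a small in-memory sample. This is plain Ruby for illustration only; the field names and the `users` lookup table are assumptions, and a real stream would apply the same steps to events as they arrive.

```ruby
# Hypothetical sample events and a lookup table keyed by user id.
events = [
  { user_id: 1, type: "signup",   amount: 0 },
  { user_id: 2, type: "purchase", amount: 30 },
  { user_id: 1, type: "purchase", amount: 12 }
]
users = { 1 => { plan: "free" }, 2 => { plan: "pro" } }

# Filter: keep only the events we care about.
purchases = events.select { |e| e[:type] == "purchase" }

# Enrich / join: augment each event with data looked up by key.
enriched = purchases.map { |e| e.merge(plan: users.dig(e[:user_id], :plan)) }

# Aggregate: compute a total per user.
totals = enriched.each_with_object(Hash.new(0)) do |e, acc|
  acc[e[:user_id]] += e[:amount]
end

# Route: send events to different destinations based on their content.
routes = enriched.group_by { |e| e[:plan] == "pro" ? :priority : :standard }
```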

00:02:30 Some familiar use cases include analytics, where you take data from various sources like operational databases, support tickets, and CRM systems into a single data warehouse for analysis.

00:02:51 Another common use case is replication for disaster recovery, continuously pulling data from one place to another location, such as across geographical distances or between different database types.

00:03:02 Enrichment is another common scenario where you take user sign-up data and augment it with additional information, while integration serves as a general catch-all for moving data between different systems.

00:03:15 So, what is the big problem with stream processing today? Many of you probably love Java—it’s your favorite language—but for those who don't, traditional stream processing involves a lot of Java-based tools and frameworks.

00:03:40 If you engage in enough stream processing, you will encounter various Java-centric solutions like Kafka, Pulsar, Spark, and many others. The reality for non-Java developers is that this can be quite frustrating.

00:04:09 Secondly, stream processing introduces several new patterns and paradigms that might not be familiar, especially if you are used to building web applications that follow a regular request-response cycle.

00:04:27 For example, you'll need to think about delivery semantics, ordering guarantees, late delivery, and handling duplicates. These are all concepts you usually don’t have to deal with.
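
As one illustration of duplicate handling, the sketch below drops redelivered events by remembering the ids it has already processed. It assumes every event carries a unique `id` field, which is a hypothetical convention and not something the talk specifies. The in-memory set also grows without bound, so a production system would typically expire old ids or rely on the platform's delivery guarantees.

```ruby
require "set"

# Tracks which event ids have already been handled.
class Deduplicator
  def initialize
    @seen = Set.new
  end

  # Returns true the first time an id is seen, false on redelivery.
  # Set#add? returns nil when the element is already present.
  def first_delivery?(event)
    !@seen.add?(event[:id]).nil?
  end
end

# With at-least-once delivery, the same event may arrive twice.
delivered = [{ id: "a", n: 1 }, { id: "b", n: 2 }, { id: "a", n: 1 }]

dedup = Deduplicator.new
processed = delivered.select { |e| dedup.first_delivery?(e) }
```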

00:04:47 Moreover, thinking about partitions and topics adds another layer of complexity that can be cumbersome.
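
A rough sketch of why partitions matter: events are usually assigned to a partition by hashing a key, so all events for the same key land in the same partition and keep their relative order. The CRC32 hash and the partition count of 4 below are illustrative assumptions, not the scheme used by any particular broker.

```ruby
require "zlib"

# Maps a key to one of `partitions` buckets. The same key always
# yields the same partition, which preserves per-key ordering.
def partition_for(key, partitions = 4)
  Zlib.crc32(key.to_s) % partitions
end

%w[user-1 user-2 user-1].each do |key|
  puts "#{key} -> partition #{partition_for(key)}"
end
```

Ordering is only guaranteed within a partition, so choosing the key is a design decision: events that must be processed in order need to share a key.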

00:05:03 Once you have all these stream processing components in place, you also need to consider where to deploy these applications, which involves setting up various infrastructure components.

00:05:26 Setting up things like a VPC, subnets, security groups, and EC2 instances can be a cumbersome process that can lead to many headaches if not done properly.

00:06:01 At Meroxa, our answer to this complex problem lies in Turbine, our framework, and the Meroxa Data Platform, which offers a streamlined solution for deploying and managing data applications.

00:06:30 Turbine is the toolchain we provide for building these applications.

00:06:34 Turbine is essentially a framework, and we offer a family of them for various programming languages, starting with Go, JavaScript, and Python, and now introducing Turbine for Ruby.

00:06:53 Each Turbine framework is handcrafted for specific languages to follow idiomatic practices, ensuring that it feels familiar to developers. We also focus on providing a high-level API to simplify building stream processing applications.

00:07:10 With Turbine for Ruby, you can write custom logic in Ruby without introducing unfamiliar DSLs, and you can import Ruby gems you may already be using.

00:07:19 An example Turbine application involves creating a simple pipeline that takes data from one location, processes it, and then writes it back to another collection in the same database.
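
The shape of such a pipeline (read, process, write back) can be sketched with an in-memory stand-in. `FakeResource`, its methods, and the collection names are all hypothetical and exist only for this illustration; this is not the Turbine API.

```ruby
# In-memory stand-in for a database resource. NOT the Turbine API.
class FakeResource
  attr_reader :collections

  def initialize(collections)
    @collections = collections
  end

  # Read all records from a named collection.
  def records(collection)
    @collections.fetch(collection)
  end

  # Append records to a named collection, creating it if needed.
  def write(records, collection)
    (@collections[collection] ||= []).concat(records)
  end
end

db = FakeResource.new("events" => [{ "email" => "A@Example.com" }])

# The custom logic is ordinary Ruby: here, normalize the email address.
normalize = ->(record) { record.merge("email" => record["email"].downcase) }

# Read from one collection, process, and write to another in the same database.
db.write(db.records("events").map(&normalize), "events_normalized")
```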

00:07:38 The platform handles the operational burden of running the Turbine apps, allowing you to focus on building functionality without worrying about infrastructure management.

00:07:58 When deploying a Turbine app, your custom logic gets packaged up into a container managed by the platform, which also takes care of scaling and monitoring.

00:08:09 In summary, Meroxa and Turbine are designed to provide developers with the tools and platforms necessary to efficiently build real-time data applications without the steep learning curve associated with traditional stream processing frameworks.

00:08:30 Now, let me attempt a live demo. Let’s see how that goes!

00:08:39 Here you can see a Turbine app that I wrote, which implements the enrichment use case using the Clearbit gem. It pulls data from a database called demo PG.

00:08:58 The app is designed to take records from a collection called events, process them using an enrichment function, and write the results to an output collection.

00:09:15 This function does a simple task of enriching the data by looking up user details via the Clearbit API and returning information such as the company name and location.
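
An enrichment function of this kind might look roughly like the sketch below. To keep it runnable without network access or credentials, the external API is replaced by `fake_lookup`, a hypothetical stand-in; the record fields and the returned details are also assumptions and do not reflect the Clearbit gem's actual interface.

```ruby
# Hypothetical stand-in for an external enrichment API (no network call).
fake_lookup = lambda do |email|
  directory = {
    "ada@example.com" => { company: "Example Corp", location: "London" }
  }
  directory[email]
end

# Adds company details to a record when the lookup finds a match.
def enrich(record, lookup)
  details = lookup.call(record[:email])
  return record unless details # pass unknown users through unchanged

  record.merge(company: details[:company], location: details[:location])
end

known   = enrich({ id: 1, email: "ada@example.com" }, fake_lookup)
unknown = enrich({ id: 2, email: "nobody@example.com" }, fake_lookup)
```

Passing the lookup in as an argument keeps the function easy to test locally, since the real API client can be swapped for a stub.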

00:09:35 Turbine also provides a local development mode that simplifies testing and allows quick iterations using sample records.
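
The idea of iterating against sample records can be sketched as follows: keep a small fixture, run the processing function over it, and inspect the output before deploying. The fixture format and field names here are assumptions for illustration and may differ from what Turbine's local mode actually uses.

```ruby
require "json"

# Hypothetical sample records, as they might be stored in a fixture file.
sample_json = <<~JSON
  [
    {"id": 1, "email": "ADA@example.com"},
    {"id": 2, "email": null}
  ]
JSON

# The processing logic under test: drop records without an email,
# then normalize the remaining addresses.
process = lambda do |records|
  records.reject { |r| r["email"].nil? }
         .map { |r| r.merge("email" => r["email"].downcase) }
end

output = process.call(JSON.parse(sample_json))
puts JSON.pretty_generate(output)
```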

00:09:52 After verifying that the feature works as intended locally, you can deploy it to the platform for operational use.

00:10:05 Thank you for your attention during this demo!

00:10:10 So, what's next for Turbine and Meroxa?

00:10:15 Currently, Turbine for Ruby is in an early developer preview, and we are eager to get feedback from users to continually improve the framework.

00:10:30 We are focused on providing an excellent developer experience and want to know how developers intend to use it.

00:10:40 Ruby 3.2 introduced the idea of value objects through the `Data` class, which could be highly beneficial for stream processing use cases.
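
A minimal sketch of what this could look like, assuming the value objects mentioned are instances of Ruby's built-in `Data` class (this needs Ruby 3.2 or newer to run). `SignupEvent` and its fields are hypothetical names chosen for illustration.

```ruby
# An immutable value object describing one event in a stream.
SignupEvent = Data.define(:user_id, :email) do
  # Derived values can be exposed as ordinary methods.
  def domain
    email.split("@").last
  end
end

event = SignupEvent.new(user_id: 1, email: "ada@example.com")

# Instances have no setters; #with returns a modified copy instead.
updated = event.with(email: "ada@example.org")

puts event.domain
puts updated.email
```

Immutable records suit stream processing because a transformation step cannot accidentally change an event that another step is still reading.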

00:11:00 We are also working on integrating native stateful processing into the platform, making it easier to persist data.
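
To illustrate what stateful processing means, the sketch below keeps a running total per user across events. The in-memory hash is a stand-in for whatever persistent store a platform would provide; the class name and event fields are assumptions, and this is not a planned Turbine interface.

```ruby
# Keeps per-user state between events. The Hash is a stand-in for a
# persistent state store; state here is lost when the process exits.
class RunningTotals
  def initialize(store = Hash.new(0))
    @store = store
  end

  # Updates the stored total and returns the event annotated with it.
  def process(event)
    @store[event[:user_id]] += event[:amount]
    event.merge(running_total: @store[event[:user_id]])
  end
end

totals = RunningTotals.new
results = [
  { user_id: 1, amount: 10 },
  { user_id: 2, amount: 5 },
  { user_id: 1, amount: 7 }
].map { |e| totals.process(e) }
```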

00:11:20 We aim to support stream joins natively without relying on external resources.
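
A stream join can be sketched as buffering the latest record from each side by key and emitting a combined record once both sides have arrived. `StreamJoin` and its methods are hypothetical names for illustration. The sketch omits time windows and eviction, which a real join over unbounded streams would need in order to bound memory.

```ruby
# Joins two streams on a shared key, emitting once both sides are present.
class StreamJoin
  def initialize
    @left = {}
    @right = {}
  end

  # Record an event from the left stream; returns a joined record or nil.
  def on_left(key, value)
    @left[key] = value
    emit(key)
  end

  # Record an event from the right stream; returns a joined record or nil.
  def on_right(key, value)
    @right[key] = value
    emit(key)
  end

  private

  def emit(key)
    return nil unless @left.key?(key) && @right.key?(key)

    { key: key, left: @left[key], right: @right[key] }
  end
end

join = StreamJoin.new
pending = join.on_left(1, { email: "ada@example.com" }) # other side not seen yet
joined  = join.on_right(1, { order_total: 30 })
```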

00:11:37 Additionally, we're focusing on CI/CD integration so that developers can effortlessly work on their data applications alongside their Ruby applications.

00:11:56 You can access the developer preview using the QR code, which takes you to a landing page where you can register your interest.

00:12:07 We’re also offering a chance to win a Meta Quest 2 by signing up.

00:12:20 We want to onboard as many users as possible and understand their needs for stream processing.

00:12:35 [Pause for questions]

00:12:49 We have plenty of time for questions, so feel free to ask anything.

00:13:10 For instance, how does our platform differentiate itself from serverless function platforms?

00:13:30 With serverless function platforms, you still need separate infrastructure to deliver data to your functions, whereas Meroxa’s platform manages the full pipeline and runs continuous applications.

00:13:56 You can trigger serverless functions from within your stream processing applications, creating a seamless integration.

00:14:10 We developed the CLI using the Cobra framework, making it easy to install and compatible across platforms.

00:14:30 Our CLI not only simplifies local testing but also streamlines the deployment of applications to our platform.

00:14:55 Our goal is to significantly improve the developer experience, making it easier to collaborate and to manage deployments smoothly.

00:15:20 Lastly, local development simulates databases and lets you work with sample records, which speeds up iteration.

00:15:39 That's all for my presentation. Thank you for your attention!