Euruko 2023

Ruby Threads (And So Can You!)

by Johnny Shields

In the presentation "Ruby Threads (And So Can You!)", Johnny Shields, founder and CTO of TableCheck, explores practical ways to use multi-threading in Ruby applications. The talk focuses on improving performance through efficient task management, particularly for time-sensitive jobs such as sending large volumes of email. Shields builds his approach on the producer-consumer pattern and covers the following key points:

  • Producer-Consumer Pattern: This model allows decoupling of data production from consumption, enhancing efficiency in task management. The producer adds customer data to a queue while consumer threads handle the sending of emails.
  • Queue Management: Using Ruby's built-in SizedQueue class, Shields shows how to control the flow of tasks. The queue has a fixed capacity, so the producer blocks when it is full instead of running ahead of the consumer threads and piling up work in memory.
  • Performance Benchmarking: He compares single-threaded and multi-threaded runs. With 1,000 customers at 10 ms per email, a task that took about 15 seconds on a single thread finished in under half a second with multiple consumer threads.
  • Multi-Stage Queue Pattern: For more complex tasks, such as downloading files and uploading them to Amazon S3, Shields chains several queues together so that each stage (download, then upload) runs concurrently with its own threads.
  • Elixir Recommendation: For workloads that rely heavily on concurrency, Shields suggests considering Elixir, whose lightweight processes come with robust built-in abstractions for managing concurrent work.

Overall, Shields encourages developers to look into these multi-threading concepts to optimize Ruby applications and mentions that TableCheck is actively hiring, inviting interested individuals to explore opportunities. The presentation concludes with a Q&A session, addressing technical questions about the technology stack at TableCheck, error handling, and retry mechanisms used in their systems.

00:00:12.179 Hello, my name is Johnny Shields. I'm the founder and CTO of a company called TableCheck, based in Tokyo, Japan. I flew out from Tokyo to be here today. Our company works with restaurants, providing booking and guest management services. Currently, we are working with about 9,000 restaurants and growing. You might not have heard of us in Europe because we're mostly established in Asia, but we're planning to expand here soon, so hopefully, in a few years, people in this region will be encountering us.
00:00:35.700 We also collaborate with Cybergizer. Sergey and his team have been fantastic engineers for us, powering all our Ruby and Elixir development for the last five years. They feel like family to us, and it has been great to come out here and meet them all.
00:01:02.820 Today, I will present on a topic called "Ruby Threads (And So Can You!)." The focus of this presentation is to demonstrate a practical way to implement multi-threading in your applications. This method doesn't rely on any gems or special magic; it's purely Ruby. So, let's get started!
00:01:10.539 We will begin with an example problem that many of you might have encountered. We'll look at a rake task that sends emails—in our case, daily reminder emails to customers about their upcoming reservations. For instance, we send notifications stating, "Hey, your reservation is coming up tomorrow," to all customers who have a reservation for the next day.
00:01:29.520 Our task looks something like this: a very simple version where we iterate over every customer in our Active Record model and call a mailer class to send each email. The problem is that the mail delivery API can be slow. If we are sending hundreds of thousands of emails, this task can take a very long time! So, let's explore how we can speed it up.
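For reference, here is a minimal single-threaded sketch of such a task. It is not the speaker's actual code: `Customer` and `ReminderMailer.deliver_reminder` are hypothetical stand-ins (the "mailer" just sleeps about 10 ms to imitate a slow delivery API) so the example runs without Rails. In a Rails app the loop would more likely be `Customer.find_each { |customer| ... }` with an Action Mailer `deliver_now` call.

```ruby
# Hypothetical stand-ins so the sketch runs without Rails.
Customer = Struct.new(:id, :email)

module ReminderMailer
  DELIVERED = []

  # Simulates a slow mail delivery API: each call takes about 10 ms.
  def self.deliver_reminder(customer)
    sleep 0.01
    DELIVERED << customer.id
  end
end

customers = (1..50).map { |i| Customer.new(i, "guest#{i}@example.com") }

# Single-threaded: each email waits for the previous one to finish.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
customers.each { |customer| ReminderMailer.deliver_reminder(customer) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("sent %d emails in %.2fs", ReminderMailer::DELIVERED.size, elapsed)
```

Because every delivery waits for the one before it, total run time grows linearly with the number of customers.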
00:02:06.420 What I will introduce is the producer-consumer pattern. The producer reads the customer data and places each customer into a queue. The consumers, which are the mail senders, take customers off the queue and send the emails. Rather than use our real class names, let's keep it generic: we have Active Record customers, and each consumer pulls a customer from the queue and sends them an email.
00:02:33.960 To start, I'll create a queue using Ruby's built-in SizedQueue class. A SizedQueue has a maximum size: when it is full, a thread trying to push a new item blocks until space becomes available. The producer runs on its own thread and inserts items into the queue, while a number of consumer threads, 50 in this case, take items out. At the end, I join all of these threads, so the method waits for each one to finish its work before it returns.
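To see the blocking behavior in isolation, here is a tiny sketch using only Ruby core (the capacity of 2 and the item names are arbitrary): once the queue is full, a further `<<` puts the calling thread to sleep until another thread pops an item.

```ruby
# SizedQueue ships with Ruby core (an alias of Thread::SizedQueue); no require needed.
queue = SizedQueue.new(2) # capacity of 2 items

queue << :a
queue << :b # the queue is now full

# A third push would block the current thread, so try it on a separate one.
pusher = Thread.new { queue << :c }
sleep 0.05
puts pusher.status # normally "sleep": the thread is parked waiting for space

queue.pop   # a consumer removes :a, which frees one slot
pusher.join # the blocked push can now complete
puts queue.size # => 2
```

This back-pressure is what keeps a fast producer from loading far more work into memory than the consumers can handle.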
00:03:29.760 The producer iterates over the customer data. Because Active Record can load records in batches (for example with find_each), it doesn't load all customers into memory at once, so even with a million customers, memory use stays manageable. Once the producer has enqueued every customer, it inserts an end-of-queue symbol, :eoq, into the queue once for each of the 50 consumer threads. When a consumer receives this symbol, it knows there is no more work and exits, so all threads finish in an orderly way.
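Putting the pieces together, here is a minimal, self-contained sketch of the producer-consumer version. It illustrates the pattern under stated assumptions rather than reproducing the code from the talk: `Customer` and `deliver_reminder` are hypothetical stand-ins (delivery just sleeps about 10 ms), a plain array replaces `Customer.find_each`, and the queue capacity of 100 is an arbitrary choice. A `Mutex` guards the shared `sent` array.

```ruby
# Hypothetical stand-in for the Active Record model.
Customer = Struct.new(:id, :email)

EOQ = :eoq            # end-of-queue sentinel
CONSUMER_COUNT = 50   # number of consumer threads, as in the talk

# Hypothetical slow mailer: sleeps ~10 ms, then records the delivery.
def deliver_reminder(customer, sent, lock)
  sleep 0.01
  lock.synchronize { sent << customer.id }
end

def send_reminders(customers)
  # Bounded queue (capacity chosen arbitrarily): the producer blocks when it is full.
  queue = SizedQueue.new(CONSUMER_COUNT * 2)
  sent = []
  lock = Mutex.new

  producer = Thread.new do
    # In a Rails app this could be Customer.find_each, which loads in batches.
    customers.each { |customer| queue << customer }
    CONSUMER_COUNT.times { queue << EOQ } # one sentinel per consumer
  end

  consumers = Array.new(CONSUMER_COUNT) do
    Thread.new do
      # Keep popping until this thread receives its sentinel.
      while (customer = queue.pop) != EOQ
        deliver_reminder(customer, sent, lock)
      end
    end
  end

  # Wait for every thread before returning.
  [producer, *consumers].each(&:join)
  sent
end

customers = (1..1_000).map { |i| Customer.new(i, "guest#{i}@example.com") }
sent = send_reminders(customers)
puts sent.size # => 1000
```

All customers are enqueued before any sentinel, and each consumer exits after reading a single `:eoq`, so 50 sentinels for 50 consumers guarantee that every thread stops and no customer is skipped.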
00:05:07.800 So, does this method make a difference? Let’s say we have a single thread processing a thousand customers, each taking 10 milliseconds: it would take about 15 seconds to complete. However, with the producer-consumer pattern, this time can reduce to less than half a second. Implementing this pattern can significantly speed up task completion.
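The speed-up works because mail delivery is I/O-bound: MRI releases the Global VM Lock (GVL) while a thread waits on blocking I/O or `sleep`, so many consumers can wait at the same time. Ideally, wall time is roughly N × t / k for N jobs of latency t spread over k threads; with the talk's figures that is 1,000 × 10 ms / 50 = 0.2 s, consistent with "less than half a second". The scaled-down benchmark sketch below (100 jobs of 10 ms and 20 threads, all hypothetical numbers) lets you check the effect locally.

```ruby
# Hypothetical, scaled-down workload: 100 jobs of ~10 ms, 20 consumer threads.
JOBS = 100
LATENCY = 0.01
THREADS = 20

# Returns the wall-clock seconds taken by the given block.
def measure
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Baseline: one thread does every job in sequence.
single = measure { JOBS.times { sleep LATENCY } }

# Producer-consumer: the main thread produces, THREADS consumers sleep in parallel.
threaded = measure do
  queue = SizedQueue.new(THREADS)
  consumers = Array.new(THREADS) do
    Thread.new { sleep LATENCY while queue.pop != :eoq }
  end
  JOBS.times { |job| queue << job }
  THREADS.times { queue << :eoq }
  consumers.each(&:join)
end

puts format("single: %.2fs, threaded: %.2fs", single, threaded)
```

This only holds for I/O-bound work: CPU-bound Ruby code on MRI still runs one thread at a time under the GVL, so adding threads would not shorten it.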
00:05:25.380 But wait, there's more! If we want to get fancy, we can chain multiple queues together, which I'm calling the multi-stage pattern. For example, suppose I need to download a batch of files from an FTP server and upload them to Amazon S3. Each file has to be downloaded and then uploaded, and both steps take time. With a separate queue for each step, downloads and uploads can run concurrently instead of one after the other.
00:05:54.840 In this example, my first queue will contain the file names to download, and I will put the downloaded data into a second queue for uploading. Initially, I will set up one producer inserting items into the download queue and have intermediate producer-consumer threads fetching from it to prepare data for the upload queue. Finally, I will consume the upload queue.
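A minimal two-stage sketch of the FTP-to-S3 pipeline follows. `download` and `upload` are hypothetical placeholders that sleep about 10 ms instead of calling a real FTP client or the AWS SDK, and the thread counts and queue capacities are arbitrary. The downloader threads act as the intermediate producer-consumers: they pop file names from the download queue and push `[name, data]` pairs into the upload queue.

```ruby
# Hypothetical stand-ins for an FTP download and an S3 upload (~10 ms each).
def download(name)
  sleep 0.01
  "contents of #{name}"
end

def upload(name, data, uploaded, lock)
  sleep 0.01
  lock.synchronize { uploaded[name] = data }
end

EOQ = :eoq
DOWNLOADERS = 10
UPLOADERS = 10

def transfer(file_names)
  download_queue = SizedQueue.new(100)
  upload_queue = SizedQueue.new(100)
  uploaded = {}
  lock = Mutex.new

  # Producer: feeds file names into the download queue.
  producer = Thread.new do
    file_names.each { |name| download_queue << name }
    DOWNLOADERS.times { download_queue << EOQ }
  end

  # Stage 1: consume the download queue, produce into the upload queue.
  downloaders = Array.new(DOWNLOADERS) do
    Thread.new do
      while (name = download_queue.pop) != EOQ
        upload_queue << [name, download(name)]
      end
    end
  end

  # Stage 2: consume the upload queue.
  uploaders = Array.new(UPLOADERS) do
    Thread.new do
      while (item = upload_queue.pop) != EOQ
        name, data = item
        upload(name, data, uploaded, lock)
      end
    end
  end

  # Only signal the uploaders once every downloader has finished.
  [producer, *downloaders].each(&:join)
  UPLOADERS.times { upload_queue << EOQ }
  uploaders.each(&:join)
  uploaded
end

files = (1..200).map { |i| "file_#{i}.csv" }
result = transfer(files)
puts result.size # => 200
```

The one subtle point is shutdown: the uploaders' sentinels are pushed only after all downloaders have been joined, so no uploader can receive `:eoq` while downloaded files are still on their way into the upload queue.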
00:08:38.520 My implementation of the download and upload stages is similar to what we did with customers. The concept is fundamentally the same, just with an added intermediate layer, and you can add as many stages as your task needs. In my actual application, I also use mutexes to protect counters that track how many items have been processed, so I can show progress for each queue.
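The talk doesn't show the progress-tracking code, so the following is a hypothetical sketch of a mutex-protected counter (the `ProgressCounter` class and its methods are invented for illustration). `@done += 1` is a read-modify-write sequence that Ruby does not guarantee to be atomic across threads, so the increment is wrapped in `Mutex#synchronize`.

```ruby
# Hypothetical thread-safe progress counter, e.g. one per queue or stage.
class ProgressCounter
  def initialize(label, total)
    @label = label
    @total = total
    @done = 0
    @lock = Mutex.new
  end

  # Increments under the mutex so concurrent threads never lose an update.
  def increment
    @lock.synchronize do
      @done += 1
      @done
    end
  end

  def done
    @lock.synchronize { @done }
  end

  def to_s
    "#{@label}: #{done}/#{@total}"
  end
end

counter = ProgressCounter.new("uploads", 1_000)
threads = Array.new(10) do
  Thread.new { 100.times { counter.increment } }
end
threads.each(&:join)
puts counter # => uploads: 1000/1000
```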
00:09:48.840 Now, let’s benchmark this system. In testing with a thousand files, assuming each download and upload takes about 10 milliseconds, I found that the single-threaded approach took 30 seconds, whereas the producer-consumer pattern again finished in half a second. This demonstrates that implementing multi-threading and multiple queues can massively enhance performance.
00:10:49.140 Finally, if you want to take this further, consider Elixir. Elixir has built-in abstractions that handle this kind of work more efficiently. The processes and patterns I've outlined here resemble the robust solutions Elixir offers out of the box, built on its lightweight processes.
00:11:22.980 Please check out Elixir if you're dealing with heavy multithreaded tasks. However, I hope you can apply some of the concepts I presented today to enhance the performance of your Ruby applications.
00:12:06.000 And that's my presentation! Thank you, everyone, for listening. By the way, our company is hiring, and we do offer remote work. So if anyone is interested, feel free to check that out as well. Thank you very much.
00:12:29.220 Now, are there any questions? If you have any inquiries, please feel free to ask.
00:14:51.460 What technology stack are you using at TableCheck? For the front end, we primarily use React, and in some cases, Ember.js. Our back end is mostly a Ruby monolith, along with some Elixir microservices and data ingestion pipelines.
00:14:57.060 Regarding our team size, including Cybergizer, we have about 15 people, with our entire engineering, product, and QA department totaling around 65 and our overall company size reaching about 200 people.
00:15:40.860 One last question about error handling: how do you manage errors or retries in your system? In our application, we utilize retry mechanisms within each consumer and producer thread. If, for example, one email fails to send due to an error, we don't want that error to stop all other threads from processing. We would implement a retry wrapper around the mailer call to ensure resilience.
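The actual retry logic isn't shown in the talk, so the following is only a sketch under assumptions: `DeliveryError`, `with_retries`, the limit of three attempts, and the exponential backoff are all hypothetical. It shows the two layers described in the answer: a retry wrapper around each mailer call, and a `rescue` in the consumer loop so that an email that still fails is recorded and skipped instead of killing the thread.

```ruby
# Hypothetical error class for a failed delivery.
class DeliveryError < StandardError; end

# Runs the block, retrying on DeliveryError up to `attempts` times in total and
# sleeping base_delay, 2 * base_delay, ... between tries. Re-raises the last
# error once every attempt has failed.
def with_retries(attempts: 3, base_delay: 0.01)
  tries = 0
  begin
    tries += 1
    yield
  rescue DeliveryError
    raise if tries >= attempts
    sleep base_delay * (2**(tries - 1))
    retry
  end
end

calls = Hash.new(0)
lock = Mutex.new

# Simulated mailer: customer 3 fails twice then succeeds; customer 7 always fails.
deliver = lambda do |id|
  attempt = lock.synchronize { calls[id] += 1 }
  raise DeliveryError, "timeout for #{id}" if id == 7 || (id == 3 && attempt <= 2)
end

queue = Queue.new
(1..10).each { |id| queue << id }
4.times { queue << :eoq }
failures = Queue.new

consumers = Array.new(4) do
  Thread.new do
    while (id = queue.pop) != :eoq
      begin
        with_retries { deliver.call(id) }
      rescue DeliveryError => e
        failures << [id, e.message] # give up on this email, keep the thread alive
      end
    end
  end
end
consumers.each(&:join)

puts failures.size # => 1
```

In a real application you would rescue the specific exceptions your mail client raises rather than a catch-all, so that genuine bugs still surface.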
00:16:17.460 With that approach, you can ensure the remaining emails are processed without being hindered. If you would like more specifics, I can also share examples of our retry logic on GitHub after the presentation.
00:16:37.940 Thank you for your time, and once again, if there are no further questions, let’s give a round of applause for Johnny!