00:00:12.179
Hello, my name is Johnny Shields. I'm the founder and CTO of a company called TableCheck, based in Tokyo, Japan. I flew out from Tokyo to be here today. Our company works with restaurants, providing booking and guest management services. Currently, we are working with about 9,000 restaurants and growing. You might not have heard of us in Europe because we're mostly established in Asia, but we're planning to expand here soon, so hopefully, in a few years, people in this region will be encountering us.
00:00:35.700
We also collaborate with Cybergizer. Sergey and his team have been fantastic engineers for us, powering all our Ruby and Elixir development for the last five years. They feel like family to us, and it has been great to come out here and meet them all.
00:01:02.820
Today, I will present on a topic called "Ruby Threads (And So Can You!)." The focus of this presentation is to demonstrate a practical way to implement multi-threading in your applications. This method doesn't rely on any gems or special magic; it's purely Ruby. So, let's get started!
00:01:10.539
We will begin with an example problem that many of you might have encountered. We'll look at a rake task that sends emails—in our case, daily reminder emails to customers about their upcoming reservations. For instance, we send notifications stating, "Hey, your reservation is coming up tomorrow," to all customers who have a reservation for the next day.
00:01:29.520
Our task looks something like this: it's a very simple version where we iterate through our Active Record table to obtain all customers and then call a mailer class to send the emails. The issue, however, is that the mail delivery API can be slow. If we are sending hundreds of thousands of emails, this task can take a significant amount of time! So, let's explore how we can speed this up.
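As a rough illustration of the single-threaded task described here, the sketch below uses hypothetical stand-ins (Customer, ReminderMailer, each_customer) so it runs without Rails. These names are assumptions for illustration, not the speaker's actual code, and the sleep call only simulates a slow mail delivery API.

```ruby
# Hypothetical stand-ins so the sketch runs without Rails or a real mail API.
Customer = Struct.new(:id, :email)

class ReminderMailer
  # Simulates one slow call to a mail delivery API.
  def self.deliver(customer)
    sleep(0.001)
    "sent reminder to #{customer.email}"
  end
end

# Stand-in for iterating an Active Record table in batches.
def each_customer(count)
  count.times { |i| yield Customer.new(i, "customer#{i}@example.com") }
end

# Single-threaded version: every slow call waits for the previous one.
def send_reminders_serially(count)
  sent = 0
  each_customer(count) do |customer|
    ReminderMailer.deliver(customer)
    sent += 1
  end
  sent
end

puts send_reminders_serially(20)
```

Total run time here grows linearly with the number of customers, which is the bottleneck the rest of the talk addresses.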
00:02:06.420
What I will introduce is the producer-consumer pattern. The producer places work items, in this case customer records, into a queue. A consumer, here the mail sender, then takes items off the queue and sends the emails. Substituting the actual class names: the producer reads customers from Active Record, and each consumer pulls a customer from the queue and sends them an email.
00:02:33.960
To start, I'll create a queue using the SizedQueue class. When the queue is full, pushing a new item blocks the calling thread until space becomes available. The producer will run on a separate thread and will insert items into the queue, while a number of consumer threads take items off it; in this case, I will create 50 consumer threads. At the end of the process, I will join these threads to wait for each one to finish its job before the entire function completes.
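The blocking behavior of Ruby's built-in SizedQueue can be seen in a small sketch. The capacity of 2 and the symbols pushed are arbitrary choices for illustration. The non-blocking form of push is used only to show that the queue is full without hanging the script.

```ruby
queue = SizedQueue.new(2) # holds at most 2 items

queue.push(:a)
queue.push(:b)

# A normal (blocking) third push would wait here until a consumer pops an item.
# The non-blocking form raises ThreadError instead, which shows the queue is full.
full = begin
  queue.push(:c, true)
  false
rescue ThreadError
  true
end

queue.pop      # a consumer takes one item, freeing a slot
queue.push(:c) # now succeeds immediately

puts "queue was full: #{full}, size now: #{queue.size}"
```

This back-pressure is what keeps a fast producer from loading far more records into memory than the consumers can handle.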
00:03:29.760
The producer function will iterate over the customer data. Because Active Record loads records in batches, it doesn't load all customers into memory at once, so even if we have a million customers, memory use stays manageable. Once all customers have been queued, I will insert an end-of-queue symbol, :eoq, into the queue once for each of my 50 consumer threads. When a consumer receives this symbol, it knows there is no more work and exits, so the threads complete in an orderly way.
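Putting the pieces together, a minimal sketch of the producer-consumer pattern might look like the following. The Customer struct, the method name, the queue capacity, and the simulated delivery are assumptions made so the example runs on its own. A real task would iterate an Active Record table and call a real mailer.

```ruby
# Hypothetical stand-in for an Active Record model.
Customer = Struct.new(:id, :email)

def send_reminders_threaded(customer_count, consumer_count: 50)
  queue = SizedQueue.new(consumer_count * 2) # bounded, so the producer cannot run far ahead
  sent = Queue.new                           # thread-safe collection of processed ids

  # Producer: pushes every customer, then one :eoq marker per consumer.
  producer = Thread.new do
    customer_count.times do |i|
      queue.push(Customer.new(i, "customer#{i}@example.com"))
    end
    consumer_count.times { queue.push(:eoq) }
  end

  # Consumers: pop items until they see the end-of-queue marker.
  consumers = Array.new(consumer_count) do
    Thread.new do
      loop do
        customer = queue.pop
        break if customer == :eoq
        sleep(0.001) # simulated slow mail delivery call
        sent.push(customer.id)
      end
    end
  end

  # Wait for every thread to finish before returning.
  producer.join
  consumers.each(&:join)
  sent.size
end

puts send_reminders_threaded(200)
```

Pushing exactly one marker per consumer matters: each consumer stops after popping a single marker, so fewer markers would leave some threads blocked on an empty queue forever.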
00:05:07.800
So, does this method make a difference? Let’s say we have a single thread processing a thousand customers, each taking 10 milliseconds: it would take about 15 seconds to complete. However, with the producer-consumer pattern, this time drops to less than half a second. Implementing this pattern can significantly speed up task completion.
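As a back-of-the-envelope check, in the ideal case 50 consumers each handle 1000 / 50 = 20 customers, and 20 × 10 ms = 0.2 seconds, which is consistent with "less than half a second." The sketch below measures the same effect at a smaller scale. The item count, the 5 ms delay, and the 20 consumers are arbitrary assumptions, and actual timings will vary by machine.

```ruby
# Measures how long a block takes, using a monotonic clock.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Simulated slow I/O call (for example, a mail delivery API).
def fake_delivery
  sleep(0.005)
end

def run_serial(items)
  items.each { |_item| fake_delivery }
end

def run_threaded(items, consumer_count)
  queue = SizedQueue.new(consumer_count * 2)
  producer = Thread.new do
    items.each { |item| queue.push(item) }
    consumer_count.times { queue.push(:eoq) }
  end
  consumers = Array.new(consumer_count) do
    Thread.new do
      loop do
        break if queue.pop == :eoq
        fake_delivery
      end
    end
  end
  producer.join
  consumers.each(&:join)
end

items = (1..100).to_a
serial_time = elapsed { run_serial(items) }
threaded_time = elapsed { run_threaded(items, 20) }
puts format("serial: %.3fs, threaded: %.3fs", serial_time, threaded_time)
```

The speedup comes from overlapping waiting time. Threads help here because the work is I/O-bound: while one thread sleeps or waits on the network, the others can run.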
00:05:25.380
But wait, there’s more! If we want to get fancy, we can actually set up multiple queues, which I’m calling a multi-stage pattern. For example, suppose I need to download a batch of files from an FTP server and upload them to Amazon S3. The process will involve downloading each file and then uploading it, which both take some time. By creating multiple queues, we can manage and streamline downloading and uploading separately.
00:05:54.840
In this example, my first queue will contain the file names to download, and I will put the downloaded data into a second queue for uploading. First, I will set up one producer that inserts file names into the download queue. Intermediate threads then act as both consumers and producers: they take file names from the download queue, download each file, and push the data onto the upload queue. Finally, a last set of threads consumes the upload queue.
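A minimal sketch of this two-stage arrangement is shown below. The FTP download and S3 upload are simulated with sleep, and the method name, thread counts, and queue sizes are assumptions chosen for illustration rather than the speaker's implementation.

```ruby
def run_pipeline(file_names, downloaders: 5, uploaders: 5)
  download_queue = SizedQueue.new(downloaders * 2)
  upload_queue = SizedQueue.new(uploaders * 2)
  uploaded = Queue.new # thread-safe record of finished uploads

  # Stage 0: a single producer feeds file names into the first queue.
  producer = Thread.new do
    file_names.each { |name| download_queue.push(name) }
    downloaders.times { download_queue.push(:eoq) }
  end

  # Stage 1: consume file names, "download", and produce data for stage 2.
  download_threads = Array.new(downloaders) do
    Thread.new do
      loop do
        name = download_queue.pop
        break if name == :eoq
        sleep(0.001) # simulated FTP download
        upload_queue.push([name, "contents of #{name}"])
      end
    end
  end

  # Stage 2: consume downloaded data and "upload" it.
  upload_threads = Array.new(uploaders) do
    Thread.new do
      loop do
        item = upload_queue.pop
        break if item == :eoq
        name, _data = item
        sleep(0.001) # simulated S3 upload
        uploaded.push(name)
      end
    end
  end

  producer.join
  download_threads.each(&:join)
  # Close the second stage only after every downloader has finished producing.
  uploaders.times { upload_queue.push(:eoq) }
  upload_threads.each(&:join)

  Array.new(uploaded.size) { uploaded.pop }.sort
end

names = Array.new(30) { |i| "file#{i}.csv" }
puts run_pipeline(names).size
```

The end-of-queue markers for the upload queue are pushed from the main thread after the download threads are joined. If each downloader pushed its own marker instead, an uploader could stop while other downloaders were still producing work.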
00:08:38.520
My implementation for the download and upload processes is similar to what we did with customers. The concept is fundamentally the same, just with an added intermediate layer. You can create as many stages as needed in this pattern to simplify and speed up processing tasks. I have mutexes implemented in my actual application to track the count of items being processed, showing progress across each queue.
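The talk does not show the progress-tracking code, so the following is only one way a mutex-protected counter could be sketched. The ProgressCounter class and the stage names are hypothetical.

```ruby
# Hypothetical helper: counts processed items per stage, safely across threads.
class ProgressCounter
  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  # Only one thread at a time may update the shared hash.
  def increment(stage)
    @mutex.synchronize { @counts[stage] += 1 }
  end

  # Returns a copy so callers never see the hash mid-update.
  def snapshot
    @mutex.synchronize { @counts.dup }
  end
end

progress = ProgressCounter.new

# Ten worker threads each report 100 downloads and 100 uploads.
threads = Array.new(10) do
  Thread.new do
    100.times do
      progress.increment(:downloaded)
      progress.increment(:uploaded)
    end
  end
end
threads.each(&:join)

puts progress.snapshot.inspect
```

A separate reporting thread could call snapshot periodically to print progress for each queue while the workers run.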
00:09:48.840
Now, let’s benchmark this system. In testing with a thousand files, assuming each download and upload takes about 10 milliseconds, I found that the single-threaded approach took 30 seconds, whereas the producer-consumer pattern again finished in half a second. This demonstrates that implementing multi-threading and multiple queues can massively enhance performance.
00:10:49.140
Finally, I suggest that if you want to streamline this process further, consider using Elixir. Elixir has abstractions that can manage these tasks more efficiently. The kind of patterns I’ve outlined here resembles the robust solutions that Elixir offers, which use lightweight processes to manage operations effectively.
00:11:22.980
Please check out Elixir if you're dealing with heavy multithreaded tasks. However, I hope you can apply some of the concepts I presented today to enhance the performance of your Ruby applications.
00:12:06.000
And that's my presentation! Thank you, everyone, for listening. By the way, our company is hiring, and we do offer remote work. So if anyone is interested, feel free to check that out as well. Thank you very much.
00:12:29.220
Now, are there any questions? If you have any inquiries, please feel free to ask.
00:14:51.460
What technology stack are you using at TableCheck? For the front end, we primarily use React, and in some cases, Ember.js. Our back end is mostly a Ruby monolith, along with some Elixir microservices and data ingestion pipelines.
00:14:57.060
Regarding our team size, including Cybergizer, we have about 15 people, with our entire engineering, product, and QA department totaling around 65 and our overall company size reaching about 200 people.
00:15:40.860
One last question about error handling: how do you manage errors or retries in your system? In our application, we utilize retry mechanisms within each consumer and producer thread. If, for example, one email fails to send due to an error, we don't want that error to stop all other threads from processing. We would implement a retry wrapper around the mailer call to ensure resilience.
00:16:17.460
With that approach, you can ensure the remaining emails are processed without being hindered. If you would like more specifics, I can also share examples of our retry logic on GitHub after the presentation.
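The retry logic itself is not shown in the talk. As a hedged sketch of what a retry wrapper around the mailer call might look like, the example below uses a hypothetical DeliveryError, with_retries helper, and flaky_deliver method. Customer 2 fails once and then succeeds, while customer 3 always fails.

```ruby
# Hypothetical error class standing in for a mail API failure.
class DeliveryError < StandardError; end

# Runs the block, retrying on DeliveryError with a growing delay.
# Returns nil instead of raising once the attempts are used up,
# so one bad item cannot kill the consumer thread.
def with_retries(max_attempts: 3, base_delay: 0.01)
  attempts = 0
  begin
    attempts += 1
    yield(attempts)
  rescue DeliveryError => e
    if attempts < max_attempts
      sleep(base_delay * attempts)
      retry
    end
    warn "giving up after #{attempts} attempts: #{e.message}"
    nil
  end
end

# Simulated mailer: customer 2 fails on the first attempt, customer 3 always fails.
def flaky_deliver(customer_id, attempt)
  raise DeliveryError, "timeout for #{customer_id}" if customer_id == 2 && attempt == 1
  raise DeliveryError, "rejected for #{customer_id}" if customer_id == 3
  "sent #{customer_id}"
end

queue = Queue.new
[1, 2, 3, 4].each { |id| queue.push(id) }
2.times { queue.push(:eoq) }
results = Queue.new

consumers = Array.new(2) do
  Thread.new do
    loop do
      id = queue.pop
      break if id == :eoq
      outcome = with_retries { |attempt| flaky_deliver(id, attempt) }
      results.push([id, outcome])
    end
  end
end
consumers.each(&:join)

outcomes = Array.new(results.size) { results.pop }.sort_by(&:first)
puts outcomes.inspect
```

Rescuing inside the consumer loop is the important design choice: an unhandled exception would end that thread, and its share of the queue would never be processed.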
00:16:37.940
Thank you for your time, and once again, if there are no further questions, let’s give a round of applause for Johnny!