00:00:07.360
Hello, everyone! Thank you all for coming. Wow, there are actually quite a lot of you here. My name is Darcy, and I'm excited to talk to you today. I really like monkey hats; they're quite awesome! I have a lot of slides, and I apologize up front, since being up against lunch isn't ideal. If I talk too fast, please yell out and I'll try to slow down. As a disclaimer, when I show code slides (and there will be a lot), don't bother trying to read everything, as I'll move on quickly.
00:00:36.280
Today, I'm going to talk about hacking Sidekiq for fun and profit: how to make Sidekiq do things it doesn't do out of the box. It applies to anyone working with Sidekiq, whether in personal projects or at work, such as running background jobs on Amazon.
00:01:06.320
Before going into detail, let's cover what Sidekiq is and the history of background jobs. Background processing isn't a new concept: it existed long before Rails and dates back to early computer science. If you come from an enterprise background, it's a given; in true Ruby fashion, we've kind of reinvented it over the years. If you're not used to thinking about it, background processing is mostly about work that has lots of side effects and doesn't need to happen while the user waits. Long-running operations and interactions with external APIs, which you don't want blocking the request-response cycle, are ideal candidates. That work still has to happen, but running it in the background keeps it out of the immediate user experience.
00:01:56.000
Ruby has a long history of approaches to background processing. I'll skip through some of it, but there are four main options we've had over the years. The first was BackgroundRB, a client-server system integrated with Rails; it's quite old now. If you remember seeing it on RubyForge, that's a sign you've been using Rails for a while! The next leap was Delayed Job, built on the realization that running a separate daemon with its own custom persistence wasn't a good idea. Instead, it used the database every app already has to store job data.
00:03:18.200
The big step in popularity was Resque, which came out of the early days of GitHub. Resque changed the game by using Redis, which led many applications to adopt Redis as a complementary data store alongside their relational database. It used Redis's built-in data structures and operations to build a fast, easy-to-run system, and it came with a wealth of tooling for managing jobs. Finally, today's topic, Sidekiq, was developed by Mike Perham, who works for a startup called The Clymb. Sidekiq embraced threading, which has historically had a bad reputation in Ruby because of the Global VM Lock (GVL) in MRI.
00:04:55.000
Even so, Sidekiq became the first widely adopted background processing tool in Ruby to run many jobs concurrently in a single process using multiple threads. It does this efficiently while keeping the code readable, using an actor-based concurrency model that simplifies concurrency and avoids the painful parts of threading. Importantly, threading remains very useful even with the Global VM Lock, because so much job code is network-heavy, like writing to databases or calling APIs, and MRI releases the lock while a thread waits on I/O.
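Here is a small sketch of the GVL point using plain Ruby threads, with no Sidekiq involved. `sleep` stands in for a network call, which is an assumption made for illustration. Like blocking I/O, `sleep` lets MRI hand the lock to another thread, so five 0.2-second "jobs" should finish in roughly 0.2 seconds rather than a full second. `fake_network_call` is a hypothetical helper.

```ruby
# Simulated I/O-bound "jobs": each one blocks for 0.2 seconds.
# sleep, like blocking network I/O, releases MRI's Global VM Lock.
def fake_network_call(id)
  sleep 0.2
  id * 2
end

jobs = (1..5).to_a

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# One thread per job, loosely like Sidekiq's pool of processor threads.
threads = jobs.map { |id| Thread.new { fake_network_call(id) } }
results = threads.map(&:value) # Thread#value joins and returns the block's result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "results=#{results.inspect} elapsed=#{elapsed.round(2)}s"
# Run one after another this would take about 1.0s; threaded it is close to 0.2s.
```

The same code with CPU-bound work in place of `sleep` would not speed up under MRI. That limitation is why the talk frames threading's value around network-heavy jobs.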
00:05:56.000
Like Resque, Sidekiq stores jobs and metadata in Redis, and its job format is compatible with Resque's. You can push jobs into Redis using either Sidekiq or Resque and pull them back out as needed; for example, if your application already uses Resque workers, introducing Sidekiq is easy. Because jobs are just JSON in Redis, you can also push them from other languages, such as Go or JavaScript.
00:07:29.560
A major benefit of Sidekiq is that it is feature-rich out of the box. Delayed Job and Resque often need extra gems to handle exceptions or monitor job status, whereas Sidekiq has built-in exception handling. It tracks errors, can report them automatically if configured, and gives developers insight into job execution times and resource usage.
00:08:09.160
When jobs fail, Sidekiq retries them automatically, using exponential backoff so that a failing service isn't overwhelmed with retries. Handling this by hand is a common source of problems. Another important feature is scheduling: you can specify when a job should run. Rather than relying on cron jobs, you can simply create scheduled jobs with Sidekiq, which changes how you think about managing background logic.
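Here is a sketch of the backoff calculation. As far as I know, the formula `count**4 + 15 + rand(30) * (count + 1)` seconds is the one Sidekiq's retry logic used around this era. Treat it as illustrative, since other versions may compute delays differently. `retry_delay` and `min_delay` are hypothetical helper names.

```ruby
# Exponential backoff in the style of Sidekiq's retry delay:
#   delay = count**4 + 15 + rand(30) * (count + 1)   (seconds)
# count is the number of retries already attempted; the random term adds
# jitter so many failing jobs do not all retry at the same instant.
def retry_delay(count, rng: Random.new)
  (count**4) + 15 + (rng.rand(30) * (count + 1))
end

# Lower bound of the delay (jitter = 0), to show the growth curve.
def min_delay(count)
  (count**4) + 15
end

(0..4).each do |count|
  upper = min_delay(count) + 29 * (count + 1) # rand(30) is at most 29
  puts "retry #{count}: between #{min_delay(count)}s and #{upper}s"
end
```

The fourth-power term dominates quickly. Early retries happen within seconds or minutes, while later ones are spread out, giving a broken downstream service time to recover.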
00:09:28.840
Sidekiq groups different types of jobs into named queues, so you can run multiple queues and keep different kinds of work separate. Job handling is extensible through middleware, similar to Rack: middleware wraps each job, so you can change how your code behaves right before it executes. If you want to go further, there is also Sidekiq Pro, a commercial add-on that builds on the open-source version and provides valuable features such as batching and reliable workers.
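The middleware idea can be sketched without Sidekiq itself. The plain-Ruby chain below mimics the shape of Sidekiq server middleware: a `call(worker, job, queue)` method that calls `yield` to run the rest of the chain and, eventually, the job. `MiddlewareChain`, `TimingMiddleware` and `SkipDisabledMiddleware` are hypothetical names, and Sidekiq's real chain class has its own API.

```ruby
# A tiny Rack-style middleware chain. Each entry responds to
# call(worker, job, queue) and yields to run the rest of the chain.
class MiddlewareChain
  def initialize
    @entries = []
  end

  def add(middleware)
    @entries << middleware
    self
  end

  # Invoke each middleware in order, ending with the block (the job itself).
  def invoke(worker, job, queue, &final)
    remaining = @entries.dup
    traverse = lambda do
      if remaining.empty?
        final.call
      else
        remaining.shift.call(worker, job, queue) { traverse.call }
      end
    end
    traverse.call
  end
end

# Hypothetical middleware that logs around every job.
class TimingMiddleware
  def initialize(log)
    @log = log
  end

  def call(_worker, job, queue)
    @log << "start #{job['class']} on #{queue}"
    result = yield
    @log << "finish #{job['class']}"
    result
  end
end

# Hypothetical middleware that skips jobs flagged as disabled.
class SkipDisabledMiddleware
  def call(_worker, job, _queue)
    return :skipped if job['disabled'] # not yielding: the job never runs
    yield
  end
end

log = []
chain = MiddlewareChain.new.add(TimingMiddleware.new(log)).add(SkipDisabledMiddleware.new)
job = { 'class' => 'HardWorker', 'args' => [1] }
result = chain.invoke(nil, job, 'default') { :performed }
p result # => :performed
p log    # => ["start HardWorker on default", "finish HardWorker"]
```

Each middleware decides whether to `yield`. A middleware can therefore short-circuit a job, as the skip middleware does, or wrap it with before-and-after logic, as the timing middleware does.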
00:10:57.520
Next, let's cover how Sidekiq actually operates. When you enqueue a job, the call goes through the Sidekiq client class, which builds a hash containing the job class and its arguments. Sidekiq serializes that hash to JSON, a simple, standard format for what goes into Redis, so every job carries its own metadata, such as the job ID, retry count, and other options.
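Below is a sketch of the payload-building step. It uses an in-memory hash instead of Redis so it runs anywhere. The keys (`class`, `args`, `queue`, `retry`, `jid`, `enqueued_at`) and the `queue:<name>` list name follow Sidekiq's format as I understand it, but the exact set of keys varies between versions. `build_job` is a hypothetical helper.

```ruby
require 'json'
require 'securerandom'
require 'set'

# Build a job payload shaped like the hash Sidekiq's client creates:
# the worker class name, its arguments, a random job id and some metadata.
def build_job(klass, args, queue: 'default', retry_opt: true)
  {
    'class' => klass,
    'args' => args,                # must be JSON-serializable
    'queue' => queue,
    'retry' => retry_opt,
    'jid' => SecureRandom.hex(12), # 24 hex characters
    'enqueued_at' => Time.now.to_f
  }
end

# In-memory stand-ins for Redis so the sketch runs without a server.
redis_lists = Hash.new { |h, k| h[k] = [] }
known_queues = Set.new

job = build_job('HardWorker', ['bob', 5])
payload = JSON.generate(job)

# Roughly what the client does: register the queue name (SADD "queues"),
# then push the JSON onto the head of the queue's list (LPUSH "queue:default").
known_queues << job['queue']
redis_lists["queue:#{job['queue']}"].unshift(payload)

puts payload
```

Because the stored value is plain JSON, a producer written in another language only has to emit the same structure onto the same list.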
00:12:04.320
Jobs are stored in Redis, whose data structures do a lot of the work: queues are lists, and sorted sets scored by timestamp make it easy to schedule jobs. On the processing side, the Sidekiq manager coordinates checking for jobs and invoking them. It works with the other components to keep processing fault-tolerant and efficient.
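This is a toy model of scheduled jobs. `FakeSortedSet` is a hypothetical in-memory stand-in for a Redis sorted set: `zadd` adds a job, and `pop_due` plays the role of a range query by score followed by removal. A fixed `now` keeps the example deterministic. In Sidekiq the score is the job's run-at Unix timestamp, and a poller periodically moves due jobs onto their queues.

```ruby
require 'json'

# Minimal sorted-set stand-in: [score, member] pairs kept ordered by score.
class FakeSortedSet
  def initialize
    @entries = []
  end

  def zadd(score, member)
    @entries << [score, member]
    @entries.sort_by!(&:first)
  end

  # Remove and return every member whose score is <= max_score,
  # like a range query by score followed by removal.
  def pop_due(max_score)
    due, @entries = @entries.partition { |score, _| score <= max_score }
    due.map(&:last)
  end

  def size
    @entries.size
  end
end

schedule = FakeSortedSet.new
queue = []
now = 1_000.0 # fixed "current time" so the example is deterministic

schedule.zadd(now + 60, JSON.generate({ 'class' => 'ReminderWorker', 'args' => [1] }))
schedule.zadd(now - 5,  JSON.generate({ 'class' => 'CleanupWorker', 'args' => [2] }))

# One poller tick: move jobs that are due onto the work queue.
schedule.pop_due(now).each { |job| queue << job }

puts "queued: #{queue.map { |j| JSON.parse(j)['class'] }.inspect}, still scheduled: #{schedule.size}"
```

Ordering by score means the poller only ever looks at the front of the set. Checking for due work is therefore cheap even when many jobs are scheduled far in the future.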
00:13:29.760
A method called `fetch` pulls jobs off the queues, and you can change the strategy it uses, such as strict priority ordering or weighted sampling across queues. The strategy determines how your queues are drained. Whatever the strategy, the fetched job comes with its full details, and its execution can be wrapped in middleware.
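The two strategies can be sketched like this. Strict ordering always checks queues in the same order. Weighted ordering repeats each queue name by its weight, shuffles the list, and removes duplicates, so heavier queues tend to come first without starving lighter ones. As far as I know, this mirrors what Sidekiq's basic fetch does. The method names and the in-memory `lists` hash are assumptions made for illustration.

```ruby
# Strict priority: always the configured order.
def strict_order(queues)
  queues.map(&:first)
end

# Weighted: repeat each name by weight, shuffle, de-duplicate.
def weighted_order(queues, rng: Random.new)
  queues.flat_map { |name, weight| [name] * weight }.shuffle(random: rng).uniq
end

# Pop from the first non-empty queue in the given order
# (what BRPOP does with several keys, minus the blocking).
def fetch(lists, order)
  order.each do |name|
    job = lists[name].pop
    return [name, job] if job
  end
  nil
end

queues = [['critical', 6], ['default', 3], ['low', 1]]
lists = { 'critical' => [], 'default' => ['job-a'], 'low' => ['job-b'] }

p strict_order(queues)               # => ["critical", "default", "low"]
p fetch(lists, strict_order(queues)) # => ["default", "job-a"]

rng = Random.new(42)
counts = Hash.new(0)
1_000.times { counts[weighted_order(queues, rng: rng).first] += 1 }
p counts # "critical" is checked first most often, "low" least often
```

Strict ordering is simple and predictable, but a busy high-priority queue can starve the others. Weighted ordering trades some predictability for fairness.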
00:14:52.520
One useful modification is making jobs unique, which can be done without changing Sidekiq's core. Instead of enqueuing several jobs that do the same thing at the same time, you can write middleware that checks for an existing job and prevents duplicates. This avoids wasted work and saves resources.
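Here is a sketch of a uniqueness check done in client middleware. In Sidekiq, client middleware that returns without yielding stops the job from being pushed. The lock is keyed on a digest of class, queue and arguments. `FakeLockStore` is a hypothetical in-memory stand-in for Redis `SET key value NX EX ttl`, and `UniqueJobsMiddleware` is likewise hypothetical, not a published gem's API.

```ruby
require 'digest'
require 'json'

# Stand-in for Redis "SET key value NX EX ttl": succeeds only if the key is
# absent or its previous lock has expired.
class FakeLockStore
  def initialize(clock)
    @clock = clock
    @locks = {}
  end

  def set_nx(key, value, ttl)
    expires_at = @locks.dig(key, :expires_at)
    return false if expires_at && expires_at > @clock.call
    @locks[key] = { value: value, expires_at: @clock.call + ttl }
    true
  end
end

# Shaped like client middleware: call(worker_class, job, queue, redis_pool).
class UniqueJobsMiddleware
  def initialize(store, ttl: 60)
    @store = store
    @ttl = ttl
  end

  def call(worker_class, job, queue, _redis_pool = nil)
    digest = Digest::SHA256.hexdigest(JSON.generate([worker_class, queue, job['args']]))
    return false unless @store.set_nx("unique:#{digest}", job['jid'], @ttl) # duplicate: don't push
    yield
  end
end

now = 0
store = FakeLockStore.new(-> { now })
middleware = UniqueJobsMiddleware.new(store, ttl: 60)
pushed = []

push = lambda do |args, jid|
  job = { 'class' => 'SyncAccount', 'args' => args, 'jid' => jid }
  middleware.call('SyncAccount', job, 'default') { pushed << jid; job }
end

push.call([42], 'a') # first push goes through
push.call([42], 'b') # duplicate while the lock is held: dropped
push.call([7], 'c')  # different args: allowed
now = 61
push.call([42], 'd') # lock expired: allowed again
p pushed             # => ["a", "c", "d"]
```

A real implementation would also delete the lock when the job finishes, for example from server middleware, rather than relying only on the TTL. The TTL here just guards against a lock outliving a crashed job.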
00:16:20.400
You can go further and have the client record job uniqueness itself, with the ability to set specific locks, debounce jobs, or control retries. Designed this way, the system can handle high concurrency while staying reliable.
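Debouncing can be sketched as follows. Each enqueue for the same key records a fresh token, and a job only does real work if its token is still the latest one. A burst of enqueues then results in a single execution. `Debouncer` is a hypothetical helper, and its hash stands in for a value that would live in Redis. In practice you would likely combine this with a short scheduling delay so the burst has time to settle.

```ruby
# Every enqueue for a key records a new token; only the job holding the
# latest token performs, so a burst collapses to one execution.
class Debouncer
  def initialize
    @latest = {} # stands in for a per-key value stored in Redis
    @counter = 0
  end

  # Called at enqueue time; returns the token to embed in the job's args.
  def register(key)
    @counter += 1
    @latest[key] = @counter
  end

  # Called at perform time.
  def current?(key, token)
    @latest[key] == token
  end
end

debouncer = Debouncer.new
performed = []

# Three quick enqueues for the same record, then the jobs run.
jobs = 3.times.map { |i| { key: 'reindex:42', token: debouncer.register('reindex:42'), n: i } }
jobs.each do |job|
  next unless debouncer.current?(job[:key], job[:token]) # stale job: skip
  performed << job[:n]
end

p performed # => [2]
```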
00:17:44.960
Other patterns help control how many instances of a job are running. You can prioritize and schedule tasks while limiting how many workers operate on a given job type. If certain job types are causing problems, such as legacy code corrupting data, being able to pause their processing or dynamically reallocate resources becomes critical.
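Here is a sketch of a per-job-type throttle with a pause switch. `JobThrottle` is hypothetical. Its counters and paused list are kept in memory behind a `Mutex`, whereas a multi-process deployment would need shared state, for example counters in Redis. A job that cannot acquire a slot would be handed back for requeueing.

```ruby
# Hypothetical per-job-type throttle: caps concurrent runs per class and
# lets operators pause a class entirely.
class JobThrottle
  def initialize(limits)
    @limits = limits # e.g. { 'LegacyImport' => 1 }; classes not listed are unlimited
    @running = Hash.new(0)
    @paused = []
    @mutex = Mutex.new
  end

  def pause(klass)
    @mutex.synchronize { @paused << klass unless @paused.include?(klass) }
  end

  def resume(klass)
    @mutex.synchronize { @paused.delete(klass) }
  end

  # Reserve a slot and return true if the job may start now.
  def acquire(klass)
    @mutex.synchronize do
      return false if @paused.include?(klass)
      limit = @limits[klass]
      return false if limit && @running[klass] >= limit
      @running[klass] += 1
      true
    end
  end

  # Free the slot when the job finishes.
  def release(klass)
    @mutex.synchronize { @running[klass] -= 1 if @running[klass] > 0 }
  end
end

throttle = JobThrottle.new('LegacyImport' => 1)
p throttle.acquire('LegacyImport') # => true
p throttle.acquire('LegacyImport') # => false (limit of 1 reached)
p throttle.acquire('Mailer')       # => true (no limit configured)
throttle.release('LegacyImport')
throttle.pause('LegacyImport')
p throttle.acquire('LegacyImport') # => false (paused)
throttle.resume('LegacyImport')
p throttle.acquire('LegacyImport') # => true
```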
00:18:56.560
You can also define job pipelines in which certain queues take priority over others, giving better control over resources and a smoother workflow. Ideally you shouldn't have to restart processes while jobs are queued just to change this, and customizing how queues are managed gives you that flexibility.
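Here is a sketch of a fetcher that reads queue priorities on every fetch instead of fixing them at boot, so reordering takes effect without a restart. `DynamicFetcher` is hypothetical. Its "config" is a plain hash here, while a real deployment might keep it in a Redis key that operators update at runtime.

```ruby
# Queue order is looked up on every fetch, so changes apply immediately.
class DynamicFetcher
  def initialize(lists, config)
    @lists = lists   # queue name => array of jobs (stand-in for Redis lists)
    @config = config # shared, mutable configuration
  end

  def fetch
    @config.fetch(:order).each do |name|
      job = @lists[name]&.pop
      return [name, job] if job
    end
    nil
  end
end

lists = { 'imports' => ['import-1', 'import-2'], 'emails' => ['email-1'] }
config = { order: %w[imports emails] }
fetcher = DynamicFetcher.new(lists, config)

p fetcher.fetch # => ["imports", "import-2"]

config[:order] = %w[emails imports] # reprioritize while running
p fetcher.fetch # => ["emails", "email-1"]

lists['urgent'] = ['fix-1'] # add a new high-priority queue on the fly
config[:order] = %w[urgent imports emails]
p fetcher.fetch # => ["urgent", "fix-1"]
```

Looking up the order on every fetch costs one extra read per fetch. In exchange, operators can reprioritize or add queues while jobs are still waiting.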
00:20:16.560
Lastly, with all these optimizations in mind, jobs must be idempotent: a job should be safe to run more than once without duplicating work or causing unexpected side effects. Transactions help a lot here. If something fails partway through, the transaction is rolled back, so partial results from a failed attempt can't corrupt the next run.
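Here is a sketch of an idempotent job. The side effect, crediting an account, is guarded by an idempotency key derived from the job's arguments, and the key is recorded in the same step as the effect. A `Mutex` stands in for the database transaction that would wrap both writes in a real app. `Ledger` and `CreditAccountJob` are hypothetical.

```ruby
require 'set'

class Ledger
  attr_reader :balances

  def initialize
    @balances = Hash.new(0)
    @applied = Set.new
    @mutex = Mutex.new # stands in for a database transaction
  end

  # Apply the credit and record the key together; a repeat is a no-op.
  def credit_once(key, account, amount)
    @mutex.synchronize do
      return false if @applied.include?(key)
      @balances[account] += amount
      @applied << key
      true
    end
  end
end

class CreditAccountJob
  def initialize(ledger)
    @ledger = ledger
  end

  # A retried job arrives with the same arguments, so the key is stable.
  def perform(transfer_id, account, amount)
    @ledger.credit_once("transfer:#{transfer_id}", account, amount)
  end
end

ledger = Ledger.new
job = CreditAccountJob.new(ledger)
job.perform(99, 'acct-1', 50)
job.perform(99, 'acct-1', 50) # e.g. a retry after a crash: no double credit
p ledger.balances['acct-1']   # => 50
```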
00:21:39.760
In conclusion, leveraging Sidekiq efficiently requires an understanding of how it's structured. You can exploit its well-designed features to extend functionality and utilize Redis as a powerful backend. By knowing how to experiment and integrate with your application effectively, you can create a robust job processing system. Always be cautious as you extend and adapt Sidekiq to your projects, maintaining clean, understandable code.
00:23:02.240
I asked Mike whether he had any upcoming news to share, and he told me about two significant updates coming soon. The first is Sidekiq 3, which will add a dead job queue for jobs that have exhausted their retries. The second is nested job batches, which will make complex data imports and multi-stage workflows easier to handle.
00:23:29.120
Thank you for your attention. If you have any questions, I would be happy to answer them!
00:34:00.000
Speaker Q&A
00:34:10.000
Audience Question: Can you talk about how Sidekiq handles failure scenarios, particularly with spot instances?
00:34:40.000
Darcy: Sidekiq doesn't inherently track whether an instance has gone down due to a timeout or some other issue. You can write application code to handle retries in that case.
00:34:57.000
Audience Question: Would having a backend web app for analytics help improve Sidekiq interactions?
00:35:21.000
Darcy: Yes, a mountable Sinatra app could integrate with Sidekiq and provide valuable metrics and insights into job processing.
00:35:40.000
Darcy concludes: Debugging in Redis provides a clear view of what’s happening behind the scenes, which is great for resolving issues.
00:36:00.000
Thank you!