Talks

Under The Hood And On The Surface of Sidekiq

wroc_love.rb 2022

00:00:15.599 Hello everyone! I hope you are doing well. It's great to be here today. It's also a little difficult to be a speaker after Andre and Mario, as they always set the bar very high.
00:00:22.199 Today, I'm going to talk about Sidekiq. I'll cover it both on the surface, which means discussing design patterns and good practices that we should apply, as well as the more advanced internals, including how Sidekiq communicates with Redis and how the middleware works.
00:00:42.360 Before I start, let me briefly introduce myself. My name is Paweł Dąbrowski, and I am definitely a Ruby guy. I've been working with Ruby for the last 12 years. I’m proud to be a part of Iron Tonic since the beginning of the company. By day, I work as a CTO, and by night, I write a lot of articles.
00:01:07.860 You can find my writings on my personal website as well as on the Iron Tonic blog. Sometimes, I also post guest articles on the AppSignal blog. If you prefer listening to reading, make sure you check out the "Ruby Rogues" and "My Ruby Story" podcasts, where I have had the pleasure of being invited a few times in the past. Of course, you can find me on social media, specifically on Twitter and GitHub, where I mainly share Ruby-related content.
00:01:46.079 But enough about me; let's talk about Sidekiq. I'm sure that most of you know what Sidekiq is. You probably use it on a daily basis. However, just in case some of you don’t know, Sidekiq is a library for background job processing. It’s open source and free to use, but it also offers paid versions like Pro and Enterprise.
00:02:05.460 These versions come with additional features, such as Enterprise rate limiting. However, they can be quite expensive, especially if you are a solo developer on a side project or in a small startup team. The good news is that you can easily replace those paid features with open-source alternatives.
00:02:28.700 Sidekiq is simple; it allows you to easily get started and build workers. It's also very efficient, enabling you to scale from a few jobs to millions of jobs. The scalability largely depends on how you design your processes. Importantly, Sidekiq works outside of Rails, which is likely great news for many of you!
00:02:51.900 I have split this presentation into two parts, as I mentioned before, so let me start with the first part: Sidekiq on the surface. First, why should you care about good practices at all? For one, you don't want to annoy your customers; imagine accidentally sending thousands of duplicate emails to them. This is something to avoid.
00:03:27.180 Secondly, you don't want to waste your money. Imagine you have a process where you expect some errors and want to retry them. In most cases, you don't want to log these errors into a monitoring service; you should only log the error once, if the job did not succeed. I will also discuss how we can avoid overspending on a monitoring service for errors that do not need to be logged.
00:04:03.720 The last concern is time; we want to debug efficiently and not waste time searching for the parameters we should pass to a job when queuing it manually. Let's start with the importance of naming things properly. As developers, we know this can be challenging. In the past, many of us used the term "worker," but about a year ago, the creator of Sidekiq decided to rename it to "job." Currently, both `Sidekiq::Worker` and `Sidekiq::Job` are available, but in Sidekiq 7 we will only have `Sidekiq::Job`.
00:04:58.620 So, make sure you remember that the next time you upgrade Sidekiq. The second point is to always use proper naming for your classes. Generally, it's good practice to name classes based on their responsibilities. I have seen many poor examples in the past where a developer used a vague term like "mechanism," and after a few weeks it was clear that nobody understood the purpose of the class anymore.
00:05:32.400 For instance, naming a class 'DeleteUsers' is too generic; we don't even know that it's a background job. A more meaningful name would be 'RemoveOutdatedUsersJob.' This way, we can immediately tell that this is a background job that removes users who are no longer active.
00:05:54.000 Similarly, the name 'resume processor' is also quite generic. I highly encourage you to review your codebase to ensure you're using meaningful names. If your jobs are related, creating an additional namespace or putting them in a separate directory is also advisable, as it provides better context. In my examples, I used 'stripe' as the connecting factor for all the jobs, hence the additional namespace.
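The naming and namespacing advice can be sketched like this. The `Stripe` namespace and both class names are illustrative, not code from a real app, and a real job would also `include Sidekiq::Job`:

```ruby
# Related jobs grouped under one namespace for context.
module Stripe
  # The name says exactly what it does and that it is a background job.
  class RemoveOutdatedUsersJob
    def perform
      :removed # ...delete users who are no longer active...
    end
  end

  class SyncInvoicesJob
    def perform
      :synced # ...pull the latest invoices from Stripe...
    end
  end
end
```

Reading `Stripe::RemoveOutdatedUsersJob` in a backtrace or on the dashboard immediately tells you what failed, which `Mechanism` or `ResumeProcessor` never would.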
00:06:30.420 Next, let's discuss keeping parameters simple. This principle is fundamental for background jobs. The point of simple parameters is not just to conserve memory, but also to simplify job creation. With Active Job you can pass hashes, many arguments, or even whole objects, but with plain Sidekiq you often have to implement serializers yourself. While this is possible, it doesn't necessarily mean you should.
00:07:07.460 The intention behind using simple parameters is that they are easier to queue, either manually or automatically. This also makes it easier to find them when looking at the Sidekiq dashboard and logs, provides better isolation for testing, and reduces memory consumption in Redis, which is an in-memory data store.
00:07:36.240 As you can see in this example, instead of passing four arguments directly, you can refactor it by passing just a reference. This means that inside the job, you can pull the actual data, ensuring that you won’t end up with potentially outdated data.
00:08:06.300 For example, if you queue a job with an email, but the email changes while you're processing it, you might end up sending emails to the wrong recipients. To avoid this situation, always pass references; this way, you are guaranteed to pull the most current data.
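The reference-passing idea can be sketched like this; the in-memory `USERS` hash stands in for a database lookup, and the job class is illustrative:

```ruby
# Simulated database table: id => attributes.
USERS = { 1 => { email: "old@example.com" } }

class WelcomeEmailJob
  # In a real app this class would `include Sidekiq::Job`.
  def perform(user_id)
    user = USERS.fetch(user_id) # pull the *current* data at run time
    user[:email]                # ...send the email here...
  end
end

# The email changes after the job was queued...
USERS[1][:email] = "new@example.com"

# ...but because we queued only the id, the job still sees fresh data.
WelcomeEmailJob.new.perform(1) # => "new@example.com"
```

Had we queued the email string itself, the job would have sent to the stale address.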
00:08:21.300 Another key point to remember is that queuing jobs inside transactions is not good practice. First, if the transaction rolls back, the job has already been pushed to Redis and will still run, against data that no longer exists. Secondly, even if you place the enqueue at the very end of the transaction, you cannot be certain the transaction will commit before the job starts executing. This problem has long been present in Sidekiq, but Sidekiq 7 is expected to bring improved solutions.
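The race can be simulated with plain Ruby standing in for the database and Redis; the `transaction` helper and job name are illustrative:

```ruby
JOBS = [] # stands in for the Redis queue

def enqueue(job)
  JOBS << job
end

# Safer pattern: collect jobs during the transaction and push them only
# after a successful commit (Rails' after_commit callbacks do the same).
def transaction
  pending = []
  yield pending # the body records jobs instead of enqueuing them
  # ...COMMIT happens here...
  pending.each { |job| enqueue(job) }
rescue
  # ROLLBACK: the pending jobs are simply dropped, never enqueued
end

transaction { |jobs| jobs << :welcome_email } # writes rows, then enqueues
JOBS # => [:welcome_email]
```

Enqueuing directly to `JOBS` inside the block would make the job visible to workers before the commit, which is exactly the race described above.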
00:08:45.120 Keeping your logic simple applies to all classes, and if our jobs are straightforward, they will naturally be smaller, which is vital for Sidekiq. For example, if we fetch all websites from our database, scrape each title, and save it back, problems occur if the scraping process fails partway through. Instead of one large job, we can split the process into smaller jobs.
00:09:21.420 This makes it easy to retry only the failed job without affecting the whole process. Smaller jobs can also be queued manually from the console if needed, and they allow for faster processing because many of them can run concurrently.
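Splitting one big job into many small ones might look like this; enqueueing is simulated with an array where real code would call `perform_async`:

```ruby
SCRAPE_QUEUE = [] # stands in for the Redis queue

class ScrapeAllWebsitesJob
  def perform(website_ids)
    # Fan out one small job per website, so a single failure
    # retries only that website, not the entire batch.
    website_ids.each { |id| SCRAPE_QUEUE << [ScrapeWebsiteJob, id] }
  end
end

class ScrapeWebsiteJob
  def perform(website_id)
    # ...fetch the page, scrape the title, save it back...
  end
end

ScrapeAllWebsitesJob.new.perform([1, 2, 3])
SCRAPE_QUEUE.length # => 3
```

If website 2 fails, only `ScrapeWebsiteJob` with id 2 is retried, and the three small jobs can run concurrently on different threads.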
00:09:50.880 Smaller jobs bring not only easier retries and simpler tracking; they also make it easier to implement progress tracking, especially if you are using Sidekiq Pro or Enterprise, which offer batching features with useful callbacks.
00:10:22.320 Now, let's discuss the connection to Redis. If Sidekiq and your application each use their own Redis instance, everything is fine. However, if you share a single instance, which is very common, and your application is unaware of Sidekiq, you may run into problems when there are no more connections available.
00:10:35.400 A simple solution is to use a connection pool for both your application and Sidekiq. Fortunately, Sidekiq provides a straightforward interface to manage this. You can use a block to ensure that you won’t run out of available connections.
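Here is a minimal sketch of the block-based pool idea, using a stdlib `Queue` in place of the real `connection_pool` gem that Sidekiq uses (Sidekiq itself exposes its pool as `Sidekiq.redis { |conn| ... }`); all names are illustrative:

```ruby
class TinyPool
  def initialize(size)
    @conns = Queue.new
    size.times { |i| @conns << "conn-#{i}" } # stand-ins for Redis clients
  end

  # Check a connection out, yield it, always put it back: the block form
  # guarantees we never leak connections and never exceed the pool size.
  def with
    conn = @conns.pop # blocks when the pool is exhausted
    yield conn
  ensure
    @conns << conn if conn
  end
end

POOL = TinyPool.new(5)
POOL.with { |conn| conn } # => "conn-0"
```

Because checkout blocks instead of opening a new socket, the application and Sidekiq can share one Redis instance without exceeding its connection limit.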
00:11:01.320 In most cases, avoid using inheritance for background jobs. Inheritance presents a significant challenge: you can only have one parent class, and managing multiple similar jobs tends to produce a large, unmanageable, and hard-to-test parent class.
00:11:08.880 Instead, you can use modules and prepend them to better share responsibilities. Most developers know how to use `include`; `prepend` works similarly, except that it places the module before the class in the ancestor chain, while `include` places it after the class.
00:11:54.300 One practical application here is error handling: you can put your error handling logic in a module, which makes your jobs cleaner. When it comes to errors, it's essential to implement proper retrying mechanisms.
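A minimal sketch of prepend-based error handling; the module and job names are made up:

```ruby
# Because the module sits *before* the class in the ancestor chain,
# its `perform` wraps the job's own `perform` via `super`.
module ErrorHandling
  def perform(*args)
    super
  rescue => e
    "handled: #{e.message}" # in a real job: log or re-raise a wrapper
  end
end

class FlakyJob
  prepend ErrorHandling

  def perform(*)
    raise "boom"
  end
end

FlakyJob.ancestors.first # => ErrorHandling
FlakyJob.new.perform     # => "handled: boom"
```

With `include` this would not work: the class's own `perform` would shadow the module's, and the rescue would never run.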
00:12:32.280 Firstly, ensure you can roll back to the initial state. Smaller jobs allow you to retry without duplicating any operations, like unnecessary API calls that can harm performance. Regarding retrying, sometimes you don't want to log errors in your monitoring service.
00:13:38.760 The idea is that you should only log the error if it's a final error, because the job might still succeed on a later retry. To manage this, create an error wrapper that inherits from StandardError and saves a reference to the original error. This allows you to raise the wrapper without immediately logging it, and report the error only once all retry attempts are exhausted.
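A sketch of such a wrapper; the class name is illustrative:

```ruby
# Wraps any error while keeping a reference to the original,
# so it can be reported later, only when retries are exhausted.
class RetryableError < StandardError
  attr_reader :original

  def initialize(original)
    @original = original
    super("retryable: #{original.message}")
  end
end

result =
  begin
    begin
      raise ArgumentError, "bad input"
    rescue => e
      raise RetryableError.new(e) # re-raise without notifying the monitor
    end
  rescue RetryableError => wrapped
    # A retries-exhausted (death) handler would report wrapped.original here.
    wrapped
  end

result.original.class # => ArgumentError
```

A monitoring integration can then ignore `RetryableError` during retries and unwrap `original` only in the final failure handler, which keeps error-tracking costs down.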
00:14:27.960 When it comes to logging, focus on capturing meaningful information. Ensure that logs are saved properly, and consider logging job execution details. As of Sidekiq 6, you need to manage log redirection yourself, so make sure to save job logs to a file for future reference.
00:15:13.740 You can even use middleware to automatically log job arguments, which is helpful for debugging. Grouping log lines by the unique job identifier (the JID) also makes it much easier to trace everything that happened to a single job.
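Such a middleware could look roughly like this; the `call` signature mirrors Sidekiq's server middleware, but here we invoke it by hand with a fake payload instead of registering it via `Sidekiq.configure_server`:

```ruby
LOG = [] # stands in for a real logger

class ArgLoggingMiddleware
  # job_payload is the decoded job hash, including "jid" and "args".
  def call(_job_instance, job_payload, queue)
    LOG << "#{queue} #{job_payload['jid']} args=#{job_payload['args'].inspect}"
    yield # pass control down the chain; not yielding would skip the job
  end
end

payload = { "jid" => "abc123", "args" => [42, "en"] }
ArgLoggingMiddleware.new.call(nil, payload, "default") { :performed }
LOG.first # => "default abc123 args=[42, \"en\"]"
```

Every log line now starts with the queue and the JID, so grepping for one identifier reconstructs the whole life of a job.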
00:15:54.540 Now, let’s shift gears to discuss the Sidekiq internals. Why should we care about how Sidekiq works under the hood? Understanding the internals not only helps us extend Sidekiq more effectively, but it also aids in efficient debugging.
00:16:31.020 Additionally, I firmly believe that by reviewing others' code, we become better developers. My approach to learning tools has evolved: I initially learn enough to build something functional, gradually improve it for production use, and finally delve into good practices for refactoring.
00:17:04.320 It’s crucial to consult documentation at every step of your learning journey, regardless of your level of expertise. As previously mentioned, Sidekiq heavily relies on Redis, an open-source in-memory data store based on key-value pairs.
00:17:49.200 Redis supports various data types, such as sets, sorted sets, lists, and hashes. The use of Redis by Sidekiq can be divided into two main steps: adding a job to the queue and picking a job from the queue.
00:18:15.840 Let's start with adding a job to the queue. The typical flow involves passing parameters, validating the data, assigning default parameters, executing middleware, and finally verifying the JSON.
00:18:58.080 As I mentioned earlier, passing simple parameters is crucial; complex objects might not pass JSON verification, resulting in errors, especially in Sidekiq 7. Once the JSON is verified, we push the job to its queue in Redis.
00:19:32.700 Breaking down each step in detail, the job call translates to a hash. If you're planning to execute jobs later, extra arguments will be added to mark when the job should be executed. Sidekiq then validates the job and assigns default parameters before invoking the middleware.
00:20:17.160 This middleware allows you to reject jobs from being saved in Redis. The subsequent steps involve verifying JSON integrity and ultimately pushing the job data into Redis.
00:20:54.720 When saving data to Redis, a job can be performed immediately or later. For delayed jobs, Sidekiq uses the ZADD command to add the payload to a sorted set. The score represents the execution time, ensuring jobs are executed in the correct order.
00:21:38.880 For jobs that should execute immediately, Sidekiq registers the queue name in a set with the SADD command and pushes the payload onto the queue with LPUSH. When picking jobs from the queue, Sidekiq employs two mechanisms: the poller and the manager.
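The two push paths can be simulated with plain Ruby standing in for Redis; the key layout and helper names are illustrative:

```ruby
QUEUES   = Hash.new { |h, k| h[k] = [] } # queue name => list (LPUSH/BRPOP)
SCHEDULE = []                            # sorted set: [score, payload] pairs

def push(payload, queue: "default", at: nil)
  if at
    # ZADD schedule <run-at-as-float> <payload>
    SCHEDULE << [at.to_f, payload]
    SCHEDULE.sort_by!(&:first)           # sorted sets keep score order
  else
    # SADD queues <queue>; LPUSH queue:<queue> <payload>
    QUEUES[queue].unshift(payload)
  end
end

push("job-1")                            # runs as soon as a worker is free
push("job-2", at: Time.now + 3600)       # runs in roughly an hour
[QUEUES["default"], SCHEDULE.size]       # => [["job-1"], 1]
```

The float score is what lets Redis keep scheduled jobs ordered by execution time without any extra bookkeeping.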
00:22:10.920 The poller retrieves scheduled jobs from Redis when it's time for their execution and pushes them back onto their queues; the manager then translates the payload into a job instance and calls its perform method.
00:22:52.740 The poller uses the ZRANGEBYSCORE command in Redis to take the jobs that are due. The score is the execution time converted to a float, which makes it easy to retrieve every job that should be executed by the current time.
00:23:20.520 Once jobs are back on their queues, the manager uses the BRPOP command, a blocking list pop that pulls jobs from the queues and executes them. If no jobs are available, it simply waits for new ones.
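The poller side can be sketched the same way, with Ruby arrays standing in for the Redis sorted set and list; scores and payloads are made up:

```ruby
# Sorted set of scheduled jobs: [score (run-at as float), payload].
SCHEDULE = [[100.0, "due-job"], [9_999_999_999.0, "future-job"]]
QUEUE    = [] # the list a processor would BRPOP from

def poll(now)
  # ZRANGEBYSCORE schedule -inf <now>: every job that is due by `now`.
  due = SCHEDULE.select { |score, _| score <= now }
  due.each do |entry|
    SCHEDULE.delete(entry)    # ZREM: remove it from the schedule
    QUEUE.unshift(entry.last) # LPUSH: hand it to its queue
  end
end

poll(200.0)
QUEUE                # => ["due-job"]
SCHEDULE.map(&:last) # => ["future-job"]
```

Only `due-job` crosses over; `future-job` stays in the schedule until a later poll finds its score below the current time.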
00:24:01.740 The manager flow involves decoding the payload, verifying its integrity, invoking the middleware for rejection opportunities, and finally executing the job by instantiating its class and calling its perform method.
00:24:51.180 Lastly, let's touch on the Sidekiq dashboard: a simple Rack application that displays job views in an easy-to-read format. The Pro and Enterprise versions are also gems; you buy a license and receive credentials that go into your Gemfile.
00:25:45.720 That covers Sidekiq for my talk. If you have any questions, please feel free to ask!
00:26:01.680 Thank you!
00:26:06.960 I have a question. You had a slide where a job was scheduled inside a transaction and you mentioned a race condition was present, yet you also noted that the new version of Sidekiq fixes this race condition.
00:26:44.400 Just let me find it...yes, this one! I’m curious, how does it handle that?
00:27:24.540 Exactly how will Sidekiq 7 deal with that problem?
00:27:46.440 To be honest, I don't know yet. The pull request is still open, and the release date has not been confirmed. Nevertheless, it has been a critical focus for the team.
00:28:49.680 So to clarify, it will ensure that the job does not execute until after the transaction commits successfully, is that right?
00:29:28.740 Yes, indeed! The job will wait until the transaction commits in the database.
00:30:02.280 Thank you for the insightful presentation!
00:30:39.960 Regarding keeping jobs simple, I have observed conflicting recommendations.
00:31:03.300 One suggests always using jobs only as wrappers for your service object call, while another recommends treating the job as a Ruby object and performing logic there.
00:31:45.480 Do you have a preference for either approach?
00:32:01.920 It all depends on the situation. I typically prefer performing basic checks before calling a service object to avoid duplication.
00:32:53.520 Though using service objects is common in practice, especially with Sidekiq's delayed extension. While the delay feature will be removed in Sidekiq 7, you can expect a separate gem offering similar capabilities.
00:33:41.520 Regarding scheduling numerous jobs at once, I generally recommend utilizing the batching feature, which allows for batch processing of thousands of jobs and you can also receive callbacks once they are all finished.
00:34:35.520 Implementing such techniques is crucial for scalability and for managing job load.
00:35:15.720 Lastly, if you find yourself in a situation where multiple jobs consume messages from a queue and handle the same record simultaneously, it's important to manage job execution properly, for example by ensuring that only one job works on a given record at a time.
00:36:06.590 Ensuring you maintain consistency prevents unnecessary duplication and keeps your jobs streamlined.
00:36:18.140 That wraps up my presentation, thank you for your time!