Talks

Inside Active Job

by Jerry D'Antonio

In this video titled "Inside Active Job," Jerry D'Antonio presents a deep dive into the internals of Active Job, a framework that supports performant, thread-safe, asynchronous job processing in Ruby on Rails. As part of the RailsConf 2016 event, he explains how Active Job unifies several background job processors and facilitates the execution of tasks without blocking web requests.

Key Points Discussed:

  • Introduction to Active Job:
    • Active Job provides an abstraction layer over various background job processors (e.g., Sidekiq, Delayed Job), enabling Rails developers to swap processors seamlessly.
  • Purpose and Advantages:
    • It allows for the scheduling of tasks that can be run later, thus enhancing application performance by freeing up web request threads.
    • Background job processing is essential for handling long-running tasks, like sending emails, without impacting user experience.
  • Implementation Details:
    • Active Job utilizes job metadata to store essential information about enqueued jobs, allowing for both immediate and scheduled task execution.
    • The discussion addresses key terms such as queues, job serialization, and scheduling.
  • Concurrency in Ruby:
    • D'Antonio discusses Ruby's concurrency model and how blocking I/O tasks work effectively in a multi-threaded or forked environment, which is critical for performance.
  • Building a Custom Job Processor:
    • The presentation includes a practical example where attendees will build a simple asynchronous backend job processor using thread pools—highlighting the role of queue adapters, job runners, and the need for persistence.
    • The basic class structure for job processing is demonstrated, including methods for enqueueing jobs.
  • Key Methodologies:
    • Emphasis on the serialize and deserialize methods within the job metadata, which are essential for maintaining job integrity through its various states.
  • Testing and Development vs. Production:
    • The custom processor discussed is better suited to testing and development than production, due to its lack of job data persistence.
  • Conclusion:
    • Active Job enables Rails applications to manage and execute background tasks efficiently, with the flexibility of changing job processors as application needs evolve.

Overall, the talk provides valuable insights into how Ruby on Rails facilitates background job processing through Active Job, making it easier for developers to handle asynchronous job execution without diving into the specifics of each processor.

00:00:09.500 It's time to get started. This is Inside Active Job; this is the Beyond the Magic track.
00:00:15.150 My name is Jerry D'Antonio, and let's get started. So first, a tiny bit about me.
00:00:20.490 I live and work in Akron, Ohio. If you're an NBA fan, you've probably heard of Akron. There's a local kid who is a basketball player who's done pretty well for himself in the NBA.
00:00:26.789 I went to school about 10 minutes away from where I live, and I work at Test Double. You may have heard of Test Double; Justin Searls, one of our founders, was on the program committee for RailsConf this year.
00:00:34.230 He's speaking tomorrow. Test Double's mission is to improve the way the world builds software. I know that sounds pretty audacious, but we truly believe that every programmer has it in themselves to do that, and I believe every person here has it in themselves to do that, and that's why you're here.
00:00:40.050 So I definitely keep great company at work, and I'm very proud to represent Test Double here. Personally, my biggest claim to fame lately is that I created a Ruby gem called Concurrent Ruby.
00:00:52.199 You may have heard of Concurrent Ruby because it started to be used in some very well-known projects. For example, Rails uses Concurrent Ruby as a dependency of Action Cable in Rails 4 and Rails 5. It's also used by Sprockets, as well as gems like Sidekiq and Sucker Punch. It's also used by the Elasticsearch and Logstash utilities, and by the Microsoft Azure Ruby tools.
00:01:09.330 So much of what I'm going to be talking about today draws on that, but this is not going to be a sales pitch for that. This is going to be about Active Job and Rails. Because this is a Beyond the Magic track, this is not going to be an introductory topic. This is going to be a deep dive into the internals of Active Job.
00:01:35.369 To do this, I need to make a couple of assumptions. I'm assuming that if you're here, you've used Active Job, probably in production. You have used one of the supported job processors and have some understanding of concurrency and parallelism. If you need a better introduction to Active Job itself, I highly recommend the Rails Guides.
00:01:55.500 The Rails Guides are excellent and provide a lot of great information. For concurrency within Ruby itself, a shameless plug: I gave a presentation last fall at RubyConf called 'Everything You Know About the GIL is Wrong.' That video is available on YouTube and works as an introduction to the topic.
00:02:13.810 So with that, let's jump into what Active Job is. First, I need to briefly remind us of what it is and where it came from. According to the Rails Guides, the definition of Active Job is as follows: Active Job is a framework for declaring jobs and making them run on a variety of queuing backends. Jobs can be everything from regularly scheduled cleanups to billing charges, mailings, or anything that can be chopped up into small units of work and run in parallel.
00:02:40.599 A couple of key terms there: it's a framework. We're going to talk more about this, but asynchronous job processing pre-existed Active Job. There were things like Backburner, Delayed Job, Que, Resque, Sidekiq, and Sucker Punch; many of these existed before Active Job was created. Active Job came along as a way of unifying them.
00:03:12.340 Active Job helps us schedule tasks to be run later, which was briefly mentioned this morning in the keynote. When you don't want to block currently running web requests and you want something to happen later, you use Active Job to make that occur. This can happen as soon as possible, as in "I'll get to this as soon as I can," or by scheduling it for a specific date and time.
00:03:43.209 This also allows us to support full parallelism. That's why some of the job processors are multi-threaded. However, many of them actually fork. I'll talk about forking later and how it can run multiple processes on a single machine and scale across multiple processors, and in some cases across multiple machines.
00:04:01.239 The impetus for Active Job is that background job processors exist to solve a problem. We have these long-running tasks that we don't want to block web requests. We want to respond back to our user and get the page rendered for them, while some of these tasks can then occur afterward. For example, if I'm sending an email, the email takes time; it's asynchronous to begin with.
00:04:24.820 Why should I block the request to ensure that an email is sent when I can send the response back and have that processed shortly thereafter? Active Job supports that, and the processors behind that support it as well.
00:04:54.410 Active Job is important because all of these job processors existed, and each one, even if they did virtually the same thing, had slightly different capabilities and approached it differently. They all saw the same problem, right? Active Job was created to provide a common abstraction layer over those processors that allows the Rails developer not to worry about the specific implementation. This sounds familiar; this is not dissimilar from what Active Record does.
00:05:40.560 Relational databases existed, and Active Record created an abstraction layer over that, which allows us to use different databases quite freely, switching between different databases if necessary. Most importantly, we can use different databases in test, development, and production environments.
00:06:14.710 Active Job does the same thing; it provides that abstraction layer that allows us to choose different processors and change different processors as our needs evolve, while running different processors in test, development, and production—all while supporting the existing tools that people are already using.
00:06:47.730 So because we're looking at some code, I want to briefly remind us what the code looks like for Active Job before jumping into the internals. This is a simple job class, and this should look familiar to everyone. The important part is that this class extends Active Job Base and that it has a perform method.
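For reference, a minimal job class along those lines might look like the sketch below (the class name, mailer, and argument are illustrative, not taken from the slides):

```ruby
class WelcomeEmailJob < ActiveJob::Base
  queue_as :default

  # Active Job calls #perform on an instance of this class when the job runs.
  def perform(user_id)
    UserMailer.welcome(user_id).deliver_now # hypothetical mailer doing the slow work
  end
end
```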
00:07:11.790 Most of what Active Job does is encapsulated in the Active Job Base class, which we will eventually look at in detail. The perform method is called on your object of this class when the job actually runs, and we will look at those details shortly. As a reminder, the way we configure our backend is with the config.active_job.queue_adapter option in our application.rb.
00:07:49.470 Now, Inside Job is what I'm going to call the adapter we are going to build here: a real adapter that is functional. The adapters supported by Rails each have a symbol, following normal Rails conventions, that maps the adapter name to the value you set the backend to.
00:08:02.600 So if Inside Job existed as a supported adapter in Rails, this would be how you would set that up. That's how you can configure which backend you want to use.
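If InsideJob really were a supported backend, wiring it up would follow that convention, roughly like this (MyApp and the :inside_job symbol are assumptions for illustration):

```ruby
# config/application.rb
module MyApp
  class Application < Rails::Application
    # The symbol maps, by convention, to ActiveJob::QueueAdapters::InsideJobAdapter.
    config.active_job.queue_adapter = :inside_job
  end
end
```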
00:08:20.220 Then later, when you want to actually do something, you call the perform_later method on your class, passing it one or more parameters. That should look familiar to everybody, and if you want to schedule the job for a certain time, you can use the set method to specify when.
00:08:47.140 There are a number of different ways you can do that. So that's just a reminder of what we see on the front-end of Active Job, but what we're going to talk about is what goes on behind the scenes when you make this perform_later call.
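As a quick refresher, those front-end calls look like this (the job class and argument are the illustrative ones from above):

```ruby
# Run as soon as a worker is free:
WelcomeEmailJob.perform_later("user_42")

# Or schedule it for later via #set:
WelcomeEmailJob.set(wait: 1.week).perform_later("user_42")
WelcomeEmailJob.set(wait_until: Date.tomorrow.noon).perform_later("user_42")
```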
00:08:56.850 So, like I said, we're going to build an asynchronous backend here during this presentation—one that actually works, is functional, and meets the minimal requirements of Active Job.
00:09:38.100 A couple of things to give a sense of where we're coming from: as I mentioned, there are multi-threaded adapters and forked adapters. Multi-threaded adapters run your job in the same process as the Rails app itself. The advantage of that is that they can be very fast, and you don't have to spawn separate processes to manage.
00:10:01.360 We all know that MRI Ruby does have some constraints with regard to concurrency, but it's not as bad as most people think. That's something I talked about at RubyConf last fall. MRI Ruby is very good at multi-threaded operations when you're doing blocking I/O, and most of the tasks that create these background jobs are doing blocking I/O.
00:10:30.700 They are sending emails, posting things to other APIs, and since they tend to do blocking I/O, they work very well with Ruby's concurrency model. So a threaded backend is simpler because you don't have to manage separate processes. However, many processors fork instead, spawning separate worker processes that provide full parallelism but require active management of those processes.
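A tiny illustration of that point, assuming plain MRI and placeholder URLs: because the GVL is released while a thread waits on the network, these requests overlap instead of running one after another.

```ruby
require "net/http"

urls = %w[https://example.com https://example.org https://example.net]

threads = urls.map do |url|
  # Each thread spends most of its time blocked on I/O, so MRI can run them concurrently.
  Thread.new { Net::HTTP.get(URI(url)) }
end
threads.each(&:join)
```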
00:10:59.540 For the one we’re going to build here, we’re just going to do a multi-threaded one because I can easily do that, and it will demonstrate all the things we want to do. We’re going to use thread pools for that. Most job processors will also persist the job data into some sort of datastore like Redis.
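Here is a minimal sketch of the kind of thread pool he means, using concurrent-ruby (the pool size and the work posted are arbitrary):

```ruby
require "concurrent"

pool = Concurrent::FixedThreadPool.new(5) # five reusable worker threads

10.times do |i|
  # Work posted to the pool runs on whichever worker thread is free next.
  pool.post { puts "job #{i} ran on thread #{Thread.current.object_id}" }
end

pool.shutdown
pool.wait_for_termination
```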
00:11:43.580 The reason for doing that is that if your Rails process exits, either on purpose or by crashing, if all of your job data is in memory, you can lose it, and those jobs will never run. Generally speaking, for production, you want to have a job processor that does store the job data in some sort of external datastore to allow it to persist beyond restarts.
00:12:17.160 We're not going to do that here mainly for simplicity. I want to demonstrate what goes on in Active Job.
00:12:20.100 We don't have to go to that level of effort, so our job processor will not persist jobs to a datastore. That makes it good for testing and development, but we wouldn't want to use what I'm going to build here today in production.
00:12:39.289 In order to do this, we need three pieces. The first one is Active Job Core, which is provided by Active Job itself. It is the job metadata, and I'll talk about that more, but it is the thing that defines the job that will need to be performed later on. It's probably the most important piece of all this because it is the glue that binds everything else together.
00:13:03.360 The other two pieces are the queue adapter and the job runner. Remember, Active Job came about after the job runners, so the job runner is independent and provides the asynchronous behavior. The job runner exists separately; Sidekiq is a separate thing, and Sucker Punch is a separate thing. You install those separately.
00:13:23.379 The queue adapter’s only responsibility is to marshal the job data into the asynchronous processor. The job processor provides asynchronous behavior, and the queue adapter marshals between your Rails app and the job processor.
00:13:43.990 For all the job runners supported by Rails, the queue adapter is actually in the Rails codebase. If you go to the Rails repository on GitHub and look in Active Job, you will see that there is a folder of queue adapters and one queue adapter for each of the processors that Rails supports. There is also a set of limited tests as part of the Rails codebase that are run against every one of these job processors on every commit, and they ensure that all of the supported job processors meet the minimum requirements of Active Job.
00:14:31.080 The one we're going to build today will actually pass that test suite and run. So strictly speaking, the Rails core team has responsibility for the queue adapters and for that test suite, but I know from experience that the people who create the job runners themselves work very closely with Rails to make sure that those adapters are up-to-date and work well with the processors.
00:14:50.359 Now let's jump in and talk about the Active Job metadata class. This is the glue that ties it all together, and it is not obvious. This is the job metadata object that represents all of the information about the job you've posted. It carries the information needed to run the job, along with things like the queue and priority information.
00:15:17.059 It carries with it all of that metadata. This object provides two very important methods, which we'll discuss more in a minute, but they are the serialize and deserialize methods. These methods are very critical, and I'll talk about them shortly.
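Concretely, the round trip through those two methods looks roughly like this (WelcomeEmailJob is the illustrative class from earlier; the exact hash keys vary by Rails version):

```ruby
payload = WelcomeEmailJob.new("user_42").serialize
# => { "job_class" => "WelcomeEmailJob", "job_id" => "...",
#      "queue_name" => "default", "arguments" => ["user_42"], ... }

# Later, on a worker thread, the processor rebuilds the job and runs #perform:
ActiveJob::Base.execute(payload)
```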
00:15:50.659 The job metadata itself has several attributes that we will look at and use internally within Active Job. These are not things that a Rails developer should have to know about, but they are very important within Active Job.
00:16:17.779 One of the attributes is the queue name. Most of us should be familiar with the concept of specifying which queue a job should run against when creating it. If you don't specify one, it defaults to the default queue.
00:16:41.860 Several job processors support prioritization, where higher priority jobs run first, but we are not going to support prioritization in ours. Additionally, if you schedule a job to run at a specific time, you will get an attribute called scheduled_at, which tells you when to execute it. Later we will look at how scheduled jobs are processed.
00:17:07.919 The job ID is internal to Rails and is a unique ID specific to the Rails instance that identifies each job. Rails uses that within Active Job to track each one of these jobs. The provider job ID is one that you can provide within your job processor, allowing you to have your own kind of ID system that makes sense for you.
00:17:21.459 We're not going to use a provider job ID today because it’s not essential, but it is available, and it's something we could add.
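Reading those attributes off an enqueued job looks something like this (the queue name and argument are illustrative; the exact type of scheduled_at differs between Rails versions):

```ruby
job = WelcomeEmailJob.set(queue: :mailers, wait: 10.minutes).perform_later("user_42")

job.queue_name       # => "mailers"
job.scheduled_at     # => roughly 10 minutes from now
job.job_id           # => UUID generated by Active Job for this job
job.provider_job_id  # => set by backends that assign their own IDs; nil in ours
```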
00:17:43.180 Now let's actually build a queue adapter. I'm going to work from the outside in; the queue adapter is responsible for marshaling data into the job processor. The job processor is the more interesting piece, and we will look at it in a moment.
00:18:01.779 When we start with the queue adapter, we'll pseudo-TDD this. Most of the queue adapters were written when Active Job was created because the job processors already existed, and they had to handle that marshaling.
00:18:28.350 In our case, because we don’t have a queue adapter or processor yet, we can decide what the API is going to look like. Within our queue adapter, we only need two very simple methods: one is enqueue and the other is enqueue_at.
00:18:49.750 The enqueue method takes that job object we looked at a minute ago and marshals that into our processor, and the enqueue_at method takes the job and a timestamp and marshals that into our job processor.
00:19:01.360 So notice that in this case, I've decided to make these very simple. We're going to create a class called Inside Job that has class methods called enqueue and enqueue_at.
00:19:28.620 These methods will take the serialized job along with the queue name, and in the case of enqueue_at, we will also pass the timestamp. This is not very complicated!
00:19:46.610 These are class-level methods we're calling on this class, and I did that to emphasize the stateless nature of this implementation. This is very critical to understand.
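Putting the pieces together, a sketch of the adapter and the processor it marshals into might look like this (class names follow the talk; InsideJob is the in-process backend being built, not a published gem, and details such as error handling and shutdown are omitted):

```ruby
require "concurrent"

class InsideJob
  POOL = Concurrent::FixedThreadPool.new(5) # worker threads shared by the whole process

  # Run the serialized job metadata as soon as a worker thread is free.
  def self.enqueue(job_data, queue: "default")
    POOL.post { ActiveJob::Base.execute(job_data) }
  end

  # Wait until the requested time, then hand the job to the same pool.
  def self.enqueue_at(job_data, timestamp, queue: "default")
    delay = [timestamp - Time.now.to_f, 0].max
    Concurrent::ScheduledTask.execute(delay) { enqueue(job_data, queue: queue) }
  end
end

module ActiveJob
  module QueueAdapters
    # The adapter's only responsibility is to marshal the job metadata into the processor above.
    class InsideJobAdapter
      def enqueue(job)
        InsideJob.enqueue(job.serialize, queue: job.queue_name)
      end

      def enqueue_at(job, timestamp)
        InsideJob.enqueue_at(job.serialize, timestamp, queue: job.queue_name)
      end
    end
  end
end
```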