Rack Middleware as a General Purpose Abstraction

00:00:14.360 Okay, so I'm going to be talking about middleware as a general abstraction. I'm really excited to give this talk because I wanted to discuss middleware for a long time. I've used it extensively, and I believe it hasn't been widely adopted in the Ruby community, despite offering a lot of value. Hopefully, you can take something away from this presentation. My name is Mitchell Hashimoto, and that's my Gravatar, Twitter handle, and the username I use on GitHub and everywhere else. I created Vagrant, so if you use Vagrant, that's me. I would also like to give a quick shout-out to Engine Yard for the opportunity to be here and speak, as well as to my company for allowing me to travel extensively each year for conferences.

00:00:46.530 Diving straight into the problem I want to address: the issue of large classes. This has been mentioned a few times today in previous talks. I’ll provide a real example—specifically, the canonical example of a large class that always seems to be the user model in any Rails application. This is because users are integral to dynamic web applications.

00:01:14.549 I recently considered a high-profile Rails site, Diaspora, a distributed social network. I scrutinized their user class, which is pretty extensive. You can’t read any of this due to the tiny font, but I'll page through the class so you can see its length. Here’s page one... page two... page three... page four... page five... and I could continue, but at this point, we are about thirty percent through this class. This isn’t an isolated case; I have worked as a Rails developer for four years, and I have seen this happen with many models. There are several problems with having such a high line count in a single class. First and foremost, it becomes unapproachable. If you hire a developer to work on Diaspora and want to add an unusual feature to friend requests, it may take them a couple of days just to feel comfortable enough to modify the code, even if tests are in place. There’s just so much to manage and understand within that namespace.

00:02:20.040 It’s very unclear what each method actually needs when it’s called. If you change one method, will it create a ripple effect that breaks something else? You might argue that your tests cover this, but let's be honest, tests can miss edge cases. It’s better to address the problem before it becomes an issue. Additionally, testing a class like this can be quite difficult because so much state needs to be set up before you can even begin testing the methods. While it’s possible to test, if a failure occurs, it’s challenging to identify which state combination led to that failure and what the underlying issue is. Those are the major problems associated with large classes, and this issue is pervasive throughout the community.

00:03:40.260 Let me propose a potential solution. This is an idea I've seen gaining traction in the Ruby community over the past couple of years: mix-ins. I sought out a real-world example to illustrate this. I found a popular Rails authentication library called 'Devise.' This is their session-based class, which is essentially a user class. It’s concise—roughly thirty lines of code. However, the effective code that accomplishes anything is minimal since it primarily consists of including another module.

00:04:01.680 Before I delve into why this approach doesn’t work, I want to clarify that modules are fantastic when used correctly. They are a great feature of Ruby, and I utilize them frequently. However, for the problem I’m highlighting with large classes, I don’t think they represent the full solution, at least not by themselves. The first issue I have is that modules are intended to be reusable blocks of code. For example, a canonical instance would be the Enumerable module from Ruby’s standard library, which you can include in a class, and suddenly that class benefits from powerful methods for collection manipulation. In the case of these Rails classes, very few of the modules are actively reused elsewhere. So, the process of extracting these methods into modules merely results in a reorganization of code. Sure, you can take pride in having a lower line count and organized files, but does that genuinely help with the earlier issues I raised? No, it’s still unapproachable and still difficult to test.
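As a small illustration of the reusable kind of module described here, the sketch below includes Enumerable in a class. The `Playlist` class and its data are invented for this example; the only thing Enumerable asks of the including class is an `each` method.

```ruby
# Hypothetical class, invented for illustration.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable only requires that the including class define #each.
  def each(&block)
    @songs.each(&block)
    self
  end
end

playlist = Playlist.new("intro", "bridge", "coda")

# These methods all come from Enumerable, not from Playlist itself.
short_titles = playlist.select { |title| title.length < 6 }
upcased = playlist.map(&:upcase)
```

The same module works unchanged in any other class that defines `each`, which is the kind of reuse the Rails modules discussed above rarely get.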

00:04:58.930 Moreover, the real problems still persist. A second major issue is that the order in which these modules are included can be significant, and this isn’t always clear. The comments in the class body are the sole indicators that the order matters. When you think of reusability, there’s a risk that someone else will try to mix in modules and introduce them in the wrong order, which will cause the code to break. Thus, you’re setting yourself up for failure right there. As it relates to testing, while you might think it’s straightforward to test these modules—by creating a fake class where you include them—this process gets messy fast. Many of these modules have dependencies on Active Record and won't function correctly unless certain conditions are met in the class.

00:06:13.000 Ruby modules give you no way to enforce that one module must be included before another can be used, for example that `existence` has to be included in order to use persistence. This makes it increasingly difficult to determine what you are testing and why a failure occurs. Essentially, if refactoring into modules hasn’t solved the underlying problem of large classes, you’ve wasted your time.
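The ordering hazard can be sketched with two made-up modules. `Persistence`, `Validation`, `GoodUser`, and `BadUser` are hypothetical names, not taken from Devise or Diaspora. Nothing in the language stops the second class from including them in the wrong order; the validation step is simply skipped.

```ruby
module Persistence
  def save
    log << :persisted
  end
end

module Validation
  # Silently relies on another module further up the ancestor chain
  # providing #save; nothing declares or enforces that dependency.
  def save
    log << :validated
    super
  end
end

class GoodUser
  include Persistence
  include Validation # included last, so Validation#save runs first

  def log
    @log ||= []
  end
end

class BadUser
  include Validation
  include Persistence # Persistence#save now shadows Validation#save

  def log
    @log ||= []
  end
end

good = GoodUser.new
good.save

bad = BadUser.new
bad.save # no error is raised, but validation never runs
```

Swapping two `include` lines changes behavior without raising an error, which is why the order ends up documented only in comments.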

00:07:03.450 Before I present what I believe to be a solution to this problem, let’s review what function composition is. It’s a fundamental concept that I want everyone to understand to avoid confusion later in this talk. Here’s an example of two simple methods: `f` and `g`. The method `f` adds one to a value and returns it, while `g` multiplies its argument by two. When composed, as in `f(g(1))`, you’re effectively doubling 1 and then adding 1 to it, resulting in 3. Conversely, `g(f(1))` first adds 1, giving 2, and then doubles that to give 4. The key is that when composing functions, their order and dependencies are clear. This is exactly what we want.
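Written out in Ruby, the two methods and both compositions look roughly like this. The variable names `f_after_g` and `g_after_f` are only labels chosen for the sketch.

```ruby
# f adds one to its argument.
def f(x)
  x + 1
end

# g multiplies its argument by two.
def g(x)
  x * 2
end

f_after_g = f(g(1)) # g runs first: (1 * 2) + 1
g_after_f = g(f(1)) # f runs first: (1 + 1) * 2
```

The order of evaluation is visible in the nesting itself: the innermost call always runs first.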

00:07:49.850 Function composition yields cleaner, more understandable code by separating concerns effectively. Each function doesn't need to maintain any state from outside; it relies solely on input and produces output. They’re straightforward to test, enabling you to evaluate their functionality without colliding with shared state. The separation afforded by function composition makes it clear what dependencies exist, leading to higher testability. In a well-structured system, if you properly validate the input and expected output, you can rest assured that the functions will perform correctly when integrated.

00:09:01.600 So, considering all of this, let’s loop back to the original larger problem of large classes. The answer I propose is middleware. To this point, middleware has typically been linked to specific use cases, perhaps best epitomized by Rack, but it’s useful across multiple contexts. Before diving into examples, I’m curious—how many of you are familiar with the concept of middleware? Raise your hand if you don’t know what middleware is.

00:10:06.310 That’s impressive! Normally, it’s about half the audience who hasn’t encountered middleware before. Middleware originated from Python, specifically from the WSGI interface proposal—the Python equivalent of Rack. Middleware is essentially a clever method for managing request processing. In a web request, a request comes in, and you want to execute various operations as it is processed, and perhaps also record analytics on the response as it goes out. Some typical pre-processing might involve URL rewriting, while post-processing may involve tracking response codes.

00:10:51.630 Using Ruby as an example, this is what constructing a Rack middleware stack looks like: it employs a DSL (Domain Specific Language) that lets you stack various middleware components. When a request comes in, it flows down the stack from one middleware to the next, then travels back up. For instance, the `ShowExceptions` middleware is the last one hit before processing the request and the first one hit when the response comes back. Let’s look at a sample middleware to see how straightforward they are. The simplest middleware has an interface that comprises an initializer and a call method.

00:12:02.569 The initializer takes the next middleware component to call and stores it. Then the `call` method is invoked with a context object, a large hash that carries any necessary state. Inside `call`, you work on the incoming request until you’re ready to hand off to the next middleware by invoking its `call` method. Whatever you do before that invocation is preprocessing; whatever you do after it runs once the rest of the stack has finished, as control comes back up.
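A minimal sketch of that interface follows, in plain Ruby with no Rack dependency. The `Tracer` class, the `:trace` key, and the lambda endpoint are all assumptions made for illustration.

```ruby
# An initializer that stores the next component, and a #call that
# receives the state hash.
class Tracer
  def initialize(app)
    @app = app
  end

  def call(env)
    env[:trace] << "tracer: before" # runs on the way down the stack
    result = @app.call(env)         # hand control to the next component
    env[:trace] << "tracer: after"  # runs on the way back up
    result
  end
end

# The innermost component only needs to respond to #call, so a lambda works.
endpoint = lambda do |env|
  env[:trace] << "endpoint"
  :done
end

env = { trace: [] }
result = Tracer.new(endpoint).call(env)
```

After the call, `env[:trace]` holds the three entries in the order they ran, with the endpoint sandwiched between the middleware's two halves.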

00:12:56.649 So why is middleware beneficial? Firstly, the order of operations in middleware is explicitly defined. With Rack, you can be assured that, unless you deploy some incredibly unusual Ruby magic, the order is consistently maintained as the middleware processes a request down and back up. In contrast, modules obscure that clarity. Secondly, within the middleware class, dependencies are clear. For instance, when writing a URL rewriting middleware, the code might examine the current HTTP path and, if it matches a specific condition, it rewrites it before moving on. Importantly, with Ruby's dynamic nature, there’s no solid compile-time validation, but having well-structured middleware can facilitate clearer dependencies.
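The URL rewriting case might look something like the sketch below. It is written without the Rack gem so it runs on its own; the `RewriteLegacyPath` class and the `/old-blog` path are hypothetical, while `PATH_INFO` and the status, headers, body triple follow the Rack convention.

```ruby
class RewriteLegacyPath
  def initialize(app)
    @app = app
  end

  def call(env)
    # The only state this component depends on is the request path.
    env["PATH_INFO"] = "/blog" if env["PATH_INFO"] == "/old-blog"
    @app.call(env)
  end
end

# Stand-in application that reports which path it was asked to serve.
app = lambda do |env|
  [200, { "content-type" => "text/plain" }, ["served #{env["PATH_INFO"]}"]]
end

stack = RewriteLegacyPath.new(app)
status, _headers, body = stack.call({ "PATH_INFO" => "/old-blog" })
```

Reading `call` from top to bottom shows everything the component needs: one key in the hash and the next component in the chain.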

00:13:24.300 Middleware is also easier to test, which helps maintainability. When testing a middleware component, you have one clear method into which the state is passed. You can verify the output, inspect how the state changed, and make assertions based on those outcomes. There may still be side effects to observe, but because the state travels through the middleware chain in one place, those changes are easy to capture. Books such as "Growing Object-Oriented Software, Guided by Tests" demonstrate this style well: you set up a given state, invoke the middleware logic, and assert what you expect afterward.
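The given, when, then shape of such a test can be sketched with plain Ruby and no test framework. The `TrackStatus` middleware and the `:statuses` key are hypothetical, and the rest of the stack is replaced by a stub lambda.

```ruby
# Hypothetical middleware under test: records the status of each response.
class TrackStatus
  def initialize(app)
    @app = app
  end

  def call(env)
    response = @app.call(env)
    env[:statuses] << response.first
    response
  end
end

# Given: a known starting state and a stub standing in for the rest of the stack.
env = { statuses: [] }
stub_app = ->(_env) { [404, {}, []] }

# When: the middleware is invoked on its own.
response = TrackStatus.new(stub_app).call(env)

# Then: assert on the return value and on the state that was passed through.
raise "response should pass through untouched" unless response == [404, {}, []]
raise "status should be recorded" unless env[:statuses] == [404]
```

No database, no model, and no other middleware is involved, so a failure points at this one class.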

00:14:44.410 Remarkably, we now have the benefits of function composition. Middleware is also easy to extend through subclassing: once you have a middleware, you can create subclasses that tweak its behavior without affecting other parts of your system. This is crucial because it leaves room for future enhancements. For context, Vagrant, which I developed, manages the lifecycle of a virtual machine: bringing up VMs, configuring them, and provisioning them with tools such as Chef and Puppet.

00:15:53.430 I once ran into the same problem that Rails apps have with the user model: I had a Vagrant model that kept growing and adding unnecessary complexity. As I explored potential solutions, I came across the idea of using middleware. One example is a simple Vagrant command, `vagrant suspend`, which pauses a running virtual machine. Its implementation follows the middleware idea: it uses a stack to process the command, passing a virtual machine object down the stack like a request and back up through the stack on completion.

00:17:08.990 Let’s examine one of the middlewares that manage validations. This middleware is designed to check if certain parameters meet the necessary configuration requirements. Leveraging middleware creates a clear, streamlined path for behavioral changes—if those validations could trigger an exception, this happens consistently across the call stack. Additionally, let’s look at the `vm_suspend` command, which also exemplifies how easy it is to manage VM states. It simply checks whether the VM is currently running and, if so, proceeds to suspend it.
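The suspend step described here could be sketched as follows. This is not Vagrant's actual code: `FakeVM`, its states, the `Suspend` class, and the `:vm` key are all invented for illustration.

```ruby
# FakeVM is a stand-in invented for this sketch; it is not Vagrant's VM class.
FakeVM = Struct.new(:state) do
  def running?
    state == :running
  end

  def suspend
    self.state = :saved
  end
end

class Suspend
  def initialize(app)
    @app = app
  end

  def call(env)
    vm = env[:vm]
    vm.suspend if vm.running? # only a running machine can be suspended
    @app.call(env)
  end
end

# A no-op lambda terminates the stack.
noop = ->(env) { env }

running = { vm: FakeVM.new(:running) }
Suspend.new(noop).call(running)

powered_off = { vm: FakeVM.new(:poweroff) }
Suspend.new(noop).call(powered_off)
```

The virtual machine travels in the state hash the way a request would, so the middleware needs nothing beyond `env[:vm]`.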

00:18:17.100 The `vagrant up` command, often regarded as the quintessential command in Vagrant, does a great deal. It may download multi-hundred-megabyte images, import them, set up port forwarding and shared folders, configure them with Chef, and much more. If this functionality were all lumped together in one method in a monolithic file, it could potentially become an unwieldy 500-line mess, making it challenging to maintain and navigate. Instead, `vagrant up` comprises a stack of 22 middleware components that execute these various tasks.

00:19:29.400 Additionally, notice that the middleware components serving this command are reusable. For instance, two middleware pieces from the `suspend` command are reused in `up`. Each middleware keeps its logic encapsulated, so none of them interferes with the others.

00:19:50.960 Seeing middleware utilized across web requests, VM management, and user models illustrates its versatility. I want to cover a practical example of extracting middleware functionality from a large class that does not employ this abstraction yet.

00:20:18.679 Let’s revisit the user.rb file. If I were part of this project and needed to work on it, I would identify the `accept_invitation` method as a candidate for extraction into middleware. Typically, methods that signify an action—like ‘accept’—indicate a function that can be modularized.

00:20:50.720 In determining logical blocks for this method—almost like isolating different functionalities—I might identify several sections: the `self.not_invited` method stands out as a clear logical separation; likewise, the `setup` method has no return value, implying it likely produces side effects. Furthermore, the extensive block for setting up the user, saving changes, performing error handling, and finally deleting invitations could be good candidates for middleware extraction. I’ll take `setup` for simplicity.

00:21:54.170 The middleware version of the `setup` method would primarily involve adjusting dependencies. Instead of relying on `self`, which refers to the user instance, I’d identify what state it needs. In the middleware, the state is passed in as part of the environment's state bag, isolating its dependencies. Errors would raise exceptions instead of merely returning false, allowing for a more graceful halting of execution across the middleware chain.
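A sketch of what such an extraction might look like appears below. None of it is Diaspora's code: `SetupUser`, `SetupError`, the `:user` and `:opts` keys, and the username rule are assumptions chosen only to show state arriving through the environment and failures raising instead of returning false.

```ruby
# All names here are hypothetical; this is not Diaspora's implementation.
class SetupError < StandardError; end

class SetupUser
  def initialize(app)
    @app = app
  end

  def call(env)
    # Everything this step needs comes from the state bag, not from self.
    user = env[:user]
    opts = env[:opts]

    # Raise instead of returning false, so the whole chain stops here.
    raise SetupError, "username is required" if opts[:username].to_s.empty?

    user[:username] = opts[:username].strip.downcase
    @app.call(env)
  end
end

finish = ->(env) { env[:user] }

user = SetupUser.new(finish).call({ user: {}, opts: { username: " Alice " } })

halted = begin
  SetupUser.new(finish).call({ user: {}, opts: {} })
  false
rescue SetupError
  true
end
```

Because the failure is an exception, later steps in the chain never run and the caller decides in one place how to report it.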

00:22:29.060 This process makes `accept_invitation` look significantly clearer, as it becomes separated into distinct steps. Each individual `setup` operation can be reused across various points—such as user registration and password resets. Moreover, middleware execution can be run through a centralized method that manages execution passes, thereby streamlining dependency tracking.

00:23:54.330 Middleware isn’t the right answer for every situation. It’s merely another tool in your toolbox; as with functions or classes, avoiding overuse is critical to keeping code well organized. Use middleware when multiple components need to share responsibilities without sharing unwanted state.

00:24:59.020 Consider the user model example: if you establish middleware for operations such as accepting invitations and signing up users, they share some functionality yet could function independently. Instead of intermingling state, it is more effective to pull common operations into middleware while ensuring clarity of behavior.

00:26:18.360 The bottom line is learning when and where to implement middleware within your system. It’s about taking a step back, reviewing your problem space, and deciding where middleware can fit based on its qualities while gaining the benefits it brings. There are plenty of fascinating avenues with middleware—like implementing error recovery that I would love to expand on—but given time constraints, I’ll pass the floor to some questions now.

00:27:29.180 Regarding the state bag, I agree that it’s less than ideal. Although I’ve experimented with more constraints in Vagrant middleware, where specific arguments are required, it often introduced complexity beyond practicality. Despite those attempts, the flexibility of the current state bag generally makes for a more manageable framework.

00:28:01.370 As far as libraries, I think one could create a standardized Rack-like format for building middleware, similar to what I’ve already developed but without dependencies. It’s a straightforward implementation; you would maintain an array of classes and apply them in sequence. Adding error recovery on top of that opens up some exciting functionality.
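One way to read "an array of classes applied in sequence" is sketched below. The `Stack`, `First`, and `Second` classes are hypothetical, and error recovery is left out; this is not the library the speaker refers to.

```ruby
# A dependency-free builder: keep an array of middleware classes and
# wrap them around each other in the order they were added.
class Stack
  def initialize
    @middlewares = []
  end

  def use(klass)
    @middlewares << klass
    self
  end

  def call(env)
    terminal = ->(e) { e }
    # Build from the inside out so the first class added ends up outermost.
    app = @middlewares.reverse.inject(terminal) { |inner, klass| klass.new(inner) }
    app.call(env)
  end
end

class First
  def initialize(app)
    @app = app
  end

  def call(env)
    env[:order] << :first_in
    @app.call(env)
    env[:order] << :first_out
    env
  end
end

class Second
  def initialize(app)
    @app = app
  end

  def call(env)
    env[:order] << :second_in
    @app.call(env)
    env[:order] << :second_out
    env
  end
end

env = Stack.new.use(First).use(Second).call({ order: [] })
```

The recorded order shows the request passing down through `First` and `Second` and then back up in reverse, matching the down-and-up flow described for Rack.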

00:29:01.370 Back to a prior question, utilizing middleware does raise questions about performance. There’s a call stack which can become substantially deep, making debugging complicated as well. However, as Vagrant logs middleware calls and can show the associated state when errors occur, tracking down issues becomes less arduous.

00:30:06.370 The key takeaway remains: middleware’s efficiency and utility grow when harnessed appropriately, striking a balance between abstraction and practicality while maintaining performance. Thank you! If there are no further questions, I appreciate your engagement.