Better Routing Through Trees

00:00:23.199 Hello everyone, it's great to be back here at MountainWest RubyConf. The first time I presented at a conference was back in 2009 here at MountainWest. I'd like to apologize to all of you in the audience who had to endure that. My last privilege of speaking at MountainWest was back in 2010, and I'm thrilled to be back in 2015 giving this presentation on better routing through trees.

00:00:34.040 My name is Jeremy Evans, and I'm the lead developer of the SQL Ruby database library and the author of several other Ruby libraries. I'm also the maintainer of Ruby ports for the OpenBSD operating system. In this presentation, I'm going to discuss an approach to routing web requests that I call a routing tree and explain the advantages that a routing tree offers compared to routing approaches used by other Ruby web frameworks.

00:00:54.320 Before I can speak to the benefits of a routing tree, I first need to discuss earlier approaches to routing. Let’s first delve into a brief history of routing web requests in Ruby. I first started using Ruby in late 2004, and at the time, there were only a few choices for web development in Ruby. One choice was the old-school approach of using CGI without a framework.

00:01:10.640 With CGI, you generally had separate files for each page, and the routing was handled by Apache, so routing really wasn't an issue. Another choice was Nitro, which is a web framework. I'm guessing many of you have never heard of it since the last release was in 2006. Nitro used static routing where the first segment in the request path specified a render class, and the second segment was a method to call on that render class. Another choice was Rails, which at the time was at version 0.8.5.

00:01:43.800 Back then, Rails did not even perform routing itself. You could use Rails to generate pretty URLs, but it could not actually route URLs by itself. If you wanted to use Rails with Apache, you had to use rewrite rules in Apache. Here's an example from the Apache configuration that shipped with Rails 0.9.1 that takes the controller, action ID path and splits it into separate parameters for the controller, action, and ID.

00:02:06.960 In this example, it implemented the static routing that Rails supported by default, but using more advanced rewrite rules in Apache, you could achieve custom routing. In addition to supporting Apache, Rails at that time also worked with Webrick; however, if you were using Webrick, you were restricted to the controller action ID static routing, which was hardcoded into the Webrick integration. It was not possible to use custom routing with Webrick.

00:02:40.320 Rails started supporting custom routing in version 0.10.0. The default was still the controller action ID supported by the previous static routing, but you could use different patterns to support custom routing. Internally, Rails stores each route in an array, and when a request came in, Rails would iterate over the stored routes checking each route to see if it matched the current request. As soon as it recognized the request path, it would return the appropriate controller class, which would then handle the request.

00:03:10.960 Rails continues to use this basic approach of iterating over an array of routes even today. This code is from Rails 2.0, which still just iterates over every route looking for a match. Around the same time, Sinatra was released, offering a radical simplification for how web applications can be developed by specifying the routes directly, with each route yielding to a block to handle the action.

00:03:39.840 While externally, Sinatra looks much simpler than Rails, internally it uses a similar process for routing, storing the routes in an array and iterating over the array when a request comes in, stopping at the first matching route. Over the years, Sinatra's basic approach has not changed much. This is the current Sinatra code for routing, slightly simplified, and the main change from the original implementation is that it now has separate arrays of routes per request method.

00:04:01.079 However, after getting the array of routes for the request method, it still just iterates over every route looking for a match. While Sinatra's approach has not changed much over the years, Rails’ approach has changed significantly. From Rails 2.1 to 2.3, Rails attempts to optimize route matching by checking for initially matching segments in the path.

00:04:24.400 If the current route's prefix and the current request's prefix do not match, it can skip subsequent routes with the same prefix. I wasn't able to find a current web framework that uses a similar approach, and modern versions of Rails use a different approach so I will not be discussing that further.

00:04:50.440 In Rails 3.0 and 3.1, Rails uses Rack::Mount to handle routing. Rack::Mount is a dynamic tree-based router that organizes routes into a tree based on the parameters you provide, such that it can also skip similar routes if the current route does not match. If you want to use Rack::Mount for routing but do not want the overhead of Rails, there's a web framework called Sinatra that acts as a thin wrapper over Rack::Mount.

00:05:10.480 Then in Rails 3.2, Rails switched to Journey for route handling, which implements a deterministic finite automata engine for request path matching in pure Ruby. Journey can take a request path and return all possible routes that could match it. Then, Journey iterates over this array of possibly matching routes checking each route to see if it actually matches the request.

00:05:33.440 The routes are sorted by priority, and in general, the first of these resulting routes will be used to handle the request. If you want to use Journey for routing but do not want the overhead of Rails, there's a web framework called Roda that is a thin wrapper over Journey.

00:05:56.480 In January 2010, while Rails was at version 2.2, Christian Neukirchen, the author of Rack, was working on a proof of concept router named Rum. The fundamental difference between Rum and the other web frameworks I've discussed is that in Rum, routing is not separate from request handling. Instead, for each request, Rum yields to a block, and routing is performed by calling the 'on' method with arguments. If all of the arguments are true, 'on' will yield to the block that is passed to it; otherwise, 'on' returns nil without yielding.

00:06:40.560 The 'get' path and 'paths' here are predicates that check against the current request. The 'get' method returns true if the current request uses the GET request method. The 'path' method with the argument returns true if the first segment in the request path is present, and the 'param' method with the argument returns true if the request has a parameter with that name. By nesting these calls to 'on', you build what is basically a tree using Rum's DSL.

00:07:01.200 At any point in any of these blocks, you can handle the current request. One issue with Rum is that it was never released as a gem, so in April 2010, Michael Martins took Rum, added support for HAML templates, and released the Cuba gem. Over the next four years, he and others improved on Rum's initial design. In July of last year, after many years of using Sinatra as my primary web framework, I tried out Cuba.

00:07:27.600 I found that using a routing tree made certain aspects of web applications significantly simpler. However, some aspects of Cuba's design and implementation had issues that made it more cumbersome to use than Sinatra. So, I forked Cuba and released Roa, which keeps the same routing tree approach introduced by Rum but otherwise tries to make it more user-friendly and natural, as well as significantly faster.

00:07:42.720 Let me summarize the routing approaches I've discussed so far. The first is completely static routing, using separate files with CGI or using very early versions of Nitro and Rails. While static routing is fast, it is also inflexible, and these days, I don't think anyone would consider using a framework that did not support custom routing.

00:08:07.600 The next approach was used by early versions of Rails and is still used by Sinatra, which involves storing routes in an array and iterating over the array when a request comes in, testing each route to see if it matches the current request. This process is fairly simple to implement and understand, but it makes routing performance decrease linearly as the number of routes increases.

00:08:29.680 Next, we have Rack::Mount, the underlying router used by Rails 3.0 and 3.1 as well as by Roda. Rack::Mount organizes routes into a tree based on the parameters you provide so that matching prefixes are shared by multiple routes, significantly increasing routing performance but increasing complexity.

00:08:51.800 Then, we have Journey, which is used by modern versions of Rails and Roda. While it is certainly faster than previous approaches used by Rails, it's probably the most difficult for the average Ruby programmer to understand. Finally, we have the routing tree approach introduced by Rum, which is currently implemented in Cuba and Roa.

00:09:12.560 There are three basic ways that these routing implementations differ. The first is a quantitative difference: the performance characteristics of these different routing implementations show different performance characteristics, especially as the number of routes increases. To determine the performance differences, you need to benchmark the implementations with varying numbers of routes.

00:09:49.760 The issue with comparative benchmarks is that they are generally biased to show the advantages of the benchmark creator's preferred choice. The benchmark I'm using is called r10k, which benchmarks each of the implementations using 10, 100, 1,000, and 10,000 routes. I wrote r10k because the only other comparative benchmark I could find only benchmarked Hello World applications with a single route.

00:10:18.520 While the structure of the sites benchmarked by r10k is friendly to a routing tree approach, it should be friendly to most other routing approaches as well. r10k is open-source on my GitHub, and I welcome external review to make sure I'm not doing anything unfair to the other web frameworks that I'm benchmarking.

00:10:29.720 Now, let's first look at the results for 10, 100, and 1,000 routes. Here are the runtime results. Pay no attention to the absolute numbers, as it's only the relative performance differences that matter. One thing to note about these numbers is that r10k benchmarks using the Rack API directly, so this does not include any web server overhead.

00:10:48.480 From this graph, you can see that at 10 and 100 routes, Rails is an outlier, taking about three times as long as the next slowest framework. However, when you get to 1,000 routes, Sinfeld and Sinatra take almost as long as Rails, with Sinfeld significantly slower than Rails.

00:11:12.640 When you go to 10,000 routes, Sinfeld and Sinatra take much more time than all of the other web frameworks. I'm not sure why Sinfeld performs so poorly in this benchmark; it's supposed to be a very thin layer over Rack::Mount, so it could either be an issue with Rack::Mount or how Sinfeld uses it. I'll take them out of the performance picture because they throw off the scale of the graph.

00:11:32.480 With those frameworks gone, the performance picture is clearer. Near the bottom is the static route implementation, which is the fastest routing you can get, but again, I don't think anyone would really consider a static routing framework these days. The next fastest is Roa, followed by Roda, and then Cuba. From this graph, you can see that a routing tree approach is not necessarily the fastest; performance is highly dependent on the specific implementation.

00:12:03.600 Part of performance is also the amount of memory used. Here are the memory results for 10, 100, and 1,000 routes. As you can see, up until about 100 routes, all the web frameworks except Rails are clustered around 20 MB of memory.

00:12:25.320 At 100 routes or 1,000 routes, there are three basic groups: the static route implementations (Cuba and Roda) are under 20 MB, Sintra-, Sinfeld, and Roda are between 30 and 40 MB, and Rails is much heavier at about 70 MB. When you go to 10,000 routes, the picture remains similar; however, Sinfeld jumps to the top of the memory list. Not only does Sinatra use about twice as much memory as the routing tree implementations, but Rails uses about four times the memory.

00:12:54.320 One of the reasons that the routing tree implementations are memory-efficient is that the routes themselves are not stored in a data structure; the tree is essentially Ruby's abstract syntax tree for the routing tree block.

00:13:07.600 To review these benchmarks, aside from the static routing approach, which isn’t widely adopted, Roa has the fastest implementation and uses the least amount of memory. Roda shows that Journey's approach to routing is also fast, and next comes Cuba, which is simply faster than Sinatra or Rails but significantly slower than Roa. Rails is a fairly heavyweight framework but doesn't see drastic performance changes even with a large number of routes.

00:13:40.440 Finally, we have Sinfeld and Sinatra, which both have significant performance issues with large numbers of routes. It's important to keep in mind that these performance numbers are purely routing performance. In many applications, routing performance will not be the bottleneck of the implementation, as the application will spend much more time handling the request than routing it.

00:14:16.040 However, in the applications I’ve converted, Roa's performance is noticeably faster compared to Sinatra and Rails, as shown by the amount of time it takes to run the tests after converting applications from Sinatra to Roa using the exact same Rack test-based integration tests. Tests ran about 50% faster after conversion from Sinatra to Roa and were about twice as fast when converting from Rails to Roa.

00:14:36.640 But I believe this performance improvement is likely due to Roa having lower per-request overhead rather than faster routing performance. So, I mentioned earlier that routing implementations differ in three ways. The second way they differ is the qualitative difference regarding the internal complexity of each implementation.

00:15:05.040 These routing approaches vary widely in their implementation complexity. Static routing is the simplest, just parsing the request path using a single regular expression and using the captures to call a method on an object. Iterating over an array of routes and checking each route to see if it matches is also fairly simple and understandable for the average Ruby programmer.

00:15:42.000 Rack::Mount's approach of analyzing the route set and building a tree is much more complex, and I think the average Ruby programmer would struggle to understand it without a lot of time spent studying it. Journey is even more complex than that, and to understand it, you should remember the compiler courses you took in college or be prepared to do some research on compiler implementation.

00:16:26.000 Our routing tree approach is similar in complexity to iterating over an array of routes. You start at the top of the routing tree block; each method call checks if the current route matches the request. If so, the process is repeated for the block that you passed to the method; otherwise, you continue to the next method.

00:16:54.000 The second difference between these routing approaches is the implementation complexity. Static routing, iterating over an array of routes, and using a routing tree all have relatively simple implementations that are easy for programmers to understand. Both Rack::Mount and Journey have more complex implementations that require more time for the average Ruby programmer to comprehend.

00:17:18.080 How does the internal complexity of the routing implementation impact users of the framework? Generally, the higher the implementation complexity, the more difficult it becomes to find other programmers who can understand the code, add features, and fix bugs. In general, more complex code is harder to debug than simpler code, and as a rule, unless there is a substantial benefit from the complexity, simplicity should be preferred.

00:17:55.040 Ultimately, most users of a framework treat internal complexity as an externality, something that does not directly affect them and thus does not influence their decision to use the framework. We return to the three types of differences between routing implementations: performance and implementation complexity were the first two.

00:18:36.000 The third way routing implementations differ is a qualitative difference in how routing integrates with request handling. Routing a request is not an end in itself; it's a means to ensure the request is handled correctly. With a routing tree, routing is not separate from request handling; the two are integrated. Therefore, while routing a request, you can also manage the request.

00:19:05.840 For all other implementations I've discussed, routing is separate from request handling. This integration may not initially sound important, but I believe it has the most significant impact. I will discuss the advantages of integrating routing with request handling and how frameworks lacking this integration offer similar functionality.

00:19:42.080 Let me start with some example Sinatra code, which is fairly simple. Here we have two routes, one for GET and one for POST, both related to a specific album. When I was using Sinatra, this was pretty typical of many of my applications. However, the main issue with this approach is that it leads to duplication. We see the path is duplicated in both routes, and the retrieval of the album from the database is also duplicated.

00:20:03.360 Using a routing tree allows simplification. Instead of duplicating the path in both cases, it can be specified once in the branch. As soon as the branch is taken, the album is retrieved from the database, making the album instance variable available for both GET and POST routes.

00:20:38.320 The primary advantage of using a routing tree is that it easily eliminates redundant code by moving it to the highest branch, where all routes underneath share it. While it's possible to do something similar in Sinatra using before blocks, you still must specify the path three times instead of just once.

00:21:05.360 In contrast, the shared behavior in Sinatra operates in a separate lexical scope, making it more difficult to understand how it connects to the two routes. Additionally, the use of before blocks negatively impacts performance, as they are processed similarly to route blocks.

00:21:31.760 In Rails, routes are specified in a config route file, and the handling code typically resides in a separate controller class. This setup leads to redundancy since the retrieval of the album from the database must happen in both methods. Rails does allow you to specify a before filter to call a method before the action, but this creates a similar hodgepodge of structure.

00:21:51.200 Sinatra and Rails, along with most other Ruby web frameworks, can use before filters to imitate the shared behavior at the top of a routing tree block. However, a routing tree block operates as standard Ruby code, allowing you to execute arbitrary Ruby code at any point during routing.

00:22:17.760 This is useful for access control; if part of your site permits anonymous access while another does not, you can position the anonymous access route first and then check for a login before proceeding to limit access.

00:22:38.720 In Sinatra, this access control is more convoluted, necessitating whitelisting specific paths or prefixes that allow anonymous access, which becomes cumbersome when numerous paths require whitelisting. Similarly, in Rails, access control is often spread across multiple controllers and requires updates to each controller when adding new routes.

00:23:18.240 By having a routing tree executed directly, you can implement arbitrary logic that influences routing. For example, your routing tree can list albums while routing other requests to an admin rack application. However, achieving this in Sinatra involves adding routes for each request method while also ensuring correct environments for admin requests.

00:23:35.320 These may seem like subtle improvements, but they greatly simplify applications. While it's possible to eliminate redundant code using before filters, most Sinatra apps I've seen do not follow this practice. The typical case involves copying code into routes that require it, leading to needless redundancy.

00:24:05.520 In a process automation system I worked on, originally built using Sinatra, switching to a routing tree eliminated all redundant code by moving it to the highest enclosing branch. The application has 79 total routes with 36 branches in the routing tree, of which 25 contain shared code for the routes.

00:24:34.880 Typically, the shared code retrieves objects from a database or manages access control. This integration means that 70% of the time I am branching in the routing tree, I am eliminating redundant code, which would require 25 separate filters in Sinatra and Rails. Using a routing tree naturally avoids redundant code.

00:25:08.200 To maximize the benefits of a routing tree, you must structure paths to be routing tree-friendly. This is increasingly common today, as naturally routing tree-friendly paths allow the routing tree to retrieve shared information upon branch access.

00:25:51.880 In contrast, Rails's controller action ID routes do not extract as much benefit from a routing tree because the segments defining the album's ID appear at the end of the path, preventing it from realizing routing efficiencies.

00:26:18.320 I've seen this firsthand with applications upgraded from pre-Rails 1.0 to Rails 4.1 while retaining the older path structure. When I switched these applications to a routing tree, they retained redundant code in many routes. If you are creating a new application, or willing to adjust your existing application's path structure, designing routes to effectively leverage routing trees is beneficial.

00:26:51.040 While discussing the advantages of routing trees, it's pertinent to note one trade-off: route introspection. Because routes aren't stored in a data structure within a routing tree—being Ruby code—you cannot introspect routes as you can in most Ruby web frameworks. In my applications, this isn't problematic, but some applications that depend on route introspection will require adjustments.

00:27:29.680 As we conclude, I'll summarize the advantages I've found from using the routing tree approach. First, and most notably, a routing tree approach makes it easy and natural to eliminate redundant code. This simplification might not seem significant, but when converting applications with redundant code, I noticed that changes were often unintentionally done to just one route rather than all relevant routes.

00:27:57.760 By utilizing a routing tree, redundant code is eliminated, and changes can be made in one place, affecting all routes under a branch. Eliminating redundancy enhances readability and thus makes maintenance easier. Having been the only programmer in my workplace for over a decade, maintaining multiple applications, I value ease of maintenance.

00:28:23.840 Moving to a routing tree approach, particularly with Roa, will also enhance your application's performance—not solely because Roa is fast at routing but also due to its lower per-request overhead.

00:28:58.560 If you’re interested in using a routing tree, I highly recommend checking out Roa, the fastest and most featureful Ruby web framework implementing a routing tree. It has a small core that includes features like template rendering, JSON capabilities, template streaming, asset packaging, and even emailing all added through plugins that remain functional as Roa evolves.

00:29:45.760 If you're interested in learning more about Roa, visit the website at rjevans.net, hop on IRC, post in the Google Group, or just come talk to me. That concludes my presentation. Thank you all for listening to me discuss routing trees and other routing approaches. If you have any questions, I'll be happy to answer them now.