Performance

Keynote: Optimization Techniques Used by the Benchmark Winners

Sequel and Roda have dominated TechEmpower's independent benchmarks for Ruby web frameworks for years. This presentation will discuss optimizations that Sequel and Roda use, and how to use similar approaches to improve the performance of your own code.

RubyKaigi 2019 https://rubykaigi.org/2019/presentations/jeremyevans0.html#apr20

00:00:00.030 Welcome to the closing keynote of this conference, delivered by an amazing British speaker. Please welcome Mr. Jeremy Evans.
00:00:21.230 Konbanwa (Good evening). Today, I'll be discussing some optimization techniques used in Sequel and Roda.
00:00:27.330 I will provide some background on why these libraries are significantly faster than their alternatives, as shown in TechEmpower's independent benchmarks.
00:00:41.910 TechEmpower has been benchmarking web frameworks in many languages since 2013. They have been benchmarking Rails and Sinatra since the beginning, and in 2017, they started benchmarking Sequel and Roda. Since then, the combination of Sequel and Roda has been leading TechEmpower's benchmarks for Ruby web frameworks.
00:01:07.080 Sequel is a toolkit for database access in Ruby, and Roda is a toolkit for writing web applications in Ruby. While I'm not the original author of either library, I have been maintaining both for quite a long time and added all the optimizations I'll be discussing today.
00:01:25.320 My name is Jeremy Evans. I started writing Ruby libraries in 2005 and began contributing to Ruby development in 2009. My day job involves managing all information technology operations for a small government department. Part of that job includes maintaining applications of all sizes written in Ruby using Sequel and Roda.
00:01:38.430 While I added all the optimizations I'm discussing today to Sequel and Roda, many of these optimizations were learned from others. In my experience, it is often easier to implement optimization approaches that other developers have created than to develop your own. The goal of this presentation is to demonstrate some optimization techniques, principles, and approaches that you can use to improve the performance of your own Ruby code. Hopefully, this will save you time if you wish to optimize your libraries or applications.
00:02:15.960 Now, the first optimization principle is that the fastest code is usually the code that does the least. If you want fast code, avoid unnecessary processing in performance-sensitive code paths as much as possible. An old Ruby web framework named Merb had a great motto related to this: 'No code is faster than no code.' In other words, if you can achieve the same result without executing any code, any approach that requires executing code will be slower.
00:02:39.220 A major reason Sequel and Roda are faster than alternatives is that they try to execute less code, at least by default. Here is the class method that Sequel uses to create new model objects. The method name is 'call,' which is an unusual choice for a method that creates objects. I will discuss later why 'call' is used as the method name, as it relates to a different optimization.
00:03:04.950 Notice how this method does very little: it takes the values hash retrieved from the database, allocates a new model instance, sets the values hash to an instance variable, and returns the instance. Here's a comparison with a similar instance method that Active Record uses to create instances using hashes retrieved from the database.
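The shape of such a method can be sketched as follows. This is a minimal illustration, not Sequel's actual source; the class name `SketchModel` is hypothetical.

```ruby
# Hypothetical model class for illustration; not Sequel's real implementation.
class SketchModel
  attr_reader :values

  # Build an instance from a row hash retrieved from the database.
  def self.call(values)
    o = allocate                               # create the object without running #initialize
    o.instance_variable_set(:@values, values)  # store the row hash as-is, no copying
    o                                          # return the instance
  end
end

row = { id: 1, name: "example" }
model = SketchModel.call(row)
```

The instance holds a reference to the very hash it was given, so no per-column work happens at creation time.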
00:03:45.750 When you compare these side by side, it should not be a surprise that Sequel is faster; it simply does much less in this performance sensitive code path. So how is Sequel able to avoid executing most of this code? Let's go over the different sections in this method.
00:04:20.790 Active Record starts by initializing all these instance variables, mostly to nil or false. It sets @new_record to true initially, but then sets it back to false because the method is usually called with only one argument. The new_record local variable is the second argument, which defaults to false. One controversial optimization technique used by both Sequel and Roda is that they avoid initializing instance variables to nil or false. This optimization can make object initialization about 150% faster when you have six instance variables.
00:05:30.300 This optimization improves performance by a few percentage points in real-world benchmarks. However, the reason this optimization is controversial is that accessing an uninitialized instance variable generates a warning in verbose mode. This warning can slow down all instance variable access even if all instance variables are initialized.
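A minimal sketch of the technique, assuming a hypothetical `UninitRecord` class. It relies only on the fact that reading an unset instance variable returns nil.

```ruby
# Hypothetical class for illustration.
class UninitRecord
  def initialize(values)
    @values = values
    # @modified is deliberately NOT set to false here.
  end

  def modified?
    # An unset instance variable reads as nil, so this is false by default.
    !!@modified
  end

  def modified!
    @modified = true
  end
end

record = UninitRecord.new({ id: 1 })
before = record.modified?
record.modified!
after = record.modified?
```

The trade-off is the verbose-mode warning described above on Ruby versions that emit it.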
00:06:01.320 There is one instance variable in Active Record that is set to a value other than nil or false: @start_transaction_state. This instance variable is only used for transactions, so if you are just retrieving a model instance and not saving it, it is unnecessary to set it during initialization. Setting it allocates a potentially unnecessary hash, which hurts performance. In similar cases, Sequel will usually delay allocating an instance variable until it is actually needed. This is another general optimization principle: unless there is a high probability that you will need to execute something, it is best to delay execution until you are sure you will need it, to avoid doing unnecessary work.
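Delayed allocation can be sketched like this. The class and method names are hypothetical and are not taken from either library.

```ruby
# Hypothetical class for illustration.
class LazyStateRecord
  def initialize(values)
    @values = values
    # No transaction-state hash is allocated here.
  end

  # The hash is allocated the first time it is needed, then reused.
  def transaction_state
    @transaction_state ||= {}
  end
end

read_only = LazyStateRecord.new({ id: 1 })
saved = LazyStateRecord.new({ id: 2 })
saved.transaction_state[:new_record] = false
```

Instances that are only read never pay for the hash.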
00:06:55.706 After setting the instance variables, Active Record asks its class to define instance methods for all of the model's attributes. This needs to be called for the first instance retrieved because Active Record does not define the attribute methods until then. However, after the first instance has been retrieved, this method just returns without doing anything. So, asking the class to define the attribute methods is slowing down all model instance creation after the first instance.
00:08:01.230 Sequel avoids this performance issue by defining the attribute methods when the model class is created. That way, all model instances can assume that the attribute methods have already been created and don't need to ask the model class for them, speeding up all model instance creation. This represents another general optimization principle: anytime you have code that is called many times, see if you can run that code once instead. Applying this principle lets you save time while processing requests by doing the work once, during application initialization.
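A minimal sketch of moving per-instance work to class creation time. The names `EagerColumnsModel`, `define_column_readers`, and `AlbumSketch` are hypothetical, and the column list is passed explicitly rather than read from a database.

```ruby
# Hypothetical base class for illustration.
class EagerColumnsModel
  # Runs once per class, when the class body is evaluated.
  def self.define_column_readers(*columns)
    columns.each do |column|
      define_method(column) { @values[column] }
    end
  end

  # Instance creation assumes the readers already exist and does no checks.
  def self.call(values)
    o = allocate
    o.instance_variable_set(:@values, values)
    o
  end
end

class AlbumSketch < EagerColumnsModel
  define_column_readers :id, :name
end

album = AlbumSketch.call({ id: 1, name: "Kind of Blue" })
```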
00:08:37.020 The last thing that Active Record does during model instance creation is to run the find and initialize hooks for the model instance. However, if the model does not have any find or initialize hooks, this just slows down model instance creation. It would be best to run this code only if the model actually needs find or initialize hooks. Sequel avoids the need for models to check for initialize hooks by moving initialize hook support to a plugin. Both Sequel and Roda share the idea of doing the minimum work possible by default while being flexible enough to solve all the same problems.
00:09:36.840 Both libraries use similar plugin systems, designed around the same basic idea: each has an empty base class with no class or instance methods; the class is extended with a module for the default class methods, and a module for the default instance methods is included in the class. Loading the plugin extends the class with the plugin's class methods module and includes the plugin's instance methods module in the class. This is how part of Sequel's after_initialize plugin is implemented.
00:10:58.410 The class methods module defines the 'call' method. This method first calls super to get the default behavior, which returns the model instance with a hash of values. Then it calls the after_initialize method to run the initialization hooks on the instance and returns the instance. By using a plugin to implement initialization hooks, Sequel ensures that only the users that need these hooks have to pay the cost for them. Most users do not use initialize hooks and do not have to pay the performance cost.
00:12:31.020 Even for applications that use initialize hooks, they are often only used in a small number of models. With Sequel, you only load the plugin into the models that need the initialize hooks, so it does not slow down initialization for all of your other models. By calling super to get the default behavior, it becomes easy to implement new features using plugins as well as to extract rarely used features into plugins.
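The plugin design described above can be sketched in a few lines. This is a simplified illustration under assumed names (`MiniModel`, `AfterInitializeSketch`, `PlainItem`, `HookedItem`), not the real plugin code from either library.

```ruby
# Hypothetical base class: all behavior lives in modules so plugins can override it.
class MiniModel
  module ClassMethods
    def call(values)
      o = allocate
      o.instance_variable_set(:@values, values)
      o
    end

    # Loading a plugin extends the class and includes the instance methods.
    def plugin(mod)
      extend(mod::ClassMethods) if mod.const_defined?(:ClassMethods, false)
      include(mod::InstanceMethods) if mod.const_defined?(:InstanceMethods, false)
    end
  end

  module InstanceMethods
    attr_reader :values
  end

  extend ClassMethods
  include InstanceMethods
end

# Hypothetical plugin adding an initialization hook.
module AfterInitializeSketch
  module ClassMethods
    def call(values)
      o = super           # default behavior: build the instance
      o.after_initialize  # then run the hook
      o
    end
  end

  module InstanceMethods
    def after_initialize; end  # default hook does nothing
  end
end

class PlainItem < MiniModel; end  # no plugin, no hook cost

class HookedItem < MiniModel
  plugin AfterInitializeSketch

  def after_initialize
    @hook_ran = true
  end

  def hook_ran?
    !!@hook_ran
  end
end

plain = PlainItem.call({ id: 1 })
hooked = HookedItem.call({ id: 2 })
```

Because the plugin's `call` sits in front of the default one in the method lookup chain, `super` reaches the default behavior, and classes that never load the plugin never execute the hook code.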
00:13:33.420 In both Sequel and Roda, the majority of new features are implemented as plugins. Using plugins for most features not only improves performance but also saves memory by reducing the number of objects allocated. This is another general optimization strategy in Ruby; most objects created in Ruby require time to allocate, time to mark during garbage collection, and time to free, even if they are not used.
00:14:25.880 Sequel and Roda attempt to reduce object allocations, and string allocations are among the easiest to reduce: you just need to use frozen string literals. Both Sequel and Roda have used these since shortly after they were introduced in Ruby 2.3. Frozen string literals did improve performance. Years before their introduction, I had stored all strings used to generate SQL as frozen constants because that was faster than using literal strings. After 2.3 became widely used, I removed the constants and inlined the strings, which improved SQL building by a few percent.
00:15:35.680 This change also made the code much easier to read and made it clearer which strings could be combined. Combining those strings reduced the number of string operations and further improved SQL building performance. Additionally, Sequel tries to improve performance by reducing hash allocations. Previously, Sequel had methods where the default argument value was a hash. The issue with this style of code is that every call to the method with a single argument allocates a hash.
00:16:36.200 While allocating a single hash doesn't sound bad, when you have many methods doing this, you end up with a lot of unnecessary hash allocations. Therefore, Sequel started using an empty frozen hash constant named 'OPTS'. 'OPTS' is used as the default value for most arguments that expect a hash, and using the frozen OPTS hash is almost twice as fast as allocating a new hash.
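A minimal sketch of the frozen default hash. The module `QuerySketch` and its methods are hypothetical; only the constant name follows the talk.

```ruby
# Hypothetical module for illustration.
module QuerySketch
  OPTS = {}.freeze  # allocated once, shared by every call that omits options

  # Allocates a new hash on every call made without options.
  def self.select_slow(table, opts = {})
    opts[:limit] ? "SELECT * FROM #{table} LIMIT #{opts[:limit]}" : "SELECT * FROM #{table}"
  end

  # Reuses the shared frozen hash instead.
  def self.select_fast(table, opts = OPTS)
    opts[:limit] ? "SELECT * FROM #{table} LIMIT #{opts[:limit]}" : "SELECT * FROM #{table}"
  end
end

plain_sql = QuerySketch.select_fast("albums")
limited_sql = QuerySketch.select_fast("albums", { limit: 5 })
```

Since the shared hash is frozen, a method that accidentally tries to mutate its options raises an error instead of silently changing the default for every caller.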
00:17:38.950 Interestingly, both Sequel and Roda use option hashes instead of keyword arguments. One reason for this is performance. From a performance standpoint, keyword arguments may be faster than option hashes in simple cases, but they perform substantially worse when using keyword splats. Using a keyword splat as a method argument or while calling a method can incur significant overhead. To maintain good performance, one must avoid these splats, adding complexity to method calls.
00:18:59.040 Reducing proc allocations is another important performance consideration. In performance-sensitive code, you should avoid allocating procs unless necessary. Here is a simplified example from Roda's 'indifferent_params' plugin. Notice that this proc has no dependencies on the surrounding scope. The only local variables accessed are the arguments yielded to the proc.
00:20:33.870 This allows the proc to be extracted into a constant and passed as a block argument to 'Hash.new'. Moving this block to a constant can make this code over three times faster. Extracting objects to constants, particularly procs, can provide significant performance benefits. Procs are relatively heavy to allocate, so it is important to make sure they are not created unnecessarily.
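The extraction can be sketched as follows. The module and constant names are hypothetical, and the proc body is a simplified stand-in for symbol-to-string key lookup.

```ruby
# Hypothetical module for illustration.
module ParamsSketch
  # Depends only on its arguments, so a single shared instance is enough.
  SYMBOL_LOOKUP = proc { |hash, key| hash[key.to_s] if key.is_a?(Symbol) }

  # Allocates a new proc on every call.
  def self.slow_hash
    Hash.new { |hash, key| hash[key.to_s] if key.is_a?(Symbol) }
  end

  # Reuses the proc stored in the constant.
  def self.fast_hash
    Hash.new(&SYMBOL_LOOKUP)
  end
end

params = ParamsSketch.fast_hash
params["name"] = "roda"
by_symbol = params[:name]
missing = params[:other]
```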
00:21:48.050 If you're not using the proc as a block and you're only calling it using the call method, you may be able to avoid allocating procs completely. Sequel datasets support a 'row_proc', which is a callable object. Originally, Sequel models set the dataset's row_proc to a proc that called the load method on self, where self is the model class.
00:22:34.590 However, this caused additional indirection. I improved performance by aliasing the load method to call and then assigning the model class itself as the dataset's row_proc. This turned out to be measurably faster and is the reason 'call' is the method used to create new model instances for objects retrieved from the database.
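A minimal sketch of using the class itself as the callable. `DatasetSketch` and `RowModel` are hypothetical stand-ins, not Sequel's classes.

```ruby
# Hypothetical dataset that maps rows through any object responding to #call.
class DatasetSketch
  attr_accessor :row_proc

  def initialize(rows)
    @rows = rows
  end

  def all
    callable = row_proc
    callable ? @rows.map { |row| callable.call(row) } : @rows
  end
end

# Hypothetical model class.
class RowModel
  attr_reader :values

  def self.load(values)
    o = allocate
    o.instance_variable_set(:@values, values)
    o
  end

  class << self
    alias call load  # the class now responds to call directly
  end
end

dataset = DatasetSketch.new([{ id: 1 }, { id: 2 }])
dataset.row_proc = RowModel  # no proc allocated, no extra method call per row
models = dataset.all
```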
00:23:21.610 This brings me to another optimization principle: to minimize indirection in performance-sensitive code. Sequel has numerous instances where it uses objects that respond to call. It seeks to utilize the fastest implementation with the least amount of indirection. Many of these cases involve converting strings retrieved from the database to the appropriate Ruby type.
00:24:26.100 If you need a callable for converting a string to an integer, it may seem natural to use a lambda. Previously, Sequel used something similar for type conversion. However, calling the Integer method inside the lambda introduces extra indirection. Instead, you can create a Method object for the Integer method, which avoids that indirection. Though this approach is about ten percent faster than using the lambda, there's still room for improvement.
00:25:38.390 It's faster to create a plain object and then define a singleton call method. However, there is still that indirection where you're calling the Integer method from inside the call method. Removing that indirection by aliasing the Integer method to call greatly increases performance. I made this change in Sequel recently, which improved some real-world benchmarks of Sequel's SQLite adapter by over ten percent.
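The four approaches can be sketched side by side. The variable names are hypothetical, and the relative speeds quoted in the talk are not measured here.

```ruby
# 1. Lambda wrapping Kernel#Integer: a proc call plus a method call.
as_lambda = lambda { |s| Integer(s) }

# 2. Method object for Kernel#Integer.
as_method = method(:Integer)

# 3. Plain object with a singleton call method that calls Integer.
as_singleton = Object.new
def as_singleton.call(s)
  Integer(s)
end

# 4. Plain object whose call method IS Integer, via an alias.
as_alias = Object.new
class << as_alias
  alias_method :call, :Integer  # Kernel#Integer is private, so the alias is too
  public :call                  # make the alias callable from outside
end

results = [as_lambda, as_method, as_singleton, as_alias].map { |c| c.call("42") }
```

All four objects respond to `call` and return the same value; they differ only in how many layers sit between the caller and the conversion.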
00:26:56.590 Another important consideration is that how you define methods in Ruby can affect their performance. You would typically use 'def' to define a method. However, methods defined with 'define_method' and a block are notably slower than methods defined with 'def', about 50% slower.
00:28:09.000 In situations like dynamically defining getter and setter methods for model columns, you need to balance runtime flexibility and performance. The approach shown, which uses 'class_eval' and 'def', results in methods that are the fastest to call.
00:29:14.90 In Sequel, column names are not always valid Ruby method names. When a column name includes spaces, for example, using 'def' would cause a syntax error or, worse, a potential vulnerability. As a result, Sequel uses 'def' when the column name is a valid Ruby method name that can be defined with 'def', and falls back to 'define_method' otherwise. This keeps the common case fast. Separating common cases from complex ones is crucial because it lets you use the faster approach wherever it applies.
00:30:59.760 'def' is preferred over 'define_method' for performance-sensitive code because the resulting methods are faster to call. However, 'define_method' remains useful for dynamic method definition, especially for names that cannot be defined with 'def'.
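A minimal sketch of splitting the common case from the complex one. The class name, method name, and identifier regular expression are assumptions for illustration and are not Sequel's actual code.

```ruby
# Hypothetical class for illustration.
class ColumnAccessors
  # Assumed check: only plain identifiers are interpolated into source code.
  SAFE_NAME = /\A[A-Za-z_][A-Za-z0-9_]*\z/

  def initialize(values)
    @values = values
  end

  def self.def_column_reader(column)
    name = column.to_s
    key = column.to_sym
    if SAFE_NAME.match?(name)
      # Common case: a real 'def', evaluated once from a string.
      class_eval("def #{name}; @values[:#{name}]; end", __FILE__, __LINE__)
    else
      # Unusual names: fall back to define_method, which accepts any name.
      define_method(name) { @values[key] }
    end
  end
end

ColumnAccessors.def_column_reader(:id)
ColumnAccessors.def_column_reader(:"first name")

row = ColumnAccessors.new({ id: 7, "first name": "Ada" })
id_value = row.id
name_value = row.public_send(:"first name")
```

Validating the name before interpolating it is what keeps the string evaluation safe; anything that fails the check never reaches `class_eval`.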
00:31:54.900 In Sequel, caching, and caching datasets in particular, significantly enhances performance. Sequel keeps global state immutable, which avoids common thread-safety pitfalls, while still allowing mutable local state. The dataset cache is private and only accessible through mutex-protected methods.
00:32:55.630 Caching dramatically improved Sequel's performance when generating SQL. It avoids repeated computation by relying on the fact that database table structures and the symbols commonly used in queries stay the same from one query to the next.
00:34:06.690 Sequel's caching works well because it recognizes common scenarios and makes the most-used paths both efficient and reliable. Combined with the method definition and instance creation techniques discussed earlier, caching keeps the performance baseline strong.
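The combination of a frozen object with a private, mutex-protected cache can be sketched as follows. `FrozenDataset` and its methods are hypothetical and much simpler than a real dataset class.

```ruby
# Hypothetical immutable dataset for illustration.
class FrozenDataset
  attr_reader :opts

  def initialize(opts = {})
    @opts = opts.dup.freeze
    @cache = {}          # the hash itself stays mutable
    @mutex = Mutex.new
    freeze               # the dataset's instance variables can no longer be reassigned
  end

  # Returns a new dataset; the receiver is never modified.
  def with_opts(more)
    self.class.new(@opts.merge(more))
  end

  # The derived dataset is built once and then served from the cache.
  def limit_one
    cached(:limit_one) { with_opts({ limit: 1 }) }
  end

  private

  # All cache access goes through the mutex.
  def cached(key)
    hit = @mutex.synchronize { @cache[key] }
    return hit if hit
    value = yield
    @mutex.synchronize { @cache[key] ||= value }
  end
end

albums = FrozenDataset.new({ from: :albums })
first_call = albums.limit_one
second_call = albums.limit_one
```

Because the shared object is frozen, threads can safely share `albums`; the only mutable piece is the cache, and it is reachable only through the synchronized helper.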
00:35:12.790 In summary, very few optimizations should be pursued without testing and profiling for efficacy. A wide range of tools exist to facilitate this, from memory profiling tools to performance libraries. For those new to optimization or seeking to improve existing code, starting small and maintaining an eye on efficiency can yield results over time.
00:36:52.700 Optimizing code is as much about understanding your audience as it is about improving performance. Providing a better experience for users while preserving readability and maintainability is key. In closing, my goal is to encourage you to be thoughtful about your optimizations and to participate in a community effort to improve Ruby's performance collectively.
00:38:01.990 Now, I am sure some of you have questions. Please ask them now. Thank you for your attention.
00:39:21.010 Karen, thank you for your question. Yes, in Active Record, defining the attribute methods when the first instance is retrieved is intended to delay loading the schema. Sequel handles this differently by connecting to the database as soon as the model class is declared.
00:40:07.580 Active Record can lazy-load schemas to improve startup times for common usage patterns.
00:40:25.790 Yes, I believe there are opportunities for Ruby implementations to improve performance aspects such as keyword arguments. Developers continually search for viable solutions to optimize overhead, and I share this sentiment as the community actively seeks ways to refine performance practices.
00:41:28.710 Thank you for your insightful feedback. Are there any additional questions? Thank you for the opportunity to discuss these topics.