rubyday 2015

require() bombed my multi-threaded app!


by Jonathan Martin

In his talk "require() bombed my multi-threaded app!", presented at RubyDay 2015 in Turin, Jonathan Martin addresses Ruby's code loading mechanisms and the common challenges developers face with them, especially developers coming from a Rails background. Martin highlights how Ruby's autoloading system can lead to confusing behavior and errors in multi-threaded environments, particularly during high-stress deployments.

Key points discussed in the talk include:
  • Introduction to Ruby Code Loading:

    • Ruby developers often have a shallow understanding of code loading mechanisms.
    • The talk addresses the evolution of Ruby's autoloading system, especially the improvements in Ruby 2.x that made it thread-safe.
  • Different Code Loading Methods:

    • Martin breaks down methods like eval, load, and require, explaining their functions and differences.
    • He uses a fictional recipe app called "Cookery" to demonstrate these concepts practically.
  • Examples with Code Demonstrations:

    • The session includes various coding examples that visibly illustrate how require, load, and autoload work in Ruby.
    • Martin emphasizes the importance of understanding file paths and the implications of multiple load attempts, detailing how this can create constant redefinition warnings.
  • ActiveSupport Autoloading:

    • Martin discusses ActiveSupport's enhancements to Ruby's autoloading, which let constants be loaded from conventionally named files without explicit require statements.
    • He highlights the significance of conventions in avoiding bugs, especially in production environments, and the necessity of following these guidelines strictly.
  • Production vs. Development:

    • The talk contrasts how code loading works in development versus production environments, noting that eager loading is vital to maintain performance and stability as applications scale.
  • Common Pitfalls and Best Practices:

    • Emphasizes the need for clean code and maintaining clarity around dependencies.
    • Martin concludes with advice on how to manage code loading effectively to prevent unnecessary errors that could complicate testing and development cycles.

Ultimately, Martin's engaging presentation underscores the complexities that can arise from Ruby's code loading mechanisms, particularly within multi-threading contexts, and stresses the importance of adherence to established conventions and best practices for smooth Ruby development.

00:00:13.610 Buongiorno! That's about all the Italian I know. According to Duolingo, I have about 200 words, so I'm working on it. Thank you all for the warm Italian welcome and for coming out to my talk. I have to beg your forgiveness in advance because I've been a really bad Italian tourist already, and this is only my second day here.
00:00:21.210 I got here yesterday, and the first thing I did was to fail to pay my toll on the autostrada. Then, I went to a British pub for dinner, which wasn't exactly subtle. And thirdly, I know this is a Ruby conference, but this is my favorite shirt, and I always give my talks in this yellow shirt. I hope you'll forgive me; it's a JavaScript shirt.
00:00:38.969 But I promise you, there is not a line of JavaScript in this talk. This talk is by me, which is a strange way to introduce the first slide: I'm Jonathan Martin, from the USA, not from Italy. I like Twitter, by the way; I'm @nibbler, for those of you who like to tweet. I work at a place called Big Nerd Ranch, which is in Georgia, USA. We are a consultancy: we write books, create apps, and teach intensive mobile developer boot camps out in the middle of nowhere, usually in the woods, so you can get away from all distractions.
00:01:23.820 So, come say hi to me after the talk, and I'll be happy to take all your tips on how to navigate the Italian roadways, at no charge, because I have no more euros after paying for the autostrada. Thank you! So, quite a while ago, about a year ago, I was working on a talk about code loading in multi-threaded environments, namely Sidekiq jobs or, for those of you who have used Resque, a similar situation.
00:01:50.730 At that time, Ruby's autoloading system was not thread-safe, and Rails had a really nasty habit of crashing randomly in our Sidekiq workers, usually with a circular dependency exception. You have probably all run into that at random, typically while running your test suite, and eventually traced it to one line that made no sense. Today's talk was initially going to replicate that issue with multi-threading and code loading. It was all super interesting until Ruby 2.x fixed that issue.
00:02:44.880 Later versions of Ruby fixed autoloading to be thread-safe, and then ActiveSupport fixed its own issues by adding thread locks. So, unfortunately, a great part of the humorous aspect of my talk is no longer relevant. This means that with the time we have, we actually get to discuss the useful part of the talk, which covers all of Ruby's code loading mechanisms and the best practices around them.
00:03:18.750 Newcomers and seasoned Rubyists alike often have a very shallow understanding of how Ruby's magical code loading works out of the box, especially if they come from a Rails background. In Rails, you don't normally have to think about code loading. You just run 'rails new' and it works out of the box. You start typing code here, and it shows up there; you don't even have to think about it.
00:04:03.030 That's nice, as it keeps the barrier to entry really low. But as soon as a new developer or someone from another language background stubs their toe on something, especially during a high-stress deployment, they tend to break things. They cry about it, and usually, after a good cry, they go to Twitter to publicize their stubbed toe, complete with all the right hashtags.
00:04:34.780 This is especially annoying, and then they criticize Ruby's code loading and autoloading mechanisms. They also like to rant about the confusing differences between 'require' and 'load', 'require_dependency' and 'require_relative'. They ask: which one of these am I supposed to use? They take it a step further, often posting pictures that graphically depict their feelings.
00:05:01.660 This was my favorite reaction; it showed up in my feed with the hashtags #Ruby and #autoload. Now, when they stop getting retweets and likes, they want to give a talk about those grievances and question whether Matz is really nice. This all bothered me until about a year ago, when I started encountering these issues myself. So, I took to Twitter while my production Sidekiq workers were completely up in flames.
00:05:48.880 I composed my 105-character tweet with all the correct hashtags to ensure it could become a viral anti-Ruby campaign. At the end of the day, I decided not to hit the publish button, so you will not find this tweet on the Internet. Instead, I realized that I hadn’t actually taken the time to learn how Ruby's autoloading works. Perhaps my grievances were due to my own laziness regarding that low barrier to entry, not because Matz is mean.
00:06:05.410 To preserve Matz's reputation as an awesome guy, we're going to revisit Ruby's various code loading mechanisms in depth. After that, you can form your own educated opinion about best practices, and if you decide you hate autoloading, you can go tweet at Matz, because he hates it too, so you'd be in the right camp.
00:06:23.920 First off, we'll break down the built-in 'eval', 'load', and 'require' methods. Some of you, especially if you've watched Ruby podcasts, have probably seen a similar breakdown before. Huge thanks to Avdi Grimm, who runs that particular podcast, for a fantastic systematic breakdown.
00:06:59.790 We'll be using a recipe app called 'Cookery', which starts out as plain Ruby; we'll tack on some Rails stuff later. It's a very simple project with one file, called 'main.rb', as our entry point for loading the entire application. We'll spend most of our time there, and we'll nest our main app code under the 'Cookery' module, which contains a 'Recipe' class and an 'Ingredient' class. I chose these words because they are the only words I know in Italian, so if we have Q&A later, I'll know how to say these words.
00:07:45.210 Before we even get into files, we're going to start in a plain old IRB session. Because Ruby is an interpreted scripting language, there is little difference between firing up IRB and typing code, and writing a file and running it with the 'ruby' command. First, we will call the 'to_s' method in the REPL with no receiver. In other words, there is no explicit 'self.' or 'some_object.' in front of it.
00:08:10.740 It's implicitly calling 'self.to_s', which is something we're used to in Ruby. If we take it a step further and call 'puts self', 'puts' calls 'self.to_s' for us, and both print 'main'. That method comes from the object instance: whenever we run code in a REPL (or at the top level of a script), 'self' is a special instance of Object that Ruby names 'main'.
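Here is a quick way to see this for yourself, run as a plain Ruby script (IRB behaves the same way). With no receiver, `to_s` goes to the implicit `self`, which at the top level is an instance of `Object` named `main`. The variable names are only for illustration.

```ruby
# At the top level of a script (or an IRB session), self is a special
# instance of Object that Ruby names "main".
implicit = to_s      # no explicit receiver, so the message goes to self
explicit = self.to_s # the same call with the receiver spelled out

puts implicit        # prints "main"
puts self            # puts calls to_s on its argument: also prints "main"
puts self.class      # prints "Object"
```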
00:09:11.250 Every object has access to the methods of the Kernel module, which is mixed into Object. Kernel has a private method called 'puts', as well as another method called 'eval'. Unsurprisingly, 'eval' takes a string of Ruby code and evaluates it in the current context, just as if we had typed 'to_s' manually. However, 'eval' can do a lot more. Because it runs arbitrary strings as code, it can open us up to security vulnerabilities, and it also accepts a 'binding' object as an optional second argument.
00:09:56.200 This binding object is not the same as passing in another object. A binding object wraps up the value of 'self' and the local variables that can be accessed, which allows us to execute code in a completely different context. For example, if we want to execute our little 'puts to_s' inside the 'Cookery' module, we can leak out that module's binding.
00:10:43.340 The code runs within that binding, so instead of printing 'main', it now prints 'Cookery'. However, we don't want to type all of our code in an 'eval', so we'll move it into a separate file. Our 'cookery/recipe.rb' file just prints a message at the very top when the file is first loaded, which is useful for debugging.
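The following is a minimal sketch of `eval` with and without a binding. It assumes a top-level `Cookery` module as in the talk. A module body evaluates to its last expression, so ending the body with `binding` is one way to leak the module's binding. `greeting` is a hypothetical local variable, added to show that a binding carries local variables as well as `self`.

```ruby
# Without a binding, eval uses the caller's context: at the top level, self is main.
top_level = eval("to_s")

# A module body evaluates to its last expression, so this captures the
# Binding from inside Cookery. There, self is the Cookery module, and the
# body's local variables travel with the binding.
cookery_binding = module Cookery
  greeting = "Buongiorno" # hypothetical local, only here to show locals are captured
  binding
end

inside = eval("to_s", cookery_binding)     # self is Cookery
local  = eval("greeting", cookery_binding) # reads the module body's local

puts top_level # prints "main"
puts inside    # prints "Cookery"
puts local     # prints "Buongiorno"
```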
00:10:59.700 Notice the file paths I've included at the bottom of the file for reference. In our 'main.rb' file, we're going to load that class. To do so, we'll take the '__FILE__' keyword, expand it to figure out which directory this file is located in, and then tack on the relative path to the recipe, 'cookery/recipe.rb'. Now, 'eval' only takes a string of Ruby code, not a file path.
00:11:52.840 So we'll call 'File.read' to get the file's contents and pass them to 'eval' along with 'binding'. Notice we're just calling 'binding'; we aren't getting it from somewhere else. That's because Kernel also has a 'binding' method that returns the binding we are currently in, which is also what 'eval' uses by default. The third argument is the file name to report, here the relative path, and the fourth argument is the line number the code starts on, which is typically 1.
00:12:45.460 Those last two arguments are only useful for stack traces when our code fails. If a method in the 'Recipe' class raises an error, our logs will point to 'cookery/recipe.rb' and the right line, so we know where to look. If we run the code, we expect to see the helpful message that tells us, while debugging, whether or not the file was loaded.
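The sketch below applies the same idea to a file. The project layout is written to a temporary directory so the example is self-contained. `Recipe.broken` is a hypothetical method that exists only to raise an error. Passing the path and a starting line number of 1 to `eval` makes the backtrace point at `cookery/recipe.rb` instead of at the `eval` call.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical project layout in a temp dir: <root>/cookery/recipe.rb
root = Dir.mktmpdir("cookery")
FileUtils.mkdir_p(File.join(root, "cookery"))
File.write(File.join(root, "cookery", "recipe.rb"), <<~'RUBY')
  puts "Loading cookery/recipe.rb"

  module Cookery
    class Recipe
      def self.broken
        raise "oops"
      end
    end
  end
RUBY

# In main.rb this would be built from __FILE__, e.g.
#   File.expand_path("cookery/recipe.rb", File.dirname(__FILE__))
recipe_path = File.expand_path("cookery/recipe.rb", root)

# Arguments: source string, binding, file name for backtraces, first line number.
eval(File.read(recipe_path), binding, recipe_path, 1)

begin
  Cookery::Recipe.broken
rescue RuntimeError => e
  first_frame = e.backtrace.first
end

puts first_frame # points at recipe.rb line 6, not at the eval call
FileUtils.remove_entry(root)
```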
00:13:23.600 While it's somewhat irritating to do all of that from scratch, we also have the 'load' method, another Kernel method. 'Kernel.load' takes a file path, reads the file, and executes it at the top level. Unlike our 'eval' call, the caller's local variables are not visible to the loaded code. Building an absolute path by hand is still tedious, though, so we can tell Ruby which directories contain our code and give it just the name of a file.
00:14:08.750 Much like the shell's PATH, which lists the directories searched for executables, Ruby maintains a load path: an array of directories to search for Ruby files. We can add any directory of Ruby files to this global variable, '$LOAD_PATH', and call 'Kernel.load' with just 'cookery/recipe.rb'. This is a significant improvement over 'eval', but there are many ways we could load the same file.
00:14:53.270 For instance, we might have several entry points. Not only do we have 'main.rb', but we could also have several test files, each of which needs to load the 'Recipe' class. This means that in the same session, we might load the same file multiple times.
00:15:44.830 Now, this doesn't seem awful, but it can cause issues, particularly constant redefinition warnings. In Ruby, we want to load a source code file once, not two or three times. The way we avoid this is to keep track, in memory, of every file we load. The next time we try to load that file, we know it's already loaded and don't need to do it again.
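The next sketch combines `$LOAD_PATH` with `Kernel#load`, again using a temporary directory as a stand-in for the project root. The `$recipe_evaluations` counter and the `DEFAULT_SERVINGS` constant are hypothetical additions. They make the repeated evaluation visible, along with the constant redefinition warning it causes.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical project root in a temp dir: <root>/cookery/recipe.rb
root = Dir.mktmpdir("cookery")
FileUtils.mkdir_p(File.join(root, "cookery"))
File.write(File.join(root, "cookery", "recipe.rb"), <<~'RUBY')
  $recipe_evaluations += 1
  puts "Loading cookery/recipe.rb"

  module Cookery
    class Recipe
      DEFAULT_SERVINGS = 4
    end
  end
RUBY

$recipe_evaluations = 0

# Like PATH in the shell: directories are searched in order, and unshift
# puts root at the front of the list.
$LOAD_PATH.unshift(root)

load "cookery/recipe.rb" # found through $LOAD_PATH and evaluated
load "cookery/recipe.rb" # evaluated again: load keeps no record of earlier loads,
                         # so Ruby warns "already initialized constant
                         # Cookery::Recipe::DEFAULT_SERVINGS" here

puts $recipe_evaluations # prints 2

$LOAD_PATH.delete(root)
FileUtils.remove_entry(root)
```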
00:16:32.890 In Ruby, we have a method called 'Kernel.require', which solves this issue by maintaining a list of all the files that have already been loaded. The name of that global variable is somewhat strange: it's called '$LOADED_FEATURES', not '$LOADED_FILES'. 'require' loads code into memory only if it isn't already in that list. We set up the load path using the unfortunately named 'unshift'.
00:17:22.480 'unshift' adds an entry to the front of the load path, and Ruby searches the load path in order. So, we can require the file and observe that our loaded features list now contains the absolute path to the file we just loaded. Despite the name 'loaded features', you can actually load more than just Ruby files: you can load '.so' files (shared objects), or dynamically linked libraries ('.dll') on Windows.
00:18:07.820 One point to note is that loaded features does not store the file path you pass into require; it resolves that path to an absolute file path, meaning you can require the same file in different ways, but it will load it only once. This is handy when different files may require the same file differently, but you ensure that the file only loads once. Now, we're ready to move to some more contentious parts of Ruby.
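The same setup works with `Kernel#require` instead. This sketch uses the same assumptions as before: a temporary project root and a hypothetical `$recipe_evaluations` counter. The root is passed through `File.realpath` so that the path we compare against matches the absolute path Ruby records, even on systems where the temp directory sits behind a symlink.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical project root; realpath so the string matches what Ruby records.
root = File.realpath(Dir.mktmpdir("cookery"))
FileUtils.mkdir_p(File.join(root, "cookery"))
recipe_path = File.join(root, "cookery", "recipe.rb")
File.write(recipe_path, <<~'RUBY')
  $recipe_evaluations += 1
  puts "Loading cookery/recipe.rb"
  module Cookery
    class Recipe
    end
  end
RUBY

$recipe_evaluations = 0
$LOAD_PATH.unshift(root) # front of the list, searched first

first  = require "cookery/recipe" # found via $LOAD_PATH and evaluated: true
second = require "cookery/recipe" # already in $LOADED_FEATURES: false
third  = require recipe_path      # same file by absolute path: still false

puts $LOADED_FEATURES.include?(recipe_path) # prints true: the absolute path is recorded
puts $recipe_evaluations                    # prints 1

$LOAD_PATH.delete(root)
FileUtils.remove_entry(root)
```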
00:18:46.060 From now on, we will assume that this code is in 'main.rb', and we'll make sure to load ActiveSupport. ActiveSupport is part of Rails, but it can be used in other projects. It adds extensions to Ruby's core functionality, such as autoloading, that make it more pleasant to work with.
00:19:30.050 The autoloading mechanism we are all accustomed to in Rails allows you to pass in the directories where your code is located. Ruby's built-in autoload mechanism does not work this way, so ActiveSupport provides that for us, which is incredibly useful. We will add the directory where 'main.rb' is located to the load path, so it's easy to require 'cookery/recipe' or 'cookery/ingredient', and at the very end, we'll print out the last loaded feature.
00:20:19.680 Now we're ready to discuss the most contentious code loading mechanism in all of Ruby: autoloading. If you ask Matz or google his thoughts on autoloading, you'll find a lot of commentary from him expressing his hatred for it, which is unfortunate because autoload is an excellent feature. It was quite tricky to use until Ruby 2 made it thread-safe.
00:21:13.830 ActiveSupport's autoloading lets you specify the directories in which your classes and other constants are defined. You can then reference any of those constants without an explicit require statement. For example, at the very top of the file, we add the root directory, which is the directory containing 'main.rb', to our autoload paths.
00:21:57.680 From here, we can directly reference 'Cookery::Recipe' without a require or load statement; simply referencing the constant triggers autoloading. Right after we reference 'Cookery::Recipe', the loading message prints, indicating that the class was loaded.
00:22:59.340 However, autoloading relies heavily on conventions. If you break even one convention, you will end up with the craziest bugs in your production Sidekiq workers. That isn't a great way to spend your time, especially since Ruby is meant to be a developer-friendly language.
00:23:34.480 To be happy developers, we need to follow these conventions. The main one is that a constant's file path is its name with the double colons replaced by forward slashes and the CamelCase words underscored, so 'Cookery::Recipe' lives in 'cookery/recipe.rb'. Keep that in mind, and you will likely avoid 90% of autoloading issues. If a constant has not yet been loaded when you reference it, ActiveSupport goes through each autoload path and appends the relative path it gets by substituting forward slashes for double colons.
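ActiveSupport's classic autoloader is built on `Module#const_missing`. The sketch below is a tiny, hypothetical imitation of that idea. `ToyAutoload`, `Hook`, and `load_missing` are invented names, not ActiveSupport's API. It applies the convention just described:

1. Turn the qualified constant name into a relative path.
2. Look for that path under each autoload path.
3. Load the file, and insist that it defined the constant.

It uses `require` for simplicity and only hooks the `Cookery` namespace.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical toy autoloader (not ActiveSupport's code or API).
module ToyAutoload
  @autoload_paths = []

  class << self
    attr_reader :autoload_paths

    # Simplified convention: "Cookery::RecipeBook" -> "cookery/recipe_book".
    # (ActiveSupport's real underscore also handles acronyms and more.)
    def underscore(qualified_name)
      qualified_name.gsub("::", "/")
                    .gsub(/([a-z\d])([A-Z])/, '\1_\2')
                    .downcase
    end

    # Look for <autoload path>/<underscored name>.rb, load it, and insist
    # that it defined the constant we were asked for.
    def load_missing(namespace, const_name)
      qualified = namespace.equal?(Object) ? const_name.to_s : "#{namespace.name}::#{const_name}"
      relative = "#{underscore(qualified)}.rb"
      autoload_paths.each do |dir|
        path = File.join(dir, relative)
        next unless File.file?(path)
        require path # simplification: ActiveSupport's classic default was Kernel#load
        return namespace.const_get(const_name, false) if namespace.const_defined?(const_name, false)
        raise LoadError, "Unable to autoload constant #{qualified}, expected #{path} to define it"
      end
      nil
    end
  end

  # Extend a namespace module with Hook to route its missing constants here.
  module Hook
    def const_missing(const_name)
      ToyAutoload.load_missing(self, const_name) || super
    end
  end
end

# Hypothetical project root in a temp dir: <root>/cookery/recipe.rb
root = Dir.mktmpdir("cookery")
FileUtils.mkdir_p(File.join(root, "cookery"))
File.write(File.join(root, "cookery", "recipe.rb"), <<~'RUBY')
  puts "Loading cookery/recipe.rb"
  module Cookery
    class Recipe
    end
  end
RUBY

module Cookery
  extend ToyAutoload::Hook
end
ToyAutoload.autoload_paths << root

# No require anywhere: referencing the constant is what triggers the load.
recipe_class = Cookery::Recipe
puts recipe_class # prints "Cookery::Recipe"
```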
00:24:29.420 This sounds like Kernel.require with its load path, but here the directories searched are the autoload paths. By default, ActiveSupport's autoloading loads files with Kernel.load, not Kernel.require; the first line I've commented out shows that default.
00:25:27.920 That matters if we later add manual requires to our code without changing the loading mechanism. If autoloading loads 'Cookery::Recipe' and we then manually require the same file further down, the file is loaded twice.
00:26:19.920 As a result, it will print that loading message again, indicating that it's evaluating the file a second time. The challenge here is that Kernel.load doesn’t check the loaded features global variable; it simply loads it each time. To address this issue, we need to switch the mechanism to require.
00:27:16.560 After we do that, the next time we manually require the same file, the duplicate load is ignored: nothing happens and nothing is printed. So using require seems ideal for preventing multiple loads. Why, then, does ActiveSupport::Dependencies default to the load mechanism?
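You don't need ActiveSupport to reproduce the double load. In this sketch, a plain `load` call stands in for an autoloader using the `:load` mechanism. In ActiveSupport's classic autoloader, that setting is `ActiveSupport::Dependencies.mechanism`. A repeated `require` stands in for the `:require` mechanism. The file names, `write_counting_file`, and the `$evaluations` counter are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

$evaluations = Hash.new(0)
root = File.realpath(Dir.mktmpdir("cookery"))

# Hypothetical helper: writes a one-line source file that counts how many
# times it has been evaluated, and returns its absolute path.
def write_counting_file(root, relative)
  path = File.join(root, relative)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, "$evaluations[#{relative.inspect}] += 1\n")
  path
end

recipe = write_counting_file(root, "cookery/recipe.rb")
ingredient = write_counting_file(root, "cookery/ingredient.rb")

# Mechanism :load (simulated). Kernel#load never adds the file to
# $LOADED_FEATURES...
load recipe
# ...so a later manual require evaluates the same file a second time.
require recipe

# Mechanism :require (simulated). The second require finds the file in
# $LOADED_FEATURES and does nothing.
require ingredient
require ingredient

puts $evaluations["cookery/recipe.rb"]     # prints 2
puts $evaluations["cookery/ingredient.rb"] # prints 1
FileUtils.remove_entry(root)
```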
00:28:06.210 Another notable autoloading mechanism is part of Ruby core. If you check Matz's tickets or tweets on the subject, they refer to Ruby's built-in autoload feature. This is the 'autoload' method available on every module (Kernel has a top-level version too), which lets a module declare constants along with the file that defines each one.
00:28:56.120 For instance, the 'Cookery' module can declare 'Recipe' as an autoloadable constant together with its file path; referencing 'Cookery::Recipe' then loads that file without an explicit require. However, this built-in autoload is falling out of favor, particularly since Matz dislikes it, and there's speculation it may be deprecated in Ruby 3. Still, autoload works nicely in some gems, though they've faced criticism for using it.
00:29:46.640 Next, we're going to set up our Cookery module. We’ll extend that module with ActiveSupport extensions and call autoload. At the same time, in 'main.rb', we need to require that entry point, as the Cookery module serves as our gateway to Recipe.
00:30:30.120 Now we can reference 'Cookery::Recipe'. Since the module is already loaded into memory, autoloading sees that Recipe is registered on it and knows which file to load, and the loading message appears. Notably, if we look at '$LOADED_FEATURES', we see the file listed there, indicating that the autoloading built directly into Ruby core always uses require.
00:31:31.570 This is a significant distinction: ActiveSupport's autoloading uses load by default, while Ruby core's autoload, the mechanism used in gems, always uses require. If you explore the source code, you'll find no way to swap it out without monkey-patching. And breaking conventions leads to exceptions, so autoloading needs these conventions in place to function properly.
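Here is a sketch of Ruby core's `Module#autoload`, with no ActiveSupport involved. The `Cookery` module registers `Recipe` together with the feature name to require, and the first reference triggers the load. ActiveSupport's own `autoload` extension, which lets you omit the path, is left out so the example runs without the gem. As before, the temporary project root is an assumption.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical project root in a temp dir, with <root>/cookery/recipe.rb.
root = File.realpath(Dir.mktmpdir("cookery"))
FileUtils.mkdir_p(File.join(root, "cookery"))
recipe_path = File.join(root, "cookery", "recipe.rb")
File.write(recipe_path, <<~'RUBY')
  puts "Loading cookery/recipe.rb"
  module Cookery
    class Recipe
    end
  end
RUBY

$LOAD_PATH.unshift(root)

# The entry point registers the constant and the feature that defines it.
module Cookery
  autoload :Recipe, "cookery/recipe"
end

pending = Cookery.autoload?(:Recipe) # "cookery/recipe": registered, not loaded yet
recipe_class = Cookery::Recipe       # first reference requires the file
loaded = Cookery.autoload?(:Recipe)  # nil once the constant is defined

puts $LOADED_FEATURES.include?(recipe_path) # prints true: core autoload goes through require

$LOAD_PATH.delete(root)
FileUtils.remove_entry(root)
```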
00:32:18.430 In our Cookery::Recipe file, we’ll remove that module wrapping. This makes for a fun error. When you try to load 'Cookery::Recipe', it does find the file, but you'll receive the error: 'unable to autoload constant Cookery::Recipe.' This indicates that while it found the path, it cannot find the constant definition.
00:33:07.780 You typically won't encounter this issue in a simple example, but in larger systems with many ways to load files, it becomes prominent. The good news is that autoloading raises an exception rather than quietly failing and leaving the constant undefined.
00:33:59.360 Another trap is nesting autoload paths. If you have both 'root/cookery' and 'root' in your autoload paths, you'll run into a problem, because one directory contains the files of the other. The same file can then be reached as two different constants, 'Recipe' and 'Cookery::Recipe', which isn't what you want.
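The following is a stripped-down, hypothetical sketch of that failure mode, not ActiveSupport's implementation. The file is found at the conventional path, but with the module wrapping removed it defines a top-level `Recipe`. The hook therefore raises a `LoadError`, with a message modeled on ActiveSupport's, instead of handing back a missing constant. The naive path mapping here does not split CamelCase names.

```ruby
require "tmpdir"
require "fileutils"

root = Dir.mktmpdir("cookery")
FileUtils.mkdir_p(File.join(root, "cookery"))
# The file sits where the convention expects Cookery::Recipe, but the module
# wrapping has been removed, so it defines a top-level Recipe instead.
File.write(File.join(root, "cookery", "recipe.rb"), <<~'RUBY')
  class Recipe
  end
RUBY

module Cookery; end

# Hypothetical const_missing hook for the Cookery namespace only. Naive path
# mapping: "Cookery::Recipe" -> "<root>/cookery/recipe.rb" (no CamelCase split).
Cookery.define_singleton_method(:const_missing) do |const_name|
  qualified = "#{name}::#{const_name}"
  path = File.join(root, "#{qualified.gsub('::', '/').downcase}.rb")
  raise NameError, "uninitialized constant #{qualified}" unless File.file?(path)

  require path
  # The file was found and loaded but did not define the constant:
  # fail loudly instead of handing back nothing.
  unless const_defined?(const_name, false)
    raise LoadError, "Unable to autoload constant #{qualified}, expected #{path} to define it"
  end
  const_get(const_name, false)
end

begin
  Cookery::Recipe.name
rescue LoadError => e
  error = e
end

puts error.message
```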
00:34:43.410 Referencing 'Recipe' without its module works perfectly: the autoloader looks in all the specified directories, finds the file in 'root/cookery', loads it, and succeeds. Referencing 'Cookery::Recipe', however, fails. The autoloader resolves the same file, and 'Kernel.require' refuses to load it a second time.
00:35:31.830 It printed the loading message the first time, but the second access does not trigger another load, confirming that the require mechanism prevents multiple loads. Unfortunately, this design can create frustrating scenarios where even small lapses result in a crash.
00:36:22.050 Developers often run into the infamous constant redefinition warning when they mix autoloading with require. To explore this, we'll set up 'Cookery::Recipe' with a constant holding a simple frozen string. Then we'll switch back to autoloading so we can examine the loaded features.
00:37:16.060 To our surprise, the loaded features list still ends with the last thing we required: ActiveSupport. Autoloading steps in, and we see an interesting phenomenon. If autoloading is triggered for a constant that is already in the middle of being loaded, it raises an error, which is how it thwarts circular dependencies.
00:38:05.370 This safeguard prevents an infinite loop in cases where you might want circular dependencies. These are common in modular plugin systems, for example when you need to reference all loaded plugins in a single array.
00:38:50.660 If you go back to 'Cookery::Recipe' and change that constant to reference 'Ingredient', it prints the loading message as the autoloader tries to load that class. When 'Ingredient' is evaluated, it refers back to 'Recipe', creating that problematic circular dependency.
00:39:38.660 As a result, autoloading saves us from an infinite loop. However, constant redefinition warnings are still hard to tackle, especially when several developers work in the same codebase, with some favoring explicit requires and others preferring autoloading.
00:40:22.890 You have to choose one loading mechanism. Here, removing the explicit require statements simplifies the code so it works again: if we revert to Kernel.load, everything functions seamlessly.
00:41:03.120 Now, let's shift from plain Ruby to a Rails-like environment. We'll dive into a Rails application built around these Cookery models. The code loading system that Rails implements uses ActiveSupport's autoload paths.
00:41:50.770 On closer inspection, you'll find a configuration entry called 'autoload_paths', which lists the directories whose code is autoloaded. In production, however, Rails relies on eager load paths instead: the code in those directories is loaded up front, when the application boots.
00:42:37.350 Development handles lazy references cleanly. In production, though, code that is in the autoload paths but missing from the eager load paths gets loaded lazily, in the middle of web requests. To avoid that, add those directories to the eager load paths so the loading cost is paid at boot rather than during requests.
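For reference, here is a configuration fragment in the style of a Rails 4-era app. It is not runnable on its own, and `CookeryApp` and the `lib` directory are hypothetical. It adds a directory to the eager load paths and shows the production setting that turns eager loading on.

```ruby
# config/application.rb (hypothetical app name and directory)
module CookeryApp
  class Application < Rails::Application
    # Directories here are loaded up front at boot when eager loading is on,
    # instead of on first reference in the middle of a request.
    config.eager_load_paths << "#{config.root}/lib"
  end
end

# config/environments/production.rb
Rails.application.configure do
  config.eager_load = true
end
```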
00:43:27.970 Additionally, developers coming from Java might frequently add eager load paths and nested directories, which quickly cause headaches. A solid best practice is to avoid adding nested directories to autoload paths and to keep a clear delineation between them.
00:44:14.260 Finally, it's crucial to keep your code clean and eliminate ambiguity around constant redefinition. Managing dependencies explicitly where necessary keeps loading predictable and your code performant.
00:44:30.270 To summarize, autoload works beautifully but breaks easily. Being diligent with conventions helps avoid many pitfalls. Test-driven development shouldn't suffer due to incorrect loading practices, as inconsistent loading can substantially extend testing cycles.
00:45:07.320 Therefore, when writing or refactoring code, check for instances of inconsistent loading and ensure dependencies are clear and follow proper usage of autoload to prevent unnecessary errors.