Zeitwerk Internals

by Xavier Noria

The video, titled "Zeitwerk Internals" and presented by Xavier Noria at the Ruby Warsaw Community Conference Winter Edition 2024, delves into the inner workings of Zeitwerk, a Ruby gem that facilitates autoloading code in Rails applications and other Ruby projects. The presentation emphasizes understanding Zeitwerk’s functionality through a conceptual overview and implementation details. Here’s a summary of the key points discussed:

  • Introduction to Zeitwerk: Zeitwerk is a library that automates the loading, reloading, and eager loading of Ruby code. It aims for transparency in its operations, allowing projects to resemble typical Ruby applications without explicit autoload calls.
  • Role of Ruby Constants: The concept of constants in Ruby is central to understanding Zeitwerk. In Ruby, every constant belongs to a class or module object, unlike in many other programming languages, where constants are treated much like variables. Class and module names are themselves ordinary constants, which is what lets namespaces be inspected and manipulated through a regular API.
  • Autoload Mechanism: The talk covers the autoloading feature in Ruby, which allows the definition and automated loading of constants only when needed, thus optimizing performance. Zeitwerk builds on this API to effectively manage autoloading in larger projects without explicit requires in each file.
  • Implementation of Zeitwerk: Zeitwerk simplifies the autoloading setup in projects by requiring the specification of root directories and invoking the setup method. It utilizes naming conventions to link file structures to Ruby's constant definitions, automating the autoload process.
  • Efficiency in Loading: Zeitwerk is designed to use absolute paths for loading files, which reduces unnecessary lookup times. It effectively organizes files based on their directories and inflects constant names according to Ruby conventions, ensuring that file names correlate correctly with constant definitions.
  • Eager Loading: For eager loading, Zeitwerk uses a breadth-first traversal method to acquire all project constants efficiently. This approach stands out from traditional recursive loading methods, enhancing performance and organization.
  • Q&A Session: Toward the end of the talk, audience questions about autoload methodology and the potential integration of Zeitwerk into core Ruby were addressed, highlighting the intrinsic differences in design philosophies between Zeitwerk and Ruby core.

In conclusion, Zeitwerk emerges as a powerful tool for Ruby developers, optimizing autoloading while abstracting the complexities of Ruby’s object model, leading to enhanced efficiency and organization in code management.

00:00:09.920 Hello! Before we start with the presentation, I would like to mention that I happen to have two copies of an excellent book by Vladimir Dementyev. It's a book about how to evolve Rails applications when they grow. It will make you think. I bought my own copy, but then I got a second copy from the publisher for some reason. This is not promotion in any way, but since I have two copies, if there's any junior developer or anyone who just started working with Rails, I would gladly give the book to that person. So, if there's anyone who just started doing Rails and wants this book, please reach out to me later, and I will give it to you.
00:01:10.320 All right, let's go. We are going to talk about Zeitwerk. So, maybe you know that Zeitwerk is this library that allows you to autoload, reload, and eager load Ruby code. It’s the one that is currently handling these tasks in Ruby on Rails, but it can also be used by any Ruby gem or Ruby project, as long as it complies with the naming conventions. You may not even realize that your dependencies are using Zeitwerk; there are over 500 gems currently using it.
00:01:49.000 One of the design goals for this library is transparency. Transparency means that your project should look like an ordinary Ruby project, except for the lack of 'require' calls. It should just look normal. I envisioned an interface that does not reveal anything about Zeitwerk in your code; you don't have to write anything special in your classes. You have a regular Ruby project, and Zeitwerk somehow hooks into the project externally without you seeing it. So, even if you are working on a Rails application that is not using Zeitwerk, like a Rails 5 application in classic mode, some of your dependencies may use Zeitwerk without you knowing.
00:02:24.360 However, I believe it's better to understand your tools and know what they are doing. I understand this kind of interface as a catalyst; I could write this by hand, and it could be a little verbose and fragile, but I have a tool that automates this for me. That’s the idea. It can be really nice to understand what the tool is doing for you; thus, this talk is about explaining how Zeitwerk does its work.
00:03:01.560 But before going into that, Zeitwerk is all about constants in Ruby. Constants in Ruby is a big topic, and we need to be on the same page regarding a few specifics about constants in Ruby to understand the implementation we are going to see later.
00:03:34.760 The first thing is quite unique to Ruby: the 'class' and 'module' keywords store classes and modules in constants. What does that mean? We have here 'class C', and below it, 'Class.new' gives you a class object because, in Ruby, classes are objects and modules are objects too. We are storing that object in the 'C' constant. For us Ruby programmers, it is very important to understand that both forms do the same thing: 'class C' and assigning 'Class.new' to the 'C' constant are effectively the same.
00:04:09.440 The same applies to modules. When you say 'module M', Ruby, behind the scenes, is instantiating a new module object and storing that object in the 'M' constant. Next, it's important to note that Ruby does not have syntax for types. This is significant because it can be counterintuitive, especially if you come from other programming languages. Let's take a moment to understand this. If we store a zero integer in the 'X' constant and then ask, 'Is it even?', we would expect 'Yes, it is.' This slide is straightforward: we store zero, and of course, zero is even.
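The equivalence described above can be sketched in a few lines. This is a minimal illustration, not a slide from the talk; the names 'D' and 'N' are made up here to stand next to the 'C' and 'M' of the transcript.

```ruby
# The class keyword creates a class object and stores it in a constant.
class C
end

# Roughly the same effect, spelled out as an ordinary constant assignment.
D = Class.new

# The module keyword does the same with a module object.
module M
end

N = Module.new

puts C.class  # Class
puts D.class  # Class
puts D.name   # D (an anonymous class is named after the first constant it is stored in)
puts M.class  # Module
puts N.name   # N
```

In both forms the result is an object stored in a constant, which is why the rest of the talk can treat class and module names as plain constants.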
00:04:56.880 What’s happening in the second line? 'X' is a constant; it holds the value zero, which responds to the 'even?' predicate, returning true. So, there is nothing surprising here. Now, if we define a 'Project' class, the point is that 'Project' is a constant in exactly the same way as 'X'; it just happens to store a class object. It behaves as a regular constant. In Ruby, constants belong to classes and modules, and top-level constants belong to 'Object'. This is something very unique about Ruby because, in other programming languages, constants are more trivial and resemble variables.
00:05:58.239 In Ruby, constants aren't just trivial variables; they belong to the object model of Ruby. Each constant explicitly belongs to a class or module object, which is why we have an API to manipulate and ask about this collection for every class or module object. You can ask for a given constant and set the value of that constant in a specific module or class, meaning all the constants you see in your source code belong to some specific class or module object. For example, if we define a class 'C' and ask 'Object' which are your constants, 'C' will be included.
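As a small sketch of the API mentioned here, the standard reflection methods 'constants', 'const_get', 'const_set', and 'const_defined?' can be used on any class or module. The constants 'X' and 'Y' inside 'C' are illustrative choices, not taken from the slides.

```ruby
class C
  X = 0
end

# Top-level constants belong to Object, and built-in classes are constants too.
p Object.constants.include?(:C)       # true
p Object.constants.include?(:String)  # true

# Every class or module object has its own collection of constants.
p C.constants(false)        # [:X]
p C.const_get(:X)           # 0
p C.const_get(:X).even?     # true

# Constants can also be set through the API.
C.const_set(:Y, 1)
p C.constants(false).sort   # [:X, :Y]
p C.const_defined?(:Y)      # true
```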
00:06:37.920 This means that when you write 'String' or 'Array' or any capitalized class name, that is not a type; it is a constant, a regular constant. If you list the constants of 'Object', 'C' will be included, as will the built-in classes and modules, whose constant names also start with capital letters.
00:07:47.480 Let’s see this through the API. When you define 'class C', you have a class object stored in the 'C' constant, as we saw before. Now, suppose we wanted to do the same assignment using the API. In this case, we can say, 'Object, please store the constant whose name is C,' and assign to that constant the class object we created. Why 'Object'? Because top-level constants belong to 'Object'.
00:08:36.240 Let’s introduce nesting. We have a module 'M' with a nested class 'C'. This is how Ruby somewhat emulates the concept of namespaces. If we list the top-level constants, 'M' will be included, along with other built-in constants. The module 'M' allows us to create a 'C' constant within the module. When we ask 'M' for its constants, 'C' will be included. The constant 'C' is stored within the module defined by 'M'.
00:09:34.080 Using the API, we can get the same result programmatically. We ask 'Object' to store a module in the 'M' constant, and then we ask the module stored in 'M' to store a class in its own 'C' constant. The API is significant because we will see its use in Zeitwerk later.
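A minimal sketch of the nested case, using only 'const_set' and the reflection calls from the standard library. The keyword form is shown as a comment for comparison; the exact slide code is not in the transcript, so this is a reconstruction of the idea rather than a copy.

```ruby
# Keyword form, for comparison:
#
#   module M
#     class C
#     end
#   end

# API form: store a module in Object, then a class in that module.
Object.const_set(:M, Module.new)
M.const_set(:C, Class.new)

p Object.constants.include?(:M)  # true
p M.constants                    # [:C]
p M::C.name                      # "M::C"
p Object.const_defined?(:C)      # false, C belongs to M and not to Object
```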
00:10:19.920 Now, let's discuss 'Module#autoload'. This is an API that isn't used frequently; it allows constants to be loaded on demand. Autoloading is defined in terms of constants, which is why we needed that background first. In the example, we have a 'Background' module that uses autoloading. The syntax we see opens the 'Background' namespace and declares an autoload for a constant, 'Action'. This means that the project doesn’t need to require 'background/action' everywhere it's used. By adding this line, we instruct the Ruby interpreter to require the given path when 'Background::Action' is referenced for the first time.
00:11:17.200 When 'Action' is referenced for the first time, Ruby will halt execution, check for the autoload definition, require the file, and, if the file defines the constant as expected, continue execution. That’s handled by Ruby. Zeitwerk uses 'Module#autoload' because it provides the facilities needed to autoload your project. Zeitwerk itself does not do the heavy lifting; it is Ruby that manages the loading.
00:12:26.160 Again, let’s see that using the API. In our earlier slide, we saw an 'autoload' call without an explicit receiver. In that case, the method is invoked on 'self', which, within the body of a module, is the module itself. Thus, we are effectively calling 'autoload' on the 'Background' module.
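The on-demand behavior can be observed with a throwaway file. The sketch below assumes a 'Background' module with an 'Action' class, following the transcript; the temporary directory and the file contents are invented for the demonstration. It calls 'autoload' with an explicit receiver, which is the same call that a bare 'autoload' inside the module body would make on 'self'.

```ruby
require "tmpdir"

module Background
end

pending_path, after_load = Dir.mktmpdir do |dir|
  # Hypothetical file that defines Background::Action.
  path = File.join(dir, "action.rb")
  File.write(path, "module Background\n  class Action\n  end\nend\n")

  # Same as writing `autoload :Action, path` inside the body of Background.
  Background.autoload(:Action, path)

  # autoload? returns the path while the load is still pending.
  before = Background.autoload?(:Action)

  # The first reference makes Ruby require the file.
  action_class = Background::Action
  puts action_class.name  # Background::Action

  # Once the constant is defined, autoload? returns nil.
  [before, Background.autoload?(:Action)]
end

p pending_path.end_with?("action.rb")  # true
p after_load                           # nil
```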
00:12:55.280 Now that we have the basics, we can begin to explore the source code of Zeitwerk. It’s heavily edited because the existing code could take too much time to cover fully. However, we are going to focus on what I believe are the essential concepts behind it. After this talk, you should have a good understanding of how it works.
00:13:44.080 For instance, the autoloader is able to ignore files and to collapse directories. There are various features like these that we won't cover in detail, as they are not central to today’s talk. But we will see how it fundamentally operates. In Rails, you do not even need to interact with the low-level API because the framework does that for you. The essential API is simple; you instantiate a loader, specify your root directories, which are referred to as autoload paths in Rails, and call setup. That’s all you need to do.
00:14:50.560 It can be done in any Ruby project, not just Rails, as long as your directory structures adhere to the correct naming conventions. The directories I referred to as 'app/controllers' and 'app/models' simply serve as examples and should be familiar. Think of them as representing the top-level namespace, which corresponds to 'Object.' If we have a user file under 'app/models/user.rb', that file is expected to define the top-level 'User' constant.
00:15:38.240 So, anything that is located directly under 'models' or 'controllers' should define its corresponding top-level constant. Let's walk through a specific scenario. Imagine we have an 'Admin' namespace with a 'UsersController' and a 'RolesController', alongside a 'User' model and an 'Admin::Role' model in 'admin/role.rb'. When we invoke the setup method, Zeitwerk goes to work.
00:16:05.720 It will dynamically define autoloads for us. In the top-level object, it will define autoloads for 'UsersController', 'User', and 'Admin', each pointing to the file or directory associated with that constant. Essentially, it will set up autoloads so that the file for 'UsersController' is loaded only when the constant is first referenced, rather than upfront.
00:16:49.680 One thing worth mentioning is that Zeitwerk always works with absolute paths, meaning we always know exactly which file is being loaded. If you require a relative path, Ruby needs to search through the directories in '$LOAD_PATH' to find where the file is located, which can be slow. By using absolute paths, Zeitwerk eliminates this lookup, improving load times.
00:17:36.640 To implement this, Zeitwerk iterates through the root directories, defined as 'app/controllers' and 'app/models'. For each one, it defines autoloads. We utilize a utility that yields only Ruby files and directories to ensure we process only relevant files. If we get a file, we perform one action; if we encounter a directory, we carry out another action.
00:18:19.680 If a file is found, we expect that file to define the constant dictated by the naming convention. The key aspect here is the inflector, which converts a file name in snake case to the corresponding constant name in camel case. For example, from 'user.rb', we expect the file to define the constant 'User'; from 'users_controller.rb', we derive 'UsersController'. So, this mechanism automatically connects file names to their expected constant names.
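The inflection step can be approximated in plain Ruby. This is a simplified stand-in, not Zeitwerk's inflector; 'camelize' and 'cname_for' are hypothetical helper names, and acronyms (for example a file whose constant is spelled in all capitals) would need explicit overrides that this sketch does not handle.

```ruby
# Hypothetical helper: snake_case to CamelCase.
def camelize(basename)
  basename.split("_").map(&:capitalize).join
end

# Hypothetical helper: the constant name expected for a file or directory.
def cname_for(path)
  camelize(File.basename(path, ".rb")).to_sym
end

p cname_for("app/models/user.rb")                   # :User
p cname_for("app/controllers/users_controller.rb")  # :UsersController
p cname_for("app/controllers/admin")                # :Admin
```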
00:19:17.680 Next, we call the 'autoload' method, passing in the constant name to define the autoload for that class, along with the absolute path. Additionally, we maintain some housekeeping metadata for tracking purposes.
00:20:05.680 On the other hand, if a directory is encountered, it indicates a namespace that might have multiple files under it. Similar to the previous step, we again derive the inflected name and set up the constant path, but there’s more complexity here. A namespace can spread over multiple directories, and we need to organize that, so we handle each case based on whether we find a file or directory.
00:20:47.960 We repeat this logic iteratively for all directory paths defined as members of the namespace. This organized yet lazy approach allows Zeitwerk to minimize unnecessary loading of unneeded constants while still ensuring quick access when they are first needed.
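To make the file and directory cases concrete, here is a toy sketch of the idea, not Zeitwerk's actual code. The helper names 'camelize' and 'define_autoloads' and the sample files are assumptions made for illustration. One deliberate simplification: directories are turned into modules eagerly and descended into right away, whereas the talk describes a lazier, one-level-at-a-time approach.

```ruby
require "tmpdir"

# Hypothetical helper: snake_case to CamelCase.
def camelize(basename)
  basename.split("_").map(&:capitalize).join
end

# Hypothetical helper: defines autoloads in `namespace` for everything under `dir`.
def define_autoloads(dir, namespace)
  Dir.children(dir).sort.each do |entry|
    abspath = File.join(dir, entry)

    if File.directory?(abspath)
      # A directory stands for a namespace. Simplification: create the module
      # now (unless it already exists) and descend immediately.
      cname = camelize(entry).to_sym
      mod = if namespace.const_defined?(cname, false)
        namespace.const_get(cname, false)
      else
        namespace.const_set(cname, Module.new)
      end
      define_autoloads(abspath, mod)
    elsif entry.end_with?(".rb")
      # A file is expected to define the constant its name maps to.
      cname = camelize(entry.delete_suffix(".rb")).to_sym
      namespace.autoload(cname, abspath)
    end
  end
end

pending_user, pending_role = Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, "admin"))
  File.write(File.join(root, "user.rb"), "class User\nend\n")
  File.write(File.join(root, "admin", "role.rb"), "module Admin\n  class Role\n  end\nend\n")

  define_autoloads(root, Object)

  # Nothing has been loaded yet; both constants are pending autoloads.
  pending = [Object.autoload?(:User), Admin.autoload?(:Role)]

  # First references trigger the loads, while the files still exist.
  puts User.name         # User
  puts Admin::Role.name  # Admin::Role
  pending
end

p pending_user.end_with?("user.rb")        # true
p pending_role.end_with?("admin/role.rb")  # true
```

Because every autoload is registered with an absolute path, Ruby never has to search '$LOAD_PATH' when one of these constants is first referenced.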
00:21:45.200 Now, as we summarized earlier, Zeitwerk scans the project structure one level at a time, inflecting constant names based on file structures and defining the autoloads accordingly while being able to trigger loads on demand, ultimately providing significant performance benefits. This means that when those constants are used, the autoload system triggers the necessary loads efficiently without overburdening the system with extra operations.
00:22:45.080 When a file is loaded, Zeitwerk keeps track and validates that it meets the expected definitions. If reloading is enabled, it monitors the necessary references for subsequent unloading. But remember, with constant removal, we achieve an effect similar to code reloading. This is how we handle reloading when needed.
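The effect of constant removal can be shown with a small experiment. This sketch uses 'load' instead of autoloads to keep it short, so it illustrates only the remove-and-load-again idea, not Zeitwerk's bookkeeping; the 'Greeter' class and its methods are invented for the example.

```ruby
require "tmpdir"

same_object, methods_after = Dir.mktmpdir do |dir|
  path = File.join(dir, "greeter.rb")

  # First version of the file.
  File.write(path, "class Greeter\n  def old_method\n  end\nend\n")
  load path
  old_class = Greeter

  # The file is edited on disk.
  File.write(path, "class Greeter\n  def new_method\n  end\nend\n")

  # Unload: remove the constant, then load the file again.
  # Without the removal, `class Greeter` would reopen the old class
  # and old_method would still be there.
  Object.send(:remove_const, :Greeter)
  load path

  [Greeter.equal?(old_class), Greeter.instance_methods(false)]
end

p same_object    # false, the constant now holds a brand new class object
p methods_after  # [:new_method]
```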
00:23:38.160 Moreover, for eager loading, Zeitwerk’s main logic works through a breadth-first traversal of the project, which distinguishes it from more traditional recursive approaches. This methodically structured process allows us to load all project constants in a sensible order without being muddled by recursive loading.
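A breadth-first traversal can be sketched with a queue of directories. This toy version only computes the order in which files would be visited; it does not load anything and is not Zeitwerk's implementation. The helper name 'breadth_first_files' and the sample tree are assumptions.

```ruby
require "tmpdir"

# Hypothetical helper: Ruby files under root, in breadth-first order.
def breadth_first_files(root)
  queue = [root]
  files = []

  until queue.empty?
    dir = queue.shift
    Dir.children(dir).sort.each do |entry|
      abspath = File.join(dir, entry)
      if File.directory?(abspath)
        queue << abspath  # visited later, after the current level is done
      elsif entry.end_with?(".rb")
        files << abspath
      end
    end
  end

  files
end

order = Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, "admin"))
  Dir.mkdir(File.join(root, "admin", "reports"))
  File.write(File.join(root, "user.rb"), "")
  File.write(File.join(root, "zebra.rb"), "")
  File.write(File.join(root, "admin", "role.rb"), "")
  File.write(File.join(root, "admin", "reports", "daily.rb"), "")

  breadth_first_files(root).map { |f| f.delete_prefix(root + "/") }
end

p order  # ["user.rb", "zebra.rb", "admin/role.rb", "admin/reports/daily.rb"]
```

Sorting the entries of each directory keeps the order deterministic, which matches the point made in the Q&A that the eager load order is consistent and based on file names.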
00:24:12.520 In conclusion, Zeitwerk provides a powerful autoloading mechanism that prioritizes project organization and efficiency while abstracting much of the namespace complexity. It effectively uses the metadata built into constant definitions and translates the complexities of Ruby’s object model into a robust, clear structure that we can work with.
00:25:00.759 Thank you all for listening. If you have any questions, please feel free to ask!
00:25:53.360 Audience Member: "I have a couple of questions. One is about the 'ls' method, which lists Ruby files and directories. How do you identify Ruby files? Is it by extension?" Xavier: "Yes, by extension." Audience Member: "About the recursive loading: does that mean that it's not a drop-in replacement for other autoloaders?" Xavier: "It’s true that autoloading loads files based on usage, while eager loading loads everything upfront. However, the order in which things are eager loaded is consistent: it is based on file names and remains the same across environments. Overall, it’s designed for performance."
00:26:41.440 Audience Member: "Is there any plan to make autoloading part of core Ruby?" Xavier: "I’m not sure; I’m not part of Ruby core. However, Zeitwerk's approach may not align with Ruby’s design philosophy, as Ruby does not link file names to their definitions. Users can define classes and modules arbitrarily, which contrasts with stronger naming conventions that would be required for autoloading to be core to Ruby." Thank you for your questions and engagement throughout the talk!