Zeitwerk Internals

by Xavier Noria

The video 'Zeitwerk Internals', presented by Xavier Noria at Rails World 2023, explains how Zeitwerk works: the Ruby gem that provides autoloading, reloading, and eager loading in modern Rails applications. Xavier emphasizes transparency as a design goal: Zeitwerk works behind the scenes and leaves no trace in the user’s code. The talk has two parts: an overview of key Ruby concepts related to constants, and an in-depth look at Zeitwerk's implementation.

Key points discussed include:
- Definition and Purpose: Zeitwerk is a code loader, or more precisely a constant loader, designed to do its work without being intrusive.
- Constants in Ruby: Zeitwerk builds on constants; in Ruby, the class and module keywords create constants that store class and module objects.
- Constants and Namespaces: Constants belong to class and module objects, which is how Ruby emulates namespaces; this determines where each constant is stored and how it is accessed.
- Autoloading: Zeitwerk builds on Ruby's built-in autoload method, which loads classes and modules on demand, so code does not need explicit require calls.
- Implementation Details: On setup, Zeitwerk scans the configured root directories and defines autoloads for the files and directories at their first level, recording in a registry which loader is responsible for which path. Constants are loaded only when they are referenced.
- Reloading: Reloading is disabled by default in Zeitwerk; developers enable it explicitly (typically for development), and the loader then keeps track of what it has to unload.
- Eager Loading: Zeitwerk eager loads by traversing the directories breadth-first and triggering the autoloads that are still pending, rather than recursively requiring files.

The overall conclusion of Xavier's presentation is that understanding Zeitwerk's internals improves a developer's experience: knowing how it builds on Ruby's autoloading helps developers make better use of it in Rails applications.

00:00:15.480 Yeah, okay. So, Zeitwerk is a Ruby library that provides autoloading, reloading, and eager loading for Ruby projects.
00:00:23.560 In case your OCD is suffering right now because 'autoloading' and 'reloading' are one word each while 'eager loading' is two words, I feel your pain. That's just how it is.
00:00:30.480 And that is how current Rails applications do these things: they use Zeitwerk behind the scenes.
00:00:38.960 If you are new to Rails, you might not even notice, because transparency has been a key design goal of this library.
00:00:44.360 What this means is that your Ruby code has no trace of Zeitwerk. You write your normal application code, and Zeitwerk is nowhere to be seen.
00:00:51.320 It’s even more than that; there are several hundred gems nowadays that utilize Zeitwerk, so perhaps some of your dependencies are using it too, without you noticing.
00:00:57.000 In the case of a library, transparency extends to the library’s users too: Zeitwerk does not show up in its public interface.
00:01:03.320 The gem may tell you to load its entry point in order to use it, but it doesn’t even mention that it uses Zeitwerk internally.
00:01:09.520 Transparency has been a goal: Zeitwerk is nowhere to be seen. However, I don’t like the word 'magic'.
00:01:16.880 Instead of magic, I prefer to think of it as enhancing something, accelerating something, serving as a catalyst.
00:01:21.920 The point is that the library does something that you could perhaps do manually but are not meant to deal with; it acts as a catalyst.
00:01:27.320 Your experience as a user improves if, in addition to the features you get, you know how they are accomplished.
00:01:35.479 So, this talk aims to explain how it works.
00:01:43.520 You can think of this library as a code loader, but technically, it is a constant loader.
00:01:49.120 To understand how Zeitwerk works, we need to have a common understanding of certain key aspects of constants in Ruby.
00:01:55.319 We’ll have two sections in this talk. The first section will present common observations, and the second section will focus on understanding how Zeitwerk is implemented.
00:02:01.280 The first thing to note is that the class and module keywords create constants.
00:02:07.239 They store the classes and modules they define in constants. This is a peculiarity of the Ruby programming language.
00:02:15.280 For instance, consider two pieces of code: the first is the standard declaration, 'class C', which defines a class called 'C'.
00:02:21.239 The class keyword does the same thing as 'C = Class.new': it creates a class object and assigns that object to the constant 'C'.
00:02:26.760 The same applies to modules; if we define a module called 'M', that’s equivalent to creating a module object and assigning it to the constant 'M'.
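A minimal sketch of this equivalence in plain Ruby (no Zeitwerk involved). The names C2 and M2 are arbitrary, chosen only so the script can show both forms without redefining C or M:

```ruby
# Keyword forms: each defines an object and stores it in a constant.
class C
end

module M
end

# Expression forms: create the object, then assign it to a constant.
# (Different names, so this script does not redefine C or M.)
C2 = Class.new
M2 = Module.new

# Either way, the object is stored in a constant of Object and takes its
# name from that constant.
p [C.name, C2.name]                # => ["C", "C2"]
p [M.name, M2.name]                # => ["M", "M2"]
p Object.const_get(:C2).equal?(C2) # => true
```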
00:02:32.000 This brings me to the second important point: Ruby does not have syntax for types.
00:02:39.800 Let's take a look at a simple slide: we assign '0' to the constant 'X' and then ask if 'X' is even.
00:02:46.800 What happens here is that 'X' is an expression that evaluates to the value it stores, just like a variable.
00:02:52.600 In this case that value is '0', which responds affirmatively to 'even?'.
00:02:58.639 Now, this slide is not surprising; it’s what we expect.
00:03:06.000 But the next slide is crucial to understand the presentation.
00:03:12.560 When we write something like 'Project.find', it's important to remember that 'Project' is a constant.
00:03:23.120 When we define this class with the class keyword, a constant named 'Project' is created.
00:03:30.399 A class object is stored in this constant; thus, 'Project' is simply a constant.
00:03:36.200 This is an important technical detail necessary to understand how Zeitwerk works.
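A small sketch of that point. The body of 'Project.find' is a hypothetical placeholder, since the talk only uses the call as an example. 'Project' evaluates to the class object stored in the constant, just as 'X' evaluates to 0:

```ruby
# A constant holding an Integer: `X` is an expression that evaluates to 0.
X = 0
puts X.even? # => true

# Hypothetical Project class; the class keyword stores it in the constant Project.
class Project
  def self.find(id)
    "project #{id}" # placeholder body, for illustration only
  end
end

# `Project.find(1)` first evaluates the constant Project (a Class object)
# and then sends `find` to that object, exactly like `X.even?` above.
receiver = Project
puts receiver.find(1)                            # => project 1
puts Object.const_get(:Project).equal?(receiver) # => true
```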
00:03:43.120 The next observation is that constants belong to class and module objects; this is how Ruby emulates namespacing.
00:03:50.639 Classes or module objects have an internal constant table mapping constant names to their values.
00:03:57.280 Top-level constants are stored in the constant table of 'Object'.
00:04:05.760 The Module class has an API to manipulate this constant table; you can set, get, remove, or list constants.
00:04:12.560 When we define class 'C', which creates a constant, and then list the constants of 'Object', we see 'C' among them.
00:04:19.600 When you write 'String', for instance, it’s not a type; 'String' is a constant, a top-level constant stored in 'Object'.
00:04:25.840 The actual listing would include many more constants; we only highlighted 'C' here.
00:04:32.280 Now I want to illustrate the same thing using the API.
00:04:40.320 As far as this talk is concerned, these three snippets do the same thing: they create a class object and store it in the top-level constant 'C'.
00:04:48.319 In the last example, we create the constant with 'const_set', called on 'Object'; the first argument is the constant name and the second argument is the value.
00:04:55.680 That value is 'Class.new', a new class object that gets stored in the 'C' constant.
00:05:02.560 Again, all three snippets achieve the same goal.
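A sketch of the three snippets plus the constant table API (set, get, list, remove) mentioned above. Using the distinct names A, B, and C is an assumption of this sketch, so all three forms can run in one script. Note that 'remove_const' is private, hence the 'send':

```ruby
# 1. Keyword.
class A
end

# 2. Assignment of a new class object.
B = Class.new

# 3. Module#const_set, called on Object (first argument: name, second: value).
Object.const_set(:C, Class.new)

# All three live in Object's constant table...
p %i[A B C].all? { |name| Object.constants.include?(name) } # => true

# ...which can also be read and edited through Module's API.
p Object.const_get(:C).name  # => "C"
Object.send(:remove_const, :C)
p Object.const_defined?(:C)  # => false
```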
00:05:08.160 Now let’s introduce namespaces: we have a top-level module 'M' and a nested class 'C'; this is how Ruby emulates namespaces.
00:05:15.080 If we list the constants of 'Object', we see 'M' at the top level, but not 'C'; 'C' is among the constants of 'M'.
00:05:22.560 Whenever we see a constant in Ruby, we need to think about where it is stored: in which class or module object, that is, in which namespace.
00:05:30.630 Ruby has lookup algorithms that we won’t cover here because they are not relevant to this presentation.
00:05:38.399 Let’s stick with the API; this is the same idea using the API.
00:05:44.639 We create a module object in the first line and assign that object to the 'M' constant in the second line.
00:05:52.560 How can we refer to 'M' if we did not use the module keyword and did not assign it with the equal sign?
00:05:58.920 We can, because there is nothing special about constants defined with keywords: a constant is just an entry in the constant table of a class or module object.
00:06:05.320 Ruby looks for constants in certain places, and one of them is 'Object', where we just created that constant.
00:06:13.600 We can then call 'const_set' on 'M' to define the class 'C' inside it.
00:06:19.680 This code is doing the same thing that the first example does using the API.
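The namespace example as a runnable sketch. Only the API version runs; the keyword version is shown in a comment for comparison:

```ruby
# Keyword version, for comparison:
#
#   module M
#     class C
#     end
#   end
#
# API version:
mod = Module.new
Object.const_set(:M, mod)          # M is now a top-level constant

M.const_set(:C, Class.new)         # C is stored in M's constant table

p Object.constants.include?(:M)    # => true
p Object.constants.include?(:C)    # => false
p M.constants                      # => [:C]
p M::C.name                        # => "M::C"
```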
00:06:27.360 The last remark is about a method called 'autoload', which is defined in 'Module'.
00:06:34.480 This allows you to load constants on demand; let’s see an example.
00:06:42.240 This is a real case from Background: if you open its entry point, you will see that it defines the 'Background' namespace.
00:06:49.839 Then it calls 'autoload :Action' with a string, meaning that whenever you refer to 'Background::Action', Ruby should autoload it.
00:06:55.400 If the constant is already loaded, nothing special happens; if it is not, the reference triggers the autoload.
00:07:01.760 Ruby then issues a 'require' with the second argument, so it requires 'background/action'.
00:07:08.480 Once the 'require' returns normally, the constant is defined, and execution continues.
00:07:14.960 This is done seamlessly by Ruby itself.
00:07:22.240 Why does Background do this? One of the benefits is that you no longer have to include requires in your code, as Ruby will autoload on demand.
00:07:29.680 That's why we don't need to explicitly put requires in Rails applications because Zeitwerk is based on this API.
00:07:35.839 Let’s look at the same thing from a different perspective: this code does the same, but explicitly.
00:07:42.560 In the previous slide, 'autoload' was called without an explicit receiver inside the module body, so it was a method call on that module with two arguments.
00:07:48.840 Here the receiver is explicit: we call 'autoload' on the module, passing the name of the constant and the string to be required.
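A self-contained sketch of autoload with an explicit receiver. 'Demo' and 'Demo::Action' are hypothetical stand-ins for the gem's namespace and constant. The file is written to a temporary directory so the example runs anywhere, and it is registered with an absolute path:

```ruby
require "tmpdir"

# Hypothetical namespace standing in for the gem's top-level module.
module Demo
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "action.rb")
  File.write(path, <<~RUBY)
    module Demo
      class Action
      end
    end
  RUBY

  # Explicit receiver: Module#autoload(constant_name, path_to_require).
  # Inside `module Demo ... end`, `autoload :Action, path` means the same.
  Demo.autoload(:Action, path)
  p Demo.autoload?(:Action) == path # => true: registered, not loaded yet

  # The first reference to Demo::Action makes Ruby `require` the path.
  p Demo::Action.name               # => "Demo::Action"
  p Demo.autoload?(:Action)         # => nil: the autoload was consumed
end
```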
00:07:55.920 We are ready to see how Zeitwerk works; the following slides will show code that has been heavily edited.
00:08:03.680 The library is not large, about 1,000 lines, but it has much more detail than we can cover here.
00:08:10.880 The focus here is on the essential ideas, which give you the context to understand the library's implementation.
00:08:18.040 This is how you create a loader using the generic API: you instantiate 'Zeitwerk::Loader' and then push directories.
00:08:26.880 These directories represent the top-level namespace for loading.
00:08:34.480 For example, we can push directories like 'app/controllers' and 'app/models' which represent the top-level namespace.
00:08:43.200 Once it is configured, you call 'setup', and everything in your project is available; no additional setup is necessary.
00:08:50.639 So what does 'setup' actually do?
00:08:58.240 Let’s consider a concrete example: suppose our project has a users controller in 'app/controllers', and 'app/models' contains a user model and an 'admin' directory.
00:09:05.120 Zeitwerk simply defines autoloads for them.
00:09:12.640 It defines three autoloads: one for 'UsersController', one for 'User', and one for the 'Admin' namespace.
00:09:19.520 Whenever you refer to 'UsersController' and it is not loaded, Zeitwerk loads it for you.
00:09:25.840 How is this accomplished? It’s straightforward: Zeitwerk iterates over the root directories.
00:09:33.360 For each one, it calls a method that defines the necessary autoloads for that directory.
00:09:40.080 This method also receives the namespace that the directory represents.
00:09:46.880 For root directories, that namespace is 'Object'.
00:09:54.480 As we iterate through both 'app/controllers' and 'app/models', we perform the same action for each directory.
00:10:02.319 Internally, there’s a utility that yields to the block only the elements that concern the loader.
00:10:09.520 It will only consider Ruby files with the '.rb' extension or directories, ignoring everything else.
00:10:15.680 Once we’ve filtered out everything irrelevant, we process what we found.
00:10:22.960 In the block, we’re in two scenarios: either we found a file or a directory.
00:10:28.560 When we find a file, we take its name without the extension; if we got 'user', we convert it to a constant name.
00:10:35.519 This yields a camelized version, 'User', which becomes the constant name.
00:10:41.840 This transformation uses an inflector that belongs to the loader, independent of any other inflector.
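A sketch of such an inflector. 'SketchInflector' and its 'overrides' hash are hypothetical names. The camelization rule (split on underscores, capitalize each part) is a simplification assumed here:

```ruby
# Hypothetical stand-in for a loader-owned inflector, independent of any
# other inflector in the process.
class SketchInflector
  # overrides: exceptions to the default rule, e.g. {"html_parser" => "HTMLParser"}
  def initialize(overrides = {})
    @overrides = overrides
  end

  # Default rule assumed here: split on underscores and capitalize each part.
  def camelize(basename)
    @overrides.fetch(basename) { basename.split("_").map(&:capitalize).join }
  end
end

inflector = SketchInflector.new("html_parser" => "HTMLParser")
p inflector.camelize("user")             # => "User"
p inflector.camelize("users_controller") # => "UsersController"
p inflector.camelize("html_parser")      # => "HTMLParser"
```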
00:10:48.400 Next, we make the autoload call; that’s what Background does by hand, but here Zeitwerk manages it for us.
00:10:55.440 The namespace is the current one, 'Object', and we autoload 'User' with the absolute path to 'user.rb'.
00:11:01.440 Why use an absolute path? Zeitwerk prioritizes performance.
00:11:07.840 When the autoload is triggered, 'require' receives an absolute path and goes straight to the file, without searching the load path.
00:11:15.200 We also record the autoloads we define in an internal registry.
00:11:22.080 The loader remembers that it is responsible for this path.
00:11:30.400 Now, directories. The same logic applies: the first time we encounter one, we define an autoload.
00:11:37.440 For example, if we see an 'admin' directory while iterating 'app/controllers', we define an autoload for 'Admin'.
00:11:44.080 If we then find another 'admin' directory in 'app/models', the loader skips it, since 'Admin' already has an autoload.
00:11:51.839 This registry allows the loader to manage multiple directories and their autoload states.
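The setup pass described above can be sketched in plain Ruby. This is a simplified illustration under stated assumptions, not Zeitwerk's code. 'MiniLoader', 'ls', 'camelize', and 'define_autoloads' are hypothetical names, and the registry is a plain Hash. This sketch cannot load a directory yet; that requires the 'require' interception the talk describes next:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical, heavily simplified stand-in for what `setup` does.
class MiniLoader
  attr_reader :autoloads

  def initialize(root_dirs)
    @root_dirs = root_dirs
    @autoloads = {} # absolute path => [namespace, constant name]
  end

  def setup
    # Each root directory represents the top-level namespace, Object.
    @root_dirs.each { |dir| define_autoloads(dir, Object) }
  end

  private

  # Yield only what concerns the loader: .rb files and directories.
  def ls(dir)
    Dir.children(dir).sort.each do |entry|
      next if entry.start_with?(".")
      abspath = File.join(dir, entry)
      if File.directory?(abspath)
        yield entry, abspath
      elsif entry.end_with?(".rb")
        yield File.basename(entry, ".rb"), abspath
      end
    end
  end

  # "user" -> "User", "users_controller" -> "UsersController".
  def camelize(basename)
    basename.split("_").map(&:capitalize).join
  end

  # Define autoloads for the first level of dir only; nothing is loaded.
  def define_autoloads(dir, namespace)
    ls(dir) do |basename, abspath|
      cname = camelize(basename).to_sym
      # A directory seen before (e.g. a second "admin") is skipped.
      next if File.directory?(abspath) && namespace.autoload?(cname)

      # Absolute path: `require` will not need to search $LOAD_PATH.
      namespace.autoload(cname, abspath)
      @autoloads[abspath] = [namespace, cname]
    end
  end
end

# A throwaway tree mirroring the example, with "admin" under both roots.
ROOT = Dir.mktmpdir
at_exit { FileUtils.remove_entry(ROOT) }
controllers = File.join(ROOT, "app", "controllers")
models = File.join(ROOT, "app", "models")
FileUtils.mkdir_p(File.join(controllers, "admin"))
FileUtils.mkdir_p(File.join(models, "admin"))
File.write(File.join(controllers, "users_controller.rb"), "class UsersController; end\n")
File.write(File.join(models, "user.rb"), "class User; end\n")
File.write(File.join(models, "notes.txt"), "ignored\n")

LOADER = MiniLoader.new([controllers, models])
LOADER.setup
p LOADER.autoloads.values.map(&:last) # => [:Admin, :UsersController, :User]
p User.name                           # => "User" (this reference triggers the autoload)
```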
00:11:58.080 At this point the autoloads are defined but not triggered; nothing is loaded.
00:12:05.920 'setup' returns, and the loader does nothing more until an autoload is triggered.
00:12:13.560 Autoloads will only trigger when the corresponding constant is referenced.
00:12:20.960 When you reference a constant that has an autoload set, Ruby will trigger it for you.
00:12:28.040 This design choice is intentional because it leverages built-in Ruby functionality.
00:12:36.120 The process of detecting and requiring files is native to Ruby, making it seamless.
00:12:43.440 When a constant is referenced, Ruby goes through the lookup algorithm to find it.
00:12:51.120 The key is that, when an autoload is triggered, Ruby calls 'require' with the autoload's second argument, and Zeitwerk wraps 'Kernel#require' with a thin decorator.
00:13:00.000 Since the wrapper intercepts that argument, it can do some housekeeping for us.
00:13:07.680 This is crucial to how Zeitwerk manages loading.
00:13:15.280 Remember that we also have a registry keeping track of which loader is managing which path.
00:13:20.840 Next, the wrapper checks whether the path is a file or a directory.
00:13:27.040 If it’s a file, it calls the original 'require', since Ruby knows how to load files.
00:13:33.440 If the file was loaded successfully, we execute some housekeeping.
00:13:38.920 The autoload collection is dynamic: it grows and shrinks as autoloads are defined and triggered.
00:13:45.360 Essentially, we maintain only what is needed in memory.
00:13:52.720 In this final housekeeping step, if the file defined the expected constant, we delete its entry from the collection.
00:13:58.560 If the file did not define the expected constant, we raise an error.
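The interception can be sketched as follows, assuming, as the talk describes, that triggering an autoload makes Ruby call 'require' with the autoload's second argument. 'SKETCH_AUTOLOADS' and 'sketch_original_require' are hypothetical names. Only files are handled, and the error raised is a plain NameError rather than whatever the library raises:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical registry: absolute path => [namespace, constant name].
SKETCH_AUTOLOADS = {}

module Kernel
  alias_method :sketch_original_require, :require
  private :sketch_original_require

  # Autoloads call `require` with the path given to `autoload`, so this thin
  # wrapper sees every autoload the sketch defines.
  def require(path)
    entry = SKETCH_AUTOLOADS[path]
    return sketch_original_require(path) unless entry

    namespace, cname = entry
    required = sketch_original_require(path)
    if required
      # The file must define the expected constant...
      unless namespace.const_defined?(cname, false)
        raise NameError, "expected #{path} to define #{namespace}::#{cname}"
      end
      # ...and then its entry is no longer needed.
      SKETCH_AUTOLOADS.delete(path)
    end
    required
  end
  private :require
end

SKETCH_DIR = Dir.mktmpdir
at_exit { FileUtils.remove_entry(SKETCH_DIR) }
good = File.join(SKETCH_DIR, "user.rb")
bad = File.join(SKETCH_DIR, "typo.rb")
File.write(good, "class User; end\n")
File.write(bad, "class Tpyo; end\n") # defines the wrong constant

{ User: good, Typo: bad }.each do |cname, path|
  Object.autoload(cname, path)
  SKETCH_AUTOLOADS[path] = [Object, cname]
end

p User.name                   # => "User"
p SKETCH_AUTOLOADS.key?(good) # => false: the entry was cleaned up

begin
  Typo.name
rescue NameError => e
  TYPO_ERROR = e.message
  puts TYPO_ERROR             # mentions that typo.rb did not define Object::Typo
end
```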
00:14:04.720 Reloading is disabled by default in Zeitwerk.
00:14:11.200 To enable it, you have to explicitly turn it on.
00:14:17.280 If reloading is enabled, we also keep track of what will need to be unloaded.
00:14:25.760 Directories work similarly: if you refer to 'Admin', the wrapper intercepts the 'require' for the 'admin' directory.
00:14:35.280 Directories are tracked in a collection too.
00:14:42.960 Referencing 'Admin' triggers its autoload.
00:14:48.320 The key takeaway here is that module objects are created lazily, on demand, when first referenced.
00:14:54.720 For example, when you access the 'Admin' module, its autoload is triggered, the module is created on the fly, and the loader then defines autoloads for the contents of the directory inside that new module.
00:15:02.960 The same logic applies when you encounter nested directories.
00:15:10.120 If we have an 'admin' namespace with nested classes, Zeitwerk can handle it seamlessly.
00:15:16.560 All the autoloads defined along the way are tracked in the registry, together with their paths.
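The lazy creation of namespace modules can be sketched as well. This is a simplified illustration that assumes 'require' is intercepted as the talk describes. 'SketchLoader', 'define_autoloads', and 'on_dir_autoloaded' are hypothetical names, and real-world details, such as a directory that appears under several roots, are ignored:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical, simplified loader state and helpers (illustrative names).
module SketchLoader
  AUTOLOADS = {} # absolute path => [namespace, constant name]

  # Define autoloads for the first level of dir, inside namespace.
  def self.define_autoloads(dir, namespace)
    Dir.children(dir).sort.each do |entry|
      next if entry.start_with?(".")
      abspath = File.join(dir, entry)
      next unless File.directory?(abspath) || entry.end_with?(".rb")

      cname = File.basename(entry, ".rb").split("_").map(&:capitalize).join.to_sym
      namespace.autoload(cname, abspath)
      AUTOLOADS[abspath] = [namespace, cname]
    end
  end

  # A directory has no file to load: create its module on the fly, then
  # define autoloads for the directory's own first level inside it.
  def self.on_dir_autoloaded(dir)
    namespace, cname = AUTOLOADS.delete(dir)
    mod = namespace.const_set(cname, Module.new)
    define_autoloads(dir, mod)
  end
end

module Kernel
  alias_method :sketch_original_require, :require
  private :sketch_original_require

  def require(path)
    return sketch_original_require(path) unless SketchLoader::AUTOLOADS.key?(path)

    if File.directory?(path)
      SketchLoader.on_dir_autoloaded(path)
      true
    else
      SketchLoader::AUTOLOADS.delete(path)
      sketch_original_require(path)
    end
  end
  private :require
end

ROOT = Dir.mktmpdir
at_exit { FileUtils.remove_entry(ROOT) }
FileUtils.mkdir_p(File.join(ROOT, "admin"))
File.write(File.join(ROOT, "admin", "role.rb"), "module Admin\n  class Role\n  end\nend\n")

SketchLoader.define_autoloads(ROOT, Object) # only :Admin is defined here
p Admin::Role.name        # => "Admin::Role"
p SketchLoader::AUTOLOADS # => {} (both autoloads were consumed)
```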
00:15:23.360 In conclusion, we scan root directories and define autoloads only at the first level.
00:15:31.440 When needed, autoloads are triggered and the loader does its work.
00:15:38.400 Code is loaded only as your application uses it; otherwise, nothing happens.
00:15:46.640 When autoload triggers, we intercept the requires and can do housekeeping.
00:15:54.960 This allows us to define module objects on the fly as needed.
00:16:04.960 Reloading is straightforward. Ruby has no API for reloading code, so we emulate it.
00:16:10.920 If you follow conventions, reloading happens smoothly.
00:16:19.680 We remove the constants that were loaded and the autoloads that were never triggered, and forget the files that were loaded.
00:16:27.920 Then we define new autoloads, which trigger their corresponding requires when referenced.
00:16:35.520 The loader runs again from square one.
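The unloading step can be emulated with a few Ruby primitives, as a sketch: remove the constant, forget the file in '$LOADED_FEATURES', remove autoloads that never ran, and define the autoloads again. The file names and the 'setup' lambda are hypothetical, and the real loader tracks far more state:

```ruby
require "tmpdir"
require "fileutils"

# realpath so the path matches what `require` records in $LOADED_FEATURES.
SKETCH_DIR = File.realpath(Dir.mktmpdir)
at_exit { FileUtils.remove_entry(SKETCH_DIR) }

loaded_path = File.join(SKETCH_DIR, "greeting.rb")
pending_path = File.join(SKETCH_DIR, "farewell.rb")
File.write(loaded_path, "class Greeting\n  MESSAGE = 'hello'\nend\n")
File.write(pending_path, "class Farewell\nend\n")

# Hypothetical "setup": define autoloads; nothing is loaded yet.
setup = lambda do
  Object.autoload(:Greeting, loaded_path)
  Object.autoload(:Farewell, pending_path)
end

setup.call
puts Greeting::MESSAGE # => hello (Farewell is never referenced)

# The developer edits a file...
File.write(loaded_path, "class Greeting\n  MESSAGE = 'hi'\nend\n")

# ...so we unload: remove the loaded constant and forget its file,
Object.send(:remove_const, :Greeting)
$LOADED_FEATURES.delete_if { |feature| File.identical?(feature, loaded_path) }
# and remove the autoload that was never triggered.
Object.send(:remove_const, :Farewell)

# Back to square one: fresh autoloads, loaded again on demand.
setup.call
puts Greeting::MESSAGE # => hi
```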
00:16:42.920 Eager loading is also simple; it uses the same general process.
00:16:51.520 Eager loading does not recursively require files; it triggers autoloads.
00:16:58.920 It performs a breadth-first traversal through directories.
00:17:06.080 Since the loader knows which autoloads have not been triggered yet, it only loads what is still pending.
00:17:14.080 Whenever the traversal reaches a file, the loader triggers its autoload.
00:17:22.920 When it reaches a directory, referencing the namespace triggers its autoload, which defines the autoloads for its contents, and the traversal continues into it later.
00:17:30.640 Even with large hierarchies, the autoload mechanism makes this efficient.
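A sketch of the breadth-first traversal only. 'eager_load_order' is a hypothetical helper: instead of triggering autoloads, it yields the constant path it would trigger. It also skips files listed in 'already_loaded', which stand in for autoloads that already ran:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: walks root breadth-first and yields, in order, the
# constant paths an eager load would reference.
def eager_load_order(root, already_loaded = [])
  queue = [[root, nil]] # pairs of [directory, namespace constant path]
  until queue.empty?
    dir, namespace = queue.shift
    Dir.children(dir).sort.each do |entry|
      next if entry.start_with?(".")
      abspath = File.join(dir, entry)
      cname = File.basename(entry, ".rb").split("_").map(&:capitalize).join
      cpath = namespace ? "#{namespace}::#{cname}" : cname

      if File.directory?(abspath)
        yield cpath               # referencing a namespace defines its module
        queue << [abspath, cpath] # its contents are visited later (breadth-first)
      elsif entry.end_with?(".rb") && !already_loaded.include?(abspath)
        yield cpath               # trigger the pending autoload for this file
      end
    end
  end
end

root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(root, "admin", "reports"))
%w[user.rb admin/role.rb admin/reports/monthly.rb].each do |relative|
  File.write(File.join(root, relative), "")
end

ORDER = []
eager_load_order(root, [File.join(root, "user.rb")]) { |cpath| ORDER << cpath }
p ORDER # => ["Admin", "Admin::Reports", "Admin::Role", "Admin::Reports::Monthly"]
FileUtils.remove_entry(root)
```

Note that 'Admin::Role' comes before 'Admin::Reports::Monthly': every entry of a directory is visited before anything nested deeper.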
00:17:39.440 That’s all I have. I hope this clarifies how things work behind the scenes.
00:17:45.840 Thank you.
00:18:03.280 Let’s take a moment to appreciate.
00:18:04.640 Thank you.
00:18:08.340 [End of transcript]