Ruby Techniques by Example

MountainWest RubyConf 2010

by Jeremy Evans

In the talk "Ruby Techniques by Example," Jeremy Evans presents a detailed exploration of Ruby programming techniques aimed at enhancing code quality and extensibility. He emphasizes the importance of reading and understanding existing Ruby code to improve one's programming skills, a concept advocated by renowned Ruby programmer James Edward Gray. Jeremy discusses several key techniques and principles throughout the presentation:

Extensible Class Design: He starts by explaining the significance of creating easily extensible classes. By structuring method definitions within modules, users can easily override methods and call super to retain original functionality, allowing for greater flexibility in extending classes.
Handling Class-Level Data: Jeremy compares three approaches to managing class-level data in inheritance hierarchies: class variables, the delegating approach, and the copying method. He cautions against class variables due to their shared nature across subclasses and highlights the pros and cons of the other two approaches.
Safety in Method Definition: The discussion moves into metaprogramming and the careful handling of method definitions. Jeremy points out the risks associated with using string evaluations in defining methods, illustrating the need to validate inputs to avoid potential errors.
Creating Domain Specific Languages (DSLs): He describes a simple implementation of a DSL for Sequel validations, highlighting the challenges and solutions involved when evaluating blocks in the context of a BasicObject subclass.
Unified Interface for Multiple Backends: Jeremy illustrates how to develop a consistent interface for interacting with different database backends, showcasing the necessity of managing exceptions in a way that keeps the API friendly for users.
Common Pitfalls in Ruby: The talk concludes with a recap of several reminders for Ruby developers about how to avoid common mistakes, particularly in transaction management and proper usage of logical operators. He emphasizes best practices to ensure that the code remains clean and efficient.

Overall, Jeremy's presentation is a comprehensive exploration of Ruby techniques that can help developers write better, more extensible code, with safety and performance considerations at the forefront of his discussion.

00:00:15.120 Hi everybody, I'm Jeremy. My talk today is about Ruby techniques by example. I'm aiming for a slower pace than yesterday's lightning talk, so if I am going too fast, please tell me to slow down.

00:00:30.080 Now, this presentation will take you behind the scenes of production Ruby code and show you techniques that you can use in your own code. Last year, a great Ruby programmer told us that as we become better Ruby programmers, we should read more code.

00:00:38.480 For those of you who weren't with us last year, that programmer is James Edward Gray, who ran the Ruby Quiz for a long time. I don't know about you, but personally, I find that reading code is often boring. I think the reason is that the signal-to-noise ratio of most code is quite low, at least in terms of learning from it.

00:00:53.399 Now, I don't mean that most code serves no purpose, but I do think that the majority of code you read is not going to teach you new things. I'm not suggesting that reading code is a bad idea, but I do think that it may not be the best use of your time. That's where this presentation comes in: I've read over 10,000 lines of code so that you don't have to, and I've chosen to highlight only those techniques that I think are interesting.

00:01:30.799 The first technique I'd like to discuss deals with creating more easily extensible classes. By extensible, I mean classes that are easy for other users to extend in any way that they see fit. I would like to give a warning that the rest of the presentation is very code-heavy, and I will try to give you enough time to read the code on the slides before I talk about it. However, if you want more time, please just speak up.

00:01:46.239 Now consider this quite common way of adding class and instance methods to a class. For most classes, this is perfectly fine, but it is not the most extensible way to add methods to the class. The problem with this approach is that it's difficult for someone to override one of your class or instance methods that you have defined and call super to get the default behavior.

00:02:30.440 You can have users subclass your class to override your methods, but that will not affect all instances of the class. A user may want to affect instances that they are not themselves creating. The goal here is to allow users to override the methods that you have defined, but still call super to get the default behavior, affecting all instances of the class.

00:02:48.319 Now, here's one solution that unfortunately does not work. When you define methods in a class like this, you're defining them directly on the class or the class's singleton class. Class methods defined this way cannot be overridden by modules, and instance methods defined this way can only be overridden by modules on a per-instance basis. You cannot override the instance methods for all instances at once, and this is due to how method lookup works in Ruby.
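A minimal sketch of the problem just described, using hypothetical names (`Author`, `AllOverride`, `IncludedName`, `ExtendedName`) rather than anything from the slides. Methods defined directly on a class are found before any module that is later extended or included, so the module's version is never reached; extending a single instance does work, but only for that object. This reflects the lookup rules the talk describes (`Module#prepend`, added later in Ruby 2.0, offers another way around it).

```ruby
# Methods defined directly on the class (hypothetical example).
class Author
  def self.all
    "Author.all"
  end

  def name
    "Author#name"
  end
end

module AllOverride
  def all
    "AllOverride -> #{super}"
  end
end

module IncludedName
  def name
    "IncludedName -> #{super}"
  end
end

module ExtendedName
  def name
    "ExtendedName -> #{super}"
  end
end

# extend places the module *behind* Author's singleton class in the lookup,
# so the singleton method Author.all is still found first.
Author.extend(AllOverride)
# include places the module *behind* Author, so Author#name is still found first.
Author.include(IncludedName)

puts Author.all       # prints Author.all
puts Author.new.name  # prints Author#name

# Extending one instance does override the method, but only for that object.
author = Author.new
author.extend(ExtendedName)
puts author.name      # prints ExtendedName -> Author#name
```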

00:03:13.239 Now, I'm going to attempt to explain Ruby's method lookup quickly, with some simplification and imprecision. Basically, when a method is called on any Ruby object, it first looks in its singleton class and then any modules included in that singleton class, in reverse order of inclusion. If the method hasn't been found or if the found method calls super, it substitutes the singleton class with the singleton class's superclass and then restarts the lookup.

00:03:41.480 Given the lookup process I've described, there are three interesting cases. For a class, the superclass of its singleton class is the singleton class of the class's superclass. The exception is Object itself (BasicObject in Ruby 1.9), which has no superclass, so the superclass of its singleton class is Class. For all other objects, the superclass of the singleton class is just the object's class.

00:04:08.680 Here's an example of the method lookup process for class methods of Bignum. Let's extend the Integer class with a new module, which includes the module in the Integer singleton class. The lookup tries the singleton classes of Bignum and Integer, then the module that extends Integer, and then the singleton classes of Numeric and Object. Since Object has no superclass, the superclass of Object's singleton class is Class, so it tries Class next, followed by Module, Object, and finally Kernel.
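Bignum has since been merged into Integer (Ruby 2.4), so this sketch uses a hypothetical `Shape > Polygon > Square` hierarchy in its place and a hypothetical `Extra` module. `singleton_class.ancestors` (Ruby 2.1+) lists the class-method lookup order directly. On Ruby 1.9 and later, `#<Class:BasicObject>` appears between Object's singleton class and Class, and BasicObject appears at the end, because BasicObject is the new top of the hierarchy.

```ruby
# Hypothetical hierarchy standing in for Numeric > Integer > Bignum.
class Shape; end
class Polygon < Shape; end
class Square < Polygon; end

module Extra; end
Polygon.extend(Extra) # same as including Extra in Polygon's singleton class

# Class-method lookup for Square: singleton classes up the chain, with Extra
# right after Polygon's singleton class, then Class, Module, Object, Kernel.
puts Square.singleton_class.ancestors.map(&:inspect)
```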

00:04:37.320 Now, here's the method lookup process for an instance of File extended with a module. Ruby will first look at the singleton class of that object, followed by the module that you extended the object with, followed by the File class and the ancestors of File: the IO class, the File::Constants module (which is included in IO), the Enumerable module (also included in IO), Object, and finally Kernel.
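The same walk for a single object, sketched with a hypothetical `Tagged` module and a File opened on `File::NULL` so that no real file is needed. On Ruby 1.9 and later, BasicObject also appears at the very end of the list.

```ruby
module Tagged # hypothetical module
  def tag
    "tagged"
  end
end

File.open(File::NULL) do |file|
  file.extend(Tagged)
  puts file.tag # prints tagged
  # The object's singleton class, then Tagged, then File, IO, File::Constants,
  # Enumerable, Object, Kernel (and BasicObject on Ruby 1.9+).
  puts file.singleton_class.ancestors.map(&:inspect)
end
```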

00:05:04.039 Now that we've finished the digression on Ruby's method lookup, let's get back to extensible design. If your initial class definition defines all of your class and instance methods in modules, then future modules can extend or be included in the class, and they can call super to reach the definition of the method in the previously included or extended module. This design approach works very well, but the simplistic approach shown here is a tad verbose.
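A minimal sketch of this verbose but extensible layout, assuming hypothetical names (`Contact`, `Shouting`, `ClassMethods`, `InstanceMethods`). Because the original methods live in modules rather than directly in the class, a later module sits in front of them in the lookup and can call super to get the default behavior, for every instance at once.

```ruby
class Contact
  # Every method lives in a module instead of directly in the class.
  module ClassMethods
    def all
      [:alice, :bob]
    end
  end

  module InstanceMethods
    def greeting
      "hello"
    end
  end

  extend ClassMethods
  include InstanceMethods
end

# A user extension (hypothetical) that overrides both and calls super.
module Shouting
  module ClassMethods
    def all
      super.map(&:upcase)
    end
  end

  module InstanceMethods
    def greeting
      "#{super.upcase}!"
    end
  end
end

Contact.extend(Shouting::ClassMethods)
Contact.include(Shouting::InstanceMethods)

p Contact.all           # [:ALICE, :BOB]
p Contact.new.greeting  # "HELLO!"
```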

00:05:27.360 You can structure your extensions so that an outer module encloses both the class method and instance method modules. Then you can make extending your classes as easy as a simple method call inside the class. Implementing this is actually quite simple. Here's a simplified version of Sequel::Model's plugin method. You just need to check whether these submodules are present and, if so, extend the class with them or include them in it. You could accomplish the same thing using the `extended` hook of the class methods module or the `included` hook of the instance methods module, but some plugins may have only class methods and some may have only instance methods, which is why a separate method is a superior solution.

00:06:13.160 With your plugin method set up like that, you can allow the user to extend the class with as many plugins as they want. If all three plugins define the same class method and call super, when the user calls `Person.all`, it will call the version in plugin three, then plugin two, and finally plugin one. This design approach makes it possible for extensions to have complete control to override any part of the class's behavior while making it easy for the user to extend with any combination of extensions for their own use.
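A minimal sketch of a plugin method like the one described, plus three plugins that each override the same class method and call super. This is not Sequel's actual code; `Person`, `Base`, `PluginOne` through `PluginThree`, `label`, and the `ClassMethods`/`InstanceMethods` naming are assumptions for illustration. The class's own behavior is loaded as a plugin too, so every later plugin has something to call super into.

```ruby
class Person
  # Extend/include whichever submodules the plugin defines.
  def self.plugin(mod)
    extend(mod::ClassMethods) if mod.const_defined?(:ClassMethods, false)
    include(mod::InstanceMethods) if mod.const_defined?(:InstanceMethods, false)
  end

  # Default behavior, itself loaded as a plugin so later plugins can call super.
  module Base
    module ClassMethods
      def all
        ["base"]
      end
    end

    module InstanceMethods
      def label
        "person"
      end
    end
  end

  plugin Base
end

module PluginOne
  module ClassMethods
    def all
      ["one"] + super
    end
  end
end

module PluginTwo
  module ClassMethods
    def all
      ["two"] + super
    end
  end
end

module PluginThree
  module ClassMethods
    def all
      ["three"] + super
    end
  end

  module InstanceMethods
    def label
      "#{super}!"
    end
  end
end

Person.plugin PluginOne
Person.plugin PluginTwo
Person.plugin PluginThree

p Person.all        # ["three", "two", "one", "base"]: newest plugin runs first
p Person.new.label  # "person!"
```

Passing `false` to `const_defined?` keeps the check from accidentally finding an unrelated top-level `ClassMethods` constant.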

00:06:45.360 The second technique I'd like to discuss deals with handling class-level data in inheritance hierarchies. I'm going to go over three different approaches to this, which I'm going to refer to as the class variable approach, the delegating approach, and the copying approach.

00:07:04.280 All right, this is the class variable approach, and you will rarely see it used as it’s basically unworkable when changes in subclasses need to be independent. With this code, setting the class variable inside Customer affects Person and Employee as well since class variables are shared throughout an inheritance hierarchy. Class variables should generally be avoided and are definitely not appropriate for situations where parts of the hierarchy are independent.
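A small sketch of why the class variable approach fails, using the Person/Employee/Customer names from the talk and a hypothetical `table_name` setting. The assignment inside Customer finds the `@@table_name` already defined in Person and overwrites that single shared variable.

```ruby
class Person
  @@table_name = "people"

  def self.table_name
    @@table_name
  end
end

class Employee < Person; end

class Customer < Person
  # Looks like a Customer-only setting, but @@table_name is shared with Person
  # and every other subclass, so this overwrites the value for all of them.
  @@table_name = "customers"
end

puts Person.table_name    # prints customers
puts Employee.table_name  # prints customers
```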

00:07:21.160 Here's a simplified version of the delegating approach pulled from Active Support. With the delegating approach, whenever a lookup of a class instance variable is requested, it looks in its own class. If it has not been defined there, it tries the superclass, continuing up the hierarchy until the class instance variable is defined or the top-level class is reached. The possible advantage of this approach is that if you create a subclass and later modify the value of the class instance variable in the superclass, the subclass will see the changed value unless it has been overridden in the subclass itself. The disadvantage is speed, as lookup can be significantly slower, especially for deep hierarchies.
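A minimal sketch of the delegating approach, written by hand rather than copied from Active Support; `Vehicle`, `Car`, `Truck`, and `table_name` are hypothetical. Each read checks the class's own instance variable and otherwise asks the superclass, which is why later changes to the superclass show through and why deep hierarchies pay for extra lookups.

```ruby
class Vehicle
  @table_name = "vehicles"

  class << self
    attr_writer :table_name

    # Delegating reader: use this class's own value if it has one,
    # otherwise ask the superclass (which may ask its superclass, and so on).
    def table_name
      if instance_variable_defined?(:@table_name)
        @table_name
      elsif superclass.respond_to?(:table_name)
        superclass.table_name
      end
    end
  end
end

class Car < Vehicle; end

class Truck < Vehicle
  self.table_name = "trucks"
end

Vehicle.table_name = "all_vehicles"

p Car.table_name    # "all_vehicles" (sees the later change to Vehicle)
p Truck.table_name  # "trucks" (its own value overrides)
```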

00:07:40.560 Now, this example of the copying approach is pulled from Sequel's force_encoding plugin. When a subclass is created, the superclass's inherited method is called with the subclass as an argument. You can override the inherited method in your class, call super, and then set values in the subclass. In this case, the forced encoding value is copied from the superclass to the subclass.
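A minimal sketch of the copying approach using the `inherited` hook. It is modeled loosely on the description above, not taken from the plugin; `Document`, `Report`, and the `forced_encoding` attribute are illustrative. Each subclass gets its own copy at creation time, so reads are as cheap as in the superclass, but later changes to the superclass are not seen.

```ruby
class Document
  @forced_encoding = "UTF-8"

  class << self
    attr_accessor :forced_encoding

    # Ruby calls inherited on the superclass each time a subclass is created.
    def inherited(subclass)
      super
      # Copy the current value so the subclass reads its own instance variable.
      # (Mutable values such as arrays or hashes would usually be dup'ed here.)
      subclass.instance_variable_set(:@forced_encoding, forced_encoding)
    end
  end
end

class Report < Document; end

Document.forced_encoding = "ISO-8859-1"

p Report.forced_encoding    # "UTF-8" (copied when Report was created)
p Document.forced_encoding  # "ISO-8859-1"
```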

00:08:04.760 The advantage to this is that lookups in the subclass are just as fast as in the superclass, while the possible disadvantage is that modifications to the superclass after subclass creation are not copied to the subclass.

00:08:20.440 The next technique relates to safety, in particular, safely defining methods via metaprogramming. I'm going to pick on the delegating example I showed earlier as an example of unsafe metaprogramming. Any time you eval a string that is created by interpolating arguments, you need to ensure that the resulting string is valid Ruby code.

00:08:43.659 For most cases, this will work just fine, but take a few seconds to look at this method and see if you can spot some safety issues. Consider what happens if a name contains a character that is valid in a method name but not in a method name written literally in source code, such as a space. In Ruby, it's perfectly valid to have a space inside a symbol and inside a method name, but the Ruby code produced by the superclass delegating method will not be valid.

00:09:06.560 Now, in this case, the string eval will raise an error. But if the code body did not raise an error, then instead of defining a method named 'foo bar' that takes no arguments, we would define a method named 'foo' that takes one argument. You can fix this by switching from a string eval to a block eval and referencing the objects directly. However, this has performance implications: methods defined with define_method are closures and do not perform as well as methods created by eval'ing strings, which are not closures.
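A small demonstration of the misparse, with hypothetical constant names. Interpolating `"foo bar"` into a `def` produces `def foo bar`, which Ruby reads as a method `foo` taking an argument `bar`; `define_method` receives the name as an object, so nothing is re-parsed and the real name survives.

```ruby
METHOD_NAME = "foo bar" # valid as a method name, but not as literal source code

# Stringy eval: the generated code reads `def foo bar`, i.e. a method
# named `foo` that takes one argument named `bar`.
StringyEval = Class.new
StringyEval.class_eval("def #{METHOD_NAME}; :stringy; end", __FILE__, __LINE__)

# Blocky eval: define_method takes the name as an object, so it is kept intact.
BlockyEval = Class.new
BlockyEval.class_eval do
  define_method(METHOD_NAME) { :blocky }
end

p StringyEval.instance_methods(false)     # [:foo]
p StringyEval.instance_method(:foo).arity # 1
p BlockyEval.instance_methods(false)      # [:"foo bar"]
p BlockyEval.new.public_send(METHOD_NAME) # :blocky
```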

00:09:32.960 So, in general, there's a performance versus safety trade-off, with stringy evals being faster and blocky evals being safer. However, you don't have to sacrifice one for the other. You can write code that has maximum performance in the normal case using a stringy eval while still handling the abnormal case with a blocky eval. This example is taken from Sequel, which creates accessor methods for the columns it finds when introspecting the database, and those column names may contain spaces or other characters that are not valid under Ruby's lexer rules.

00:09:56.919 Basically, you just need to check your inputs and make sure that they would create valid Ruby code. If so, you can use the stringy eval. If not, you have to use the blocky eval. Now, it is important to note here the use of a separate method for the blocky evals instead of including them in the main method. This is because blocky evals create closures, and objects created in the surrounding scope will not be garbage collected even if they are never used in the blocky eval code.
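A minimal sketch of the combined approach, not Sequel's actual code: `Record`, `def_column_accessor`, `def_column_accessor_block`, and the conservative `VALID_METHOD_NAME` pattern are all assumptions. Names that pass the check get a plain string-eval'd method; anything else falls back to `define_method`, which lives in its own method so the closure captures only `column` and not the locals of the caller.

```ruby
class Record
  # Conservative check (an assumption for this sketch): lowercase identifiers only.
  VALID_METHOD_NAME = /\A[a-z_][a-zA-Z0-9_]*\z/

  def initialize(values)
    @values = values
  end

  def self.def_column_accessor(*columns)
    columns.each do |column|
      if VALID_METHOD_NAME.match?(column.to_s)
        # Fast path: a plain method (no closure), safe because column was validated.
        class_eval(<<-RUBY, __FILE__, __LINE__ + 1)
          def #{column}
            @values[:#{column}]
          end
        RUBY
      else
        def_column_accessor_block(column)
      end
    end
  end

  # Safe path for unusual names. Kept in its own method so the closure captures
  # only `column`, not the locals of def_column_accessor.
  def self.def_column_accessor_block(column)
    define_method(column) { @values[column] }
  end
  private_class_method :def_column_accessor_block
end

Record.def_column_accessor(:name, :"first name")
record = Record.new({ name: "Ann", :"first name" => "Ann B" })
p record.name                       # "Ann"
p record.public_send(:"first name") # "Ann B"
```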

00:10:22.480 In the simplified example, it's not a major savings, but one application using Sequel measured a savings of half a megabyte of memory per process by moving blocky evals to a separate method. Our next technique is a brief but complete description of a simple DSL and some issues that you need to deal with to make sure your DSL works nicely.

00:10:40.720 This example is taken from the validation_helpers_block plugin for Sequel, which provides a simple DSL for Sequel validations. The DSL syntax should be fairly straightforward. Inside the validates block, method names indicate attributes, and inside those blocks, method names indicate the type of validation that you are doing.

00:10:56.560 I'm going to go over how this DSL is implemented, which is actually fairly simple. The first step is the definition of validates, which just passes the block it receives, along with a reference to the current object, to a new DSL class. The DSL class is named ValidationHelpersAttributesBlock to indicate that method names inside the block are used to specify attributes. The implementation of this class is fairly simple, but the important thing to note is that it derives from Sequel's BasicObject.

00:11:21.680 With DSLs, it's usually important to derive from a BasicObject class so that the many methods defined on Object don't get in the way of the method names used in the DSL. There are a couple of issues with using BasicObject. For one, there is no BasicObject in Ruby 1.8.

00:11:39.919 Alright, here's Sequel's BasicObject class. Note that it has separate versions for Ruby 1.8 and Ruby 1.9. In Ruby 1.8, Sequel's BasicObject class is just a subclass of Object with most of the methods removed using `undef_method`. We will get to the Ruby 1.9 implementation in just a bit, but first, I want to take two quick surveys. A quick show of hands: who knows what this code will do in Ruby 1.9?

00:12:01.680 Does anyone use Ruby 1.9 here? Alright, a few people. Now, if you think the output is zero, you are correct. How about this code? Let's get another show of hands. Who knows what this code will do in Ruby 1.9?

00:12:13.440 Mats, what will it do? Yes, you get a NameError, because inside the definition of a BasicObject subclass you cannot directly access constants defined in Object, which is where all other classes are defined by default. You can work around this by adding a double colon in front of all constants inside a BasicObject subclass. However, with DSL design, users generally won't know to do that, since they won't know that the blocks they write are going to be evaluated inside the context of a BasicObject-derived class.

00:12:39.039 Here's an example using a separate Sequel DSL that allows easy filtering of datasets. Without changing the constant lookup inside BasicObject, this used to raise constant lookup errors on Ruby 1.9 because the Time constant did not exist inside a BasicObject. The previous workaround was just to add a double colon before Time, but by changing the constant lookup, you can let users stop worrying about prefixing constants in DSLs with a double colon.

00:12:58.520 This brings us to how to fix constant lookup in BasicObject. Thankfully, it's actually fairly easy: you just need to add a `const_missing` class method to the BasicObject subclass that calls `Object.const_get`. Note that you do need to prefix the reference to Object with a double colon; otherwise, looking up Object itself goes through `const_missing` again, and you should get a SystemStackError. However, when I tried this without a double colon, I got a SIGILL and a core dump.
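A small sketch of both the failure and the fix, with hypothetical class names `PlainDSL` and `FixedDSL`. Code written inside a BasicObject subclass cannot see top-level constants such as Time, because Object is not among its ancestors; a `const_missing` class method that forwards to `::Object.const_get` restores the normal behavior for names that do exist and still raises NameError for names that don't.

```ruby
# Without a fix: constants from Object are not visible inside BasicObject subclasses.
class PlainDSL < BasicObject
  def time_class
    Time
  end
end

# With the fix: missing constants are resolved against Object.
class FixedDSL < BasicObject
  def self.const_missing(name)
    # The leading :: matters: a bare `Object` would itself go through
    # const_missing and recurse until the stack overflows.
    ::Object.const_get(name)
  end

  def time_class
    Time
  end
end

begin
  PlainDSL.new.time_class
rescue NameError => e
  puts "PlainDSL: #{e.class}" # prints PlainDSL: NameError
end

puts FixedDSL.new.time_class # prints Time
```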

00:13:29.840 Now that we've finished the digression on constant lookup, let's look back at the DSL implementation we were talking about. As you can see, this is fairly simple. In initialize, you just keep a reference to the outer self in an instance variable and then instance_eval the block. All method calls in the block are handled by method_missing, which passes the outer self and the method name specifying an attribute, along with the block, to a new DSL class.

00:13:45.960 Method missing is used here because potentially any method name is valid, and in general, you should only use method_missing if any possible method name is valid. Here's the final DSL class that handles the actual validation. In initialize, again, it's just keeping a reference to the outer self and the current attribute, and then instance evaling the block.

00:14:07.600 Note that method_missing is not used here because we know in advance which validation methods exist, so we only create methods for those validations. In this particular case, the methods also have different arities, so two separate class_eval calls are used: one for methods that accept an additional argument and one for methods that do not. In both cases, the methods created via metaprogramming just call the appropriate validation method on the object for the given attribute.
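A minimal end-to-end sketch of the two-level DSL described above. It follows the structure from the talk but is not the plugin's code: `ValidationDSL`, `Model`, `AttributesBlock`, `ValidationBlock`, and the two validation methods are hypothetical. The classes are nested in a module so that the reference to `ValidationBlock` inside the BasicObject subclass is found through lexical scope.

```ruby
module ValidationDSL
  # Toy model with two validation methods (hypothetical, not Sequel's API).
  class Model
    attr_reader :values, :errors

    def initialize(values)
      @values = values
      @errors = Hash.new { |hash, key| hash[key] = [] }
    end

    def validates_presence(attr)
      errors[attr] << "is not present" if values[attr].nil?
    end

    def validates_max_length(max, attr)
      errors[attr] << "is longer than #{max}" if values[attr].to_s.length > max
    end

    # Entry point: hand the block and the current object to the DSL.
    def validates(&block)
      AttributesBlock.new(self, &block)
      nil
    end
  end

  # Any method name inside the block names an attribute, so method_missing fits.
  class AttributesBlock < BasicObject
    def initialize(obj, &block)
      @obj = obj
      instance_eval(&block)
    end

    def method_missing(attr, *_args, &block)
      ValidationBlock.new(@obj, attr, &block)
    end
  end

  # Only known validations are valid here, so real methods are defined instead.
  class ValidationBlock < BasicObject
    def initialize(obj, attr, &block)
      @obj = obj
      @attr = attr
      instance_eval(&block)
    end

    # Validations that take no extra argument.
    [:presence].each do |type|
      class_eval("def #{type}; @obj.validates_#{type}(@attr); end", __FILE__, __LINE__)
    end

    # Validations that take one extra argument, passed before the attribute.
    [:max_length].each do |type|
      class_eval("def #{type}(arg); @obj.validates_#{type}(arg, @attr); end", __FILE__, __LINE__)
    end
  end
end

person = ValidationDSL::Model.new({ name: "Jeremy Evans" })
person.validates do
  name do
    presence
    max_length 5
  end
end
p person.errors[:name] # ["is longer than 5"]
```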

00:14:34.920 Now, my next topic isn't really a technique. It's just a code example that shows off a little-appreciated aspect of Ruby. One of the libraries I work on is Scaffolding Extensions, which is a very flexible admin front-end for multiple web frameworks. One of the things that it allows you to do is override pretty much all of the defaults.

00:14:51.920 For example, you can set the default fields to be displayed on the pages for the Person model to be name and age while also showing the position on the browse page. Now, I'm going to go over two of the internal methods that implement this.

00:15:06.200 First, all methods have a default implementation that is defined by the library, but the user can override those methods for specific cases by defining methods or instance variables that handle that case. This method takes multiple method name symbols. For each method name symbol, it first aliases the default method to a private method, and then it creates a public method that checks if the method has been overridden for the given argument. If the method has been overridden, it calls the overridden method; otherwise, it falls back to using the default method.

00:15:28.480 Now, here's what I think is the more interesting part. In order to use this in classes, you just need to extend the class with the Overridable module. The Overridable module has an `extended` singleton method, and in order for it to work correctly, it exploits a little-appreciated aspect of Ruby, which is that virtually all objects can have singleton classes, including singleton classes themselves.

00:15:42.840 So when a class is extended with Overridable, it adds the necessary metaprogramming methods to the singleton class of the singleton class of that class. Inside the singleton class, it calls the metaprogramming methods, which in turn override the given singleton methods on the class itself. This is the only production code I've seen that modifies the singleton class of a singleton class.
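A minimal sketch of the pattern described in the last two paragraphs, not Scaffolding Extensions' real code: `MetaOverridable`, `overridable_methods`, `fields`, and the `#{type}_#{meth}` override naming are hypothetical, and only method overrides are checked (the library also allows instance variables). The helper is added to the singleton class of the singleton class, so it can be called on the singleton class, where it redefines the class's own singleton methods.

```ruby
# Metaprogramming helpers; these end up on the singleton class *of the singleton class*.
module MetaOverridable
  # Called with self == the singleton class of the extended class.
  def overridable_methods(*meths)
    meths.each do |meth|
      default = :"default_#{meth}"
      alias_method default, meth # keep the library default under a private name
      private default
      define_method(meth) do |type|
        override = :"#{type}_#{meth}"
        # Use a user-defined override such as `browse_fields` if one exists.
        respond_to?(override) ? public_send(override) : send(default, type)
      end
    end
  end
end

module Overridable
  # Library default, available as a class method once a class extends this module.
  def fields(_type)
    [:name]
  end

  def self.extended(klass)
    singleton = klass.singleton_class
    # Add overridable_methods to the singleton class of the singleton class...
    singleton.singleton_class.include(MetaOverridable)
    # ...so it can be called on the singleton class to wrap klass's singleton methods.
    singleton.overridable_methods(:fields)
  end
end

class Person
  extend Overridable

  # User override for just the browse page.
  def self.browse_fields
    [:name, :age, :position]
  end
end

p Person.fields(:edit)   # [:name]
p Person.fields(:browse) # [:name, :age, :position]
```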

00:16:03.680 Now, our next topic is about presenting multiple backends using a unified interface. The general strategy for doing this is to use separate subclasses for each backend and have a method in the parent class return an instance of the appropriate subclass. Distilled to its essence, that's what Sequel's connect method does.

00:16:20.920 Sequel's connect method is supposed to return an appropriate Sequel::Database instance for the database. In order to handle the differences between databases, the connect method returns an instance of the adapter-specific database subclass. It processes its input and then calls the adapter class method with the appropriate adapter.

00:16:40.960 Now, this is a simplified version of the adapter class method. It just takes the adapter name given and requires the appropriate adapter file. In each adapter file, the adapter registers the adapter's database subclass in the adapter map, and then the adapter class is just looked up in the adapter map and returned.

00:16:56.880 Back to the connect method: it just takes the class returned from the adapter class method and instantiates a new instance of that class using the given options. Now that's basically all you need to do for the initial setup to work, as long as the subclasses implement the appropriate methods. The wrapping is fairly transparent—that is, until you have to handle exceptions raised by the underlying backends.
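A minimal sketch of the connect/adapter-map arrangement described above. Everything here (`MiniDB`, `ADAPTER_MAP`, `adapter_class`, the two adapter classes, `execute`) is hypothetical; a real library would `require` the adapter's file inside `adapter_class`, and that file's class body would register itself, which this sketch imitates by defining both adapters inline.

```ruby
module MiniDB
  class Database
    # Maps adapter names to Database subclasses; filled in by each adapter.
    ADAPTER_MAP = {}

    # Simplified: a real library would first `require` the adapter's file,
    # whose class body registers the subclass in ADAPTER_MAP.
    def self.adapter_class(scheme)
      ADAPTER_MAP.fetch(scheme.to_sym) do
        raise ArgumentError, "unknown adapter: #{scheme}"
      end
    end

    # Look up the adapter-specific subclass and instantiate it.
    def self.connect(opts)
      adapter_class(opts.fetch(:adapter)).new(opts)
    end

    attr_reader :opts

    def initialize(opts)
      @opts = opts
    end

    def execute(_sql)
      raise NotImplementedError, "#{self.class} must implement #execute"
    end
  end

  # What two hypothetical adapter files might contain.
  class PostgresDatabase < Database
    ADAPTER_MAP[:postgres] = self

    def execute(sql)
      "postgres ran: #{sql}"
    end
  end

  class MySQLDatabase < Database
    ADAPTER_MAP[:mysql] = self

    def execute(sql)
      "mysql ran: #{sql}"
    end
  end
end

db = MiniDB::Database.connect({ adapter: "postgres", host: "localhost" })
puts db.class               # prints MiniDB::PostgresDatabase
puts db.execute("SELECT 1") # prints postgres ran: SELECT 1
```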

00:17:20.080 For example, let's say you try to insert into a nonexistent table. In order to treat the multiple backends as one, you don't want a PGError to be raised when using PostgreSQL and a different exception when using MySQL; instead, you want to wrap the underlying exception classes in your own exception classes. In this case, no matter what backend you are using, if that backend raises an exception, Sequel will raise a Sequel::DatabaseError.

00:17:43.800 Doing this properly is actually a little more work than you might think. Here's a simplified version pulled from Sequel 2.0's MySQL adapter. This is the simplest thing that works, but it has some unfortunate drawbacks. When the Sequel error is raised, the message from the underlying exception is kept, but the backtrace is lost. Also lost is the ability to tell which underlying exception class was raised, which can be helpful when debugging.

00:18:06.080 Now, this is a simplified version of Sequel's current exception class conversion method. First, note that the exception class name is included in the exception message, which easily allows the user to tell what the underlying exception class was while still only rescuing Sequel's exception class. Also note that the wrapped exception is kept in its entirety, mainly to make it available for use in a case statement inside a rescue clause, so that higher-level application code can treat different backend errors differently if it wants to.
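A minimal sketch of a conversion method with the properties described here and in the next paragraph: the original class name goes into the message, the original exception is kept, and its backtrace is copied. `WrapDemo`, `DriverError`, `convert_exception`, and `run` are hypothetical stand-ins, not Sequel's code. Raising an exception object whose backtrace is already set keeps that backtrace.

```ruby
module WrapDemo
  # The one exception class callers need to rescue.
  class DatabaseError < StandardError
    # The original backend exception, kept for case statements in rescue clauses.
    attr_accessor :wrapped_exception
  end

  # Stand-in for a driver exception such as PGError (hypothetical).
  class DriverError < StandardError; end

  # Build a DatabaseError that keeps the original class name, object, and backtrace.
  def self.convert_exception(exception, klass = DatabaseError)
    converted = klass.new("#{exception.class}: #{exception.message}")
    converted.wrapped_exception = exception
    converted.set_backtrace(exception.backtrace)
    converted
  end

  def self.run(table)
    raise DriverError, "relation \"#{table}\" does not exist"
  rescue DriverError => e
    raise convert_exception(e)
  end
end

begin
  WrapDemo.run("missing_table")
rescue WrapDemo::DatabaseError => e
  puts e.message                                    # prints WrapDemo::DriverError: relation "missing_table" does not exist
  puts e.wrapped_exception.class                    # prints WrapDemo::DriverError
  puts e.backtrace == e.wrapped_exception.backtrace # prints true
end
```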

00:18:23.440 Finally, the backtrace for the newly created exception is set to the backtrace of the underlying exception, so the user can easily see which line actually raised the error. I'd like to close out the presentation with a few simple reminders to prevent some issues in Ruby code.

00:18:41.600 Now, most experienced Ruby programmers probably know about all of these, but I still see these issues occasionally in production code. The first reminder relates to using strings with eval. Let's see if you can identify a problem with this code.

00:19:06.360 Now, the main problem is that the file and line arguments were not passed to instance_eval, so if an error is raised, you can't tell where it happened. The solution is fairly simple: just add the file and line arguments to all of your stringy evals. Note that if you are using a heredoc, you should add one to the line argument, because the string starts on the line after the line that calls the eval method.
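A small sketch of the heredoc case; `Widget`, `explode`, and the `EVAL_LINE` constant are hypothetical, and `EVAL_LINE` exists only so the reported line can be checked. With `__FILE__` and `__LINE__ + 1`, the backtrace points at this file and at the physical line of the `raise` inside the heredoc, instead of a generic eval location.

```ruby
class Widget
  # Line number of the class_eval call below (recorded only so the result can be checked).
  EVAL_LINE = __LINE__ + 1
  class_eval(<<-RUBY, __FILE__, __LINE__ + 1)
    def explode
      raise ArgumentError, "boom"
    end
  RUBY
end

begin
  Widget.new.explode
rescue ArgumentError => e
  location = e.backtrace_locations.first
  puts "#{location.path}:#{location.lineno}"
  puts location.lineno == Widget::EVAL_LINE + 2 # prints true
end
```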

00:19:28.720 Then, when the eval code raises an error, the user can see where the error actually occurs. The second reminder relates to the appropriate use of the logical or operator in Ruby. Think about the problems with this code, which is supposed to set single-threaded mode if the option is given, or fall back to whatever the class default is.

00:19:50.840 Now, the problem with this code is that if the user sets the single threaded option to false when instantiating the database, it will always use the class default. So if the class default is true, the database will be put in single-threaded mode, even though that’s not what the user requested. This is due to how the short-circuiting logical or operator works.

00:20:11.360 The solution is not to use the logical or operator at all, but to switch to a conditional. The basic principle is that any time nil or false can be a valid value, you cannot use the logical or operator to supply a default.
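A small sketch contrasting the two versions; `DBSettings` and the method names are hypothetical, while the `single_threaded` option follows the talk. The `||` version treats an explicit `false` the same as a missing option, while the conditional falls back only when the key is absent.

```ruby
class DBSettings
  # Class-level default (hypothetical): single-threaded unless told otherwise.
  @single_threaded = true

  class << self
    attr_reader :single_threaded

    # Buggy: `||` cannot tell "false was given" apart from "nothing was given".
    def single_threaded_with_or(opts)
      opts[:single_threaded] || single_threaded
    end

    # Fixed: fall back to the class default only when the option is absent.
    def single_threaded_with_conditional(opts)
      opts.key?(:single_threaded) ? opts[:single_threaded] : single_threaded
    end
  end
end

p DBSettings.single_threaded_with_or({ single_threaded: false })          # true, not what was asked for
p DBSettings.single_threaded_with_conditional({ single_threaded: false }) # false
p DBSettings.single_threaded_with_conditional({})                         # true
```

`opts.fetch(:single_threaded) { single_threaded }` is an equivalent way to write the conditional.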

00:20:33.880 The final topic is a combination of a fairly simple technique involving creating re-entrant methods along with a reminder about the proper use of ensure. Let's first discuss reentrancy and why it is important, at least in this context.

00:20:56.880 Consider this code: the insert method inserts a hash of values into one or two tables inside a transaction. However, the fact that insert uses a transaction is not obvious to the caller. If the caller calls insert inside their own transaction, you don’t want to open up a new database connection in that transaction; you want to reuse the database’s currently open transaction.

00:21:15.440 Now, this is the actual code used for an old version of Sequel's transaction method. These parts deal with re-entrancy. Basically, you need to store references to the threads that are currently inside the method. Then, before the begin/rescue/ensure block, you check whether the current thread is already inside the method, and if so, you use `return yield` to immediately pass execution to the block and return its value.
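A minimal sketch of a re-entrant transaction method along the lines described, not Sequel's code: `ReentrantDB`, `execute`, and the logged SQL strings are hypothetical, and the database only records statements. The body keeps COMMIT directly after `yield`, mirroring the old structure the talk is about to revisit.

```ruby
require "set"

# Stand-in database that records SQL instead of sending it (hypothetical).
class ReentrantDB
  attr_reader :log

  def initialize
    @log = []
    @transactions = Set.new # threads currently inside #transaction
    @mutex = Mutex.new
  end

  def execute(sql)
    @log << sql
  end

  def transaction
    # Re-entrant case: this thread already has a transaction open,
    # so run the block inside it and return the block's value.
    return yield if @mutex.synchronize { @transactions.include?(Thread.current) }

    begin
      @mutex.synchronize { @transactions << Thread.current }
      execute("BEGIN")
      result = yield
      execute("COMMIT") # mirrors the old structure: skipped if the block uses `return`
      result
    rescue Exception
      execute("ROLLBACK")
      raise
    ensure
      @mutex.synchronize { @transactions.delete(Thread.current) }
    end
  end

  # Uses a transaction internally; callers may or may not be in one already.
  def insert(values)
    transaction do
      execute("INSERT INTO items")
      execute("INSERT INTO tags") if values[:tag]
    end
  end
end

db = ReentrantDB.new
db.transaction { db.insert({ tag: "ruby" }) }
p db.log # ["BEGIN", "INSERT INTO items", "INSERT INTO tags", "COMMIT"]
```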

00:21:43.920 This return yield technique is used in multiple places inside Sequel. Now let's revisit the insert method we defined earlier and consider an issue with it. Think about the highlighted line and how it may not work correctly.

00:22:03.920 Now, the problem with returning inside the transaction block is that the lines of code between the yield call and the rescue block are not called. The solution to this is fairly simple: you just need to make sure that all code that must be executed is inside an ensure block.

00:22:27.680 I think the original author of this code did not want the commit query to be sent in case of an error, which is why it was not in the ensure block to begin with. Now if you only want code to be executed if no exception was raised, you should have the rescue block assign the exception to a local variable and then, inside the ensure block, only execute the code if the value of that variable is nil.
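A minimal sketch of the fix just described, using a hypothetical `EnsureDB` that only logs statements (thread-safety of the bookkeeping hash is omitted for brevity). The rescue clause stores the exception in a local variable, and the ensure clause commits only when that variable is nil, so an early `return` from the block still commits while an exception only rolls back.

```ruby
class EnsureDB
  attr_reader :log

  def initialize
    @log = []
    @transactions = {} # Thread => true while that thread is inside #transaction
  end

  def transaction
    return yield if @transactions[Thread.current]

    error = nil
    begin
      @transactions[Thread.current] = true
      log << "BEGIN"
      yield
    rescue Exception => error # remember the exception for the ensure clause
      log << "ROLLBACK"
      raise
    ensure
      # Runs on normal completion, on exceptions, and when the block exits
      # early via `return`, `break`, or `throw`.
      log << "COMMIT" unless error
      @transactions.delete(Thread.current)
    end
  end
end

def create_record(db)
  db.transaction do
    db.log << "INSERT"
    return :created # returns from create_record, straight through the block
  end
end

db = EnsureDB.new
p create_record(db) # :created
p db.log            # ["BEGIN", "INSERT", "COMMIT"]
```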

00:22:49.600 And that concludes my presentation. I hope you found some of these techniques interesting. Thank you very much for the opportunity to present here at MountainWest RubyConf!

00:23:12.080 Does anybody have any questions?

00:23:16.160 No questions? Alright, well if you don't have questions, I will attempt to show you something at least vaguely interesting to me.

00:23:19.840 Alright, who else uses Word for their presentations?

00:23:24.440 So what possible thing could go there in order to get this code to work?

00:23:29.920 Basically, this is something where you call the class method, and it'll call the class method first and then call the instance method on the same class. Anyone? Mats is too busy reading his email. It's okay.

00:24:32.160 Okay, Mats, where do you have those question marks? What's the only valid thing that can be there?

00:24:36.120 What’s the only thing where you call the class method and calling super will then call the instance method?

00:24:44.240 Well, going by the way I described Ruby's method lookup, the superclass of Object's singleton class is Class. So when you call a singleton method of Class and it calls super, the lookup goes through the singleton classes of Module and Object and then reaches Class itself, so you end up in Class's instance methods. That's how you end up with both methods on the same class. Thank you!
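A small sketch of the puzzle's answer as explained above, assuming the class in question is Class itself; `lookup_demo` is a hypothetical method name. Note that defining an instance method on Class makes it available on every class, so this is only a demonstration.

```ruby
class Class
  # Instance method of Class: every class object can call this.
  def lookup_demo
    "Class#lookup_demo"
  end
end

class << Class
  # Singleton method on the Class object itself; super continues up through
  # the singleton classes of Module and Object and lands on Class#lookup_demo.
  def lookup_demo
    "Class.lookup_demo -> #{super}"
  end
end

puts Class.lookup_demo # prints Class.lookup_demo -> Class#lookup_demo
```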

00:25:42.000 Thank you very much!