RailsConf 2018

Booleans are Easy - True or False?

by Craig Buchek

In the RailsConf 2018 session titled "Booleans are Easy - True or False?", speaker Craig Buchek discusses the complexities and common pitfalls of using boolean values in programming, especially within the Ruby language. Although booleans represent simple true or false states, their usage can often lead to confusion and bugs if not handled properly. The presentation aims to highlight anti-patterns of boolean implementation and suggest better alternatives.

Key points outlined in the talk include:

  • Understanding Ruby's Booleans: In Ruby, true and false are the sole instances of TrueClass and FalseClass. There is no Boolean class, and every object other than false and nil is treated as truthy, which can lead to unintentional errors.
  • Method Parameters: Utilizing booleans as method parameters can create confusion, particularly if developers cannot recall the meaning of 'true' or 'false'. Buchek advocates for using named parameters to provide clearer context.
  • Representing Application State: Relying on multiple booleans to track application state can lead to nonsensical combinations (e.g., editing and saving simultaneously). A better approach involves using a single state representation, such as enums, to ensure valid states.
  • Primitive Obsession: The practice of using booleans or other primitive types can lead to code smells. For example, instead of using a boolean to signify if an object is deleted, tracking the timestamp of deletion offers more context and usability.
  • Exponential Complexity: The exponential growth of conditional cases based on independent boolean variables is a significant problem. For instance, three independent booleans can result in eight different combinations or states, which complicates logic and testing.
  • Refactoring Boolean Logic: The presentation discusses boolean algebra and techniques for simplifying boolean expressions. Applying these techniques not only improves readability but also helps prevent errors in code.

Buchek emphasizes the importance of writing code that is clear and maintainable. The motto is to optimize for understanding, as code is read more often than it is written. A significant takeaway from the session is that improving the readability and clarity of code, even if it requires more initial effort, will benefit both the original developer and future maintainers of the codebase.

00:00:10 All right, welcome! My name is Craig Buchek, and I'm going to talk to you about Booleans today. If you want to follow along, I've got slides up here in the lower right corner. You can refer to them if you need to later. I do have some references and more detailed information included in the presenter notes. You can hit 'P' to toggle the presenter notes. My Twitter is in the upper right corner if you want to tweet at me or about me. I live in St. Louis, which means I get to go to a conference called Strange Loop without having to pay for any travel. I went last year, and it's a great conference. Lots of things go over your head, but it's really inspirational.
00:00:38 My favorite talk was actually on the pre-conference day, at elm-conf. Elm is a programming language that compiles down to JavaScript. I attended a talk about Booleans by Jeremy Fairbank, and that talk inspired me to create this one. I wondered how someone could talk for 40 minutes about Booleans. It turns out that's really hard! My talk is probably going to be closer to 30 minutes than 40, but Jeremy's talk truly inspired me. Since Ruby is very different from Elm, this talk is going to be quite a bit different from Jeremy's, but I'd recommend watching his as well.
00:01:16 So, onto Ruby. Booleans are pretty simple. Things are either true or false. In Ruby, true and false are instances of TrueClass and FalseClass respectively, and there is really only one instance of each. No matter how you obtain the true value, it's always the same object. If you call the object_id method, every true value returns the same ID, and every false value returns ID 0.
00:01:38 You can't create a new instance of these classes, which may seem odd but makes sense: if there's meant to be only one of each, you wouldn't want a second instance. Ruby doesn't have a Boolean class; if you try to reference Boolean, Ruby won't recognize it. TrueClass has no Boolean ancestor, only Object, Kernel, and BasicObject, the same ancestors almost any object has. This appears to be due to Ruby's Smalltalk heritage. Since Ruby is dynamically typed, you never need to declare a variable as a Boolean, so you can use any object wherever a Boolean is expected. For instance, an integer like 1, or a newly created object, is treated as true.
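The singleton behavior described above can be checked in a few lines. This is a minimal sketch for a plain Ruby process; the helper name `instantiable?` is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: reports whether a class responds to `new` successfully.
def instantiable?(klass)
  klass.new
  true
rescue NoMethodError
  false
end

# Every route to "true" yields the very same object.
from_comparison = (1 == 1)
from_negation   = !false
from_comparison.equal?(true)   # identity check, not just equality
from_negation.equal?(true)

# true and false belong to two separate classes with no shared Boolean parent.
true.class                     # TrueClass
false.class                    # FalseClass
TrueClass.ancestors.any? { |mod| mod.name == "Boolean" }

# Neither class can be instantiated, while an ordinary class can.
instantiable?(TrueClass)
instantiable?(FalseClass)
instantiable?(String)

# In a plain Ruby process (no gems that define one), Boolean is not a constant.
Object.const_defined?(:Boolean)
```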
00:02:17 There are only two things that are treated as false: false and nil. You might hear the terms 'truthy' and 'falsy'; they describe values that Ruby interprets as true or false in a Boolean context. You might come across those terms in RSpec tests, for instance expecting 1 to be truthy, or expecting something that returns false or nil to be falsy. I wouldn't recommend relying on truthiness in if-statements, though. It's better to use something idiomatic and intention-revealing that explicitly returns a Boolean.
00:03:21 In Ruby, we can check explicitly whether a string is empty or nil. This matters because Ruby differs from languages like Perl, PHP, and JavaScript, which interpret an empty string and the value 0 as false; in Ruby both are truthy. Mixing up those rules can lead to a lot of errors. Nil brings its own problems: null references are known as the 'billion-dollar mistake,' a name coined by Tony Hoare, who introduced them and later said they have caused over a billion dollars' worth of damage to the industry.
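To make the truthiness rules concrete, here is a small sketch. The helpers `truthy?` and `blank_name?` are illustrative names, not Ruby built-ins; the second shows the kind of intention-revealing check the talk recommends over bare truthiness.

```ruby
# Hypothetical helper: how Ruby treats a value when it appears in a condition.
def truthy?(value)
  value ? true : false
end

# Only these two values are falsy in Ruby.
FALSY_VALUES = [false, nil].freeze

# Everything else is truthy, including values that are falsy in
# Perl, PHP, or JavaScript (0 and the empty string).
TRUTHY_VALUES = [true, 0, "", [], {}, Object.new].freeze

# Hypothetical intention-revealing predicate: says exactly what is being
# asked and always returns a real Boolean.
def blank_name?(name)
  name.nil? || name.empty?
end

FALSY_VALUES.map { |value| truthy?(value) }    # all false
TRUTHY_VALUES.map { |value| truthy?(value) }   # all true
blank_name?("")                                # true, even though "" is truthy
```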
00:03:59 Next, we're going to talk about using Booleans as parameters in methods. I spend a lot of time in the Rails console, IRB, or Pry when debugging tests, and I often want to see the class of an object. You can simply use 'object_name.class' to find that out. I also often want to see what methods that object responds to, using either 'object.methods' or 'class.instance_methods.' These two return the same thing, but it's often a long list of methods that overflows off the right side of the screen. Show of hands, who's familiar with these methods? Okay, those with your hands up, did you know they take an optional parameter? It turns out far fewer people know that you can pass true or false to control whether superclass methods are shown. The answer is false to hide them; however, I can never remember that and end up looking at the documentation. More likely, I just use trial and error to see which value it takes.
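As an illustration of the optional Boolean argument, the sketch below uses `Module#instance_methods` on two made-up classes, `Animal` and `Dog`. The bare `false` at the call site is exactly the readability problem the talk describes.

```ruby
# Illustrative classes, not from the talk.
class Animal
  def breathe
    :inhale
  end
end

class Dog < Animal
  def bark
    :woof
  end
end

# Default (same as passing true): includes inherited methods such as
# :breathe and everything from Object, so the list is long.
with_inherited = Dog.instance_methods

# Passing false limits the list to methods defined directly in Dog.
own_only = Dog.instance_methods(false)

with_inherited.include?(:breathe)  # inherited methods are present
own_only                           # only :bark remains
```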
00:05:40 To use it properly, I have to remember whether to pass true or false. Is there a way we could do it better? These are built-in methods, and in theory the best way to improve this API is to use named parameters to describe the argument. It would be nice if instead of 'methods(false)', I could say 'methods(superclass: false)' to not show the superclass methods. However, this method predates named parameters in Ruby, and we need to maintain backward compatibility. In older Ruby versions, we would still have to use an options hash to emulate named parameters. So a replacement has to accept either a Boolean or a hash.
00:07:01 I actually wrote a replacement method that gives the option to use the old way by passing false or to use that named superclass parameter. Perhaps I’ll submit that to the Ruby core team. Ideally, it would be better to have two separate methods: one called 'methods' that gives you just the immediate methods and another called 'all_methods' that gives you everything. How would you describe what the original method does? I would say it shows the methods defined for this object or only defined by its immediate class. Anytime you have an 'or' in a description of a method or class, that’s a code smell, indicating that you’re probably violating the single responsibility principle.
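The replacement method itself is not shown in the transcript, so the following is only a sketch of the idea under assumptions: `MethodLister`, `instance_methods_of`, `own_methods`, and `all_methods` are hypothetical names, and the wrapper delegates to the real `Module#instance_methods`.

```ruby
# Hypothetical module; none of these helpers exist in Ruby itself.
module MethodLister
  module_function

  # Accepts the legacy positional Boolean or the named `superclass:` keyword.
  # If both are given, the positional flag wins.
  def instance_methods_of(klass, legacy_flag = nil, superclass: true)
    include_superclass = legacy_flag.nil? ? superclass : legacy_flag
    klass.instance_methods(include_superclass)
  end

  # The two-method alternative: each name says what it returns,
  # so no flag is needed at the call site.
  def own_methods(klass)
    klass.instance_methods(false)
  end

  def all_methods(klass)
    klass.instance_methods(true)
  end
end

# Illustrative class.
class Widget
  def spin
    :spinning
  end
end

MethodLister.instance_methods_of(Widget, false)              # old style
MethodLister.instance_methods_of(Widget, superclass: false)  # named style
MethodLister.own_methods(Widget)                             # no flag at all
```

With two separate methods, the description of each no longer needs the word "or", which is the single-responsibility point made above.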
00:07:55 I came across this example from Rails a couple of months ago when I was upgrading to Rails 5. The first form is the old Rails API: if you have a user object with an association called 'things', you would normally call 'user.things' to get all the things that user owns, and if you wanted to reload them, you would pass true, as in 'user.things(true)'. But what does that true mean? The second form is the current API, 'user.things.reload', which is more explicit. The original API was deprecated as of Rails 5 and removed in Rails 5.1. Not only is the new one clearer, but the old way could lead to some very subtle bugs.
00:09:44 For instance, I found this bug in Rails issue 26413. The bug report complained that the sales were not being reloaded, and the user was asking why. Can anyone see the issue here? The problem is that the sales association doesn't take a hash; it takes a Boolean like the previous screen. Hence, a call like 'sales(limit: 10)' has its hash argument interpreted as true. Since anything besides nil and false is treated as true, it's just like calling 'sales(true)': the sales are reloaded without any limit. To fix this, you'd have to call 'sales.limit(10)' to get it to work properly.
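The mechanism behind that bug can be reproduced without Rails. The sketch below is not ActiveRecord code: `Store` is a made-up class whose `sales` reader merely imitates the shape of the old association reader (one optional positional flag), and `first(10)` stands in for `limit(10)`.

```ruby
# Plain-Ruby imitation of a reader that takes a positional reload flag.
class Store
  attr_reader :reloads

  def initialize
    @sales = (1..25).to_a
    @reloads = 0
  end

  # Any truthy argument triggers a "reload"; the argument is otherwise ignored.
  def sales(force_reload = false)
    @reloads += 1 if force_reload
    @sales
  end
end

store = Store.new

# The hash { limit: 10 } lands in force_reload. A hash is truthy, so this
# behaves like store.sales(true): a reload happens and nothing is limited.
unlimited = store.sales(limit: 10)
unlimited.size   # still all 25 entries

# Chaining on the returned collection is what actually limits the result.
limited = store.sales.first(10)
limited.size     # 10 entries, and no extra reload
```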
00:10:41 Now, let's take a short detour to talk about something called 'connascence.' Connascence is a word that existed before computers. These definitions come directly from Webster's Dictionary dating back to 1913: the common birth or production of two or more things at the same time, or the act of growing together. In 1992, Meilir Page-Jones brought this concept to the object-oriented programming community, first through a paper comparing techniques by means of encapsulation and connascence. He wrote a pretty well-regarded book, 'What Every Programmer Should Know About Object-Oriented Design,' and a follow-up in 1999 on UML. Both discuss connascence, and I've included links to those materials in the presentation notes.
00:11:41 Here's a rough definition of connascence: it is a measure of the coupling, or dependencies, among components within a software system, particularly in object-oriented programming. Two pieces of code are connascent when changing one requires changing the other. In 2009, Jim Weirich introduced this concept to the Ruby community and gave several talks about it, the first one being called 'The Grand Unified Theory of Software Design.' In that talk, he gave an example similar to the one I gave about the two methods, and discussed how excessive connascence makes systems hard to change and maintain. The argument is that connascence underlies many other rules of good object-oriented design.
00:12:45 Here's a list of different types of connascence, ordered from weakest to strongest. We should prefer the ones towards the top. The weakest is connascence of name: agreement on the name of something in two different parts of your code. For example, when you call a method, the name used when calling it has to match the method name when you define it, and this rule applies to any variable or constant as well.
00:13:32 Connascence of type refers to an agreement on the type of something. In Ruby, while we don't have static types, we do employ duck typing, where the question is whether an object can act like a specific class. Connascence of meaning pertains to agreement on the meaning of specific values like true or false. Meanwhile, connascence of position refers to an agreement on the argument order in a method that takes multiple parameters. If you use named parameters, though, you move down to connascence of name, a weaker form, which is why named parameters are worth using.
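A short sketch of the difference between agreeing on argument position and agreeing on argument names. Both methods are invented for illustration.

```ruby
# Positional version: caller and definition must agree on argument ORDER.
def schedule_positional(title, hour, minute)
  format("%s at %02d:%02d", title, hour, minute)
end

# Keyword version: caller and definition only agree on argument NAMES.
def schedule(title:, hour:, minute:)
  format("%s at %02d:%02d", title, hour, minute)
end

# Swapping two positional arguments fails silently with a wrong result.
schedule_positional("Standup", 30, 9)

# Keywords can be given in any order and still produce the intended result.
schedule(minute: 30, title: "Standup", hour: 9)
```

A misspelled keyword raises an ArgumentError at the call site, whereas a swapped positional argument produces no error at all.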
00:14:19 Next, I want to discuss Booleans used to represent application state. Let’s say we have an editor class that has several Booleans representing possible states. We might need to track whether the user is editing, if the file is being saved, or if there’s an error condition. The problem arises because we can end up with combinations of states that don’t make sense. What does it mean to be both editing and saving? If there’s an error, do the other fields even remain valid?
00:15:04 We should aim to ensure our code never gets into an impossible state. Richard Feldman talks about this in a great talk titled 'Making Impossible States Impossible.' To improve this, we could use a single field to represent the state. While it may not seem like a big improvement due to the presence of the case statement, it does help us avoid meaningless or invalid states. The options in our case statement don’t require a specific order, so we can do better.
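The editor code from the slides is not in the transcript, so this is a sketch under assumptions: a hypothetical `Editor` class that replaces three independent Booleans with one `state` field drawn from a fixed list, so a combination such as "editing and saving" cannot be expressed.

```ruby
# Hypothetical editor; state names follow the ones mentioned in the talk.
class Editor
  STATES = %i[editing saving error].freeze

  attr_reader :state

  def initialize
    @state = :editing
  end

  # Reject anything outside the known states instead of storing it.
  def state=(new_state)
    unless STATES.include?(new_state)
      raise ArgumentError, "unknown state: #{new_state.inspect}"
    end

    @state = new_state
  end

  # One branch per state; no combination of flags to reason about.
  def status
    case state
    when :editing then "Editing"
    when :saving  then "Saving..."
    when :error   then "Something went wrong"
    end
  end
end

editor = Editor.new
editor.state = :saving
editor.status
```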
00:15:48 ActiveRecord enums were introduced in Rails 4 or maybe 5; I'm not quite sure. They define possible states, which helps catch bugs. Ruby will identify an incorrect method name more easily than a typo in a symbol or string. For example, we can have a state method like 'editing?' for defined states such as editing, saving, or error, contributing to cleaner checks. There are also several state machine gems that could help, but if you can use enums, I highly recommend using those first before looking into other gems.
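An ActiveRecord enum needs Rails and a database, so the sketch below imitates just one of its features in plain Ruby: generated predicate methods such as `editing?`. The `Draft` class is hypothetical. It shows why a misspelled method name is caught while a misspelled symbol is not.

```ruby
# Plain-Ruby imitation of enum-style predicates; not ActiveRecord code.
class Draft
  STATES = %i[editing saving error].freeze

  attr_reader :state

  def initialize(state)
    raise ArgumentError, "unknown state: #{state.inspect}" unless STATES.include?(state)

    @state = state
  end

  # Define editing?, saving?, and error? from the list of states.
  STATES.each do |name|
    define_method(:"#{name}?") { state == name }
  end
end

draft = Draft.new(:saving)

draft.saving?             # predicate generated from the state list
draft.state == :savign    # typo in a symbol: quietly evaluates to false
# draft.savign?           # typo in a method name: raises NoMethodError
```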
00:16:36 The state class is quite similar to the enum I described earlier. However, if we have a sufficiently specialized state class, we can delegate rendering responsibilities to the state object itself. Using a state class can also help remove the code smell known as primitive obsession, where a primitive type is used when a more specialized type would be more appropriate. Examples include using floating-point numbers to represent money, or using a string to represent a URL, when a URL has various components such as the web server name, path, fragment, and query string.
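To illustrate the money example, here is a minimal sketch. The `Money` class is hypothetical (it is not the money gem) and stores integer cents so that arithmetic stays exact.

```ruby
# Binary floats cannot represent most decimal fractions exactly,
# so this comparison does not hold.
float_sum_matches = (0.1 + 0.2 == 0.3)

# Hypothetical value object: exact integer cents plus money-specific behavior.
class Money
  include Comparable

  attr_reader :cents

  def initialize(cents)
    @cents = Integer(cents)
  end

  def +(other)
    Money.new(cents + other.cents)
  end

  # Comparable derives ==, <, >, and friends from this.
  def <=>(other)
    cents <=> other.cents
  end

  # A float is used only for display here, never for arithmetic.
  def to_s
    format("$%.2f", cents / 100.0)
  end
end

total = Money.new(10) + Money.new(20)
total == Money.new(30)   # exact, unlike the float comparison
total.to_s
```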
00:17:26 In Ruby, we often abuse strings this way, a habit commonly called 'stringly typed' code. In this example, we've replaced a Boolean parameter with a symbol, which is still a primitive, so the code still exhibits primitive obsession. Next, I'll discuss Boolean fields. Suppose we have a Boolean attribute that tracks whether an object has been deleted. Using my tool called Virtus Active Record, I can show what attributes a model has. We have a 'deleted' field that is Boolean and indicates whether something has been deleted.
00:18:29 Instead of marking something as deleted with a Boolean, I suggest recording when it was deleted. This proves useful, and in the same way we can keep track of who deleted it. In terms of auditing, it's typically essential to know who did something and when they did it.
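A plain-Ruby sketch of that suggestion. The `Article` class and its `deleted_at` and `deleted_by` attributes are assumed names; in a Rails model these would typically be database columns.

```ruby
# Hypothetical record that stores WHEN and BY WHOM it was deleted,
# rather than a bare deleted flag.
class Article
  attr_reader :deleted_at, :deleted_by

  def delete!(by:, at: Time.now)
    @deleted_by = by
    @deleted_at = at
    self
  end

  # The Boolean is derived from the richer data, so nothing is lost.
  def deleted?
    !deleted_at.nil?
  end
end

article = Article.new
article.deleted?                 # false: no timestamp recorded yet
article.delete!(by: "craig")
article.deleted?                 # true, and the audit details are available
```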
00:19:39 Now, let's talk about exponential complexity related to Booleans. Let's say we have a method that renders a document and takes three Boolean options. If these options are independent of each other, how many cases do we have to handle? With three independent Boolean variables, there are eight possible cases. If you are fortunate, the design stays simple, but if you're not, you end up with an extensive method, 29 lines of code, that doesn't accomplish anything particularly interesting: it just calls other methods for each possible case.
00:20:40 And don’t forget the eight test cases that you’ll need for validation. It’s likely you’ll forget one, and without protection, you might even overlook this error. The formula for the number of conditions is 2 raised to the power of n, where n is the number of independent Boolean variables. This signifies an exponential growth of the bad kind.
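The 2-to-the-n growth can be seen by enumerating the combinations directly. `boolean_combinations` is an illustrative helper built on Ruby's `Array#repeated_permutation`.

```ruby
# Every possible assignment of true/false to `count` independent flags.
def boolean_combinations(count)
  [true, false].repeated_permutation(count).to_a
end

# Three flags give 2**3 = 8 cases, each needing handling and a test.
three_flag_cases = boolean_combinations(3)
three_flag_cases.size

# Each additional flag doubles the number of cases.
case_counts = (1..6).map { |n| boolean_combinations(n).size }
```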
00:21:08 Our solution is to represent the state with a single variable. By passing the state, we only need to manage one case for each possible state. This approach can reduce the cases we handle to three, or four if we also account for the missing case, with a similar number of tests.
00:21:41 Now, let's delve into Boolean operations and Boolean algebra, starting with the basics. In Boolean algebra, negation is written with a little square-corner sign (¬). In Ruby, the exclamation point, often called 'bang,' is the negation operator. There's also a tilde, but that is the bitwise version and doesn't apply to Booleans. Negation works like this: not false equals true, while not true equals false. That's called the truth table for negation. If you think of true and false as 1 and 0, or on and off, you begin to see its relevance in electronics.
00:22:27 For example, in a NOT gate with input 'A,' the output on the other end (Q) is the opposite of 'A.' The word for 'and' in Boolean algebra is conjunction, represented by a wedge symbol (∧) that looks like a caret or an inverted V. If you remember, the intersection symbol for sets (∩) has a similar appearance; there's a symmetry between set operations and Boolean operations. In Ruby, the syntax for 'and' is '&&.' In mathematics or digital logic you might see other notations, and Ruby also has a single '&,' whose different precedence can cause confusion. The truth table for 'and' shows that the result is true only when both operands are true. If you think of true and false as 1 and 0, 'and' behaves like multiplication, which is why Boolean algebra often writes it as a product.
00:24:57 The word for 'or' in Boolean algebra is disjunction. Its symbol (∨) resembles a lowercase 'v,' and Ruby writes it as '||'. As with 'and,' there is a bitwise version of 'or,' a single '|'. The truth table for 'or' shows that if either side is true, the result is true. In digital logic, this is an 'or' gate. There are further operators with their own behavior, such as exclusive or, in contrast to the inclusive or just described. In all, there are 16 possible operators for two Boolean operands, although most are rarely used.
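The truth tables described above can be generated rather than memorized. This sketch uses only core Ruby operators; the constant names and the `bit` helper are illustrative.

```ruby
BOOLEANS = [true, false].freeze
PAIRS = BOOLEANS.product(BOOLEANS).freeze   # the four possible input pairs

# Each row lists the inputs followed by the result.
NOT_TABLE = BOOLEANS.map { |a| [a, !a] }
AND_TABLE = PAIRS.map { |a, b| [a, b, a && b] }
OR_TABLE  = PAIRS.map { |a, b| [a, b, a || b] }
XOR_TABLE = PAIRS.map { |a, b| [a, b, a ^ b] }

# Hypothetical helper: view a Boolean as 1 or 0.
def bit(value)
  value ? 1 : 0
end

# With true as 1 and false as 0, "and" matches multiplication on every row.
AND_IS_PRODUCT = PAIRS.all? { |a, b| bit(a && b) == bit(a) * bit(b) }

# An operator on two Booleans is fully described by its result on the four
# input pairs, and each result is true or false: 2**4 = 16 possible operators.
OPERATOR_COUNT = BOOLEANS.repeated_permutation(PAIRS.size).count
```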
00:25:39 There are laws that govern how to transform Boolean expressions into simpler forms. A friend of mine ran into this while submitting a pull request to clarify a predicate method, that is, a method that returns true or false, in this case based on other methods. An explicit true or false in such a method can be a code smell, except in cases like guard clauses that return early. To optimize our code for readability, we can apply transformations to Boolean expressions, such as the identity law or De Morgan's laws. Note that Ruby short-circuits: which code actually runs may depend on the order of the operands, yet the resulting value remains the same.
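Because there are only four input pairs, laws such as De Morgan's can be verified exhaustively. The sketch below does that and also shows short-circuiting; all method names are illustrative.

```ruby
def all_pairs
  [true, false].product([true, false])
end

# De Morgan's laws: not (a and b) == (not a) or (not b),
#                   not (a or b)  == (not a) and (not b).
def de_morgan_holds?
  all_pairs.all? do |a, b|
    (!(a && b) == (!a || !b)) && (!(a || b) == (!a && !b))
  end
end

# Identity laws: a and true == a, a or false == a.
def identity_holds?
  [true, false].all? { |a| ((a && true) == a) && ((a || false) == a) }
end

# Short-circuiting: records which operands were actually evaluated.
def short_circuit_trace
  seen = []
  operand = ->(name, value) { seen << name; value }
  result = operand.call(:left, false) && operand.call(:right, true)
  [result, seen]
end

de_morgan_holds?
identity_holds?
short_circuit_trace   # the right operand never runs once the left is false
```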
00:26:58 For example, we can convert an if-then-else structure into Boolean operators. The resulting expression reads much closer to the original intent, and applying logical operations step by step gives us more manageable code. Eliminating if-statements can expose duplication, and transformations such as the identity law then show the simpler route. Each step shrinks the Boolean expression, until a subexpression turns out to be a tautology that simplifies to true, which enables further simplification.
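The code from the slides is not reproduced in the transcript, so the following is an invented example of the same kind of refactoring: a predicate written with nested if-statements and explicit true/false returns, next to an equivalent Boolean expression. `Member`, `can_edit_verbose?`, and `can_edit?` are hypothetical names.

```ruby
# Illustrative data; the fields are assumed to hold real Booleans.
Member = Struct.new(:admin, :owner, :locked)

# Before: nested conditionals that return explicit true/false.
def can_edit_verbose?(member)
  if member.locked
    return false
  else
    if member.admin
      return true
    else
      return member.owner
    end
  end
end

# After: the same logic as one expression that reads like the rule itself:
# "not locked, and either an admin or the owner".
def can_edit?(member)
  !member.locked && (member.admin || member.owner)
end

# With three Boolean fields there are only eight cases, so the two versions
# can be compared on every one of them.
ALL_MEMBERS = [true, false].repeated_permutation(3).map { |flags| Member.new(*flags) }
VERSIONS_AGREE = ALL_MEMBERS.all? { |m| can_edit_verbose?(m) == can_edit?(m) }
```

Checking every combination before swapping in the shorter version is a cheap safeguard when simplifying Boolean logic by hand.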
00:29:20 In conclusion, which version of the code is preferable? Based on readability and comprehensibility, I would argue the latter version is easier to understand and adapt than the former. The initial code might function correctly, but we read code more often than we write it. Thus, we must optimize for readability, prioritizing understanding above all else. Abstractions are intended to facilitate comprehension. As Sandi Metz suggests, it is valuable to invest significant effort into finding the most accurate abstraction.
00:30:57 Writing quality code will require more time upfront, but will pay off in the end. I hope that I’ve demonstrated that there’s much more to Booleans than meets the eye. The essential takeaway is that we can make our code easier to read and comprehend. Take the time to ensure that the next person reading your code—often you—can do so easily. It’s the right thing to do, as it significantly supports your teammates. Thanks for attending! Thanks to Jeremy Fairbank for inspiring this talk, the examples from Amos King, my local user group in St. Louis, my team at F5, RailsConf for selecting my talk, and my employer, F5, for sponsoring my travel. We do have open web developer positions if you're interested, so please feel free to come and talk to me. A coworker shared a joke with me: 'The best thing about a Boolean is that even if you’re wrong, you're only off by a bit.' I don't know how funny that really is, but it’s one of the few jokes available about Booleans. One reason I enjoy giving talks at conferences is to spark conversations, so please do not hesitate to come and speak with me. You can find my slides online; they were created using a tool called Remark to produce HTML. Feel free to tweet at me, check out my GitHub, view the presentations there, or send me an email. Thank you for your time!