Overcoming Our Obsession with Stringly-Typed Ruby

David Copeland

In the talk "Overcoming Our Obsession with Stringly Typed Ruby," presented at RubyConf 2014, David Copeland addresses the problems that arise from relying too heavily on strings in Ruby programs. This habit, humorously termed "stringly typed," leads to unnecessary complexity and ambiguity within applications, hampering maintainability and clarity.

Key Points:
- Definition of Stringly Typed: The term is introduced as a play on "strongly typed," indicating over-dependence on strings, often when better types are available.
- Common Usage of Strings: Strings are prevalent in input and output, especially in frameworks like Rails and in databases. However, misuse can lead to significant issues.
- Example of Zip Codes: Zip codes look numeric but are really five-digit identifiers (leading zeros included), so treating them as numbers, or as arbitrary strings, invites errors. They need validation to ensure accuracy.
- Problems with Zip Code Validation: In a sample application, the name "Walsh" is mistakenly entered where a zip code belongs, and nothing rejects it. This illustrates how a system can accept invalid data without proper checks, potentially leading to serious flaws in functionality.
- Need for Type Enforcement: To protect data integrity, Copeland argues for defining clear data types in the system, using distinct classes to represent specific kinds of data (e.g., a ZipCode class).
- Implementation Strategies: The speaker suggests putting validity checks inside these classes so that only proper values are accepted, reducing the need for repeated checks throughout the code.
- Documentation and Consistency: Clear documentation and naming conventions help convey the intended types and behaviors throughout the application, facilitating better collaboration among developers.
- Balancing Flexibility and Safety: The discussion includes a consideration of the balance between the rapid development afforded by using strings and the risks posed by lack of type constraints.
- Conclusions: Copeland emphasizes that using explicit types can reduce system complexity, foster clarity, and maintain the integrity of data structures in applications, ultimately making the code more readable, maintainable, and less prone to errors.

Overall, Copeland's talk encourages developers to reassess their reliance on strings, advocating for a structured approach to type management that aligns with sound programming principles, thus enhancing software quality.

00:00:18.240 I'm Dave Copeland, otherwise known as davetron5000 on Twitter. I wrote this book about how to be a good software engineer, and I wrote this other book about how to build awesome command-line applications in Ruby.

00:00:24.599 While these topics aren't directly applicable to our talk, the concepts we'll learn here will help us build better applications and become better programmers.

00:00:30.320 The title of this talk is 'Overcoming Our Obsession with Stringly Typed Ruby.' So, what do I mean by 'stringly typed'?

00:00:35.920 Has anyone heard of this term before? Great. If you've ever read anything by Jeff Atwood, who's also known as Coding Horror on Twitter, then you might be familiar with it.

00:00:41.039 He writes a popular programming blog and has a fantastic post discussing various programmer jargon, including the amusing phrase 'stringly typed.'

00:00:47.480 He describes it as a riff on 'strongly typed,' used to refer to implementations that unnecessarily rely on strings when more programmer-friendly options are available.

00:01:04.680 We use strings all the time; input often comes as strings, and output typically needs to be in string format.

00:01:10.600 Frameworks like Rails love strings, and even databases often use strings. However, relying on them too much can lead to problems.

00:01:16.439 Let me give you a motivating example using zip codes. If you're not from the United States, you might know them as postal codes. Essentially, they are five-digit codes that help the post office sort mail correctly.

00:01:38.399 A zip code isn't just a number. It looks like one, but it always consists of five digits, even when it begins with zeros. For example, some zip codes in Puerto Rico start with 009.
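A quick way to see why a zip code is not a number is to convert one with leading zeros to an integer and back. This sketch uses only core Ruby (String#to_i, Integer#to_s, Kernel#format); the value "00901" is an illustrative zip code starting with 009.

```ruby
zip = "00901"                 # illustrative five-digit zip code with leading zeros

as_number = zip.to_i          # String#to_i parses the digits and drops the leading zeros
back_to_string = as_number.to_s

p as_number                   # prints 901
p back_to_string              # prints "901": three digits, no longer a valid zip code

# Getting the original back means remembering to zero-pad every time.
p format("%05d", as_number)   # prints "00901"
```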

00:01:53.520 These codes correspond to specific geographical areas, and while you might notice patterns in the numeric values, like lower numbers on the East Coast compared to the West Coast, zip codes largely serve as arbitrary strings of digits.

00:02:08.160 Considering this, let's examine a very simple application designed to send letters to people. We have a database of addresses and a third-party API to help us mail notifications, like account overdrawn messages or refund notifications.

00:02:38.920 Our role as programmers is to connect these two systems and create a service that fulfills our unique needs.

00:02:51.840 Now, here’s how the Ruby API provided by the third-party mailing service works. We configure a series of letters we want to send, each identified by a unique letter ID.

00:03:07.680 To send a letter, we look it up by its ID, which returns an object with a mail method. This method takes four parts of an address (street, city, state, and zip code), all as strings.
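The talk doesn't show the mailing service's actual code, so the following is a hypothetical stand-in that matches the description: letters are looked up by ID, and each one has a mail method taking four strings. The names MailingService, Letter.find, and sent_to are assumptions made for illustration, not the vendor's real API.

```ruby
# Hypothetical stand-in for the third-party mailing service's Ruby API.
module MailingService
  class Letter
    attr_reader :id, :sent_to

    # Look up a preconfigured letter by its ID (memoized so repeated
    # lookups return the same object).
    def self.find(id)
      @letters ||= {}
      @letters[id] ||= new(id)
    end

    def initialize(id)
      @id = id
      @sent_to = []
    end

    # Mail this letter. Every part of the address is a plain String,
    # and nothing here checks what those strings contain.
    def mail(street, city, state, zip_code)
      @sent_to << [street, city, state, zip_code]
      true
    end
  end
end

letter = MailingService::Letter.find(12)
letter.mail("123 Example Street", "Beverly Hills", "CA", "90210")
```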

00:03:15.000 Calling this method will mail the letter to the provided address. To build our application, we create a database of addresses and store them as strings, which is typical.

00:03:27.319 Next, we have some code that reads address information from the command line and stores it in the database. For this talk, we'll treat the database as a simple data store and won't be overly concerned with the database layer.

00:03:35.239 We can utilize this code to add new addresses to our database. To actually send letters, we will define two classes.

00:03:45.319 The first class wraps the third-party mailing service and makes it easier for our system to handle addresses as hashes.

00:03:58.440 This letter sender class will take the letter ID and the address hash, handling the necessary conversions internally.

00:04:11.400 The second class takes an address ID and a letter ID, retrieves the address from the database, and passes it along with the letter ID to the letter sender class.
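Here is one way the two classes might look, sketched under assumptions: the class names LetterSender and AddressLetterSender, the hash keys, and the in-memory ADDRESSES "database" are hypothetical, and a minimal mailing-service stub is included so the example runs on its own.

```ruby
# Minimal hypothetical stub of the third-party service; it logs each mailing.
module MailingService
  LOG = []

  class Letter
    def self.find(id)
      new(id)
    end

    def initialize(id)
      @id = id
    end

    def mail(street, city, state, zip_code)
      LOG << { letter_id: @id, street: street, city: city,
               state: state, zip_code: zip_code }
      true
    end
  end
end

# Hypothetical in-memory "database": address ID => address hash, all strings.
ADDRESSES = {
  1 => { "street" => "123 Example Street", "city" => "Beverly Hills",
         "state" => "CA", "zip_code" => "90210" }
}

# Wraps the third-party service so the rest of our code can pass
# addresses around as hashes.
class LetterSender
  def send_letter(letter_id, address)
    letter = MailingService::Letter.find(letter_id)
    letter.mail(address.fetch("street"), address.fetch("city"),
                address.fetch("state"), address.fetch("zip_code"))
  end
end

# Fetches an address by ID and hands it, with the letter ID, to LetterSender.
class AddressLetterSender
  def initialize(letter_sender = LetterSender.new)
    @letter_sender = letter_sender
  end

  def send_letter(letter_id, address_id)
    address = ADDRESSES.fetch(address_id)
    @letter_sender.send_letter(letter_id, address)
  end
end

AddressLetterSender.new.send_letter(12, 1)
```

Nothing in this chain knows what a zip code is; each layer simply forwards whatever string it was given.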

00:04:18.320 This setup is simplified, but it reflects the core functions we typically implement: integrating with other systems, fetching data from a database, and managing strings.

00:04:24.240 Let's see it in action. We store the address '45 South FS Avenue, Beverly Hills, California' with the well-known zip code '90210.' This is actually the address of the Peach Pit from the show Beverly Hills, 90210.

00:04:50.759 Now, let's send a letter to the Peach Pit. We send letter number 12, and everything looks fine.

00:05:00.680 Now, let's send another letter. The next address is '1675 East Altadena Drive, Altadena, California,' which is the house used on the show to represent where Brenda and Brandon Walsh live.

00:05:16.160 However, this time, instead of the correct zip code, we mistakenly enter 'Walsh,' and the incorrect address is inserted into the database without complaint.

00:05:30.400 Let's send a letter to the Walshes. The system responds that the letter has been sent, but little do we know that the address is incorrect.

00:05:41.000 Here’s where the problem lies: 'Walsh' is not a zip code, yet our system accepted it and sent the letter through without any checks during this step.
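The failure can be reproduced in a few lines. This sketch uses a hypothetical mail method with the same four-string shape as the third-party API; because it accepts any strings, "Walsh" goes straight through.

```ruby
# Hypothetical stringly typed mail method: any four strings are accepted.
def mail(street, city, state, zip_code)
  "Letter sent to #{street}, #{city}, #{state} #{zip_code}"
end

walsh_address = {
  "street" => "1675 East Altadena Drive",
  "city" => "Altadena",
  "state" => "CA",
  "zip_code" => "Walsh"   # typo: a name where the zip code should be
}

puts mail(*walsh_address.values_at("street", "city", "state", "zip_code"))
# prints: Letter sent to 1675 East Altadena Drive, Altadena, CA Walsh
```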

00:06:01.360 What could go wrong? The best-case scenario is that our third-party mailing service returns an error, rejecting the invalid zip code. However, in most cases, the letter will be sent to the post office, which may cause one of several issues: it could get returned, discarded, or sent to an incorrect address.

00:06:25.639 If the letter is essential, this mishap could prevent it from reaching its intended recipient, highlighting a significant bug in our setup.

00:06:46.500 You could argue that the problem originates during input collection. We’ve allowed a string to represent what should be a zip code, but not every string is a valid zip code.

00:07:03.960 So, we might restrict inputs to only those that resemble valid zip codes. However, this doesn't resolve the issue at its core: someone might directly manipulate the database.
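Restricting input might look like this sketch, which checks the format before anything is stored. read_zip_code and FIVE_DIGITS are hypothetical names; the check is a plain regular expression for exactly five ASCII digits.

```ruby
# Exactly five ASCII digits; \A and \z anchor the whole string.
FIVE_DIGITS = /\A[0-9]{5}\z/

# Hypothetical input check, run before an address is stored.
def read_zip_code(input)
  zip_code = input.strip   # drop surrounding whitespace such as a trailing newline
  unless FIVE_DIGITS.match?(zip_code)
    raise ArgumentError, "#{zip_code.inspect} is not a five-digit zip code"
  end

  zip_code
end

p read_zip_code("90210\n")     # prints "90210"

begin
  read_zip_code("Walsh")
rescue ArgumentError => e
  puts e.message               # prints "Walsh" is not a five-digit zip code
end
```

As the talk points out, this protects only this one entry point; data that reaches the database some other way is never checked.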

00:07:20.880 In reality, data imports could be handled by non-programmers, and they could inadvertently introduce faulty zip codes.

00:07:54.760 This points to a potential design flaw in our database. We could leverage database features like check constraints to ensure that only valid zip codes can be added.

00:08:10.120 While this would stop bad data from entering the database, it would not stop someone from calling our classes directly with an invalid zip code, for example when using them by hand in an emergency.

00:08:28.960 Thus, we may need to add zip code checks everywhere in our code, leading to a maintenance nightmare where any time a zip code is involved, we must remember additional validations.

00:08:54.560 In any reasonably sized system, constantly checking for valid zip codes is unmanageable. What we need is an effective way to enforce data integrity, so that we only ever operate on valid zip codes.

00:09:21.840 If we sketch out our system's boundaries, we notice they are unclear. We want to restrict the types that flow through our application, but right now any string is allowed.

00:09:46.720 Our goal is to allow only zip codes where zip codes are expected. We could make our methods clearer by stating the specific types their arguments should be, eliminating the ambiguity.

00:10:09.320 A well-defined method signature would clarify which arguments are expected, improving ease of maintenance and understanding across our applications.

00:10:29.160 In Ruby, we often opt for convenience, using hashes and untyped strings. This saves time during the exploration phase of design, allowing programmers to prototype rapidly.

00:10:49.680 While this can be advantageous, the end result is often a system that’s complicated and difficult to comprehend because the expected structures are not clearly defined.

00:11:09.840 Therefore, after determining the boundaries of our classes, we need to use data types explicitly to clarify what data must come in and out, instilling greater confidence in our system.

00:11:32.680 When we define a data type, we are establishing its values and how it should behave, which helps us model our systems more effectively.

00:11:51.000 Practical examples make these behaviors and expectations clearer. For instance, integers have an unbounded range of values, while a zip code consists of exactly five digits, so there are at most 100,000 of them.

00:12:13.160 A data type also defines which operations are meaningful. For example, addition and multiplication are valid for integers, but division is not defined for strings.
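In Ruby, the same five characters behave very differently depending on their type. This short check uses only core Integer and String methods; note that none of these operations is meaningful for a zip code.

```ruby
p 90210 + 1          # prints 90211: Integer addition
p "90210" + "1"      # prints "902101": String#+ concatenates
p "90210" * 2        # prints "9021090210": String#* repeats

# String defines no / method, so division raises NoMethodError.
begin
  "90210" / 2
rescue NoMethodError => e
  puts "String has no division (#{e.class})"
end
```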

00:12:34.000 Defining our own data types around such criteria allows us to give meaning to these strings and clarify their purpose.

00:12:56.640 To summarize, when implemented correctly, using clearly-defined data types lets us limit the complexity of our systems.

00:13:16.000 Comparing our current, string-based implementation with our intended design shows the value of explicit boundaries: if only zip codes can cross them, the code inside can safely assume it is working with valid zip codes.

00:13:36.640 Designing a structured approach not only mitigates potential bugs but also improves our ability to develop, enhance, and adapt the application.

00:14:04.960 So, how do we create this structure? Creating a class is an effective way to establish a proper data type.

00:14:18.480 For instance, the initializer for our ZipCode class could accept only five-digit strings, ensuring every value meets the necessary criteria.

00:14:31.840 By simply introducing a validity check, we can raise an exception for invalid strings while allowing the codebase to use valid instances without further validation.
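A minimal sketch of such a class, assuming the raw value arrives as a String: the initializer accepts only five ASCII digits and raises ArgumentError otherwise, so any ZipCode instance that exists is known to be valid. The FORMAT constant, the freezing, and the error message are illustrative choices rather than the talk's exact code.

```ruby
class ZipCode
  FORMAT = /\A[0-9]{5}\z/   # exactly five digits, leading zeros allowed

  def initialize(value)
    unless value.is_a?(String) && FORMAT.match?(value)
      raise ArgumentError, "#{value.inspect} is not a valid zip code"
    end

    @value = value.dup.freeze   # private, immutable copy of the digits
    freeze                      # the ZipCode itself can't be modified either
  end
end

ZipCode.new("90210")            # valid
ZipCode.new("00901")            # valid: leading zeros are kept

begin
  ZipCode.new("Walsh")
rescue ArgumentError => e
  puts e.message                # prints "Walsh" is not a valid zip code
end
```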

00:14:56.640 In this way, we can confidently reference zip codes throughout our system, assuring their integrity without constant checks.

00:15:11.480 In Ruby, though, it’s easy to subvert conventions, so we must ensure our implementation reflects our intentions clearly.

00:15:31.960 To retrieve the raw zip code string when it's needed elsewhere, we can provide an accessor method, while the ZipCode itself remains a well-defined data type.
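One way to expose the raw string, sketched under assumptions: the accessor is named to_s here, equality compares the digits, and third_party_mail is a hypothetical stand-in for the string-only API. The conversion back to a String happens only at that outer boundary.

```ruby
class ZipCode
  FORMAT = /\A[0-9]{5}\z/

  def initialize(value)
    unless value.is_a?(String) && FORMAT.match?(value)
      raise ArgumentError, "#{value.inspect} is not a valid zip code"
    end

    @value = value.dup.freeze
  end

  # Accessor for the raw five-digit string (named to_s in this sketch),
  # for boundaries such as third-party APIs that only accept Strings.
  def to_s
    @value
  end

  # Two ZipCodes are equal when their digits are equal.
  def ==(other)
    other.is_a?(ZipCode) && to_s == other.to_s
  end
  alias_method :eql?, :==

  # Keep hash consistent with eql? so ZipCodes work as Hash keys.
  def hash
    @value.hash
  end
end

# Hypothetical third-party call that only understands strings.
def third_party_mail(street, city, state, zip_code_string)
  [street, city, state, zip_code_string]
end

zip_code = ZipCode.new("90210")
third_party_mail("123 Example Street", "Beverly Hills", "CA", zip_code.to_s)
# returns ["123 Example Street", "Beverly Hills", "CA", "90210"]
```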

00:15:59.960 This keeps our system clean while enforcing our constraints to prevent issues down the road.

00:16:24.360 This type of work is essential during the design process, where we determine the problem we're solving and the code we’ll write to address it.

00:16:45.960 From there, we can clarify the intended boundaries of our classes and ensure they are appropriately defined, making the code easier to read and maintain.

00:17:03.800 One challenge remains: how do we communicate these expectations to other developers and ourselves?

00:17:20.520 Since Ruby doesn't declare the types of variables or method arguments, we rely on conventions: for example, naming variables with suffixes like 'zip_code' to indicate that they hold instances of our ZipCode class.

00:17:42.920 If a method name includes 'zip_code,' it probably returns a ZipCode instance. Furthermore, documenting public methods helps clarify these expectations.
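A sketch of these conventions using YARD-style @param and @return tags, a common Ruby documentation format. The Address class is hypothetical, and the ZipCode here is a stripped-down stand-in so the example runs alone.

```ruby
# Stripped-down stand-in for the ZipCode class.
class ZipCode
  def initialize(value)
    unless value.is_a?(String) && /\A[0-9]{5}\z/.match?(value)
      raise ArgumentError, "invalid zip code: #{value.inspect}"
    end

    @value = value
  end

  def to_s
    @value
  end
end

# A mailing address. By convention, anything named zip_code is a ZipCode.
class Address
  # @return [String]
  attr_reader :street, :city, :state

  # @return [ZipCode] a validated zip code, never a raw String
  attr_reader :zip_code

  # @param street [String]
  # @param city [String]
  # @param state [String]
  # @param zip_code [ZipCode] a validated ZipCode, not a String
  def initialize(street:, city:, state:, zip_code:)
    @street = street
    @city = city
    @state = state
    @zip_code = zip_code
  end
end

address = Address.new(street: "123 Example Street", city: "Beverly Hills",
                      state: "CA", zip_code: ZipCode.new("90210"))
```

The tags only document intent; nothing stops a caller from passing a plain String, which is the question the talk takes up next.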

00:18:05.480 Through consistent naming and documentation, we create an environment that minimizes confusion about data types and their expected behavior.

00:18:21.240 With clear documentation, developers can easily identify the types of values being passed to methods, improving overall collaboration.

00:18:38.480 Should we enforce these boundaries in our design? It’s possible, but it can add complexity.

00:19:05.000 For example, 'is_a?' can check whether an object is an instance of a given class or one of its subclasses, but sprinkling such checks through the code can make it harder to follow.
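If you do decide to enforce the boundary, an is_a? guard is the simplest form. This sketch is one possible approach, not the talk's code: the send_letter method and its return value are hypothetical, and it raises TypeError when handed a plain String instead of a ZipCode.

```ruby
# Stripped-down stand-in for the ZipCode class.
class ZipCode
  def initialize(value)
    unless value.is_a?(String) && /\A[0-9]{5}\z/.match?(value)
      raise ArgumentError, "invalid zip code: #{value.inspect}"
    end

    @value = value
  end

  def to_s
    @value
  end
end

# Hypothetical sender that refuses anything that isn't a ZipCode.
def send_letter(letter_id, street, city, state, zip_code)
  unless zip_code.is_a?(ZipCode)   # true for ZipCode and any subclass
    raise TypeError, "expected a ZipCode, got #{zip_code.class}"
  end

  "letter #{letter_id} sent to #{street}, #{city}, #{state} #{zip_code}"
end

send_letter(12, "123 Example Street", "Beverly Hills", "CA", ZipCode.new("90210"))
# returns "letter 12 sent to 123 Example Street, Beverly Hills, CA 90210"

begin
  send_letter(12, "1675 East Altadena Drive", "Altadena", "CA", "Walsh")
rescue TypeError => e
  puts e.message   # prints expected a ZipCode, got String
end
```

The trade-off the talk describes is visible here: the guard catches the mistake immediately, at the cost of extra code in every method that adopts it.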

00:19:26.480 In some systems, adherents of strict type checking argue that it keeps applications more robust, while others fear it increases friction.

00:19:42.960 Ultimately, weighing the implications of this strategy against the particular needs of your application can guide your decision.

00:20:00.960 Where the risk is high, type checks deserve careful consideration, especially for critical data.

00:20:20.640 This is particularly relevant during significant refactorings, where thorough validations help ensure your system keeps its integrity.

00:20:37.840 To sum up, developers have to strike a balance between the convenience of strings and the type-safety problems they introduce.

00:20:56.320 However, adopting well-defined data types provides clarity and aids in building maintainable systems.

00:21:12.800 Finally, as applications demand cleaner data structures, closing the gap between our design principles and the code we actually write becomes essential.

00:21:31.440 Utilizing examples like zip codes showcases how effective design can enhance overall system quality.

00:22:00.000 By implementing methods to better handle data types, we can ensure a robust system that maintains clarity and integrity.

00:22:25.040 Following these principles allows more efficient interaction with databases, making the code easier to understand and the desired outputs more predictable.

00:22:45.760 Incorporating these tighter definitions fosters confidence in our end products that rely on unique data structures.

00:23:05.320 To conclude, consider adopting strict data types in your code, enabling you to clarify intent and foster ease of understanding.

00:23:25.800 Your code becomes easier to read, write, and maintain, and its rigorous structure leaves less room for bugs.

00:23:43.840 Thank you. Here’s a link to the slides for further reference.

00:23:53.760 Feel free to discuss job opportunities with me if you’re looking for a change.

00:24:02.640 Thank you all for your time.

David Copeland

@davetron5000

RubyConf 2014