00:00:18.240
I'm Dave Copeland, otherwise known as davetron5000 on Twitter. I wrote this book about how to be a good software engineer, and I wrote this other book about how to build awesome command-line applications in Ruby.
00:00:24.599
While these topics aren't directly applicable to our talk, the concepts we'll learn here will help us build better applications and become better programmers.
00:00:30.320
The title of this talk is 'Overcoming Our Obsession with Stringly Typed Ruby.' So, what do I mean by 'stringly typed'?
00:00:35.920
Has anyone heard of this term before? Great. If you've ever read anything by Jeff Atwood, who's also known as Coding Horror on Twitter, then you might be familiar with it.
00:00:41.039
He writes a popular programming blog and has a fantastic post discussing various programmer jargon, including the amusing phrase 'stringly typed.'
00:00:47.480
He describes it as a riff on 'strongly typed,' used to refer to implementations that unnecessarily rely on strings when more programmer-friendly options are available.
00:01:04.680
We use strings all the time; input often comes as strings, and output typically needs to be in string format.
00:01:10.600
Frameworks like Rails love strings, and even databases often use strings. However, relying on them too much can lead to problems.
00:01:16.439
Let me give you a motivating example using zip codes. If you're not from the United States, you might know them as postal codes. Essentially, they are five-digit codes that help the post office sort mail correctly.
00:01:38.399
A zip code isn't just a number; it looks like one, but it always consists of five digits, even if it begins with zeros. For example, zip codes in Puerto Rico start with 009.
00:01:53.520
These codes correspond to specific geographical areas, and while you might notice patterns in the numeric values, like lower numbers on the East Coast compared to the West Coast, zip codes are largely arbitrary strings of digits.
00:02:08.160
Considering this, let's examine a very simple application designed to send letters to people. We have a database of addresses and a third-party API to help us mail notifications, like account overdrawn messages or refund notifications.
00:02:38.920
Our role as programmers is to connect these two systems and create a service that fulfills our unique needs.
00:02:51.840
Now, here’s how the Ruby API provided by the third-party mailing service works. We configure a series of letters we want to send, each identified by a unique letter ID.
00:03:07.680
To send a letter, we find it using its ID, which returns a class containing a mail method. This method takes four parts of an address: street, city, state, and zip code, all as strings.
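Here's a rough sketch of what an API with that shape might look like. The names `Letter`, `Letter.find`, and the returned confirmation string are assumptions for illustration, as is the street address; the real service's classes aren't named here.

```ruby
# Hypothetical stand-in for the third-party mailing API.
# Class and method names are assumptions, not the real service's API.
class Letter
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Look up a configured letter by its unique ID.
  def self.find(id)
    new(id)
  end

  # Mail this letter to an address given as four plain strings.
  # Nothing here checks that zip actually looks like a zip code.
  def mail(street, city, state, zip)
    "Letter #{id} sent to #{street}, #{city}, #{state} #{zip}"
  end
end

# Made-up street address, used only to show the call shape.
puts Letter.find(12).mail("123 Any Street", "Beverly Hills", "CA", "90210")
```

The thing to notice is that all four arguments are just strings, so the API can't tell a zip code from any other text.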
00:03:15.000
Calling this method will mail the letter to the provided address. To build our application, we create a database of addresses and store them as strings, which is typical.
00:03:27.319
Next, we have some code that reads address information from the command line and stores it in the database. For this talk, we can assume our database is a simplified way of managing data, so we won't be overly concerned with the database layer.
00:03:35.239
We can utilize this code to add new addresses to our database. To actually send letters, we will define two classes.
00:03:45.319
The first class wraps the third-party mailing service and makes it easier for our system to handle addresses as hashes.
00:03:58.440
This letter sender class will take the letter ID and the address hash, handling the necessary conversions internally.
00:04:11.400
The second class will retrieve addresses from the database using the address ID and letter ID, sending them to the letter sender class.
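A minimal sketch of these two classes might look like the following. The names `LetterSender`, `AddressMailer`, and `ADDRESSES`, the stubbed `Letter` class, and the in-memory hash standing in for the database are all assumptions made for illustration.

```ruby
# Minimal stand-in for the third-party API; names are hypothetical.
class Letter
  def self.find(id)
    new(id)
  end

  def initialize(id)
    @id = id
  end

  def mail(street, city, state, zip)
    "Letter #{@id} sent to #{street}, #{city}, #{state} #{zip}"
  end
end

# In-memory stand-in for the address database: address ID => hash of strings.
ADDRESSES = {
  1 => { street: "123 Any Street", city: "Beverly Hills", state: "CA", zip: "90210" }
}

# Wraps the mailing service so the rest of our code can pass an address hash.
class LetterSender
  def send_letter(letter_id, address)
    Letter.find(letter_id).mail(
      address[:street], address[:city], address[:state], address[:zip]
    )
  end
end

# Looks up the address by ID and hands it to the LetterSender.
class AddressMailer
  def initialize(addresses, letter_sender = LetterSender.new)
    @addresses = addresses
    @letter_sender = letter_sender
  end

  def mail_letter(address_id, letter_id)
    address = @addresses.fetch(address_id)
    @letter_sender.send_letter(letter_id, address)
  end
end

puts AddressMailer.new(ADDRESSES).mail_letter(1, 12)
```

Every value that flows through these classes is a string inside a hash, so neither class has any idea whether the `:zip` entry is really a zip code.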
00:04:18.320
This setup is simplified, but it reflects the core functions we typically implement: integrating with other systems, fetching data from a database, and managing strings.
00:04:24.240
Let's see it in action. We store the address '45 South FS Avenue, Beverly Hills, California' with the well-known zip code '90210.' This is actually the address of the Peach Pit from the show 90210.
00:04:50.759
Now, let's send a letter to the Peach Pit. We send letter number 12, and everything looks good. No problems, everything seems fine.
00:05:00.680
Now, let's send another letter. The next address is '1675 East Altadena Drive, Altadena, California,' which is the house used on the show to represent where Brenda and Brandon Walsh live.
00:05:16.160
However, this time, instead of inputting the correct zip code, we mistakenly entered 'Walsh.' We proceed to successfully insert this incorrect address into the database.
00:05:30.400
Let's send a letter to the Walshes. The system responds that the letter has been sent, but little do we know that the address is incorrect.
00:05:41.000
Here’s where the problem lies: 'Walsh' is not a zip code, yet our system accepted it and sent the letter through without any checks during this step.
00:06:01.360
What could go wrong? The best-case scenario is that our third-party mailing service returns an error, rejecting the invalid zip code. However, in most cases, the letter will be sent to the post office, which may cause one of several issues: it could get returned, discarded, or sent to an incorrect address.
00:06:25.639
If the letter is essential, this mishap could prevent it from reaching its intended recipient, highlighting a significant bug in our setup.
00:06:46.500
You could argue that the problem originates during input collection. We’ve allowed a string to represent what should be a zip code, but not all strings qualify.
00:07:03.960
So, we might restrict inputs to only those that resemble valid zip codes. However, this doesn't resolve the issue at its core: someone might directly manipulate the database.
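As a sketch, an input check like that could be a single regular expression. The helper name `looks_like_zip_code?` is hypothetical, and it assumes the simple five-digit format discussed in this talk.

```ruby
# Hypothetical helper: true only for strings made of exactly five digits.
# \A and \z anchor the match to the whole string, not just one line of it.
def looks_like_zip_code?(input)
  input.is_a?(String) && input.match?(/\A\d{5}\z/)
end

puts looks_like_zip_code?("90210") # => true
puts looks_like_zip_code?("00901") # => true (leading zeros are kept)
puts looks_like_zip_code?("Walsh") # => false
```

This only guards the one place where it is called, which is exactly the limitation described here.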
00:07:20.880
In reality, data imports could be handled by non-programmers, and they could inadvertently introduce faulty zip codes.
00:07:54.760
This points to a potential design flaw in our database. We could leverage database features like check constraints to ensure that only valid zip codes can be added.
00:08:10.120
While this would stop improper data from entering the database, it would not stop someone from calling our classes directly with an invalid zip code, particularly if they use those classes by hand in an emergency.
00:08:28.960
Thus, we may need to add zip code checks everywhere in our code, leading to a maintenance nightmare where any time a zip code is involved, we must remember additional validations.
00:08:54.560
In any reasonable system, constantly checking for valid zip codes is unmanageable. So, what we need is an effective way to enforce better data integrity, ensuring we only operate on valid zip codes.
00:09:21.840
If we conceptualize our system's boundary, we notice it lacks clarity. We want to restrict what types are used across our application, but we’re currently permitting anything as strings.
00:09:46.720
The goal is to allow only zip codes. We could make our methods clearer by stating the specific types they expect as arguments, eliminating the ambiguity.
00:10:09.320
A well-defined method signature would clarify which arguments are expected, improving ease of maintenance and understanding across our applications.
00:10:29.160
In Ruby, we often opt for convenience, using hashes and untyped strings. This saves time during the exploration phase of design, allowing programmers to prototype rapidly.
00:10:49.680
While this can be advantageous, the end result is often a system that’s complicated and difficult to comprehend because the expected structures are not clearly defined.
00:11:09.840
Therefore, after determining the boundaries of our classes, we need to use data types explicitly to clarify what data must come in and out, instilling greater confidence in our system.
00:11:32.680
When we define a data type, we are establishing its values and how it should behave, which helps us model our systems more effectively.
00:11:51.000
We can illustrate these types' values and behaviors with practical examples. For instance, the integers are an infinite set of values, whereas a zip code is always exactly five digits, so there are only finitely many of them.
00:12:13.160
Thus, we can meaningfully define operations permitted on a data type. For example, multiplication and addition are valid for integers, but division is not applicable to strings.
00:12:34.000
Defining our data type constructs around such criteria allows us to assign meaning to the strings and clarify their purpose.
00:12:56.640
To summarize, when implemented correctly, using clearly-defined data types lets us limit the complexity of our systems.
00:13:16.000
We can represent our existing implementation of strings alongside our intended design, reinforcing the value of explicit boundaries. If only zip codes are allowed, we can safely handle systems that use this structure.
00:13:36.640
Designing a structured approach not only mitigates potential bugs but also improves our ability to develop, enhance, and adapt the application.
00:14:04.960
So, how do we create this structure? Creating a class is an effective way to establish a proper data type.
00:14:18.480
For instance, the initializer for our zip code class could accept five-digit strings, ensuring they meet the necessary criteria.
00:14:31.840
By simply introducing a validity check, we can raise an exception for invalid strings while allowing the codebase to use valid instances without further validation.
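Here's a minimal sketch of such a class, assuming the five-digit rule from earlier. Raising `ArgumentError` and the constant name `FIVE_DIGITS` are choices made for this example, not something the talk prescribes.

```ruby
# A sketch of a zip code data type: construction fails unless the
# value is a string of exactly five digits.
class ZipCode
  FIVE_DIGITS = /\A\d{5}\z/

  def initialize(string)
    unless string.is_a?(String) && string.match?(FIVE_DIGITS)
      # ArgumentError is an assumption; any exception class would do.
      raise ArgumentError, "#{string.inspect} is not a valid zip code"
    end
    # Keep a frozen copy so later changes to the caller's string
    # cannot make this instance invalid.
    @string = string.dup.freeze
  end
end

ZipCode.new("90210") # fine

begin
  ZipCode.new("Walsh")
rescue ArgumentError => e
  puts e.message
end
```

Once an instance exists, code that receives it can trust that it holds a valid zip code and skip its own checks.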
00:14:56.640
In this way, we can confidently reference zip codes throughout our system, assuring their integrity without constant checks.
00:15:11.480
In Ruby, though, it’s easy to subvert conventions, so we must ensure our implementation reflects our intentions clearly.
00:15:31.960
When some other part of the system needs the raw zip code string, for example to pass it to the third-party API, we can provide an accessor method that returns it, while the rest of our code keeps working with the well-defined data type.
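One way to sketch that accessor is a `to_s` method. Using `to_s` rather than some other method name is an assumption for this example; it has the side benefit that string interpolation calls it automatically.

```ruby
class ZipCode
  def initialize(string)
    unless string.is_a?(String) && string.match?(/\A\d{5}\z/)
      raise ArgumentError, "#{string.inspect} is not a valid zip code"
    end
    @string = string.dup.freeze
  end

  # Accessor for the raw string, for places that genuinely need one,
  # such as a third-party API that only accepts strings.
  def to_s
    @string
  end
end

zip_code = ZipCode.new("90210")
puts "Beverly Hills, CA #{zip_code}" # prints "Beverly Hills, CA 90210"
```

The conversion back to a string happens only at the edge of our system; everywhere else the value stays a `ZipCode`.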
00:15:59.960
This keeps our system clean while enforcing our constraints to prevent issues down the road.
00:16:24.360
This type of work is essential during the design process, where we determine the problem we're solving and the code we’ll write to address it.
00:16:45.960
From there, we can clarify the intended boundaries of our classes and ensure they are appropriately defined, making the code easier to read and maintain.
00:17:03.800
One challenge remains: how do we communicate these expectations to other developers and ourselves?
00:17:20.520
Since Ruby’s dynamic nature doesn’t inherently define types, we rely on conventions: for example, naming variables using suffixes like 'zip_code' to indicate they are instances of our ZipCode class.
00:17:42.920
If a method name includes 'zip_code,' it probably returns a ZipCode instance. Furthermore, documenting public methods helps clarify these expectations.
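As a sketch of these conventions, here is a documented method whose argument carries a `zip_code` attribute. The `Address` struct, the `send_letter` method, and the comment layout are all assumptions for illustration.

```ruby
class ZipCode
  def initialize(string)
    unless string.is_a?(String) && string.match?(/\A\d{5}\z/)
      raise ArgumentError, "#{string.inspect} is not a valid zip code"
    end
    @string = string.dup.freeze
  end

  def to_s
    @string
  end
end

# The attribute is named zip_code, signalling that it holds a ZipCode
# instance rather than a bare String.
Address = Struct.new(:street, :city, :state, :zip_code)

# Sends a configured letter to an address.
#
# letter_id - Integer ID of the letter to send.
# address   - Address whose zip_code is a ZipCode (not a String).
#
# Returns a String confirmation message.
def send_letter(letter_id, address)
  "Letter #{letter_id} sent to #{address.street}, #{address.city}, " \
    "#{address.state} #{address.zip_code}"
end

address = Address.new("123 Any Street", "Beverly Hills", "CA", ZipCode.new("90210"))
puts send_letter(12, address)
```

Nothing in Ruby enforces the comment, but a reader can now see at a glance what the method expects.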
00:18:05.480
Through proper consistency and documentation, we create an environment that minimizes confusion around data types and their expected behavior.
00:18:21.240
With clear documentation, developers can easily identify the types of values being passed to methods, improving overall collaboration.
00:18:38.480
Should we enforce these boundaries in our design? It’s possible, but it can add complexity.
00:19:05.000
For example, 'is_a?' can check whether an object is an instance of a given class or one of its subclasses, but it could complicate the understanding of the return types.
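A sketch of that kind of enforcement follows. The `LetterSender#send_letter` signature and the choice of `TypeError` are assumptions for this example.

```ruby
class ZipCode
  def initialize(string)
    unless string.is_a?(String) && string.match?(/\A\d{5}\z/)
      raise ArgumentError, "#{string.inspect} is not a valid zip code"
    end
    @string = string.dup.freeze
  end

  def to_s
    @string
  end
end

class LetterSender
  # Hypothetical signature: the last argument must be a ZipCode.
  def send_letter(letter_id, street, city, state, zip_code)
    # is_a? is true for ZipCode instances and instances of its subclasses.
    unless zip_code.is_a?(ZipCode)
      raise TypeError, "expected a ZipCode, got #{zip_code.class}"
    end
    "Letter #{letter_id} sent to #{street}, #{city}, #{state} #{zip_code}"
  end
end

sender = LetterSender.new
puts sender.send_letter(12, "123 Any Street", "Beverly Hills", "CA", ZipCode.new("90210"))

begin
  sender.send_letter(12, "123 Any Street", "Beverly Hills", "CA", "90210")
rescue TypeError => e
  puts e.message # prints "expected a ZipCode, got String"
end
```

The check catches mistakes early, at the cost of extra code in every method that wants this guarantee.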
00:19:26.480
In some systems, adherents of strict type checking argue that it keeps applications more robust, while others fear it increases friction.
00:19:42.960
Ultimately, weighing the implications of this strategy against the particular needs of your application can guide your decision.
00:20:00.960
If there's risk involved, it's worth carefully considering type checks, especially when handling critical data.
00:20:20.640
This is particularly relevant during significant refactorings, as thorough validations will ensure your system maintains integrity.
00:20:37.840
To sum up, developers often have to balance the ease of use that strings offer against the type-safety problems they introduce.
00:20:56.320
However, adopting fundamental data types provides clarity and aids in building maintainable systems.
00:21:12.800
Finally, as we see applications requiring cleaner data structures, bridging this gap between desired design principles and implemented code is essential.
00:21:31.440
Utilizing examples like zip codes showcases how effective design can enhance overall system quality.
00:22:00.000
By implementing methods to better handle data types, we can ensure a robust system that maintains clarity and integrity.
00:22:25.040
Following these principles allows more efficient interaction with databases, making the code easier to understand and the desired outputs more predictable.
00:22:45.760
Incorporating these tighter definitions fosters confidence in our end products that rely on unique data structures.
00:23:05.320
To conclude, consider adopting strict data types in your code, enabling you to clarify intent and foster ease of understanding.
00:23:25.800
Your code benefits from being easier to read, write, and maintain, and invites fewer bugs through rigorous structure.
00:23:43.840
Thank you. Here’s a link to the slides for further reference.
00:23:53.760
Feel free to discuss job opportunities with me if you’re looking for a change.
00:24:02.640
Thank you all for your time.