A Survey of Surprisingly Difficult Things

Summarized using AI

Alex Boster • April 25, 2017 • Phoenix, AZ

In his presentation at RailsConf 2017, Alex Boster explores the intricate complexities surrounding seemingly simple real-world concepts that developers frequently encounter. He emphasizes that commonplace tasks like handling time, currency, and human names can reveal unexpected difficulties if not managed carefully. Boster outlines the following key points throughout his talk:

  • Time Management: Boster explains that while timestamps may seem straightforward, time zones complicate matters significantly. He highlights issues such as varying daylight saving time rules and the sheer number of time zones that exist globally. Boster advises developers to store time values in UTC and use Rails’ time-zone-aware methods to avoid confusion.

  • Date Handling: The speaker contrasts dates with times, stressing that values where the time of day is irrelevant, such as birthdays and holidays, should be stored as dates rather than timestamps. Mixing the two up leads to errors in applications.

  • Human Names: Boster discusses the fallacies developers often have regarding names, such as the assumption that names are fixed and unchanging, or that all names can be represented using ASCII characters. He advises against over-validating names, advocating instead for a more flexible approach to accommodate diverse naming conventions.

  • Physical Addresses: Boster then turns to addresses, noting the significant variation in address formats even within the United States. He cautions against overly rigid validation and suggests using services like those provided by the US Postal Service to standardize data where possible.

  • Financial Data: Currency management is another crucial topic. Boster recommends storing and calculating monetary amounts as integers (for example, in cents) rather than floats, avoiding rounding errors and inaccuracies, particularly when interfacing with JavaScript, whose numbers are floating point.

  • Email Validation: He also touches on the pitfalls of over-validating email addresses and the importance of accepting diverse formats that may not align with strict validation rules.

  • Internationalization: Though he does not delve deeply into this topic, Boster encourages developers to think about internationalization from the outset by storing hard-coded strings in locale files, which can ease future translations.

  • Recurrence in Events: Lastly, Boster notes the complexities of modeling recurring events in software applications and the potential for error when handling such features.

In conclusion, Boster's main takeaways revolve around awareness of the global and cultural diversity inherent in software design. He urges developers to avoid assumptions based on their own experience, to adopt inclusive coding practices, and to stay alert to these deceptively complex areas.

RailsConf 2017: A Survey of Surprisingly Difficult Things by Alex Boster

Many seemingly simple "real-world" things end up being much more complicated than anticipated, especially if it's a developer's first time dealing with that particular thing. Classic examples include money and currency, time, addresses, human names, and so on. We will survey a number of these common areas and the state of best practices, or lack thereof, for handling them in Rails.

RailsConf 2017

00:00:12.679 Alright, let's get started! I can't see very well out there, so please shout if you feel the need to ask questions or interrupt.
00:00:18.300 How's the conference for everybody so far? Good? Alright, excellent! This thing's not going to work apparently, so I'm Alex Boster. I work for AppFolio, which is a company in Santa Barbara, and I work in their San Diego engineering office.
00:00:31.080 We're hiring, so come talk to me! I'd say my last project inspired this talk, but really it's also the culmination of the kind of experience you get after a few years of doing web development.
00:00:43.649 So, this is a survey of surprisingly difficult things. Now, what do I mean by 'things'? I mean commonplace issues that you encounter in your day-to-day life.
00:00:57.809 They seem easy to model at first, and they seem great, but then it turns out that they can actually be a lot harder than anticipated. Things where the obvious implementation may very well cause problems.
00:01:11.670 These are things like timestamps, time zones, physical addresses, and human names. Now, when I mention these terms, what do you think? Does this sound easy? These are solved problems, right? Easy, no problem, no complications.
00:01:22.770 What I'm not going to talk about today is cache invalidation, distributed systems, or other famously hard computer science problems; instead, I'll focus on real-world things. One of the main reasons I wanted to give this talk is that developers fall into these traps all the time.
00:01:51.570 We spend months cleaning up old buggy code, only for new bugs of the same kind to be introduced six months later by other developers. So with a laundry list of things to watch out for, hopefully, you won't fall into that trap.
00:02:05.969 Even very senior developers can benefit from an occasional reminder. If you're not a senior developer and haven't dealt with these issues much before, I hope this talk saves you some time in the future.
00:02:18.540 Another good point is that if you follow these best practices, you will likely be more inclusive. I know that when I start dealing with real-world challenges, it might drive me to drink; it could do the same for you.
00:02:33.880 So, let's start with time. There are a bunch of time and date classes available to you; I'm listing them mainly for later reference. There's no good cross-system standard for dealing with time.
00:02:49.750 It's different in every database, and it differs between Ruby and those databases. So pay a little attention to that, and check out 'ActiveSupport::Duration'. But what actually makes time hard to deal with is time zones.
00:03:06.190 Isn't this a solved problem? You can use a sufficiently large integer to represent seconds or fractions of a second, and now you have a time value, right? No problems? Let's all go home! Well, the problem is with time zones.
00:03:41.640 How many time zones do you think exist? A few dozen? You might be right at a superficial level, but the time zone database defines 385 time zones, plus another 176 links, which are aliases that map alternate names onto those zones.
00:04:47.800 Remember that there are half-hour time zones, quarter-hour time zones, and daylight saving time to account for. Places that previously didn't observe daylight saving time may start; Arizona, for example, currently does not observe it. Regions may also change their schedule, as the entire United States did ten years ago, in 2007.
00:05:13.169 There have been temporary daylight saving time changes, like double summer time in the UK, and a place may simply switch time zones entirely. The time zone database I mentioned is used by many Unix-like systems and tracks all geographic time zones since 1970.
00:05:40.599 It defines a zone as a region where clocks have agreed since 1970. Before 1970 the database is less thorough, but it does include some historical data, including past daylight saving time changes.
00:06:28.610 This database is updated several times a year. If you think this stuff is static, it's not. There were changes just recently, such as Mongolia abolishing daylight saving time.
00:07:09.790 These changes are quite illustrative. Ecuador's entry, for example, was updated to record daylight saving time being observed on particular dates, and other areas had adjustments too. So just remember that this stuff is complicated, and thank goodness someone’s keeping track of it.
00:07:23.800 Interesting trivia: how many time zones are in the continental United States? I count at least six! We use UTC to standardize events regardless of what you actually call that time in a particular place.
00:07:46.210 Coordinated Universal Time (UTC) is not itself a time zone, but every time zone has an offset from UTC. As a rule, you should store your time values in UTC. What are the possible offsets from UTC? Keep in mind that in 1995, Kiribati moved some of its islands from -10 to +14, so offsets now run as far as +14 hours.
00:08:46.430 Without a specified time zone, a time value lacks context: it could fall anywhere within a 26-hour range (from UTC-12 to UTC+14), including half- and quarter-hour offsets. How do you manage this? If you don't explicitly provide a time zone, the time may be interpreted using the operating system’s default, the database’s default, or the application’s default time zone.
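To make the 26-hour spread concrete, here is a small plain-Ruby sketch (standard library only, not Rails) that reads the same wall-clock time, noon on April 25, 2017, at the two extreme offsets, +14:00 and -12:00. The variable names are illustrative.

```ruby
# The same wall-clock reading interpreted at the two extreme UTC offsets.
at_plus_14  = Time.new(2017, 4, 25, 12, 0, 0, "+14:00")
at_minus_12 = Time.new(2017, 4, 25, 12, 0, 0, "-12:00")

# Subtracting two Time objects gives the difference in seconds (a Float).
spread_hours = (at_minus_12 - at_plus_14) / 3600
puts spread_hours # => 26.0

# Offsets that aren't whole hours are valid too, e.g. +05:45 (Nepal).
nepal = Time.new(2017, 4, 25, 12, 0, 0, "+05:45")
puts nepal.utc_offset # => 20700 (seconds)
```

A bare "2017-04-25 12:00" with no zone attached could therefore mean any instant in that 26-hour window.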
00:09:20.000 Before I go on: keep your system and database time in UTC. Rails stores its datetimes in UTC, and Rails' time-zone-aware methods use the application’s default time zone unless you provide one explicitly.
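A minimal standard-library sketch of "normalize to UTC before storing". Rails already does this for Active Record datetime columns, so this only shows what the conversion does; the Phoenix offset (-07:00) is an example value.

```ruby
require "time" # for Time#iso8601

# A timestamp as it might arrive from a user in Phoenix (UTC-07:00).
submitted = Time.new(2017, 4, 25, 9, 30, 0, "-07:00")

# Normalize to UTC before persisting; getutc returns a new Time object
# and leaves 'submitted' unchanged.
stored = submitted.getutc

puts stored.iso8601      # => 2017-04-25T16:30:00Z
puts stored == submitted # => true (same instant, different representation)
```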
00:10:08.510 For example, if you have users, make sure to store their time zone in the user model and always use that in your views if you care about when events occur for those users. Always prefer the time zone methods available in ActiveSupport.
00:10:41.080 For time-related tasks, it's important to use the Rails time classes rather than Ruby's plain 'Time'. That way, the conversion is handled for you, and you get consistent output. Methods you should be using include 'hours.from_now' and 'days.ago' (as in '3.hours.from_now'), and so forth. Always remember to use 'in_time_zone' to avoid confusion.
00:11:39.370 Dates are a bit simpler since they don't carry a time zone, but you need to be wary of whether you should be storing something as a date or a timestamp. Ask yourself if it matters what time of day it is. This may seem basic, but people make this mistake regularly.
00:12:07.470 For example, a birthday is a date: it falls on a particular day regardless of what time the person was born. All-day calendar events, like holidays, are similar; they're defined solely by the date.
00:12:48.030 So, don't store dates as date-times if you want to avoid complications! Be wary of converting back and forth unless you really need to, such as when switching an all-day event to a timed one.
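The classic failure from storing a date as a datetime can be reproduced with the standard library. This sketch assumes a birthday saved as midnight UTC and later displayed at a -07:00 offset; both are example choices.

```ruby
require "date" # for Date and Time#to_date

# Anti-pattern: a birthday saved as a timestamp at midnight UTC.
birthday_as_time = Time.utc(1990, 6, 15)

# Displayed for a user at -07:00, it lands on the previous day.
shown = birthday_as_time.getlocal("-07:00").to_date
puts shown # => 1990-06-14

# A plain Date has no time of day, so there is nothing to shift.
birthday = Date.new(1990, 6, 15)
puts birthday # => 1990-06-15
```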
00:13:12.620 Be sure to use 'Date.current' to get the current date, because 'Time.now' is based on the server's system time zone rather than your application's, which can put you on the wrong day.
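Why "today" needs a time zone can be sketched in plain Ruby. The helper 'date_at_offset' is hypothetical and uses fixed offsets; in a Rails app you would rely on 'Date.current', which respects 'Time.zone'.

```ruby
require "date"

# Hypothetical helper: the calendar date of an instant at a fixed UTC offset.
def date_at_offset(instant, offset)
  instant.getlocal(offset).to_date
end

# 03:00 UTC on April 26, 2017...
instant = Time.utc(2017, 4, 26, 3, 0, 0)

puts date_at_offset(instant, "+00:00") # => 2017-04-26 (a server on UTC)
puts date_at_offset(instant, "-07:00") # => 2017-04-25 (a user in Phoenix)
```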
00:13:49.370 When it comes to human names, people's naming situations are often complex. The well-known falsehoods programmers believe about names include assuming that everyone has exactly one full name, or that names are assigned at birth.
00:14:48.450 In reality, people change their names, use characters outside ASCII, and so forth. So I encourage you to validate names as little as possible and, whenever feasible, to avoid a first-name/last-name format!
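A sketch of what "validate as little as possible" might look like in Ruby. The helper and the 255-character cap are assumptions for illustration (a typical column limit), not a recommendation from the talk.

```ruby
# Hypothetical cap on stored length (assumed column limit, not from the talk).
MAX_NAME_LENGTH = 255

# Minimal check: something non-blank, within the length cap. There is no
# character whitelist and no first/last-name split.
def acceptable_name?(name)
  !name.strip.empty? && name.length <= MAX_NAME_LENGTH
end

# Single names, non-ASCII scripts, accents, apostrophes, and hyphens all pass.
["Teller", "José Núñez", "山田太郎", "Seán O'Brien", "Jean-Luc Picard"].each do |n|
  puts "#{n}: #{acceptable_name?(n)}"
end

puts acceptable_name?("   ") # => false
```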
00:15:40.800 Moving on to addresses, even in the United States, there is more variation in modeling those than you might expect. There are rural routes, military addresses, and areas such as Puerto Rico that have their own unique structures.
00:16:29.680 You might encounter addresses with slashes, dashes, or other special characters, and some cities have names with apostrophes. When validating addresses, don't be too rigid because physical addresses may not always align with postal addresses.
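The sketch below contrasts a tempting street-line regex with a lenient check. Both helpers and the sample addresses (a fractional house number, a Queens-style hyphenated number, and a military PSC line) are illustrative assumptions.

```ruby
# Hypothetical, overly strict pattern: a house number, then letters and spaces.
STRICT_STREET = /\A\d+ [A-Za-z ]+\z/

# Hypothetical lenient check: the line just can't be blank.
def plausible_street_line?(line)
  !line.strip.empty?
end

examples = ["123 Main St", "123 1/2 Elm St", "42-15 Bell Blvd", "PSC 1234, Box 5678"]
examples.each do |line|
  strict = STRICT_STREET.match?(line)
  puts format("%-20s strict=%-5s lenient=%s", line, strict, plausible_street_line?(line))
end
```

Only the first line survives the strict pattern, even though all four are real-world formats.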
00:17:26.500 For instance, in some wealthier areas mail may be delivered only to the post office rather than to residential addresses, so the postal address differs from the physical one, which complicates things for shipping companies.
00:18:11.720 Regarding money, don’t store values as floats. Use integers (for example, a count of cents) wherever possible to avoid rounding errors, and convert only when you need to display them.
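A plain-Ruby sketch of the integers-for-money advice. The 'format_usd' helper is hypothetical and assumes US dollars; the same float pitfall applies in JavaScript, whose numbers are IEEE 754 doubles.

```ruby
# Binary floats can't represent most decimal fractions exactly.
puts 0.1 + 0.2 == 0.3 # => false

# Integer cents add up exactly.
line_items_cents = [1999, 2599, 10]
total_cents = line_items_cents.sum
puts total_cents # => 4608

# Hypothetical display helper: convert to a dollar string only at the edge.
def format_usd(cents)
  sign = cents.negative? ? "-" : ""
  dollars, remainder = cents.abs.divmod(100)
  format("%s$%d.%02d", sign, dollars, remainder)
end

puts format_usd(total_cents) # => $46.08
puts format_usd(-5)          # => -$0.05
```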
00:19:05.730 Email addresses are tricky. Many systems try to validate against the standards, but the generally accepted practical advice is to treat any address with an @ symbol as valid. Also, Gmail users can add a '+' and a suffix to their username, which gives them endless variants of their address, handy when testing.
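A permissive check along those lines, written as a hypothetical Ruby helper. It only rejects values with no @ or with nothing on one side of it, and it leaves plus-suffixed addresses intact.

```ruby
# Hypothetical permissive check: after trimming whitespace, require at
# least one character, an "@", and at least one more character.
def plausible_email?(address)
  address.strip.match?(/\A.+@.+\z/)
end

puts plausible_email?("alex+railsconf@example.com") # => true (plus suffix kept)
puts plausible_email?("o'brien@example.co.uk")      # => true
puts plausible_email?("not-an-email")               # => false
```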
00:19:49.700 Internationalization is a massive topic, but if you're starting a new application, consider putting hard-coded strings in your 'config/locales' files from the beginning. That makes copy changes easier to manage and lets non-developers make edits directly.
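To show the shape of the idea without pulling in Rails, here is a toy lookup over an inline YAML string standing in for 'config/locales/en.yml'. The 'translate' helper and the keys are hypothetical; in a Rails app you would call 'I18n.t' (or 't' in views) instead.

```ruby
require "yaml"

# Stand-in for config/locales/en.yml (contents are illustrative).
EN_YML = <<~YAML
  en:
    dashboard:
      welcome: "Welcome back!"
      empty_state: "You have no items yet."
YAML

LOCALES = YAML.safe_load(EN_YML)

# Toy lookup by dotted key, loosely mimicking I18n.t. It is not the real
# I18n API: no interpolation, pluralization, or fallbacks. Missing keys
# return nil.
def translate(key, locale: "en")
  LOCALES.fetch(locale).dig(*key.split("."))
end

puts translate("dashboard.welcome") # => Welcome back!
```

Because copy lives in the YAML file, changing the wording doesn't require touching views or controllers.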
00:20:28.420 When handling payments and credit cards, make sure you're PCI compliant. Don’t store credit card information yourself; use services that let you pass tokens instead, and make sure to account for every potential timeout in your system.
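The sketch below illustrates two parts of that advice using only Ruby's standard library: the request carries a processor-issued token rather than card data, and the HTTP client sets explicit timeouts. The endpoint, the JSON field names, and the timeout values are all hypothetical; a real processor's SDK and API documentation take precedence.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical payment processor endpoint; not a real service.
CHARGE_URI = URI("https://payments.example.com/v1/charges")

# Build (but don't send) a charge request. The card number never touches
# our servers; we only forward the token and an integer amount in cents.
def build_charge_request(card_token:, amount_cents:)
  request = Net::HTTP::Post.new(CHARGE_URI)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ token: card_token, amount_cents: amount_cents })
  request
end

# Explicit timeouts so a slow processor can't hang a request indefinitely.
def build_http_client
  http = Net::HTTP.new(CHARGE_URI.host, CHARGE_URI.port)
  http.use_ssl = true
  http.open_timeout = 5  # seconds to establish the connection
  http.read_timeout = 15 # seconds to wait for each read
  http
end

req = build_charge_request(card_token: "tok_hypothetical_123", amount_cents: 4608)
puts req.body
```

Timeouts alone don't settle whether a timed-out charge went through; that still has to be handled explicitly.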
00:21:11.900 Recurring calendar events need to be treated thoughtfully: many recurring events have no end date. The recurrence rules can be complex, and you also need to handle changes to individual occurrences.
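One of the subtleties can be shown with Ruby's 'Date' class: a monthly series anchored on the 31st. The 'monthly_occurrences' helper is a hypothetical sketch; it clamps to the end of shorter months via 'Date#>>', whereas iCalendar (RFC 5545) rules skip dates that don't exist, so which behavior is right is a product decision.

```ruby
require "date"

# Hypothetical helper: an open-ended "monthly on the same day" series.
# Each occurrence is computed from the original start (start >> n), not
# from the previous occurrence, so one short month doesn't pull every
# later date earlier. Date#>> clamps to the last day of a shorter month.
def monthly_occurrences(start, until_date: nil)
  Enumerator.new do |yielder|
    n = 0
    loop do
      date = start >> n
      break if until_date && date > until_date
      yielder << date
      n += 1
    end
  end
end

start = Date.new(2017, 1, 31)
puts monthly_occurrences(start).first(3).map(&:to_s).inspect
# => ["2017-01-31", "2017-02-28", "2017-03-31"]

# Naively stepping from the previous occurrence drifts after February.
drifted = [start]
2.times { drifted << (drifted.last >> 1) }
puts drifted.map(&:to_s).inspect
# => ["2017-01-31", "2017-02-28", "2017-03-28"]
```

Because the enumerator is lazy, a series with no end date is fine as long as callers take only the occurrences they need.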
00:22:09.860 In conclusion, remember not to over-validate. There's often this impulse to check values excessively, but keep in mind that every user base may have different expectations and contexts.
00:23:05.830 Be culturally aware: your experience isn't universal, and not everyone fits into rigid categories. Finally, thank you for your attention, and here are some references for further reading.