Data Integrity

Summarized using AI

Rails Sustainable Productivity

Xavier Shay • February 02, 2012 • Burbank, CA

In the presentation "Rails Sustainable Productivity" by Xavier Shay at LA RubyConf 2012, the speaker discusses the need for Rails applications to optimize for sustainable productivity, especially as they evolve from prototypes into larger, stable applications. He critiques the Rails framework for its shortcomings and suggests improvements for better productivity within Rails development.

Key points discussed include:

- Sustainable Productivity: Shay emphasizes that Rails' self-description claims sustainable productivity, but he argues it is not fully realized in practice.

- Prototypes vs. Stable Applications: There is a distinction between quickly created prototypes and stable applications with longer lifespans that require different considerations.

- Easy vs. Simple: He differentiates between "easy" (what's readily available and known) and "simple" (components that perform a single function well) in the context of Rails, asserting that Rails is easy but not simple, complicating developers' tasks.

- Data Integrity: Shay critically evaluates how Rails handles data integrity, illustrating problems with concurrency and the lack of built-in support for essential database constraints, which can lead to corruption and redundancy in data.

Examples include:

- The issue of duplicate usernames when multiple requests are processed simultaneously, demonstrating that a uniqueness validation alone does not guarantee uniqueness in practice.

- Problems with associations and foreign keys resulting in data integrity issues, affecting long-term project sustainability.

- Coding Practices: The talk shifts to coding practices, where Shay discusses separating concerns in code to increase clarity and reduce complexity, advocating for better practices in representing data and clearer documentation.

- Error Management: He mentions the impact of high error rates on productivity and suggests that code reviews, pair programming, and automated tools (like linting) are effective ways to improve code quality, which ultimately enhances team efficiency.

- Dependency Management: Shay asserts that managing dependencies both internally and externally is crucial and illustrates how abstracting complex dependencies can lead to simpler solutions and enhancements in code clarity.

- Testing Practices: He notes the Rails community's struggles with test-driven development, advocating for clearer definitions and discussions surrounding different types of tests to align with best practices in other programming communities.

In conclusion, Shay calls on developers to strive towards creating Rails systems that remain enjoyable to work on over time, providing suggestions aimed at improving productivity within the Rails framework. The main takeaways emphasize the importance of simplicity, data integrity, error management, documentation consistency, and effective testing practices for achieving sustainable productivity in Rails applications.

I've been writing Rails for near on five years, and there are some things that really grind my goat. Rails is great, but there are so many things we get wrong, both as a framework and a community. In particular, for applications that have grown beyond an initial prototype (if you earn a salary writing Rails, this is probably you), many Rails Best Practices are actively harmful to creating solid, robust, and enjoyable applications. I'll talk about testing, data modelling, code organisation, build systems, and more, drawing from a large pool of things I have seen done wrong and also personally failed at over the last half decade. Of course I'll be providing suggestions for fixing things, also.

It's more of a freight train, see.

LA RubyConf 2012

00:00:23.279 So I actually changed the name of my talk, so if you were coming and expecting a talk about Rails, sorry. I was going to do that and it was going to be really cathartic; I was going to feel great. But as I was preparing the talk, I decided that I actually wanted to be a little bit more constructive. I didn't just want to rant about Rails; I wanted to provide some suggestions for how Rails could be better.
00:00:43.320 I have all these suggestions and ideas, and as I was trying to sort of pull apart this tangled ball of ideas, I came across this common theme. This was the idea of sustainable productivity, and this actually fits really well into some of the talks we've just seen from Steven and Mike. I'm hoping to build on what they started and go into some of my ideas on it.
00:01:07.960 This idea of sustainable productivity, or these words, I actually stole directly out of the Rails self-description. If you go to the Rails homepage, this is the description that you get. The reason I picked on these two words is because I don't actually think this is something that Rails can claim yet. I don't believe that Rails is optimized for sustainable productivity, and that's sort of the hypothesis I will be presenting in this talk.
00:01:38.520 I need to make a distinction between two types of applications: prototypes, which are things we need to get out the door very quickly, and what I call hopefully stable applications. These are the larger applications, the ones with a lifespan of one, three, or five years, ones that have many developers working on them. This is the type of application I'm concerned about in this talk. The prototyping aspect is very important, but it is not my main concern today.
00:02:21.440 I want to make another distinction between easy and simple. This is a distinction that was first introduced to me in Rich Hickey's talk, 'Simple Made Easy,' at Strange Loop last year. It's one of those rare talks that sticks in my brain and never leaves. I highly recommend you watch it if you haven't. I'm not going to do the concepts justice here, but the quick summary is that simple things are those that do one thing well and are not interleaved with one another.
00:02:42.280 What’s interesting about this definition is that it's somewhat objective; you can look at a piece of code and determine whether two bits of code are intertwined or not. In contrast, easy things are defined by Rich as things that are close at hand, things we already know how to do, things we can quickly reach for and build with. However, they are not necessarily simple. My hypothesis is that Rails is easy but it's not simple.
00:03:03.360 It doesn't have to be this way. As I said, this is a constructive talk; I'm not just picking on Rails. These are things that I genuinely believe we could do better with, and this applies both to the Rails framework itself, and to the Rails community as a whole. I think we could improve our focus on sustainable productivity.
00:03:51.319 I've structured the talk into three major topics, which are interrelated, but I want to differentiate between them somewhat. I'll be discussing data, coding in the small, and coding in the large. We'll start with data, specifically how we store our data in databases.
00:04:15.480 This is the segment where I'm most critical of Rails as a framework. This is the hypothesis I am putting forward: it’s not just that Rails doesn’t quite have the tools we need; it’s that it actively discourages the tools that we should be using. I believe this is the real problem, which is why I'm bringing it up first.
00:05:01.039 Let's start with data integrity as something to focus on. There are obvious data integrity problems, like when you have corrupt data or when you are missing data for a user—things that are clearly bad and cause breakage. I’m not interested in discussing those aspects; I think that's fairly uncontroversial.
00:05:24.680 However, I am interested in two other aspects: one, that data lacking integrity requires more code to deal with it. You need extra nil checks and sanity checks in your code, which increases the complexity of your code. Two, data that lacks integrity is harder to understand. If you’re an engineer and you come across data that lacks integrity, it just doesn’t make sense, and you end up spending your time trying to figure out how it works rather than being productive.
00:05:59.160 A lack of data integrity kills sustainable productivity. If you've been doing Rails programming for a while, you’re likely familiar with this cliché example, but I'll quickly go over it for the newcomers in the room: this is a classic situation in a web-style application where concurrency is involved. We might put a uniqueness validation on usernames. One process handling a request asks the database whether the username exists, and another process handling a second request asks about the same username. If both processes see that there’s no existing user, they both write to the database, and you end up with duplicate data, thereby violating your validation.
00:06:59.440 We now have invalid data in our database despite the fact that we told Rails that we didn't want to allow this. This is a fundamental class of problem; it doesn’t just happen occasionally or theoretically. It is a general issue in web applications, and we all have to address it.
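
The race described above can be reproduced without a database. What follows is a minimal sketch in plain Ruby, not Rails code: `UserTable`, `username_taken?`, and `simulate_two_signups` are hypothetical names, standing in for a table without a unique index and for the existence check that a uniqueness validation performs. The two requests are interleaved by hand so the outcome is deterministic.

```ruby
# Hypothetical in-memory stand-in for a users table with no unique index.
class UserTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # What a uniqueness validation does: ask whether the value already exists.
  def username_taken?(username)
    @rows.include?(username)
  end

  def insert(username)
    @rows << username
  end
end

# Runs two "requests" against the same table with a fixed interleaving.
def simulate_two_signups(table, username)
  # Step 1: both requests run their validation before either has written.
  request_a_valid = !table.username_taken?(username)
  request_b_valid = !table.username_taken?(username)

  # Step 2: both validations passed, so both requests insert.
  table.insert(username) if request_a_valid
  table.insert(username) if request_b_valid
  table
end

table = simulate_two_signups(UserTable.new, "xavier")
puts "rows with username 'xavier': #{table.rows.count('xavier')}"
```

Each check was correct at the moment it ran; the duplicate appears because checking and writing are two separate steps.
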
00:07:31.520 How do we deal with this? We can use a unique index. Rails tells us this in the documentation, which is great. We put a unique index on the database table, but then we still have to handle the case where the index rejects a write. The first few lines of this method should look familiar to you; it's how we write Rails controllers.
00:08:09.840 Now, we introduce an extra kind of validation. What happens when the database validation throws an exception? There are good answers for what goes here, but my concern is we don't have a best practice for it. This code looks odd, and that means that you, as a developer, have to figure out how to handle it. If you're spending time figuring out what to do in this situation, you're not being productive in other areas.
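
The slide is not part of the transcript, so the following is only a sketch of one possible answer, written in plain Ruby. `UniqueUserTable`, `create_user`, and the local `RecordNotUnique` class are hypothetical; in a Rails application the corresponding exception is `ActiveRecord::RecordNotUnique`. The `validated:` keyword exists only so the example can simulate a validation that passed before another request wrote the row.

```ruby
# Hypothetical error standing in for the exception raised when a unique
# index rejects a row.
class RecordNotUnique < StandardError; end

# Hypothetical in-memory table that enforces uniqueness the way an index
# does: at write time, regardless of what any earlier check reported.
class UniqueUserTable
  def initialize
    @rows = []
  end

  def exists?(username)
    @rows.include?(username)
  end

  def insert!(username)
    raise RecordNotUnique, "duplicate username: #{username}" if exists?(username)

    @rows << username
  end

  def count
    @rows.size
  end
end

# Mirrors the shape of a controller action: validate, save, then decide
# what to do when the database disagrees with the validation.
def create_user(table, username, validated: !table.exists?(username))
  return [:error, "Username has already been taken"] unless validated

  table.insert!(username)
  [:ok, username]
rescue RecordNotUnique
  # One possible policy: report the failure exactly like a validation error.
  [:error, "Username has already been taken"]
end

table = UniqueUserTable.new
p create_user(table, "xavier")                  # normal request
p create_user(table, "xavier", validated: true) # stale validation, index rejects
```

Translating the exception into the same message the validation would have produced is a design choice, not an established convention; the point made in the talk is that each team currently has to pick such a policy for itself.
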
00:08:54.640 This is one way that Rails doesn’t provide us with the necessary best practices for managing data integrity. This example is specified in the Rails documentation, but similar problems arise throughout the Rails framework that are not adequately addressed. For instance, with a has_one association you could end up with two child records. That is confusing, and preventing it requires a unique index. However, in my experience, people don’t put unique indexes on has_one or has_many associations.
00:09:36.600 You can see the same issue: one process creating a child record while another is deleting the parent, resulting in a data integrity issue where a child record doesn’t have a parent. The easy solution is to use a foreign key, something most database frameworks address, but as far as Rails is concerned, this feature doesn't exist.
00:10:24.800 I'm not suggesting that we go back to putting everything into our databases or to use stored procedures. However, the examples I've shown are high-risk for data corruption. Consider, by contrast, an email address that has passed validation: it has a very low risk of corruption, since it has been validated at least once. Duplicate data and child records without parents are different, because you can't handle them effectively at the Rails application server level.
00:11:13.080 Database constraints are the best way to handle these issues, but Rails doesn’t support these constraints adequately. In fact, it provides very primitive support, which is embarrassing. On larger projects, this can severely kill our productivity. We simply don't have the necessary support around it to back that statement up.
00:12:01.679 This is a small list of the ways in which Rails does not support database constraints. Fixtures are not loaded in a way that accommodates foreign keys. The Ruby schema format also ignores foreign keys and check constraints. If you switch from Ruby schema to SQL schema while using MySQL, it still doesn't support dumping your foreign keys.
00:12:50.679 Migrations are another issue: many people coming from other frameworks assume that creating a reference means creating a foreign key, but that's not the case in Rails. It merely gives you a user_id column. People often express disbelief when I explain this. Additionally, by default, it makes all your columns nullable, which is generally a poor default option.
00:13:27.240 Rails provides techniques for polymorphic relationships that are easy, but they are not simple and don’t support our data integrity. I’ll skip over the details for brevity, but in short, we can duplicate our tables rather than having a single global comment table. This breaks proper design principles.
00:14:09.440 We can certainly work with separate tables and use mixins and loops to avoid duplicate code, but if you really have to, there are other ways to aggregate comments using methods like class table inheritance. However, there are simpler options that may not be easy to implement at first but will significantly contribute to your productivity over the life of a project.
00:15:08.160 From there, I want to segue into talking about some smaller bits of code. In this part, I’m not addressing the Rails framework but discussing some techniques that I think are useful but not particularly common. First and foremost, when we talk about data, it's not just about our database data but also about how we represent data in our code.
00:15:59.360 To illustrate this, I’m going to share a piece of code. The code checks a value read from a file and checks whether it is greater than 90. I used this code for a tool I’ll demonstrate later. In an initial version, it reads a coverage percentage, checks if it’s greater than 90, and if not, it produces a violation. If there's an issue reading the file, it generates another violation.
00:16:34.240 Now, while this code is acceptable, there are two problems: one, there’s some duplication, particularly around creating violations, but more importantly, the actual problem is that this code is complex. It brings together two different operations: reading a value from a file and processing business logic, leading to intertwining that defeats the simplicity goal mentioned earlier.
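
The original code is not included in the transcript. A hypothetical reconstruction of the kind of method being described, with file access and the threshold rule interleaved, might look like this (the method name, messages, and default threshold are assumptions based on the description above):

```ruby
# Hypothetical first version: reading the file and applying the business
# rule are interleaved in a single method.
def coverage_violations(path, threshold = 90)
  if File.exist?(path)
    value = File.read(path).to_f
    if value > threshold
      []
    else
      ["Coverage #{value}% is not above #{threshold}%"]
    end
  else
    ["Could not read coverage file #{path}"]
  end
end
```

Note the two places that build a violation, and that the threshold rule cannot be exercised without a file on disk.
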
00:17:17.360 Here’s how you can separate these two concerns into two simple components that can work together without being tightly coupled. The interesting part about this separation is that I have introduced the concept of an unavailable value, which may seem counterintuitive, but we are adding more code while actually simplifying our logic at the same time.
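
One way to sketch that separation in plain Ruby is shown here. This is an illustration under assumptions, not the code from the slide: `UnavailableValue`, `CoverageFile`, and `ThresholdCheck` are hypothetical names, and the messages are invented for the example.

```ruby
# Represents "no value could be obtained". It answers the same comparison
# as a number, so the threshold rule needs no special case for it.
class UnavailableValue
  def >(_other)
    false
  end

  def to_s
    "unavailable"
  end
end

# Concern 1: obtaining the value. Knows about files, nothing about thresholds.
class CoverageFile
  def initialize(path)
    @path = path
  end

  def value
    Float(File.read(@path).strip)
  rescue SystemCallError, ArgumentError
    # Missing or unreadable file, or contents that are not a number.
    UnavailableValue.new
  end
end

# Concern 2: the business rule. Knows about thresholds, nothing about files.
class ThresholdCheck
  def initialize(threshold = 90)
    @threshold = threshold
  end

  def violations(value)
    return [] if value > @threshold

    ["Coverage is #{value}, which is not above #{@threshold}"]
  end
end

check = ThresholdCheck.new(90)
p check.violations(95.0)
p check.violations(UnavailableValue.new)
```

There is more code than in a single combined method, but each class can be read, documented, and tested on its own, and the violation is constructed in exactly one place.
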
00:18:01.000 This new model clarifies things and gives us a way to document our code. This is known as reification – turning something abstract into something concrete. It provides a concrete concept that we can talk about and document. Documentation at the class level is extremely important, and we don’t see enough of it. Unfortunately, this type of class-level documentation can't be simply replaced with self-documenting code.
00:18:44.640 For example, during my first week on call at Square, we had a background job fail, and while I reviewed the job’s self-documentation, it was clear what the code did. However, I had no quick way to find out what the current job status was, whether it was urgent, or who was involved. I had to spend much of my morning talking to various team members to determine how to handle the issue.
00:19:26.720 What we really needed was a document that could outline what the job did, its implications, and the context of the business process it was aiding. With the right documentation in place, I could have responded quickly during critical situations. Consistency in documentation provides valuable information. The same principle applies to background jobs—for each background job that requires retry logic, consistency is crucial.
00:20:06.919 This inconsistency leads to confusion and diminished productivity. To tackle this, we can create a mixin for retryable jobs, ensuring every job has a consistent implementation. The additional benefit is that the specifications for retries become easier, and we can focus on one implementation rather than being tied to potentially buggy variations across jobs.
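
A retry mixin of this kind could be sketched as follows. The names `Retryable`, `perform_with_retries`, and `MAX_ATTEMPTS`, and the policy of three attempts with no delay, are assumptions made for the example; `FlakyJob` exists only to exercise the mixin.

```ruby
# Hypothetical mixin: every job that includes it gets the same retry
# behaviour, implemented and specified once.
module Retryable
  MAX_ATTEMPTS = 3

  # Calls the including class's #perform, retrying on StandardError.
  # Returns the result of the first successful attempt, or re-raises the
  # last error once MAX_ATTEMPTS attempts have been made.
  def perform_with_retries(*args)
    attempts = 0
    begin
      attempts += 1
      perform(*args)
    rescue StandardError
      retry if attempts < MAX_ATTEMPTS
      raise
    end
  end
end

# Example job that fails twice before succeeding.
class FlakyJob
  include Retryable

  attr_reader :calls

  def initialize
    @calls = 0
  end

  def perform(name)
    @calls += 1
    raise "temporary failure" if @calls < 3

    "processed #{name}"
  end
end

job = FlakyJob.new
puts job.perform_with_retries("invoice")
```

With one implementation, the retry behaviour needs only one set of specs, and individual jobs only need specs for their own `perform`.
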
00:20:51.680 Lastly, I want to address error rates; you could be an outstanding developer with only a 1% error rate, but with ten developers on your team, the overall success rate drops to 90%. These issues tend to compound over the length of your project. Instead of referring to these as errors, a better term might be 'stupidity,' not in a derogatory sense but as in those moments when you may write some code only to look back the next day and wonder why it was done.
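
The 90% figure is consistent with compounding a 99% individual success rate across ten people. The calculation below assumes each developer errs independently, which is a simplification introduced here and not a claim made in the talk; `team_success_rate` is a hypothetical helper.

```ruby
# Probability that none of `developers` people introduces an error,
# assuming each errs independently at `error_rate`.
def team_success_rate(error_rate, developers)
  (1 - error_rate)**developers
end

puts team_success_rate(0.01, 10).round(3) # prints 0.904
```
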
00:21:42.399 There are various ways to catch this, such as pair programming or code reviews, but the most effective way to address these issues is through the use of automated tools. There are plenty of problems we can identify through code, and we should fail builds for them—like if someone checks in code with bad whitespace formatting, it should be flagged.
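
As an illustration of a check that can fail a build, here is a small sketch that flags trailing whitespace. It is not the tool referred to in the talk; `trailing_whitespace_violations` and `build_passes?` are hypothetical helpers, and a real build script would read the files from disk and exit non-zero on failure.

```ruby
# Returns one message per line that ends in spaces or tabs.
def trailing_whitespace_violations(source, filename = "(input)")
  source.each_line.with_index(1).map do |line, number|
    "#{filename}:#{number}: trailing whitespace" if line.chomp.match?(/[ \t]+\z/)
  end.compact
end

# `files` maps a file name to its contents. The build passes only when no
# file has a violation.
def build_passes?(files)
  files.all? { |name, source| trailing_whitespace_violations(source, name).empty? }
end

puts trailing_whitespace_violations("def ok\n  1  \nend\n", "example.rb")
```
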
00:22:40.720 Avoiding reliance on tooling to catch issues can lead to wasted time. Good static analysis tools are essential. For instance, using the right linting gem can detect complex methods before they get checked in. I found it beneficial to receive warnings such as, 'Are you sure you want to check this in?' before making commits.
00:23:31.600 Incorporating consistent checkers across a team can streamline processes. Consistency of styles also helps maintain a level of coherence. Discussions arise over which job to copy for retry logic, but by having a single implementation, you avoid the introduction of bugs that can occur with multiple duplicate implementations.
00:24:09.280 Now, let’s transition to another crucial topic: dependencies. As Mike pointed out, monolithic structures can be problematic but the core issue emerges from interrelated dependencies that developers may not even recognize. A true monolithic application is fine unless those dependencies are complex and scattered across the codebase.
00:25:05.000 Dependency management is crucial not just for internal dependencies, but also external ones. For example, when sending an email, it might seem simple to directly call a library without considering the intertwined dependencies you'll create. If you use a tool like Resque everywhere to handle jobs, it can lead to a messy dependency tree.
00:25:56.720 We've made our entire system dependent on a single tool, creating limitations. The solution is to restructure the codebase. If we abstract email delivery via a method that handles the job creation, we achieve a clearer dependency model. In the given example from Square, we streamlined our email delivery by using a single delegation point to manage these interactions.
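
The Square code is not shown in the transcript. A generic sketch of a single delegation point, with every name hypothetical (`InMemoryQueue` stands in for a background job library, `EmailDelivery` for the delegation point, `Signup` for application code), could look like this:

```ruby
# Hypothetical stand-in for a background job library.
class InMemoryQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(job_name, *args)
    @jobs << [job_name, args]
  end
end

# The single place that knows how email is delivered. Callers depend on
# EmailDelivery, not on the queueing library behind it.
class EmailDelivery
  def initialize(queue)
    @queue = queue
  end

  def deliver(template, recipient)
    @queue.enqueue(:send_email, template, recipient)
  end
end

# Application code: it never mentions the queue.
class Signup
  def initialize(email_delivery)
    @email_delivery = email_delivery
  end

  def complete(address)
    @email_delivery.deliver(:welcome, address)
  end
end

queue = InMemoryQueue.new
Signup.new(EmailDelivery.new(queue)).complete("user@example.com")
p queue.jobs
```

Swapping the job library, or sending some email synchronously, then means changing `EmailDelivery` only.
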
00:26:39.320 This not only simplified our interactions but also made it clearer where to add new functionalities in the future, which enhances productivity. In another project—unrelated to Square—we had a model representing articles and research, where we initially set it up with separate tables. Initially confusing, this approach hindered refactoring.
00:27:37.720 The tangled dependencies in our codebase became cumbersome, especially when we required alterations later. If we had applied the concept of an aggregate, uniting the various nodes of data while treating the cluster as a unit, we could have avoided a lot of this confusion.
00:28:25.000 I encourage you to familiarize yourself with the concept of aggregates, which keeps dependencies more manageable by limiting how we interact with the various components in the system—talking only to a single point rather than navigating each subcomponent directly. Moreover, avoid default behaviors that might lead to unnecessarily complex associations, like always adding a belongs_to clause.
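
To make the idea concrete, here is a small sketch of an aggregate in plain Ruby. The domain is invented for illustration and is not the project mentioned in the talk: `Article` is the aggregate root and `Citation` is an internal part that outside code never touches directly.

```ruby
# Hypothetical aggregate: an Article owns its citations. Outside code talks
# only to Article and never reaches into Citation objects directly.
class Article
  Citation = Struct.new(:source, :page)

  def initialize(title)
    @title = title
    @citations = []
  end

  # All changes go through the root, so the rules live in one place.
  def cite(source, page)
    raise ArgumentError, "page must be positive" unless page.positive?

    already_cited = @citations.any? { |c| c.source == source && c.page == page }
    @citations << Citation.new(source, page) unless already_cited
    self
  end

  # Expose data, not the internal collection.
  def cited_sources
    @citations.map(&:source).uniq
  end
end

article = Article.new("Sustainable Productivity")
article.cite("Evans", 125).cite("Evans", 125).cite("Hickey", 1)
p article.cited_sources
```

Because callers cannot append to the citation list themselves, the duplicate and page checks cannot be bypassed, and the internal representation can change without touching callers.
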
00:29:25.840 Another aspect I want to highlight before I wrap up is testing. The Rails community struggles with test-driven development; while we’re not bad at writing tests, we don’t consistently prioritize test-first approaches. The language we use to describe tests must evolve as well.
00:30:20.720 Currently, our terms for unit tests, functional tests, and integration tests suffer from a lack of proper definitions. Throughout the Rails community, these distinctions are blurred, but many other frameworks have defined these clearly. We should be able to discuss acceptance tests, integration tests, and unit tests in ways that align more closely with their foundational meanings.
00:30:59.520 Building a clearer lexicon around tests allows us to draw upon strategies and practices developed in other programming communities. I’ve covered a lot today, and if you find the slides later, they’re full of links to further reading. One quote that resonated with me was about how we assume code rots over time; however, we should aspire to create systems that continue to become more enjoyable to work on as they develop over time. I don’t have all the answers yet, but I’m actively working on it, and I hope you will be too. Thank you very much!