Talks

Rails anti-patterns: how not to design your database

Rails anti-patterns: how not to design your database

by Brad Urani

In his talk titled "Rails anti-patterns: how not to design your database" at EuRuKo 2018, Brad Urani delves into common pitfalls encountered in database design, particularly within Ruby on Rails applications. He emphasizes that the selection of design patterns is often situational and encourages careful consideration of various contexts when developing database structures.

Key points discussed include:

- Introduction to Rails Anti-Patterns: Urani explains that while certain patterns may be labeled "anti-patterns," they can be acceptable if used in the right context.
- Naive Trees: A naive tree structure, while space-efficient, can lead to complications with complex queries. Retrieving descendants typically requires multiple joins, complicating performance due to potential N+1 query problems.
- Closure Trees: This alternative is much more robust than naive trees; it simplifies querying by referencing every descendant for each parent, requiring fewer joins and offering clearer data manipulation, supported by a Ruby gem that aids in its implementation.
- Materialized Paths: Another approach discussed involves maintaining a path column to simplify descendant queries. Though it sacrifices some referential integrity, it provides faster performance and greater flexibility, albeit subject to character length limitations.
- Polymorphic Relationships: While useful for flexibility, Urani warns that polymorphic relationships can compromise data integrity, especially in industries like finance and healthcare. He suggests structured designs that utilize exclusive arcs or parent tables to maintain referential integrity.
- Developer Considerations: Urani underscores the importance of balancing developer happiness with performance, noting that as applications scale, thoughtful database design becomes crucial.
- Conclusion and Recommendations: He concludes that the decision between performance and simplicity, or integrity and rapid development, must align with the specific needs of the application. Urani recommends reading "Trees and Hierarchies in SQL for Smarties" for those interested in deeper insights into SQL design patterns.

The talk provides practical advice for Rails developers facing database design challenges, reinforcing that every design choice should be evaluated based on context and specific use cases.

00:01:20.960 We are going to get started. This is a bit louder, but it worked, so whatever works!
00:01:29.060 I need you to be super quiet because I have an announcement to make. Hi, my name is Andrew, and I've been doing a great job capturing this conference in written form.
00:01:38.660 I am absolutely awesome! Please give me a round of applause!
00:02:08.960 Nice! All right, we have two talks coming up, then there’s a break, and then we have one more talk coming up. This will be followed by the closing notes from Ramona and Pilar, and then, of course, the after-party starts at 8:00 tonight.
00:02:19.550 So, our next speaker is Brad. I’ve heard that he is quite the karaoke singer, a barbecue enthusiast, and a coder. Who would have expected that? He has about 15 years of experience with SQL under his belt, so if you are about to embark on designing your next database challenge, he is probably the best to help you with that.
00:03:01.569 Hi! How are you doing? Guten Tag! Did I say that right? I’m very happy to be here in Vienna. This is my second time; I was here back in 2003. I could not be more thrilled. I flew in from Los Angeles last night just to speak to you all, and this has been a lovely time.
00:03:12.860 Thank you to our organizers and everyone who set up this conference and invited me up here. My name is Brad Urani; I am a staff engineer at Procore in Santa Barbara, California, and this talk is about database patterns, specifically Rails anti-patterns or relational anti-patterns.
00:03:18.290 I change the title depending on who my audience is, but it’s the same talk: Rails anti-patterns—how not to design your database. So, warning: tough decisions ahead. I’m going to give you some patterns you can use in your database, but realize that I cannot give you hard and fast rules about what to use where. I hope to provide you with tools that you can add to your toolkit.
00:03:38.180 Ultimately, making decisions about design patterns is challenging and very specific to your use case. At the end of this, remember there are still choices to be made. I work for Procore, which provides software for the construction industry. We create software for people building skyscrapers and giant buildings like this.
00:04:04.730 This is an example of one of our iPad apps; it’s a blueprint management app for construction plans. Here’s a gentleman out in the field using it. The construction industry has a similar problem to software in that most workers are men, but we are implementing programs to increase gender diversity in construction.
00:04:24.770 We have modeled these programs after some Rails initiatives like Rails Girls, and we’re proud of them. We also support the Ruby community. We sponsored RailsConf and were diamond sponsors for the last two years. Thank you!
00:04:36.800 We use Postgres SQL at a very large scale—60 million rows scanned per second on our primary database instance. That’s pretty big! I gave this talk at PostgresConf in New Jersey City, and even the Postgres people were somewhat impressed. We handle about fifty thousand transactions per second, which is probably an even more impressive number—these include both reads and writes.
00:05:11.449 This involves an implicit transaction with a 14 terabyte database. It means we have to handle SQL extremely well, which also necessitates that we design our database very well. Mistakes in database design can be very costly when you're running at this scale.
00:05:36.159 For example, this is from Procore; it’s a construction budget. When you load a page like this, it can run 20 to 30 SQL queries just to render. That requires carefully tuning those SQL queries to ensure optimal performance while maintaining developer happiness.
00:05:54.560 We have a large team of Rails developers—dozens of them—and it is one of the largest Ruby on Rails apps in the world with over a million lines of Ruby, close to two million lines currently. This means not only is there a SQL problem but also a Ruby on Rails problem, which necessitates our database patterns to effectively work with our tooling.
00:06:12.440 In our case, our tooling is Ruby on Rails and Active Record. We must ensure that the patterns we choose work well with SQL queries that are both efficient and easy for our developers to use.
00:06:29.890 Let's talk about anti-patterns. Today, I will address two major anti-patterns: naive trees and polymorphic relationships. But first, a little disclaimer—there are no actual anti-patterns. An anti-pattern is only classified as such if you use it in a context where it is not the best choice.
00:07:04.159 Every pattern has a scenario where it may be the most suitable option. Even my so-called anti-patterns can be the best choice if they fit your use case well. Trees, for instance, can be tricky to work with in databases. They represent structures like a file system or an organizational chart.
00:07:40.030 If you're representing a hierarchy, such as a file system, a comment thread, or an org chart, you may use a tree structure. However, naive trees, the simplest way to implement a tree in databases, can also be the trickiest to query effectively.
00:08:05.310 In a naive tree, you typically have a primary key ID and a foreign key for the managerial hierarchy. This structure is space-efficient but makes queries to get descendants exceedingly complicated, as this requires multiple joins to retrieve deeper levels of the tree structure.
00:08:40.840 Thus, getting a person's descendants might involve multiple complex joins, with unclear depth of recursion at the time of writing your query. As a result, this approach can become inefficient very quickly.
00:09:08.790 Alternatively, starting at the top and using separate queries to get descending generations can quickly become an N+1 query issue, which can bog down performance and exhaust database connections.
00:09:35.110 Another option could be recursion, but writing recursive SQL can be challenging as it lacks readability and is not well-supported in ORM frameworks like Active Record. This request can become even more difficult for developers unaccustomed to recursive SQL.
00:10:12.610 Even if you manage to implement recursive SQL, additional queries are often necessary to display it correctly on a webpage, adding to inefficiencies. Also, handling deletions in a tree structure can be particularly complex; you have to adjust references throughout the hierarchy, which can lead to significant overhead.
00:11:00.520 In summary, naive trees have severe limitations when dealing with complex queries, rendering them suboptimal for deep trees or when frequent manipulation is required. It’s critical to find an alternative that meets your performance needs.
00:12:01.239 However, if what you're working with is relatively simple, a naive tree structure may prove adequate. It boasts referential integrity, is space-efficient, and may yield usable results with caching depending on your use case.
00:12:41.310 A closure tree is a more robust alternative to naive trees. In a closure tree, you create a table referencing every descendant for each parent, allowing for quick retrieval of all descendants with a single join.
00:13:07.509 This approach reduces the complexity of queries, requiring fewer joins and allowing easier on-the-fly data manipulation, significantly enhancing developer experience and ensuring good performance.
00:13:38.990 There is a popular Ruby gem called Closure Tree that implements this pattern in your application, helping automate the creation of necessary database queries and structures, which simplifies database interactions and maintenance.
00:14:24.220 Another alternative to represent trees is using a materialized path. In this approach, you maintain a path column that contains the entire hierarchy for each node as a string, making descendant queries very straightforward with a single lookup.
00:15:04.020 Though this approach sacrifices some referential integrity because it cannot enforce foreign key constraints, it typically boasts better performance compared to closure trees, especially for certain types of queries.
00:15:55.460 The materials path may also pose limitations in terms of depth; it is subject to character length limitations which could restrict more complex hierarchies. However, for speed and flexibility, it can stand as a solid option.
00:16:38.800 When discussing anti-pattern number two, polymorphic relationships, many developers may be familiar with this pattern from Rails. It involves having a foreign key that can refer to multiple tables based on an extra column that specifies the table type.
00:17:28.820 While it can be flexible and useful in certain contexts, it is often misused for cases where enforcing data integrity is paramount, such as financial or medical data management, where foreign key constraints become essential.
00:18:10.610 A well-structured design that utilizes exclusive arcs or separate parent tables often proves to be the better choice when dealing with critical data, ensuring referential integrity compared to the limitations of polymorphic relationships.
00:18:53.070 Using a parent table and a commentable middle table can assist in structuring complex many-to-one relationships by allowing for shared attributes across diverse data types while retaining the ability to enforce referential integrity.
00:19:49.840 In conclusion, if you face a wide variety of parent types, polymorphic relationships can provide a useful solution. However, maintaining referential integrity may offer superior peace of mind in scenarios with fewer parent entities.
00:20:29.390 Ultimately, whether you choose to go for performance or simplicity, referential integrity, or rapid development—it's crucial to evaluate your application's needs and the trade-offs you’re making with each decision.
00:21:16.530 Congratulations! You've made it through the talk! I have a few book recommendations for those interested in deeper insights into SQL hierarchy and design patterns. One is 'Trees and Hierarchies in SQL for Smarties,' which covers numerous patterns and anti-patterns.
00:22:01.810 I appreciate your attention today! I’m Brad Urani; please feel free to reach out to me on Twitter or LinkedIn. I work for Procore, located in beautiful Southern California. We are hiring for various roles, including Ruby developers, Android/iOS engineers, security experts, and machine learning specialists. Come join us!
00:23:26.940 So quick applause for Andrew too for such a quick job of taking down the talk. There’s a case of a missing laptop; it’s so exciting!
00:23:40.670 Please come back for the last talk of the day at half-past four, followed by the closing notes. The next speaker is Coraline; she is a well-known writer and speaker.