Talks

A custom design pattern for building dynamic ActiveRecord queries

A custom design pattern for building dynamic ActiveRecord queries

by Justin Daniel

In the presentation titled "A Custom Design Pattern for Building Dynamic ActiveRecord Queries" given by Justin Daniel at RailsConf 2023, the focus is on overcoming the challenges developers face in writing and maintaining performant, dynamic, and readable database queries using ActiveRecord in Rails applications. This session proposes a unique design pattern that leverages abstraction to enhance the construction of queries without relying on external libraries.

Key Points Discussed:

  • Importance of Composable Queries: Composable and maintainable queries are essential for meeting diverse data output demands across various platforms, including web pages and third-party integrations like Salesforce. Developers encounter difficulties ensuring these queries are efficient and user-friendly.
  • Challenges with Legacy Code: Relying on SQL concatenation results in cumbersome code that lacks readability and maintains complexity. Daniel illustrates this with a legacy code example totaling over 230 lines, which is hard to maintain and alters the comprehension of business logic.
  • Design Pattern Overview: Daniel introduces a custom design pattern using the Builder and Visitor patterns to create queries that follow object-oriented programming principles. This approach promotes modularity and readability by breaking queries down into smaller, manageable components, akin to Lego blocks.
    • Comparison of Query Tools: The speaker contrasts raw SQL, vanilla ActiveRecord, and external libraries like Aero tables, highlighting the pros and cons of each. While raw SQL is highly performant, it sacrifices readability and maintainability. Simple ActiveRecord queries are more comprehensible but struggle with complex conditions, and Aero tables, while effective, introduce dependency concerns and complex syntax.
  • Builder Pattern Implementation: The presenter delves into the mechanics of the query Builder and Visitor design pattern, showcasing how to encapsulate public interfaces within query classes, manage dependencies, and utilize a reducing method to construct ActiveRecord relations effectively. The pattern permits elegant handling of conditional logic through nodes that self-validate against user-driven filters.
  • Nested Nodes and Complex Queries: The capability to nest nodes allows for advanced filtering, which can significantly enhance the flexibility of queries, reviewing an example of creating conditions for legendary characters in a tabletop RPG.

Conclusion and Takeaways:

  • The Builder pattern enhances object-oriented programming principles by allowing developers to tell a story with their code, fostering better maintainability and comprehension.
  • Daniel encourages developers to adopt this powerful abstraction as it enables creating dynamic queries while avoiding the pitfalls of legacy SQL concatenation. He emphasizes the balance between performance and readability as a key goal in query design.
  • In closing, Daniel invites the audience to explore the GitHub repository shared during the talk, which provides resources to implement the Builder pattern in their applications.

Overall, the presentation aims to inspire Rails developers to explore new querying methodologies that improve developer efficiency and application performance.

00:00:18.539 I think I'm going to go ahead and get started. It's so cool seeing all these people come to my talk.
00:00:24.600 Thank you so much. Uh, can you hear me clearly in the back?
00:00:29.640 Excellent. Thank you, technician. All right, my name is Justin Daniel.
00:00:36.840 I'm a senior software engineer with G2, and I have many of my colleagues and teammates sitting in the front here with me.
00:00:44.219 Today, I will be presenting a custom design pattern for building dynamic ActiveRecord queries.
00:00:50.640 Thank you so much for being here. Just a note: I did build a demonstration repository with the help of fellow engineer John Paul Hill.
00:00:57.719 The GitHub link is on the slide, along with the QR code. In this talk, I will be using PowerPoint slides to summarize the main points.
00:01:04.799 We will also be working in the code editor and console to really dig in and understand this query pattern.
00:01:10.799 So, my agenda for today is as follows.
00:01:17.580 I will discuss the big picture and why composable queries are so important.
00:01:24.540 We will compare different tools for composing queries, including raw SQL, ActiveRecord, and Arrow tables.
00:01:31.380 Then, we will go over our own domain-specific language and introduce the query builder and visitor pattern that we've created at G2.
00:01:37.640 Finally, I'll have time for questions and answers. My big goal here is to inspire you to use this builder pattern.
00:01:50.159 I want to not only provide you with an understanding of it but convince you that this is a tool you should try out.
00:01:55.979 I believe our demonstration repository really gives you everything you need to get started.
00:02:01.320 So, let's start with the big picture here.
00:02:07.079 Being a big data company is challenging. Serving all of your customers' needs can feel like pushing a big boulder up a hill.
00:02:20.640 There are three challenges I want to highlight. First, we need to surface our data in many different systems, such as live web pages, CSV downloads, and email notifications.
00:02:32.400 We also have third-party partnerships, such as with Salesforce, where we need to present our data.
00:02:40.980 Our users want data in a customizable manner; they want to include just the fields they are interested in while filtering out irrelevant signals.
00:02:54.000 We need to achieve this while maintaining fast, agile innovations.
00:03:00.300 Some acceptance criteria for our queries are as follows: hard-coded queries are out of the question.
00:03:12.140 They need to be composable and maintainable, allowing engineers to work with them effectively. Additionally, they must remain performant even as we build higher layers of abstraction on top of SQL.
00:03:30.680 I really like the metaphor of Legos here—I have been a big Lego fan since I was a child. I love the idea of having building blocks that you can shape into whatever you need.
00:03:48.360 Our queries should be modular and constructed from smaller, interchangeable parts.
00:03:54.000 They need to respond logically to conditionals to achieve the desired output and return fields that may vary based on the filters users select.
00:04:07.040 However, composable queries can become messy quite easily. I'm sure many developers in this room have experienced struggles with this.
00:04:24.960 What I'm going to show you is an old design pattern that we used, which involves SQL concatenation.
00:04:35.820 This pattern resulted in over 230 lines of code, served by many different private methods and a dozen or so helper files.
00:04:47.340 The problem here is not SQL injection—we guard against that in other parts of the code. The issue is that it's easy to lose the plot.
00:05:01.560 You can follow that this is an SQL query with its different clauses—SELECT, JOIN, WHERE, GROUP BY, etc.—but it doesn't really tell a story, making it hard to work with.
00:05:13.139 It also complicates discovery of common parts of domain logic. I made the decision to encourage our team to adopt a different design pattern for composing queries.
00:05:25.259 I'm going to give you a sneak peek of what we achieved and what you can also accomplish using the Builder pattern.
00:05:38.280 You can return to object-oriented programming principles when composing your queries.
00:05:45.000 Here's our sneak peek—you'll find this in the repository I mentioned, and we will dive deeper into it.
00:06:03.479 For now, I want you to notice two important things: first, we're back to using named classes in our query, which enhances discoverability.
00:06:08.240 Secondly, there are no conditional branches disrupting the flow of this code, allowing the query class to tell a coherent story.
00:06:14.699 As we will see, the conditional branches come downstream.
00:06:25.259 Before we get into the details, I want to convince you that this may be the right tool for your application.
00:06:32.860 Let's compare the different tools you might use for composing queries.
00:06:41.400 First, there's raw SQL concatenation, which still has its practitioners and defenders. It has advantages when used correctly—it is the most performant option.
00:06:55.440 You can't get more performant than building an abstraction layer above your database management system.
00:07:06.000 Plus, you have a reduced dependency footprint, meaning you aren't tied to your ORM.
00:07:12.000 However, there are downsides. Your readability will generally suffer. Raw SQL is less maintainable, less composable, and less reusable.
00:07:20.940 Additionally, it’s not as familiar to most Rails developers, and you receive less automated help.
00:07:27.639 Next, we have vanilla ActiveRecord, without any libraries or higher-level abstractions.
00:07:36.960 Its advantages include being composable and more readable than SQL. After all, it utilizes Ruby on Rails and is capable of most complex queries.
00:07:44.599 The downside is that it doesn't handle conditional branching well by itself, leading you to sometimes revert to raw SQL.
00:07:53.120 Moreover, you have a dependency on your ORM.
00:08:09.600 Our next tool is Arrow tables, which many Rails practitioners rely on. Arrow tables have advantages in handling conditional branches well.
00:08:17.400 They also deal with complex queries and reduce your dependency on a specific database management system.
00:08:24.540 However, the downside comes from being dependent on the Arrow library now, which introduces another dependency.
00:08:31.800 I personally find the relational algebra syntax difficult to read, and I believe many Rails developers share this sentiment.
00:08:45.779 I prefer object-oriented syntax instead. Another major reason we chose our own builder pattern over Arrow tables is that we can express multiple concerns in a single node.
00:08:52.680 Now we arrive at our custom design pattern with ActiveRecord. It's highly composable and effectively handles conditional branches.
00:09:01.500 It uses Ruby on Rails, and no complex library is needed. If you clone our repository, you'll see how the pattern operates and feel free to adapt it to your own application.
00:09:09.720 No tool is perfect, and there are a couple of downsides to consider: you will face an additional learning curve, as you'll need to be proficient with ActiveRecord and the Builder pattern.
00:09:22.560 Also, you will still have a dependency on ActiveRecord and Rails.
00:09:30.780 Let's delve deeper into the intricacies of the query builder and visitor design pattern. From here on, I will simply refer to it as the Builder pattern.
00:09:40.680 The Builder pattern is a higher abstraction that sits atop ActiveRecord. Abstraction is what I love best about being a software engineer.
00:09:48.600 I enjoy designing systems that tell a coherent story readable by humans, not just machines.
00:09:54.840 Now, here's where we will transition into our code editor, and I will show you how this pattern works.
00:10:20.300 Let me provide some context on how we designed this repository. It required some minimum schema to effectively demonstrate our point.
00:10:28.200 For context, this schema is modeled off of a tabletop role-playing game character builder. For those who aren't familiar, that's perfectly fine.
00:10:42.120 What you need to know is that in our database schema, we have a central table called characters.
00:10:52.260 Picture yourself as a dungeon master or someone else interested in characters, wanting to search among thousands of character records.
00:10:58.800 Our characters have numerous associations with campaigns, players, spells, feats, and more.
00:11:08.300 We need to be able to query from the characters table, and this is an example of one of our query classes, the index query.
00:11:15.300 There are three main concern objects in our query builder pattern: query classes, builders, and nodes.
00:11:23.640 This is where the query class serves as your public interface, returning an ActiveRecord relation.
00:11:29.900 It contains private methods that address implementation specifics while retaining its public interface from its base class.
00:11:40.920 Thank you. Our base class has one public method called `call`.
00:11:46.680 Whenever you need to run a query, you send it the `call` message. This method serves two purposes—it applies and tells the builder to build.
00:11:55.800 We have two primary parameters: the builder, which is injected as a dependency, and filters, which is a hash of user inputs.
00:12:07.680 The filters specify what the user wants; for example, they might only be interested in characters with certain spells.
00:12:16.020 The implementation of the `call` method will depend on the specific child class being called.
00:12:29.520 Let's return to our specific child class. The apply method here performs several tasks.
00:12:40.200 First, it applies joins only to those tables that will be needed based on the filters. It also adds nodes to our builder.
00:12:47.640 The initial state will simply be characters, building an ActiveRecord query concerning those characters.
00:12:57.840 Next, we need to examine the nodes and builders, so let's take a look at one of the nodes.
00:13:07.920 You may be wondering where the conditional logic resides; the answer is in the nodes.
00:13:19.739 We're going to look at one of our simpler nodes called 'spells level n.' This produces a clause for the spells table where level is included in the specified levels.
00:13:32.760 The parameter of levels determines how valid this node is, as its validity relies on the filters passed.
00:13:42.540 If the filters do not include levels, this node will be eliminated. We use inheritance to confer a common public interface.
00:13:50.760 All nodes will possess both an accept method and a valid method.
00:13:58.320 The simplest form of the accept method will merge different ActiveRecord clauses.
00:14:12.720 Now, let’s consider our builder.
00:14:19.920 Our builder is initialized with an empty array of nodes and has two public methods: an add method and a build method.
00:14:28.680 The query class will call the add method to provide all the nodes the builder might need, and once all nodes have been added, we call the build method.
00:14:36.520 The build method contains our central algorithm, which acts as a reducer to accumulate the state of the ActiveRecord query converted from the nodes.
00:14:50.640 The builder does not directly direct this; it instead delegates to the nodes using the visitor pattern. That's the reason we pass a block to reduce, which is visit.
00:14:58.800 Let's demonstrate how this works by entering our console.
00:15:06.840 We're currently inside our query class.
00:15:11.320 Let’s start with some initial state here.
00:15:18.880 The builder has its initial state set.
00:15:23.260 The builder currently points to the characters table but has no nodes.
00:15:31.900 After applying a join, our builder now has an inner join node added.
00:15:39.240 Let’s add our other nodes.
00:15:48.360 When we call builder.build, our actual query will be constructed.
00:15:54.520 It will filter out all the irrelevant nodes that we don't need.
00:16:03.880 I hope this served as a useful introduction to how this works.
00:16:10.499 Please bear with me while I navigate back to my presentation, as working with a projector is a little different.
00:16:17.060 So, let’s summarize what we've learned so far.
00:16:22.880 It's crucial to understand that our Builder pattern consists of object-oriented programming, involving three interacting objects: the queries, the builders, and the nodes.
00:16:30.360 The central algorithm utilizes reduce.
00:16:44.760 I’ll pause for a moment to take a drink.
00:16:51.400 I like this diagram; it illustrates how the different objects work together.
00:16:56.880 Going from left to right, we have our query class, which takes in filters that customize our query.
00:17:04.740 The dependency injection of the builder provides one public method: the call method.
00:17:10.560 The call method triggers two methods on the builder: add and build.
00:17:17.080 The builder keeps a state of nodes in an array and an initial state provided by the query class.
00:17:25.040 When the builder is told to build, it loops over the nodes and visits each node.
00:17:32.560 First, it asks the node if it is valid based on the state passed in. If valid, the accept method will be called.
00:17:40.160 Typically, this involves merging ActiveRecord clauses.
00:17:47.440 The node generates a clause, and ultimately, the builder returns an ActiveRecord relation to the query class.
00:17:54.680 Now, the central algorithm utilizes the reduce method.
00:18:02.800 Thanks for your patience. Let's go back to our code editor.
00:18:10.960 Let’s closely examine the reduce method.
00:18:16.120 We can find it in our builder—the base implementation contains a reduce method.
00:18:24.520 The standard Ruby library reduce can be called on an enumerable or an array, combining different elements using any provided code block.
00:18:31.640 Our implementation is customizable from the base class and accepts a state and node as parameters while visiting each node.
00:18:39.240 Let's see what our default builder does.
00:18:45.640 We should zoom in here.
00:18:50.040 Our custom reduce method first loops over all the nodes and checks if each is valid. Invalid nodes are eliminated and not built as part of our query.
00:19:26.900 We call reduce on the nodes that are valid, utilizing our initial state and block.
00:19:35.400 As we compile our logic, let’s consolidate this again in summary slides.
00:19:43.680 The main idea is that the reduce pattern accumulates nodes while also eliminating unnecessary ones.
00:19:49.860 Essentially, we are executing ActiveRecord queries, merging them repeatedly.
00:19:59.680 We could implement this naively using conditional branches, where we'd create two separate queries.
00:20:09.400 However, this approach would complicate things with numerous conditional logics, making it difficult to manage large queries.
00:20:16.360 Motivated to improve, we require a better idea, leading to reduce.
00:20:25.520 By providing the builder with nodes and allowing the nodes to self-eliminate, we manage to create cleaner queries.
00:20:35.520 I love metaphors, and the one I find fitting here is that of a sculptor.
00:20:41.760 The sculptor receives a medium and subtracts what is unnecessary, eventually revealing a beautiful statue.
00:20:54.920 The same concept applies to our reduce pattern.
00:21:01.740 To summarize, the builder constructs queries through the reduce algorithm and visits nodes.
00:21:09.120 The code block is delegated to the node's own method, allowing for organized logic.
00:21:15.300 One of the biggest advantages of this pattern lies in its ability to manage nesting within nodes. Complex queries become easier to handle.
00:21:22.580 Consider Russian nesting dolls, where a node can envelop other nodes.
00:21:29.580 There are numerous use cases for this: or and where clauses, higher domain logic, common table expressions, and subqueries.
00:21:38.260 Our pattern effectively handles nesting with grace.
00:21:46.380 Let's revisit our query class and check out one of the nodes.
00:21:53.240 We have a legendary node that embodies higher business logic, asking for legendary characters.
00:22:00.740 Legendary characters have various attributes, such as experience level and stat boosts.
00:22:06.900 Let’s inspect this further.
00:22:14.840 Our node initializes with a filter and maintains its own sub-builder.
00:22:20.960 This insight allows clauses to delegate back to builders, adding child nodes.
00:22:28.780 Now we have our and filter with two concerns from the characters table.
00:22:38.200 Let’s observe how it looks in console.
00:22:46.480 We’re inside our nested node, the legendary node.
00:22:53.680 Let’s see what happens when we add nodes to the builder.
00:23:00.480 Now we have a nesting of nodes.
00:23:08.760 We have constructed a beautiful query that integrates into our larger query.
00:23:16.200 We want characters with experience exceeding 10,000 and one of their stats must be greater than or equal to 18.
00:23:23.460 There’s a bit of a learning curve when switching between the console and the slideshow.
00:23:27.200 To summarize, with nesting of nodes, we delegate to our own builder. One of the powerful aspects of nested nodes is their ability to execute and manage complex queries.
00:23:36.880 Finally, I’ll outline how you would implement our query builder pattern.
00:23:45.200 There are four principles of best practice to follow. First, encapsulate the public interface of your query in a class.
00:23:55.940 You want the query encapsulated in a query object class with an appropriate scope.
00:24:02.640 The query should achieve a balance—it shouldn't be a monolith trying to do everything, nor should it be overly granular.
00:24:09.800 A helpful rule of thumb is to have one query class per API endpoint.
00:24:18.740 We have the index query we've discussed, and I appreciate the clarity it provides.
00:24:27.240 You want to make your queries modular by utilizing named classes that represent nodes.
00:24:38.320 This approach leads us back to the principles of object-oriented programming, ensuring discoverability and clarity.
00:24:47.520 Nodes can be general or specifically tied to a particular table.
00:24:54.820 You should also delegate functionality to your builders and nodes, moving the main logic outside of the query class.
00:25:05.600 The node classes should have a valid method that assesses whether the builder adds them to the ActiveRecord relation.
00:25:12.820 They also need an accept method that merges them into the ActiveRecord relation.
00:25:21.540 For example, this might look at a different campaign table, allowing users to filter start dates.
00:25:28.600 This node verifies if any filters are present; if they are, it will introduce the WHERE clause.
00:25:35.960 Finally, you want to create a builder that uses the visitor pattern.
00:25:43.780 The builder abstraction should remain distinct from your query class, while your query class crafts a story that reads descriptively.
00:25:51.640 Your builder class will use the visitor pattern to visit each node and extract its clause.
00:26:00.400 We have observed this with our default builder as well as the reduce algorithm employed.
00:26:09.640 To wrap this up, abstraction allows our code to narrate a cohesive story, which I hope was evident in my presentation.
00:26:19.420 We began with our legacy file constrained by SQL concatenation.
00:26:26.920 We’ve achieved a pattern where we return to using named classes, enabling us to narrate a coherent story.
00:26:34.200 Our objective as software engineers is to select the most appropriate abstractions to create beautiful, maintainable code.
00:26:40.640 Before I conclude, just a quick shoutout to my company G2, which is growing and actively hiring, so please consider checking us out.
00:26:59.180 I want to acknowledge that our Builder pattern was developed by three engineers: myself, John Paul Hill, and Evgeni Karpov.
00:27:05.080 I greatly appreciate all the feedback I received on this presentation, which helped improve it.
00:27:12.960 Finally, I want to thank G2 leadership for supporting us in developing this pattern and providing the resources to deliver it to you at Rails Conference.
00:27:19.960 At this point, I have time for questions and answers. Thank you so much!
00:27:30.240 Absolutely. The question was about the GitHub URL. Let's refer back to my slide with the QR code.
00:27:38.160 G2Crowd/query-builders, yes.
00:27:41.820 We indeed wanted to create our own pattern to avoid dependency on anything else. We looked online to see if anyone had implemented this pattern, and to our knowledge, they have not.
00:27:58.160 Therefore, we love owning this for ourselves.
00:28:05.280 The question was why I passed all the nodes to the builder instead of merely selecting the desired nodes.
00:28:14.720 The logic for adding a node needs a location; we've moved that logic downstream to the nodes themselves, particularly into the valid method.
00:28:25.680 This approach liberates our code, allowing it to tell a story without burdening the query class.
00:28:32.080 Did that answer your question?
00:28:39.600 Wonderful, yes, how do we ensure they don't create new query patterns?
00:28:45.500 My answer is to provide them with something they prefer over alternatives. If they discover a better option, I cannot nor should I hinder that.
00:28:54.680 The next question is when do you determine whether to create a new query class versus a different nested node?
00:29:03.640 Often, as you work with specs, you'll discover an explosion of branches that can be too complicated; this signals it's time to split it out.
00:29:10.000 That's the best advice I can offer, although several factors will also come into play.
00:29:17.720 Indeed, there are no performance differences; any abstraction will have some associated trade-offs.
00:29:26.320 How would this work for CTEs? While I don't have a verbal description on hand, I have built queries utilizing this pattern that employ CTEs.
00:29:34.680 Please feel free to reach out to me for more specific examples.
00:29:42.640 Regarding specs for the query class, you can write specifications for the nodes themselves.
00:29:49.340 In practice, you'll find a mixture of integration tests and unit tests will serve well.
00:29:58.100 So, what roadblocks did we encounter while learning and building this pattern? My team faced numerous iterations before honing in on this solution.
00:30:06.720 The reduced pattern we arrived at, while relatively simple, required ample trial and error to find the balance of complexity, performance, and readability.
00:30:14.480 Thank you for all the amazing questions; I'm grateful to see such interest in this topic.
00:30:23.640 Any other questions I may have missed?
00:30:30.000 Yes, certainly, you can change sub-queries.
00:30:34.680 Are you referring to an entire query class?
00:30:41.820 I can't recall a specific instance with that offhand, but I don't see why not.
00:30:47.480 I think we've reached the end. Thank you so much, everyone!