Talks

Lightning talk: How to use Arel? Wrong answers only!

Lightning talk: How to use Arel? Wrong answers only!

by Dávid Halász

In this lightning talk presented by Dávid Halász at Helvetic Ruby 2024, the focus is on the unconventional usage of Arel in Ruby on Rails, specifically highlighting common mistakes when constructing queries.

The speaker, who is affiliated with Redhat and is also a PhD student based in the Czech Republic, aims to educate the audience on how to navigate the complexities of Arel while avoiding pitfalls. The session covers the following key points:

  • Database Model Overview: Halász presents a basic database model illustrating a one-to-many relationship between policies and their associated tailorings. The importance of querying these relationships is emphasized.
  • Query Construction: When attempting to restrict policies by a specific title, the risks of SQL injection are discussed. Although recent versions of Rails allow for safer querying with hashes, changes in semantics might lead to unintended consequences, such as aliasing issues.
  • Complex Models and Scopes: The presentation moves to more complex queries involving stricter policies with compliance thresholds. Halász illustrates the difference between conventional joins and the use of Active Record merges to optimize queries.
  • Alias Handling: A key focus is on aliasing in queries. The speaker highlights that Rails may not correctly generate aliases in some cases, leading to errors in the final query.
  • Arel Visualization: Arel, the SQL AST manager that Rails uses to construct SQL queries, is introduced as a powerful tool. The speaker provides a visual representation of Arel’s output, illustrating how it can become quite intricate and how aliases can be handled correctly or incorrectly.
  • ErrorVisitor Class: The presentation explains how the ErrorVisitor class can traverse query nodes to extract aliases, which is integral to debugging and optimizing query construction. Modifications to these nodes can lead to caching issues, prompting Halász to recommend creating new trees for handling aliases to maintain cleanliness in the codebase.

In conclusion, the talk emphasizes that while Arel can facilitate complex query construction, caution must be exercised to avoid pitfalls associated with aliasing and SQL query efficiency. The main takeaway is that constructing queries directly through Arel may often be the simplest and most effective approach.

The session concluded with applause, reflecting the audience's appreciation of the insights shared.

00:00:04.680 All right, welcome everybody! Thank you for having me for this lightning talk.
00:00:06.600 I work for Redhat and I'm also a PhD student living in the Czech Republic. Today, I would like to talk about how to use Arel the wrong way.
00:00:11.120 So, imagine a database model like this. I added the accounts just because there was an authentication talk; it's not really important. The important part is that you have some policies and some tailorings.
00:00:20.400 One policy can have multiple tailorings, which I think is pretty straightforward: a one-to-end relationship. We want to query the tailorings joined with the policy.
00:00:34.399 Something like this happens: 'SELECT tailorings FROM tailorings INNER JOIN...'. But what if we want to restrict the policies to a certain set, let's say by title? You probably don't want to do this due to the risk of SQL injection, especially if the title is a variable.
00:00:48.920 The latest versions of Rails actually allow you to do this; you can pass in a hash and that hash will be delegated, generating a query like this. However, there is a catch. If you look closer, the semantic has changed—the query has shifted from policies to policy. Rails automatically creates an alias for you, except when it doesn’t...
00:01:09.880 In the first example, it didn't. If we complicate our model a little bit more from the previous example, let's say we add a scope. I will just show a simple example, but you can imagine something more complex.
00:01:26.079 So we have strict policies where the compliance threshold is 100%. How would we query this intuitively? Well, you would do a join and then include a WHERE clause and a subquery. You select all of the strict policies, putting in the IDs, which generates a query with the subquery inside.
00:01:39.600 Of course, this is slower because you have two queries instead of one, even if they are together. However, there is something called Active Record merge, which allows you to merge two scopes, basically taking the where clause from one and putting it to the other.
00:01:49.280 As you can see here, it looks nice, right? But if you look closer, here you have 'policies' and here is 'policy'. It’s not going to work. Rails automatically creates an alias, but sometimes it doesn’t. I really wanted to use this coding style of method calls, so I started to look into what happens under the hood.
00:02:08.960 I took my query, and there is a nice tool where you can call a where clause in your scope, and it gives you a structure like the one illustrated here.
00:02:11.120 If you're ready to take a picture, go ahead! We have time.
00:02:18.000 This generates something like this, which is Arel, and you can use it to build very complicated queries. Another way of visualizing it is shown here; however, what we’re interested in is the table alias: the left side has 'policies' and the right side has 'policy'.
00:02:43.519 If we can traverse that tree, we can extract all the aliases. There’s a class for that called ErrorVisitor, which is also used for generating raw SQL. So whenever you run a query that's not written in raw SQL, it's actually using ErrorVisitor to traverse it and generate the SQL for PostgreSQL or MySQL.
00:03:32.200 When we do the tailoring joins policies, we can create an ErrorVisitor that goes through every node. If the node is a table alias, we can store the alias in a hash that will result in a mapping at the end.
00:04:20.639 This gives us a list of aliases. On the right side with the strict policy, we do the same thing but modify the nodes. This is where it goes wrong—you shouldn’t do that, but I haven’t found anything better. Actually, if you modify the nodes, they are cached.
00:04:34.040 So, you have to create a new tree to avoid messing things up in your codebase. Here, the right side will be something that swaps the aliases. This just creates aliases for all the tables, while this one changes the existing aliases if you have any.
00:04:51.680 In summary, calling this thing with the right side generates the correct query, or more simply, you could just construct your query with Arel. Thank you!