Database Performance

Summarized using AI

Sequel: When ActiveRecord is not enough

Janko Marohnić • September 16, 2020 • online

In the talk "Sequel: When ActiveRecord is not enough," Janko Marohnic discusses the limitations of ActiveRecord, a commonly used ORM in Ruby on Rails, and introduces Sequel as an alternative ORM that addresses these limitations.

Key points include:
- Complex Queries: While ActiveRecord is user-friendly for common tasks, it can struggle with complex SQL queries that require raw SQL workarounds. Sequel offers a more straightforward approach to SQL.
- Connection Management: ActiveRecord establishes connections lazily, which can lead to unpredictable behavior, while Sequel establishes connections immediately, leading to greater reliability.
- Query Handling: ActiveRecord caches query results while Sequel uses frozen data sets, requiring users to consciously decide when to execute queries.
- Model and Data Set Decoupling: Sequel separates models and datasets, allowing users to utilize one without the other. This decoupling enhances flexibility.
- Plugin System: Sequel employs a plugin system for adding features, keeping the base layer lightweight and reducing complexity. In contrast, ActiveRecord comes with various features that may remain unused.
- Expression API: Sequel's expression API is streamlined and reduces complexity compared to ActiveRecord’s Arel API, which can be verbose and is less documented.
- Eager Loading: Sequel has a clearer approach to eager loading with fewer methods that are easier to understand compared to ActiveRecord’s options.
- Performance: Benchmarks show that Sequel outperforms ActiveRecord, with lower memory usage and faster execution times.
- Connection Pool Management: Sequel handles connection allocations more efficiently than ActiveRecord, which can struggle when thread count exceeds available connections.
- Integration with Rails: Janko outlines how to integrate Sequel with existing Rails applications, suggesting a bridge gem that allows for SQL usage alongside ActiveRecord.

In summary, Sequel offers a lighter and faster alternative to ActiveRecord, especially suitable for complex SQL operations. It maintains robust PostgreSQL support and simplifies the ORM design, making it a worthy consideration for Ruby developers seeking more advanced database interaction capabilities.

Sequel: When ActiveRecord is not enough
Janko Marohnić • September 16, 2020 • online

As Rails developers, using ActiveRecord as our database library is a given. As our application grows, we start noticing that some of our new requirements aren't exactly supported by the ActiveRecord API. So, we pull up our sleeves and write that raw SQL snippet, re-create some queries without models to get that extra performance gain, or work around that bug in the library, and move on. But gaining this technical debt isn't necessary if you're using a library that supports your advanced use cases. Enter Sequel, an alternative ORM for Ruby. In this talk we will show some advantages that Sequel has over Active Record, covering topics like the query API, building SQL expressions (Arel), connection pools, immutability and more.

rubyday 2020 - Virtual edition, September 16th 2020. https://2020.rubyday.it/

Next edition: https://2021.rubyday.it/

rubyday 2020

00:00:48 Um, the next speaker is Janko Marohnic. In his career as an evangelist for Ruby off the Rails, he has been looking at how we can overcome some of the issues that we usually encounter when using Active Record. Active Record is great—it's part of Rails and provides us with an interface that is easy to use. However, sometimes, when you need to push a little bit more and create more complex queries or perform advanced SQL tasks, you have to go around the interface of the library and sometimes write raw SQL. This can be an interesting exercise. To avoid this situation, you can use other ORMs. The one Janko will talk to us about today is called Sequel.
00:01:55 Hello, Janko! Nice to see you. Hello, Monica and everyone. Welcome. Thank you.
00:02:20 This will be a live talk, so Janko will walk us through SQL. Remember to ask questions. I will be there looking and listening. In the meantime, I leave you on stage. Thank you so much, and see you in a while. Thank you, Monica.
00:02:45 Um, so as Monica announced, I will be talking about SQL. For me, it has been my ORM of choice for quite some time now, and I feel like I have leveled up in interacting with SQL databases in Ruby this way. I want to share some of its features with you today so that you can learn from them and also evaluate whether you want to use SQL for your specific needs.
00:03:15 A little bit about me: My name is Janko Marohnic. You can find me as Janko on GitHub or as JankoMarohnic on Twitter. I was born in Croatia, so I am Croatian by nationality, but for the past few years, I have been living in the Czech Republic. As Monica mentioned, I like to evangelize my generic Ruby solutions in my free time by writing a lot of articles on my blog and occasionally giving talks. I have also authored many Ruby gems, the most popular being Shrine, which is a file attachment toolkit for Ruby. In my career, I estimate I have about three years of experience using SQL in production.
00:03:51 A little bit about SQL: it was created in 2007 by Sharon Rosner and has been maintained by Jerry Evans since 2008.
00:04:02 For those of you who don't know who he is, he was a Ruby hero, I think, in 2015. He is the author of numerous Ruby gems and is a very well-regarded contributor. He committed to SQL as of one or two years ago.
00:04:22 To summarize, SQL is quite similar to Active Record in that it also implements the Active Record design pattern for accessing data, and it powers DRB SQL, which is the SQL backend for the DRB RPG.
00:04:43 This is an example of Active Record code that establishes the connection, and when we write the equivalent SQL code, you can see that it is quite similar. SQL also has a notion of models, and it provides a pretty similar interface. So, if you are using Active Record, you should be quite familiar with SQL as well.
00:05:20 Throughout the slides, I will be comparing Active Record and SQL a lot. For Active Record, I will use a blue color cue, and for SQL, I will use a teal color.
00:05:56 Let's start with the design, more specifically, the connection design. Usually, when you open the Rails console and inspect a SQL or an Active Record model, you often see the message that the connection has not yet been established. To me, this is a bit surprising considering that we called a method called establish_connection beforehand.
00:06:26 In fact, Active Record will establish the connection when the connection object is retrieved as part of a query or similar action. So, it is kind of lazy in that sense. You may also observe that it doesn't immediately define the column methods on the models; instead, it will actually define them only when you initialize the first model instance. For me, this is a big red flag because defining methods at runtime can lead to unpredictable behavior.
00:07:05 In contrast, when you look at the equivalent SQL code, the connection to the database is established at the moment you call SQL.connect, and that code will fail if the database configuration is incorrect. Furthermore, when you create a SQL model, the column methods are immediately defined on it. SQL's interface and API are also more explicit.
00:07:58 This is one typical example of Active Record code. You would typically define a query controller, and the query would get executed in views. I believe that all data should be loaded at the controller level.
00:08:38 Views should only work with loaded data. This is common practice because Active Record relation objects act as arrays and relations simultaneously. When records are loaded for the first time, the next time they are called, it will return cached results. In SQL, it is the opposite: SQL data sets are frozen, meaning executing any method on it will yield the same result no matter how often it's called.
00:09:22 In my opinion, this forces you to decide when you want to execute the query—either chaining more methods or actually loading the records from the database.
00:09:57 Regarding associations, Active Record's main method will return a collection proxy object, which defines many methods for retrieving the records. In contrast, SQL is much more explicit. The main method for retrieving an association will directly load the associated records from the database. If you want to chain further queries on the data set, you can call the _dataset method.
00:10:41 Therefore, you have to decide on both fronts: whether to load records or to apply additional query conditions.
00:11:28 The SQL design is more layered. In Active Record, you can access almost everything through the Active Record base class. Whereas in SQL, you need to use a database object returned by the SQL.connect method and perform any action that isn't directly related to models and is related to the database in general.
00:11:56 This also means SQL trivially supports multiple database connections by calling SQL.connect multiple times and assigning them to different variables with distinct constants.
00:12:26 As for how the database objects tie into models: if you simply subclass the SQL model, it will take the first connected database object. If necessary, you can also specify the database object explicitly when defining the model, thus allowing you to assign different models to different databases.
00:12:59 In SQL, the query interface is typically accomplished through models, but you can also drop down to the dataset level, where loading records will yield Ruby hashes instead of other instances.
00:13:40 This demonstrates that in SQL, data sets and models are decoupled from one another. You can use one without the other or use data sets without models.
00:14:23 I also appreciate how SQL handles complexity. ORMs that provide multiple features can grow excessively complex. If you inspect an Active Record base class, you will see it adds behaviors for many features by default, many of which you may never use. SQL, on the other hand, takes a simpler approach by providing only essential behaviors by default.
00:15:11 For any additional features, you would load plugins. This plugin system extends SQL's core classes with modules to augment their behavior. For instance, if you want SQL models to have dirty tracking, you would load the dirty plugin; for serialization, you would require the serialization plugin.
00:15:52 If you want to opt-in for single table inheritance, you can do that as well. A distinct advantage of this approach is that if you don't use single table inheritance, which will be the case for most models, you are free to use the type column, something I stumbled over many times as a beginner.
00:16:34 In short, SQL loads much less code by default, resulting in less complexity overall.
00:17:09 Now let's proceed to feature comparison. One of the most intriguing aspects is SQL expressions. In Active Record, simpler queries can be expressed using Ruby hashes or symbols, but when more complex expressions are required, developers typically resort to raw SQL strings.
00:17:57 SQL provides a simpler hash-based expression API, and when more complex structures are needed, string literals can still be used. SQL also has a very nice expression API that, for example, employs operators in some cases rather than method calls, avoiding the necessity to nest expressions deeply within parentheses.
00:18:39 If the expression API seems overly verbose, you can use SQL's virtual block syntax, where you pass a block to a query method. SQL then uses method_missing to instantiate identifiers and functional objects, allowing you to call methods on them as usual.
00:19:43 While Arel mostly operates within the context of models, SQL expressions can be used virtually anywhere. For instance, here, we can define a unique index directly and specify conditions for where the index will be applied. SQL's expression API is consistent, allowing us to perform actions like creating views using Ruby expressions.
00:20:45 In my experience, Arel tends to be more verbose, whereas SQL's expression API is more streamlined and efficient.
00:21:25 Active Record's Arel is also considered an internal API of Active Record and is not well documented in the API docs, while SQL's expression API is a public one, and its documentation is quite comprehensive.
00:22:11 Additionally, as mentioned earlier, while Arel tends to work in the context of models, SQL's expression API is much more general and can be utilized in a wide range of contexts.
00:22:56 Next, let's discuss eager loading. Active Record provides three methods for eager loading: preload, eager_load, and includes. Preload will make a separate query, while eager_load will use a left outer join. The includes method will default to making separate queries but can also be adjusted to utilize a left outer join if needed.
00:23:38 In SQL, there are only two methods for eager loading, and it's quite clear which does what. The methods are simply named eager and eager_graph, which gives you an immediate understanding of their functionality.
00:24:22 SQL's eager loading API also supports adding additional filters or extending the data set you are loading. In this example, we are loading associated movie reviews, but only those that have been approved. This allows for customization without having to create separate association definitions.
00:25:03 Additionally, SQL's eager loading permits limited associations, a feature that may seem trivial to implement. You might think to apply a limit to a data set, but that doesn’t quite work out; instead, SQL provides various approaches using union or union all to implement this functionality.
00:26:01 One feature I appreciate about SQL is its outstanding support for PostgreSQL-specific features. My favorite is the JSONB support, which allows me to use Ruby instead of cryptic PostgreSQL operators to perform JSON queries, making it much more convenient.
00:27:06 SQL also supports built-in cursors for cursor fetching, which is faster than the typical limit offset pagination. Moreover, if you want something even faster, you can use the SQL PG extension, which offers a stream method that's quicker than cursors.
00:27:58 Generally, the page_each method on models, which is the equivalent of Active Record's find_each, will utilize the fastest available solution.
00:28:45 SQL includes commands for copying data into and out of the database, which is usually the fastest way to import or export data from PostgreSQL. It has numerous other features, encompassing sharding, primary replicas, notifications, common table expressions, prepared statements, window functions, and more.
00:29:35 Now, let's touch upon performance, particularly model performance. I have created benchmarks that compare SQL's execution time against Active Record while fetching 1000 records from a database.
00:30:13 When measuring execution time, SQL is significantly faster by default; in PostgreSQL, the speed advantage can be as much as twofold.
00:30:50 The results are also evident in memory allocations: SQL will allocate nearly three times fewer objects than Active Record. However, Active Record does have its nuances; it may appear to cheat a bit in benchmarks by optimizing how it loads attributes or column values.
00:31:34 When rerunning the same benchmarks, you will see that Active Record now takes considerably longer and allocates significantly more objects. This is due to Active Record's attempt to optimize loading—we're trying to fetch only needed values, while SQL loads everything as soon as a record is fetched.
00:32:21 In my view, this approach localizes where I'm loading data, allowing me to isolate the performance of serializers, for example, without interference.
00:32:53 Another noteworthy point is that, if we load most of the features Active Record loads by default into SQL, it can drastically affect the performance of fetching records. Keep in mind, while we are not directly using any of these features, they still introduce performance overhead.
00:33:44 What's great about SQL is that even when your instances are smaller and doing more, you can run performance-sensitive operations like CSV exports without any overhead. You can opt-out of model overhead by calling naked on the dataset, returning plain Ruby hashes instead of model instances.
00:34:25 Another topic I'd like to touch on is the connection pool. Let's say we have configured Active Record using default settings, a pool size of five and a checkout timeout of five seconds. In this example, we spawn threads equal to the pool size plus one.
00:35:06 In theory, this code should run without issues, as once a thread is done using a connection, it should become available for other threads. However, this is not the case.
00:35:43 Active Record tends to assign checked-out connections to threads longer than anticipated, which can lead to connection pool exhaustion.
00:36:10 The solution here in Active Record is to manually check out the connection using the with_connection block, ensuring that the connection is returned to the pool once it's finished. This is commonly recommended for Sidekiq users, although I consider it a bit of a hack; ideally, Active Record should handle this automatically.
00:36:51 In contrast, running the corresponding SQL code will proceed without issues, even if there are more threads than available connections, as SQL effectively manages connection allocations.
00:37:28 Now, I may have piqued your interest in SQL, but an important question is how to implement SQL in Rails. Typically, when starting a new project, you will likely be using Active Record.
00:38:09 You may want the ability to experiment with SQL alongside your existing Active Record setup. Historically, this was somewhat problematic since SQL requires its own database connection, increasing the total connections to your database.
00:38:40 To solve this, I created the SQL Active Record connection gem, providing a bridge that allows SQL to reuse Active Record's connection, enabling you to utilize SQL within your Active Record project without performance penalties.
00:39:24 However, you might also wish to use SQL to replace Active Record in scenarios where Active Record isn’t sufficient. I recommend the SQL register, which includes all the conveniences we expect from Active Record, such as database migration tasks, generators, and Rails integration.
00:39:58 To work with SQL models and Rails form helpers, simply load the built-in Active Model plugin, making simple models behave similarly to Active Model objects. This also works with Rails form helpers.
00:40:43 Additionally, you might need to replace some gems that assume Active Record with alternatives that are more inclusive.
00:41:30 While you won’t find direct replacements for everything, the number of viable alternatives continues to grow in both quantity and quality.
00:42:20 In summary, SQL has a unique design that effectively distinguishes between database objects, data sets, and models.
00:43:09 It manages complexity well with its plugin architecture, making building SQL expressions feel polished.
00:43:29 It provides flexible eager loading options and many advanced SQL features—particularly robust support for PostgreSQL.
00:43:53 Overall, SQL is considerably faster and lighter than Active Record, and it has a much more reliable connection pool management.
00:44:07 That concludes my presentation. Thank you!
00:44:32 Thank you. For the audience, can you hear me? Or is it as terrible as before? I hope not! No, seriously, this has been great.
00:45:00 We have two questions, but you already answered them in your slides. So, we have the same issues as before.
00:45:24 Let me try with another mic, aka the default one.
00:45:48 What about now? Can you hear me better? Perfect! Much better. So, whatever happened, happened. We had two questions that you answered already.
00:46:24 So, we have another question from Slobodan asking, do you have any experience with SQL and Roda instead of SQL and Rails?
00:46:56 Specifically for Roda, I have some projects where I used it. That is my preferred web framework, and ideally, I would use it at work as well. The closest I used was Cuba, which is a fork of Roda, and it worked quite well with SQL.
00:48:04 So yes, I believe modern SQL will integrate nicely with Roda since they come from the same author.
00:48:36 There’s another question: how battle-tested is SQL? Do you know how many companies besides yours are using it?
00:49:14 I know that I built projects for other companies using SQL. For example, I worked on a personal project for Apple, where we also used SQL. I also worked at Clearbit, a business intelligence platform that significantly benefited from SQL.
00:50:19 I know for a fact that Heroku utilizes SQL as well. When they migrated from Rails, they began building microservices using SQL, and it worked out excellently for them.
00:51:02 Based on that, if it works for Heroku, it's likely well-tested!
00:51:25 Thank you so much! Looking forward to seeing you in the chat and the hallway track later. Thank you.
00:51:50 It was a pleasure being here, and I’ll see you at the conference.
Explore all talks recorded at rubyday 2020
+2