Jeremy Evans

Summarized using AI

Sequel

Jeremy Evans • August 07, 2012 • Earth

In this presentation, Jeremy Evans discusses "Sequel," a powerful database toolkit for Ruby, detailing its evolution and features since its inception. Sequel differentiates itself with a clean design and functionality, aimed at enhancing database interactions for Ruby developers.

Key Points Discussed:

- Introduction to Sequel:

- Sequel is a two-year-old database toolkit used primarily in Ruby for database management, introduced to Evans shortly before he became a maintainer.

- Initial Challenges:

- Upon examining Sequel's code, Evans encountered a mix of good design marred by poor implementation, particularly with method overriding and insufficient documentation.

- Development and Versioning:

- Significant changes were made in the subsequent versions:

- Version 2.0: Introduction of expression filters to simplify advanced filtering.

- Subsequent Versions: Support for new features like prepared statements and additional database adapters.

- Sequel Architecture:

- Sequel is divided into two components: Core and Model. Core facilitates direct interaction with SQL databases, while the Model provides an object-relational mapping layer.

- Database Adapter Support:

- Sequel supports numerous database adapters—13 in total—with a focus on ease of writing adapters (only five methods needed).

- Connection Pooling:

- Introduced a thread-safe connection pool, allowing efficient database access without connection leaks.

- Chained Query Construction:

- Unique functional API enables building queries by method chaining without writing raw SQL, improving readability and composability.

- Data Set Management:

- Data sets behave like collections of rows, allowing methods like 'each' and 'all' to handle records efficiently without caching.

- Table Joining and Graphing:

- Sequel simplifies table joins, implementing graphing to manage joins efficiently by returning rows as hashes of hashes, preventing key clobbering issues.

- Associations and Eager Loading:

- Unlike Active Record, Sequel's associations are straightforward with flexibility. Eager loading options enhance performance when querying related data.

- Validation and Security:

- Sequel emphasizes security features and database constraints. Validations are primarily for aesthetic error messaging, not data integrity.

- Performance and Maintenance:

- Sequel is lighter and faster compared to its counterparts, focusing on continuous improvement and stability with active bug fixes.

- Future of Sequel:

- Discussion of features planned for Sequel 3, which will continue to enhance usability and modular structure through plugin architecture.

In conclusion, Sequel is affirmed to be a robust tool that prioritizes flexibility and user control, equipped with modern features that cater to a variety of database needs while maintaining a lean footprint in the Ruby ecosystem.

Sequel
Jeremy Evans • August 07, 2012 • Earth

Help us caption & translate this video!

http://amara.org/v/FGif/

MountainWest RubyConf 2009

00:00:11.250 Hello, my name is Jeremy Evans and I'm here to talk about Sequel, the database toolkit for Ruby. Sequel has only been around for two years, and I don't know much about the first year as I was only using Sequel for a month before I became a maintainer last March.
00:00:26.710 Until then, I hadn't read much of Sequel's source code, but when I did, what I found distressed me. The basic design was good, but the implementation was messy. Many features were implemented by overwriting methods, there was very little documentation, and the advanced filtering required a parse tree, which was poorly coded.
00:00:44.350 I deprecated some obviously bad ideas and worked on a replacement for the parse tree filters. The expression filters I added in version 2.0 allow users to write advanced filters without needing a parse tree or handwritten SQL. In version 2.2, I removed parse tree support, which allowed me to support Ruby 1.9 and JRuby. In version 2.3, I added prepared statement support, and starting support in version 2.4 was impressive, as it came only two weeks after version 2.3.
00:01:07.420 Since then, many features have been added and many bugs have been fixed. Sequel is now split into two parts: the core and the model. This splits from the DM core model and adds features to the core. Sequel core is a Ruby interface to an SQL database, great for generating reports or dealing with sets of rows instead of single rows. Sequel model, on the other hand, is an object-relational mapper built on top of core. Model classes are built on top of core data sets, allowing users to benefit from core when using model at an instance level.
00:01:51.080 The basic usage of the model depends on core, but core does not depend on the model. It's likely that you could use core data sets to return AR objects if you wanted to. Sequel currently supports 13 database adapters, with some databases supported by multiple adapters. Some adapters support multiple databases, with the best-supported adapters being the native ones.
00:02:01.090 I test Sequel with SQL, PostgreSQL, and SQLite using the native JDBC and Data Objects adapters, all of which have good support. I know users who are successfully using the Firebird ODBC and Oracle adapters, and I have not received bug reports about the others. One reason Sequel supports many adapters is that the adapters are easy to write; only five methods are required, and they range in size from 50 lines for OpenBase to 1,204 for PostgreSQL.
00:02:35.329 The PostgreSQL adapter supports three native drivers, native prepared statements, and even methods to add database triggers and functions, which is why it's so large. Adding those five methods is enough to return results, but it's not enough for full support. The delete and update methods should return the number of rows modified, and the insert method should return the auto-incrementing primary key for a key-value insert. Fully supporting usually takes the most time, as adding support for a new database involves dealing with SQL syntax differences. Sequel makes this as easy as possible, as it's designed for flexibility over performance. You generally accept having to override a few short methods. Running the integration test is the best way to test adapter support.
00:04:08.049 After requiring Sequel, create a database object via the connect method. The convention for an application using a single database is to store the database object in a constant named 'DB.' The main use for this object is to create data set objects for queries, but it also handles transactions. This object can be used to change default settings for data sets and add SQL loggers. The transaction method is the only way to use an SQL transaction; it ensures that all access to the database inside the block uses the same connection.
00:04:37.440 Now, each database object has a separate thread-safe connection pool. Sequel is well designed to use the connection pool. When I became a maintainer, I added a few features, but most of what was in Sequel remained the same. It's designed for high concurrency, where Sequel uses the pool for the least amount of time necessary. Sequel does not check out a connection until the final SQL string is ready to be sent to the database and returns the connection as soon as it is finished iterating over the results. Sequel ensures connections are not leaked, so you don't have to clean up connections manually or use a reaper thread.
00:05:08.240 There's also a single-threaded pool, which is faster if thread safety isn't important. The Sequel data set is the defining object, and it sets Sequel apart from other Ruby database libraries. It uses a functional style API where queries are built by chaining methods and doesn't send the query to the database until you ask it to. It's worth noting the lack of any SQL strings; you can write complex database applications with Sequel without ever writing any actual SQL code. You can use Sequel to construct all your SQL by hand.
00:05:55.390 If you want to request records, you can use 'each' or 'all'. 'Each' yields records as they arrive, and 'all' loads all records first. Some adapters buffer all rows in memory first, but with others, 'each' can work with a million record dataset without loading everything into memory. You can set an arbitrary proc called the 'row proc' to call with a row before yielding, which helps when model instances are returned by data sets. Data sets do not cache records, so calling 'each' twice results in two separate queries.
00:06:39.090 Inserting, updating, and deleting records should be intuitive, but be careful with 'delete' and 'update,' as they affect all rows in the receiver. Remember the order: first 'delete,' then 'update.' You should note the use of symbols for database columns; Sequel generally treats Ruby strings as SQL strings and Ruby symbols as SQL columns. Additionally, Sequel supports simple filters using strings and hashes.
00:07:03.330 You can simply use hashes for both equality and inclusion. Sequel also lets you write your queries as Ruby expressions without using the parse tree by adding methods that Ruby does not define. These examples show that Sequel knows how to implement boolean logic, leading to cleaner looking SQL. The sub-select example shown here illustrates how data sets can be used with other data sets. Even these filters are just the tip of the iceberg; queries can get much more complex than this, and Sequel handles them just fine. Putting your objects directly in your queries feels natural to me, more so than using strings with placeholders. The expression syntax is a choice but is hopefully intuitive.
00:08:34.310 Be aware that one issue with doing this is using bitwise operators in place of logical ones, which often requires extra parentheses. If you perform string manipulation in the database, Sequel can help. It supports concatenating strings and performing LIKE searches, and it also supports full-text searching on some databases. I mentioned that Sequel uses symbols for columns, but it also employs them for tables and schemas. Columns often need to be qualified or aliased; you can specify the qualifier or alias in the symbol itself. A double underscore separates a table from a column, and a triple underscore separates a column from an alias. You have methods that do the same thing and can also use this for schema-qualified tables.
00:09:35.640 Sequel makes joining tables easy and flexible. You need to provide the join type, table, and conditions. The typical foreign key to primary key join uses a hash, with the key being the column symbol in the table being joined, and the value being the column symbol in the current table or the last table joined. You can use complex conditions with an expression filter, and you can also use a using join or a natural join; Sequel has helper methods for common join types. When doing multiple joins in a query, you will often have a condition that refers to a previous table instead of the current table. In that case, you need to qualify that column yourself, as otherwise, Sequel may qualify it incorrectly.
00:10:29.920 Unfortunately, joining tables can create problems when multiple tables have columns with the same names. Since Sequel returns rows as hashes, later keys end up clobbering earlier columns. This results in issues where the has-many and belongs-to associations don't work with joined tables in Active Record. You can only select certain columns or alias columns manually, but that's a pain. Wouldn't it be great if something else did the work and returned something usable back? That's what graphing does. Graphing aliases everything for you and returns rows as a hash of hashes, where the main hash's keys are table symbols and the subhashes have column keys and column values. This is much easier than aliasing everything yourself.
00:11:21.770 I think graphing is one of Sequel's unique features, and it makes dealing with table joins at a row-based level much easier. While graphing is great, there are cases where joining manually is better, mainly for performance reasons. A lot of Sequel models remember that model classes are backed by dataset instances, and model instances represent single rows in the dataset. Sequel doesn't require you to use a plain table for a model; you can use a joined dataset if you want. Generally, using a plain table is best if you want to create, update, or delete rows.
00:12:19.490 You can use an ordered dataset to return all rows in a certain order by default or a filtered dataset to restrict the model to a subset of rows. Sequel didn't have associations when I started using it; people generally just wrote their own instance methods. It's easy to do that in Sequel, but if you want caching, callbacks, reflection, eager loading, and methods to add or remove associated objects, it's much more work. Associations handle all that for you and are created with method names that reflect the database relationship rather than imply ownership.
00:12:58.600 Sequel does not use proxies for associations; the many-to-one association returns either the associated object or nil, and the to-many association method always returns an array. Most uses of a proxy can be handled using the association dataset method, which returns a dataset you can filter, reorder, or otherwise modify. The 'many' to 'many' associations include methods for adding associations, removing associations, and removing all associations. I chose not to use proxies because I think they're more difficult to understand, implement, and reason about. With the association dataset method, there is little need for them, and the add, remove, and remove all instance methods are simple and descriptive.
00:13:40.370 Note that add and remove methods affect only the associations; the remove method does not delete the parent object from the database. Some people complain that the association dataset isn't pretty, but I think the name may not be appealing. However, accessing the dataset should be infrequent. If you find yourself using the dataset a lot with different filters, you should add multiple associations with those filters built-in. Doing so leads to more descriptive code. I added the 'clone' option to make this easier; however, 'clone' is only one of about thirty current association options.
00:14:36.520 Most users will only use a small number of them, but they exist to give users near-complete control over all aspects of associations. All association-defining methods also take a block that yields the dataset that you can modify. If there's a specific option missing, you could use this to make an association that is a union of multiple datasets. Many of these options affect eager loading, which I will discuss soon. Sequel provides the same association callbacks as Active Record and also adds an 'after_load' callback, which I use in an accounting app to determine if the associated entry is a debit or credit to the current account.
00:15:26.130 You can also use the 'extend' option to extend the association dataset with a module. Since Sequel is a toolkit, flexibility is key. The added methods are easy to override; you just override them and call 'super'—no aliasing required. The modification methods are split into public and private parts. The private method starts with an underscore and does the query, while the public method handles caching and callbacks.
00:16:00.809 If you want the setter method to accept a hash in addition to a model object, you override the public method and call 'super.' If you want to set an additional column in the database, you override the private method. Sequel's eager loading gives you a choice to load associations in a separate query with 'eager' or to use a join with 'eager_graph.' Using eager is recommended if your association uses a different database, and if your order or filters use columns in an associated table, you must use eager_graph. Both have the same API and can be used simultaneously.
00:16:53.060 I chose to use two methods because they perform very different things and to leave the choice up to the user. Sequel never attempts to parse SQL, so it wouldn't be able to guess which method is appropriate. Sequel only supports three association types natively, but you can support any other type using the dataset option which takes a proc instance. The 'has_many :through' association uses a filtered 'eager_graph' dataset. The 'has_many :through' and 'belongs_to' association are handled by the built-in many-to-many association.
00:17:49.550 You can also create much more complex associations, such as joining on one of multiple keys through a third table. This is not a hypothetical example made purposely difficult; I use this in one of my apps. Active Record can handle that association using custom SQL, but it can't eager load it. Sequel supports eager loading of custom associations using the eager_loader option, which takes a proc that accepts a key hash and an array of current objects and the dependent associations to eagerly load.
00:18:51.240 The key hash is an optimization; it's a hash of hashes with the keys being columns. The subhashes have column value keys, with the values being arrays of related objects. When breaking down how this example works, first, you get the subhash for the primary key and set cache invoices for all firms to the empty array. Next, you get all invoices for all clients of all the firms and eagerly load them using keys in the subhash. For each invoice, you get the value in the subhash for the invoice's client foreign key, adding that invoice to the firm. This feature is powerful enough to load any association that can be eagerly loaded.
00:19:54.920 Any graph or word option is also available to allow you to set custom joins when loading. Now, Sequel does not support polymorphic associations because I believe they're a bad idea. Their only purpose is to reduce the number of tables, which is a solution to a non-problem. An association between two separate classes should have its own join table; for the same reason, you use separate tables for classes, even if they have the same schema. Polymorphic associations tend to be more complex and break referential integrity.
00:20:46.730 If you must use them, there's a Sequel plug-in that handles them. This plug-in doesn't perform any monkey patching or craziness; it simply uses the options and techniques I've already discussed. Sequel's eager loader is also powerful enough to load all ancestors and descendants in a tree structure with a single call. In my opinion, Sequel has the most powerful and flexible associations of any Ruby ORM.
00:21:49.800 Sequel supports validations similar to Active Record. My philosophy is that they should be used primarily for nice error messages, not for data integrity. If you want data integrity, you should use database constraints. It supports building validations, similar to Active Record, but the difference lies in uniqueness, which uses an array instead of relying on scope. It also supports ones you write yourself, which can also take similar options to the generic ones.
00:22:55.740 That's about all I want to discuss regarding validations. Sequel supports most of the same hooks as Active Record, and the usage is currently similar. Again, I prefer database triggers to hooks if you're doing something involving data integrity. However, if you're targeting multiple databases, using hooks and validations is probably easier. Sequel has built-in pagination support, which is really just a wrapper around limit that adds a few methods. It's helpful if you're writing a search engine; each page yields a paginated dataset, a paginated version of the current dataset.
00:23:40.165 If you use an adapter that buffers all records in memory before yielding, it can help process a record set that won't fit in memory. Sequel's built-in caching supports caching any model object with an API of this name is Ruby memcache. You can use Sequel to add, create, and alter tables. Create table accepts table schemas in blocks, and unlike Active Record, you have to be explicit if you want a primary key. You can use the column method or method_missing to create columns, and you can use the index method to create indexes. If you use Ruby classes' types, Sequel will use the most appropriate database type for that class.
00:24:45.490 If you use a symbol, Sequel will just use it directly. Sequel encourages the use of database constraints and allows you to create them using check or constraint methods. Sequel also encourages real foreign key references, with foreign keys taking a table argument. You can also use composite keys if you want. Alter table is similar; you can add or drop columns, constraints, and indexes. Adding foreign and primary keys is straightforward. You can also use composite keys here. You can exchange column names, types, or defaults; most table-altering methods are called directly on the database object.
00:25:50.090 The main difference between Active Record and Sequel migrations is the use of instance methods instead of class methods. There is an individual migration API, but it is recommended to use Sequel's migrator, which deals with the directory of individual migration files, similar to how Rails does. Sequel's command-line tool, called Sequel, can be called either with a connection string or a path to a YAML file as an argument. It provides an IRB shell with a database object in 'DB' already defined, and you can also use it to run migrations like I mentioned earlier.
00:26:59.230 Sequel isn't really designed around the needs of web applications, but it does function well with them. Sequel's default settings are very strict; I'm a big proponent of security, so most people won't turn off some of them. For instance, 'raise_on_typecast_failure' will raise exceptions before validations are called unless you explicitly turn it off. You can set a validation called 'invalidates_not_string' to check that non-string columns don’t have string values.
00:27:36.480 If you have a string value in a non-string column and you have typecasting enabled, raising on typecast failure will show that typecasting failed, hence adding a validation error. You can turn off 'raise_on_type_failure' if you don’t want to use begin and rescue to catch save failures. You can also turn off strict parameter assignment exceptions, but that makes debugging more difficult. So, I recommend caution before doing that.
00:28:00.880 Sequel offers multiple methods for mass assignment; 'set_only' is the recommended one as it only allows access to attributes you specify. This approach is better than setting allowed columns at a model level since you can decide which attributes are allowed in each call. Some forms may allow you to set some attributes but not others, so ensure that no form submission can update more attributes than are available on the form.
00:28:41.760 For consistency, there's also a 'set_restricted' method that works as expected, but I don't recommend it. You can use symbols like you do in Active Record and set allowed or restricted columns at the model level. If your application is simple, that may be the easiest way to go. Overall, Sequel is lightweight compared to Active Record; Sequel core is 63% smaller than DM core, and all of Sequel is 33% smaller. All of Sequel is 3.5 times smaller than DataMapper and 2.5 times smaller than Active Record. Sequel loads as fast as DM core and is 2.5 times faster than Active Record.
00:29:24.790 From the benchmarks I've seen, Sequel's performance hasn't even been a focus during my maintenance; I mainly worked on new features, bug fixes, and cleaning the internals. Sequel is currently at version 2.11, and only a year ago it was at 1.3. There was a release every two to three months for a while, but now I stick to a strict monthly release cycle and take bug reports very seriously.
00:30:12.790 Bugs posted on the bug tracker get fixed within a day to a week. I actually try to fix bugs that don't affect me personally, and you don't even have to provide specs of the bugs; it's obvious. The Sequel bug tracker is usually empty now. Before pushing to GitHub, I run the full release test suite, so the master branch is as stable as it can be. If you use one of the databases I test, I run all my personal apps directly off the master branch.
00:30:56.120 I write pretty descriptive commit messages, so if you're pulling from there, be sure to read them to see any changes in features, as they can impact your app. It's very easy to contribute to Sequel; there's no bureaucracy or need for 'Plus Ones'—just show me the code and tell me why it's a good idea. I accept patches via IRC, the Google group, the bug tracker, or GitHub.
00:31:14.550 I don't accept all patches, and I modify many, but you will get a response quickly. I prefer specs with new features, but I will write my own specs if I think a feature is a good idea. I might even rewrite the implementation if I like the feature but not the patch. The future is coming, and it is Sequel 3.
00:31:48.550 As much as possible, it's going to be compatible with Sequel 2, just like the move from 1.x to 2.x. I'll be deprecating some features in 2.12 and removing them in 3.0. Many built-in features will be moved into plugins or extensions that will still ship with Sequel but will need to be required separately. The majority of this work is already completed and available in the master branch.
00:32:22.390 Sequel 3 will be more lightweight, easier to use, and cleaner internally than Sequel 2. Now to touch on something I haven't mentioned: models are moving to a complete plugin structure. Even the basic model functionality will be a plug-in model, and associations will also be a plug-in. You can have plugins that work at a level for all models and override any method you want; you can just call 'super.' The only method in the Sequel model that is not in a plug-in is the plug-in method, which handles plugins.
00:32:51.840 That concludes my presentation, and I will now open it to the floor for questions.
00:26:30.310 If you have questions about how this all works, feel free to ask!
Explore all talks recorded at MountainWest RubyConf 2009
+3