00:00:10.180
Hi everyone, let's go ahead and get started. This talk is about the features in Rails 5 that you haven't heard about.
00:00:16.730
My alternate title was 'Sean dramatically reads the changelog', but I don’t have a great dramatic voice.
00:00:26.360
So instead, this will be 'Sean undramatically reads the changelog'. Actually, the main focus of this talk is less about the features themselves and more about the stories that went into implementing them.
00:00:39.500
We’re going to focus on the smaller quality of life changes. Those are the things that excite me every time Rails updates.
00:00:47.899
Once I get started, I’ll be fine. I’ll just be pressing buttons on the keyboard anyway.
00:00:58.820
What excites me with every version of Rails is the smaller quality of life features. These are things that aren't going to fundamentally change how I build applications but will make my life a little bit easier.
00:01:06.710
These features include smaller tools, new protections, and performance improvements. Many of them have very interesting stories behind their addition or, in some cases, why they were not added.
00:01:12.260
That’s what we’re going to talk about today. My name is Sean Griffin, and I work full-time on open-source. I’m a committer on Ruby on Rails, and Shopify sponsors me to work on open-source full time.
00:01:24.080
So, thank you, Shopify. As I mentioned, every feature has a story about when it was implemented, why it was added, and, in cases like 'or' on relations, why it wasn't added sooner.
00:01:30.110
The first feature I’d like to talk about is one I’ve discussed a lot in the past: the attributes API. This one is particularly interesting to include in a discussion about Rails 5 because most of the work that went into implementing this feature was actually done in Rails 4.
00:01:42.590
In Rails 4, we revamped the entirety of how we handle type coercion in Active Record in preparation for this public API. The API was 90% implemented in version 4.2 with a few edge cases, and I’ve talked at length about this before.
00:01:58.009
However, I don’t think I’ve ever told the story of why this feature was added. There were many contributing factors, but one project really stood out as the reason we needed this change.
00:02:12.650
This was something I worked on while at Thoughtbot, where the project required all data to be encrypted at rest, including database credentials. So, if you had access to the database, you still needed to be unable to read anything. We initially used a gem called 'attr_encrypted'.
00:02:39.820
This is what using that gem looks like: you call 'attr_encrypted' and provide the name of an attribute, and it will assume that the database column is called 'encrypted_' plus that attribute name. It defines a reader and a writer for the unencrypted form and performs the encryption in Ruby.
00:03:06.590
However, the problem with this approach is that it doesn't work with methods like 'where' or 'find_by'. There was a feature at the time that has since been removed, which provided an escape hatch, but it was defined in a way that we don't write Rails applications anymore.
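To make the problem concrete, here is a minimal plain-Ruby sketch of the pattern, not the gem's actual code. `ToyCipher`, `ToyRecord`, `encrypted_accessor`, and `rows_where` are hypothetical names, the storage column is passed explicitly rather than inferred, and Base64 stands in for real encryption purely for illustration.

```ruby
# Toy stand-in for a cipher: Base64 via Array#pack. This is NOT encryption;
# it only makes the stored value differ from the plain value.
module ToyCipher
  def self.encrypt(plain)
    [plain].pack("m0")
  end

  def self.decrypt(stored)
    stored.unpack1("m0")
  end
end

class ToyRecord
  attr_reader :columns

  def initialize
    @columns = {}
  end

  # Hypothetical macro: defines a plain-text reader and writer on top of
  # an explicitly named storage column.
  def self.encrypted_accessor(name, column:)
    define_method(name) do
      stored = @columns[column]
      stored && ToyCipher.decrypt(stored)
    end

    define_method("#{name}=") do |plain|
      @columns[column] = plain && ToyCipher.encrypt(plain)
    end
  end
end

class Patient < ToyRecord
  encrypted_accessor :ssn, column: :ssn_ciphertext
end

# A naive query layer compares raw column values, the way SQL would.
def rows_where(records, conditions)
  records.select do |record|
    conditions.all? { |column, value| record.columns[column] == value }
  end
end

patient = Patient.new
patient.ssn = "123-45-6789"
patient.ssn                                  # reader decrypts in Ruby
rows_where([patient], ssn: "123-45-6789")    # finds nothing: no such column
```

The reader and writer work, but the query layer only ever sees the stored column, so every lookup has to know about the encrypted form.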
00:03:18.830
Using this gem would mean that our entire application had to adjust to accommodate this encrypted field. We ended up using gems like Ransack for complex search queries and had to write a significant hack using the pgcrypto extension for PostgreSQL.
00:03:40.430
This hack involved traversing the entire Arel AST looking for any binary nodes where the left side was an encrypted attribute and replacing it with the corresponding encryption or decryption methods. It turned out to be some of the dirtiest code I've ever written.
00:04:04.100
Around that time, Ernie Miller was giving a talk at RailsConf about some of the intricacies of Active Record, and he mentioned that it would be really nice if Active Record defined the accessor methods to match the database schema by calling some public API automatically.
00:04:21.620
I agreed, as it resembled what I had been looking to do for quite some time. Fast forward eight months, I got into Rails to start adding this feature and vastly underestimated the amount of work involved.
00:04:36.510
We quickly put together a working implementation of the API, but I was bothered that type casting in Rails 4.1 happened by grabbing the column object associated with that field and calling a typecast method.
00:04:50.930
In Rails 4, types were basically represented as symbols when loading from the database. We looked at the SQL string representing the type in the database and created a simplified type from that, which was ultimately represented as a symbol.
00:05:08.240
The initial implementation got rid of the case statement, replacing it with objects that we could inject. However, typecasting continued to utilize columns in 4.2, and in the earliest implementations, when you called the attributes API, we created a new column object that was stored in the columns hash.
00:05:30.020
It was particularly concerning when used for something not backed by a database field because there wouldn’t be a column involved. Much of the confusion stemmed from conflating columns and attributes. The initial creation of the API was somewhat straightforward, but it took another eight months to arrive at an implementation I was truly satisfied with.
00:05:49.580
This required rewriting nearly every part of Active Record, aside from the association code, that was touched by this change. We finally reached a point with a clean and simple implementation.
00:06:08.840
The final API looks like this: you call the class macro 'attribute', provide the name of the attribute, and then give it a type object. I like to provide examples passing an explicit type object, especially when using constructor arguments.
00:06:20.850
This was important for what I humorously refer to as my overly-engineered, extremely complicated replacement for 'attr_accessor.' I aimed to work with objects, as this clarifies the semantics of things like dependency injection.
00:06:37.740
This API is universal, ensuring that we can retrieve type information from a column when needed, which is somewhat how it worked in Rails 4.1. I aimed for the code to be structured to prevent users from mistakenly using the column object.
00:06:57.600
The big benefit of the attribute API is its compatibility with methods like 'where'. For instance, if you have a 'price' column represented as a money object, you can use 'where' with it seamlessly.
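As a rough illustration of why a type object makes this possible, here is a simplified plain-Ruby model of the idea, not Active Record's implementation. The method names `cast` and `serialize` mirror the ones on Active Record's type objects; `Money`, `MoneyType`, `ToyModel`, and the array-based `where` are hypothetical stand-ins.

```ruby
Money = Struct.new(:cents)

# Type object: turns user input into a Money, and turns a Money back into
# the primitive that would be stored.
class MoneyType
  def cast(value)
    case value
    when Money then value
    when Integer then Money.new(value)
    when String then Money.new((value.delete("$").to_f * 100).round)
    end
  end

  def serialize(value)
    money = cast(value)
    money && money.cents
  end
end

class ToyModel
  def self.attribute_types
    @attribute_types ||= {}
  end

  # Class macro: registers the type and defines a reader and a casting writer.
  def self.attribute(name, type)
    attribute_types[name] = type
    define_method(name) { @attributes[name] }
    define_method("#{name}=") { |value| @attributes[name] = type.cast(value) }
  end

  # Toy query: both sides go through the attribute's type, so the caller
  # can pass a Money, an Integer, or a String.
  def self.where(records, conditions)
    records.select do |record|
      conditions.all? do |name, value|
        type = attribute_types.fetch(name)
        type.serialize(record.public_send(name)) == type.serialize(value)
      end
    end
  end

  def initialize(attributes = {})
    @attributes = {}
    attributes.each { |name, value| public_send("#{name}=", value) }
  end
end

class Product < ToyModel
  attribute :price, MoneyType.new
end

cheap = Product.new(price: "$10.00")
pricey = Product.new(price: 2500)
Product.where([cheap, pricey], price: Money.new(1000)) # matches only `cheap`
```

Because the same type object handles both assignment and querying, the model and the query layer cannot disagree about how a value is represented.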
00:07:14.370
The schema inference code looks like what we described: we loop through all our columns and call the public API to define attributes. The 'define_attribute' method functions similarly, but it is stricter than the 'attribute' method, which is lazy as it waits for the column definitions.
00:07:34.480
The lengthy development time for this feature occurred because I didn’t want to ship it until I was content with the implementation. Specifically, I felt that modifying the column objects was technically visible through public API.
00:07:53.420
Now, we are in Rails 5, and this feature is finally available. This was instrumental in my journey into open source, reaffirming the need for another full-time contributor to Rails. It holds a special place in my heart.
00:08:09.710
I've discussed this topic extensively in the past, and I don't want to dwell on it too long. Now, let’s move on to the single most frequently requested feature since Rails 3.
00:08:21.020
I’ve seen it proposed countless times, and I'm sincerely sorry it took this long to get introduced: Active Support’s left pad method!
00:08:35.210
Finally, we provide a way to add padding to the left side of a string. We feel this feature is so important that we are shipping it as a separate gem independent of Rails.
00:08:45.260
We genuinely hope to have everyone in the Ruby ecosystem depend on this great new feature... sorry, I might be getting ahead of myself!
00:09:00.250
But seriously, this is probably the most requested Active Record feature, and it is finally here: six years after relations were introduced, you can now add an 'or' expression to your where clause.
00:09:14.920
With such a public response to this feature, it begs the question: what took so long? Why didn’t we do this sooner?
00:09:24.750
Many reasons contributed to this delay, but the most significant is that the API is considerably less obvious than one might think.
00:09:41.030
One thing people often forget when dealing with open source projects is that it's extremely difficult to change or remove an API once it has been introduced.
00:09:57.130
We can't just implement something ‘good enough’ as a temporary patch unless we are confident that it won't become an issue later on.
00:10:12.460
We needed to be sure that we were shipping the right solution. The API we released is a method on a relation that takes another relation as its argument.
00:10:29.480
This is the intended use case: if we have two named scopes, one called 'recent' and another called 'pinned', and we want anything that is either recent or pinned to appear on the front page of our blog.
00:10:41.080
You would then create another scope defined as 'recent.or(pinned)'. This specific design was made to optimize for named scopes.
00:10:54.950
We wanted to allow you to reuse named scopes since, if something appears as half of an 'or', it is likely to be used independently of the 'or' as well.
00:11:07.240
We really aimed to optimize for compositional abstraction, offering developers a tool that allows writing reusable, easily changeable code.
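The compositional shape of this design can be sketched in plain Ruby. This is a toy model that filters an array with predicates instead of building SQL; `ToyRelation`, `PostScopes`, and `select_from` are hypothetical names, and only the `relation.or(other_relation)` shape is taken from the talk.

```ruby
Post = Struct.new(:title, :age_in_days, :pinned)

# Toy relation: wraps a predicate instead of building SQL.
class ToyRelation
  def initialize(&predicate)
    @predicate = predicate
  end

  def matches?(record)
    @predicate.call(record)
  end

  # Narrow: this relation and the extra condition must both match.
  def where(&condition)
    ToyRelation.new { |record| matches?(record) && condition.call(record) }
  end

  # Widen: either relation may match. Takes a whole relation, not a hash.
  def or(other)
    ToyRelation.new { |record| matches?(record) || other.matches?(record) }
  end

  def select_from(records)
    records.select { |record| matches?(record) }
  end
end

# Each "scope" is usable on its own and as half of an `or`.
module PostScopes
  def self.all
    ToyRelation.new { |_post| true }
  end

  def self.recent
    all.where { |post| post.age_in_days < 7 }
  end

  def self.pinned
    all.where { |post| post.pinned }
  end

  def self.front_page
    recent.or(pinned)
  end
end

posts = [
  Post.new("Old news", 30, false),
  Post.new("Release notes", 2, false),
  Post.new("Welcome", 90, true)
]
PostScopes.front_page.select_from(posts).map(&:title)
# => ["Release notes", "Welcome"]
```

Since `or` takes a relation, `front_page` is built entirely from the two existing scopes, and changing the definition of `recent` changes the front page with it.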
00:11:22.060
However, there were several other proposals for this API, and this was likely the most common suggestion: simply using 'or' with a hash that behaves exactly like 'where'.
00:11:40.950
Yet, this introduced significant issues. For example, 'or' does not imply 'where' at all, and it's not the only place in SQL where it can appear.
00:11:55.370
It could also appear inside a 'having' clause. Since the relation exists in the realm of set theory, it would be reasonable to think that 'or' meant a union query.
00:12:07.900
Another common proposal was to make it 'where.or'. We ultimately decided against this as it wouldn’t meet our goals for the API, specifically because it would not allow easy reuse of scopes.
00:12:19.460
There are drawbacks to our approach, particularly outside of named scopes. If you’re constructing the 'or' ad hoc in a controller, you have to repeat the class name.
00:12:29.830
This gets even trickier when you're not using a named scope and need to perform a one-off query.
00:12:39.220
Ultimately, we encourage developers to avoid ad hoc one-off queries and instead define named scopes for better code reuse.
00:12:51.430
Now, let’s discuss a different topic: how many of you have ever set your database URL to your production database and then accidentally dropped it?
00:13:07.320
This happens all the time, usually when using tools to pull down environment variables from your production server, forgetting to unset the database URL before running tests.
00:13:26.750
If you're running tests with Capybara and using a server that runs in another thread, running database cleaner can lead to the accidental deletion of your production database.
00:13:40.270
This was happening frequently enough that we decided it was time to take action against this common mistake. Our first thought was to simply remove the database URL.
00:13:57.630
We suggested running 'rails db:migrate', but then we’d break workflows on platforms like Heroku. Richard Schneeman championed this feature and led the initiative to rethink this issue.
00:14:18.850
We convened to brainstorm alternative solutions that wouldn’t disrupt container-based infrastructure, and we ended up with a more complex implementation.
00:14:34.390
In addition to the 'schema_migrations' table, which is generated for you automatically, we now have a general-purpose metadata table with two columns: key and value.
00:14:55.540
Currently, the only entry in this table will be the name of the environment that was last run against, which may not be visible during typical operations.
00:15:12.830
If you attempt to perform a destructive action before running a migration, you will receive an error stating that we don’t know the state of the database.
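The check described here can be modeled in a few lines of plain Ruby. This is a sketch of the idea only: `ToyDatabase`, the two error classes, and the "environment" key are illustrative assumptions, with a Hash standing in for the key/value metadata table.

```ruby
class UnknownEnvironmentError < StandardError; end
class ProtectedEnvironmentError < StandardError; end

class ToyDatabase
  attr_reader :metadata, :tables

  def initialize
    @metadata = {}        # stand-in for the key/value metadata table
    @tables = ["users"]
  end

  # Running migrations records which environment last touched the database.
  def migrate(environment)
    @metadata["environment"] = environment
  end

  # Destructive actions consult the recorded environment before doing anything.
  def drop_all(protected_environments: ["production"])
    stored = @metadata["environment"]
    if stored.nil?
      raise UnknownEnvironmentError, "no environment recorded; run migrations first"
    end
    if protected_environments.include?(stored)
      raise ProtectedEnvironmentError, "refusing to drop a #{stored} database"
    end
    @tables.clear
  end
end

db = ToyDatabase.new
db.migrate("test")
db.drop_all          # allowed: the database was last migrated in "test"
```

The important property is that the decision is based on what is stored in the database itself, not on whatever connection settings happen to be in the local environment.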
00:15:28.530
This new feature protects against the catastrophic mistake of dropping a production database, and although it may seem minor, it's very significant.
00:15:51.160
It allows Rails to silently protect you from this common danger if we've done our job right. Again, I’d like to thank Richard for his work on this feature.
00:16:06.880
Now, let's discuss something different regarding migrations. A question that isn’t often asked is whether you can still run migrations from two years ago.
00:16:24.350
Can new developers run all migrations without running 'db:setup' first? Unfortunately, this has been a problem.
00:16:39.980
Migrations are static code that isn't tested. If a new Rails version introduces a breaking change, you're unlikely to discover that issue upfront.
00:16:56.350
Additionally, we’ve seen a trend in user requirements where Rails apps are deployed independently, meaning multiple production environments.
00:17:11.750
In cases like Discourse or ManageIQ, where many production instances exist, running old migrations can lead to trouble.
00:17:25.800
An upgrade might skip a version, resulting in executing migrations aligned with Rails 5, even if they were created for Rails 4. This can cause breakage.
00:17:42.270
Still, we want to keep adding new features, changes, and improvements. For instance, we have added foreign key support and improved indexes.
00:17:55.830
However, we cannot change the behavior of existing API methods like 'references' or 'null: false' due to backwards compatibility.
00:18:09.800
In Rails 5, migrations now declare the version of Rails they were generated against in square brackets after 'ActiveRecord::Migration'.
00:18:21.740
If a migration doesn’t have this version number, we will assume it was for Rails 4.2 or older. Migrations written today will work for the foreseeable future.
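In an application this looks like `class CreateUsers < ActiveRecord::Migration[5.0]`. The sketch below shows one way such a versioned superclass could work in plain Ruby: a class-level `[]` method returns a class that carries that version's defaults. `ToyMigration`, the nested version classes, and the specific option values are illustrative assumptions, not the actual Rails internals.

```ruby
class ToyMigration
  # Defaults for the current version of the (toy) migrations API.
  def timestamps_options
    { null: false }
  end

  def references_options
    { index: true }
  end

  # Older behavior, kept alive so old migration files keep their meaning.
  class V4_2 < ToyMigration
    def timestamps_options
      { null: true }
    end

    def references_options
      { index: false }
    end
  end

  class V5_0 < ToyMigration
  end

  VERSIONS = { 4.2 => V4_2, 5.0 => V5_0 }.freeze

  # `ToyMigration[5.0]` is an ordinary call to this class method, so its
  # return value can be used directly as a superclass.
  def self.[](version)
    VERSIONS.fetch(version) do
      raise ArgumentError, "unknown migration version #{version}"
    end
  end
end

class CreateUsers < ToyMigration[5.0]
end

class CreateLegacyAccounts < ToyMigration[4.2]
end

CreateUsers.new.timestamps_options          # => { null: false }
CreateLegacyAccounts.new.timestamps_options # => { null: true }
```

New defaults can then be introduced in the current class while each frozen version class preserves what older migration files meant when they were written.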
00:18:38.270
This change is not trivial, as maintaining every version of the migrations API until the end of time adds complexity.
00:18:55.640
While this means bug fixes will become more challenging, I still believe it is advantageous to maintain functionality.
00:19:05.680
Another feature we added is 'accessed_fields' in Active Record. This gives you an array of all the fields accessed on an Active Record object.
00:19:22.300
For example, when you create a new user model and access no fields, 'accessed_fields' returns an empty array.
00:19:37.470
However, after accessing a field, such as 'name', it would return that field's name in the array.
00:19:54.520
The pattern I noticed is that developers rarely call 'select' unless performing specific calculations.
00:20:08.400
In views displaying user names, we frequently select all columns, even though a user table may have nearly 100 columns, which incurs a performance cost.
00:20:24.340
So, to simplify this process, I hoped to facilitate copying the list from 'accessed_fields' directly into 'select'.
00:20:40.560
If a field was accessed that wasn’t selected, it would throw an exception, prompting you to update your 'select' statement.
00:20:55.360
The idea was to create a background job that would warn you if less than 50% of selected fields were accessed, indicating significant potential for performance improvement.
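The tracking and the 50% check described above could be modeled roughly like this in plain Ruby. `TrackedRecord`, `read`, and `accessed_ratio` are hypothetical names for illustration; only `accessed_fields` returning an array of field names comes from the talk.

```ruby
class TrackedRecord
  def initialize(attributes)
    @attributes = attributes
    @accessed = []
  end

  # Reading an attribute records its name the first time it is read.
  def read(name)
    key = name.to_s
    value = @attributes.fetch(key)
    @accessed << key unless @accessed.include?(key)
    value
  end

  def accessed_fields
    @accessed.dup
  end

  # Share of loaded fields that were actually read. A low ratio hints that
  # a narrower `select` might help.
  def accessed_ratio
    return 0.0 if @attributes.empty?
    @accessed.size.to_f / @attributes.size
  end
end

user = TrackedRecord.new(
  "id" => 1, "name" => "Sean", "email" => "sean@example.com", "bio" => "..."
)
user.accessed_fields   # => []
user.read(:name)
user.accessed_fields   # => ["name"]
warn "consider a narrower select" if user.accessed_ratio < 0.5
```

The list from `accessed_fields` is exactly what would be pasted into `select`, and the ratio is what a background check would compare against the 50% threshold.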
00:21:11.700
This feature was trivial to implement, and I was happy to see how well it fit into the Active Record internals that came out of the typecasting refactoring.
00:21:25.760
Next, let’s discuss using boolean fields in Rails. Let's think about how you might use regex with a boolean field.
00:21:40.340
Imagine you have a regex matching a string and want to assign the result to a boolean field. You might expect the field to reflect whether a match occurred.
00:21:56.480
In your head, it's clear that the output would be true; however, in Rails, the implementation has nuances that can yield unexpected results.
00:22:12.700
For example, in Rails 4.1 and earlier, the boolean type checked values against a constant list of 'true' values, which included strings such as 'T'.
00:22:25.480
Match data objects are not included in that list of 'true' values, so the field ends up false, leading to confusion. Using the '=~' operator instead might yield misleading results, because it returns an integer position rather than true or false.
00:22:41.520
To mitigate these issues, we adjusted how we handle boolean evaluations, establishing a clearer distinction between truthy and falsy values.
00:22:57.320
This aimed to fix the complexities arising from earlier implementations while making our approach closer to how Ruby operates.
00:23:14.200
The goal was to eliminate confusing scenarios and ensure that our boolean handling works more distinctly and coherently.
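The difference between the two approaches can be sketched in plain Ruby. Both value lists and both method names below are assumptions made for illustration, modeled on the description in the talk rather than copied from any Rails version.

```ruby
require "set"

# Assumed, abbreviated lists for illustration.
OLD_TRUE_VALUES = Set[true, 1, "1", "t", "T", "true", "TRUE", "on", "ON"].freeze
NEW_FALSE_VALUES = Set[false, 0, "0", "f", "F", "false", "FALSE", "off", "OFF"].freeze

# Older approach: only an explicit list of values counts as true, so any
# other object (including MatchData) silently becomes false.
def cast_boolean_allow_list(value)
  return nil if value.nil? || value == ""
  OLD_TRUE_VALUES.include?(value)
end

# Newer approach: only an explicit list of values counts as false, and
# everything else is true, which is closer to Ruby's own truthiness.
def cast_boolean_deny_list(value)
  return nil if value.nil? || value == ""
  !NEW_FALSE_VALUES.include?(value)
end

match = /rails/.match("ruby on rails")   # a MatchData object
cast_boolean_allow_list(match)           # => false
cast_boolean_deny_list(match)            # => true

# `=~` returns a position; a match at position 0 is still in the false list.
cast_boolean_deny_list("rails 5" =~ /rails/)   # => false
```

The last line shows why `=~` remains a poor fit for a boolean field even under the newer rule: a match at the very start of the string yields 0.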
00:23:28.640
Those are some of the stories I hoped to share with you today. While I have you all as a captive audience, I’d like to promote my new ORM written in Rust called Diesel.
00:23:43.750
If you're interested, I have stickers! Thank you to Shopify for sponsoring me to work on open-source full-time. I appreciate them allowing me to attend conferences and share knowledge.
00:23:58.850
If you'd like to collaborate with Rafael and me at Shopify, please let us know. Thank you all very much! I will now take any questions.
00:24:12.380
The first question was about chaining 'or'. Yes, you can chain 'or' expressions, and you can also combine them with 'and'. The precedence will match what you pass along.
00:24:20.500
The next question was about adding an explicit 'and' to the API. I think it’s an interesting idea, but we haven’t had a specific discussion about it.
00:24:28.300
Yes, null is not treated as a false value. If your column is declared NOT NULL, attempts to insert null will cause an error.
00:24:40.100
The question was about 'accessed_fields' on Active Record relations. Currently, we only track accessed fields for individual models.
00:24:56.490
Accessing fields for models in a relation would present implementation challenges, as I'd need to know that every model would be used the same way.
00:25:10.720
The following question was about the metadata table mentioned earlier. It’s named something like '__active_record_metadata__'. I don’t expect much collision with it.
00:25:24.450
Furthermore, to ensure uniqueness, we opted for a double underscore prefix. The next question discussed pulling down the production database locally.
00:25:36.830
Typically, this should be smoothly handled. If you perform destructive actions without running a migration, you’ll see an error.
00:25:49.870
The same applies to the question about dropping outdated tables: that will also result in an error unless you’ve run a migration.
00:26:02.400
The square brackets in migrations were implemented using the square bracket method on the class object to return a new anonymous class.
00:26:17.210
The question was about future ideas for Rails 6, particularly regarding connection adapters. We're considering making third-party connection adapters more stable through a public API.
00:26:35.310
This will necessitate breaking changes, but ultimately, we want adapters to function well long-term. I'm also contemplating moving the attributes API to Active Model.
00:27:02.660
This move will involve breaking changes related to dirty-checking functionality, but likely wouldn’t be too disruptive overall.
00:27:18.480
Thank you all very much! I will remain available for any additional questions you may have.