00:00:27
Welcome to Atlanta! If you haven't been welcomed yet, this is ActiveRecord encryption: Stop Hackers from Reading your Data.
00:00:33
My name is Kylie Stradley, and I'm a product security engineer at GitHub. I will have been working there for two years this May.
00:00:39
I am extremely excited to be presenting this morning with my teammate Matt.
00:00:45
Hey everyone! I'm Matt. I've been at GitHub for just over four years now, and I'm super excited to talk about ActiveRecord encryption.
00:00:57
You may have seen on the GitHub blog that we published a blog series about our process of upgrading to ActiveRecord encryption.
00:01:03
While we had a lot of fun writing it, we thought it would be interesting to take a different perspective for a conference talk.
00:01:08
We want to give people the opportunity to ask questions and share their experiences implementing ActiveRecord encryption.
00:01:13
Here's what we'll talk about today: why even encrypt database columns in the first place.
00:01:19
What is ActiveRecord encryption, and why did we migrate to it from our internal encryption strategy?
00:01:24
How we store our keys, how we implemented a custom key provider, and what I think is the most interesting part.
00:01:30
That part is upgrading existing records. We had some previously encrypted records, and most of the rest of the database was in plain text.
00:01:36
So we've unlocked the ability for our developer teams to convert plain text records to encrypted ones.
00:01:44
They no longer have to feel like they have to plan that super far in advance.
00:01:49
You might know that GitHub already encrypts its databases at rest, so why do we also encrypt individual database columns?
00:01:55
And maybe, why would you even want to do that? Well, for a couple of reasons.
00:02:03
Basically, it provides an additional layer of defense against hackers or bad actors reading or tampering with your sensitive fields.
00:02:09
This matters because someone might gain access to your server and connect directly to the database.
00:02:17
Or, maybe they've somehow downloaded a copy of your database.
00:02:23
It also prevents another security risk: accidentally exposing these values in logs.
00:02:28
This is a much easier problem to prevent than it is to clean up.
00:02:34
Prevention can be scheduled and planned, whereas once you have to clean up, you are working on it until it's finished.
00:02:40
One more note: you might be wondering what kinds of values you would want to encrypt.
00:02:45
Things like API keys, service tokens, secret tokens, or in some cases, email messages.
00:02:52
So, what is ActiveRecord encryption? It's a classic Rails API.
00:02:58
It's very easy to use but also very opinionated. It provides encryption immediately on save and decryption on access.
00:03:05
Here you can see an ActiveRecord class called PersonalAccessToken. It uses the encrypts method to encrypt a database column called token.
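To make "encrypt on save, decrypt on access" concrete without a Rails app, here is a minimal plain-Ruby sketch of the idea, assuming a single static 256-bit key. EncryptsSketch is hypothetical and is not how Rails implements encrypts; in a real app the model would simply declare encrypts :token.

```ruby
require "openssl"

# Hypothetical stand-in for ActiveRecord's `encrypts`: values are encrypted
# with AES-256-GCM when assigned and decrypted when read. Not the Rails code.
module EncryptsSketch
  KEY = OpenSSL::Random.random_bytes(32) # assumption: one static 256-bit key

  def encrypts(name)
    define_method("#{name}=") do |plaintext|
      cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
      cipher.key = KEY
      iv = cipher.random_iv
      ciphertext = cipher.update(plaintext) + cipher.final
      # Keep only the IV, auth tag, and ciphertext; never the plaintext.
      instance_variable_set("@#{name}_ciphertext", [iv, cipher.auth_tag, ciphertext])
    end

    define_method(name) do
      iv, tag, ciphertext = instance_variable_get("@#{name}_ciphertext")
      return nil unless ciphertext

      decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
      decipher.key = KEY
      decipher.iv = iv
      decipher.auth_tag = tag
      (decipher.update(ciphertext) + decipher.final).force_encoding(Encoding::UTF_8)
    end
  end
end

class PersonalAccessToken
  extend EncryptsSketch
  encrypts :token
end
```

Assigning pat.token = "ghp_example" stores only the encrypted bundle; reading pat.token decrypts it on access.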
00:03:30
Now we'll talk a bit about the existing encryption strategy that we had in place and why we chose to migrate to ActiveRecord encryption.
00:03:40
When I first started working at GitHub, we had a nice system in place. This implementation was relatively easy to use.
00:03:50
Developers just needed to indicate the name of the database column where they wanted to store their encrypted value.
00:03:56
Here, that was called encrypted token. They also needed to indicate the name of the plain text accessor.
00:04:03
Here, that's called plain text token, creating a simple access pattern.
00:04:08
If you've read about ActiveRecord encryption or looked at the last slide, you'll see that it’s actually a very similar API.
00:04:15
Great minds think alike, right? We also gave each column its own encryption key.
00:04:21
We derived the name of the encryption key from the table name. For PersonalAccessToken, you can see that reflected here in the key name.
00:04:35
It’s also using the database column name, encrypted token.
00:04:41
The key name is also prefixed with the encrypted attribute name. Here you can see the key for PersonalAccessToken's encrypted token stored in our Secure Vault.
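A hypothetical reconstruction of that per-column naming convention. The exact format GitHub used isn't shown in the talk, so the prefix and separator below are assumptions. The second helper illustrates why each new column needed a key uploaded ahead of time: a missing Vault entry fails at read time.

```ruby
# Hypothetical naming convention: prefix + table name + encrypted column, upper-cased.
def legacy_vault_key_name(table_name, encrypted_column, prefix: "encrypted_attribute")
  [prefix, table_name, encrypted_column].join("_").upcase
end

# Every new column needs its own pre-uploaded entry; a missing one fails loudly.
def legacy_key_for(vault, table_name, encrypted_column)
  name = legacy_vault_key_name(table_name, encrypted_column)
  vault.fetch(name) { raise KeyError, "missing #{name}; a key must be generated and uploaded first" }
end
```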
00:04:55
This seemed like a pretty good system, right? Fairly clean API, good access pattern, Secure Vault storage. Why would you change away from this?
00:05:02
We had a couple of reasons. GitHub has a policy of staying as close to the standard Rails implementation as possible.
00:05:09
So with ActiveRecord encryption released, our encryption code started to diverge from that standard.
00:05:18
This created the risk that a GitHub developer would use ActiveRecord encryption instead of our secure, hardened paved path.
00:05:27
ActiveRecord encryption is excellent, but as a product security team, we must vet any security tooling that engineers at GitHub use.
00:05:34
At that point, we hadn't vetted it yet, so we were thinking about what to do.
00:05:40
This is kind of the workflow we used when a developer team would reach out to us, asking to encrypt a column.
00:05:59
There's only one decision point and four steps here, and we provided this information so implementing developers could generate their own encryption keys.
00:06:10
However, we found that developers were more comfortable letting our team generate and upload the encryption keys to our Secure Vault.
00:06:18
This created a bottleneck where other teams depended on our team to generate and upload the keys.
00:06:30
While we were evaluating ActiveRecord encryption, we noticed that its key provider pattern would let us derive keys on demand, as soon as the code was written.
00:06:42
This is instead of the reactive process of generating and uploading the keys.
00:06:52
Previously, a team would come to us about a new feature involving a sensitive column they wanted to encrypt.
00:07:05
There was a back and forth conversation, after which we would generate and upload the keys for them.
00:07:16
Unfortunately, they couldn't do much work without the key.
00:07:24
This new approach would let them have the conversation with us, leave the meeting, and start encrypting on day one.
00:07:30
Then, when they pushed to production, they wouldn't have to worry about whether their keys were present, because the keys would be derived and already available.
00:07:46
As many of you might have heard about Subway apps: life is too short to generate encryption keys by hand, one by one. Hence, ActiveRecord encryption.
00:08:00
And I’m sure that's probably why many of you are here and why there are so many encryption fans in the audience this morning.
00:08:04
Now Matt’s going to tell you a bit about how we store our keys.
00:08:11
Thank you, Kylie! At GitHub, we have a business requirement to store our keys in our Secure Vault.
00:08:18
This means that when we generate a key, we need to store it in the Secure Vault rather than the default encrypted YAML file for Rails.
00:08:27
We wanted to do this in a way that allows us to easily read these keys from the Vault during runtime.
00:08:34
All of this should happen without developers needing to specify where to find the key in the Vault.
00:08:40
To do this, we created a custom key provider that not only reads the key from the Vault but also lets us rotate keys easily.
00:08:46
We can rotate regularly, mitigating the risk of key compromise and maintaining good key hygiene.
00:08:53
While Rails defaults to the encrypted YAML file, a custom key provider lets us read the keys from a preferable location, like our Secure Vault.
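ActiveRecord key providers respond to encryption_key and decryption_keys(encrypted_message). The sketch below mirrors that shape in plain Ruby, assuming a hex-encoded 32-byte key at a fixed, hypothetical Vault path so developers never specify where the key lives. FakeVault stands in for a real Vault client, and the methods return raw key bytes rather than ActiveRecord::Encryption::Key objects.

```ruby
# Hypothetical in-memory stand-in for the Secure Vault client.
class FakeVault
  def initialize(secrets)
    @secrets = secrets
  end

  def read(path)
    @secrets.fetch(path) { raise KeyError, "vault has no entry at #{path}" }
  end
end

# Plain-Ruby sketch shaped like an ActiveRecord key provider.
class VaultKeyProvider
  # Assumed fixed location: callers never pass a key or a path.
  VAULT_PATH = "rails/active_record_encryption/base_key"

  def initialize(vault)
    @vault = vault
  end

  # Read lazily at runtime and validate the size once.
  def encryption_key
    @encryption_key ||= begin
      key = [@vault.read(VAULT_PATH)].pack("H*") # stored hex-encoded (assumption)
      raise ArgumentError, "expected a 32-byte key, got #{key.bytesize}" unless key.bytesize == 32

      key
    end
  end

  # Single-key sketch: the only decryption candidate is the encryption key.
  def decryption_keys(_encrypted_message)
    [encryption_key]
  end
end
```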
00:09:01
Kylie will now talk about implementing a custom key provider.
00:09:08
Thank you, Matt! As Matt just explained, the key provider pattern helped us remove the bottleneck in our workflow.
00:09:16
You might be wondering why we chose this strategy instead of the other customization options available.
00:09:24
Other than hubris and unchecked pride, there were reasons for attempting to rewrite various parts of ActiveRecord encryption.
00:09:31
We initially considered using a custom key passed into the model, but there were drawbacks.
00:09:38
The ActiveRecord API lets you provide a key per attribute, but that requires manual intervention by the implementing developer, which opens the door to mistakes.
00:09:47
We could enforce this with linting, but if someone forgot to pass a key, would it fall back to the Rails default key?
00:09:55
We would need to write custom code to prevent that and throw an error.
00:10:02
We decided that developers shouldn't have to care about keys at all.
00:10:10
This decision extended to key derivation as well.
00:10:17
By monkey patching in the key provider, we bypass all of these concerns.
00:10:27
After a lot of deliberation, spiking, and marathon pair programming sessions,
00:10:35
including considering changing the Rails encryption context to accept one column at a time, we decided instead to monkey patch the encrypts method.
00:10:44
This adaptation allows only one column to be encrypted per call to encrypts.
00:10:51
That divergence made it easier for us to perform the upgrade on a single column at a time.
00:11:00
Matt will explain that upgrade strategy in detail in a few more slides.
00:11:07
As product security engineers, one of our top priorities is making it difficult to make insecure choices.
00:11:18
We want making the secure choice to be the easiest decision you never have to make.
00:11:26
We reduce that risk in this pattern by allowing only a single attribute to be encrypted per call to the encrypts method.
00:11:34
Each call instantiates a new key provider unless, of course, a GitHub key provider is passed, which is used for our upgrade pattern.
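A minimal sketch of that guard, assuming a hypothetical ColumnKeyProvider that only records which table and column it serves. A real patch would pass these options on to Rails' own encrypts; this sketch just records them so it runs without Rails.

```ruby
# Hypothetical guard: exactly one attribute per encrypts call, and a fresh
# key provider per attribute unless one is passed explicitly.
module SingleAttributeEncrypts
  # Stand-in provider that only records which table and column it serves.
  ColumnKeyProvider = Struct.new(:table_name, :attribute)

  def encrypts(*names, key_provider: nil, **options)
    unless names.size == 1
      raise ArgumentError, "encrypts takes exactly one attribute per call (got #{names.size})"
    end

    name = names.first
    provider = key_provider || ColumnKeyProvider.new(table_name, name)
    # Record the configuration instead of calling into Rails.
    encrypted_attributes[name] = { key_provider: provider, **options }
  end

  def encrypted_attributes
    @encrypted_attributes ||= {}
  end
end

class PersonalAccessToken
  extend SingleAttributeEncrypts

  def self.table_name
    "personal_access_tokens"
  end

  encrypts :token
end
```

Rejecting multi-attribute calls up front means every attribute gets its own provider, so there is no way to accidentally share a key across columns.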
00:11:41
If you haven't noticed, the first half of this presentation serves as an advertisement for the coolness of the second half.
00:11:52
These snippets don't contain any super secret information, but they're not relevant right now.
00:12:01
Before jumping into the cool part, I’d like you to understand how our key provider delivers encryption and decryption keys.
00:12:10
The current encryption key value you see here is the base keying material stored in Vault.
00:12:18
This key value is used to derive the key we use for AES-256-GCM encryption.
00:12:26
Interestingly, we specifically chose to use the table and column names as the salt to prevent accidental transposition of encrypted records.
00:12:34
Salting typically means adding random data to a secret value before hashing or storing it, to ensure uniqueness.
00:12:41
In our case, the salt instead feeds into deriving the key that encrypts the actual sensitive data stored in the database.
00:12:49
As a result, a value can only be decrypted with the key for the same table and column that encrypted it.
00:12:58
This prevents issues of transposition.
00:13:04
Additionally, we include the current year in key derivation as a guard against nonce exhaustion.
00:13:11
This is a concern at GitHub scale where generating a duplicate nonce has potential risks.
00:13:19
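ActiveRecord encryption uses AES-256-GCM, which requires a unique nonce for every encryption performed under a given key.
00:13:24
The more values we encrypt under one key, the higher the chance of a duplicate nonce. Adding the current year to key derivation means we derive a new key each year, which limits how many nonces are generated under any single key and reduces the likelihood of a duplicate.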
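Putting the derivation inputs together: a sketch that derives a 32-byte AES-256-GCM key from Vault keying material, with the table, column, and year as salt. PBKDF2-HMAC-SHA256 (the same KDF family Rails' ActiveSupport::KeyGenerator uses) and the table:column:year salt layout are assumptions; the talk names the inputs, not the exact KDF. The last check in the usage shows transposition protection: a value sealed under one column's key fails authentication under another column's key.

```ruby
require "openssl"

# Assumed KDF and salt layout: "table:column:year".
def derive_column_key(base_key, table:, column:, year: Time.now.utc.year)
  salt = [table, column, year].join(":")
  # The base material is already high-entropy, so the iteration count is not
  # doing password-stretching work here; 1_000 is an arbitrary choice.
  OpenSSL::KDF.pbkdf2_hmac(base_key, salt: salt, iterations: 1_000, length: 32, hash: "SHA256")
end

# Encrypt with AES-256-GCM; returns [iv, auth_tag, ciphertext].
def seal(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv, cipher.auth_tag, ciphertext]
end

# Decrypt; raises OpenSSL::Cipher::CipherError if the key (or data) is wrong.
def unseal(key, sealed)
  iv, tag, ciphertext = sealed
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.auth_tag = tag
  (decipher.update(ciphertext) + decipher.final).force_encoding(Encoding::UTF_8)
end
```

Because the year is an input, a reader has to know which year's key to re-derive; this sketch simply takes it as a parameter.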
00:13:36
Another oddity is that we have a dummy key in our key material from the Vault for our ActiveRecord encryption key.
00:13:45
This allows us to use Vault stored keying material as the key ID when we re-materialize the decryption key.
00:13:54
We retrieve the decryption key by working backward, starting with the encrypted message.
00:14:09
We pull the key ID from the encrypted message headers to look up the encryption keying material.
00:14:15
Once we find the matching keying material, we can use it to re-materialize the decryption key.
00:14:25
With this, and the salt of the table and column, we can successfully decrypt data.
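A sketch of working backward from the message: the header carries a key ID, the ID selects the matching keying material, and the table/column salt re-derives the decryption key. The fingerprint-style key ID, the header names, and the PBKDF2 parameters are assumptions for illustration, not ActiveRecord's actual message format.

```ruby
require "openssl"

# Hypothetical key ID: a short fingerprint of the base keying material.
def key_id(material)
  OpenSSL::Digest::SHA256.hexdigest(material)[0, 8]
end

# Assumed KDF and parameters for the per-column derivation.
def derive_key(material, salt)
  OpenSSL::KDF.pbkdf2_hmac(material, salt: salt, iterations: 1_000, length: 32, hash: "SHA256")
end

def encrypt_message(plaintext, material, salt)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = derive_key(material, salt)
  iv = cipher.random_iv
  payload = cipher.update(plaintext) + cipher.final
  { payload: payload, headers: { iv: iv, auth_tag: cipher.auth_tag, key_id: key_id(material) } }
end

# Work backward: the header's key ID picks the keying material, which is
# re-derived with the table/column salt to rebuild the decryption key.
def decrypt_message(message, materials, salt)
  headers = message[:headers]
  material = materials.find { |candidate| key_id(candidate) == headers[:key_id] }
  raise KeyError, "no keying material matches key ID #{headers[:key_id]}" unless material

  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = derive_key(material, salt)
  decipher.iv = headers[:iv]
  decipher.auth_tag = headers[:auth_tag]
  (decipher.update(message[:payload]) + decipher.final).force_encoding(Encoding::UTF_8)
end
```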
00:14:33
Now that I’ve set the stage, Matt will show you how this supports our column encryption upgrade strategy and key rotation.
00:14:50
Thank you, Kylie! I want to talk about upgrading existing records, particularly records encrypted with our previous encryption scheme.
00:14:58
This includes records stored as plain text in the database.
00:15:09
First, let's define a previous encrypter.
00:15:16
ActiveRecord's encrypts method lets you pass an option called previous, designating your previous encryption scheme.
00:15:23
This will be used to decrypt the values using your previous encrypted attribute method.
00:15:30
Our monkey patch automatically passes this previous encryption scheme to the call to encrypts, making it easy for developers.
00:15:39
When developers need to decrypt previous records, they can do so without any additional work.
00:15:48
Additionally, when passing a previous encryption scheme, you can also include a key provider to specify which keys will be used.
00:15:56
In our case, the previous encrypter had logic for pulling the keys from the Vault.
00:16:01
We passed a no-op key provider that would return no data for the encryption key.
00:16:06
Next, we define the process for decrypting previously encrypted attributes.
00:16:15
We call this the GitHub previous encrypter, and it's initialized with the table and attribute name.
00:16:22
This allows us to derive the key for the previous encryption scheme.
00:16:29
When we read records, if ActiveRecord decryption fails, it falls back to our previous encryption scheme.
00:16:36
That previous method can also return the value as plain text when it was never encrypted.
00:16:51
ActiveRecord encryption also has a plain text mode, the support_unencrypted_data option: if decryption fails, it falls back to plain text.
00:16:57
We chose not to use this option, as it would activate plain text mode across all fields in the database.
00:17:04
Once we upgrade a plain text record to use new ActiveRecord encryption, it should never revert to plain text.
00:17:09
This gives developers a simple API.
00:17:16
For example, in PersonalAccessToken, they just call encrypts on token.
00:17:22
This provides the developer with automatic encryption and decryption.
00:17:30
If decryption fails, it falls back to either plain text mode or our previous encrypted mode.
00:17:36
This means data from previous encryption schemes can be easily accessed with one simple encrypts call.
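A minimal sketch of that read path, assuming both the current and previous schemes are AES-256-GCM under different keys and stored as hex iv:tag:ciphertext (GitHub's previous encrypter had its own format, which isn't shown). The order is the point: current scheme first, then the previous scheme, and raw plain text only for an attribute that opts in.

```ruby
require "openssl"

# Stored format for this sketch only: hex "iv:tag:ciphertext".
def gcm_encrypt(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv, cipher.auth_tag, ciphertext].map { |part| part.unpack1("H*") }.join(":")
end

def gcm_decrypt(key, stored)
  parts = stored.split(":")
  unless parts.size == 3 && parts.all? { |part| part.match?(/\A\h+\z/) }
    raise ArgumentError, "not an encrypted value"
  end

  iv, tag, ciphertext = parts.map { |hex| [hex].pack("H*") }
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.auth_tag = tag
  (decipher.update(ciphertext) + decipher.final).force_encoding(Encoding::UTF_8)
end

DECRYPT_ERRORS = [ArgumentError, OpenSSL::Cipher::CipherError].freeze

# Current scheme first, then the previous scheme, then (only if this
# attribute opts in) the stored value as plain text.
def read_attribute_value(stored, current_key:, previous_key:, allow_plaintext:)
  gcm_decrypt(current_key, stored)
rescue *DECRYPT_ERRORS
  begin
    gcm_decrypt(previous_key, stored)
  rescue *DECRYPT_ERRORS
    raise unless allow_plaintext

    stored
  end
end
```

Scoping the plain text fallback to a single attribute, rather than a global switch, is what keeps an upgraded value from ever silently reading back as plain text elsewhere.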
00:17:42
Now, how do we ensure that all the data for a newly encrypted attribute is encrypted in the database?
00:17:48
When a developer adds encrypts to the token attribute, they can then run something called a transition.
00:17:53
At GitHub, we have what's called transitions, which are reusable migrations.
00:18:00
They function similarly to Rails migrations but can be used across different models.
00:18:06
Essentially, transitions read all records from the database in batches.
00:18:12
For each record in those batches, we call the encrypt method.
00:18:18
When it's invoked, it will either read the plain text data or previously encrypted data, then write it back.
00:18:27
In this case, we also added logic to roll back the transition if needed.
00:18:32
Once the transition runs, all the data in the database will be encrypted.
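Transitions are internal to GitHub, so this is only an in-memory sketch of the same shape: walk the rows in batches, read each value through the fallback path, and write it back with the current scheme. The read and write callables and the per-batch rollback are illustrative assumptions.

```ruby
# In-memory stand-in for a transition. `read` turns a stored value into plain
# text (via whatever fallback path applies) and `write` encrypts it with the
# current scheme; both are hypothetical callables. Returns rows upgraded.
def run_transition(rows, batch_size:, read:, write:)
  rows.each_slice(batch_size).sum do |batch|
    originals = batch.map { |row| row[:token] }
    begin
      batch.each { |row| row[:token] = write.call(read.call(row[:token])) }
      batch.size
    rescue StandardError
      # Roll this batch back to its original stored values, then re-raise.
      batch.zip(originals) { |row, original| row[:token] = original }
      raise
    end
  end
end
```

Batching keeps each unit of work small, and restoring a failed batch's original values means a failure never leaves that batch half upgraded.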
00:18:38
Future data is automatically encrypted through the encrypts function.
00:18:49
Now I want to touch on a few notable things that were important for us at GitHub.
00:18:56
First, we used feature flags to prevent rollout issues with ActiveRecord encryption.
00:19:03
When we deploy GitHub, we first deploy to a subset of canary servers and test there before going to full production.
00:19:13
This creates an opportunity for a race condition.
00:19:18
When we deploy ActiveRecord encryption to the canary environment, a canary server can write an encrypted value.
00:19:26
If a server that isn't running the canary deploy then tries to read that value, it can't decrypt it.
00:19:35
To work around this, we use feature flags for a safe rollout.
00:19:41
This feature flag starts out disabled so that no encryption occurs, and we later enable it so encryption takes effect on all servers at once.
00:19:56
We wrote a custom cast type for this feature flag.
00:20:06
Rails uses cast types to serialize data to the database.
00:20:14
ActiveRecord encryption has an encrypted attribute type that encrypts upon saving.
00:20:21
We override this encrypted attribute type so that it decides whether to encrypt based on the feature flag.
00:20:31
When the feature flag is enabled, the cast type writes the encrypted value to the database.
00:20:41
If it's disabled, it falls back to writing the value using the previous encryption scheme.
00:20:50
This keeps deployments safe while we roll out in stages.
00:20:56
Lastly, when we fall back to plain text, we call the underlying cast type's serialize method.
00:21:04
This enables writing as text to the database instead of the encrypted scheme.
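A sketch of the flag-aware write path. FlaggedEncryptedType, the flag callable, and the encryptor callables are hypothetical stand-ins; ActiveRecord's real EncryptedAttributeType isn't reproduced here. What it shows is the decision made on every write.

```ruby
# Hypothetical flag-aware serializer; flag, new_encryptor, and previous_encryptor
# are callables standing in for a feature flag check and the two schemes.
class FlaggedEncryptedType
  def initialize(flag:, new_encryptor:, previous_encryptor: nil)
    @flag = flag
    @new_encryptor = new_encryptor
    @previous_encryptor = previous_encryptor
  end

  # Decide on every write which format reaches the database.
  def serialize(value)
    return nil if value.nil?

    if @flag.call
      @new_encryptor.call(value)      # flag on: ActiveRecord encryption
    elsif @previous_encryptor
      @previous_encryptor.call(value) # flag off: previous encryption scheme
    else
      value.to_s                      # flag off, attribute was plain text: write text
    end
  end
end
```

Because the flag is checked at write time, canary servers keep writing the previous format until the flag is enabled for every server at once.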
00:21:17
It’s important to note that ActiveRecord encryption allows for compression before encryption.
00:21:27
It checks whether the string exceeds a certain length to decide whether to compress it.
00:21:35
However, we opted out of compression before encryption, because the data we store is high-entropy.
00:21:43
While compression can be effective for regular data, compressing before encrypting is usually considered bad practice for a use case like ours.
00:21:49
We’ve worked on upstreaming this change as a configuration option, but we are not quite there yet.
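A sketch of a threshold-plus-opt-out compression step in front of encryption. The 140-byte threshold and the compress: keyword are assumptions for illustration. One commonly cited reason to avoid compressing before encrypting: compressed length depends on content, and ciphertext length follows it, so length can leak information about the plaintext.

```ruby
require "zlib"

# Hypothetical threshold for this sketch only.
COMPRESSION_THRESHOLD = 140

# Returns [payload, compressed?]. Short values are never compressed, and
# compress: false opts out entirely.
def maybe_compress(data, compress: true)
  return [data, false] unless compress && data.bytesize > COMPRESSION_THRESHOLD

  deflated = Zlib::Deflate.deflate(data)
  # Keep the compressed form only if it is actually smaller.
  deflated.bytesize < data.bytesize ? [deflated, true] : [data, false]
end
```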
00:21:56
In summary, Rails is often described as omakase, meaning "chef's choice."
00:22:09
Even so, we found ActiveRecord encryption easy to customize for our existing code base.
00:22:16
The custom key provider has proven to be a solid option for delivering encryption keys, especially when secure storage is necessary.
00:22:26
We recommend the previous encrypter strategy as a way to seamlessly upgrade existing records.
00:22:36
If you’re interested in any of the monkey patches discussed today, please reach out.
00:22:45
If you haven't had enough of ActiveRecord encryption, check out the two blog posts we mentioned.
00:22:56
You can find the short link for those, along with the ActiveRecord encryption guide.
00:23:02
Finally, while we are product security engineers, we are not cryptographers.
00:23:09
I found the book 'Real-World Cryptography' by David Wong to be extremely helpful.
00:23:15
It helped me understand encryption algorithms and the cryptographic standards involved.
00:23:21
Thank you for your attention! This has been ActiveRecord encryption, stopping hackers from stealing your data.
00:23:29
Now we have time for Q&A. I believe we have about 15 minutes for questions and answers.
00:23:56
AES-256-GCM uses a 32-byte (256-bit) key; that's the standard, and it's what the 256 refers to.
00:24:05
Because GCM is a counter-based mode, the ciphertext is the same length as the input; the nonce and authentication tag are stored alongside it.
00:24:12
Key size shouldn't affect performance significantly.
00:24:17
However, any encryption can cause some performance overhead, but the trade-off is generally worth it.
00:24:29
Developers often appreciate our efforts to simplify these processes.
00:24:40
Previously, we had encrypted attributes, but the process was manual.
00:24:48
We had around five columns using this manual process.
00:24:57
After we shared an internal blog post about it, we saw a significant uptick in columns using encryption.
00:25:06
We expanded from five to 15 columns almost instantly after sharing the internal blog post.
00:25:16
Not having to generate and upload keys increased developer happiness.
00:25:24
This eliminated unnecessary server access during every transaction.
00:25:30
Is there a capability for supporting deterministic encryption?
00:25:39
Deterministic encryption produces the same ciphertext for the same value, so the encrypted value can still be queried or used as an index.
00:25:46
We’ve disabled deterministic encryption at GitHub; if something needs to be secret, it shouldn’t serve as an index.
00:25:57
ActiveRecord encryption does support deterministic encryption, but we haven't explored it yet.
00:26:05
Most of the data we encrypt consists of tokens that aren't typically searched.
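To show why deterministic encryption enables lookups (and why it reveals when two values are equal), here is a sketch contrasting randomized and deterministic AES-256-GCM. Deriving the nonce from an HMAC of the plaintext is one common deterministic construction; the separate iv_key and the iv + tag + ciphertext layout are assumptions of this sketch.

```ruby
require "openssl"

# Returns iv + auth_tag + ciphertext as one binary string.
def encrypt_value(plaintext, key:, iv_key:, deterministic:)
  iv =
    if deterministic
      # Same plaintext -> same nonce -> same ciphertext: equal values can be
      # matched (and indexed), which also reveals when two values are equal.
      OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), iv_key, plaintext)[0, 12]
    else
      OpenSSL::Random.random_bytes(12) # fresh nonce every time
    end

  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  cipher.iv = iv
  ciphertext = cipher.update(plaintext) + cipher.final
  iv + cipher.auth_tag + ciphertext
end
```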
00:26:11
What does key rotation look like?
00:26:19
We store keys as a semicolon delimited list in our vault.
00:26:30
When we derive encryption keys, we always use the latest key in that list.
00:26:40
If a key is compromised, we append a new encryption key to the list.
00:26:50
Using our transitions, we can then re-encrypt all data with the latest key.
00:27:02
We can also remove old keys once we are confident about our security.
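A sketch of the rotation mechanics for a semicolon-delimited list, assuming keys are stored oldest first so the last entry is the latest. The helper names are hypothetical, and keys are treated as opaque strings.

```ruby
# Hypothetical helpers for a semicolon-delimited key list, oldest key first.
def parse_key_list(vault_value)
  keys = vault_value.split(";").map(&:strip).reject(&:empty?)
  raise ArgumentError, "no keys configured" if keys.empty?

  keys
end

# New data is always encrypted with the latest key.
def current_encryption_key(vault_value)
  parse_key_list(vault_value).last
end

# Rotation: append a new key; older keys stay available for decryption.
def append_key(vault_value, new_key)
  (parse_key_list(vault_value) + [new_key]).join(";")
end

# Retire an old key once every record has been re-encrypted with a newer one.
def remove_key(vault_value, old_key)
  keys = parse_key_list(vault_value)
  raise ArgumentError, "refusing to remove the current key" if keys.last == old_key

  (keys - [old_key]).join(";")
end
```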
00:27:11
Which factors determine if a column should be encrypted?
00:27:21
Initially, we opted for low-hanging fruit: any tokens that might appear in logs.
00:27:31
We have a Sentinel tool that scans pull requests for new columns.
00:27:40
If a new column's name contains the term token or secret, it comments on the pull request asking the team to encrypt the data.
00:27:55
We also promote discussion on a case-by-case basis with teams.
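The talk doesn't describe how Sentinel is implemented, so this is only a hypothetical approximation of the rule it applies: find columns added in a migration whose names contain token or secret, minus those already declared with encrypts. The regular expressions cover add_column and t.<type> lines only.

```ruby
# Hypothetical approximation of a pull request check.
ADDED_COLUMN = /(?:add_column\s+:\w+,\s*|t\.\w+\s+):(\w+)/
SENSITIVE_NAME = /token|secret/i

# Column names introduced by add_column or t.<type> lines.
def added_columns(migration_source)
  migration_source.scan(ADDED_COLUMN).flatten
end

# Sensitive-looking columns that aren't declared with encrypts yet.
def columns_to_flag(migration_source, encrypted_attributes)
  added_columns(migration_source).select { |name| name.match?(SENSITIVE_NAME) } -
    encrypted_attributes.map(&:to_s)
end
```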
00:28:03
What is required when writing a custom migration to encrypt a new column?
00:28:14
Our transitions are internal, but the open source maintenance_tasks gem can help with this.
00:28:22
The task itself only needs to call record.encrypt on each record.
00:28:29
Outside of that, developers should avoid manually calling record.encrypt or decrypt.
00:28:35
The focus should be on transitioning data.
00:28:42
The best approach is to iterate over records in batches so you don't overload your servers.
00:28:58
Any additional questions or points of interest?
00:29:04
What about the extra space a previously plain text column might take once it's encrypted?
00:29:11
Encryption typically isn’t drastically larger than the original data.
00:29:18
AES-256-GCM uses a counter-based mode, so the ciphertext itself should match the size of the input.
00:29:28
However, encryption will incur some overhead due to metadata.
00:29:37
This metadata can affect column length in some instances.
00:29:46
We typically handle those cases with migrations that widen the column.
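To estimate that overhead, this sketch encrypts a value and wraps it in a JSON envelope with a base64-encoded payload, IV, and auth tag. The envelope keys are an assumption modeled loosely on ActiveRecord's stored format, and the real format may carry additional headers, so treat the result as a rough lower bound.

```ruby
require "openssl"
require "json"

def b64(bytes)
  [bytes].pack("m0") # strict base64, no newlines
end

# Encrypt, wrap in a JSON envelope (keys are assumptions), and report sizes.
def stored_size(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = OpenSSL::Random.random_bytes(32)
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  envelope = { "p" => b64(ciphertext), "h" => { "iv" => b64(iv), "at" => b64(cipher.auth_tag) } }
  { plaintext: plaintext.bytesize, ciphertext: ciphertext.bytesize, stored: JSON.generate(envelope).bytesize }
end
```

For a 40-byte token, the ciphertext is also 40 bytes, but the stored value is 126 bytes, so a column sized tightly for the plain text may need widening.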
00:29:52
Is it possible to encrypt personal data, not just tokens?
00:30:04
Initially, we focused on low-hanging fruit like API tokens.
00:30:13
In future phases, we may consider other personal data encryption.
00:30:23
The goal is to ensure sensitive data isn't leaked in logs.
00:30:31
We can certainly encrypt any data if its properties call for it.
00:30:39
Any final thoughts on encrypting values that are used for lookups?
00:30:50
Deterministic encryption is disabled here because sensitive values shouldn't serve as indexes.
00:31:01
Let's continue the discussion afterward. I hope everyone learned something today!