RailsConf 2023

ActiveRecord::Encryption; Stop Hackers from Reading your Data

by Matthew Langlois and Kylie Stradley

The video titled "ActiveRecord::Encryption; Stop Hackers from Reading your Data" features speakers Kylie Stradley and Matthew Langlois from GitHub, presenting their insights on data encryption within Rails applications. The discussion introduces the importance of encrypting database columns to enhance security against unauthorized access and data tampering. Key points discussed include:

  • Reasons for Encryption: The necessity of encrypting database columns arises from the need for additional defense layers against hacking and the reduction of risks associated with accidentally exposing sensitive data in logs.
  • ActiveRecord Encryption Overview: The presenters describe ActiveRecord encryption as an easy-to-use, opinionated Rails API that facilitates automatic encryption upon data saving and decryption during access.
  • Migration to ActiveRecord Encryption: The team shares their motivations for transitioning from GitHub's internal encryption strategy to ActiveRecord encryption, highlighting the need to align with Rails standards and to offer developers the ability to generate keys more autonomously.
  • Key Storage and Custom Key Providers: They explain the creation of a custom key provider that leverages a Secure Vault for key management, allowing easy key rotation and efficient key retrieval without extra developer intervention.
  • Upgrading Existing Records: Stradley and Langlois detail their approach to managing legacy data by enabling seamless upgrades for previously encrypted columns and plain text records, employing a reusable transition framework for batch processing.
  • Feature Flags and Compression Considerations: The implementation phase discussed deploying ActiveRecord encryption behind feature flags to avoid race conditions during rollout. They also explain why they opted out of compressing data before encryption: the values they store are high-entropy.
  • Developer Experience: The speakers highlight the positive feedback from developers due to the simplified process for adding encryption functionalities without manual key management tasks.

In conclusion, the session provides practical insights into implementing ActiveRecord encryption effectively, making secure data handling a straightforward choice for developers while ensuring robust security practices.

00:00:27 Welcome to Atlanta! If you haven't been welcomed yet, this is ActiveRecord encryption: Stop Hackers from Reading your Data.
00:00:33 My name is Kylie Stradley, and I'm a product security engineer at GitHub. I will have been working there for two years this May.
00:00:39 I am extremely excited to be presenting this morning with my teammate Matt.
00:00:45 Hey everyone! I'm Matt. I've been at GitHub for just over four years now, and I'm super excited to talk about ActiveRecord encryption.
00:00:57 You may have seen on the GitHub blog that we released a blog series about our process of upgrading to ActiveRecord encryption.
00:01:03 While we had a lot of fun writing it, we thought it would be interesting to take a different perspective for a conference talk.
00:01:08 We want to give people the opportunity to ask questions and share their experiences implementing ActiveRecord encryption.
00:01:13 Here's what we'll talk about today: why even encrypt database columns in the first place.
00:01:19 What is ActiveRecord encryption? Why we migrated from our internal encryption strategy.
00:01:24 How we store our keys, implementing a custom key provider, and what I think is the most interesting part.
00:01:30 That part is upgrading existing records. We had some previously encrypted records, while most of the rest of the database was in plain text.
00:01:36 So we've unlocked the ability for our developer teams to convert plain text records to encrypted ones.
00:01:44 They no longer have to feel like they have to plan that super far in advance.
00:01:49 You might have read or understood that GitHub encrypts our database at rest; why do we encrypt individual database columns?
00:01:55 And maybe, why would you even want to do that? Well, for a couple of reasons.
00:02:03 Basically, it provides an additional layer of defense against hackers or bad actors reading or tampering with your sensitive fields.
00:02:09 This is important because of the possibility of someone gaining access to your server, allowing direct database connection.
00:02:17 Or, maybe they've somehow downloaded a copy of your database.
00:02:23 It also prevents another security risk: accidentally exposing these values in logs.
00:02:28 This is a much more interesting problem to prevent than it is to clean up.
00:02:34 It's easier to schedule and plan it out, whereas once you have to clean it, you are working on that until it's finished.
00:02:40 One more note: you might be wondering what kinds of values you would want to encrypt.
00:02:45 Things like API keys, service tokens, secret tokens, or in some cases, email messages.
00:02:52 So, what is ActiveRecord encryption? It's a classic Rails API.
00:02:58 It's very easy to use but also very opinionated. It provides encryption immediately on save and decryption on access.
00:03:05 Here you can see we have an ActiveRecord class called PersonalAccessToken. It's using that encrypts keyword to encrypt a database column called token.
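The slide itself is not reproduced in the transcript. As a rough illustration of what "encryption on save and decryption on access" means, here is a minimal plain-Ruby sketch. It is a toy stand-in, not Rails' implementation: the `ToyEncrypts` module, the in-memory key, and the `columns` hash are all assumptions made for this example. In a real Rails model the equivalent declaration is simply `encrypts :token`.

```ruby
require "openssl"

# Toy stand-in for the behaviour described above; NOT how Rails implements it.
module ToyEncrypts
  KEY = OpenSSL::Random.random_bytes(32) # hypothetical in-memory 256-bit key

  def self.encrypt(plain)
    cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
    cipher.key = KEY
    iv = cipher.random_iv                           # 12 random bytes for GCM
    ciphertext = cipher.update(plain) + cipher.final
    [iv + cipher.auth_tag + ciphertext].pack("m0")  # strict Base64
  end

  def self.decrypt(stored)
    raw = stored.unpack1("m0")
    cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
    cipher.key = KEY
    cipher.iv = raw.byteslice(0, 12)
    cipher.auth_tag = raw.byteslice(12, 16)
    cipher.update(raw.byteslice(28, raw.bytesize)) + cipher.final
  end

  # Class-level macro: defines an encrypting writer and a decrypting reader.
  def encrypts(name)
    define_method("#{name}=") { |value| @columns[name] = ToyEncrypts.encrypt(value) }
    define_method(name) { ToyEncrypts.decrypt(@columns[name]) }
  end
end

class PersonalAccessToken
  extend ToyEncrypts
  encrypts :token

  attr_reader :columns # stands in for what would be written to the database row

  def initialize
    @columns = {}
  end
end

pat = PersonalAccessToken.new
pat.token = "example-token-value"
puts pat.columns[:token] # Base64 ciphertext, different on every run
puts pat.token           # decrypted on access
```

The point of the pattern is that calling code only ever sees the plain value, while the stored column only ever holds ciphertext.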
00:03:30 Now we'll talk a bit about the existing encryption strategy that we had in place and why we chose to migrate to ActiveRecord encryption.
00:03:40 When I first started working at GitHub, we had a nice system in place. This implementation was relatively easy to use.
00:03:50 Developers just needed to indicate the name of the database column where they wanted to store their encrypted value.
00:03:56 Here, that was called encrypted token. They also needed to indicate the name for their plain text accessor.
00:04:03 Here, that's called plain text token, creating a simple access pattern.
00:04:08 If you've read about ActiveRecord encryption or looked at the last slide, you'll see that it’s actually a very similar API.
00:04:15 Great minds think alike, right? We also provided each column its own encryption key.
00:04:21 We created the name of the encryption key value from the table name. For PersonalAccessToken, you can see that reflected here in the name of our key.
00:04:35 It’s also using the database column name, encrypted token.
00:04:41 The key is prefixed with the encrypted attribute name. You can see the encrypted attribute for PersonalAccessToken encrypted token stored in our Secure Vault.
00:04:55 This seemed like a pretty good system, right? Fairly clean API, good access pattern, Secure Vault storage. Why would you change away from this?
00:05:02 We had a couple of reasons. GitHub has a policy of staying as close to the standard Rails implementation as possible.
00:05:09 So with ActiveRecord encryption released, our encryption code started to diverge from that standard.
00:05:18 This presented the opportunity for a GitHub developer to use ActiveRecord encryption instead of our secured, hardened paved path.
00:05:27 ActiveRecord encryption is excellent, but as a product security team, we must vet any security tooling that engineers at GitHub use.
00:05:34 At that point, we hadn’t vetted it yet; so we were thinking about what to do.
00:05:40 This is kind of the workflow we used when a developer team would reach out to us, asking to encrypt a column.
00:05:59 There’s only one decision point with four steps here, but we provided this information for implementing developers to generate their own encryption keys.
00:06:10 However, we found that developers were more comfortable letting our team generate and upload the encryption keys to our Secure Vault.
00:06:18 This created a bottleneck where other teams depended on our team to generate and upload the keys.
00:06:30 While we were looking at ActiveRecord encryption, we noticed that ActiveRecord encryption’s key provider pattern would allow us to derive keys on demand at the time of writing code.
00:06:42 This is instead of the reactive process of generating and uploading the keys.
00:06:52 Initially, a team would come to us about a new feature involving a sensitive column they wanted to encrypt.
00:07:05 There was a back and forth conversation, after which we would generate and upload the keys for them.
00:07:16 Unfortunately, they couldn't do much work without the key.
00:07:24 This new approach would let them have the conversation with us, leave the meeting, and start encrypting on day one.
00:07:30 Then they wouldn’t have to worry when they pushed to production that their keys were present because their keys would be derived and already available.
00:07:46 As many of you might have heard about Subway apps: life is too short for generating encryption keys by hand, one by one, so use ActiveRecord encryption.
00:08:00 And I’m sure that's probably why many of you are here and why there are so many encryption fans in the audience this morning.
00:08:04 Now Matt’s going to tell you a bit about how we store our keys.
00:08:11 Thank you, Kylie! At GitHub, we have a business requirement to store our keys in our Secure Vault.
00:08:18 This means that when we generate a key, we need to store it in the Secure Vault rather than the default encrypted YAML file for Rails.
00:08:27 We wanted to do this in a way that allows us to easily read these keys from the Vault during runtime.
00:08:34 All of this should happen without developers needing to specify where to find the key in the Vault.
00:08:40 Through this, we created a custom key provider that not only reads the key from Vault but also allows us to rotate the keys easily.
00:08:46 This rotation can happen regularly, mitigating the risk of key compromise and maintaining good key hygiene.
00:08:53 While we have the default encrypted YAML file, we can use a custom key provider to read the keys from a preferable location, like our Secure Vault.
00:09:01 Kylie will now talk about implementing a custom key provider.
00:09:08 Thank you, Matt! As Matt just explained, the key provider pattern helped us remove the bottleneck in our workflow.
00:09:16 For example, you might be wondering why we chose this strategy instead of some other customization options available.
00:09:24 Other than hubris and unchecked pride, there were reasons for attempting to rewrite various parts of ActiveRecord encryption.
00:09:31 We initially considered using a custom key passed into the model, but there were drawbacks.
00:09:38 The ActiveRecord API lets you provide a key per model, but that requires manual intervention by the implementing developer, which opens the door for mistakes.
00:09:47 While we could enforce this with linting, if someone forgot to pass a key, would it try to use the Rails default key?
00:09:55 We would need to write custom code to prevent that and throw an error.
00:10:02 We decided that developers shouldn't have to care about keys at all.
00:10:10 This decision extended to key derivation as well.
00:10:17 By monkey patching in the key provider, we bypass all of these concerns.
00:10:27 After a lot of deliberation, spiking, and marathon pair programming sessions,
00:10:35 including considerations of changing Rails encryption context to accept one column at a time, we decided instead to monkey patch the encrypts method.
00:10:44 This adaptation allows us to only encrypt one column at a time per encrypts method.
00:10:51 That divergence made it easier for us to perform the upgrade on a single column at a time.
00:11:00 Matt will explain that upgrade strategy in detail in a few more slides.
00:11:07 As product security engineers, one of our top priorities is making it difficult to make insecure choices.
00:11:18 We want making the secure choice to be the easiest decision you never have to make.
00:11:26 We reduce that risk in this pattern by allowing only a single attribute to be encrypted per call to the encrypts method.
00:11:34 Each call instantiates a new key provider unless, of course, a GitHub key provider is passed, which is used for our upgrade pattern.
00:11:41 If you haven't noticed, the first half of this presentation serves as an advertisement for the coolness of the second half.
00:11:52 The snipped-out parts don't contain any super secret information; they're just not relevant right now.
00:12:01 Before jumping into the cool part, I’d like you to understand how our key provider delivers encryption and decryption keys.
00:12:10 The current encryption key value you see here is the base keying material stored in Vault.
00:12:18 This key value is used to derive the key we use for AES GCM 256 encryption.
00:12:26 Interestingly, we specifically chose to use the table and column names as the salt to prevent accidental transposition of encrypted records.
00:12:34 Salting typically involves adding random noise to a secret value before storage to ensure uniqueness.
00:12:41 In our case, this salt is the keying material that encrypts the actual sensitive data stored in the database.
00:12:49 By using this method, a value encrypted with a salt can only be decrypted by the same model that encrypted it.
00:12:58 This prevents issues of transposition.
00:13:04 Additionally, we use the current year as a nonce-exhaustion preventative.
00:13:11 This is a concern at GitHub scale where generating a duplicate nonce has potential risks.
00:13:19 ActiveRecord encryption uses AES GCM 256, which means a counter is utilized.
00:13:24 This counter changes the nonce upon each use. Adding the current year to key derivation ensures we derive a new key each year, reducing the likelihood of a duplicate nonce.
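The derivation described above can be sketched in plain Ruby. The talk does not name the key derivation function or the exact salt layout, so this sketch assumes PBKDF2-HMAC-SHA256 and a `table.column.year` salt string; the helper name `derive_column_key` and the iteration count are also hypothetical.

```ruby
require "openssl"

# Hypothetical helper: derive a per-column, per-year 256-bit key from the
# base keying material. The salt layout and iteration count are assumptions.
def derive_column_key(keying_material, table:, column:, year: Time.now.year)
  salt = "#{table}.#{column}.#{year}"
  OpenSSL::PKCS5.pbkdf2_hmac(
    keying_material, salt, 10_000, 32, OpenSSL::Digest.new("SHA256")
  )
end

material = "base-keying-material-from-vault" # placeholder value

token_key  = derive_column_key(material, table: "personal_access_tokens", column: "token", year: 2023)
other_key  = derive_column_key(material, table: "personal_access_tokens", column: "note", year: 2023)
next_year  = derive_column_key(material, table: "personal_access_tokens", column: "token", year: 2024)

puts token_key.bytesize     # key length in bytes
puts token_key == other_key # a different column yields a different key
puts token_key == next_year # a different year yields a different key
```

Because the table and column names feed the derivation, ciphertext copied into another column cannot be decrypted there, which is the transposition protection the speakers describe.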
00:13:36 Another oddity is that we have a dummy key in our key material from the Vault for our ActiveRecord encryption key.
00:13:45 This allows us to use Vault stored keying material as the key ID when we re-materialize the decryption key.
00:13:54 The way we retrieve the decryption key is by working backward—starting with the encrypted message.
00:14:09 We pull the key ID from the encrypted message headers to look up the encryption keying material.
00:14:15 Once we find the matching keying material, we can use it to re-materialize the decryption key.
00:14:25 With this, and the salt of the table and column, we can successfully decrypt data.
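Working backward from the encrypted message can be sketched as follows. The JSON envelope, the header names, the `KEYING_MATERIALS` list, and the way a key ID is computed from the keying material are all assumptions for illustration; the talk only states that a key ID travels in the message headers and is used to find the matching keying material.

```ruby
require "openssl"
require "json"
require "digest"

KEYING_MATERIALS = ["older-material", "current-material"].freeze # hypothetical vault contents

# Hypothetical key ID: a short digest of the keying material.
def key_id_for(material)
  Digest::SHA256.hexdigest(material)[0, 8]
end

def derive(material, salt)
  OpenSSL::PKCS5.pbkdf2_hmac(material, salt, 10_000, 32, OpenSSL::Digest.new("SHA256"))
end

def encrypt_message(plain, material:, salt:)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = derive(material, salt)
  iv = cipher.random_iv
  payload = cipher.update(plain) + cipher.final
  JSON.generate(
    "payload" => [payload].pack("m0"),
    "headers" => {
      "key_id" => key_id_for(material),
      "iv" => [iv].pack("m0"),
      "auth_tag" => [cipher.auth_tag].pack("m0")
    }
  )
end

def decrypt_message(json, salt:)
  message = JSON.parse(json)
  headers = message.fetch("headers")
  # Step 1: use the key ID from the headers to find the keying material.
  material = KEYING_MATERIALS.find { |m| key_id_for(m) == headers.fetch("key_id") }
  raise KeyError, "no keying material for this key ID" unless material

  # Step 2: re-derive the key with the table/column salt and decrypt.
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = derive(material, salt)
  cipher.iv = headers.fetch("iv").unpack1("m0")
  cipher.auth_tag = headers.fetch("auth_tag").unpack1("m0")
  cipher.update(message.fetch("payload").unpack1("m0")) + cipher.final
end

salt = "personal_access_tokens.token"
message = encrypt_message("example-token", material: KEYING_MATERIALS.first, salt: salt)
puts decrypt_message(message, salt: salt)
```

Decrypting the same message with a different salt raises `OpenSSL::Cipher::CipherError`, since the derived key no longer matches.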
00:14:33 Now that I’ve set the stage, Matt will show you how this supports our column encryption upgrade strategy and key rotation.
00:14:50 Thank you, Kylie! I want to talk about upgrading existing records, particularly records encrypted with our previous encryption scheme.
00:14:58 This includes records stored as plain text in the database.
00:15:09 First, let's define a previous encrypter.
00:15:16 The ActiveRecord encrypts method allows you to pass a parameter called previous, designating your previous encryption scheme.
00:15:23 This will be used to decrypt the values using your previous encrypted attribute method.
00:15:30 Our monkey patch automatically passes this previous encryption scheme to the call to encrypts, making it easy for developers.
00:15:39 When developers need to decrypt previous records, they can do so without any additional work.
00:15:48 Additionally, when passing a previous encryption scheme, you can also include a key provider to specify which keys will be used.
00:15:56 In our case, the previous encrypter had logic for pulling the keys from the Vault.
00:16:01 We passed a no-op key provider that would return no data for the encryption key.
00:16:06 Now, defining the process for decrypting previous encrypted attributes.
00:16:15 We call this GitHub previous encrypter, initialized with the table and attribute name.
00:16:22 This allows us to derive the key for the previous encryption scheme.
00:16:29 As we attempt to read records, if decryption with the new scheme fails, it falls back to our previous encryption scheme.
00:16:36 In scenarios where decryption fails using ActiveRecord encryption, we revert to our previous method, enabling the decryption of the value as plain text.
00:16:51 ActiveRecord encryption also has a plain text mode. If decryption fails, it will automatically fall back to plain text.
00:16:57 We chose not to use this option, as it would activate plain text mode across all fields in the database.
00:17:04 Once we upgrade a plain text record to use new ActiveRecord encryption, it should never revert to plain text.
00:17:09 This ability allows simple APIs for developers.
00:17:16 For example, in the Personal Access Token, they can call the encrypts method on token.
00:17:22 This provides the developer with automatic encryption and decryption.
00:17:30 If decryption fails, it falls back to either plain text mode or our previous encrypted mode.
00:17:36 This means data from previous encryption schemes can be easily accessed with one simple encrypts call.
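The fallback order described above (new scheme, then previous scheme, then plain text) can be sketched as a simple chain. The three "schemes" here are stand-in codecs, not real encryption, and every name is hypothetical; only the control flow is the point.

```ruby
class DecryptionError < StandardError; end

# Stand-in codecs (NOT real encryption) so the control flow is easy to follow.
NEW_SCHEME = lambda do |stored|
  raise DecryptionError unless stored.start_with?("new:")
  stored.delete_prefix("new:").reverse
end

PREVIOUS_SCHEME = lambda do |stored|
  raise DecryptionError unless stored.start_with?("old:")
  stored.delete_prefix("old:").swapcase
end

PLAIN_TEXT = ->(stored) { stored } # last resort: the value was never encrypted

# Try each scheme in order and return the first successful result.
def read_attribute(stored, schemes: [NEW_SCHEME, PREVIOUS_SCHEME, PLAIN_TEXT])
  schemes.each do |scheme|
    return scheme.call(stored)
  rescue DecryptionError
    next
  end
  raise DecryptionError, "no scheme could decrypt the value"
end

puts read_attribute("new:nekot")  # written by the new scheme
puts read_attribute("old:TOKEN")  # written by the previous scheme
puts read_attribute("token")      # legacy plain text row
```

Scoping the fallback list to one attribute, rather than enabling a global plain text mode, is what keeps an already upgraded column from silently accepting plain text again: that column's list can simply omit `PLAIN_TEXT`.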
00:17:42 Now, how do we ensure that all the data for a newly encrypted attribute is encrypted in the database?
00:17:48 When a developer adds the encrypts method to a token, they can run something called a transition.
00:17:53 At GitHub, we have what's called transitions, which are reusable migrations.
00:18:00 They function similarly to Rails migrations but can be used across different models.
00:18:06 Essentially, transitions read all records from the database in batches.
00:18:12 For each record in those batches, we call the encrypt method on the record.
00:18:18 When it's invoked, it will either read the plain text data or previously encrypted data, then write it back.
00:18:27 In this case, we also added logic to roll back the transition if needed.
00:18:32 Once the transition runs, all the data in the database will be encrypted.
00:18:38 Future data is automatically encrypted through the encrypts function.
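GitHub's transitions framework is internal, so the following is only a plain-Ruby sketch of the idea: walk the rows in batches, encrypt anything that is not yet encrypted, and keep a way to roll back. The in-memory `rows` array, the `enc:` marker, and all helper names are assumptions. In a Rails app the batching would typically come from `find_in_batches` or `in_batches`.

```ruby
require "openssl"

TRANSITION_KEY = OpenSSL::Random.random_bytes(32) # hypothetical demo key
MARKER = "enc:" # hypothetical marker; assumes no plain text value starts with it

def encrypt_value(plain)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = TRANSITION_KEY
  iv = cipher.random_iv
  ciphertext = cipher.update(plain) + cipher.final
  MARKER + [iv + cipher.auth_tag + ciphertext].pack("m0")
end

def decrypt_value(stored)
  raw = stored.delete_prefix(MARKER).unpack1("m0")
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = TRANSITION_KEY
  cipher.iv = raw.byteslice(0, 12)
  cipher.auth_tag = raw.byteslice(12, 16)
  cipher.update(raw.byteslice(28, raw.bytesize)) + cipher.final
end

# Forward direction: read each legacy value and write it back encrypted.
def run_transition(rows, batch_size: 2)
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      next if row[:token].start_with?(MARKER) # already done, so re-runs are safe
      row[:token] = encrypt_value(row[:token])
    end
  end
end

# Rollback direction: restore the plain text values.
def roll_back_transition(rows, batch_size: 2)
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      row[:token] = decrypt_value(row[:token]) if row[:token].start_with?(MARKER)
    end
  end
end

rows = [{ id: 1, token: "tok_a" }, { id: 2, token: "tok_b" }, { id: 3, token: "tok_c" }]
run_transition(rows)
puts rows.map { |row| row[:token] }
```

Skipping rows that are already encrypted makes the job idempotent, which matters when a long-running batch job is interrupted and restarted.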
00:18:49 Now I want to touch on a few notable things that were important for us at GitHub.
00:18:56 First, we used feature flags to prevent rollout issues with ActiveRecord encryption.
00:19:03 When we deploy to GitHub, we first deploy to a subset of canary servers, testing before going into full production.
00:19:13 This creates an opportunity for a race condition.
00:19:18 When we deploy ActiveRecord encryption to the canary environment, it’s possible for an encryption write to occur.
00:19:26 If we attempt to read it from a server not deployed to canary, it wouldn't be able to retrieve that value.
00:19:35 To work around this, we use feature flags for a safe rollout.
00:19:41 This feature flag would be disabled initially to ensure no encryption occurs and is later toggled on so that encryption takes effect across all servers at once.
00:19:56 We wrote a custom cast type for this feature flag.
00:20:06 Rails uses cast types to serialize data to the database.
00:20:14 ActiveRecord encryption has an encrypted attribute type that encrypts upon saving.
00:20:21 We override this encryption cast type, determining whether encryption should occur based on the feature flag.
00:20:31 When the feature flag is enabled, the cast type writes the encrypted value to the database.
00:20:41 If disabled, it falls back to writing the encrypted data using the previous scheme.
00:20:50 This allows for safe deployments while our rollout is only partially complete.
00:20:56 Lastly, when we fall back to plain text, we call the cast value's serialize function.
00:21:04 This enables writing as text to the database instead of the encrypted scheme.
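A flag-aware cast type along these lines might look like the sketch below. It mirrors the `serialize`/`deserialize` pair that Rails attribute types expose, but everything else is hypothetical: the class names, the lambda used as a feature flag, and the stand-in codecs, which are not real encryption.

```ruby
class StandInDecryptError < StandardError; end

# Stand-in codecs (NOT real encryption) so the branching is easy to follow.
class NewSchemeCodec
  def serialize(value)
    "new:#{value.reverse}"
  end

  def deserialize(stored)
    raise StandInDecryptError unless stored.start_with?("new:")
    stored.delete_prefix("new:").reverse
  end
end

class PreviousSchemeCodec
  def serialize(value)
    "old:#{value.swapcase}"
  end

  def deserialize(stored)
    raise StandInDecryptError unless stored.start_with?("old:")
    stored.delete_prefix("old:").swapcase
  end
end

# Hypothetical cast type: the flag decides how to WRITE; reads accept either format.
class FlaggedEncryptedType
  def initialize(flag:, new_scheme: NewSchemeCodec.new, previous_scheme: PreviousSchemeCodec.new)
    @flag = flag
    @new_scheme = new_scheme
    @previous_scheme = previous_scheme
  end

  def serialize(value)
    (@flag.call ? @new_scheme : @previous_scheme).serialize(value)
  end

  def deserialize(stored)
    @new_scheme.deserialize(stored)
  rescue StandInDecryptError
    @previous_scheme.deserialize(stored)
  end
end

enabled = false
type = FlaggedEncryptedType.new(flag: -> { enabled })
written_before = type.serialize("Token")
enabled = true # flipped once every server runs the new code
written_after = type.serialize("Token")
puts written_before, written_after
```

Keeping the flag off until the deploy has reached every server means no server is asked to read a format it does not yet understand.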
00:21:17 It’s important to note that ActiveRecord encryption allows for compression before encryption.
00:21:27 This can check if the string exceeds a certain length, signaling whether to compress.
00:21:35 However, we opted out of compression prior to encryption because we store high-entropy data.
00:21:43 While compression may be more effective for regular data, it’s usually bad practice in our use case.
00:21:49 We’ve worked on upstreaming this change as a configuration option, but we are not quite there yet.
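One reason compression buys little for high-entropy values can be seen with Ruby's standard `zlib`. This sketch is an illustration only and is not taken from the talk: it compares random bytes (a stand-in for a token) with repetitive prose, and does not attempt to cover every reason a team might disable compression.

```ruby
require "zlib"
require "securerandom"

random_token = SecureRandom.random_bytes(1024)                  # high-entropy stand-in
prose = "the quick brown fox jumps over the lazy dog " * 24     # low-entropy text

[["random token", random_token], ["repetitive prose", prose]].each do |label, data|
  compressed = Zlib::Deflate.deflate(data)
  puts format("%-17s %5d bytes -> %5d bytes", label, data.bytesize, compressed.bytesize)
end
```

Random bytes come out no smaller (deflate adds a few bytes of framing), while the repetitive text shrinks substantially, so the compression step is pure overhead for token-like data.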
00:21:56 In summation, Rails is often termed 'Omakase,' which means it’s a chef’s choice.
00:22:09 We found ActiveRecord encryption to be easy to customize for our existing code base.
00:22:16 The custom key provider has proven to be a solid option for delivering encryption keys, especially when secure storage is necessary.
00:22:26 We recommend the previous encrypter strategy as a way to seamlessly upgrade existing records.
00:22:36 If you’re interested in any of the monkey patches discussed today, please reach out.
00:22:45 If you haven’t had enough of ActiveRecord encryption, we mentioned two blog posts we wrote.
00:22:56 You can find the short link for those, along with the ActiveRecord encryption guide.
00:23:02 Finally, while we are product security engineers, we are not cryptographers.
00:23:09 I found the book 'Real World Cryptography' by David Wong to be extremely helpful.
00:23:15 It helped me understand encryption algorithms and the cryptographic standards involved.
00:23:21 Thank you for your attention! This has been ActiveRecord encryption, stopping hackers from stealing your data.
00:23:29 Now we have time for Q&A. I believe we have about 15 minutes for questions and answers.
00:23:56 AES GCM 256 by default uses a 32-byte key, as that’s the standard.
00:24:05 Being a block cipher, AES GCM outputs the same size as input.
00:24:12 Key size shouldn't affect performance significantly.
00:24:17 However, any encryption can cause some performance overhead, but the trade-off is generally worth it.
00:24:29 Developers often appreciate our efforts to simplify these processes.
00:24:40 Previously, we had encrypted attributes, but the process was manual.
00:24:48 We had around five columns using this manual process.
00:24:57 After the blog post raised awareness, we saw a significant uptick in columns using encryption.
00:25:06 We expanded from five to 15 columns almost instantly after sharing the internal blog post.
00:25:16 Not generating keys and uploading them increased developer happiness.
00:25:24 This eliminated unnecessary server access during every transaction.
00:25:30 Is there a capability for supporting deterministic encryption?
00:25:39 Deterministic encryption allows the value to be encrypted while also acting as an index.
00:25:46 We’ve disabled deterministic encryption at GitHub; if something needs to be secret, it shouldn’t serve as an index.
00:25:57 ActiveRecord encryption does support deterministic encryption, but we haven't explored it yet.
00:26:05 Most of the data we encrypt consists of tokens that aren't typically searched.
00:26:11 What does key rotation look like?
00:26:19 We store keys as a semicolon delimited list in our vault.
00:26:30 When generating keys, we ensure that the latest key is always pulled.
00:26:40 In cases of key compromise, we append a new encryption key to this list.
00:26:50 Using our transition capabilities, we can encrypt all data with this latest key, ensuring security.
00:27:02 We can also remove old keys once we are confident about our security.
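The rotation flow in this answer can be sketched with a small key ring. The talk says keys are stored as a semicolon-delimited list and that a new key is appended; treating the last entry as the current key, and identifying keys by a short digest, are assumptions made here. The `KeyRing` class and its method names are hypothetical.

```ruby
require "digest"

class KeyRing
  def initialize(vault_value)
    @materials = vault_value.split(";").map(&:strip).reject(&:empty?)
    raise ArgumentError, "no keying material" if @materials.empty?
  end

  # Assumption: the newest (last) entry is used for all new encryption.
  def current
    @materials.last
  end

  # Older entries stay available so existing rows can still be decrypted.
  def find(key_id)
    @materials.find { |material| self.class.key_id(material) == key_id }
  end

  def self.key_id(material)
    Digest::SHA256.hexdigest(material)[0, 8]
  end

  # Rotation: append a new key; it becomes the current one.
  def with_new_key(material)
    self.class.new((@materials + [material]).join(";"))
  end

  # Clean-up: drop an old key once every row has been re-encrypted.
  def without(material)
    self.class.new((@materials - [material]).join(";"))
  end
end

ring = KeyRing.new("key-2022;key-2023")
rotated = ring.with_new_key("key-2024")
puts rotated.current
puts rotated.find(KeyRing.key_id("key-2022")).inspect
```

Between appending the new key and retiring the old one, a batch job re-encrypts existing rows with the current key, so nothing depends on the old key by the time it is removed.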
00:27:11 Which factors determine if a column should be encrypted?
00:27:21 Initially, we opted for low-hanging fruit: any tokens that might appear in logs.
00:27:31 We have a Sentinel tool that scans pull requests for new columns.
00:27:40 If a new column is added that contains the terms token or secret, we will comment to encrypt the data.
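The internals of the Sentinel tool are not described, so the following is only a guess at the general shape of such a check: scan the added lines of a migration diff for column names containing "token" or "secret". The patterns, the function name, and the sample diff are all hypothetical, and the patterns will produce some false positives (for example on `t.index`).

```ruby
SENSITIVE_NAME = /token|secret/i

# Matches `add_column :table, :column` and `t.<type> :column` in a Rails migration.
COLUMN_PATTERNS = [
  /add_column\s+:\w+,\s*:(\w+)/,
  /\bt\.\w+\s+:(\w+)/
].freeze

# Returns the names of newly added columns that look sensitive.
def columns_needing_review(diff_text)
  added_lines = diff_text.each_line.select { |line| line.start_with?("+") }
  names = added_lines.flat_map do |line|
    COLUMN_PATTERNS.flat_map { |pattern| line.scan(pattern).flatten }
  end
  names.select { |name| name.match?(SENSITIVE_NAME) }.uniq
end

diff = <<~DIFF
  +    add_column :integrations, :webhook_secret, :string
  +    add_column :integrations, :name, :string
  +      t.string :api_token
  -      t.string :old_token
DIFF

columns_needing_review(diff).each do |name|
  puts "Consider encrypting `#{name}`"
end
```

A name-based check like this is a prompt for a conversation rather than a verdict, which fits the case-by-case discussion the speakers mention.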
00:27:55 We also promote discussion on a case-by-case basis with teams.
00:28:03 What is required when writing a custom migration to encrypt a new column?
00:28:14 We alluded to this earlier: the open source 'maintenance tasks' gem can assist.
00:28:22 Setting things up only requires calling record.encrypt.
00:28:29 Developers should avoid manually calling record.encrypt or decrypt.
00:28:35 The focus should be on transitioning data.
00:28:42 A good approach is iterating over records in batches to maintain server integrity.
00:28:58 Any additional questions or points of interest?
00:29:04 How about the potential extra space that a previously plain text column might take?
00:29:11 Encryption typically isn’t drastically larger than the original data.
00:29:18 AES 256 operates as a block cipher, so the output should match the size of the input.
00:29:28 However, encryption will incur some overhead due to metadata.
00:29:37 This metadata can affect column length in some instances.
00:29:46 We typically manage that with migrations for these instances.
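The size question can be checked with a short measurement. In GCM mode the raw ciphertext is the same length as the plaintext; the growth comes from the IV, the authentication tag, and the text encoding. The dot-separated Base64 envelope below is a made-up layout for illustration, not the format Rails stores, so real column sizes will differ.

```ruby
require "openssl"

# Measures plaintext, raw ciphertext, and a hypothetical stored envelope.
def stored_size(plain)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = OpenSSL::Random.random_bytes(32)
  iv = cipher.random_iv                       # 12 bytes
  ciphertext = cipher.update(plain) + cipher.final
  tag = cipher.auth_tag                       # 16 bytes
  envelope = [iv, tag, ciphertext].map { |part| [part].pack("m0") }.join(".")
  { plaintext: plain.bytesize, ciphertext: ciphertext.bytesize, stored: envelope.bytesize }
end

sizes = stored_size("x" * 40)
puts sizes
```

When sizing a column for encrypted values, it is the encoded envelope length that matters, which is why an existing column sometimes needs a migration to widen it.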
00:29:52 Is it possible to encrypt personal data instead of just tokens?
00:30:04 Initially, we focused on low-hanging fruit like API tokens.
00:30:13 In future phases, we may consider other personal data encryption.
00:30:23 The goal is to ensure sensitive data isn't leaked in logs.
00:30:31 We can certainly encrypt any data if its properties call for it.
00:30:39 Final external thoughts on encryption for lookup values?
00:30:50 Deterministic encryption is disabled here because sensitive values can't serve as indexes.
00:31:01 Let’s organize discussions for future revisits. I hope everyone learned something today!