00:00:15.080
yes so again good morning uh this is
00:00:17.760
everything that we learned the hard way
00:00:19.520
implementing active record encryption um
00:00:22.119
thank you so much Miriam for the
00:00:23.760
introduction um my name is Kylie
00:00:25.800
Stradley you may recognize me oops um
00:00:30.000
from a presentation I gave earlier this
00:00:31.759
year at RailsConf in Atlanta Georgia
00:00:33.640
with my coworker Matt um this was
00:00:36.399
called active record encryption stop
00:00:38.600
Hackers from reading your data and it
00:00:40.840
was kind of about you know sort of just
00:00:42.559
convincing people the value of using
00:00:44.360
active record encryption and we spoke a
00:00:46.480
little bit about some of the changes
00:00:47.920
that we made um if you are jet-lagged like I
00:00:51.320
am or uh aren't sure which talk you are
00:00:53.800
in we can just do a quick refresh of
00:00:55.559
active record encryption so you can be
00:00:57.160
certain you want to be here for you know
00:00:58.719
the next 30 minutes
00:01:00.760
um it's this really lovely API that we
00:01:03.199
get for free from Rails right it
00:01:05.560
provides automatic encryption of
00:01:07.080
database records and plaintext access when
00:01:09.400
you need
00:01:10.560
it we upgraded our previously encrypted
00:01:14.360
columns with an internal strategy that
00:01:16.280
the team wrote in about 2020 and um
00:01:19.400
upgraded some plain text columns to
00:01:21.360
active record encryption so before we
00:01:25.600
had this bespoke internal strategy which
00:01:28.040
was a very easy to use API
00:01:30.360
um if you look at it it looks very
00:01:32.040
similar to active record encryption
00:01:33.799
right but we had a couple of problems uh
00:01:37.040
the main one uh from my perspective was
00:01:39.479
this key generation bottleneck um to
00:01:42.200
start encrypting records you needed an
00:01:44.520
encryption key we found that product
00:01:46.600
Engineers were not comfortable
00:01:47.880
generating their own encryption keys so
00:01:49.920
it was on my team to generate the keys
00:01:52.680
for the product engineers and it just
00:01:55.159
kind of took a while to get things going
00:01:57.719
uh also once active record encryption
00:01:59.799
was introduced we were now in
00:02:02.200
the position of maintaining Divergent
00:02:04.000
column encryption code so um this was a
00:02:07.520
little harder to do and you saw the
00:02:09.360
active record encryption API just
00:02:11.160
a second ago it was a bit easier to do
00:02:13.520
and um keys are actually derived
00:02:15.319
basically at the time of writing code or
00:02:17.120
you could even consider it runtime uh if
00:02:19.440
you really want to get down to it um so
00:02:22.040
active record encryption was very
00:02:23.400
tempting to our product engineers and if
00:02:26.000
they started using it without us going
00:02:28.040
through and securing everything
00:02:29.800
including like um it relied on this
00:02:32.000
encrypted Rails secrets.yaml file which
00:02:35.040
is uh not somewhere that we can actually
00:02:36.680
store our encryption Keys that's like a
00:02:38.319
violation of our our service level
00:02:40.040
objective uh it could be a big mess so
00:02:43.920
um after we upgraded we still had a
00:02:47.440
really easy to use API right the lovely
00:02:49.720
API I showed you before uh we reduced
00:02:52.239
some of our bespoke code that we had to
00:02:54.239
maintain and now we have a couple of
00:02:57.319
benefits that we just couldn't provide
00:02:58.800
before keys are derived at the
00:03:00.840
time of running code or or writing code
00:03:03.040
or I said if you like to get really
00:03:04.640
picky about it at runtime really um we
00:03:08.400
have a strategy now to easily upgrade
00:03:11.360
columns from plain text to encrypted
00:03:13.599
this could be done with our previous
00:03:14.799
system but it was difficult and uh
00:03:17.040
frankly intimidating so people didn't
00:03:18.680
choose it and finally we centralized key
00:03:21.400
rotation so we took the responsibility
00:03:23.920
of rotating encryption keys from product
00:03:26.040
teams and put it back on our team a
00:03:27.840
security team and um that made them much
00:03:30.480
more comfortable and
00:03:34.519
happier so in the previous presentation
00:03:37.120
I mentioned we wanted to show how
00:03:38.439
straightforward and easy column
00:03:40.000
encryption can be um but we weren't
00:03:42.680
entirely honest right deploying the new
00:03:45.319
column encryption strategy wasn't always
00:03:47.439
straightforward and
00:03:49.599
easy um so this talk is more of a real
00:03:52.400
world case study and um it's anyone who
00:03:55.680
might be converting from an existing
00:03:57.120
column encryption strategy so this might
00:03:59.239
be some of you
00:04:00.400
you um those who are working in
00:04:02.840
distributed systems uh given that Rails
00:04:05.000
is 20 years old I expect that this will
00:04:06.879
be possibly some more of
00:04:09.560
you um and those who will rotate their
00:04:11.920
encryption Keys uh this should be all of
00:04:14.360
you uh we can choose to rotate our
00:04:17.639
encryption keys on a you know a
00:04:19.280
maintenance Cadence as scheduled but
00:04:21.519
actually a fun thing about working in
00:04:23.240
security or um working at a very uh a
00:04:27.160
company that has desirable data is you
00:04:29.039
may not Choose You may not always get to
00:04:30.960
choose when you rotate your encryption
00:04:32.560
keys right um there may come a time when
00:04:35.280
you just need to rotate your encryption
00:04:36.800
keys and you need to have extreme
00:04:39.160
confidence that you can do it really
00:04:40.800
well and you can do it perfectly
00:04:43.199
because um encryption must be perfect
00:04:46.199
there's simply no there's no other way
00:04:47.880
around it really uh so we learned a lot
00:04:51.160
about deploying and maintaining
00:04:52.720
resilient security
00:04:54.880
software we made a lot of assumptions
00:04:57.120
about what we thought we knew about
00:04:58.400
encryption and what we thought our
00:05:00.080
hardest problems would
00:05:01.800
be um and not to brag but we did do this
00:05:04.600
with no
00:05:07.000
downtime we learned it the hard way but
00:05:09.759
I'm hoping that with this presentation
00:05:11.880
you don't have
00:05:13.680
to so we challenged and disproved our
00:05:17.199
own assumptions about what we thought
00:05:18.800
the hardest part of this project would
00:05:20.240
be and these are some of the the
00:05:22.319
assumptions right the hardest part of
00:05:24.400
encryption is Key Management maybe this
00:05:26.199
is something that you've read I learned
00:05:27.520
this in school uh one byte in one byte
00:05:30.560
out which is kind of a sound bite for a
00:05:32.600
larger problem we um inter interacted
00:05:35.639
with and then uh the price of it just
00:05:39.400
works is performance this is what we
00:05:42.240
believed so let's get right into it the
00:05:45.319
hardest part of uh encryption is key rot
00:05:49.080
is Key
00:05:50.120
Management so what we thought this meant
00:05:52.479
was things like FIPS compliance right um
00:05:54.800
FIPS is this standard that you have to
00:05:56.800
adhere to if you work with US Government
00:05:59.080
um entities at GitHub we're not
00:06:01.280
required to work with this standard but
00:06:03.280
we try to meet it as closely as
00:06:06.759
possible um and we weren't worried about
00:06:09.080
this because we were meeting it using
00:06:10.599
AES AES is considered the best in class for
00:06:13.319
symmetric
00:06:15.599
encryption uh we were worried about nonce
00:06:17.919
reuse nonce in cryptography means
00:06:20.599
number used once right um we were
00:06:23.840
concerned about nonce reuse because of
00:06:25.280
the sheer number of encryptions we do
00:06:26.919
every single year at GitHub and you know
00:06:28.919
we had this information because we had a
00:06:30.720
previously uh we had a previous column
00:06:33.039
encryption strategy in
00:06:35.080
place the more encryption operations
00:06:37.319
that you do the more likely you are to
00:06:39.400
generate a duplicate nonce this is
00:06:41.720
called nonce exhaustion so encrypting two
00:06:44.560
things with the same nonce is just
00:06:47.039
absolutely fatal to the confidentiality
00:06:49.160
and integrity of AES so it's very very
00:06:52.120
important that you do not generate a
00:06:53.639
duplicate nonce and use it to encrypt
00:06:55.360
two
00:06:56.400
values um but we weren't worried about
00:06:59.080
this right right uh we thought we were
00:07:00.199
being very clever we added the current
00:07:02.560
year as anti-nonce-exhaustion data which
00:07:05.479
effectively derived a new key for each
00:07:08.319
column every single
00:07:10.319
year um another thing that we were
00:07:12.919
concerned about that you might read
00:07:14.039
about is secure key storage right um and
00:07:17.720
we weren't worried about this though
00:07:19.039
because we were using HashiCorp Vault
00:07:21.000
to store our keys and we had been using
00:07:23.000
that in part of our previous column
00:07:24.840
encryption strategy transition for some
00:07:28.319
time
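The nonce-reuse concern and the year-based key idea described above can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not GitHub's or Rails' implementation: the 96-bit nonce size is the usual AES-GCM IV length, while `derive_column_key`, the HMAC-based derivation, and the column name are hypothetical stand-ins chosen to keep the example dependency-free.

```ruby
require "openssl"

# Birthday-bound estimate of the chance that at least two of `encryptions`
# random nonces of `nonce_bits` bits collide under a single key.
def nonce_collision_probability(encryptions, nonce_bits: 96)
  pairs = encryptions * (encryptions - 1) / 2
  pairs.to_f / (2**nonce_bits)
end

# Hypothetical derivation: mixing the column name and the year into the key
# gives each column a fresh key, and so a fresh nonce space, every year.
def derive_column_key(master_key, column, year)
  OpenSSL::HMAC.digest("SHA256", master_key, "#{column}:#{year}")
end

master   = OpenSSL::Random.random_bytes(32)
key_2023 = derive_column_key(master, "users.recovery_codes", 2023)
key_2024 = derive_column_key(master, "users.recovery_codes", 2024)

puts nonce_collision_probability(2**32) # roughly 2**-33
puts key_2023 == key_2024               # different years, different keys
```

The estimate grows with the square of the number of encryptions under one key, which is why capping how long any single key is used (here, one year per column) keeps the collision risk small.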
00:07:30.120
so that's everything we thought we knew
00:07:32.680
right um you go into a project thinking
00:07:34.720
oh we know what's going to go wrong
00:07:36.039
we're not worried we're smart people um
00:07:38.960
key deployment is actually an extremely
00:07:41.759
difficult part of
00:07:43.319
encryption let's say you have one Rails
00:07:45.520
server right and you uh have active
00:07:48.240
record encryption enabled and you
00:07:50.360
encrypt records with your key and you
00:07:51.840
save them to the
00:07:54.159
database um and now let's say you add a
00:07:57.280
key as part of your key rotation process
00:08:00.759
so going forward all your records will
00:08:03.319
now be encrypted with your new teal
00:08:06.960
key when you rotate your key before
00:08:10.039
you've re-encrypted every single record
00:08:12.240
to use your new key your latest key this
00:08:15.360
teal key cannot decrypt purple records
00:08:18.759
right um some of you might be
00:08:21.039
thinking uh but this will not happen
00:08:23.120
right active record encryption considers
00:08:25.360
this and um you would be right in this
00:08:27.599
specific scenario um Rails will just
00:08:31.080
move right along to the next key in the
00:08:32.800
list and you'll be able to decrypt the
00:08:34.680
record um this is wonderful for you
00:08:37.240
truly I love this for you the Rails
00:08:38.959
guides will take you far in life you
00:08:40.760
probably don't need the rest of the
00:08:42.479
presentation um but maybe you like me
00:08:45.240
work with more than one Rails server or
00:08:47.440
maybe if today you work with one Rails
00:08:49.040
server you'd like to work with
00:08:51.320
more so let's say your app is a bit
00:08:54.120
bigger than just one Rails server right
00:08:56.320
maybe your app is more in the range of
00:08:58.040
like a fleet of Rails servers and uh
00:09:01.320
maybe you serve like thousands of
00:09:02.760
requests per second and you're doing
00:09:04.600
quite a few encryptions and decryptions
00:09:06.519
every
00:09:07.959
year um here's a really simplified
00:09:10.800
version of the scenario to describe we
00:09:12.959
have um a couple of servers that are
00:09:15.760
writing encrypted records to the
00:09:17.560
database uh very nice now when you
00:09:21.279
rotate your key you have to propagate
00:09:23.760
the change to all of your servers it is
00:09:26.839
difficult to propagate to coordinate
00:09:28.839
such a propagation all at once um can
00:09:31.600
you atomically deploy a key to all of
00:09:33.519
your servers at the same time I
00:09:38.800
cannot what does this mean for you right
00:09:42.200
until you've propagated your keys to all of
00:09:45.399
your servers any processes that may be
00:09:48.079
holding on to uh references to Old keys
00:09:50.680
or any servers that haven't received the
00:09:52.200
new keys yet you might find yourself in
00:09:55.079
a situation where record is encrypted on
00:09:57.279
one server so see we have this new teal
00:10:00.200
key coming out it's encrypted with this
00:10:02.040
teal key but it cannot be decrypted on
00:10:04.399
another server because that server has
00:10:05.920
not yet received the decryption
00:10:09.120
key so what we learned is just
00:10:12.320
appending a new key didn't work in our
00:10:14.240
distributed
00:10:16.680
system in a distributed system to rotate
00:10:19.720
Keys we also had to distribute keys so
00:10:23.000
before we started this project our
00:10:24.600
previous encryption service was actually
00:10:26.600
networked which is Maybe not a choice
00:10:29.760
you would want to make we decided local
00:10:31.720
encryption would be sufficient um but we
00:10:35.079
distributed Keys through a database
00:10:36.560
backed API in that situation so we
00:10:38.320
didn't have to worry about this type of
00:10:39.680
key rotation and distribution until
00:10:43.040
now we needed a solution that ensured
00:10:45.800
encryption would happen with new keys
00:10:47.839
only once they were propagated to all
00:10:50.160
servers so we decided to distribute our
00:10:52.720
key by using a two-key strategy first
00:10:55.880
Distributing the new decryption key then
00:10:59.040
Distributing the same value as an
00:11:00.560
encryption
00:11:02.120
key so we can distribute the decryption
00:11:04.920
key wait for the process to signal that
00:11:07.680
that key has been propagated to all of
00:11:09.399
our servers and can ensure that it's
00:11:11.920
present in all
00:11:13.600
servers if we attempt to decrypt with
00:11:16.720
the new decryption key Rails will do
00:11:19.240
what it does well and it will just move
00:11:20.800
along in the list and attempt to use the
00:11:22.519
next key which should be the correct key
00:11:25.279
uh in this situation in the system that
00:11:27.000
we've set up and this works
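The failure mode and the two-key rollout described above can be sketched with plain Ruby and OpenSSL. This is an illustration under assumptions, not the Rails key provider API or GitHub's deployment tooling: the `Server` class, its `decryption_keys` list, and its `encryption_key` are hypothetical stand-ins for one app server's key configuration.

```ruby
require "openssl"

# Hypothetical stand-in for one app server's key configuration.
class Server
  attr_reader :decryption_keys
  attr_accessor :encryption_key

  def initialize(key)
    @decryption_keys = [key]
    @encryption_key = key
  end

  # Encrypt with AES-256-GCM using this server's current encryption key.
  def encrypt(plaintext)
    cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
    cipher.key = encryption_key
    iv = cipher.random_iv
    ciphertext = cipher.update(plaintext) + cipher.final
    { iv: iv, tag: cipher.auth_tag, ciphertext: ciphertext }
  end

  # Try each known key in turn, moving to the next key when one fails to
  # authenticate the message.
  def decrypt(message)
    decryption_keys.each do |key|
      begin
        cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
        cipher.key = key
        cipher.iv = message[:iv]
        cipher.auth_tag = message[:tag]
        return cipher.update(message[:ciphertext]) + cipher.final
      rescue OpenSSL::Cipher::CipherError
        next
      end
    end
    nil # no key on this server could decrypt the record
  end
end

purple = OpenSSL::Random.random_bytes(32) # existing key
teal   = OpenSSL::Random.random_bytes(32) # new key being rotated in

# Naive rotation: server A starts encrypting with teal before B has it.
naive_a, naive_b = Server.new(purple), Server.new(purple)
naive_a.decryption_keys << teal
naive_a.encryption_key = teal
naive_result = naive_b.decrypt(naive_a.encrypt("recovery-code"))

# Two-key rotation: phase 1 ships teal to every server for decryption only;
# phase 2 switches encryption over, and may still reach servers unevenly.
staged_a, staged_b = Server.new(purple), Server.new(purple)
old_record = staged_a.encrypt("old-code")
[staged_a, staged_b].each { |server| server.decryption_keys << teal }
staged_a.encryption_key = teal
staged_result = staged_b.decrypt(staged_a.encrypt("recovery-code"))
old_result = staged_a.decrypt(old_record)

puts naive_result.inspect # nil: server B cannot read the teal record
puts staged_result        # B falls through purple, then succeeds with teal
puts old_result           # purple records stay readable during rotation
```

In a real fleet the two phases would be separate deployments, with the second gated on confirmation that every server holds the new decryption key.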
00:11:29.720
once we know that the process is
00:11:31.399
complete to distribute the decryption
00:11:33.240
key now we can distribute the encryption
00:11:36.040
key with both keys in place we ran our
00:11:38.839
specialized migration to re-encrypt all
00:11:41.519
of the records in place with again no
00:11:43.800
downtime and without using a plaintext
00:11:47.279
mode so using two keys enabled us to
00:11:50.480
maintain our encryption SLO um we didn't
00:11:53.440
want to use plaintext mode it's very handy
00:11:55.560
but for us we had records that were
00:11:56.880
previously encrypted and we couldn't
00:11:58.320
allow them to to be stored in plain text
00:12:01.240
also because we have so many records our
00:12:03.920
re-encryption process was fairly lengthy
00:12:06.320
right so maybe if you have just a couple
00:12:08.519
records your re-encryption process will
00:12:10.320
not take so long and it's okay but it
00:12:12.839
would not be acceptable for that length
00:12:14.519
of time for our records to be available
00:12:16.120
in plain
00:12:17.519
text so what can you learn from our
00:12:22.199
experience well we learned this during a
00:12:24.600
planned test of our key rotation strategy
00:12:27.480
um and this is certainly one way that
00:12:29.480
you can find out about faults in your
00:12:32.040
system um I do not think that this is
00:12:35.279
the best way to find out about potential
00:12:36.959
faults in your system or your deployment
00:12:39.120
strategy uh really your best bet
00:12:41.480
for detecting this type of potential
00:12:43.639
failure is knowledge of your system um
00:12:46.720
so I would encourage you to do some
00:12:48.560
research and understand your
00:12:50.519
capabilities for updating keys for your
00:12:52.279
production
00:12:54.079
servers can you orchestrate updating all
00:12:56.880
of your keys at once if not how can you
00:12:59.959
roll out your keys to prevent this
00:13:02.040
potential decryption
00:13:04.040
failure we have a centralized key
00:13:06.360
management system that we push out
00:13:08.600
updates do you update Keys via push or a
00:13:11.800
pull what triggers a push um if you use
00:13:15.040
a pull system how do you trigger a pull
00:13:17.800
do you poll for updates how frequently
00:13:19.839
do you
00:13:21.399
poll how and when is your data migrated
00:13:24.639
um is your database sharded will data
00:13:27.360
ever move between shards
00:13:29.519
how will you re-encrypt records for key
00:13:31.560
rotation how long does re-encryption
00:13:33.959
take these are the kinds of things that
00:13:35.760
you should think about when you're
00:13:36.839
thinking about deploying your key
00:13:39.680
rotation
00:13:41.760
system so key rotation and re-encryption
00:13:45.360
of all records was always in our road
00:13:47.079
map but looking back we all agreed that
00:13:49.720
it probably should have been the first
00:13:50.920
thing that we looked at and looked at
00:13:52.480
really hard active record encryption
00:13:55.320
with the backwards compatibility key
00:13:57.120
list makes key rotation really really
00:13:59.680
easy so take advantage of
00:14:02.279
that all right what we thought we knew
00:14:06.399
um one byte in one byte out all of our
00:14:08.759
Cipher texts are the same size but is
00:14:11.480
sorry just double-checking that the
00:14:13.560
slide is showing everything I wanted
00:14:15.360
to um so AES-256-GCM works like a stream
00:14:19.480
Cipher um some cryptographers in the
00:14:21.600
audience might be saying oh but it also
00:14:23.040
works like a block Cipher this is true
00:14:25.000
but we really don't have time for that
00:14:26.160
today I'm happy to talk to you about it
00:14:27.880
later um and all Cipher text will be 128
00:14:32.440
bits right so that sounds good um some
00:14:35.920
columns are migrating from plain text
00:14:38.240
but they just need to be resized to hold
00:14:39.759
about 128 bits right I think some of
00:14:42.639
you see where I'm going with this um our
00:14:45.519
previous encryption scheme stored some
00:14:48.120
metadata but it didn't store quite as
00:14:50.639
much metadata as rails does and not
00:14:53.000
quite in the same
00:14:54.440
way so what did we
00:14:57.120
learn ciphertext is not a one-to-one
00:15:00.240
mapping to encrypted record or what
00:15:02.399
active record encryption Cipher calls
00:15:04.120
the encrypted message and while we
00:15:07.000
accounted for some overhead and
00:15:08.680
migration to the new scheme we didn't
00:15:11.000
fully think this
00:15:13.120
through um all of our Cipher texts are
00:15:15.680
the same size that is true one byte
00:15:17.759
in one byte out there yes however
00:15:20.120
ciphertext is not all that is stored
00:15:21.880
in the
00:15:22.639
database um Rails embeds a key ID as
00:15:27.320
part of a simple envelope encryption
00:15:29.120
strategy um it stores all of this in a
00:15:31.120
JSON object and it includes like a
00:15:33.800
couple of other headers if you want to
00:15:36.160
add values to this envelope you need to
00:15:38.160
account for the size any of these
00:15:39.920
metadata bits may be adding to the total
00:15:42.079
size of your
00:15:44.120
record this one uh we thought we were
00:15:46.800
being very clever uh I mentioned before
00:15:49.519
we have our anti-nonce-exhaustion data um
00:15:52.240
that is quite a bit of text I think that
00:15:53.880
might be 26 characters long uh we wanted
00:15:56.800
to uh write for readability right we
00:15:59.519
were writing this and our previous
00:16:01.079
encryption scheme was really good but
00:16:02.880
developers didn't totally get how it
00:16:04.959
worked so we wanted to just be so
00:16:06.440
explicit and clear with everything you
00:16:08.120
know in case they wanted to look at the
00:16:09.519
internals and see the changes that we
00:16:11.160
had made um this is the actual name of
00:16:14.440
the tag that we used if you are familiar
00:16:17.160
with envelope encryption or the types of
00:16:19.079
metadata headers that get appended to
00:16:21.120
Cipher text in encrypted messages you
00:16:24.000
know that usually the names of these
00:16:26.199
tags are just one or two characters
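To make the size cost of a header name concrete, here is a small Ruby sketch that builds a JSON envelope shaped like the one described in the talk, with the payload under `p` and the headers under `h`. The header names `iv` and `at`, the short name `y`, and the exact spelling `anti_nonce_exhaustion_data` (a 26-character guess at the tag mentioned above) are assumptions for illustration, and random bytes stand in for real ciphertext.

```ruby
require "json"
require "securerandom"

# Base64-encode without line breaks (equivalent to strict Base64).
def b64(bytes)
  [bytes].pack("m0")
end

# Build an envelope with a payload of `payload_bytes` random bytes, a 12-byte
# IV, a 16-byte auth tag, and any extra headers the caller adds.
def envelope(payload_bytes, extra_headers = {})
  headers = {
    "iv" => b64(SecureRandom.random_bytes(12)),
    "at" => b64(SecureRandom.random_bytes(16))
  }.merge(extra_headers)
  JSON.generate({ "p" => b64(SecureRandom.random_bytes(payload_bytes)), "h" => headers })
end

bare       = envelope(16)
short_name = envelope(16, { "y" => "2024" })
long_name  = envelope(16, { "anti_nonce_exhaustion_data" => "2024" })

puts bare.bytesize
puts short_name.bytesize
puts long_name.bytesize
puts long_name.bytesize - short_name.bytesize # extra bytes per stored record

# A variable-length header value makes the stored size depend on the model.
by_model = %w[User.email Organization.billing_contact_email].map do |name|
  envelope(16, { "y" => name }).bytesize
end
puts by_model.inspect
```

Because every record carries its own headers, the cost of a long name or a variable-length value is paid once per row, which is what the column size has to absorb.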
00:16:29.920
so just for a bit of comparison you can
00:16:32.639
see our previous strategy um in red we
00:16:35.800
have a key ID and then in Gray is the
00:16:38.360
encrypted message and these are both
00:16:40.560
packed as binary strings so this comes
00:16:42.440
out to about 42 characters which is
00:16:45.120
quite small right and quite nice active
00:16:47.680
record encryption uses a JSON object
00:16:50.160
which I think is also a nice way to
00:16:51.880
store an encrypted message and we did
00:16:54.480
consider you know that there are there
00:16:56.079
are headers added differently and a
00:16:57.800
little bit larger because they are an
00:16:59.440
object and not you know a packed a
00:17:01.399
packed binary stream but you can see
00:17:03.600
active record encryption the payload
00:17:05.799
with the key p um is quite small and
00:17:09.480
then uh the message headers with the key
00:17:11.760
h is a bit bigger I think this whole
00:17:13.919
thing comes out to about 208 characters
00:17:16.000
only 60 of which are the payload um and
00:17:18.919
you'll see at the bottom someone has
00:17:20.400
added a very long uh message tag with
00:17:23.400
the name anti-nonce-exhaustion
00:17:27.400
data um so we ended up resizing our
00:17:31.880
existing columns and recommending that
00:17:34.000
all of our um new encrypted columns use
00:17:37.520
the MySQL type text uh we made this
00:17:41.080
recommendation based on the fact that we
00:17:43.840
are not allowing deterministic
00:17:45.360
encryption at GitHub um You probably
00:17:48.880
don't want to use can't use text if you
00:17:51.280
need to index on your encrypted
00:17:53.919
columns I personally feel that encrypted
00:17:57.720
records should not be indexed on and you
00:17:59.360
should not use deterministic encryption
00:18:02.080
um but as happened at RailsConf and as
00:18:04.799
I'm sure will happen here someone will
00:18:06.039
tell me a good use case that they have
00:18:08.120
but um for my or for our use case we've
00:18:10.799
decided not to enable deterministic
00:18:12.919
encryption so if you need deterministic
00:18:15.120
encryption just make sure that you use a
00:18:17.400
large enough size uh column but probably
00:18:19.640
not
00:18:21.600
text so what can you learn from our
00:18:25.200
experience um really truly if you are
00:18:28.640
making any changes to the message headers or
00:18:31.240
the metadata understand the size of
00:18:32.919
those bits that you're storing along the
00:18:34.240
cipher
00:18:35.440
text um in our case we really did not
00:18:38.320
need to name the tag anti-nonce-exhaustion
00:18:41.480
data um product Engineers were not quite
00:18:44.159
chomping at the bit to like dig into the
00:18:46.200
internal changes that we made um and
00:18:48.440
really wanting to understand the API at
00:18:50.200
that level so we maybe over-optimized
00:18:52.400
for thinking people would be as excited
00:18:54.159
about this as we were um although they
00:18:56.400
are quite excited about what it buys
00:18:59.520
them um next is the actual value for any
00:19:04.840
um any header values that you may add
00:19:07.600
right our anti-nonce-exhaustion data is
00:19:09.840
the year we determined with the number of
00:19:11.679
encryptions that we do every year
00:19:13.400
rotating the key yearly automatically in
00:19:15.880
this way would be sufficient for us the
00:19:18.760
most important thing about this value I
00:19:20.840
think is that it is of a fixed length
00:19:24.320
right so um you wouldn't want to use
00:19:26.960
something like um a model and a column
00:19:29.799
name because those are not fixed length
00:19:31.679
right they could be different lengths
00:19:32.880
depending on the model and column um and
00:19:35.159
you might be thinking uh Kylie year is
00:19:37.840
not guaranteed fixed length and that's
00:19:40.400
fair but we do have about 8,000 years
00:19:43.039
before that value will become longer and
00:19:45.480
I think that this is probably enough
00:19:46.880
time for us to figure out a solution if
00:19:48.720
we need to make a change however because
00:19:51.120
we use
00:19:52.080
text I think we will be okay if we add
00:19:54.600
one more
00:19:56.039
character um just a highlight once again
00:20:00.080
all all of these other message headers
00:20:01.960
which are implemented by the rails team
00:20:04.159
are just one or two uh characters and
00:20:06.600
our message header's name is quite
00:20:09.840
long so make it easy for your engineers
00:20:12.880
don't even let them find out about the
00:20:14.480
size of message headers um that will be
00:20:17.480
added to the encrypted message right um
00:20:21.840
so consider longevity like I said um
00:20:25.000
using text and using year for anti-nonce-
00:20:27.600
exhaustion data does buy us about 8,000
00:20:29.880
years and we have a lot of work in the
00:20:31.799
backlog but I think you know if it comes
00:20:33.640
to it and we have to make a change we
00:20:35.159
have the
00:20:36.200
time that's a a joke so you can laugh
00:20:40.520
or it's your morning
00:20:43.360
too
00:20:45.000
so uh what we thought we knew the price
00:20:47.960
of it just works is performance right
00:20:51.880
when something just works you pay a
00:20:54.799
price right uh and we assume that this
00:20:57.720
would be performance right encryption
00:20:59.679
can take some time but we were moving
00:21:02.200
from one encryption scheme to another so
00:21:05.159
our Engineers were familiar and
00:21:07.400
understanding of how much time would
00:21:09.760
conceivably be added right and so we
00:21:12.200
figured that this was negligible and for
00:21:14.200
those upgrading a plain text column to
00:21:16.520
encrypted again it will just be such a
00:21:18.720
small amount of time and it's acceptable
00:21:20.400
to our engineering
00:21:23.360
team so what we weren't thinking about
00:21:27.320
was the price of the price of
00:21:29.440
idempotency right sometimes you pay in how
00:21:31.960
much time and sometimes you pay in how
00:21:34.200
many times if it just works how did we
00:21:37.720
find out um with monitoring um and
00:21:41.799
unfortunately with our customer audit
00:21:43.520
log which some of you may have
00:21:46.159
noticed
00:21:47.720
so we had a bit of a red herring um and
00:21:50.919
we found that some of our custom code
00:21:53.120
was somewhat of the problem but what we
00:21:55.919
ultimately
00:21:57.039
learned some data is just extra special
00:22:01.200
and we had one such extra special column
00:22:03.720
two-factor recovery codes we had
00:22:06.000
monitoring in place to measure things
00:22:07.840
like encryption and decryption failures
00:22:10.559
but we didn't have anything internal in
00:22:12.600
place to measure side effects of our
00:22:14.760
upgrade
00:22:17.120
strategy our upgrade strategy relied on
00:22:19.679
a type to feature flag right the flag
00:22:22.840
would determine if a record should be
00:22:24.320
encrypted or not and it would be set to
00:22:27.120
encrypt before we ran our upgrade
00:22:29.520
migration um and this seemed like a good
00:22:31.840
system and we upgraded a couple columns
00:22:34.440
from our previously encrypted strategy
00:22:36.600
to encrypted and you know we saved what
00:22:39.200
we felt was like a a special column
00:22:41.279
for a little bit later making sure we
00:22:42.880
had really battle tested it before we
00:22:44.880
tried this
00:22:45.919
one um but we overlooked that this
00:22:48.559
column relied on changed in place right
00:22:52.799
so when we were migrating this column
00:22:54.720
which was previously encrypted we found
00:22:57.200
that changed in place would compare
00:22:59.120
decrypted plain text to the encrypted
00:23:02.159
Cipher text um and this will always show
00:23:05.039
is changed right this is uh the virtue of
00:23:08.159
encryption this is the main value of
00:23:09.840
encryption knowing the cipher text
00:23:11.880
should tell you absolutely nothing about
00:23:13.600
the value of the plain text um so the
00:23:15.919
encryption Works quite well uh but we
00:23:18.400
neglected to delegate that changed in
00:23:21.400
place method to the active record
00:23:24.000
encrypted type attribute right um so
00:23:27.679
when we migrated this column related to
00:23:30.120
two- Factor
00:23:31.720
authentication this had the unexpected
00:23:33.799
side effect of causing all of these
00:23:35.440
records to appear to have been changed
00:23:37.840
when they were in fact not um and this
00:23:40.799
generated audit audit logs for our
00:23:44.880
customers um fortunately though there
00:23:46.960
was no actual change to the data the
00:23:50.080
data itself is fine um and our
00:23:52.400
authentication team was really great and
00:23:54.080
very understanding and they annotated
00:23:55.960
the false alerts to indicate this to to
00:23:57.919
the affected customers we fix this by
00:24:00.400
delegating the changed in place to the
00:24:02.039
encrypted attribute type um and this now
00:24:05.039
compares the decrypted cipher text to
00:24:07.440
the plain text which again if encryption
00:24:09.919
was done correctly and with active
00:24:11.440
record encryption it is uh always
00:24:16.200
match um so we also noticed and this was
00:24:19.760
the red herring we thought perhaps our
00:24:21.440
issue is idempotency right the idea
00:24:23.840
that maybe to get something done a
00:24:26.200
method has to be called a couple
00:24:27.559
different times before it can really
00:24:30.200
fully take effect but the encrypt method
00:24:32.880
is idempotent but we noticed with our
00:24:35.000
monitoring dashboard that encrypt was
00:24:36.600
being called twice um this did not
00:24:40.399
directly contribute to the erroneous
00:24:42.279
audit logs but because we generated
00:24:44.960
these erroneous audit logs we did notice
00:24:47.799
this however luckily like I said encrypt
00:24:50.440
is idempotent um there was a 35 second
00:24:54.799
period where I was extremely ill
00:24:56.480
thinking that we had double-encrypted
00:24:58.120
records and thinking oh my goodness how
00:25:00.440
are we going to find out which ones are
00:25:01.440
double encrypted and how are we going to
00:25:03.200
decrypt them and set them back to you
00:25:05.080
know the standard single encryption
00:25:06.799
strategy that we have um but like I
00:25:10.000
mentioned encrypt is idempotent and the
00:25:12.799
bug was fixed if you are for some reason
00:25:15.279
like us relying on Counting the number
00:25:17.640
of encryptions or decryptions to detect
00:25:19.720
potential
00:25:21.760
failures um yes and the bug has been
00:25:24.679
fixed so you you can rest well knowing
00:25:28.600
that uh it will only be called once per
00:25:30.799
encryption which is uh I think a really
00:25:32.960
good and appropriate number of times for
00:25:34.640
it to be called so what we learned it
00:25:38.200
just works means right it did just work
00:25:41.080
for most
00:25:42.120
cases but we had a really special case
00:25:45.760
and monitoring can help detect special
00:25:48.399
cases but the problem with monitoring is
00:25:51.120
it's too late this is in production the
00:25:54.080
special case has hit production and
00:25:55.960
could now be affecting your customer
00:25:57.440
data
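To make the idempotency point concrete, here is a minimal sketch of how a call counter can report two encrypt calls while the stored value is still only encrypted once. This is not Active Record Encryption's implementation. `ToyEncryptor`, its JSON envelope, and the `encrypt_calls` counter are all hypothetical names invented for illustration, and it uses only Ruby's standard library.

```ruby
require "openssl"
require "base64"
require "json"

# Hypothetical toy encryptor (an assumption for illustration only,
# not how Active Record Encryption is implemented).
class ToyEncryptor
  attr_reader :encrypt_calls

  def initialize(key)
    @key = key            # 32 random bytes for AES-256-GCM
    @encrypt_calls = 0    # stands in for a monitoring counter
  end

  # A value counts as encrypted if it parses as our JSON envelope.
  def encrypted?(value)
    payload = JSON.parse(value)
    payload.is_a?(Hash) && %w[p iv at].all? { |k| payload.key?(k) }
  rescue JSON::ParserError
    false
  end

  # Idempotent: an already encrypted value is returned unchanged,
  # but the call is still counted.
  def encrypt(value)
    @encrypt_calls += 1
    return value if encrypted?(value)

    cipher = OpenSSL::Cipher.new("aes-256-gcm")
    cipher.encrypt
    cipher.key = @key
    iv = cipher.random_iv
    ciphertext = cipher.update(value) + cipher.final
    JSON.generate(
      "p"  => Base64.strict_encode64(ciphertext),
      "iv" => Base64.strict_encode64(iv),
      "at" => Base64.strict_encode64(cipher.auth_tag)
    )
  end

  def decrypt(value)
    payload = JSON.parse(value)
    cipher = OpenSSL::Cipher.new("aes-256-gcm")
    cipher.decrypt
    cipher.key = @key
    cipher.iv = Base64.strict_decode64(payload["iv"])
    cipher.auth_tag = Base64.strict_decode64(payload["at"])
    plaintext = cipher.update(Base64.strict_decode64(payload["p"])) + cipher.final
    plaintext.force_encoding(Encoding::UTF_8)
  end
end

encryptor = ToyEncryptor.new(OpenSSL::Random.random_bytes(32))
once  = encryptor.encrypt("account-number-1234")
twice = encryptor.encrypt(once) # second call is a no-op on the data
```

In this sketch `once` and `twice` are the same string, yet `encrypt_calls` is 2. A monitor that counts calls would flag this even though nothing was double encrypted.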
00:26:00.120
some data is just extra special and
00:26:03.760
you need to take extra time and care to
00:26:05.520
get it
00:26:06.960
right so what can you learn from our
00:26:09.880
experience uh maybe you know don't
00:26:11.520
meddle with the internals but maybe you
00:26:13.960
like us can't help it encrypted data is
00:26:16.919
being encrypted for a reason there might
00:26:19.039
be special monitors or side effects
00:26:21.279
associated with these
00:26:24.200
records but despite all of this all of
00:26:27.440
these bad things that seem to have
00:26:29.320
happened and um quite a bit of sweating
00:26:31.600
on my part we delivered a seamless
00:26:34.320
column encryption strategy we're no
00:26:36.480
longer maintaining divergent column
00:26:38.159
encryption code we have this new easy to
00:26:40.679
use process to upgrade columns from
00:26:42.880
plain text to encrypted which before we
00:26:45.679
could provide but was difficult and a
00:26:47.679
bit arduous and not appealing to our
00:26:50.399
developers um we greatly sped up
00:26:53.480
development time of new encrypted
00:26:55.360
columns which greatly increased adoption
00:26:57.559
of encrypted columns um and we
00:27:00.679
maintained our service level uh with no
00:27:03.200
service interruptions at all um so I
00:27:05.480
think if you keep a couple things in
00:27:07.080
mind you can too um and the two that I
00:27:10.399
think are most important probably are to
00:27:13.080
build with key rotation and key
00:27:15.000
management in mind first I mentioned
00:27:17.240
this before active record encryption
00:27:20.039
makes this very very easy and you should
00:27:22.320
take advantage of it um you could
00:27:25.440
probably build your own column
00:27:26.799
encryption strategy right but building
00:27:29.399
your own key rotation strategy without
00:27:31.760
the support of active record encryption
00:27:33.600
is very difficult and I do not advise
00:27:36.480
you go down that path um and then the
00:27:39.640
next most important thing I think is
00:27:42.159
understand why a column should be
00:27:43.559
encrypted and what side effects there
00:27:45.279
may be on those records right this is
00:27:47.480
live data this affects your customers
00:27:49.519
this is the livelihood of your
00:27:53.080
application deploying seamless column
00:27:56.240
encryption was not seamless but it's
00:27:59.440
very doable and I really believe that
00:28:01.880
it's
00:28:03.120
worthwhile um if you enjoyed this
00:28:05.320
presentation uh if you'd like to learn
00:28:07.360
more we have two blog posts that
00:28:09.720
detailed um first uh all the changes
00:28:13.000
that we made and why and then um we have
00:28:15.760
another one that tells you in more
00:28:17.279
detail how a kind of like simplified
00:28:19.480
version of our key rotation strategy and
00:28:22.039
it has some sample code um which you
00:28:24.120
can use and it relies on this really
00:28:25.640
handy gem that I like from Shopify
00:28:27.480
called maintenance tasks that helps you
00:28:29.640
handle these kind of like special
00:28:31.159
transitional migrations um the active
00:28:33.799
record encryption guide which I joked
00:28:35.960
earlier but it really will take you very
00:28:37.760
far um this presentation is just about
00:28:40.480
you know the handful of things that
00:28:41.720
weren't in there um we gave a
00:28:44.080
presentation earlier this year uh which
00:28:46.240
kind of maybe sells you on active record
00:28:48.320
encryption of it if you're not sold and
00:28:50.159
then two of my absolute favorite uh
00:28:52.159
security engineering books that I've
00:28:53.600
read Real-World Cryptography by David
00:28:56.200
Wong um I opened this book probably like
00:28:58.960
every day or every other day while
00:29:00.279
working on this project and then I
00:29:02.480
really enjoyed um Google's Building
00:29:04.679
Secure and Reliable Systems book it's
00:29:06.640
kind of their security answer to the SRE
00:29:09.000
book um yeah I don't know that we'll
00:29:12.039
have time for questions but I would love
00:29:13.640
for you to come ask me them in person or
00:29:15.559
if you feel shy to come uh you can
00:29:17.440
message me on the conference
00:29:19.480
slack thank you so
00:29:26.320
much
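As a companion to the key rotation advice in the talk, here is a minimal sketch of the general idea behind rotation: encrypt with the newest key, try every known key when decrypting, and re-encrypt old values over time. It is not the speaker's strategy and not Active Record Encryption's internals. `ToyKeyRing`, its byte layout, and the `rotate` helper are hypothetical, and it uses only Ruby's standard library.

```ruby
require "openssl"

# Hypothetical key ring (an assumption for illustration only).
# Keys are ordered oldest first; the last key is the current one.
class ToyKeyRing
  IV_BYTES  = 12 # default IV length for AES-GCM in OpenSSL
  TAG_BYTES = 16 # default GCM authentication tag length

  def initialize(keys)
    @keys = keys
  end

  # Always encrypt with the newest key.
  # Output layout: IV, then auth tag, then ciphertext.
  def encrypt(plaintext)
    cipher = OpenSSL::Cipher.new("aes-256-gcm")
    cipher.encrypt
    cipher.key = @keys.last
    iv = cipher.random_iv
    ciphertext = cipher.update(plaintext) + cipher.final
    iv + cipher.auth_tag + ciphertext
  end

  # Try keys from newest to oldest until one authenticates.
  def decrypt(blob)
    iv         = blob[0, IV_BYTES]
    tag        = blob[IV_BYTES, TAG_BYTES]
    ciphertext = blob[(IV_BYTES + TAG_BYTES)..-1]

    @keys.reverse_each do |key|
      cipher = OpenSSL::Cipher.new("aes-256-gcm")
      cipher.decrypt
      cipher.key = key
      cipher.iv = iv
      cipher.auth_tag = tag
      begin
        plaintext = cipher.update(ciphertext) + cipher.final
        return plaintext.force_encoding(Encoding::UTF_8)
      rescue OpenSSL::Cipher::CipherError
        next # wrong key, try the next older one
      end
    end
    raise ArgumentError, "no key in the ring could decrypt this value"
  end

  # Re-encrypt a value under the newest key.
  def rotate(blob)
    encrypt(decrypt(blob))
  end
end

old_key = OpenSSL::Random.random_bytes(32)
new_key = OpenSSL::Random.random_bytes(32)

legacy  = ToyKeyRing.new([old_key]).encrypt("sensitive value")
ring    = ToyKeyRing.new([old_key, new_key])
rotated = ring.rotate(legacy) # now readable with new_key alone
```

The old key can only be retired once every stored value has been rotated. Tracking that backfill is the part a transitional migration tool helps with.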