RubyConf AU 2023
Encrypted Search Party
Summarized using AI

by Fiona McCawley

In this talk titled "Encrypted Search Party", Fiona McCawley discusses the challenges and techniques in protecting sensitive data, particularly in Rails applications where application-level encryption is vital. She focuses on Order Revealing Encryption (ORE), a scheme that enables querying capabilities while maintaining data in an encrypted state. Fiona introduces the necessity of encryption for securing sensitive information and highlights various encryption approaches, specifically differentiating between deterministic and non-deterministic encryption modes.

Key Points Discussed:

  • Introduction to Encryption:

    • Encryption transforms plaintext into ciphertext, making the data unreadable without a specific key.
    • Deterministic encryption produces the same ciphertext for identical plaintexts, whereas non-deterministic encryption generates a different ciphertext each time, often by using an IV (initialization vector).
  • Application-Level Encryption:

    • The client controls the encryption and decryption processes, so sensitive data is encrypted as close to the client as possible throughout its lifecycle.
    • Demonstrated using Rails 7’s Active Record, which now supports application-level encryption, highlighting the steps involved in setting it up.
  • Challenges of Querying Encrypted Data:

    • Attempting to run SQL queries against encrypted data often returns no results, since the stored ciphertext no longer matches the value supplied in the query.
  • Overview of ORE:

    • ORE allows for comparing two ciphertexts to determine the order of their plaintext equivalents without revealing the actual plaintext values.
    • It operates using two keys (PRF and PRP) to facilitate encryption and secure comparisons.
  • Demonstration of ORE Library:

    • A toy ORE library was built to exemplify how the scheme works, showcasing how ciphertexts are generated and compared in a secure manner.

Conclusion and Takeaways:

  • ORE presents a viable solution for maintaining data privacy while allowing necessary querying capabilities.
  • Adoption of application-level encryption, such as that supported in Rails, can enhance data security practices in modern web applications.
  • The toy ORE library is available on RubyGems, inviting developers to experiment and implement these concepts in their own Ruby applications.
  • Fiona encourages further discussion and exploration of these practices after her talk to deepen understanding of data security challenges and solutions.
00:00:00.000 Uh, hi everybody. I'm Fiona, and I'm a developer at a company called CipherStash. They've sponsored me to speak here, and my talk is directly related to what I work on every day.
00:00:10.559 Ruby was the first language that I learned, and as most boot camp attendees would know, that's generally the case. However, until recently, I haven't had the opportunity to work with Ruby professionally. It's been a lot of fun coming back to Ruby after all these years. The main language that I have worked with is Elixir.
00:00:36.600 Also, prior to a year ago, I didn't know much about cryptography. How encryption worked was a bit of a mystery to me. I’m not a cryptographer. But today's talk is about sharing what I've learned about something called application-level encryption.
00:01:04.199 The original title didn’t really work as a conference title and didn’t have the same ring to it, so I decided to call my talk the "Encrypted Search Party." More specifically, I'll be discussing an encryption scheme called Order Revealing Encryption, or ORE, throughout this talk.
00:01:15.840 What we'll cover today is a brief introduction to some concepts that will help us understand what encryption is, what we mean by application-level encryption, and why querying encrypted data can be challenging. To help explain how ORE works, I've built a toy ORE library that I'll demo at the end of the talk. This library is available as a gem, so you can experiment with ORE on your own.
00:02:04.680 One of the first things I learned about encrypted data is that encryption can limit the usability of the data. For example, being able to query or search that data can become quite challenging. Consider this users table with an email column and a date of birth column containing some personally identifiable information. We might use an SQL statement to select all users where the date of birth is greater than January 1, 1990, and normally we would get records back.
00:04:34.800 However, if we choose to secure that data by encrypting it and then attempt to run the same query again, we may get nothing returned. Why is that? Let's break this down starting with application-level encryption. Encryption is the process of converting human-readable text, which we will refer to as plaintext, into incomprehensible text called ciphertext using a key or keys.
00:05:01.860 This data can be decrypted by someone who has access to those keys. The encryption can yield what's called non-deterministic or deterministic output. Non-deterministic means that given the same plaintext, a different ciphertext will be generated each time, often using something called an IV or initialization vector to introduce randomness.
00:05:25.380 Deterministic means that given the same plaintext, the same ciphertext is generated every time. Both methods have their pros and cons, which we will explore further. Now, let's understand what we mean by application-level encryption.
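To make the two modes concrete, here is a minimal sketch using Ruby's standard `openssl` library. It is not code from the talk: the constant and method names are hypothetical, the keys are throwaway demo keys, and deriving the IV from an HMAC of the plaintext is just one assumed way to obtain deterministic output.

```ruby
require "openssl"

# Demo-only keys generated at load time; a real application would load them
# from a secrets store. Both constant names are hypothetical.
ENCRYPTION_KEY = OpenSSL::Random.random_bytes(32)
IV_DERIVATION_KEY = OpenSSL::Random.random_bytes(32)

# Encrypts with AES-256-GCM under the given 12-byte IV and returns the IV,
# ciphertext, and authentication tag needed for decryption.
def encrypt_with_iv(plaintext, iv)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = ENCRYPTION_KEY
  cipher.iv = iv
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

# Non-deterministic: a fresh random IV on every call.
def encrypt_non_deterministic(plaintext)
  encrypt_with_iv(plaintext, OpenSSL::Random.random_bytes(12))
end

# Deterministic (assumed approach): the IV is derived from the plaintext,
# so equal inputs always produce equal outputs.
def encrypt_deterministic(plaintext)
  iv = OpenSSL::HMAC.digest("SHA256", IV_DERIVATION_KEY, plaintext)[0, 12]
  encrypt_with_iv(plaintext, iv)
end

# Reverses encrypt_with_iv; raises if the ciphertext or tag was tampered with.
def decrypt(record)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = ENCRYPTION_KEY
  cipher.iv = record[:iv]
  cipher.auth_tag = record[:tag]
  cipher.update(record[:ciphertext]) + cipher.final
end
```

Encrypting the same email twice with `encrypt_non_deterministic` gives two different ciphertexts (with overwhelming probability), while `encrypt_deterministic` gives the same one both times. Either result decrypts back to the original plaintext.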
00:06:07.380 This basic diagram illustrates the data lifecycle between a client (where our application is) and an RDS instance in a cloud provider like AWS. One way we can encrypt our data is at rest, which protects our data from physical attacks on the underlying storage.
00:06:39.600 But what happens when our data leaves our RDS instance? We can encrypt it using SSL or TLS, known as encryption in transit. However, there are still gaps where an attacker could gain access to our sensitive data.
00:06:48.600 So, what is application-level encryption? It means the client is in control of encrypting and decrypting the data. This ensures that our sensitive data is encrypted as close to the client as possible during all stages of its lifecycle, including at any location where that data is stored.
00:07:24.240 To demonstrate application-level encryption alongside deterministic and non-deterministic encryption, I’ll show you a demo of a Rails app. As part of Rails 7, Active Record now supports application-level encryption, and there are a few basic steps to set this up.
00:07:55.860 The first step is to generate some secret keys. In a production environment, you would store these sensitive keys in your Rails credentials file, but for the purposes of this talk, I'll display them as they are not being used anywhere. The demo app we'll use has a basic users table with an email field, and in our model, we will declare that we want to encrypt our email attribute.
00:08:47.520 As you will see in the recording, when we try to create a user record, it inserts successfully. However, when we attempt to query that record, nothing is returned. The email is now stored as ciphertext, and because a fresh ciphertext is generated every time we encrypt, including when we execute the query, a simple WHERE clause can never match.
00:09:33.780 Active Record does have a deterministic mode that we can switch on. In this example, when we create a record, the ciphertext produced is exactly the same when we query the record. This means we can successfully retrieve the record.
00:10:26.760 However, every time we create a user record, it will produce the same ciphertext. This could be problematic because if an attacker determines the plaintext value of that ciphertext, they could decipher the underlying data for all instances of that ciphertext.
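The behaviour in the demo can be imitated without Rails. The sketch below is an illustration under stated assumptions, not Active Record's implementation: an in-memory array stands in for the users table, `where_email` stands in for a `WHERE email = ?` clause, and all names and keys are hypothetical.

```ruby
require "openssl"

COLUMN_KEY = OpenSSL::Random.random_bytes(32)    # demo-only encryption key
COLUMN_IV_KEY = OpenSSL::Random.random_bytes(32) # demo-only key for deriving IVs

# Returns IV + ciphertext as one binary string, the value stored in the column.
def encrypt_email(email, deterministic:)
  iv = if deterministic
         OpenSSL::HMAC.digest("SHA256", COLUMN_IV_KEY, email)[0, 16]
       else
         OpenSSL::Random.random_bytes(16)
       end
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = COLUMN_KEY
  cipher.iv = iv
  iv + cipher.update(email) + cipher.final
end

# Stand-in for `WHERE email = ?`: the query value is encrypted, then matched
# byte for byte against the stored column values.
def where_email(rows, email, deterministic:)
  needle = encrypt_email(email, deterministic: deterministic)
  rows.select { |row| row[:email] == needle }
end

[false, true].each do |deterministic|
  # Two users who happen to share the same email address.
  rows = [1, 2].map do |id|
    { id: id, email: encrypt_email("user@example.com", deterministic: deterministic) }
  end
  matches = where_email(rows, "user@example.com", deterministic: deterministic)
  distinct = rows.map { |row| row[:email] }.uniq.size
  puts "deterministic=#{deterministic}: #{matches.size} match(es), #{distinct} distinct ciphertext(s)"
end
```

With random IVs the lookup finds nothing and the two rows hold different ciphertexts. With derived IVs the lookup finds both rows, but the rows are also visibly identical to anyone who can read the table, which is the leak described above.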
00:10:56.280 This indicates that non-deterministic encryption might be the preferable option, but the challenge is how to make that queryable. This is where order-revealing encryption comes in.
00:11:23.760 Research for ORE was conducted by two professors at Stanford University, Dr. David Wu and Dr. Kevin Lewi. Although I do not know how to read the paper published in 2016, I've been learning about ORE from people who do understand this research and have collaborated with them.
00:12:13.860 In simple terms, ORE allows us to compare two ciphertexts such that we can determine the order of their corresponding plaintexts without revealing the plaintext itself. For instance, if we encrypt a plaintext of 'a,' ORE generates a ciphertext consisting of a left ciphertext and a right ciphertext.
00:12:37.560 By comparing these ciphertexts, we can ascertain whether the plaintext of one is less than, equal to, or greater than the other. In our simplified example, we will encrypt just the letter 'a' and analyze how each plaintext relates to the potential values in a defined domain.
00:13:39.560 We have values from zero to three, which are mapped to the ASCII characters 'a', 'b', 'c', and 'd'. We will then encrypt the letter 'a', comparing its value against the possibilities in our domain. For example, when comparing against zero, one, two, and three, we will store the results of those comparisons.
00:14:27.840 We also maintain an offset, which indicates the position in the domain where the domain value is equal to the plaintext value. This allows us to maintain the context of the comparison within our ciphertext.
00:15:06.840 Now, when looking at how the plaintext values map to their respective ciphertexts, we can apply the same method to any plaintext we wish to compare. As we compare each letter to our domain values of zero, one, two, and three, the results tell us whether one value is less than, equal to, or greater than the other.
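Before any encryption is involved, the bookkeeping described here can be sketched in a few lines of Ruby. The method names and the direction of the stored comparison (domain value compared with the plaintext) are assumptions made for illustration, not the layout used in the talk's library.

```ruby
DOMAIN = %w[a b c d].freeze # domain values 0..3 map to these letters

# The offset is the domain position whose value equals the plaintext.
def offset_for(letter)
  DOMAIN.index(letter) || raise(ArgumentError, "#{letter.inspect} is outside the domain")
end

# One comparison result per domain value: -1 if the domain value is less
# than the plaintext, 0 if equal, 1 if greater.
def comparisons_for(letter)
  value = offset_for(letter)
  (0...DOMAIN.size).map { |domain_value| domain_value <=> value }
end

# Comparing two letters only needs the offset of one and the comparison
# list of the other.
def compare(left_letter, right_letter)
  comparisons_for(right_letter)[offset_for(left_letter)]
end

p comparisons_for("a") # => [0, 1, 1, 1]
p comparisons_for("c") # => [-1, -1, 0, 1]
p compare("b", "c")    # => -1, because "b" sorts before "c"
```

Everything here is still plaintext. The remaining steps of the scheme hide the offset behind a shuffle and encrypt each stored comparison.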
00:16:01.260 In ORE, we utilize two keys: a pseudo-random function (PRF) key, referred to as a hash key, and a pseudo-random permutation (PRP) key, referred to as a shuffle key. These mathematical concepts help facilitate our encryption and comparison results in a secure manner.
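As a rough illustration of these two roles, the sketch below stands in HMAC-SHA256 for the PRF and a key-dependent sort for the PRP. Both choices are assumptions suited to a four-value toy domain, and the method names are hypothetical. A production scheme would use a properly constructed PRP.

```ruby
require "openssl"

# PRF ("hash key"): a keyed function whose output looks random to anyone
# without the key, but is repeatable for the key holder.
def prf(hash_key, value)
  OpenSSL::HMAC.hexdigest("SHA256", hash_key, value.to_s)
end

# Toy PRP ("shuffle key"): a key-dependent reordering of the domain.
# Returns an array where result[position] is the domain value placed there.
def shuffled_domain(shuffle_key, domain_size)
  (0...domain_size).sort_by do |value|
    OpenSSL::HMAC.digest("SHA256", shuffle_key, value.to_s)
  end
end

hash_key = OpenSSL::Random.random_bytes(32)
shuffle_key = OpenSSL::Random.random_bytes(32)

puts prf(hash_key, 2)              # 64 hex characters, stable for this key
p shuffled_domain(shuffle_key, 4)  # some ordering of 0, 1, 2, 3 fixed by the key
```

The same key always gives the same hash and the same shuffle, which is what lets a query and a stored value agree on where to look without either of them exposing the plaintext position.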
00:16:45.180 When we encrypt the comparisons, we generate a key to encrypt each outcome and store an IV alongside those comparisons to enhance the security of our output. The results appear jumbled, hiding the comparison outcomes while still allowing them to be compared against later as the right ciphertext.
00:18:04.680 The left ciphertext holds the offset at which the plaintext corresponds to a domain value, while the right ciphertext stores the encrypted comparisons. Together, the two ciphertexts make querying possible without exposing sensitive data as plaintext.
00:18:43.920 Let’s discuss what happens in our database as we attempt queries. The client generates a left ciphertext when it performs the query, and this left ciphertext is then sent to our database that already contains the respective right ciphertexts. A function in the database compares the left ciphertext with the right ciphertext and returns the comparison results.
00:19:25.260 This illustrates the effectiveness of separating the encryption keys from the queries, allowing our database to operate with the encrypted data without ever needing to understand the underlying plaintext. Essentially, it safeguards sensitive data while still enabling functional queries.
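Putting the pieces together, the following is a self-contained sketch of a small-domain left/right scheme in the spirit of what is described above. It is not the speaker's gem and makes several assumptions: the class name `ToyOreSketch` and its methods are hypothetical, HMAC-SHA256 plays the PRF, a keyed sort plays the PRP, and each comparison is hidden by adding a keyed mask modulo 3. It is for learning only and is not secure for real data.

```ruby
require "openssl"

class ToyOreSketch
  Left = Struct.new(:offset, :slot_key) # generated by the client for a query
  Right = Struct.new(:nonce, :slots)    # stored in the database

  def initialize(domain_size)
    @hash_key = OpenSSL::Random.random_bytes(32)   # PRF key
    shuffle_key = OpenSSL::Random.random_bytes(32) # PRP key
    # Toy PRP: @order[position] is the domain value stored at that position.
    @order = (0...domain_size).sort_by { |value| hmac(shuffle_key, value.to_s) }
  end

  # Left ciphertext: the shuffled offset of the plaintext plus the key that
  # unmasks exactly one slot of any right ciphertext.
  def encrypt_left(plaintext)
    offset = @order.index(plaintext) || raise(ArgumentError, "value outside the domain")
    Left.new(offset, hmac(@hash_key, offset.to_s))
  end

  # Right ciphertext: a random nonce plus one masked comparison per
  # shuffled domain position.
  def encrypt_right(plaintext)
    raise ArgumentError, "value outside the domain" unless @order.include?(plaintext)

    nonce = OpenSSL::Random.random_bytes(16)
    slots = @order.each_with_index.map do |domain_value, position|
      comparison = domain_value <=> plaintext # -1, 0 or 1
      slot_key = hmac(@hash_key, position.to_s)
      (comparison + self.class.mask(slot_key, nonce)) % 3
    end
    Right.new(nonce, slots)
  end

  # Needs no secret keys, so it could run inside the database.
  # Returns -1, 0 or 1 for left plaintext <, ==, > right plaintext.
  def self.compare(left, right)
    result = (right.slots[left.offset] - mask(left.slot_key, right.nonce)) % 3
    result == 2 ? -1 : result
  end

  # Keyed mask in 0..2 (slightly biased, which is acceptable for a toy).
  def self.mask(slot_key, nonce)
    OpenSSL::HMAC.digest("SHA256", slot_key, nonce).unpack1("C") % 3
  end

  private

  def hmac(key, data)
    OpenSSL::HMAC.digest("SHA256", key, data)
  end
end

ore = ToyOreSketch.new(4)
stored = ore.encrypt_right(2) # e.g. the letter "c", kept in the database
query = ore.encrypt_left(0)   # e.g. the letter "a", sent with the query
p ToyOreSketch.compare(query, stored) # => -1, since 0 < 2
```

Only `encrypt_left` and `encrypt_right` touch the keys. `compare` works purely on the two ciphertexts, mirroring the split between the client and the database function. Right ciphertexts are randomised by the nonce, so storing the same value twice yields different slots, while left ciphertexts in this sketch are deterministic and exist only for the duration of a query.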
00:20:00.360 For the sake of clarity, I'd like to reiterate that while many aspects of this encryption methodology might seem confusing now, you're more than welcome to approach me after the talk for further discussion. In conclusion, I'd like to showcase a demo using a toy ORE library that I've built. We'll walk through some code and hopefully, you'll recognize some of the concepts I've just mentioned.
00:20:54.540 The method encrypts our data based on the IV, the key, and the comparison result. We'll also review the class representations of the left and right ciphertexts, which house the necessary attributes for conducting comparisons in ORE.
00:21:46.920 This demo will illustrate how we can initialize our ORE scheme with a domain size allowing us to create ciphertexts from the values zero to three. After establishing our cipher for each value, we can then demonstrate how comparisons are effectively returning results according to their order relationships.
00:22:39.840 In summary, these comparisons will reveal how the letters rank against one another, which showcases the usefulness of ORE in applications requiring secured data while still allowing effective data querying.
00:23:26.160 The ORE scheme is available on RubyGems as "toy-ore" and features some documentation along with well-commented code, which I hope will help get you started. I'll share the links to my talks along with my GitHub profiles.
00:23:53.320 Thank you all for listening and please feel free to reach out if you’d like to dive deeper into this topic.