RailsConf 2019

How Checkr uses gRPC

by Paul Zaich and Ben Jacobson

In the talk "How Checkr uses gRPC" presented by Paul Zaich and Ben Jacobson at RailsConf 2019, the speakers discuss the challenges faced by Checkr, a background check company, in managing a growing number of internal APIs. They explain how shifting from traditional JSON-based RESTful APIs to gRPC has enhanced their documentation, service boundaries, and overall system resilience. The focus of the talk is on remote procedure calls (RPC) and the advantages of gRPC, which is an open-source framework developed by Google. The speakers outline the following key points:

  • Definition and Context: RPC (Remote Procedure Call) allows one service to invoke actions on another service. This is essential for Checkr as they have over 500 endpoints and numerous services that must communicate seamlessly.
  • Transition to gRPC: Historically, Checkr faced issues with undocumented APIs and inconsistent types when using REST. They decided to adopt gRPC to enhance their API management. gRPC provides an IDL (Interface Definition Language), which facilitates precise documentation and better definition of service terms.
  • Comparison with REST and SOAP: The speakers compare gRPC to traditional methods like SOAP and REST, highlighting the advantages of gRPC in terms of clearly defined APIs and reduced potential errors in service-to-service communication.
  • Implementation Process: A detailed walkthrough is given on how to set up a new Rails project with gRPC support, using a 'name matcher' service as an example. The process involves writing IDLs, generating code automatically, and deploying services efficiently.
  • Error Handling: The talk explains how gRPC facilitates structured error handling using status codes, which can provide more clarity on issues during service interactions.
  • Usage of Protocol Buffers: Implementing protobufs helps enforce message types across services, contributing to a more stable infrastructure.
  • Takeaway Insights: The presenters emphasize that while gRPC serves well for scaling architectures, REST remains a viable option, especially for smaller or less complex systems. They advocate for a transition that is context-driven, rather than arbitrary, based on the needs and scale of the organization.

In conclusion, Checkr’s journey with gRPC demonstrates the framework's strengths in modernizing API interactions and enhancing internal communication as they scale, with an emphasis on contract-driven development and improving documentation for internal services.

00:00:20.840 All right, thank you for joining us today. We'll talk about how Checkr uses gRPC.
00:00:29.029 First, an audience survey: have you ever worked with an API without documentation? Raise your hands. It should be everyone. It happens all the time. Have you ever worked with an API with inaccurate documentation? Maybe a little less, but sometimes, certainly for internal services. It may be a little more rare for public-facing services.
00:00:41.010 What about an API that returns inconsistent types? Or perhaps you don't know exactly what the typing is all the time? Cool, so at Checkr, the number of internal endpoints and services has grown over the years. Today, our monolith has over 500 endpoints and 20-plus additional production services in the critical path of our application.
00:01:02.190 We started to run up against the problems of undocumented APIs and unknown APIs. So, what is Checkr? Checkr is an API-first background check company. We provide modern, reliable background checks for thousands of customers, including companies like Uber, Lyft, GrubHub, and Instacart. Our API needs to be available; otherwise, they can't hire. Therefore, our systems have to be resilient and available.
00:01:21.539 I'm Ben Jacobson, and this is Paul. We are both engineers on the Checkr team. This talk will cover what gRPC is, the trade-offs of using it compared to something like REST, and we'll walk through a detailed example of how to get started.
00:01:36.750 So, let's first start with 'What is RPC?' RPC stands for Remote Procedure Call. It refers to any time one service wants to call another service and request information or ask it to perform an action. In this example, Service A says to Service B, 'Hey, I want you to do something,' and Service B responds, 'Okay, here's the result.' Historically, there have been many ways to solve this problem. SOAP is a common method. How many of you have used a SOAP API before? We use SOAP a lot since many of our integrations are legacy ones that utilize it. Here's an example of a SOAP call, where you describe precisely the call you want to make and how you want to make it with specific inputs.
00:02:12.569 More modern approaches use REST, typically employing Swagger. Here’s a YAML definition of an endpoint that will return information in a precisely defined way. But what is gRPC? gRPC is an open-source framework from Google designed to make this process easier. It is available in several programming languages, which helps to streamline the development process.
00:02:36.180 The IDL (Interface Definition Language) for gRPC looks similar to the versions we've examined for SOAP and REST. It allows you to precisely describe the API you want to implement. For instance, we have a Stocks API or Stock Service that has a 'get stock price' method, detailing the exact request it takes and the response it'll provide.
00:03:54.990 When I think of RPC, I see it as a bundling of key concepts: an IDL that precisely defines what a service does, an implementation of the IDL, a protocol for inter-service communication, and documentation that helps others within your organization understand what's available. If we compare these concepts with REST or SOAP, we can see that an IDL for REST could be represented by something like OpenAPI or Swagger. An implementation might use a tool like Swagger Codegen, or you could handwrite an HTTParty class to talk to an endpoint. The protocol usually involves JSON or XML over HTTP, and documentation options could be Swagger UI, an open-source product, or some kind of Slate documentation.
00:04:32.990 For SOAP, it's fairly straightforward. A WSDL (Web Services Description Language) document is essentially a giant XML file that precisely describes the entire SOAP API. From there, you can use a library like Savon for Ruby or Zeep for Python to implement the API calls. The protocol used under the hood is typically XML. gRPC follows a very similar structure, helping to implement an architecture around RPC. For gRPC, the IDL is defined in a proto file, and the implementation code is generated automatically for you. The protocol that runs in the background is Protocol Buffers over HTTP/2, and we use an open-source project called protoc-gen-doc to generate the documentation for the rest of our organization.
00:05:56.470 gRPC is an opinionated framework for achieving RPC. Think of it as convention over configuration, which is appreciated at a Rails conference. But, should you be using gRPC? The answer is probably not at the outset. If your organization is trying to scale and you have multiple services, gRPC becomes more valuable. However, you can get quite far with REST.
00:06:20.820 REST is great for rendering JSON. I remember when I first saw a line of code in a Rails project; I was amazed at how powerful this approach is. You can move quickly with this sort of code. However, it may not scale well to larger teams. If you need more flexibility, you can begin to incorporate a serializer that can help you specify what you want sent to the client. JSON is easily readable and intuitive. A JSON example shows that even a non-technical person can generally understand what it's trying to convey. In contrast, Protocol Buffers, which gRPC uses for communication, are binary and harder to interpret, potentially making it impossible for a human to understand what's going on.
00:07:36.680 As organizations scale, JSON APIs may start to feel less intuitive, particularly when you have more services to manage. Here we have Service A wanting to understand what resources are available in Service B. However, REST and JSON do not inherently provide the tools needed to achieve this understanding. Engineers often have to go out of their way to create documentation or integrate it into their pipeline. The challenge grows as one service becomes multiple services, leading to increasing complexity in managing documentation.
00:08:59.680 gRPC presents an approach to solving this challenge. The IDL mentioned earlier lives in a proto file, which can be viewed almost like a schema.rb for all services within your organization. Schema.rb is powerful because it allows you to open any Rails project and see the data persistence structure, including the overarching relationships between entities. The process works by placing the IDL in a repository and running gRPC tooling on that IDL. The gRPC tooling generates code across different languages. For example, it can generate Python and Ruby libraries that each service imports independently to make calls or implement responses.
00:10:04.290 In this case, a Python client, Service A, requests stock prices from a Ruby server, Service B, which looks up the price and sends back a specific stock price response. This mapping relates closely to concepts we already know. The IDL repository can be likened to a create_table call in a schema.rb file, as it describes your application's structure. gRPC tooling resembles rails db:migrate because it actively changes the state under the hood.
00:11:19.920 Our journey with gRPC began as our services expanded and more engineers became involved in product development. We started by experimenting with gRPC, gradually introducing it into some of our services. Over time, we have expanded it to more and more services, particularly the new services we build today. I wish I could say there is a definitive point at which to transition to gRPC, like when you have 25 engineers or 50 services, but in reality no such point exists. It is a gray area with pros and cons on both sides.
00:12:44.420 You can build a successful business solely using REST and JSON. In our case, we still rely on REST and JSON while using Swagger to generate public API documentation. However, many of our internal services have now adopted gRPC technology. Now, I'll take you through a detailed walkthrough of how we accomplish this at Checkr.
00:14:43.059 Thanks, Ben. As Ben mentioned, I will guide you through our everyday workflow at Checkr for implementing new gRPC endpoints and services. Let's start with an example in the context of Checkr.
00:15:19.530 A crucial task we perform daily is determining when two records match based on identity. One key component is understanding when two names match the same entity. This is such a pivotal part of our stack that we want to expose this capability across many areas of our product, making it seamless for different services.
00:16:51.680 The first step is to write the IDL for the name matcher service. We need to define the service and its functionalities. In this instance, we have a name matcher service and define a method for it, 'match', that takes in a match request and returns a match response.
00:18:07.000 The match request actually utilizes a custom-defined message to accept arguments for name A and name B, incorporating both first names and last names. The response includes a boolean for matches and a float indicating the confidence level of the match. After defining the implementation, the first task is to push it to GitHub in our monorepo, where we host all our definition files.
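The message shapes described above can be sketched in plain Ruby. This is only an illustration of the contract, not the code gRPC generates: in a real project these classes come from the proto file, and the names `Name`, `MatchRequest`, `MatchResponse`, `build_match_response`, and all field names are assumptions based on the description in the talk.

```ruby
# Plain-Ruby stand-ins for the messages described in the talk.
# Real message classes are generated from the proto file; every class
# and field name below is an assumption made for illustration.
Name = Struct.new(:first_name, :last_name, keyword_init: true)
MatchRequest = Struct.new(:name_a, :name_b, keyword_init: true)
MatchResponse = Struct.new(:matches, :confidence, keyword_init: true)

# Mimic the kind of type enforcement that typed messages provide:
# a boolean for the match and a float for the confidence.
def build_match_response(matches:, confidence:)
  raise TypeError, "matches must be true or false" unless [true, false].include?(matches)
  raise TypeError, "confidence must be a Float" unless confidence.is_a?(Float)

  MatchResponse.new(matches: matches, confidence: confidence)
end

request = MatchRequest.new(
  name_a: Name.new(first_name: "Obi-Wan", last_name: "Kenobi"),
  name_b: Name.new(first_name: "Ben", last_name: "Kenobi")
)
response = build_match_response(matches: true, confidence: 0.9)

puts request.name_a.first_name
puts response.confidence
```

The point of the sketch is that both sides agree on field names and types before any code is written, which is what the proto definition provides across languages.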
00:19:23.310 We then follow the typical workflow for reviewing pull requests. A team member will assess the new definition, suggest any necessary edits, and once approved, will merge it into the master branch on GitHub. We use webhooks from GitHub to trigger a CircleCI build, which runs the gRPC code generation and then publishes the bundled code as a new gem version on our private gem server via Gemfury.
00:20:30.270 This workflow applies to multiple languages. We also trigger a webhook to CodeAmp, our internal hosting service. This process generates HTML documentation, which anyone on our team can access at idl.checkrhq.net on our intranet. This is a preview of our name matcher documentation, which maps precisely to our definition file while providing a user-friendly version anyone at the company can review.
00:21:54.240 To start using the autogenerated code, you define a new source for the internal gem repository and run 'gem install' for our Checkr IDL gem while specifying the version. We typically encourage compatibility, so clients can use older versions of our IDLs. A similar process applies to any language supporting gRPC. For instance, one could list the Checkr IDL package in a requirements file and simply run 'pip install'.
00:23:18.820 Next, let’s look at serving RPC requests in Ruby. A straightforward Ruby example involves initializing a service stub that points to the location of your name matcher service. You will then define a method called 'match' in your service definition that takes in a match request. From there, you can begin interacting with the match request object defined earlier.
00:24:43.720 The method should return a match response object, which acts as the final return statement in your match method. Importantly, there are no network details exposed in your method. You simply need to concentrate on the desired functionality.
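A minimal sketch of such a handler, assuming stand-in message classes and a toy matching rule. With the grpc gem the class would inherit from the generated service class; here it is a plain Ruby object so the example runs on its own. `NameMatcherHandler`, the `Struct` definitions, and the scoring rule are all hypothetical.

```ruby
# Stand-ins for the generated message classes (names are assumptions).
Name = Struct.new(:first_name, :last_name, keyword_init: true)
MatchRequest = Struct.new(:name_a, :name_b, keyword_init: true)
MatchResponse = Struct.new(:matches, :confidence, keyword_init: true)

# Hypothetical handler. Note that nothing here deals with sockets,
# ports, or serialization: the method only maps a request to a response.
class NameMatcherHandler
  # gRPC handlers receive the request message plus a call object that
  # carries metadata; the call object is unused in this sketch.
  def match(request, _call = nil)
    same_last = normalize(request.name_a.last_name) == normalize(request.name_b.last_name)
    same_first = normalize(request.name_a.first_name) == normalize(request.name_b.first_name)

    # Toy scoring rule, purely illustrative: last names must agree,
    # and agreeing first names raise the confidence.
    confidence =
      if same_last && same_first then 1.0
      elsif same_last then 0.5
      else 0.0
      end

    # The response object is the method's return value.
    MatchResponse.new(matches: same_last, confidence: confidence)
  end

  private

  # Compare names case-insensitively and ignore surrounding whitespace.
  def normalize(value)
    value.to_s.strip.downcase
  end
end

request = MatchRequest.new(
  name_a: Name.new(first_name: "Obi-Wan", last_name: "Kenobi"),
  name_b: Name.new(first_name: "Ben", last_name: "Kenobi")
)
response = NameMatcherHandler.new.match(request)
puts "matches=#{response.matches} confidence=#{response.confidence}"
```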
00:25:56.780 To boot the server, you create a new instance of an RPC server, bind it to a local port, and specify which service definition it should handle. This setup allows your server to run indefinitely. To use it in Rails, you can include an open-source gem called Gruf that BigCommerce maintains. This gem lets you bring many Rails paradigms to gRPC.
00:27:37.780 The primary design philosophy of Gruf is to create a controller class dedicated to each service. You place these classes in your RPC directory, and the methods you define on each controller correspond to the service definitions we just mapped out. Beyond the basic functionality, Gruf provides many useful features, including middleware, request logging, and authentication to bolster your service's stability.
00:29:32.540 When making requests on your RPC client, you first need to create a new client stub pointed at your name matcher service's location, then build your request. You can access your definition objects to determine the expected format of requests. It becomes apparent that you can inject any kind of structure into your service.
00:31:40.680 As a case in point, we often encounter issues with names, where we might format them differently. For example, are Obi-Wan Kenobi and Ben Kenobi the same person? If you've seen Star Wars, you can quickly draw that conclusion, but as a business, we always need to verify if two names might represent the same entity.
00:32:56.600 To make a request, it's as simple as calling 'client.match' with named arguments name A and name B. The response will be a Ruby object, from which you can easily check whether the two names match and what the confidence level of that match is. One advantage of Protocol Buffers is that they apply beyond client-server communication. Checkr uses producer-consumer queues heavily, and Protocol Buffers help stabilize that communication as well.
00:34:09.840 In a simple example, if you needed to send a name message to a queue for consumer services, you'd encode the message object in Ruby, publish it to your queue, and then consumers in any programming language could decode and process that message.
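The encode, publish, and decode flow can be sketched as follows. With the real tooling you would call the encode and decode methods on the generated message classes; to keep this example free of gems, it hand-rolls a small subset of the Protocol Buffers wire format (string fields only) and uses an in-process `Queue` as a stand-in for a message broker. The helper names and the field numbers (1 for the first name, 2 for the last name) are assumptions.

```ruby
# Encode an unsigned integer as a protobuf varint: 7 bits per byte,
# with the high bit set on every byte except the last.
def encode_varint(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack("C*")
end

# Encode a string field: tag (field number << 3 | wire type 2),
# then the byte length, then the UTF-8 bytes.
def encode_string_field(field_number, text)
  payload = text.encode("UTF-8").b
  encode_varint((field_number << 3) | 2) + encode_varint(payload.bytesize) + payload
end

# Hypothetical name message: first_name is field 1, last_name is field 2.
def encode_name(first_name:, last_name:)
  encode_string_field(1, first_name) + encode_string_field(2, last_name)
end

# Read one varint starting at offset; return the value and the new offset.
def decode_varint(bytes, offset)
  result = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    offset += 1
    result |= (byte & 0x7F) << shift
    return [result, offset] if (byte & 0x80).zero?

    shift += 7
  end
end

# Decode a message made only of length-delimited string fields.
def decode_name(bytes)
  fields = {}
  offset = 0
  while offset < bytes.bytesize
    tag, offset = decode_varint(bytes, offset)
    length, offset = decode_varint(bytes, offset)
    fields[tag >> 3] = bytes.byteslice(offset, length).force_encoding("UTF-8")
    offset += length
  end
  { first_name: fields[1], last_name: fields[2] }
end

# Producer publishes the binary payload; a consumer thread decodes it.
queue = Queue.new
queue << encode_name(first_name: "Obi-Wan", last_name: "Kenobi")
consumer = Thread.new { decode_name(queue.pop) }
puts consumer.value.inspect
```

Because the payload is just bytes with a shared schema, the consumer does not need to be written in Ruby; any language with the same message definition can decode it.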
00:35:08.160 This strategy is effective for enforcing boundaries within services, regardless of your architecture. Lastly, let’s discuss error handling. If you have any experience with networking, you know errors will occur. gRPC status codes align roughly with various HTTP status codes.
00:35:55.300 As an example of raising an error on the server, suppose every name in a request must include a middle name. If it is absent, raise a GRPC::BadStatus with the invalid argument code and return an appropriate error message to the client. A single message string can feel limiting, so a more structured approach is to build a hash of errors, akin to Active Record errors, and serialize it into JSON.
00:37:02.760 You can define a common error response structure in your IDL, ensuring a consistent message format across your services. On the client side, you would handle Ruby exceptions, including details for easier debugging and user feedback.
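The error flow described above might look like the following sketch. It does not use the grpc gem: `BadStatus` here is a local stand-in for the gem's `GRPC::BadStatus`, and `validate_name!` and `call_with_error_handling` are hypothetical helpers. The numeric value 3 is the gRPC status code for INVALID_ARGUMENT.

```ruby
require "json"

# gRPC defines numeric status codes; INVALID_ARGUMENT is 3.
INVALID_ARGUMENT = 3

# Local stand-in for GRPC::BadStatus so the sketch runs without the gem.
class BadStatus < StandardError
  attr_reader :code, :details

  def initialize(code, details)
    super("#{code}:#{details}")
    @code = code
    @details = details
  end
end

# Server side: reject names without a middle name. The status details
# carry a hash of field errors serialized as JSON.
def validate_name!(name)
  errors = {}
  errors[:middle_name] = ["can't be blank"] if name[:middle_name].to_s.strip.empty?
  raise BadStatus.new(INVALID_ARGUMENT, JSON.generate(errors)) unless errors.empty?

  name
end

# Client side: rescue the exception and decode the structured errors.
def call_with_error_handling(name)
  validate_name!(name)
  { ok: true, errors: {} }
rescue BadStatus => e
  { ok: false, code: e.code, errors: JSON.parse(e.details) }
end

result = call_with_error_handling({ first_name: "Ben", last_name: "Kenobi" })
puts result[:errors]["middle_name"].first
```

Keeping the status code coarse and putting field-level detail in a structured payload lets every service report errors in the same shape.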
00:38:22.490 To summarize a few takeaways from our journey at Checkr using gRPC: First, remember that REST can take you a long way. We still leverage REST extensively at Checkr. Second, gRPC is an opinionated framework that enhances service communication. Finally, we have seen an influence on our development cycle, prompting us to adopt contract-driven development. As we build new services, we carefully consider the boundaries of those services and how they should act.
00:39:32.560 A shout out to the open-source tools that help: we use gRPC extensively, along with the associated framework called Gruf, which improves its integration with Rails. For our internal documentation, we use protoc-gen-doc. Thank you.
00:40:31.850 Audience member: Is it true that it's one port per service?
00:40:55.330 It's not necessarily one port per service. You can run multiple services on a single server, but how you structure that configuration depends on your organizational needs.
00:41:59.080 Regarding dependencies for the libraries generated for gRPC, any application that needs to call your service must treat them as dependencies. With that, there is a build step required for managing those dependencies.
00:43:11.200 As your organization transitions, you will likely need to re-implement some data models into IDLs, ensuring all communicating services clearly understand the expected response data formats.
00:44:40.950 Audience member: Do you envision a world where Active Record and IDLs share some of the code or share the same model? That's an interesting idea and certainly feasible, and it would reduce code duplication. We haven't explored the implementation details in much depth.
00:45:54.950 Audience member: Do you use streaming at all? No, not currently. We haven't found a use case where that would be beneficial yet, although there are many potential applications.
00:47:26.100 Audience member: Do you only utilize it for trusted backends? For now, yes. Authentication hasn't been a pressing concern yet, given the environment we've built.
00:48:46.420 Audience member: What about switching to using JSON instead of Protocol Buffers? While technically feasible, we haven’t experimented with it because our existing architecture is integrated with Protocol Buffers.
00:50:09.470 Closing remarks: gRPC Web allows you to convert JSON requests into gRPC calls. Though our focus has predominantly been on internal communications, we may explore these solutions in the future. Thank you again!