RPC Frameworks Overview

by Yulia Oletskaya

In the video titled 'RPC Frameworks Overview' presented by Yulia Oletskaya at the Balkan Ruby 2019 conference, the main focus is on Remote Procedure Calls (RPC) and their increasing relevance in web development, especially in the context of microservices architectures. Yulia begins with a definition of RPC, emphasizing its role as a protocol that lets one program request a service from another program on a different computer. She explains that although RPC was first described in the 1970s, it has gained momentum in the modern web development landscape.

Key points discussed include:

- Migration Experience: Yulia shares her experience with migrating a system from an internal RESTful JSON API to an RPC-based architecture involving over 50 services.

- Technical Stack: The new architecture includes an RPC server and utilizes Amazon SQS for messaging, improving performance through direct service communication and maintaining schema consistency for easier versioning.

- RPC Frameworks Overview: The talk highlights various RPC frameworks such as gRPC, Twirp, Apache Thrift, and Cap'n Proto, detailing their advantages and disadvantages:
  - gRPC: Popular and backed by Google, it offers efficient serialization with protocol buffers, but updates for languages outside the primary set, such as Ruby, can lag behind.
  - Twirp: Developed at Twitch to work with load balancers that only support HTTP/1, it is easy to use but lacks some advanced features such as full-duplex streaming.
  - Apache Thrift: An older framework with extensive language support, yet with less community engagement than gRPC.
  - Cap'n Proto: Claims higher speed than gRPC by not relying on traditional encoding, but has limited Ruby support.

- Conclusion: The importance of selecting the appropriate RPC framework based on application needs is underscored. Yulia concludes the talk by encouraging the audience to consider their specific requirements when choosing an RPC solution.

Overall, this presentation provides valuable insights into the various RPC frameworks available and practical migration experiences from traditional APIs to more modern, efficient communication protocols.

00:00:21 Thank you for the introduction! It's actually the best introduction I've ever had, so I appreciate it.
00:00:27 Hello everyone! I know this is the end of the day, and I guess everyone is tired already.
00:00:34 But this talk will be tough, as I have 80 slides to fit into a limited time. I need about 30 seconds per slide.
00:00:42 I hope I can stay within the time limit. We're also going to have examples of encoding, serialization, and binary data.
00:00:47 So I hope you are ready for it! Here's a slide with some information about me. You can see my favorite photo here.
00:01:00 I work as the lead web developer at a company called Gel, which is an outsourcing company in Minsk. I'm also an open-source contributor.
00:01:13 Let's start by defining some terms. First, what is RPC? RPC stands for 'Remote Procedure Call.'
00:01:26 It's a protocol that one program can use to request a service from another program located on a different computer. This term is not new.
00:01:40 In fact, it was first described back in the 1970s and has been widely used in distributed programming.
00:01:51 Recently, it has become increasingly popular in the context of web development, particularly with the rise of microservices architecture.
00:02:02 Let me share why I decided to give this talk.
00:02:09 For the last two years, I have been working on a significant project consisting of more than 50 services.
00:02:20 We were migrating our system from an internal RESTful JSON API to an RPC-based communication solution.
00:02:27 Here's an example of what it looked like before the migration.
00:02:32 This is a simplified example just to give you a basic idea. You can see at the end, AWS Lambda sends a RESTful JSON request.
00:02:45 We send this to the Management Service as an entry point, which then sends RESTful JSON requests to our internal API and other services.
00:02:57 We also used a message queue, which is what we received. In our migration, we made some changes to our tech stack.
00:03:12 Now, we are using Amazon SQS as our message queue, and at the center of the architecture is an RPC server.
00:03:18 This server maintains connections with our services, which now communicate via the RPC protocol.
00:03:30 This may look like a system with a single point of failure, but it is not; I was just too lazy to draw it out properly.
00:03:41 We are using a HashiCorp stack and service discovery, which actually helps our system to scale.
00:03:53 During peak times, we operate several instances of the RPC server.
00:04:05 At this point, some of you may be wondering why we needed it and what the purpose is.
00:04:11 The answer consists of several parts. The first aspect is performance, as RPC comes with serialization.
00:04:23 XML is quite slow, while JSON is convenient yet relatively expensive.
00:04:32 We wanted our services to communicate directly, avoiding the overhead of using a reverse proxy.
00:04:43 Now, we have a keep-alive connection, which simplifies versioning since all our services now share a schema definition.
00:04:56 This makes it easier to maintain backward compatibility or to implement changes in the services.
00:05:11 Now, let’s define what an RPC framework is. Typically, it provides both a client and a server that help implement the connection management.
00:05:25 We must remember that RPC frameworks are usually not monolithic.
00:05:31 They consist of a collection of technologies, allowing you to choose the transport technology and serialization methods.
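To make the idea of swappable layers concrete, here is a minimal sketch of a toy RPC client and server in plain Ruby. It is not how gRPC or any other framework from the talk is implemented. The names `ToyServer`, `ToyClient`, and `JsonCodec` are hypothetical, and the transport is an in-process lambda standing in for a real network connection.

```ruby
require "json"

# Serialization layer: anything that responds to dump/load can be swapped in.
# JsonCodec is a hypothetical name used only for this sketch.
module JsonCodec
  def self.dump(obj)
    JSON.generate(obj)
  end

  def self.load(bytes)
    JSON.parse(bytes)
  end
end

# Server side: maps procedure names to plain Ruby blocks.
class ToyServer
  def initialize(codec)
    @codec = codec
    @procedures = {}
  end

  def register(name, &handler)
    @procedures[name] = handler
  end

  # Takes an encoded request and returns an encoded response.
  def handle(encoded_request)
    request = @codec.load(encoded_request)
    handler = @procedures.fetch(request["procedure"])
    @codec.dump({ "result" => handler.call(*request["params"]) })
  end
end

# Client side: hides encoding and transport behind an ordinary method call.
class ToyClient
  def initialize(codec, transport)
    @codec = codec
    @transport = transport # any callable: encoded request in, encoded response out
  end

  def call(procedure, *params)
    encoded = @codec.dump({ "procedure" => procedure, "params" => params })
    @codec.load(@transport.call(encoded))["result"]
  end
end

server = ToyServer.new(JsonCodec)
server.register("add") { |a, b| a + b }

# The transport is an in-process lambda here; a real one would be a socket or HTTP/2 stream.
client = ToyClient.new(JsonCodec, ->(bytes) { server.handle(bytes) })
puts client.call("add", 2, 3) # 5

# Swapping the serialization layer: Ruby's built-in Marshal also responds to dump/load.
# (Marshal.load is unsafe on untrusted input; it is used here only to show the swap.)
binary_server = ToyServer.new(Marshal)
binary_server.register("greet") { |name| "Hello, #{name}" }
binary_client = ToyClient.new(Marshal, ->(bytes) { binary_server.handle(bytes) })
puts binary_client.call("greet", "Ruby") # Hello, Ruby
```

Neither the client nor the server code changes when the codec or the transport is replaced, which is the kind of separation the frameworks below offer at a much larger scale.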
00:05:44 In 2019, there are quite a few options available, including gRPC, Apache Thrift, Cap'n Proto, and many others.
00:05:51 We will have a quick overview of each one, outlining the pros and cons.
00:06:02 The first one, gRPC, is the most trendy and is backed by Google. It uses protocol buffers for service definitions and supports features like bidirectional streaming.
00:06:16 As a spoiler alert, this is actually what we are using on our production servers.
00:06:30 gRPC supports various languages including Java, Golang, Python, C++, and more. Notably, Ruby is on the list too, but it is served at a later stage.
00:06:44 This means that updates, security fixes, and enhancements are released for the primary languages first, and others have to wait.
00:07:02 For example, when we migrated from Ruby 2.5 to 2.6, we had to wait about two weeks for an update to the protobuf library.
00:07:16 This two-week wait can be critical for our team, so it's something we need to keep in mind.
00:07:28 For transport, gRPC offers several options including HTTP/2 as the default, and other options like Cronet for Android apps.
00:07:45 Another option is in-process routing when the client and server are within the same process, allowing for direct message passing.
00:08:01 In terms of RPC frameworks, protocol refers to serialization. gRPC supports protocol buffers, flatbuffers, and others.
00:08:12 We will take a closer look at protocol buffers since this is the most popular and the default option.
00:08:28 Protocol buffers guarantee type safety and backward compatibility. They offer size efficiency, being much smaller than JSON or XML.
00:08:43 Here’s an example of a proto file. You can see that it shares some similarities with JSON.
00:08:56 In this file, you can define message types, including fields of varying types, along with unique identifiers.
00:09:06 Additionally, services are defined at the bottom of the file, indicating what data each function takes and what it returns.
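The unique field identifiers are what make backward compatibility workable: on the wire, each field is tagged with its number and a wire type, so a reader can skip fields it does not know. The sketch below illustrates that idea with a hand-written decoder that handles only two wire types (varint and length-delimited). The field names `content` and `priority` are hypothetical, and real applications would rely on the generated protobuf code rather than anything like this.

```ruby
# Reads a base-128 varint from an array of byte values, starting at pos.
# Returns [value, next_pos].
def read_varint(bytes, pos)
  value = 0
  shift = 0
  loop do
    byte = bytes[pos]
    pos += 1
    value |= (byte & 0x7F) << shift
    return [value, pos] if (byte & 0x80).zero?
    shift += 7
  end
end

# Decodes only the fields listed in known_fields ({ field_number => name }).
# Unknown fields are skipped using the wire type alone.
def decode(bytes, known_fields)
  message = {}
  pos = 0
  while pos < bytes.length
    key, pos = read_varint(bytes, pos)
    field_number = key >> 3
    wire_type = key & 0x07
    case wire_type
    when 0 # varint
      value, pos = read_varint(bytes, pos)
    when 2 # length-delimited (strings, bytes, nested messages)
      length, pos = read_varint(bytes, pos)
      value = bytes[pos, length].pack("C*").force_encoding("UTF-8")
      pos += length
    else
      raise ArgumentError, "wire type #{wire_type} is not handled in this sketch"
    end
    name = known_fields[field_number]
    message[name] = value if name
  end
  message
end

# A message written by a newer service: field 1 = "Hello", field 2 = 150.
wire = [0x0A, 0x05, 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x10, 0x96, 0x01]

old_reader = { 1 => :content }                 # schema before field 2 was added
new_reader = { 1 => :content, 2 => :priority } # schema after field 2 was added

p decode(wire, old_reader) # only :content is returned; field 2 is skipped
p decode(wire, new_reader) # both fields are returned
```

The older reader still gets a valid message from the newer payload, which is why adding fields with fresh numbers does not break existing services.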
00:09:18 Now, let’s delve into how protocol buffers work. Typically, they operate as a black box, so you don’t need to worry about the details.
00:09:30 However, as engineers, we are often curious about how the data ends up smaller than JSON.
00:09:46 Let's consider how a message is serialized. For instance, if we need to serialize a message type with a string content, it produces a specific byte sequence.
00:09:58 The first byte is the key, which packs the field ID together with the wire type. If the most significant bit of a byte is set, it indicates that more data follows.
00:10:12 In other words, if a value does not fit into one byte, more bytes need to be read.
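The byte layout described here can be reproduced in a few lines of Ruby. This is a minimal sketch of the protocol buffers wire format for a single string field, assuming field number 1 and the payload "Hello"; the helper names are hypothetical and this is not the code the protobuf gem uses.

```ruby
# Encodes a non-negative integer as a base-128 varint (array of byte values).
# Every byte except the last has its most significant bit set.
def encode_varint(value)
  bytes = []
  loop do
    low_bits = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << low_bits
      return bytes
    end
    bytes << (low_bits | 0x80)
  end
end

WIRE_TYPE_LENGTH_DELIMITED = 2

# Key (field number and wire type), then the payload length, then the raw bytes.
def encode_string_field(field_number, text)
  payload = text.encode("UTF-8").bytes
  key = (field_number << 3) | WIRE_TYPE_LENGTH_DELIMITED
  encode_varint(key) + encode_varint(payload.length) + payload
end

def hex(bytes)
  bytes.map { |b| format("%02x", b) }.join(" ")
end

puts hex(encode_string_field(1, "Hello")) # 0a 05 48 65 6c 6c 6f
puts hex(encode_varint(300))              # ac 02
```

In the first line of output, `0a` is the key (field 1, wire type 2), `05` is the length, and the remaining five bytes are the UTF-8 text. The second line shows a value that needs two bytes: the first byte has its top bit set to signal that another byte follows.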
00:10:25 Now, let’s compare this with a regular JSON request. For example, sending 'Hello' would result in the typical request and response.
00:10:37 When observing the data size, it's usually 25 bytes for regular JSON requests.
00:10:49 In contrast, a gRPC request is more complex, but the actual transmitted data is much smaller.
00:11:01 While the data size is reduced, be mindful that if you only send small, infrequent messages, the savings may not justify the switch to gRPC.
00:11:15 However, for frequent messages, using gRPC can lead to significant bandwidth savings.
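As a rough illustration of the size difference, the sketch below compares the body of a hypothetical one-field message in JSON and in the protocol buffers wire format. The field name `content`, the field number 1, and the message count are assumptions, so the numbers will not match the figures on the slides. HTTP headers and gRPC framing are ignored.

```ruby
require "json"

payload = { "content" => "Hello" }

# JSON body: the field name travels with every message.
json_bytes = JSON.generate(payload).bytesize

# Protocol buffers body, built by hand: key byte, length byte, then the text.
# Single-byte key and length are only valid for small field numbers and
# payloads shorter than 128 bytes, which holds for this example.
text = payload["content"]
proto_bytes = ([(1 << 3) | 2, text.bytesize] + text.bytes).length

messages = 1_000_000 # hypothetical message count
saved = (json_bytes - proto_bytes) * messages

puts "JSON body:     #{json_bytes} bytes"
puts "Protobuf body: #{proto_bytes} bytes"
puts "Saved over #{messages} messages: #{saved} bytes"
```

A few bytes per message is negligible for occasional calls, but it adds up once the same message is sent millions of times.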
00:11:27 Additionally, gRPC comes with a protocol compiler that generates the client and server code from your proto files.
00:11:38 In the case of Ruby, there’s a library called grpc-tools, which handles this process.
00:11:48 The file structure is generally straightforward. The proto file lives in the root, and the necessary files for messages and service definitions are generated from it.
00:12:03 The generated code is relatively concise and quite efficient. Pros of gRPC include great documentation and a large community.
00:12:15 You can find a multitude of resources and assistance on platforms like Stack Overflow for any queries you may encounter.
00:12:30 Also, gRPC supports HTTP/2, which is excellent, enabling full-duplex communication where messages can flow in both directions.
00:12:43 However, there are some drawbacks. gRPC has a reputation for having numerous bugs.
00:12:55 A notable issue is that it works with HTTP/2, which poses complications with load balancers that may only support HTTP/1.
00:13:06 In our case, we use HashiCorp's service discovery and our not-so-smart load balancing system for simplicity.
00:13:19 Another framework worth mentioning is Twirp, which was designed as a simpler alternative to gRPC.
00:13:30 This framework was developed by the folks at Twitch for interacting with Amazon load balancers while maintaining compatibility with HTTP/1.
00:13:41 Twirp supports a wide range of clients and servers, and the project structure mirrors that of gRPC.
00:13:53 The pros of Twirp revolve around the ease of use and the ability to integrate with established systems, where many have claimed it's superior.
00:14:05 However, its limitations include a lack of support for advanced features that gRPC offers, such as full-duplex streams.
00:14:17 Next, we have Apache Thrift, which is one of the oldest frameworks and is backed by Facebook.
00:14:31 Facebook made a wise decision by open-sourcing Thrift, allowing many companies to adopt this solution.
00:14:43 It is said that an intern at Google contributed some foundational ideas to Thrift before it became popular.
00:14:57 As a result, Thrift shares many similarities with gRPC, including support for a wide range of programming languages.
00:15:10 In terms of pros, Thrift provides extensive language support, making it suitable for diverse systems.
00:15:25 However, the level of community engagement seems lower compared to gRPC, which may impact long-term viability.
00:15:39 Next on our list is Cap'n Proto, developed by a former tech lead for protocol buffers.
00:15:51 The claim is that it is significantly faster than gRPC as it doesn’t rely on encoding. However, this statement is slightly misleading.
00:16:04 While it does not use serialization in a traditional sense, it still requires a form of data arrangement.
00:16:17 When we look at implementations in Cap'n Proto, we can see a variety of options emerging, but Ruby support remains limited.
00:16:29 Next is Finagle, which is a transport-agnostic system designed specifically for the JVM.
00:16:41 Finagle integrates well with Scala and provides some unique features that cannot be found in other RPC frameworks.
00:16:56 However, there is limited information available on its compatibility with JRuby, which may hinder its usefulness.
00:17:08 Following that, we have Dubbo, created by Alibaba, which is a Java-based RPC framework in an incubating stage.
00:17:20 Dubbo is unique due to its complexity, involving various abstractions like consumer, provider, and container registry.
00:17:34 There is also a comprehensive UI for tracking message exchanges and monitoring performance.
00:17:42 Another option is Apache Avro, backed by the Hadoop project, focusing on dynamic typing and JSON-defined schemas.
00:17:54 It sends the schema before trying to process data, which is usually employed in larger data tasks.
00:18:06 In the case of benchmarks, they vary widely, and some can be misleading due to differing parameters being tested.
00:18:20 It's crucial to choose the right RPC framework based on the specific requirements of your application.
00:18:33 Lastly, there's Distributed Ruby (dRuby), which was popular in the past and is still actively used by Red Hat.
00:18:45 Distributed Ruby also has a new feature called parallel tests, which can leverage multiple Ruby processes.
00:19:00 And with that, I tried to fit everything into my 40-minute slot!