00:00:21
Thank you for the introduction! It's actually the best introduction I've ever had, so I appreciate it.
00:00:27
Hello everyone! I know this is the end of the day, and I guess everyone is tired already.
00:00:34
But this talk will be tough, as I have 80 slides to fit into a limited time. I need about 30 seconds per slide.
00:00:42
I hope I can stay within the time limit. We're also going to have examples of encoding, serialization, and binary data.
00:00:47
So I hope you are ready for it! Here's a slide with some information about me. You can see my favorite photo here.
00:01:00
I work as the lead web developer at a company called Gel, which is an outsourcing company in Minsk. I'm also an open-source contributor.
00:01:13
Let's start by defining some terms. First, what is RPC? RPC stands for 'Remote Procedure Call.'
00:01:26
It's a protocol that one program can use to request a service from another program located on a different computer. This term is not new.
00:01:40
In fact, it was first described back in the 1970s and has been widely used in distributed programming.
00:01:51
Recently, it has become increasingly popular in the context of web development, particularly with the rise of microservices architecture.
00:02:02
Let me share why I decided to give this talk.
00:02:09
For the last two years, I have been working on a significant project consisting of more than 50 services.
00:02:20
We were migrating our system from an internal RESTful JSON API to an RPC-based communication solution.
00:02:27
Here's an example of what it looked like before the migration.
00:02:32
This is a simplified example, just to give you the basic idea. At one end, you can see AWS Lambda sending a RESTful JSON request.
00:02:45
It goes to the Management Service, our entry point, which then sends RESTful JSON requests to our internal API and other services.
00:02:57
We also used a message queue for receiving messages. During the migration, we made some changes to our tech stack.
00:03:12
Now, we are using Amazon SQS as our message queue, and at the center of the architecture is an RPC server.
00:03:18
This server maintains connections with our services, which now communicate via the RPC protocol.
00:03:30
This may look like a single point of failure, but it isn't; I was just too lazy to draw it properly.
00:03:41
We use the HashiCorp stack with service discovery, which is what actually lets our system scale.
00:03:53
During peak times, we operate several instances of the RPC server.
00:04:05
At this point, some of you may be wondering why we needed all this.
00:04:11
The answer has several parts. The first is performance: RPC frameworks come with efficient serialization.
00:04:23
XML is quite slow, while JSON is convenient yet relatively expensive.
00:04:32
We wanted our services to communicate directly, avoiding the overhead of using a reverse proxy.
00:04:43
We now keep persistent (keep-alive) connections between services. Versioning is also simpler, because all our services share one schema definition.
00:04:56
This makes it easier to maintain backward compatibility or to roll out changes across services.
00:05:11
Now, let’s define what an RPC framework is. Typically, it provides both a client and a server and takes care of connection management for you.
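To make the client/server split concrete, here is a minimal sketch using the XML-RPC modules from Python's standard library. It is only an illustration of the idea, not one of the frameworks covered in this talk (and it uses XML, which is slow): the server registers a procedure, and the client proxy turns an ordinary method call into a request over HTTP and decodes the response.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The 'remote procedure' the server exposes."""
    return a + b


def run_demo():
    # Server side: bind to an ephemeral port on localhost and register the procedure.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]

    # Serve requests on a background thread so the client can run in this one.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Client side: the proxy turns a method call into an HTTP request
        # with an XML body, then decodes the XML response.
        with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
            return proxy.add(2, 3)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # 5
```

What the frameworks below add on top of this basic shape is mostly binary serialization, streaming, and code generation from a shared schema.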
00:05:25
We must remember that RPC frameworks are usually not monolithic.
00:05:31
They consist of a collection of technologies, allowing you to choose the transport technology and serialization methods.
00:05:44
In 2019, there are quite a few options available, including gRPC, Apache Thrift, Cap'n Proto, and many others.
00:05:51
We will have a quick overview of each one, outlining the pros and cons.
00:06:02
The first one, gRPC, is the trendiest and is backed by Google. It uses protocol buffers for service definitions and supports features like bidirectional streaming.
00:06:16
As a spoiler alert, this is actually what we are using on our production servers.
00:06:30
gRPC supports many languages, including Java, Go, Python, C++, and more. Ruby is on the list too, but as a second-tier language.
00:06:44
This means that updates, security fixes, and enhancements are released for the primary languages first, and others have to wait.
00:07:02
For example, when we upgraded from Ruby 2.5 to 2.6, we had to wait about two weeks for an updated protobuf library.
00:07:16
This two-week wait can be critical for our team, so it's something we need to keep in mind.
00:07:28
For transport, gRPC offers several options: HTTP/2 by default, plus others such as Cronet for Android apps.
00:07:45
Another option is an in-process transport for when the client and server run in the same process, so messages are passed directly.
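Because the transport is pluggable, client code can be written against a small interface and stay the same whether a call goes through an in-process transport (a direct function call, no serialization) or through one that turns the request into bytes as a network hop would. The class names below are hypothetical and only illustrate the idea; JSON stands in for whatever serialization a real framework would use.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class InProcessTransport:
    """Hypothetical transport: hands the request object straight to the handler."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler
        self.bytes_sent = 0  # nothing is ever serialized

    def send(self, request: Dict[str, Any]) -> Dict[str, Any]:
        return self.handler(request)


class SerializingTransport:
    """Hypothetical transport: encodes to bytes and back, as a network hop would."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler
        self.bytes_sent = 0

    def send(self, request: Dict[str, Any]) -> Dict[str, Any]:
        wire = json.dumps(request).encode("utf-8")
        self.bytes_sent += len(wire)
        response = self.handler(json.loads(wire))
        return json.loads(json.dumps(response))


class Client:
    """The client code is identical no matter which transport it is given."""

    def __init__(self, transport) -> None:
        self.transport = transport

    def say_hello(self, name: str) -> str:
        reply = self.transport.send({"method": "SayHello", "name": name})
        return reply["message"]


def greeter(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical server-side handler."""
    return {"message": f"Hello, {request['name']}"}


if __name__ == "__main__":
    for transport in (InProcessTransport(greeter), SerializingTransport(greeter)):
        print(type(transport).__name__, Client(transport).say_hello("world"), transport.bytes_sent)
```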
00:08:01
In RPC frameworks, 'protocol' refers to serialization. gRPC supports protocol buffers, FlatBuffers, and others.
00:08:12
We will take a closer look at protocol buffers since this is the most popular and the default option.
00:08:28
Protocol buffers provide type safety and, as long as you follow their rules for changing fields, backward compatibility. They are also compact: encoded messages are much smaller than the equivalent JSON or XML.
00:08:43
Here’s an example of a proto file. You can see that it shares some similarities with JSON.
00:08:56
In this file, you define message types; each field has a type and a unique field number.
00:09:06
Services are defined at the bottom of the file, declaring what each RPC method takes and what it returns.
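The field numbers are what let old code cope with newer messages: on the wire, every field is tagged with its number and a wire type, so a decoder that meets a number it doesn't know can skip that field. The sketch below hand-decodes a tiny subset of the protobuf wire format (varints and length-delimited fields only) to show this; it is an illustration, not the protobuf library's implementation, and the field names are hypothetical.

```python
from typing import Dict, Tuple

VARINT = 0            # wire type for int32/int64/bool/enum
LENGTH_DELIMITED = 2  # wire type for string/bytes/embedded messages


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # MSB clear: this was the last byte
            return result, pos
        shift += 7


def decode(data: bytes, known_fields: Dict[int, str]) -> Dict[str, object]:
    """Decode the fields we know about (strings stay raw bytes) and skip the rest."""
    message: Dict[str, object] = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == VARINT:
            value, pos = read_varint(data, pos)
        elif wire_type == LENGTH_DELIMITED:
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known_fields:
            message[known_fields[field_number]] = value
        # Unknown field numbers fall through: the wire type told us how far to skip.
    return message


# A newer sender added field 2 (a varint, 150); an older reader only knows field 1.
payload = bytes([0x0A, 0x05]) + b"Hello" + bytes([0x10, 0x96, 0x01])
print(decode(payload, {1: "content"}))  # {'content': b'Hello'}
```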
00:09:18
Now, let’s delve into how protocol buffers work. Typically, they operate as a black box, so you don’t need to worry about the details.
00:09:30
However, as engineers, we're naturally curious about how the data ends up so much smaller than JSON.
00:09:46
Let's look at how a message is serialized. For instance, serializing a message with a single string field produces a specific byte sequence.
00:09:58
Each field starts with a key that combines the field number and the wire type as (field_number << 3) | wire_type. The key, like integer values, is encoded as a varint: in each byte, the most significant bit is a continuation flag, and when it is set, more bytes follow.
00:10:12
So a value that doesn't fit into the seven data bits of one byte continues into the next byte.
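Here is a hand-rolled sketch of that encoding for a message whose field 1 is a string, similar to the one on the slide. It covers only varints and length-delimited fields and is meant to make the bytes visible, not to replace the protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 data bits per byte, MSB set when more bytes follow."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: keep reading
        else:
            out.append(low7)         # last byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """The key packs the field number and wire type: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Strings use wire type 2 (length-delimited): key, byte length, UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data


print(encode_string_field(1, "Hello").hex(" "))  # 0a 05 48 65 6c 6c 6f
print(encode_varint(300).hex(" "))               # ac 02
```

The key for field 1 with wire type 2 is (1 << 3) | 2 = 10, which is the leading 0x0a; the value 300 needs two bytes because it doesn't fit into seven bits.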
00:10:25
Now, let's compare this with a regular JSON request. Sending 'Hello', for example, produces the typical request and response.
00:10:37
Looking at the data size, the JSON request body in this example is 25 bytes.
00:10:49
A gRPC request is more complex, with HTTP/2 frames, headers, and a length prefix on each message, but the actual payload is much smaller.
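A rough way to see where the savings come from is to encode the same one-field message both ways. The JSON field name `message` is an assumption (the slide's exact payload isn't reproduced here), so the JSON size below won't match the 25 bytes quoted. The sketch also adds the 5-byte prefix that gRPC puts in front of every message (a 1-byte compressed flag and a 4-byte big-endian length); HTTP/2 headers and frames come on top of both.

```python
import json
import struct


def protobuf_string_field(field_number: int, text: str) -> bytes:
    """Hand-encoded protobuf string field (wire type 2); one-byte key and length only."""
    data = text.encode("utf-8")
    if field_number >= 16 or len(data) >= 128:
        raise ValueError("sketch only handles a one-byte key and length")
    return bytes([(field_number << 3) | 2, len(data)]) + data


def grpc_frame(message: bytes) -> bytes:
    """gRPC length-prefixed message: 1-byte compressed flag (0 = off) + 4-byte length."""
    return struct.pack(">BI", 0, len(message)) + message


json_body = json.dumps({"message": "Hello"}).encode("utf-8")
proto_body = protobuf_string_field(1, "Hello")
grpc_body = grpc_frame(proto_body)

print(len(json_body), len(proto_body), len(grpc_body))  # 20 7 12
```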
00:11:01
Even so, if you only send small, infrequent messages, the savings may not justify switching to gRPC.
00:11:15
However, for frequent messages, using gRPC can lead to significant bandwidth savings.
00:11:27
gRPC also relies on the protocol buffer compiler, protoc, to generate client and server code from your proto files.
00:11:38
In the case of Ruby, there’s a library called grpc-tools, which handles this process.
00:11:48
The file structure is straightforward: the .proto file sits in the root, and the compiler generates one file for the messages and one for the service definitions (for Ruby, a *_pb.rb and a *_services_pb.rb file).
00:12:03
The generated code is relatively concise and efficient. Among the pros of gRPC are great documentation and a large community.
00:12:15
You can find plenty of resources and help on platforms like Stack Overflow for any questions you run into.
00:12:30
gRPC also runs on HTTP/2, which is excellent because it enables full-duplex communication, with messages flowing in both directions at the same time.
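Full duplex means both sides can send at the same time over one connection without waiting for each other. The asyncio sketch below uses a plain TCP connection on localhost as a stand-in for an HTTP/2 stream: one task writes requests while another reads the server's replies as they arrive. It illustrates the idea only; it is not gRPC's streaming API in any language.

```python
import asyncio
from typing import List


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Server: answer each line as soon as it arrives, while the client keeps sending."""
    while line := await reader.readline():
        writer.write(line.upper())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(messages: List[str]) -> List[str]:
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    async def send_all() -> None:
        for text in messages:
            writer.write(text.encode() + b"\n")
            await writer.drain()
        writer.write_eof()  # half-close: done sending, still reading

    async def receive_all() -> List[str]:
        replies = []
        while line := await reader.readline():
            replies.append(line.decode().rstrip("\n"))
        return replies

    # Both directions run concurrently on the same connection.
    _, replies = await asyncio.gather(send_all(), receive_all())
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies


if __name__ == "__main__":
    print(asyncio.run(main(["ping", "pong"])))  # ['PING', 'PONG']
```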
00:12:43
However, there are some drawbacks. gRPC has a reputation for being buggy.
00:12:55
Another notable issue is that it requires HTTP/2, which complicates things with load balancers that only support HTTP/1.1.
00:13:06
In our case, we use HashiCorp's service discovery and our not-so-smart load balancing system for simplicity.
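The reason HTTP/2 trips up simple balancers is that one long-lived connection carries many requests, so a connection-level balancer sends all of them to the same backend. One simple client-side alternative is to ask service discovery for the current instances and rotate through them per request. The sketch below is a hypothetical illustration of that idea, not the system described in the talk; the `discover` function and the addresses are made up, and a real setup would query the discovery service instead.

```python
import itertools
from typing import Callable, Iterator, List


def discover(service: str) -> List[str]:
    """Hypothetical stand-in for a service discovery lookup."""
    registry = {
        "greeter": ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"],
    }
    return registry[service]


class RoundRobinPicker:
    """Rotate through the instances returned by discovery, one per call."""

    def __init__(self, service: str, lookup: Callable[[str], List[str]] = discover) -> None:
        self.service = service
        self.lookup = lookup
        self.refresh()

    def refresh(self) -> None:
        # Re-query discovery, e.g. after instances are added during peak times.
        self._cycle: Iterator[str] = itertools.cycle(self.lookup(self.service))

    def pick(self) -> str:
        return next(self._cycle)


picker = RoundRobinPicker("greeter")
print([picker.pick() for _ in range(4)])
# ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051', '10.0.0.1:50051']
```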
00:13:19
Another framework worth mentioning is Twirp, a simpler alternative to gRPC that also uses protocol buffers for service definitions.
00:13:30
It was developed by the folks at Twitch so that their services could work with Amazon load balancers, which is why it stays compatible with HTTP/1.1.
00:13:41
Twirp supports a wide range of clients and servers, and its project structure mirrors that of gRPC.
00:13:53
Its pros are ease of use and easy integration with established systems, where many consider it superior.
00:14:05
However, it lacks advanced features such as the full-duplex streaming that gRPC offers.
00:14:17
Next, we have Apache Thrift, which is one of the oldest frameworks and is backed by Facebook.
00:14:31
Facebook made a wise decision by open-sourcing Thrift, allowing many companies to adopt this solution.
00:14:43
It is said that an intern at Google contributed some foundational ideas to Thrift before it became popular.
00:14:57
As a result, Thrift shares many similarities with gRPC, including support for a wide range of programming languages.
00:15:10
In terms of pros, Thrift provides extensive language support, making it suitable for diverse systems.
00:15:25
However, the level of community engagement seems lower compared to gRPC, which may impact long-term viability.
00:15:39
Next on our list is Cap'n Proto, developed by a former tech lead of protocol buffers.
00:15:51
The claim is that it is dramatically faster than protocol buffers because it has no encoding step. However, this statement is slightly misleading.
00:16:04
It doesn't serialize in the traditional sense, since its in-memory layout is also the wire format, but the data still has to be arranged into that layout.
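The idea behind "no encoding" is that the message's layout in memory is the layout on the wire, so a reader can pull one field out of a buffer at a known offset instead of parsing the whole message first. The sketch below shows that idea with Python's `struct` module and a made-up fixed layout; Cap'n Proto's real format (8-byte words, segments, and pointers to variable-length data) is considerably more involved.

```python
import struct

# Hypothetical fixed layout: id (uint64) at offset 0, score (float64) at 8,
# flags (uint32) at 16, then 4 bytes of padding to keep 8-byte alignment.
LAYOUT = struct.Struct("<QdI4x")
SCORE_OFFSET = 8


def build(record_id: int, score: float, flags: int) -> bytearray:
    """Writing still means arranging the data into the agreed layout."""
    buf = bytearray(LAYOUT.size)
    LAYOUT.pack_into(buf, 0, record_id, score, flags)
    return buf


def read_score(buf: bytes) -> float:
    """Reading one field is a fixed-offset load; nothing else is decoded."""
    (score,) = struct.unpack_from("<d", buf, SCORE_OFFSET)
    return score


message = build(42, 0.5, 3)
print(LAYOUT.size, read_score(message))  # 24 0.5
```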
00:16:17
Looking at Cap'n Proto implementations, there are options for a growing number of languages, but Ruby support remains limited.
00:16:29
Next is Finagle, Twitter's protocol-agnostic RPC system, designed specifically for the JVM.
00:16:41
Finagle integrates well with Scala and provides some unique features that cannot be found in other RPC frameworks.
00:16:56
However, there is limited information available on its compatibility with JRuby, which may hinder its usefulness.
00:17:08
Following that, we have Dubbo, created by Alibaba, which is a Java-based RPC framework in an incubating stage.
00:17:20
Dubbo stands out for its complexity, with several abstractions such as consumer, provider, container, and registry.
00:17:34
There is also a comprehensive UI for tracking message exchanges and monitoring performance.
00:17:42
Another option is Apache Avro, backed by the Hadoop project, focusing on dynamic typing and JSON-defined schemas.
00:17:54
The schema is sent before any data is processed, and Avro is typically used for large-scale data workloads.
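Because both sides have the schema, Avro's binary encoding doesn't need field numbers or names: values are simply written in schema order. The sketch below implements only a small subset of that encoding (`int`/`long` as zigzag varints, `string` as a length followed by UTF-8 bytes) with a hypothetical two-field schema; real code would use an Avro library.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: field names and types, in order.
SCHEMA: List[Tuple[str, str]] = [("name", "string"), ("age", "int")]


def zigzag_varint(n: int) -> bytes:
    """Avro int/long: zigzag-map signed to unsigned, then base-128 varint."""
    z = n * 2 if n >= 0 else -n * 2 - 1  # 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_record(record: Dict[str, Any], schema: List[Tuple[str, str]]) -> bytes:
    """Concatenate field values in schema order; no tags or names go on the wire."""
    out = b""
    for name, kind in schema:
        value = record[name]
        if kind in ("int", "long"):
            out += zigzag_varint(value)
        elif kind == "string":
            data = value.encode("utf-8")
            out += zigzag_varint(len(data)) + data
        else:
            raise ValueError(f"type {kind!r} not handled in this sketch")
    return out


print(encode_record({"name": "Hello", "age": -1}, SCHEMA).hex(" "))  # 0a 48 65 6c 6c 6f 01
```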
00:18:06
As for benchmarks, results vary widely, and some are misleading because they test different parameters.
00:18:20
It's crucial to choose the right RPC framework based on the specific requirements of your application.
00:18:33
Lastly, there's distributed Ruby (dRuby), which was popular in the past and is still actively used by Rails.
00:18:45
For example, Rails' new parallel testing feature uses dRuby to coordinate tests running across multiple Ruby processes.
00:19:00
And with that, I tried to fit everything into my 40-minute slot!