00:00:03.840
Hello everyone.
00:00:17.539
Congratulations on the release of Splatoon 3. I'm super happy to be here at RubyKaigi again.
00:00:29.640
My name is Uchio Kondo, and I work as an infrastructure engineer at a Japanese startup that provides live streaming services for mobile devices.
00:00:42.840
Today, I'm here to talk about BPF.
00:00:50.160
However, most people probably don't know what BPF is. To grab your attention, I've prepared an easy-to-understand demonstration.
00:00:56.460
Have you read the paper about the Cloud of Reality? It discusses how incoming connections accumulate in a queue until the server accepts them.
00:01:07.799
Here, we have two environments to experiment with. Both run the same Ruby server with the same configuration, except for one kernel parameter, 'net.core.somaxconn', which is set to 500 in one and 4000 in the other.
00:01:29.880
The results of the experiment show that with a maximum of 500, the response time is clearly more erratic than with 4000. This is presumably due to saturation of the accept queue.
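As a rough illustration of where this parameter meets a Ruby server (this is not the code used in the experiment), the sketch below reads 'net.core.somaxconn' from /proc and requests a backlog through `TCPServer#listen`. On Linux the kernel caps the requested backlog at 'net.core.somaxconn'. The helper names `somaxconn`, `effective_backlog`, and `listen_with_backlog` are invented for this example.

```ruby
require "socket"

# Linux exposes the sysctl from the talk as a file under /proc.
SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"

# Returns the system-wide cap on the accept queue, or nil where the
# file does not exist (for example on non-Linux systems).
def somaxconn
  File.readable?(SOMAXCONN_PATH) ? File.read(SOMAXCONN_PATH).to_i : nil
end

# The backlog the kernel will actually use: the requested value,
# capped at somaxconn when that value is known.
def effective_backlog(requested, cap = somaxconn)
  cap ? [requested, cap].min : requested
end

# Opens a loopback listener and asks for the given backlog.
def listen_with_backlog(requested)
  server = TCPServer.new("127.0.0.1", 0) # port 0: the OS picks a free port
  server.listen(requested)               # re-issues listen(2) with our backlog
  server
end

if __FILE__ == $PROGRAM_NAME
  server = listen_with_backlog(4000)
  puts "requested=4000 somaxconn=#{somaxconn.inspect} " \
       "effective=#{effective_backlog(4000)}"
  server.close
end
```

With a cap of 500, a server that asks for a backlog of 4000 still gets a queue of 500, which matches the two configurations compared in the experiment.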
00:01:40.220
I've written a short Ruby program that visualizes the current size and maximum capacity of the accept queue.
00:01:48.780
Now, I will show you the demo.
00:03:04.200
As you can see, this tool is written purely in Ruby and C, and it lets us visualize the server's status. So, this is BPF. Thank you. Now, let's have a quick introduction to BPF.
00:03:54.720
BPF is somewhat complicated, so I've prepared an overall diagram that you can look at later. For now, I'll give a short introduction to BPF.
00:04:20.040
BPF, or Berkeley Packet Filter, was first introduced to the public in a 1992 paper. It was then incorporated into several operating systems, and it came to Linux in 1997 as what is now called Classic BPF.
00:05:00.680
Significant changes were made to BPF in 2014, when it became known as Extended BPF. This broadened its use cases and added various features required by contemporary containerized environments.
00:05:27.419
I will briefly describe how it works. BPF programs are first created from scripts or C source code, compiled into binaries, and then loaded into the kernel with a BPF system call. They are executed in the kernel space, and the results are collected and retrieved into user space via a storage structure called a BPF map or through other mechanisms.
00:06:06.860
It's essential to remember that BPF facilitates exchange between user space and the kernel. Some technologies like BTF or bpftrace require a specially configured kernel and are not yet considered user-friendly.
00:07:04.200
Now it's time to compare BPF with other tools. There are several existing tools offering similar functionality to BPF, but today I want to focus on two of them. Do you know about 'strace'? It provides powerful tracing of system calls but has a significant performance overhead, because it constantly stops and resumes the traced process.
00:08:04.500
The next tool is 'perf'. Perf is useful for identifying bottlenecks in functions within a program or an entire system. However, remember that it samples data rather than tracing every function call. Tracing every call is a strength of BPF.
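The difference between sampling and tracing can be shown with a small simulation. This is only a sketch in plain Ruby: the timeline of calls is invented, and nothing here uses perf or BPF. A tracer sees every call, while a sampler only sees whatever happens to be running at each sampling instant, so short calls can be missed entirely.

```ruby
# Invented timeline: [function name, start time in ms, duration in ms].
CALLS = [
  ["parse",  0, 41],
  ["alloc", 41,  1],
  ["alloc", 42,  1],
  ["alloc", 43,  1],
  ["write", 44, 56],
].freeze

# Tracing: a handler fires on every call, so the counts are exact.
def trace_counts(calls)
  calls.map(&:first).tally
end

# Sampling: every interval_ms, record which function is running.
def sample_counts(calls, interval_ms)
  finish = calls.map { |_, start, dur| start + dur }.max
  (0...finish).step(interval_ms).filter_map do |t|
    hit = calls.find { |_, start, dur| t >= start && t < start + dur }
    hit&.first
  end.tally
end

puts "traced:  #{trace_counts(CALLS)}"
puts "sampled: #{sample_counts(CALLS, 10)}"
```

With a 10 ms interval the sampler attributes five samples each to `parse` and `write` and never observes `alloc`, while the tracer reports all three `alloc` calls. Sampling answers "where does the time go", tracing answers "how often is this called".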
00:09:02.770
BPF is used extensively across many areas of a system, from its original purpose of packet filtering to advanced networking configurations, kernel modules, and more. Typical examples of tools built on BPF include Cilium and Falco.
00:10:01.380
Falco provides security features such as threat detection, and these tools are widely used in cloud-native applications. Another interesting tool is rbperf, created by Javier Honduvilla Coto. This tool observes Ruby's performance and aids in debugging.
00:11:08.040
RBBCC stands for BPF Compiler Collection for Ruby. BCC, the BPF Compiler Collection, supports scripting languages like Python and Lua, but it does not support Ruby. So I created RBBCC to make BCC's functionality available from Ruby.
00:12:03.440
With RBBCC, we can trace functions defined in the Linux kernel. For example, we can observe how often a given function is called in kernel space.
00:12:47.520
BCC or RBBCC code contains a segment written in C. It defines a handler that runs every time a specific function is invoked. For instance, the handler can increment a counter whenever that function is called.
00:13:40.710
As a demo, we attach the handler to the function we want to monitor, let it collect data, and print the current contents of the BPF map every three seconds.
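The attach, count, and read-back pattern can be imitated in plain Ruby. The sketch below is a user-space analogy only: it uses Ruby's built-in `TracePoint` instead of BPF, a Hash instead of a BPF map, and a made-up method `accept_connection` instead of a kernel function. It is not the RBBCC code from the demo.

```ruby
# Stand-in for the traced kernel function (hypothetical name).
def accept_connection(id)
  id * 2
end

# Runs the given block with a probe attached to `method_name` and
# returns the collected counters.
def count_calls(method_name)
  counts = Hash.new(0) # plays the role of the BPF map
  probe = TracePoint.new(:call) do |tp|
    # Plays the role of the C handler: runs on every Ruby method call.
    counts[tp.method_id] += 1 if tp.method_id == method_name
  end
  probe.enable { yield } # "attach" only while the block runs
  counts                 # the "user space" side reads the map afterwards
end

counts = count_calls(:accept_connection) do
  5.times { |i| accept_connection(i) }
end
puts "accept_connection called #{counts[:accept_connection]} times"
# prints "accept_connection called 5 times"
```

In the BPF version the handler runs inside the kernel and the Ruby process only polls the map, which is why the traced program does not have to be stopped or modified.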
00:15:00.320
The first demo traced the TCP 'accept' function in the kernel. The tool combines C and Ruby code to produce the output you saw. This covers the first key technique, function tracing.
00:16:02.519
Next, we have static trace points, which provide predefined hook points for monitoring events without changing configurations.
00:17:06.179
In the Ruby environment, we can establish user-defined static trace points. These can be employed to monitor latency and performance characteristics effectively.
00:18:34.280
So far, I've given an overview of RBBCC and its functionality.
00:18:51.900
Now, let's pivot towards real-world tuning. For example, I created a sample JSON parser named Reston, written in Rust. It functions well, but it's somewhat slow.
00:19:15.500
In my benchmarking code, I invoked the function 50,000 times, and the result was clearly slow. To improve it, I started by identifying the first command invoked.
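A benchmark of this shape can be sketched with Ruby's standard `Benchmark` module. The talk does not show the benchmark itself, so the input document and the helper `time_parser` are assumptions, and the standard library's `JSON.parse` stands in for the Rust-backed parser.

```ruby
require "benchmark"
require "json"

ITERATIONS = 50_000 # invocation count mentioned in the talk

# Hypothetical input; the benchmarked document is not shown in the talk.
SAMPLE = '{"name":"ruby","tags":["bpf","trace"],"stars":3}'

# Calls the given parser block `iterations` times and returns the
# elapsed wall-clock time in seconds.
def time_parser(iterations = ITERATIONS)
  Benchmark.realtime do
    iterations.times { yield SAMPLE }
  end
end

elapsed = time_parser { |doc| JSON.parse(doc) }
puts format("%d calls in %.3f s (%.2f us per call)",
            ITERATIONS, elapsed, elapsed / ITERATIONS * 1_000_000)
```

Swapping the block for a call into the parser under test gives a baseline number to compare against after each optimization.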
00:20:37.800
This is where I used perf to get an overview of the bottlenecks, and from its output I generated a flame graph for analysis.
00:21:16.020
Digging deeper into the functions that RBBCC pointed to, I found significant differences in their invocation counts.
00:21:53.620
Each time I monitored the performance, I noticed that certain methods, particularly in Rust’s memory management, were hot spots for optimization.
00:23:10.680
After several iterations and refinements in my code, I managed to reduce the execution time of the Rust parser significantly.
00:24:05.160
In conclusion, tools like BPF empower developers to observe performance at granular levels, enabling significant optimizations in real time.
00:25:01.300
I want to acknowledge that RBBCC received support from the Ruby Association Grant in 2019, and I am grateful for the advice and mentoring I received.
00:25:24.679
Thank you for your attention.