00:00:00.000
Ready for takeoff.
00:00:18.480
I guess we can go ahead and get started. Um, what's up, everybody?
00:00:23.580
Post-lunch, I will try to keep everybody awake, I guess.
00:00:29.039
Um, but yeah, what's going on everyone? I'm Ryan, and I am one of the maintainers of
00:00:34.079
a library called Pyroscope. Today, I'm going to talk about continuous profiling,
00:00:40.079
specifically about flame graphs in Ruby. I don't know if anyone happened to be
00:00:46.559
here last year at RubyConf. I believe her name was Jade Dickinson. She gave a talk about profiling and flame graphs.
00:00:52.320
It was really good, and that's kind of where I got the basis for this talk from.
00:00:57.660
I thought she did a really good job. However, when you're using flame graphs in certain ways,
00:01:04.559
it only tells part of the story. I believe that adding continuous profiling, as opposed to normal profiling
00:01:10.439
(a difference I'll explain shortly), tells a more complete story.
00:01:17.820
I will provide some examples of that at the end.
00:01:23.220
Quickly, our plan for today is to first cover what flame graphs are in general.
00:01:28.619
We will discuss the different types of profilers, how to transition from normal profiling to continuous profiling,
00:01:34.860
how to manage storage of profiling data, and provide examples of how you can use them to tell a story about your application.
00:01:40.619
There is no better place to start when talking about flame graphs than an illustration by Julia Evans.
00:01:45.659
As many of you know, she is quite popular in the Ruby community, and this illustration she made
00:01:51.720
explains in the abstract how flame graphs work.
00:01:57.360
For those who aren't familiar, basically how it works is that you start with an application and some sort of metric.
00:02:02.820
In this case, we don’t know what the metric is. But at the base, where you see 'main,'
00:02:08.099
that represents 100% of the time that this application was running.
00:02:14.700
Think of these flame graphs like a tree. Here, 'main' calls both 'alligator' and 'panda,' which then work on something.
00:02:21.599
'alligator' works on its tasks for 60% of the time, while 'panda' does so for 40%.
00:02:27.540
'alligator' calls 'bite' and 'teeth,' while 'panda' calls 'bamboo.'
00:02:35.280
This structure helps you understand whatever metric you're analyzing.
00:02:41.760
You can see where to optimize resource utilization—whether that be CPU, memory, or some other type of metric.
00:02:48.660
Maybe something happened on a server, and you're trying to understand it.
00:02:54.720
Regardless, this gives you a nested pie chart—or a pie chart on steroids—of what your application is doing.
00:03:00.239
It helps you know where you should look if something is wrong or if you want to improve performance.
00:03:07.440
She also references Brendan Gregg, one of the pioneers of flame graphs, who popularized them at Netflix and elsewhere.
00:03:15.599
Next, I'll provide a less abstract version of how this translates into code.
00:03:21.720
This basic example is translated into the flame graph you see on the left.
00:03:26.879
In it, you see 'while true' at the base, which calls 'fast function' and 'slow function,' running for two and eight units of time, respectively.
00:03:33.540
These numbers are usually represented in samples.
00:03:39.260
This representation can be very useful for breaking down real pieces of code into a flame graph.
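Roughly, the Ruby behind a flame graph like that could look something like the sketch below; the function names and loop counts are illustrative, not the exact code from the slide.

```ruby
# Minimal sketch of the kind of code behind that flame graph; the names and
# loop counts are illustrative, not the exact example from the slide.
def fast_function
  20_000.times { |i| i * i }   # roughly "2 units" of work
end

def slow_function
  80_000.times { |i| i * i }   # roughly "8 units" of work
end

# Loops forever, just like the example, so a profiler keeps seeing samples.
while true
  fast_function
  slow_function
end
```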
00:03:44.760
Now, I mentioned that I will talk about the different kinds of profilers.
00:03:51.660
There are really two main types that are popular.
00:03:58.200
The first is the standard kind, which I would call instrumented profilers.
00:04:04.080
These profilers are instrumented inside the classes or methods to report when they run.
00:04:09.659
Think of them as breakpoints that you would put at the beginning and end of every function.
00:04:15.180
These profilers break down how long a function takes to run.
00:04:20.519
This method provides a very accurate way to get profiling data and understand applications' runtime behaviors.
00:04:26.520
The downside is that this approach can be less performant,
00:04:33.240
depending on how many functions you have and how you are using them.
00:04:39.960
Instrumented profilers are also more likely to break, hang, or otherwise interfere with the application if not set up carefully.
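To make the breakpoints-around-every-function idea concrete, here is a rough, hand-rolled sketch of instrumented timing in Ruby; real instrumented profilers hook methods for you, so treat this only as an illustration of the approach.

```ruby
# Hand-rolled illustration of instrumented profiling: every profiled function
# reports how long it took, like a breakpoint at its start and end.
require "benchmark"

TIMINGS = Hash.new(0.0)

def profile(name)
  TIMINGS[name] += Benchmark.realtime { yield }
end

def fast_function
  profile(:fast_function) { sleep 0.02 }
end

def slow_function
  profile(:slow_function) { sleep 0.08 }
end

10.times { fast_function; slow_function }
TIMINGS.each { |name, secs| puts format("%-15s %.3fs", name, secs) }
```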
00:04:46.440
The alternative to standard profilers is sampling profilers, which have become increasingly popular.
00:04:54.300
Sampling profilers work differently; instead of instrumenting code, they sample the stack trace at a certain frequency—like 100 times per second.
00:05:02.639
They analyze the stack trace regularly, sending that information somewhere else.
00:05:09.180
Thus, sampling gives a less precise but still statistically representative picture of the application's activity.
00:05:15.840
This method has less performance impact while still providing a wealth of profiling data.
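Here is the sampling idea in miniature: a background thread snapshots the main thread's call stack at a fixed interval and counts each stack it sees. rbspy does this from outside the process without touching your code, so this in-process sketch is only to illustrate the concept.

```ruby
# Toy sampling profiler: snapshot the main thread's stack at a fixed interval
# and count how often each stack appears.
samples = Hash.new(0)
main = Thread.main

sampler = Thread.new do
  loop do
    stack = main.backtrace            # snapshot of the main thread's stack
    samples[stack.first(3)] += 1 if stack
    sleep 0.01                        # aim for roughly 100 samples per second
  end
end

def busy_work
  200_000.times { |i| Math.sqrt(i) }  # stand-in for real application work
end

100.times { busy_work }

sampler.kill
samples.sort_by { |_, count| -count }.first(3).each do |stack, count|
  puts "#{count} samples in:"
  stack.each { |frame| puts "  #{frame}" }
end
```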
00:05:22.380
In the Ruby community, there are several popular profiling tools.
00:05:29.699
Some examples include rbspy, stackprof, rack-mini-profiler, and app-profiler, which many people use for various use cases.
00:05:39.240
A good article from someone at Shopify breaks down these tools; I’ll include the link at the end of this talk.
00:05:45.960
They are all used for different purposes.
00:05:51.660
However, these tools are typically not designed for continuous profiling.
00:05:57.419
They often serve more for ad hoc investigation or debugging.
00:06:02.699
For example, you may SSH into a server to figure out why it’s acting strangely.
00:06:08.759
You might run rbspy to see what your application is doing.
00:06:15.000
Typically, these tools are used to profile specific time periods or actions.
00:06:22.140
Now, Pyroscope does something different.
00:06:29.700
What we'll discuss later is that Pyroscope takes data from various profilers.
00:06:37.140
In Pyroscope's case, it primarily uses rbspy.
00:06:44.520
The tool collects profiling data and sends it to an external server.
00:06:53.220
This means you don’t have to tell your application when to profile itself.
00:07:00.300
Pyroscope continuously profiles your application, allowing you to query data retrospectively—like Prometheus.
00:07:09.060
The component called a session manager determines how often to collect profiles.
00:07:15.300
It standardizes the format of the profiles and sends them to the Pyroscope server.
00:07:24.000
This server compresses and organizes the data so it can be stored efficiently and queried effectively.
00:07:30.840
Thus, continuous profiling allows you to query and analyze data without additional overhead.
00:07:37.140
Now let's dive into some details regarding the implementation.
00:07:42.240
We've gone through a lot of iterations to make this work smoothly for users.
00:07:50.520
There are two main options available:
00:07:56.580
one lets you profile the whole application without changing any code, and the other lets you wrap code with a tag.
00:08:05.280
This tagging feature allows for organization of profiles in a meaningful way.
00:08:11.640
In a Ruby context, this might look like getting a profile per controller or per action.
00:08:18.899
This feature allows you to analyze why a specific controller or action may be behaving a certain way.
00:08:26.400
To enable that, a Ruby application simply adds the gem.
00:08:31.920
You can use this tag wrapper to wrap any code you’re particularly interested in profiling.
00:08:39.000
Essentially, the Ruby gem adds metadata to the profiles collected by rbspy.
00:08:45.480
This way, you can later query based on this metadata or get a high-level view of the whole profile.
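Roughly, the setup looks like the sketch below; the configuration option names and the tag_wrapper call reflect my best understanding of the pyroscope gem's API, so check them against the version you install.

```ruby
# Sketch of wiring up the pyroscope gem; option names may differ slightly
# between gem versions, so treat this as illustrative rather than exact.
require "pyroscope"

Pyroscope.configure do |config|
  config.application_name = "ride-sharing-app"               # name shown in the UI
  config.server_address   = "http://pyroscope-server:4040"   # where profiles are sent
end

# Wrap code you care about so its samples carry extra metadata.
Pyroscope.tag_wrapper({ "controller" => "rides", "action" => "create" }) do
  # ... the code you want to query by these tags later ...
end
```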
00:08:52.920
With this in mind, let’s look at what Pyroscope offers.
00:08:59.760
The user interface may differ from many of the profiling tools you're familiar with.
00:09:07.200
Typically, people expect to see flame graphs in fiery colors.
00:09:12.840
Pyroscope’s visualization organizes colors by package and supports a table view for clarity.
00:09:19.800
The UI also lets you zoom in on various subsets of the application.
00:09:26.040
Additionally, you can employ a tag bar to see results by controller or action.
00:09:32.520
This UI design takes inspiration from other tools like speedscope.
00:09:39.780
With high-cardinality, queryable data like this, we face two main concerns: efficient storage and efficient querying.
00:09:46.740
Profiling data can be large since stack traces can be long.
00:09:53.520
As you can imagine, storing this data requires thoughtful management.
00:09:58.920
The first problem we solved was addressing the storage requirements.
00:10:05.760
Stack traces often have a lot of repetitive data.
00:10:12.600
By turning these traces into trees, we've found ways to compress the data, eliminating duplicated information.
00:10:20.220
For instance, if a file name appears multiple times, you need not repeat it in the data.
00:10:25.560
The symbol names can also have a lot of repetition.
00:10:32.460
We serialize these names and store a dictionary instead, like turning 'Net::HTTP.Request' into '0'.
00:10:40.260
This storage method conserves space and better accommodates high cardinality data.
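As a toy illustration of that dictionary idea (Pyroscope's actual storage engine is written in Go, so this Ruby snippet only shows the principle):

```ruby
# Symbol-dictionary encoding: store each unique frame name once and refer to
# it by a small integer id, so repeated stacks cost almost nothing.
stacks = [
  ["main", "RidesController#create", "Net::HTTP#request"],
  ["main", "RidesController#create", "Net::HTTP#request"],
  ["main", "BikesController#create", "Net::HTTP#request"]
]

dictionary = {}                       # frame name => small integer id
encoded = stacks.map do |stack|
  stack.map { |frame| dictionary[frame] ||= dictionary.size }
end

p dictionary   # {"main"=>0, "RidesController#create"=>1, "Net::HTTP#request"=>2, ...}
p encoded      # [[0, 1, 2], [0, 1, 2], [0, 3, 2]]
```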
00:10:46.620
The second concern is retrieving the stored data quickly.
00:10:53.520
To accomplish this, we use something called segment trees.
00:10:59.760
Pyroscope collects data in 10-second chunks and sends them to the server.
00:11:06.480
When querying a day’s worth of data, these chunks must be merged.
00:11:13.920
Instead of just storing the data as 10-second chunks, we aggregate them at various granularities.
00:11:19.800
This way, we save time when processing queries.
00:11:26.520
For instance, instead of merging five 10-second blocks, you'd merge one pre-aggregated 40-second block plus a single 10-second block.
00:11:34.680
This makes it much more efficient to retrieve the required data quickly.
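As a toy illustration of why those pre-aggregated blocks help, here is a greedy decomposition of a query range into the largest blocks available; the block sizes are made up for the example rather than Pyroscope's actual layout.

```ruby
# Decompose a query range into the largest pre-aggregated blocks available,
# so far fewer merges are needed. Block sizes here are illustrative only.
BLOCK_SIZES = [160, 40, 10]   # seconds, largest first

def blocks_for(duration)
  BLOCK_SIZES.each_with_object([]) do |size, blocks|
    (duration / size).times { blocks << size }
    duration %= size
  end
end

p blocks_for(50)    # => [40, 10]                        : 2 merges instead of 5
p blocks_for(600)   # => [160, 160, 160, 40, 40, 40]     : 6 merges instead of 60
```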
00:11:41.160
Profiling is closely associated with latency, an important metric for companies like Amazon and Google.
00:11:47.880
Every additional millisecond of latency costs them revenue, which is a big part of why profiling matters.
00:11:54.360
Amazon has reportedly lost on the order of a billion dollars to latency issues, and that figure is likely even larger today.
00:12:01.440
Likewise, Google found that users are less likely to keep visiting a site as its load time increases.
00:12:08.160
In the case of ride-hailing apps, users tend to switch providers when one takes too long.
00:12:14.520
Profiling helps break down where an application's time actually goes, which makes it integral to the business.
00:12:20.760
Now, I’ll walk through a couple of practical examples.
00:12:28.500
The first is a real-world example demonstrating how tags can be used to break down profiles.
00:12:34.920
Imagine a simple rideshare application with three routes: one for car, one for bike, one for scooter.
00:12:41.400
We would tag these routes by vehicle type and also tag the regions they belong to.
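In code, that tagging could look something like the sketch below; the Sinatra-style routes and the find_nearest_vehicle helper are hypothetical stand-ins for the real application.

```ruby
# Sketch of per-route tagging in the ride-sharing example; the routes and the
# find_nearest_vehicle stub are hypothetical, used only to show the tags.
require "sinatra"
require "pyroscope"

def find_nearest_vehicle(type)
  sleep(type == "car" ? 0.06 : 0.02)   # pretend cars cost the most time
end

%w[car bike scooter].each do |vehicle|
  get "/#{vehicle}" do
    Pyroscope.tag_wrapper({ "vehicle" => vehicle, "region" => ENV.fetch("REGION", "us-east") }) do
      find_nearest_vehicle(vehicle)
    end
    "ordered a #{vehicle}"
  end
end
```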
00:12:49.500
With the corresponding data displayed, let's analyze the CPU utilization.
00:12:55.920
In this scenario, we may find that cars account for 60% of total CPU usage, while bikes and scooters
00:13:01.620
take up significantly less. Using Pyroscope, we can query each vehicle type to see how it performs.
00:13:08.160
We may find a region consuming a large amount of resources.
00:13:14.520
From this information, we construct a flame graph.
00:13:20.160
By comparing flame graphs, we can easily identify discrepancies—for example, if one region takes significantly more time than another.
00:13:26.040
This helps identify performance issues quickly to optimize resources.
00:13:32.520
Another example involves a view we adapted from speedscope, known as the sandwich view.
00:13:38.520
This allows us to identify commonly called functions, such as logging.
00:13:44.520
Often, logging spreads its cost across many code paths, so no single path shows it as an obvious hot spot.
00:13:50.760
In this view, you can see a function's callers above it and its callees below it, which helps you understand where the logging time actually goes.
00:13:57.240
Optimizing a commonly called function like that can have a significant performance impact.
00:14:03.600
Lastly, I want to share a personal story about the development of Pyroscope.
00:14:11.400
My colleague Dimitri and I were working at a company using Ruby when we noticed how much time was being spent on compression.
00:14:17.400
At this company, we used the Brotli library for compression.
00:14:23.880
Initially, we weren’t aware that the default level of compression is set to 11, which is the highest.
00:14:30.540
While looking for performance optimizations, we compared compression settings at quality levels 11 and 2.
00:14:38.040
The results showed that the maximum level of compression consumed drastically more CPU resources.
00:14:44.520
However, by lowering the compression level, we were able to save approximately 30%.
00:14:50.700
This allowed us to reduce the number of servers needed.
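If you want to reproduce that comparison yourself, a rough benchmark might look like this; it assumes the brotli gem's Brotli.deflate accepts a quality option, so treat that keyword as an assumption.

```ruby
# Rough benchmark of Brotli quality 11 vs. 2; assumes the brotli gem's
# Brotli.deflate accepts a quality: keyword. Numbers will vary by payload.
require "brotli"
require "benchmark"

payload = "some json payload about rides " * 200_000   # a few MB of sample data

Benchmark.bm(12) do |bm|
  [11, 2].each do |quality|
    compressed = nil
    bm.report("quality #{quality}") { compressed = Brotli.deflate(payload, quality: quality) }
    puts "  -> #{compressed.bytesize} bytes from #{payload.bytesize}"
  end
end
```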
00:14:57.960
The only way we discovered this was through profiling, allowing us to analyze the performance implications.
00:15:03.840
Often, the most significant performance optimizations can come from reviewing functions commonly used throughout the code.
00:15:10.440
That brings us to a close on this talk.
00:15:17.520
I’d be happy to answer any remaining questions.
00:15:23.760
Thanks for attending, and enjoy the rest of your conference!
00:15:39.000
That's a good question.
00:15:42.600
The question was, sometimes if there’s a shift in your application behavior,
00:15:49.200
you may want to trace it back to a particular commit.
00:15:55.920
We often utilize tags internally.
00:16:01.260
For example, we tag profiles with the commit so we can track which version lies behind each flame graph.
00:16:08.520
This helps understand performance changes over time.
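In practice, that can be as simple as attaching the commit SHA as a tag when the agent starts; the environment variable name and the tags option here are examples, not guaranteed names.

```ruby
# Sketch: attach the deployed commit as a tag so every flame graph can be
# traced back to a version. GIT_COMMIT_SHA and the tags option are examples.
require "pyroscope"

Pyroscope.configure do |config|
  config.application_name = "ride-sharing-app"
  config.server_address   = "http://pyroscope-server:4040"
  config.tags = { "commit" => ENV.fetch("GIT_COMMIT_SHA", "unknown") }
end
```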
00:16:14.640
Typically, you'll need some additional tooling to correlate profiles with application changes.
00:16:20.160
In a real-world scenario, increased CPU usage could align with holidays.
00:16:27.030
This demonstrates how profiling can help deduce whether performance is expected.
00:16:34.200
Yes, we use Sidekiq alongside this as well, and we run it as a separate application.
00:16:42.360
Although I can't showcase the actual code here, we track them as separate applications.
00:16:48.660
You can tag Sidekiq jobs if you wish, or set them as different applications for monitoring.
00:16:55.440
This method simplifies debugging Sidekiq worker issues.
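A sketch of what tagging inside a Sidekiq job might look like; the worker class, its body, and the tag key are hypothetical examples.

```ruby
# Sketch of tagging work inside a Sidekiq job; the worker and tag key are
# hypothetical, used only to show where the wrapper would go.
require "sidekiq"
require "pyroscope"

class ChargeRideJob
  include Sidekiq::Job   # Sidekiq::Worker on older Sidekiq versions

  def perform(ride_id)
    Pyroscope.tag_wrapper({ "worker" => self.class.name }) do
      # ... the actual job logic for ride_id goes here ...
    end
  end
end
```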
00:17:03.000
With that in mind, regarding the example of compression,
00:17:09.240
we approached it as an A/B testing scenario.
00:17:15.480
In actual practice, you could compare a staging server against a production server.
00:17:22.200
This allows you to observe performance across the different stages and draw insights from the comparison.
00:17:28.560
Thank you once again. Feel free to ask more questions!
00:17:36.840
Have a great time at the rest of the conference!