A Tale of Two Flamegraphs: Continuous Profiling in Ruby

by Ryan Perry

In the talk 'A Tale of Two Flamegraphs: Continuous Profiling in Ruby,' Ryan Perry provides an insightful exploration into continuous profiling in Ruby using the Pyroscope gem, which leverages features from rbspy and speedscope for performance analysis. He begins by defining flame graphs, illustrating how they represent the time spent in different parts of an application, ultimately revealing optimization opportunities. Key points discussed include:

Types of Profilers: Perry distinguishes between instrumented profilers, which instrument methods to capture execution time, and sampling profilers, which capture stack traces at regular intervals, providing a less accurate but broader view of application behavior.
Continuous Profiling: He explains how Pyroscope enhances profiling by continuously collecting data from ephemeral sessions without needing specific commands from the application, allowing retrospective queries akin to monitoring tools like Prometheus.
Pyroscope Features: Pyroscope organizes profiling data meaningfully through tagging and metadata, enabling developers to analyze specific functions or controllers. Its user interface is designed to present flame graphs and tabular data, offering insights into CPU usage and resource distribution across different operations.
Storage and Query Efficiency: The talk elaborates on challenges related to data storage and querying efficiencies, detailing methods like tree-like compression of stack traces to minimize space requirements, and aggregate data collection within defined time chunks to expedite query responses.
Real-World Examples: Perry presents practical scenarios, such as analyzing CPU usage across different routes in a rideshare application to identify performance bottlenecks and optimize resource allocation, and shares an anecdote about refining compression settings to enhance performance based on profiling insights.

In conclusion, continuous profiling with Pyroscope provides a powerful means for Ruby developers to understand application performance dynamics, address latency issues proactively, and produce significant performance improvements through data-driven decisions. Perry emphasizes that the insights gained from profiling can lead to substantial operational enhancements in software applications, particularly for high-stakes environments prone to performance fluctuations.

00:00:00.000 Ready for takeoff.

00:00:18.480 I guess we can go ahead and get started. Um, what's up, everybody?

00:00:23.580 Post-lunch, I will try to keep everybody awake, I guess.

00:00:29.039 Um, but yeah, what's going on everyone? I'm Ryan, and I am one of the maintainers of

00:00:34.079 a library called Pyroscope. Today, I'm going to talk about continuous profiling,

00:00:40.079 specifically about flame graphs in Ruby. I don't know if anyone happened to be

00:00:46.559 here last year at RubyConf. I believe her name was Jade Dickinson. She gave a talk about profiling and flame graphs.

00:00:52.320 It was really good, and that's kind of where I got the basis for this talk from.

00:00:57.660 I thought she did a really good job. However, when you're using flame graphs in certain ways,

00:01:04.559 it only tells part of the story. I believe that adding continuous profiling—as opposed to normal profiling,

00:01:10.439 which I will explain the difference of shortly—tells a more complete story.

00:01:17.820 I will provide some examples of that at the end.

00:01:23.220 Quickly, our plan for today is to first cover what flame graphs are in general.

00:01:28.619 We will discuss the different types of profilers, how to transition from normal profiling to continuous profiling,

00:01:34.860 how to manage storage of profiling data, and provide examples of how you can use them to tell a story about your application.

00:01:40.619 There is no better place to start when talking about flame graphs than an illustration by Julia Evans.

00:01:45.659 As many of you know, she is quite popular in the Ruby community, and this illustration she made

00:01:51.720 explains in the abstract how flame graphs work.

00:01:57.360 For those who aren't familiar, basically how it works is that you start with an application and some sort of metric.

00:02:02.820 In this case, we don’t know what the metric is. But at the base, where you see 'main,'

00:02:08.099 that represents 100% of the time that this application was running.

00:02:14.700 Think of these flame graphs like a tree. Here, 'main' calls both 'alligator' and 'Panda,' which then work on something.

00:02:21.599 'Alligator' works on its tasks for 60% of the time, while 'Panda' does so for 40%.

00:02:27.540 'Alligator' calls 'bite' and 'teeth,' while 'Panda' calls 'bamboo.'

00:02:35.280 This structure helps you understand whatever metric you're analyzing.

00:02:41.760 You can see where to optimize resource utilization—whether that be CPU, memory, or some other type of metric.

00:02:48.660 Maybe something happened on a server, and you're trying to understand it.

00:02:54.720 Regardless, this gives you a nested pie chart—or a pie chart on steroids—of what your application is doing.

00:03:00.239 It helps you know where you should look if something is wrong or if you want to improve performance.

00:03:07.440 She also references Brendan Gregg, who is one of the pioneers of employing this technology at Netflix and other locations.

00:03:15.599 Next, I'll provide a less abstract version of how this translates into code.

00:03:21.720 This basic example is translated into the flame graph you see on the left.

00:03:26.879 In it, you see 'while true,' followed by 'fast function,' and 'slow function,' each running for two and eight units of time respectively.

00:03:33.540 These numbers are usually represented in samples.

00:03:39.260 This representation can be very useful for breaking down real pieces of code into a flame graph.

00:03:44.760 Now, I mentioned that I will talk about the different kinds of profilers.

00:03:51.660 There are really two main types that are popular.

00:03:58.200 One is the standard profilers, which I would call instrumented profilers.

00:04:04.080 These profilers are instrumented inside the classes or methods to report when they run.

00:04:09.659 Think of them as breakpoints that you would put at the beginning and end of every function.

00:04:15.180 These profilers break down how long a function takes to run.

00:04:20.519 This method provides a very accurate way to get profiling data and understand applications' runtime behaviors.

00:04:26.520 However, on the downside, if something goes wrong, it might be slightly less performant.

00:04:33.240 This may depend on how many functions you have or how you are using them.

00:04:39.960 It's more likely for them to break, hang, or fail if not properly executed.

00:04:46.440 The alternative to standard profilers is sampling profilers, which have become increasingly popular.

00:04:54.300 Sampling profilers work differently; instead of instrumenting code, they sample the stack trace at a certain frequency—like 100 times per second.

00:05:02.639 They analyze the stack trace regularly, sending that information somewhere else.

00:05:09.180 Thus, sampling allows for a less accurate yet broadly accurate depiction of the application's activities.

00:05:15.840 This method has less performance impact while still providing a wealth of profiling data.

00:05:22.380 In the Ruby community, there are several popular profiling tools.

00:05:29.699 Some examples include rbspy, stackprof, rack-mini-profiler, and app-profiler, which many people use for various use cases.

00:05:39.240 A good article from someone at Shopify breaks down these tools; I’ll include the link at the end of this talk.

00:05:45.960 They are all used for different purposes.

00:05:51.660 However, these tools are typically not designed for continuous profiling.

00:05:57.419 They often serve more for ad hoc investigation or debugging.

00:06:02.699 For example, you may SSH into a server to figure out why it’s acting strangely.

00:06:08.759 You might run rbspy to see what your application is doing.

00:06:15.000 Typically, these tools are used to profile specific time periods or actions.

00:06:22.140 Now, Pyroscope does something different.

00:06:29.700 What we'll discuss later is that Pyroscope takes data from various profilers.

00:06:37.140 In Pyroscope's case, it primarily uses rbspy.

00:06:44.520 The tool collects profiling data and sends it to an external server.

00:06:53.220 This means you don’t have to tell your application when to profile itself.

00:07:00.300 Pyroscope continuously profiles your application, allowing you to query data retrospectively—like Prometheus.

00:07:09.060 The component called a session manager determines how often to collect profiles.

00:07:15.300 It standardizes the format of the profiles and sends them to the Pyroscope server.

00:07:24.000 This server performs compression and efficiency operations to store data efficiently and query it effectively.

00:07:30.840 Thus, continuous profiling allows you to query and analyze data without additional overhead.

00:07:37.140 Now let's dive into some details regarding the implementation.

00:07:42.240 We've gone through a lot of iterations to ensure functionality for users.

00:07:50.520 There are two main options available:

00:07:56.580 one allows you to profile seamlessly without doing anything, and the other uses a breakpoint or tag.

00:08:05.280 This tagging feature allows for organization of profiles in a meaningful way.

00:08:11.640 In a Ruby context, this might look like getting a profile per controller or per action.

00:08:18.899 This feature allows you to analyze why a specific controller or action may be behaving a certain way.

00:08:26.400 To enable that, a Ruby application simply adds the gem.

00:08:31.920 You can use this tag wrapper to wrap any code you’re particularly interested in profiling.

00:08:39.000 Essentially, the Ruby gem adds metadata to the profiles collected by rbspy.

00:08:45.480 This way, you can later query based on this metadata or get a high-level view of the whole profile.

00:08:52.920 With this in mind, let’s look at what Pyroscope offers.

00:08:59.760 The user interface may differ from many profile tools you may be familiar with.

00:09:07.200 Typically, people expect to see flame graphs in fiery colors.

00:09:12.840 Pyroscope’s visualization organizes colors by package and supports a table view for clarity.

00:09:19.800 This specifics about the UI allows you to zoom in to various subsets of the application.

00:09:26.040 Additionally, you can employ a tag bar to see results by controller or action.

00:09:32.520 This UI design takes inspiration from other tools like Speed Scope.

00:09:39.780 With high cardinality and queryability, we face two main concerns: efficient storage and efficient querying.

00:09:46.740 Profiling data can be large since stack traces can be long.

00:09:53.520 As you can imagine, storing this data requires thoughtful management.

00:09:58.920 The first problem we solved was addressing the storage requirements.

00:10:05.760 Stack traces often have a lot of repetitive data.

00:10:12.600 By turning these traces into trees, we've found ways to compress the data, eliminating duplicated information.

00:10:20.220 For instance, if a file name appears multiple times, you need not repeat it in the data.

00:10:25.560 The symbol names can also have a lot of repetition.

00:10:32.460 We serialize these names and store a dictionary instead, like turning 'Net::HTTP.Request' into '0'.

00:10:40.260 This storage method conserves space and better accommodates high cardinality data.

00:10:46.620 The second concern becomes quickly retrieving stored data.

00:10:53.520 To accomplish this, we use something called segment trees.

00:10:59.760 Pyroscope collects data in 10-second chunks and sends them to the server.

00:11:06.480 When querying a day’s worth of data, these chunks must be merged.

00:11:13.920 Instead of just storing the data as 10-second chunks, we aggregate them at various granularities.

00:11:19.800 This way, we save time when processing queries.

00:11:26.520 For instance, instead of merging four 10-second blocks, you’d get one 40-second block plus a 10-second block.

00:11:34.680 This makes it much more efficient to retrieve the required data quickly.

00:11:41.160 Profiling is closely associated with latency, an important metric for companies like Amazon and Google.

00:11:47.880 For every millisecond in latency, they suffer revenue losses, hence the importance of profiling.

00:11:54.360 Amazon reported losing a billion dollars due to latency issues; this figure likely grows today.

00:12:01.440 Likewise, Google noted users being less likely to visit a site with increased load times.

00:12:08.160 In the case of ride-hailing apps, users tend to switch providers when one takes too long.

00:12:14.520 Profiling helps delineate application performance breakdowns, making it integral to the business.

00:12:20.760 Now, I’ll walk through a couple of practical examples.

00:12:28.500 The first is a real-world example demonstrating the ability to understand tags and how they break down profiles.

00:12:34.920 Imagine a simple rideshare application with three routes: one for car, one for bike, one for scooter.

00:12:41.400 We would tag these routes by vehicle node and also tag the regions they belong to.

00:12:49.500 With the corresponding data displayed, let's analyze the CPU utilization.

00:12:55.920 In this scenario, we may find that cars account for 60% of total CPU usage, while bikes and scooters

00:13:01.620 take up significantly less. Using Pyroscope, we can query each vehicle type to see how it performs.

00:13:08.160 We may find a region consuming a large amount of resources.

00:13:14.520 From this information, we construct a flame graph.

00:13:20.160 By comparing flame graphs, we can easily identify discrepancies—for example, if one region takes significantly more time than another.

00:13:26.040 This helps identify performance issues quickly to optimize resources.

00:13:32.520 Another example involves a view we adapted from Speed Scope, known as the sandwich view.

00:13:38.520 This allows us to identify commonly called functions, such as logging.

00:13:44.520 Often, logging can distribute its workload across many code paths.

00:13:50.760 In this view, you can access parent and child functions, helping you understand the logging process better.

00:13:57.240 This can lead to significant performance impacts.

00:14:03.600 Lastly, I want to share a personal story about the development of Pyroscope.

00:14:11.400 My colleague Dimitri and I were working at a company using Ruby when we realized the compression taking place.

00:14:17.400 At this company, we deployed the broadly package for compression.

00:14:23.880 Initially, we weren’t aware that the default level of compression is set to 11, which is the highest.

00:14:30.540 After running performance optimizations, we compared compression settings at levels 11 and 2.

00:14:38.040 The results showed that the maximum level of compression consumed drastically more CPU resources.

00:14:44.520 However, by lowering the compression level, we were able to save approximately 30%.

00:14:50.700 This allowed us to reduce the number of servers needed.

00:14:57.960 The only way we discovered this was through profiling, allowing us to analyze the performance implications.

00:15:03.840 Often, the most significant performance optimizations can come from reviewing functions commonly used throughout the code.

00:15:10.440 That brings us to a close on this talk.

00:15:17.520 I’d be happy to answer any remaining questions.

00:15:23.760 Thanks for attending, and enjoy the rest of your conference!

00:15:39.000 That's a good question.

00:15:42.600 The question was, sometimes if there’s a shift in your application behavior,

00:15:49.200 you may want to explore committing.

00:15:55.920 We often utilize tags internally.

00:16:01.260 For example, we tag commits to track which version lies behind each flame graph.

00:16:08.520 This helps understand performance changes over time.

00:16:14.640 Typically, you'll need some additional tools to correlate application changes.

00:16:20.160 In a real-world scenario, increased CPU usage could align with holidays.

00:16:27.030 This demonstrates how profiling can help deduce whether performance is expected.

00:16:34.200 Yes, we utilize Sidekiq in conjunction, running it as a separate application.

00:16:42.360 Although I can't showcase the actual code here, we track them as separate applications.

00:16:48.660 You can tag Sidekiq jobs if you wish, or set them as different applications for monitoring.

00:16:55.440 This method simplifies debugging Sidekiq worker issues.

00:17:03.000 With that in mind, regarding the example of compression,

00:17:09.240 we approached it as an A/B testing scenario.

00:17:15.480 In actual practice, you could compare a staged server versus a live server.

00:17:22.200 This allows for performance observation across different stages, providing insights.

00:17:28.560 Thank you once again. Feel free to ask more questions!

00:17:36.840 Have a great time at the rest of the conference!

RubyConf 2022

From beginner to expert, and back again

Michael Toppa

Change the Climate Before Changing the Weather

Ben Greenberg

Simulated Annealing: The Most Metal Algorithm Ever 🤘

Chris Bloom

Ruby Lambdas

Keith Bennett

Building Native GUI Apps in Ruby

Andy Maleh

In Defense of Ruby Metaprogramming

Noel Rappin

Pushing to master - adopting trunk based development

Dylan Blakemore

Improving the development experience with language servers

Vinicius Stock

A Tale of Two Flamegraphs: Continuous Profiling in Ruby

Ryan Perry

Everything a Microservice: The Worst Possible Intro to dRuby

Kevin Kuchta

1.5 is the Midpoint Between 0 and Infinity

Peter Zhu

Using JRuby: What, When, How, and Why

Charles Nutter

Building a Commercial Game Engine using mRuby and SDL

Amir Rajan

Writing Ruby, just not in English!

Ratnadeep Deshmane

This Old App

Lori M Olson

Never again without a contract: dry-validation

Espartaco Palma

Boutique machine generated gems

CJ Avilla

The Power of 'No'

Glenn Harmon

Ruby Central Panel

Allison McMillan, Chelsea Kaufman, Neil McGovern, Marty Haught, Valerie Woolard

Exit(ing) Through the YJIT

Eileen M. Uchitelle

Crocheting & Coding: they're more similar than you think!

Tori Machen

Keynote: The Case Of The Vanished Variable - A Ruby Mystery Story

Nadia Odunayo

Eclectics Unite: Leverage Your Diverse Background

Sijia Wu

Keynote: Lost in the Wilderness

Suzan Bond

How music works, using Ruby

Thijs Cadier

The Magnitude 9.1 Meltdown at Fukushima

Nickolas Means

Don't @ me! Faster Instance Variables with Object Shapes

Aaron Patterson

scip-ruby - A Ruby indexer built with Sorbet

Varun Gandhi

Building an education savings platform, with Ruby!

Tyler Ackerman

Static typing with RBS in Ruby

Gaurav Kumar Singh

Helping Redistrict California with Ruby

Jeremy Evans

Solidarity not Charity and Collective Liberation

Mae Beale

What does "high priority" mean? The secret to happy queues

Daniel Magliola

RSpec: The Bad Parts

Caleb Hearth

Data indexing with RGB (Ruby, Graphs and Bitmaps)

Benjamin Lewis

Bending Time with Crystal: 6 hours to 15 minutes

Paul Hoffer

Splitting: the Crucial Optimization for Ruby Blocks

Benoit Daloze

Building Stream Processing Applications with Ruby & Meroxa

Ali Hamidi

Boot the backlog: Optimizing your workflow for dev happiness

Stacey McKnight

Business in the Front, Party in the Back (Office)

Kevin Whinnery

Analyzing an analyzer - A dive into how RuboCop works

Kyle d'Oliveira

Ruby Archaeology: Forgotten web frameworks

Nick Schwaderer

Ruby's Core Gem

Chris Seaton

I'm in love with Mermaid

Carolyn Cole

Discover Machine Learning in Ruby

Justin Bowen

Staff Engineer: “Here be dragons”

Alexandre Terrasa

Working Together: Pairing as Senior and Junior Developers

Kelly Ryan