Performance
Heaping on the complexity! An adventure in GC Compaction

Summarized using AI

Heaping on the complexity! An adventure in GC Compaction

Matt Valentine-House • September 08, 2022 • Mie, Japan

In this talk, Matt Valentine-House, a Ruby infrastructure team member at Shopify, shares insights into the complexities surrounding garbage collection (GC) compaction in Ruby, specifically on the advancements made for the variable width allocation project leading up to Ruby 3.2.

The presentation is structured into four main sections:

1. Background on Memory Management and Garbage Collection:

- Introduction to Ruby's memory organization prior to version 3.1, focusing on how objects are stored in the Heap using fixed-size slots and how extended objects utilize malloc for larger sizes.

- Explanation of cache performance and data locality, emphasizing the need for efficient memory operations to minimize CPU cache misses.

  1. Variable Width Allocation Project:

    • Discusses the project’s aim to improve data locality, which is crucial for Ruby’s performance, and how to embed larger objects within the Heap to reduce allocations and subsequent cache misses.
    • Initial promising performance results from attempts to modify the Heap structure.
  2. Garbage Collection and Compaction:

    • Overview of Ruby's tracing GC, its phases, and incremental GC techniques designed to reduce pause times.
    • Introduction to memory compaction, its importance, and how it was affected by the introduction of multiple size pools in Ruby 3.1.
    • Discussion of challenges faced during compaction, particularly managing flags across pools and ensuring efficient processing without delays.
  3. Solution Implementation and HeapVis Tool:

    • Description of the iterative approach to finding efficient solutions, culminating in a new algorithm allowing objects to dictate their optimal placement within memory pools, resulting in fewer traversals and better memory management.
    • Introduction of 'HeapVis', a visualization tool to help understand complex Heap behaviors, ultimately improving teamwork and debugging processes.

Conclusions:

- The improvements made will not only enhance Ruby’s performance but also streamline GC and memory allocation processes for developers.

- Looking toward the release of Ruby 3.2, these advancements reinforce Ruby’s standing as a robust programming solution.

- Valentine-House expresses gratitude to the RubyKaigi organizers and the team effort involved in this project, underlining the collaborative spirit within the Ruby community.

Heaping on the complexity! An adventure in GC Compaction
Matt Valentine-House • September 08, 2022 • Mie, Japan

RubyKaigi 2022

00:00:01.860 Hello! My name is Matt Valentine-House and you can find me on the internet as 8bit Raptor. I work for Shopify on the Ruby infrastructure team.
00:00:06.620 Today, I’m here to discuss a problem that my team and I introduced during a project we were working on. I’ll also share the various ways I attempted to fix it, the challenges I faced, and how I ultimately succeeded.
00:00:17.760 This talk is divided into four rough sections. First, I will provide some background on memory management, garbage collection, and the variable width allocation project. It may come off as a bit of a brain dump, but bear with me; I hope we’ll develop a shared understanding that will help us grasp the rest of the discussion.
00:00:42.480 The second and main part will introduce the problem that emerged, the journey I undertook to resolve it, and the follow-up work that fixing this problem enabled. Then, I’ll summarize the current state of our variable width allocation project and outline what it will look like when version 3.2 is released in December.
00:01:06.420 Finally, I’ll briefly introduce HeapVis, a small tool I created as part of this project, which visualizes Ruby's memory space and aids in debugging the issue I’m going to discuss.
00:01:14.220 Let's begin our exploration by talking about how Ruby organized its memory prior to version 3.1, when we introduced variable width allocation.
00:01:18.540 In Ruby, objects are stored in a data structure known as the Heap. The Heap is divided into pages, with each page having a fixed size that contains a number of fixed-size slots for Ruby objects. When Ruby programs create objects, they are placed into these slots.
00:01:39.360 Ruby uses a structure called 'RValue' to contain all these objects, where each Ruby object can be represented as an RValue. My colleague Gemma discussed this in her talk before lunch, but I’d like to emphasize how Ruby manages differently sized objects. For example, the string 'Hello RubyKaigi, thanks for having me' will clearly require more memory than just 'Hello'.
00:02:02.939 To handle this, Ruby employs the concept of embedded and extended objects, which are allocated in the program Heap. When our Ruby program runs, the operating system provides a contiguous chunk of memory, split into two regions: the stack and the Heap. We won’t delve into the stack since our focus is on dynamic memory allocation occurring in the Heap.
00:02:24.180 Memory allocation in C programs, like the Ruby interpreter, utilizes functions from the C standard library: malloc, which allocates memory, and free, which deallocates it, returning the memory to the Heap. When I say the Ruby VM allocates its Heap at startup, I mean it uses some memory from the program Heap and structures it into a data format for holding objects, which is also referred to as a heap.
00:02:52.620 This distinction can be confusing but is important to note. Returning to Ruby's treatment of embedded or extended objects, let's quickly run through an example.
00:03:06.599 The simplest case is when we allocate a short string, 'Hello'. The RValue types contain a small amount of metadata, and the remaining slots can be utilized freely if there’s sufficient space. Objects fitting neatly into the slot do not require further considerations. However, when an object is too large for these embedded slots, Ruby uses malloc to allocate memory in the program Heap.
00:03:40.320 The data is stored in this newly allocated space, and a pointer to it replaces the direct reference. The metadata of the object is modified to indicate that it’s no longer an embedded object. This model of fixed-size slots and extended slots is convenient, relatively simple to implement, and enables quick operations over Heap pages.
00:04:22.020 However, there is a trade-off related to CPU performance. Data resides at various locations, with access times increasing the farther it is from the processor’s core. CPUs possess multiple cache levels, where level one is the smallest and fastest, while RAM is larger but comparatively slower.
00:04:35.400 Cache misses—when data isn’t found in the level one cache—require fetching data from the next closest storage. This process aims to cache data nearer to the CPU. Poor cache performance impacts program efficiency, so optimization focuses on ensuring that related data is laid out contiguously in memory.
00:05:05.940 The distance between related data is referred to as data locality, crucial when discussing Ruby's object allocation. Pointers in RValues to program Heap are known to exhibit poor locality, inciting more reads to bring objects closer to the CPU during operations. At Shopify, we've been focusing on a project called variable width allocation, aiming to enhance Ruby's performance by improving data locality to decrease CPU cache misses.
00:05:47.760 We aim to modify the Heap for embedding larger objects, thus minimizing Heap allocations and reducing subsequent cache misses during access. In the following section of the talk, I will delve into the implementation of variable width allocation, and preliminary performance results have been promising.
00:06:50.760 Before we move on, we should briefly cover garbage collection (GC). Ruby utilizes a tracing GC that accumulates live object references through a graph of references. In this context, a reference occurs when one Ruby object depends on another, keeping a trace of the relationships.
00:07:19.140 The garbage collector operates intermittently during the program's lifecycle, automatically invoked in scenarios like running out of Heap slots. When triggered, GC halts all user code execution—a process known as a stop-the-world collector.
00:07:44.040 The GC functions in two phases: marking, where of all reachable objects from known roots; and sweeping, where unmarked objects are swept from the Heap, freeing associated memory for reuse. Ruby's GC framework maintains generational collections for efficiency—allowing older and younger objects to be collected in different intervals.
00:08:09.360 Incremental GC marks objects over several cycles, attempting to minimize long GC pauses and yield more predictable performance throughout program execution.
00:08:26.580 Lastly, Ruby's GC incorporates compaction, which addresses memory fragmentation caused by repeated allocation and sweeping. Compaction restructures memory, relocating objects to eliminate gaps and maximize space density, benefiting application performance by minimizing wasted memory and maximizing cache performance.
00:09:04.800 The two-finger algorithm facilitates this memory restructuring. It scans the Heap twice; first for movement, then for updating references. When the compact cursor meets the scan cursor, the process concludes with a final scan that refines the object references throughout the system.
00:09:53.090 Some objects cannot be moved; they are pinned during the marking phase, typically from C extensions. Compaction was first introduced in Ruby version 2.7 but required manual triggering. Ruby 3.0 enabled auto-compaction but required tighter integration across GC phases.
00:10:31.140 Notably, with the introduction of size pools in Ruby 3.1, managing multiple Heaps shifted the complexity of compaction. Initially assumed to be a single Heap, the system now comprises multiple variable width allocation pools instead.
00:11:08.580 This brought forth challenges, including the need to manage flags across multiple pools and ensuring all size pools underwent compaction without halting prematurely due to the first cursor meeting.
00:11:54.900 My first solution was straightforward—scope compaction context to size pools so that each Heap knew its own compaction status. However, this solution was inefficient, requiring a time-intensive process of multiple sweeps for interdependent object allocations across Heaps.
00:12:39.660 In the second attempt, I attempted to decouple the movement and reference update steps, allowing for more streamlined operations. However, this introduced added complexity and didn’t effectively address the core issues.
00:13:26.959 After extensive brainstorming with my team, the solution emerged—a reverse approach where objects indicate their desired arrangements. This innovative thought allowed continued effectiveness within the constraints of moving individual objects for optimal memory allocation.
00:14:20.280 In my final implementation, I structured the algorithm such that upon compact cursor engagement, each object recognizes its rightful allocation without unnecessary displacement. This approach maintained efficiency by eliminating excessive shifts amongst the pools.
00:14:52.440 With my changes, objects were empowered to identify which pools best suited their new placement, reinforcing better memory management overall. The adjustments detected within the Heap required fewer traversals and permit overheads.
00:15:25.260 Final optimizations and testing have led to enhanced efficiency within the compacted space between pools. User-friendly adjustments now allow string movement, employing the improvements effectively while sustaining the performance durability expected from Ruby.
00:16:00.600 The introduction of 'HeapVis' aimed to further elucidate the algorithmic changes implemented for both personal and collaborative advantage. Visualization aids facilitated understanding around the complex mechanisms of Heap management.
00:16:32.460 In summary, the amendments outlined promote higher efficiency while retaining user-friendly operational processes for developers to address garbage collection and memory allocation through Ruby. I commend all efforts from the team, as those consolidating actions have markedly advanced successful content movement barring confusion, melting into coherent frameworks.
00:17:07.980 Looking forward, as Ruby 3.2 is prepped for release, strides made will surmount upon previous foundations established, solidifying Ruby's validation as an exceptional programming resource while framing collaborative exploration moving forward.
00:17:45.000 Finally, my sincere appreciation extends to RubyKaigi organizers for the opportunity to share this journey, acknowledging the extensive teamwork and dedication that contributed to this success. Thank you all for your attention!
Explore all talks recorded at RubyKaigi 2022
+33