Keynote: Exploring Memory in Ruby - Building a Compacting GC

Summarized using AI

Keynote: Exploring Memory in Ruby - Building a Compacting GC

Aaron Patterson • October 13, 2017 • Selangor, Malaysia • Keynote

In this keynote presentation titled "Exploring Memory in Ruby - Building a Compacting GC," Aaron Patterson, a key member of the Ruby and Rails core teams, discusses the intricacies of garbage collection (GC) and memory management in Ruby. The talk is structured around three main topics: copy-on-write (COW) optimizations, building a compacting garbage collector, and memory inspection tools in Ruby. Patterson underscores the importance of a compacting GC in reducing memory fragmentation and improving memory usage. He explains COW optimizations, which allow memory to be copied only when it's necessary, demonstrating this with examples in Ruby, particularly with strings and arrays.

Key Points Discussed:
- Introduction to Garbage Collection: The significance of GC in managing memory usage in Ruby applications.
- Copy-On-Write Optimizations: COW allows duplicate objects to share the same memory until one is modified, minimizing unnecessary memory usage. This is illustrated through Ruby examples and comparisons to how operating systems handle memory.
- Challenges with Forking: Patterson explains how using forking in web servers like Unicorn can lead to excessive memory overhead if COW optimizations are not utilized properly, emphasizing that it’s crucial to load shared libraries only once to conserve memory.
- Compaction Techniques: He introduces the concept of compaction, where live objects are rearranged to reduce fragmentation. Patterson describes a specific technique, two-finger compaction, outlining its advantages and complexities. He emphasizes that while compaction might seem slow, its effectiveness in merging free spaces can significantly alleviate memory problems during forking.
- Implementation Challenges: The talk covers technical challenges faced during implementation, including issues with hash keys, dual references, and string literals affecting memory allocation and management. Patterson shares how these obstacles can be mitigated through careful coding practices and heuristics.
- Memory Inspection Tools: Patterson also highlights the importance of memory inspection tools in evaluating GC efficiency and the effectiveness of COW optimizations, demonstrating a simple method using Ruby's object space capabilities.

In conclusion, Patterson motivates attendees to challenge the perception of impossible tasks in programming, proving through his work with GC compaction that innovative solutions can be formulated by questioning assumptions and exploring new ideas in the context of Ruby memory management. He advocates for community engagement and provides an open invitation for discussions, emphasizing collaboration in overcoming programming challenges.

Takeaways:
- Understanding and optimizing memory usage is critical in Ruby, especially in web applications.
- Compaction can be realized practically, despite perceived limitations, allowing for improved performance in memory management.
- Encouragement to experiment with perceived problems, fostering a mindset of innovation and creativity within coding practices.

Keynote: Exploring Memory in Ruby - Building a Compacting GC
Aaron Patterson • October 13, 2017 • Selangor, Malaysia • Keynote

Speaker: Aaron Patterson (@tenderlove)

Website: http://rubyconf.my

Produced by Engineers.SG

RubyConf MY 2017

00:00:06.040 ah I'm only a little nervous no I'm very
00:00:11.889 nervous it's so sad I'm the last presenter I
00:00:17.410 feel so sad at the end of conferences because I have such a good time watching all the talks and now I am the last talk
00:00:24.300 but fortunately I am on stage so I don't
00:00:29.800 get to enjoy my talk you all get to enjoy my talk so let's I guess we should
00:00:35.860 get this started um I am giving a talk called exploring memory in Ruby or building a compact and
00:00:41.170 garbage collector my name is Aaron Patterson my nickname on most websites
00:00:47.350 if you don't know already is tenderlove I'm sorry if you don't recognize me in
00:00:53.950 person you might recognize my icon which is this I used to have much longer hair
00:01:00.370 and it wasn't so gray just kidding into wig so I'm on the Ruby
00:01:06.700 core team and the rails core team so if you want to talk to me about any Ruby stuff or rail stuff I'll talk to you
00:01:12.610 about either one of those I love Ruby and I also love Rails so I will tell you anything you want to know about any of
00:01:18.700 those well if I know the thing if I don't I'll tell you what to Google for I
00:01:24.280 work for a small startup company called github we're an online code hosting
00:01:33.790 website haven't heard of us anyway I like to use I like to use git at work I
00:01:40.119 really enjoy using it but I will not force push it on you I have two cats and
00:01:52.420 they like to help me write code this is one of them this is Gorbachev puff-puff thunderhorse he's the one on the right
00:02:02.130 this is my other this is my other cat choo choo her full name is SeaTac Airport Facebook YouTube Instagram just
00:02:12.519 kidding so I noticed that a lot of people at the first on the first day
00:02:19.340 me a lot so I decided to quote myself which is that I love my cats I also like
00:02:26.360 selfies so please come take selfies with me and in fact I recently started doing
00:02:33.769 double selfies which if you I'm talking
00:02:40.940 about you I recently don't started doing double selfies and you might be wondering what
00:02:46.220 is that and that's well it's when two people with two selfie sticks take selfies of each other and this is
00:02:52.010 something that Jimmy and I have been doing this is us at red dye Ruby conference and this is us again here at
00:02:59.110 Ruby Kai yeah so I want to tell all of
00:03:04.129 you like please please come say hello to me after this talk at the after party
00:03:10.579 tonight any time please come talk to me I know like some people have told me Erin you're famous I'm afraid to talk to
00:03:16.760 you please but don't be I will not bite I like to talk about programming or anything if
00:03:22.700 you don't know what to say to me I do have stickers these are they are
00:03:29.660 stickers of my cat so I will give you a sticker of my cat or if you want to take
00:03:38.720 a selfie we can do that too so I just want to say thank you thank you to the organizers thank you so much
00:03:44.090 for having me here thank you to all of you for coming to this conference I'm really happy to be here it's so good to
00:03:50.569 see where the magic happens
00:03:58.840 I also enjoyed Tim pointed this out to me that apparently that people here are
00:04:06.340 really into security especially cybersecurity makes me laugh alright so
00:04:19.139 we're gonna talk about today we're gonna talk about a mark and compact garbage collector this is something that I've
00:04:25.120 been working on at work oh when I have time to do it this is interested me for a long time
00:04:30.190 because a compacting garbage collector can help us reduce fragmentation and
00:04:35.979 memory usage in our programs and one of the things that I want to focus on is talking about copy-on-write
00:04:42.210 optimizations that we can get from a compacting garbage collector so we're gonna talk a little bit about GC and
00:04:48.370 memory and Ruby so we're gonna have three different topics today we'll talk
00:04:54.039 about copy-on-write optimizations next we'll talk about building a compacting
00:05:00.639 GC and MRI and then we're gonna talk a little bit about memory inspection tools in Ruby as well this talk is a bit of a
00:05:09.100 low-level talk but I want everyone here to be able to get something out of the
00:05:15.370 presentation so for new people what I would like you to do if there are new new developers in the audience is just
00:05:21.880 think about like think about these questions as were as we're going through the presentation like what is what
00:05:27.580 exactly is copy-on-write what is compaction and the thing that I want you
00:05:33.070 to know is like maybe you don't understand all of the presentation but when you're sitting at work some day and
00:05:39.220 you're thinking oh I've heard about this problem or I know I know about this problem I saw it in a presentation I
00:05:46.690 know what I can search for online I want you to know that you you can solve the problem eventually I just want you to be
00:05:51.880 able to remember these things and say well other people have come across this so I can I can probably fix it - for
00:05:58.810 experienced people if you already know about these things from a high level perspective I want you to pay attention
00:06:05.169 to the algorithms and implementation details that I'll talk about so look at the actual algorithms and the
00:06:12.679 different implementation details and maybe tonight we can talk about the downsides of the work that I have done
00:06:20.019 so first let's talk about copy and write optimizations or as it's commonly known
00:06:25.699 as Co W or cow which I will refer to it as cow from now on so I enjoy that or I
00:06:33.199 may use copy and write to but if you see this term it's basically the same thing just a shortening so what is what is
00:06:40.279 this optimization and very simply what it is is it means that we don't copy
00:06:46.099 something until it's been written to so let's look at a concrete example of this
00:06:51.199 optimization we'll look at it in Ruby so here is an example of cow optimization
00:06:57.169 using Ruby you'll see here that first we create a very large string a very large
00:07:03.529 Ruby string and if we take a look at the size you can see that the initial size is over 9000 yes
00:07:22.429 all right I'm wired now why sound much
00:07:28.409 better so this one's over 9000 and if we
00:07:33.959 dupe the string if we call dupe on the string and then we check the size you can see that it's actually 40 the copy
00:07:39.749 is size 40 so we didn't actually copy the string we didn't copy the underlying string it's it's much smaller than the
00:07:46.529 original however if we write to the string you can see on the next slide we write to the string if we check the size
00:07:51.659 after we write this write to the string you can see that now it's of size the
00:07:56.819 same size as the previous string so we copied the string when it was written to in other words we had a copy on right so
00:08:03.899 this is a copy and right optimization and in in Ruby itself with just strings and you can observe the same behavior
00:08:11.489 with arrays so the array starts out large the dupe is small and then when we
00:08:17.099 write to the array it gets big again unfortunately we don't have the same behavior with hashes if you call dupe on
00:08:24.059 a hash it'll actually copy the entire hash so maybe we can fix this particular case but I'm not sure if you want to
00:08:32.729 contribute to Ruby maybe this is a good place to start looking but the interesting thing about this
00:08:38.579 optimization is that essentially there is no observable difference between a copy and the original right if you copy
00:08:46.709 something and you look at both of them they should be exactly the same there's no difference between the two of them so
00:08:52.230 the basis of this optimization is that we don't actually have to do that copy until one of them changes now this
00:08:59.579 optimization isn't limited to just Ruby in these strings it's you can use this anywhere you want to just think about
00:09:05.639 this optimization when you're doing any any type of stuff and one place that this optimization is used frequently is
00:09:12.059 within your operating system so we use this inside of operating systems and we
00:09:17.250 use this a lot when we use the command for we heard a little bit about this yesterday when you fork a process the
00:09:24.689 operating system makes a copy of that process and the operating system uses
00:09:29.850 the copy and write optimization so in this particular case we have a parent process where we create
00:09:35.430 string our initial string is again nine over 9,000 now when we fork the process
00:09:43.470 we make a copy of the parent process into the child process but it doesn't actually copy anything just creates a
00:09:49.649 new process now if we write to that string down here the operating system
00:09:55.140 will actually copy it it'll copy that string but the operating system is a bit smarter than Ruby is it doesn't copy the
00:10:02.370 entire string if we look at how the operating system actually works it splits that string up to into many pages
00:10:09.330 so we have in our parent process we we split up memory into multiple pages and
00:10:14.610 each of those pages is about four kilobytes now when the child process is created
00:10:19.800 all the child process does is point back up to the parent process so it's just
00:10:25.080 pointing at the parent processes memory now when we actually write into that string the child profit the operating
00:10:31.770 system copies the memory from that parent process into the child but it only copies the section that's been
00:10:37.830 written to so in this case we we wrote to the string and only maybe this page
00:10:43.020 gets copied so we wrote into that string there so we write into the middle of
00:10:49.470 string and one page gets copied and in this case when that page gets copied that's what's called a copy on right
00:10:55.980 page fault so if you want to observe where this happens in your programs you need to look for copy-on-write page
00:11:02.040 faults so in general if we want to do better with cow optimizations we want to
00:11:09.959 decrease the number of page faults that we have in our programs so you may be
00:11:15.000 asking why is this so important to me the reason this is important to me is because at work we use forking a lot in
00:11:22.200 fact our web server we use unicorn as our web server and unicorn is a forking
00:11:27.450 web server meaning that creates a parent process and then Forks off many child processes and those child processes
00:11:33.779 handle requests so we fork lots of processes at work so this this optimization is pretty important to me
00:11:40.170 now when unicorn Creek creates its children tiled processes does something
00:11:46.170 like this say we have a unicorn parent Forks off many children now let's imagine that the reason let's
00:11:55.100 imagine that each of these child processes loads rails in now what that
00:12:00.410 means is unfortunately each of these child processes would contain their own copy of rails and this is a bad thing so
00:12:06.350 we don't want to have this we don't want to have rails loaded three times into memory why should we waste all of that
00:12:11.870 space so in this particular case we can take advantage of copy and write
00:12:17.240 optimizations the way we do that is we have the parent process load rails then
00:12:22.610 we fork off a bunch of child processes then those child processes now point at
00:12:27.620 the parent processes copy of rails and we only have one copy in memory so the advantages of this are that we reduce
00:12:34.639 boot time because we only have to load rails once we decrease memory because we're only keeping one copy in memory at
00:12:41.300 a time now don't freak out this is how your application works today if you're
00:12:46.879 using unicorn so this is one way that we use copy and write optimizations with
00:12:53.089 our operating systems on a day to day basis now this particular optimization isn't the one that I'm interested in but
00:12:59.899 it's one that you can use at work today what I'm interested in doing is reducing
00:13:04.939 page faults and as I said earlier page faults are something that happens when you write you write to a bit of memory
00:13:12.379 and then that page needs to be copied into a different process now what causes page faults page faults are caused by
00:13:19.939 mutating shared memory any time we have a piece of shared memory and that gets mutated we create a page fault and the
00:13:27.170 biggest offender of this particular issue mutating shared pages is actually
00:13:32.269 the garbage collector so let's go over some different things that the garbage collector does to ruin our route our day
00:13:40.149 it used to be that unfortunately the mark phase used to cause us this issue
00:13:45.529 now when we went through and marked all the objects that mark bit was actually stored on each object what would happen
00:13:52.100 is in older versions of Ruby when we would mark a child when we would mark an object it would mark a bit and
00:13:58.389 unfortunately that meant that the object had to be to the child process so each one of
00:14:05.510 these as the GC went through it would mark a bit on the object and that object would have to get copied into the child
00:14:10.760 process and that meant after a few GCS the entire process was copied between
00:14:16.400 the two now unfortunately this is even worse than it looks because remember the
00:14:22.640 operating system copies one page at a time so it didn't copy one object at a time in fact what would happen is when we
00:14:29.300 marked an object maybe it would copy the entire page so the whole thing would be copied
00:14:34.520 not just the objects that got marked now the good news is that we introduced a
00:14:39.650 technique called bitmap marking and what this does essentially is instead of keeping that marked bid on the object
00:14:45.530 itself we keep a table so when the when the GC goes to mark an object all we do
00:14:50.930 is we set a bit in a bit table that's stored off to the side now that bit table does get copied between processes
00:14:57.230 but the bit table is much smaller than the rest of the memory that we have so we save overall so this is how this is
00:15:04.760 how the garbage collector works today so this is a good optimization for cow improvements now another problem is
00:15:12.980 object generation the actual generation of the object ruby has a generational
00:15:19.250 garbage collector now what happens is today the age of the object is stored on
00:15:24.890 the on the object itself so every time an object survives a garbage collection we increase the age of the object and
00:15:30.890 that age is stored on the object and that means that we end up in a similar situation as the marked bit mark bit
00:15:36.650 phase meaning that well when this object gets older we have to write to that
00:15:41.810 object and it's going to get copied to a child process so we actually have a workaround for that that's one that we
00:15:47.270 use at work and it was called Nakayoshi fork and essentially the this gem works
00:15:52.610 on the following principles the principles are that generations are bounded that we only have a certain
00:15:58.850 number of generations so an object can only get so old before it continues to age further once an object is considered
00:16:07.550 as old it won't get any older essentially so what we can do is we'll say well you know if an object is
00:16:14.930 created before we call fork we're pretty sure that it's going to get old eventually so we assumed that
00:16:21.800 well all the code that we loaded all the objects that we created if they survived
00:16:27.350 before we do a fork they're probably going to get old so what we can do is we'll say okay before you fork let's
00:16:33.530 just G see a bunch of times we'll just run the garbage collector and then make everything old before we fork so we just
00:16:39.290 G see a bunch and if you look at the source of the Nakayoshi Nakayoshi fork Jim it'll look a little bit like this
00:16:46.430 and you'll see here I've simplified the code a little bit but you can see that essentially all it's doing is its
00:16:51.650 monkey-patching fork and it says okay I'm just gonna GC a bunch of times four
00:16:57.200 times in fact and then just call the real fork that's essentially all it does so if you're running a forking web
00:17:03.500 server and production I recommend using this gem now the last thing is in fact
00:17:08.570 object allocation last problem with the GC would be object allocation and this
00:17:13.970 one is kind of surprising but just allocating an object can cause memory to
00:17:19.310 be copied from the parent process to the child process so for example we have in
00:17:24.980 this case two processes a parent process and a child process the child points at the parent but we have some free space
00:17:32.090 you'll see we have some free space in these pages and the child process may allocate an object so we'll allocate an
00:17:39.140 object there and that means that that object has to be copied or that section has to be copied and maybe unfortunately
00:17:47.090 remember the whole page gets copied so maybe the operating system copies the entire page so now we have this
00:17:53.150 unnecessary amount of memory copied from the parent process to the child so how can we reduce the space or how can we
00:17:59.840 how can we reduce the amount of data that gets copied and the way we do that
00:18:05.000 is with a technique called GC compaction oh oh what is happening to my slides GC
00:18:15.470 compaction all right so we're gonna discuss GC compaction so for example let's say before we fork we have two
00:18:23.780 pages that look like this and you can see we have some free space on those pages what if we were to
00:18:29.400 those objects and move them around such that all the free space was grouped together and all the non free spaces
00:18:36.420 grouped together now we fork a process or fork a child the child points at the parent process
00:18:41.940 and maybe we allocate some objects right you can see from this graph here that
00:18:47.550 maybe only one page gets copied so this only this one page with the free space
00:18:53.850 gets copied and the other one will not so before in the worst case scenario it could be that both pages get copied
00:19:00.180 because both pages had free space if we compact before we fork then we only copy
00:19:05.340 one page so half of the memory gets copied so let's talk about GC compaction
00:19:10.920 itself how it works and what it is and this is a project that I've been working on at github and I at work and I want to
00:19:17.910 talk about how it works so we have a fork of Ruby that has compaction built in so what is what is compaction it's
00:19:26.190 essentially just what I showed before compaction is taking the objects from moving the objects around such that all
00:19:32.760 the free spaces grouped together and all the non free spaces grouped together that's it so why would we compact
00:19:38.870 there's two main reasons for doing this reducing memory usage two main reasons
00:19:44.880 why I wanted to do this the one is to reduce memory usage and the other one is because people told me that it was
00:19:52.860 impossible to do I actually talked about this I talked about moving objects
00:19:59.130 around at one or a different GC technique at rubyconf your year or so
00:20:05.130 ago and I said oh the reason that we're gonna do this optimization is because GC
00:20:10.200 compaction is impossible and unfortunately I think I had been talking to J Ruby developers too long they told
00:20:18.300 me this was impossible to do and after my presentation koishi came up to me and
00:20:23.310 said why is it impossible and I said I don't know people told me it's
00:20:30.990 impossible and he said well maybe you should try it it's impossible so I said
00:20:36.720 okay well I'll try it then and it turns out it is not impossible it is just hard but we'll talk about how we're going to
00:20:43.020 talk about specifically how to do it with MRI on how you can do it too so since I didn't have a good answer to
00:20:49.980 why this was impossible I had to do it so there's there's many different compaction algorithms and we're going to
00:20:56.190 talk about the one that I chose which is called two finger compaction and you'll
00:21:03.210 see why it is called two finger compaction and these algorithms have different advantages and disadvantages
00:21:09.570 but first I want to talk about the disadvantages of two finger compaction unfortunately a two finger compaction it
00:21:16.200 is is slow this is not a good attribute the other the other attribute is that it
00:21:23.370 moves objects to random places in memory and we should talk about this
00:21:30.510 disadvantage tonight over beers or not beers if you don't drink just we should
00:21:38.100 talk about it this is bad compared to other bad compared to other object or other compaction algorithms because they
00:21:43.830 don't move them to random places now this algorithm does have one extremely
00:21:49.080 good advantage and that advantage is that it is easy this is the thing that
00:21:58.350 drew me to this algorithm I like easy things so that is why I chose this one
00:22:03.750 so the algorithm has essentially two main parts we can divide it into two parts with one is object movement and
00:22:10.890 the other is reference updating so object movement essentially what we do is we say okay we have a bunch of
00:22:18.270 objects here and we're going to move them around and I'm going to describe to you exactly how this algorithm works basically what we do is we take two
00:22:24.840 fingers and point them at either side of the heap so imagine that this is our this is our heap in memory we have some
00:22:31.080 free spaces and some objects that are that are in memory so we point at either side of the heap we have one pointer
00:22:37.650 that we call a free pointer and then we have another point where we call a scan pointer and this is why we call it two finger compaction because we're pointing
00:22:44.250 at either side now the free pointer moves to the right until it finds a free space and the scan pointer moves to the
00:22:51.360 left until it finds something that's filled in this case they don't have to move because where they started
00:22:56.700 the free pointer started on a free space the Scan pointer started on a filled space so they don't need to move now
00:23:03.150 what they do is when they when they stop they swap those two spaces like so so we
00:23:08.790 swapped the object over to the free space and the free space back over and then we leave a forwarding address in
00:23:15.030 the previous one so this object that was in B is now in slot 1 so we're gonna
00:23:20.670 leave a forwarding address of 1 there so we know that whatever was in B what ever
00:23:25.890 used to be in B is now in slot 1 then we repeat scan free pointer moving to the
00:23:31.860 left scan pointer moving to the right then they swap leave a forwarding address and
00:23:36.960 we repeat this process until the two fingers meet yes animations yes so they
00:23:47.100 met and they're done so we've compacted this all of the free spaces in one side and the objects from it on the other the
00:23:54.900 next step is reference updating so imagine that we have this bit of Ruby code here at the bottom some hash table
00:24:00.660 the hash table points at a symbol and a string and maybe it's laid out in memory
00:24:05.670 like this where the hash table is at 4 and the symbol and stringer at 6 and 7 now after we've compacted the e the heap
00:24:12.360 it may look something like this where the string is now at position 3 and the
00:24:18.840 symbol is now position 5 but the hash points at these old locations so we have
00:24:25.080 to update these arrows to point at the correct location and the way that we do that is fairly simple we just start out
00:24:30.840 on one side we look at the object we say hey object do you have any references if so we look at the references and see if
00:24:37.650 they point to a forwarding address if they do we update the address so and then we move to the next object so those
00:24:43.350 ones don't have pointers this one does so we update it to now point it change
00:24:48.930 from 6 and 7 to 5 and 3 like that and now points at the right locations again
00:24:54.870 and we continue moving along the heap until we've exhausted all the objects
00:24:59.970 and we're done so we've updated we've updated all the references then the very
00:25:05.880 last step is we change these forwarding addresses to free slots again like that and worked on we've compacted the heap
00:25:13.350 all of the objects are in one place all the free spaces in another and we've completed this completed this process so
00:25:20.610 the next thing I want to look at is actual implementation details so I want to walk through the actual code used to implement this and you can find the code
00:25:27.270 here up on github shameless plug at this at this address
00:25:35.100 if you go there you can check out the code what this code does is it introduces a new method called GC
00:25:40.260 compact so this compact this method allows you to manually compact the objects in your heap and the way that we
00:25:46.860 use it at work is we say okay well we have a bit of code like this we're gonna load this is just pseudocode we say okay
00:25:52.590 load all of rails and it's dependencies so we load all of rails and it's dependencies here we load all of our
00:25:58.890 application code then we compact the heap and then we GC or then we call a
00:26:05.640 fork for all of the mote go back then we call fork for all of the child processes
00:26:10.890 all the unicorn processes so this isn't what it actually looks like in our code base but we've hooked into unicorn to do
00:26:16.950 the GC steps the compaction at the right place so let's look at the actual
00:26:22.590 changes to GCSE all the since this is all to do with the garbage collector and
00:26:28.380 GC dot C contains all the GC code that's where most of my changes had to be so I
00:26:34.890 introduced a new function called GCE move and all this function does is swap
00:26:40.320 the two locations swap two locations and set a forwarding address so it takes to
00:26:45.840 address two objects and swaps the two the next thing that I had to do is
00:26:51.540 introduce a new type of object called T moved the team moved T moved object is
00:26:57.300 hidden to normal Ruby developers it's a internal internal only object and all
00:27:02.460 this t move does t moved is right here that is a T moved all it does is indicate that this is a forwarding
00:27:08.730 address it's a way for when you're scanning the scanning the Ruby memory to say hey oh this is a forwarding address
00:27:15.180 I know that I need to change and move it the other function that I introduced is
00:27:20.280 a new one called GC compact heap and all this function does move the pointers left and right so GC
00:27:27.820 compact heap moves the pointers and then calls GC move those are the two main
00:27:33.580 functions that actually do the compaction step the next step is a function I introduced called GC update
00:27:39.310 object references and it does exactly what it says it just says okay if we have these this objects sitting at one
00:27:46.090 and it pointed it at one we need to change we need to update the address that should say two that is a typo but
00:27:53.140 it should point it to now now this function actually has to call out to a bunch of different functions reference
00:28:00.010 update helpers because the way that we update references is different for arrays or for objects or hashes or any
00:28:06.340 type of object that you might create in Ruby we have to the way that we actually update those references is different the
00:28:12.130 way we find an update references is different now I had to introduce one other thing and this is a sad thing and
00:28:17.650 we're going to talk about why it's sad this is a thing called pinned bits this is a this is a table called pin bits and
00:28:24.490 what this does is it says okay this object cannot move in certain cases
00:28:29.830 there are objects that can't be moved and we need to be able to indicate that somehow and we I do that with a table called pin bits so the question is you
00:28:37.210 know this is a limitation of the compaction is that some objects can and can't move but okay let's up to now
00:28:44.410 let's talk about let's make a list I want to make a list of objects that can move so far this algorithm we've
00:28:50.680 discussed everything can move so we have our list here this is good it's great every object in our in our
00:28:57.520 system can move now we have a problem here there's a challenge with compaction is that we need to be able to find
00:29:03.040 references so I need to be able to look at an object and say okay what other
00:29:08.500 objects does this object reference because I need to update those references now this is really easy for a
00:29:14.440 pure Ruby pure Ruby references so here's an example of a ruby object that it's a pure Ruby object just regular Ruby
00:29:21.220 object and if we look at an object graph we can see that we have an instance of
00:29:26.320 foo and it points at an instance of bar and it does that via an instance variable called at bar and both of these
00:29:33.490 objects are implemented in Ruby Ruby itself implements these these objects knows the internals of these objects and
00:29:40.179 because it knows the internals of these objects it's easy for me as a GC developer to say okay I'm gonna find all
00:29:46.779 the references because I know how those internals are implemented and you should
00:29:52.239 too after the awesome talk today about
00:29:57.539 classes so you should know this too okay since we know since we know the
00:30:03.489 internals it's easy for us to update now this is this is good this is our best case scenario unfortunately we have
00:30:09.940 another issue which is see references now imagine that we have this class class bar class borrows our regular Ruby
00:30:16.720 object but we have this this other class called foo now let's say this foo class
00:30:22.090 is implemented in C okay and it points that it points at bar now unfortunately
00:30:29.649 we don't know how this arrow is made how does foo point at bar we don't know how
00:30:36.429 it actually does that well the GC doesn't I don't know how your C code is written
00:30:41.830 I can't inspect it because I can't inspect it I don't know how that reference is made now let's say that bar
00:30:49.749 gets garbage collected some day if bar is garbage collected goes away and it
00:30:56.529 means that that reference is now bad so if your foo object tries that tries to access bar it's going to crash because
00:31:04.450 the GC collected bar so how do we solve this problem today if you're writing a C
00:31:09.759 extension to Ruby you need to do something to prevent that object from going away and the thing that you have
00:31:14.859 to do is you have to call a function called RB GC mark so you have to manually mark this object and say hey
00:31:21.850 I'm holding a reference to this okay you say I hold a reference to this object
00:31:27.730 please do not garbage collect this object I need it it's important to me so
00:31:33.330 now unfortunately if I move that object off to the side and I place a team moved
00:31:40.480 inside of that your program will crash again it'll try to access that team moved it'll blow up
00:31:47.270 right so we can't move that I can't move that object I can't move it
00:31:53.520 so but I need a way to be able to detect that I cannot move that arrow I can't move that arrow so the way that I decide
00:31:59.460 to do this is to change our BGC marks such that pins your object it says okay
00:32:05.780 there is a CEO object that's pointing at this ruby object and I cannot move it so
00:32:11.340 I changed the our VGC mark function to put a mark four bar inside this pinned
00:32:17.550 bits table and say hey we can't move this sorry it's gonna stay right here so that the C the C object works okay so I
00:32:25.500 changed the implementation of our BGC mark to do two things one is to mark the object and the other is to put a pin bit
00:32:31.740 in the pin bits table so now our GC
00:32:36.990 compact function what it does is it says okay we're gonna do a full GC so all objects get pinned we compact all the
00:32:43.860 objects and then we update the references so those are the three steps of this compaction method okay so let's
00:32:49.980 look at our list again what can we move everything except objects marked with our VGC mark so let's look at other
00:32:56.670 let's look at other problems so far everything has been fine my journey to
00:33:02.400 build this compaction everything was easy until I had to implement it so I
00:33:07.860 want to talk about other objects that can't move these are program these are problems I encountered while developing this so
00:33:13.530 first let's look at hash tables oh now most of us know how hashing works we calculate a hash key for our object and
00:33:20.580 we stick that into the hash now unfortunately the default implementation of a hash key is the actual memory
00:33:26.640 address of the object so if I change the location of that object it means that its address changes and that means that
00:33:33.390 the key changes and that means that you won't be able to look up your object in the hash anymore and that would be bad
00:33:38.880 you probably want to look up your objects in hashes I think if you can't
00:33:44.220 find it you might have a bad day so I don't allow those to move those cannot
00:33:49.590 move because they're based off the memory address so that's another thing that goes in our pin bit table the way that we can fix this is we can actually
00:33:55.680 cache our hash key I just haven't done this yet we can say okay well
00:34:00.690 someone inserted this into the hash let's cache the hash key and never change it I just haven't implemented this yet so
00:34:06.660 let's go back to our list of what can move everything except objects marked with our BGC mark and hash keys okay
00:34:13.740 it's okay just two things now let's look at dual references there's another
00:34:19.140 problem I ran into let's say we have two objects foo which is written in C and Baz which was written in Ruby and they
00:34:25.500 both point at bar so one is a C object one is a ruby object however the author of the C object doesn't call our VGC
00:34:33.420 mark they decide not to call our VGC mark and the reason behind that is because bass points at bar and they know
00:34:40.590 that the object won't move it stays in the same location so they say well the other Ruby object is going to mark it so
00:34:46.800 I'm not going to bother marking it because the other Ruby object keeps it alive however with a compacting garbage
00:34:52.980 collector okay so it doesn't move however with a compacting garbage collector it may move I may move that
00:34:59.090 and we end up with a tea moved a team moved location in that same place and
00:35:04.500 what happens in this case is we say okay well the compactor says well okay Baz used to point it to moved so we're going
00:35:10.920 to update the address it gets updated to updated to the right location but since the GC didn't know about the C
00:35:17.250 implementation it won't get updated and the program will crash so the fix for
00:35:24.300 this particular case is to change the code to call our b GC mark or only use
00:35:29.460 Ruby objects don't use a c object anymore and i ran into this problem with the message fact gem if you go to that
00:35:35.610 pull request you can actually see you can see the example where I fixed this message pack gem and the example turned
00:35:41.640 out to be well you didn't actually need to use C in the first place we could use Ruby so change it to use
00:35:47.460 Ruby so what can move everything except objects marked with our VGC mark and
00:35:52.980 hash keys and dual reference objects okay three things let's look at let's
00:35:58.890 look at something else so global variables we'll look at global variables so this is a global variables and what I'm talking about global variables I
00:36:05.190 mean in C so for example here we have the C code and what it does it creates a new class called foo and it assigns that
00:36:12.150 class to a see global now unfortunately the garbage collector knows nothing about your C
00:36:19.140 Global's so it doesn't know that you have a seagull object pointing at this class so
00:36:24.870 the CGC can't update these Global's so in this case what I decided to do is use
00:36:31.440 heuristics to pin these objects I know that it's common for people to call RB define class and assign that to a global
00:36:37.800 so I use heuristics to pin those objects say all right if you create a class that can't move those can't move ok so what
00:36:45.510 can move everything except objects marked with our BGC mark and hash keys
00:36:51.150 and dual reference objects and objects created with our be defined class ok so
00:36:56.160 what's next string literal string literals so let's say we have a string
00:37:01.500 literal for example we have this foo and we just say puts hello world now when
00:37:07.440 you're our Ruby code gets compiled is turned into what's called an IEEE sequence object this I sequence object
00:37:14.280 it's a ruby object but it's written in C this I sequence object points at a list
00:37:19.620 of literals so it has a Ruby array of literals and that literal points at inner is that literal points at the
00:37:26.160 hello world the string hello world the actual byte code the instruction sequence also points at the byte code
00:37:32.220 the byte code is written in C and the byte code points at that string as well so if the literal moves that means that
00:37:41.100 the byte code needs to be updated as well so we need to know how to go through an update byte code so we cannot
00:37:47.940 move that because it is very difficult to change the byte code updating byte code is hard so I haven't done that yet
00:37:55.440 it means that literals cannot move it's possible but I haven't written it yet so what can move okay everything except
00:38:03.360 objects mark with our VGC mark and hash keys and dual reference objects and objects created with our be defined class and string literals so if you're
00:38:10.650 looking at this and you're saying wow jeez it seems like it seems like nothing
00:38:19.430 well the good news the good news for you and for me is that most of these things
00:38:24.450 can be fixed so I went a list of all these things that cannot move and when you think about all of
00:38:32.240 them and their problems they can all be overcome we can overcome all of them except for the RB GC mark case we cannot
00:38:39.080 fix that the rest of these have fixes so the real question is how long does it take to do it and I need to be able to
00:38:47.180 actually type out the code that fixes them so even with these restrictions that we have today with this initial
00:38:53.300 implementation if I run this against a test rails application in fact 46% of
00:38:58.880 the objects in the system can move so even with these massive restrictions that we've applied 46% of our heap can
00:39:05.750 still move and that's actually really great so if we take a look at a graph let's let's look at a graph before we do
00:39:11.450 compaction this is a graph of the heap the vertical lines are pages of the heap
00:39:16.730 the red lines are the number of objects in that page that cannot move and the
00:39:21.980 green lines are objects on that page that can move and the white in the lines is free space and these pages are sorted
00:39:28.850 by number of objects that cannot move this is before compaction if we compact it you can see that all of those objects
00:39:34.670 the green objects move together so they get grouped together we have those pages on the right now let's look at some
00:39:42.080 inspecting memory I want to talk a little bit about inspecting memory and the reason I want to talk about this is because if you're working on GC if
00:39:48.470 you're developing the garbage collector you need to be able to inspect memory and analyze it otherwise how do you know
00:39:54.380 how much improvements you've done now there's my favorite my favorite way to
00:39:59.869 inspect memory with Ruby is a method called object space dump ball and you can use this in your applications today
00:40:05.780 so I want you to take a look at this function it's the easiest and fastest way to understand your Ruby objects
00:40:11.180 memory the way that or your Ruby programs memory the way that you use it is you just call dump ball and it
00:40:16.640 outputs a JSON so I in this case I output I open a file out JSON and I
00:40:22.640 write all of it out to that JSON file and if you want to use this please do
00:40:29.150 what I am doing here this file dot open copy this code please the reason you
00:40:34.310 want to do it is because other forms of it may close the file or your program
00:40:40.590 I've run into problems where your program exits without closing the file late enough and you'll end up with half
00:40:45.720 written JSON so do this please if we take this we can actually measure the
00:40:53.130 size of our heap when we boot rails so here's an example where I'm booting rails in production
00:40:58.830 I do GC compact and then I output this JSON and I can see what the output looks
00:41:04.440 after it looks like or your heap looks like after we've compacted and booted the process so the output looks
00:41:10.380 something like this this is just the JSON each line in your JSON file is one object this is the same JSON I've just
00:41:17.870 formatted it so it looks nice but what's interesting about this is you'll see the address of the object that's the actual
00:41:24.480 address of the object in memory its location in your computer's memory and it lists all the objects that that
00:41:30.360 object references so if you look through the JSON you can build a tree of all the objects in your system and it also lists
00:41:36.720 the size of the objects how much memory that object used so you can figure out
00:41:42.350 what's consuming memory in your system so remember the address is the location
00:41:47.400 and if we know the location of the objects we can actually calculate a graph of the heap and this is what a
00:41:54.060 graph of the heap looks like based off of those addresses so if you look at this you can see heap fragmentation
00:42:00.870 before we didn't really see it so much because all the objects were stacked together this is a graph of the memory
00:42:06.690 and where objects are actually located so red dots represent objects and white
00:42:12.840 dots represent empty spaces and this is before before we do compaction now if we
00:42:18.030 graph it again after compaction you can see all of those get grouped together and we have one solid Red Square I've
00:42:23.910 made another graph where you can see here red dots represent hind' objects
00:42:29.730 and green dots represent ones that can actually move so you can get a sense of where all of the pinned objects are and
00:42:35.940 again if we move it like this you can see all the red objects can move together and if you want to do something
00:42:42.150 like this with your application today you don't need the patches that I have against Ruby you can use this with your ruby implementation today if you go to
00:42:49.200 this URL okay again hun github.com I feel like a
00:42:57.220 Salesman I'm sorry if you go to this URL let's just say this URL you can get
00:43:03.010 those you can get access to those tools so now I wanted I talked a lot about
00:43:08.109 copy-on-write optimizations and I want to talk about how to measure those in production so let's look at how to
00:43:14.470 actually measure copy-on-write optimizations with your process the way you do this on Linux I don't actually
00:43:20.500 know how to do this on OS 10 yet the way you measure this on Linux is you use the
00:43:25.990 proc file system and you look for a thing called this Maps file so you look at your pid' whatever kid you have for
00:43:33.190 unicorn or whatever process and look up this maps file if you look inside of it it gives you descriptions of the layouts
00:43:39.400 of the memory for that process and you'll see a bunch of entries in it that look like this and what this is is
00:43:45.400 you'll see at the top here is the address range of those the address range of those pages and it also displays the
00:43:52.119 RSS the PSS and the shared dirty and shared clean we'll talk about those in a
00:43:57.700 second so those address range is important because that address range matches up to those object addresses
00:44:03.730 that we saw in the heap dumps so if you look at your heap dumps versus the SMAP files you can figure out exactly where
00:44:10.030 those objects are mapped into your operating systems memory now the RSS and
00:44:18.010 the PSS are pretty interesting you've I'm sure you've seen RSS if you run top the RSS is essentially taking these
00:44:24.670 values the shared clean shared dirty private clean and private dirty and adding all those up together ok we'll
00:44:32.619 talk about why this is important in a second the PSS what that does is it takes the shared dirty the number
00:44:37.930 divides that by the number of processes and then adds up the shared clean private clean private dirty now these
00:44:45.220 are interesting numbers because if we look at the RSS versus the PSS of say
00:44:52.180 your unicorn application or any any process you'll see that if we compare
00:44:57.520 the two say we have a unicorn parent process that's three megabytes and a
00:45:02.950 unicorn child process that's three megabytes you might assume that the total amount
00:45:08.109 of memory that they're using is seven megabytes but it's not it's actually the sum of the PSS the total usage is
00:45:14.859 actually three megabytes this is just if we measure just after forking and the reason is because the child process is
00:45:21.160 sharing all of the memory of the parent so our total usage is three megabytes not seven whether if you looked at the
00:45:28.060 RSS so we can take an example of this let's look an exam at an example of how
00:45:33.310 these these numbers change over time so let's say we we created a program here
00:45:39.100 where we fork a process and we gradually write to that process and if we graph the RSS and the PSS it'll look something
00:45:46.420 like this over time you'll see the RSS stays constant where the PSS and the
00:45:51.940 shared dirty PSS increases in the shared dirty decrease so what we want to do is we want to optimize for having those the
00:46:00.190 most number of shared dirty pages possible so now let's now that we know how to
00:46:06.190 measure copy and write optimizations what we want to do is measure how this compaction impacts copy and write so
00:46:12.760 what we say is in this particular case we compact the heap fork a new process then fill the rest of the heap and
00:46:19.240 compare it see what the and see what the PSS looks like now unfortunately these
00:46:24.700 are the numbers if we don't compact the
00:46:30.280 amount of memory it takes is about 2.6 megabytes if we do compact it's 2.5 so our total savings is 154 kilobytes so
00:46:38.310 we're still testing this in production I need you to know that this is with a basic rails applet
00:46:45.150 we're still testing this in production with real loads so I'm not sure what a real application looks like but it
00:46:51.430 doesn't look so great after this development so let's finish this up the
00:46:57.990 compaction savings are unknown the reason is because we're testing against a fake load we don't know what this
00:47:04.210 looks like in a real-world situation currently also only 46 percent of objects can move obviously the higher we
00:47:11.140 get this the better the the better these numbers are going to look if you want to measure objects or measure memory usage
00:47:17.710 in your application today use the this use object space if you want to measure copy-on-write
00:47:23.010 optimizations use the smap's files again I want to end I want to end this
00:47:30.150 presentation by talking about why compact and I mean compaction it's
00:47:36.000 important for improving copy-on-write and saving memory but to me the most
00:47:42.599 important thing about this project was proving that it wasn't impossible
00:47:47.869 somebody asked me why you know why is it why is this impossible and I didn't have to answer or I didn't have an answer and
00:47:54.630 I feel bad that I wasted so much time thinking oh this problem is impossible I
00:48:01.040 thinking this is impossible so what I want to say to people to you today in
00:48:07.079 the audience's if you think something is impossible like question your assumptions think about it why is this
00:48:12.300 impossible have I tried it maybe I should try it maybe it's something I can do because in my experience especially
00:48:18.510 with computers we're only limited by our imaginations right we can accomplish
00:48:24.569 anything in with these little magic boxes that we have in front of ourselves so if you think something is impossible
00:48:30.869 just try it see maybe you can so that is all thank you so much thank you for
00:48:36.720 having me
00:48:42.490 I don't know if we have time for questions okay we have time for questions yeah we're gonna have some
00:48:48.819 questions please question yeah and my assumptions black please yes you can
00:48:54.670 have a cat sticker already it's coming sale oh I already have had stickers
00:49:01.230 yes well my question is what are one of
00:49:06.790 those problems is most restrictive so what if we eliminate for example hash
00:49:12.579 keys program could we have a 70 percent compaction oh the same question about
00:49:18.220 another question no problems yeah that's a good question I don't know
00:49:26.309 so I've got all the tools to map out like okay so one of the one of the open
00:49:33.130 questions that I didn't talk about in this presentation is what is actually preventing all those objects from moving
00:49:38.680 and I have all the tools to calculate that so I know I just don't remember the percentages off the top of my head but
00:49:45.130 the main targets are hash keys and literals in code okay thank you very
00:49:51.930 good question next one Nik
00:50:07.500 actually just stick with a different place of memory and then that curve that has read it yes that is an excellent
00:50:14.850 question so for anyone that didn't hear the question is are there heuristics about objects that we could say at the
00:50:21.510 time we allocate the object is this thing unmovable and then stick that in a
00:50:26.700 separate location and the answer is yes I've actually done some other work on
00:50:31.980 splitting heaps by type and we do know we can know heuristics about things so
00:50:38.310 for example the the example that I showed where we said okay classes classes allocated with rbl are be
00:50:45.570 defined class those can't move and we can know at allocation time that those can't move and we'll we can split it so
00:50:51.390 if we combine these two techniques of compaction plus keep splitting we can actually improve a lot so there are
00:50:59.190 cases where we can move and I just literally have not typed the code yet like for example hash keys and there are
00:51:05.880 cases where we cannot move but we could allocate it into a different location and improve this so there are different
00:51:11.790 techniques we can use to get to you know 90 99 % of objects yes all right one
00:51:21.990 more final question yep I'm not sure if it isn't a stupid
00:51:28.670 question but since I it seems that we are trading memory efficiency with the computation
00:51:36.530 efficiency because I mean in a world where if you have a lot of memory space
00:51:41.750 would compact can make any difference you know yeah so that's a good question
00:51:49.910 actually will will compact and make any difference even though we're allocating tons of objects all the time in this
00:51:56.210 case so in this case yes the idea is we take all the so let me back up a little
00:52:03.680 bit when you compile your Ruby code or when when your Ruby code gets compiled a
00:52:08.750 lot of that is internally represented as Ruby objects so classes those string
00:52:16.609 literals are actually represented as Ruby objects you don't you don't ever have access to those objects but they're
00:52:22.910 implemented internally as Ruby objects so what I can do is say okay after we've
00:52:28.790 loaded all the code since that code is gonna live in memory for the life of your process if we compact all of those
00:52:35.420 together then fork off processes later all of that stuff that was initially
00:52:41.690 allocated will stay stay shared between all the processes today the issue is if
00:52:48.109 we take a look at let me rewind the rewind this a little bit too
00:52:55.440 here no there so today this is what your
00:53:01.020 memory looks like and any time we allocated an object in one of the child processes and allocates into one of
00:53:08.190 those white holes maybe some of the red stuff gets copied along with it to the
00:53:14.400 child process right but if we compact it together like this there's no holes in
00:53:19.770 there for it to get copied because remember the operating system copies more than just one object it'll copy a
00:53:25.740 whole page so if we can make sure that there's no holes in there then we don't have those extra extra objects being
00:53:32.369 copied to the child process so as your code is running or as your rails app is
00:53:40.160 processing requests maybe compaction isn't so useful because those objects
00:53:46.170 are going to go away which is actually why we wanted to do it right before we forked all the child
00:53:51.869 processes we want this for long-lived objects does that make sense all right
00:53:58.770 thank you so much put your hands together one more time for Eric Patterson thank you and please please
00:54:06.869 come say hello to me this is my first time in Malaysia and I don't know when I'll get to come back so I would like to
00:54:13.440 talk to all of you we can say hello does a group home right now yeah one two or three
00:54:18.800 problem solved we compacted that hello
Explore all talks recorded at RubyConf MY 2017
+16