RubyConf 2022

Ruby's Core Gem

Ruby has a core library that is part of the interpreter and always available. It’s classes like String and Time. But what would it be like if we re-implemented the core library, writing it in Ruby itself, and made it available as a gem? Would it be faster or slower? Would it be easier to understand and debug? What other benefits could there be? It was originally Rubinius that implemented Ruby’s core in Ruby, and it has been taken up and maintained by the TruffleRuby team.

00:00:00.000 ready for takeoff
00:00:17.340 right thanks for coming everyone I'll
00:00:19.080 get cracking so I want to talk to you
00:00:20.520 today about an idea that's been around
00:00:23.100 for a while which we can call Ruby's
00:00:25.140 core gem
00:00:26.460 I'm a big fan of giving the big idea
00:00:28.980 up front to get straight into what we're
00:00:31.080 talking about achieving
00:00:32.700 so Ruby has a core library right it's
00:00:34.500 those those methods like those classes
00:00:36.300 like array hash string that sort of
00:00:38.219 thing and currently this is implemented
00:00:40.079 in C in the standard version of Ruby so
00:00:43.559 this is the code to implement uh loop do
00:00:45.899 for example which is a core Library
00:00:47.280 routine and you can see it's written in
00:00:49.140 C has to be broken apart into a couple
00:00:51.960 of methods because that's the way things
00:00:53.879 like rescue work in the C version of
00:00:55.860 Ruby and this isn't very readable right
00:00:57.899 this is hard for us to understand as
00:00:59.640 application programmers turns out it's also
00:01:01.860 a bit hard for the Ruby VM to understand
00:01:03.899 it to do anything meaningful with
00:01:06.299 so the big idea is let's rewrite this
00:01:09.540 into Ruby using the language we use for
00:01:12.119 our applications let's use it for the
00:01:14.280 core Library as well and you can already
00:01:16.260 see some benefits here this is much more
00:01:18.240 understandable if you want to know what
00:01:19.979 Loop do does you can simply read this
00:01:22.140 code you can see it runs a while loop
00:01:24.119 and it yields the block each iteration
00:01:26.040 you can maybe even see some things you
00:01:27.840 didn't know before like did you know if
00:01:29.400 you call Loop without a block it gives
00:01:31.560 you an infinite enumerator well you can
00:01:33.720 see that from the source code even if
00:01:35.820 you couldn't see that from the C code as
00:01:37.259 easily we can also see that can you
00:01:39.479 break out of a loop do yes you can you
00:01:41.280 can use the stop iteration exception and
00:01:43.860 that's clear again from the Ruby code
00:01:45.180 and it turns out this has many benefits
00:01:47.159 not only for understandability but also
00:01:49.320 for how the VM can optimize it how you
00:01:52.020 can use tooling on it all sorts of
00:01:54.000 things and it also turns out to be a
00:01:56.100 great way to talk about the potential
00:01:57.659 future for Ruby and how we can make Ruby
00:01:59.880 much better in the longer term
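The slide itself is not reproduced in the transcript, so the following is only a minimal sketch of what a pure-Ruby loop along the lines described could look like. The module name `CoreSketch` and the method name `core_loop` are hypothetical, chosen so the real `Kernel#loop` is left untouched.

```ruby
module CoreSketch
  # Hypothetical stand-in for Kernel#loop, not the real implementation.
  def self.core_loop
    # Without a block, hand back an enumerator that reports an infinite size.
    return enum_for(:core_loop) { Float::INFINITY } unless block_given?

    begin
      # The heart of the routine: a plain while loop that yields each iteration.
      while true
        yield
      end
    rescue StopIteration => e
      # Raising StopIteration inside the block ends the loop cleanly.
      e.result
    end
  end
end

count = 0
CoreSketch.core_loop do
  count += 1
  raise StopIteration if count == 3
end
puts count
```

Reading the sketch answers the same questions the talk raises: calling it without a block gives an enumerator, and StopIteration is what breaks out of it.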
00:02:02.399 bit of context about me and my work I'm
00:02:04.320 from Cheshire in the UK that's Cheshire
00:02:06.600 as in the cat I've got a PhD in
00:02:08.399 compiling Ruby I founded TruffleRuby
00:02:10.920 which is an alternative implementation
00:02:12.300 of Ruby I'm using as an example for some
00:02:14.520 of the work I talk about today I was
00:02:16.440 formerly at Oracle Labs I'm now at
00:02:17.940 Shopify uh which is a really supportive
00:02:19.800 place with great people I'm interested
00:02:22.140 specifically in optimizing idiomatic
00:02:24.420 Ruby code so I like talking about
00:02:25.739 optimizing Ruby as it is rather than
00:02:28.140 transforming Ruby to be something else
00:02:29.760 in order to be optimized I lead a
00:02:32.459 British Cavalry Squadron in my spare
00:02:33.780 time I'm interested in meeting other
00:02:35.160 Ruby reservists and Veterans if you're
00:02:37.500 out there
00:02:39.239 so one of the Core Concepts I want to
00:02:40.980 talk about is Ruby's Tower of libraries
00:02:43.019 we can talk about Ruby's different sort
00:02:44.459 of libraries and where they sit in a
00:02:46.019 tower
00:02:47.220 so we have the language the core Ruby
00:02:50.280 language that we have in the Ruby
00:02:51.720 interpreter we can talk about the core
00:02:53.220 libraries being one level above that and
00:02:55.800 then above that we have the standard
00:02:57.420 library which is things like JSON that
00:02:59.280 you can require without installing
00:03:00.599 anything on top of that we can talk
00:03:02.099 about gems and user code
00:03:04.200 so the bottom we can also talk about
00:03:05.940 this being Ruby code and C code and the
00:03:08.760 further down the stack you are more
00:03:10.680 stuff is written in low level C and the
00:03:12.900 higher you get it's more written in Ruby
00:03:15.120 and currently the core library is
00:03:16.440 written in C almost entirely and that's
00:03:18.900 what we're going to talk about today
00:03:20.819 so at the bottom there's the language
00:03:22.379 this is a very small number of things
00:03:24.720 provided by the language Ruby it's the
00:03:26.459 Ruby language itself so that's things
00:03:28.260 like classes modules methods uh method
00:03:31.980 calls and some control structures like
00:03:33.900 if while case and or things like that
00:03:36.959 but it's actually a really small subset
00:03:39.120 very little else is provided by the
00:03:41.220 language so in the code example here
00:03:43.080 we've got if and an and a method call
00:03:45.299 and that's really all the language
00:03:46.440 provides
00:03:48.299 the next level up we have the core
00:03:49.920 Library this is things like array hash
00:03:52.560 but also lower level things like numbers
00:03:54.599 and strings numbers and strings are
00:03:56.519 actually part of the core Library
00:03:57.840 they're not really provided by the
00:03:59.459 language
00:04:00.840 and also some
00:04:02.760 control structures so things like Loop
00:04:04.560 and um array each and hash each these
00:04:07.799 are provided by the core Library again
00:04:09.060 even though they're like control flow
00:04:10.200 structures the core library is
00:04:12.180 automatically available you don't have
00:04:13.439 to require it it's just magically always
00:04:15.120 there it's implemented as a c extension
00:04:18.720 um it's a c extension that's just built
00:04:20.400 into Ruby but it's the same API it uses
00:04:22.199 and there's around
00:04:23.960 2250 methods so it's really large
00:04:26.940 there's a lot in Ruby's batteries
00:04:29.340 included core Library so if we have
00:04:31.320 something like a hash and we do dot
00:04:33.620 values.sort.first.add values sort first
00:04:36.540 and add are all provided by the core
00:04:38.160 Library
00:04:39.540 on top of that is the standard Library
00:04:41.280 this needs to be required with some
00:04:43.620 exceptions but it's available without
00:04:45.120 installing anything so it's just there
00:04:46.800 it's part of the Ruby distribution still
00:04:48.600 we won't worry too much about the
00:04:50.340 standard library in this talk it's not
00:04:51.600 really relevant
00:04:52.919 um the code example here shows JSON for
00:04:55.080 example so JSON.generate that's a
00:04:57.060 standard Library feature something
00:04:58.440 slightly interesting about it though is
00:04:59.699 it's being lifted in the tower so over
00:05:01.860 time the standard library is becoming a
00:05:04.259 gem it's being gemified and made
00:05:06.840 something you can install separately if
00:05:08.520 you wanted to
00:05:10.380 on top of this you have your gems and
00:05:12.660 your user code this is Ruby code that's
00:05:14.580 loaded at runtime from outside The
00:05:16.680 Interpreter outside the Ruby
00:05:17.759 distribution it can be from a gem or it
00:05:20.040 can be from code in your repo that makes
00:05:21.900 a big difference to us as programmers we
00:05:23.639 think about gems as being something
00:05:24.780 separate from user code but for the VM
00:05:27.720 it doesn't really make any difference
00:05:28.740 it's all code loaded from disk at
00:05:31.080 runtime
00:05:32.880 um so for example some rails code and a
00:05:35.220 controller that's all user code and
00:05:37.380 sometimes gems and user code are written
00:05:39.000 in C as well right a gem like Nokogiri
00:05:41.699 or like openssl there's a lot of C code
00:05:44.280 there so it can still include C code but
00:05:46.199 it's required at runtime
00:05:49.259 there's many great things about core as
00:05:51.300 it is it's always available it can't go
00:05:53.520 wrong you can't end up with the wrong
00:05:55.620 version of core or find yourself without
00:05:57.960 core installed that's great and it can
00:06:00.419 be used to build bigger things because
00:06:01.740 it's always available so RubyGems for
00:06:03.539 example requiring gem stuff like that
00:06:05.520 that's built on top of core it's
00:06:07.680 available instantly as soon as The
00:06:09.479 Interpreter starts it's just there ready
00:06:10.979 to go which is great for application
00:06:12.419 boot time it can use VM internals to do
00:06:15.479 things you can't do in Ruby so a
00:06:17.820 low-level thing like file I/O that can't
00:06:20.759 be done in pure Ruby but it can be
00:06:22.740 implemented as the core library with a
00:06:24.300 file object
00:06:26.160 compilers can be taught about it and we'll
00:06:27.960 explain more about that later because
00:06:29.280 that's a key point
00:06:30.960 but there's bad things about core as it is
00:06:32.819 it's far too big 2,000-some methods
00:06:36.000 that's too many methods for us to
00:06:38.220 understand and to work with as VM
00:06:40.800 implementers
00:06:42.060 there's no Ruby code that you can read
00:06:43.620 so you can't go and see what a method
00:06:45.300 does with your knowledge of how to read
00:06:47.400 Ruby and understand what the core
00:06:48.960 library does you're off into C code land
00:06:51.000 and it's not always the most
00:06:52.380 understandable code even if you
00:06:53.759 understand C there's no Ruby code you
00:06:56.100 can debug you can't step into it and
00:06:57.780 understand what it's doing there's no
00:06:59.639 Ruby code to use profiling tools or
00:07:01.680 coverage tools it's all C extension code
00:07:05.759 and the bad thing about C extension code
00:07:07.860 is did you know C code can be worse for
00:07:09.960 performance than Ruby code we'll explain
00:07:12.120 why later and this problem gets worse as
00:07:14.819 Ruby gets more sophisticated over time
00:07:16.860 with things like YJIT
00:07:20.160 can we get the best of both worlds
00:07:22.080 so the best of the advantages without
00:07:23.520 some of the disadvantages
00:07:26.460 what we're talking about doing is taking
00:07:27.900 that core Library part of the Tower and
00:07:30.300 splitting it up into two parts one which
00:07:32.940 we'll call the new core Library
00:07:34.259 implemented in Ruby and that'll be
00:07:36.419 sitting on top of a smaller set of
00:07:38.160 Primitives which are implemented as they
00:07:40.259 currently are in C or in Java in
00:07:43.020 something like JRuby or TruffleRuby
00:07:44.580 split it up into core and Primitives
00:07:48.780 this should hopefully give us the best
00:07:50.280 of both worlds we'd have the bulk of our
00:07:52.380 code in Ruby we're Ruby programmers we
00:07:54.960 like seeing code in Ruby we can
00:07:56.699 understand it this means it can be read
00:07:58.740 understood debugged by anyone it also
00:08:01.620 means it can be better optimized by the
00:08:03.060 VM we're building things like
00:08:05.280 YJIT to optimize Ruby code so the more
00:08:07.979 Ruby code we have it can optimize the
00:08:09.960 better
00:08:11.340 we'd have a small set of underlying
00:08:13.560 Primitives implemented in the same way
00:08:15.479 as C extensions and then we can teach
00:08:17.639 the compiler specifically about them
00:08:19.680 because there's a smaller set so we can
00:08:21.960 make the VM completely aware of this
00:08:23.759 small set of Primitives so it can work
00:08:25.680 with them and understand exactly what
00:08:27.180 they do
00:08:29.960 Ruby implementations already do this to
00:08:33.060 some extent
00:08:34.680 MRI also known as CRuby does it just a
00:08:37.620 tiny bit at the moment JRuby does it a bit
00:08:40.320 more and TruffleRuby does it a bit more
00:08:42.479 still we'll talk about Rubinius later
00:08:44.520 because there's some history here with
00:08:46.080 Rubinius pioneering this technique
00:08:49.920 so let's talk about how MRI or CRuby
00:08:52.380 does it to a little extent today
00:08:55.500 so this is from the MRI source code
00:08:58.500 it's a very simple method you might have
00:09:00.360 heard of called tap tap allows you to
00:09:02.640 run a block with a value and then return
00:09:04.440 the value you can use it to inject into
00:09:06.779 like a pipeline of method calls to for
00:09:09.540 example print out an intermediate value
00:09:11.100 from a chain of method calls or something
00:09:12.660 like that it's very simple and all it
00:09:14.700 does is it yields its value and then it
00:09:17.160 returns its value so you can keep using
00:09:18.839 it so we can express this in pure Ruby
00:09:21.839 and this is what MRI already does
00:09:23.399 there's no need for this to be written
00:09:24.839 in C so we have the kernel module
00:09:26.760 written as pure Ruby code and we have
00:09:28.440 tap and it just yields self and then it
00:09:30.779 returns self we can understand that
00:09:32.519 written in Ruby and this is actual MRI
00:09:34.560 code today so part of Ruby's core is
00:09:36.839 written in Ruby
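As a small illustration of the description above (yield the value, then return it), here is a sketch of a tap-like method. It is defined under the hypothetical name `tap_sketch` in a hypothetical module `TapSketch` so it does not replace the real `Kernel#tap`.

```ruby
module TapSketch
  # Hypothetical name; the body follows the talk's description of tap:
  # yield the receiver to the block, then return the receiver.
  def tap_sketch
    yield self
    self
  end
end

# Mixed into Object only so the sketch can be called on any value.
Object.include(TapSketch)

seen = []
# Peek at the intermediate value in a chain of method calls.
smallest = [3, 1, 2].sort.tap_sketch { |sorted| seen << sorted }.first
puts smallest
```

Because the method returns its receiver, the chain carries on as if the peek were not there.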
00:09:39.000 a more complicated example is something
00:09:40.920 like frozen so you can ask an object if
00:09:43.200 it's frozen that's actually a method on
00:09:44.760 kernel which all objects include so how
00:09:47.580 can we implement that because how can
00:09:49.440 you read the Frozen status if you're
00:09:51.120 trying to implement the method call to
00:09:52.740 read the Frozen status
00:09:54.420 what MRI includes is something that lets
00:09:57.300 you include C code in your Ruby code and
00:10:00.540 you can use this to write the lower
00:10:01.620 level stuff you can't do in pure Ruby so
00:10:04.200 what we're saying here is I want to run
00:10:06.240 the C code to call the C extension
00:10:08.700 method rb_obj_frozen_p p being like a
00:10:12.540 question mark in Ruby so this means that
00:10:14.519 we can Implement more stuff in Ruby
00:10:16.080 because we can call into the
00:10:18.180 C runtime code to do it
00:10:22.200 and then that lower level primitive is
00:10:24.420 then implemented in C itself by
00:10:26.580 directly accessing the object's internals
00:10:28.200 and the great thing is this method can
00:10:29.940 be very small with C function it's only
00:10:31.740 one line it just gives us that tiny
00:10:33.180 little bit information this is much
00:10:35.100 better C code to have left in our Ruby
00:10:37.260 VM after we've moved everything else out
00:10:39.000 to Ruby
00:10:41.160 MRI has
00:10:42.800 2194 core methods implemented in C so
00:10:46.079 the vast vast majority is still in C
00:10:48.300 it's got 64 core Primitives which is
00:10:51.120 like a special kind of method
00:10:52.380 implemented in C distinction doesn't
00:10:53.940 matter too much and it's got only 31
00:10:56.040 instances of that inline C so it's a
00:10:58.200 great idea it's not being used much at
00:11:00.420 the moment it's got seven special
00:11:03.240 optimized core methods again the
00:11:05.160 distinction doesn't matter too much and
00:11:06.779 it only has 101 core methods implemented
00:11:09.300 in Ruby so it's only taking baby steps
00:11:11.459 towards re-implementing stuff in Ruby
00:11:13.140 when we talk about some of the
00:11:14.399 advantages and disadvantages we'll see
00:11:16.079 why that probably is right for the
00:11:18.060 moment
00:11:19.800 TruffleRuby takes it quite a lot further
00:11:23.279 so TruffleRuby has the same methods that
00:11:25.260 can be implemented in pure Ruby right
00:11:27.420 now so it has exactly the same tap
00:11:29.279 method coming from MRI and we can start
00:11:32.279 to see one of the benefits here in that
00:11:33.600 this code is exactly the same as MRI so
00:11:35.640 perhaps MRI JRuby and TruffleRuby could
00:11:39.000 all start to share this code a bit
00:11:42.180 but TruffleRuby takes it much further so the
00:11:44.519 hash class for example has a couple of
00:11:46.440 routines you might know about called key
00:11:48.120 that gives you the key for a value it's
00:11:50.279 like the opposite of looking up in a
00:11:52.260 hash right it goes from the value back
00:11:54.660 to the key rather than the key to the
00:11:56.160 value and to_a gives you an array of
00:11:59.160 tuples for each key value in the
00:12:01.980 hash
00:12:03.000 so these can both be implemented on top
00:12:04.860 of a primitive called each_pair so to
00:12:07.800 get the key uh to implement key we can
00:12:10.260 do each_pair and then simply say if the
00:12:11.940 value is what's expected return the key
00:12:13.500 otherwise return nil to do to_a we can
00:12:16.320 create an array and we can simply push
00:12:17.700 each key value into it by using each_pair
00:12:19.560 so we implement two routines in Ruby
00:12:22.740 nice understandable simple compact
00:12:25.800 code and then implement a single
00:12:27.480 primitive each_pair that does the heavy
00:12:29.519 lifting
00:12:30.540 so one primitive gives us two Ruby
00:12:33.120 methods
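The two routines described above can be sketched on top of each_pair roughly as follows. This is an illustration and not TruffleRuby's actual source. The module name `HashOnEachPair` and the method names `key_sketch` and `to_a_sketch` are hypothetical, and the real `Hash#each_pair` stands in for the primitive.

```ruby
module HashOnEachPair
  # Hypothetical counterpart of Hash#key: walk the pairs and return the
  # first key whose value matches, otherwise nil.
  def key_sketch(expected)
    each_pair { |key, value| return key if value == expected }
    nil
  end

  # Hypothetical counterpart of Hash#to_a: push every [key, value] pair
  # into a fresh array.
  def to_a_sketch
    result = []
    each_pair { |key, value| result << [key, value] }
    result
  end
end

# Mixed into Hash only so the sketch can run against ordinary hashes.
Hash.include(HashOnEachPair)

prices = { a: 14, b: 2 }
puts prices.key_sketch(14).inspect
puts prices.to_a_sketch.inspect
```

Both methods rely on nothing but each_pair, which is the point of the split: one primitive, two ordinary Ruby methods.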
00:12:34.740 in TruffleRuby there's 611 core
00:12:37.380 methods implemented in Java so we're
00:12:39.300 using Java rather than C and 353 core
00:12:42.420 Primitives implemented in Java again
00:12:44.160 it's just a slightly different type of
00:12:45.420 core method but we have 2,386 core
00:12:49.260 methods implemented in Ruby you may be
00:12:51.480 wondering why this doesn't add up to
00:12:52.680 2250 it's because there's helper methods
00:12:55.320 and stuff like that makes it quite fuzzy
00:12:56.820 but the point is the the majority of
00:12:58.800 stuff is now implemented in Ruby instead
00:13:01.260 of in Java and we apply some some
00:13:03.839 techniques to think about do we want
00:13:05.100 something in Java or do we want it in
00:13:06.540 Ruby to divide them up
00:13:08.940 JRuby does this technique as well I'm not
00:13:11.100 going to talk too much about JRuby
00:13:12.180 because I don't want to speak on their
00:13:13.079 behalf too much basically a great
00:13:14.940 example from JRuby Integer times so this
00:13:18.360 is a routine you might know about and
00:13:19.740 again this is a great simple
00:13:20.700 implementation it uses a while loop and
00:13:23.459 yield to implement the times method
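JRuby's actual source is not shown in the transcript, so this is only a sketch of a times method built from a while loop and yield, as described. The module name `TimesSketch` and the method name `times_sketch` are hypothetical.

```ruby
module TimesSketch
  # Hypothetical counterpart of Integer#times.
  def times_sketch
    # Without a block, return an enumerator whose size is the receiver.
    return enum_for(:times_sketch) { self } unless block_given?

    i = 0
    while i < self
      yield i
      i += 1
    end
    self
  end
end

# Mixed into Integer only so the sketch can be called on integer literals.
Integer.include(TimesSketch)

collected = []
returned = 3.times_sketch { |i| collected << i }
puts collected.inspect
```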
00:13:27.720 what are the advantages of doing our core in
00:13:29.820 Ruby
00:13:31.079 one advantage is understandability you
00:13:33.959 can browse the Ruby code to understand
00:13:35.519 it if you know Ruby you can go and
00:13:37.680 see how the core Library routine works
00:13:39.420 you can answer your own questions about
00:13:42.180 what core methods really do does this
00:13:44.880 method do this does it do that you can
00:13:46.620 try and read the documentation
00:13:47.660 documentation isn't always great if we
00:13:50.579 have the core library written in Ruby it's
00:13:52.740 a single source of truth written in the
00:13:54.839 one language that as Ruby programmers
00:13:56.700 we all share and all understand
00:13:59.639 you can use your normal debugger
00:14:01.380 coverage and profiler tools it's no
00:14:03.660 longer a black box that's impenetrable
00:14:05.399 written in C and a really esoteric
00:14:07.620 version of C as well
00:14:09.899 another Advantage if we can share this
00:14:11.519 code MRI TruffleRuby JRuby Artichoke
00:14:15.300 whatever comes next
00:14:17.279 can all share the same core Library each
00:14:20.339 would Implement just a smaller set of
00:14:22.620 Primitives their own way so they can
00:14:24.480 still do things differently to make it
00:14:26.339 suit what they want to achieve but they
00:14:28.980 can share the core library on top
00:14:30.540 unmodified VM people can focus on making
00:14:33.180 their Primitives work well while the
00:14:35.339 rest of the community worries about the
00:14:36.899 core Library making that work well it
00:14:38.880 would mean that people were more free to
00:14:40.199 make contributions to the core Library
00:14:42.000 based on what they know about from their
00:14:43.920 application
00:14:46.079 development and let the VM people worry
00:14:48.000 about The Primitives underlying it
00:14:51.019 a surprising benefit is going to be
00:14:54.060 optimization
00:14:55.500 so we said that C code can sometimes be
00:14:57.540 slower than Ruby code why is that this
00:15:00.360 is from the same key routine on hash
00:15:03.139 what we have here is that in the middle
00:15:05.399 it compares the value against the
00:15:07.139 expected value so we do rb_equal
00:15:09.899 now that rb_equal routine has
00:15:12.180 to be able to handle comparing anything
00:15:13.800 for equality because it's just statically
00:15:16.019 compiled C code
00:15:18.000 if we write it in Ruby instead we can
00:15:19.980 use the same techniques we use for
00:15:21.300 optimizing Ruby such as specializing for
00:15:23.459 this case I'm comparing two strings and
00:15:25.740 that's what I'll expect and I'll
00:15:26.880 optimize for that and I'll generate
00:15:28.500 special machine code for it so it can
00:15:30.240 end up being faster doing this stuff in
00:15:31.860 Ruby than it would be to do it in C
00:15:35.880 there are some disadvantages of Ruby
00:15:37.800 core though it's not an obvious choice
00:15:40.920 that's got no downsides it does have
00:15:42.839 some disadvantages the first one is parse
00:15:45.420 time we have to parse all this Ruby code
00:15:47.880 at startup so TruffleRuby has 2,000 methods
00:15:50.459 written in Ruby that means we have to
00:15:52.260 load those two thousand methods into the
00:15:54.600 Interpreter adding on to the bulk of
00:15:56.880 your application code
00:15:58.620 we said it's better for optimization but
00:16:00.660 that's only when the optimizations have
00:16:02.339 had time to run
00:16:03.959 so these optimizations take time to
00:16:05.880 apply while your application is running
00:16:08.040 and that means that it's
00:16:09.660 slower to start with before it gets
00:16:11.760 faster over time
00:16:13.320 people have already had to do things
00:16:15.180 like --disable=gems that turns off
00:16:17.820 the support for RubyGems for command
00:16:20.100 line tools to reduce startup time and
00:16:22.440 this would make it much worse so you may
00:16:24.300 be familiar with a tool called rubyfmt
00:16:27.420 by Fable that they had to disable RubyGems
00:16:30.899 in order to get faster startup time
00:16:33.000 to make sure the command line
00:16:34.980 interface wasn't too slow to be useful
00:16:37.019 and this would make it much worse
00:16:40.199 however I think we can mitigate this
00:16:42.600 MRI embeds the YARV bytecode into the
00:16:46.440 executable as data in the executable it
00:16:48.959 doesn't actually parse it the parser
00:16:50.579 doesn't have to run it can simply load
00:16:52.380 that YARV bytecode
00:16:53.940 TruffleRuby goes even further it embeds the
00:16:56.940 objects generated by parsing the code
00:16:59.940 into the executable so again it's not
00:17:02.040 parsing it this is really effective at
00:17:04.260 mitigating it TruffleRuby can actually
00:17:06.360 start up more quickly than MRI in some
00:17:08.760 situations
00:17:10.319 um due to this and Benoit Daloze has a
00:17:12.059 blog post about how this is possible so
00:17:14.339 again this is actually
00:17:15.439 counterintuitively you think Ruby code
00:17:17.400 would be slower it's not you think it'll
00:17:18.839 be slower to start up actually it could
00:17:20.280 be faster
00:17:22.860 another disadvantage is memory Ruby code
00:17:26.160 although it's more compact on the screen
00:17:28.020 is bigger in memory than compiled C code
00:17:30.960 which is really compact
00:17:32.700 the profiles inlining and splitting
00:17:35.640 that's a technique I'll talk a little
00:17:36.960 bit about later and things that make
00:17:38.640 Ruby code faster also take up more
00:17:41.580 memory and ends up being potentially
00:17:43.740 quite a lot more memory
00:17:45.600 and the optimizations we say that
00:17:47.700 we get out of this also take
00:17:50.400 memory to run so the JIT compiler things
00:17:52.620 like that that all takes memory and all
00:17:54.360 adds up to being quite a lot of memory
00:17:56.940 can we mitigate this ah actually I don't
00:17:59.580 really have any great ideas about how to
00:18:01.260 mitigate that does anybody else I'm open
00:18:03.840 to ideas it's an unsolved problem at
00:18:06.299 least it's per process not per user
00:18:08.400 right if you had a big VM instance the
00:18:11.700 cost is paid once so if you can squeeze
00:18:13.679 more users onto it you can amortize that
00:18:16.500 cost of that memory
00:18:19.620 there's an alternative we can use
00:18:21.240 instead that TruffleRuby uses to run
00:18:23.160 legacy C code and that's Sulong Sulong
00:18:26.280 is an interpreter for C code
00:18:28.799 now that sounds really counterintuitive
00:18:30.660 isn't C a compiled language there's no
00:18:33.360 reason to divide languages between
00:18:34.799 compiled and interpreted you can
00:18:36.840 interpret any language and you can parl
00:18:39.120 any language with varying degrees of
00:18:41.039 success so sulong is a c interpreter and
00:18:43.860 it just in time compiles your C code
00:18:45.539 Chopper view uses to run C extensions it
00:18:49.260 requires some truly heroic work to
00:18:51.539 restore the performance of native C code
00:18:53.220 so it's very slow to start up but does
00:18:55.740 mean we can optimize C code like that rb_equal
00:18:59.160 equal call so there is one alternative
00:19:01.140 but it's pretty heroic to make it work
00:19:05.539 here's a practical demonstration of how
00:19:07.679 some of this stuff can work
00:19:10.020 so I've got a routine here called Foo it
00:19:12.600 takes a hash and a value and it uses
00:19:14.820 that key routine we saw earlier so it
00:19:16.799 does hash dot key passing the value
00:19:19.799 I've got a hash which contains a to 14
00:19:22.559 so I'm going to look up 14 I'm expecting
00:19:24.780 to get the symbol a back and then I have
00:19:26.820 a loop I've run a loop in order to
00:19:28.799 trigger just in time optimization and
00:19:30.720 compilation and I just call it with the
00:19:32.760 hash and with 14.
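The demonstration program described above can be reconstructed roughly as follows. The slide is not in the transcript, so the exact shape is an assumption. In particular the bounded iteration count is chosen here only so the sketch terminates, and the `result` variable is added to keep the last return value.

```ruby
# foo takes a hash and a value and looks the value up with the core
# routine Hash#key, as in the talk's description.
def foo(hash, value)
  hash.key(value)
end

hash = { a: 14 }

# Call foo repeatedly so a just-in-time compiler has a reason to compile it.
# The iteration count is an assumption made for this sketch.
result = nil
100_000.times { result = foo(hash, 14) }

puts result.inspect
```

Every call looks up 14 and is expected to give back the symbol `:a`, which is what makes the loop a stable target for the optimizer.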
00:19:35.660 what I can do is I can ask TruffleRuby to
00:19:38.760 explain to me how it's optimizing this
00:19:41.280 and why and we can see the benefits here
00:19:43.559 so what I've asked it to do is explain
00:19:45.000 to me what is it inlining in this case
00:19:46.980 inlining is taking one method and
00:19:48.900 inserting it into another dynamically so
00:19:51.179 that you get a single method containing
00:19:53.039 all your code and it can be all
00:19:54.480 optimized together what this tells me is
00:19:56.760 it's starting to look at foo to inline
00:19:58.919 stuff and then inlines hash key why can
00:20:02.580 it inline that method because it's just
00:20:04.320 more Ruby source code right the
00:20:06.539 optimizations we wrote to teach the
00:20:08.160 compiler about how to inline Ruby code
00:20:10.860 into other Ruby code just work now for
00:20:13.260 the core Library it's no longer a
00:20:14.880 barrier to optimization it's no longer a
00:20:17.039 black box but then it keeps inlining
00:20:19.320 we said we had that primitive each_pair
00:20:21.720 and it's inlined that as well how's it
00:20:24.000 able to do that that's a primitive not
00:20:25.740 Ruby code well because there's a smaller
00:20:28.320 number of Primitives now we can teach
00:20:30.360 the compiler individually about these
00:20:32.280 Primitives and about how to optimize
00:20:34.320 those how to inline those this could
00:20:36.780 work in YJIT and other systems as well
00:20:39.480 and then actually it goes even further
00:20:41.760 so each_pair remember that primitive it
00:20:43.980 takes a block a block of Ruby code and
00:20:46.200 it can inline from that back into the
00:20:47.640 Ruby code
00:20:48.660 so we get the benefits of being able to
00:20:50.340 optimize the Ruby code and because
00:20:52.080 there's smaller number of Primitives we
00:20:53.940 can optimize that as well
00:20:56.820 we can teach the compiler about it and we
00:20:58.679 get the whole thing optimized into one
00:21:00.299 and this is possible because we've
00:21:01.860 written the core library in Ruby
00:21:04.559 so the benefits aren't just
00:21:05.580 understandability we can actually show
00:21:07.440 we can get better performance out of
00:21:08.820 this for the longer term
00:21:12.120 I'm a big fan of explaining what
00:21:13.620 compilers are doing using a data
00:21:15.120 structure called a graph so I can tell
00:21:17.100 the compiler explain to me how you
00:21:19.740 understand this program at a low level
00:21:22.559 by telling me about your data structures
00:21:24.900 and this is a graph data structure it's
00:21:26.700 a a flow chart basically of your program
00:21:28.799 and all the operations I won't go into a
00:21:31.080 huge amount of detail on the specifics
00:21:32.880 because it's fairly complicated but I'll
00:21:34.440 just zoom in
00:21:36.780 what we have here is the so the Red
00:21:38.880 Arrows represent the control flow in the
00:21:40.860 program so from one operation to the
00:21:42.659 next like going from one statement to a
00:21:44.700 next and then the green arrows represent
00:21:47.280 the data flow so how the data flows
00:21:49.020 through the program and what we have
00:21:50.640 here is we're showing that the return
00:21:52.140 value flows from the LoadIndexed at the
00:21:55.260 top the LoadIndexed is loading
00:21:57.179 the value from the hash so what this is
00:21:59.520 showing us here is we've achieved taking
00:22:01.500 all that code the user code
00:22:04.500 the um the core Library written in Ruby
00:22:07.200 and the Primitive implemented in a low
00:22:09.179 level we've taken that combined it all
00:22:11.580 into one single thing that Tower we had
00:22:14.760 of different types of Ruby code we've
00:22:16.679 collapsed it into one we've been able to
00:22:18.659 compile it all together optimize it
00:22:20.400 together into something really low level
00:22:21.659 just reading from the hash to get the
00:22:24.000 value out of it and that's a fantastic
00:22:25.559 achievement and it's possible because of
00:22:27.539 the core library is written in Ruby it
00:22:29.460 wouldn't be possible if we had to teach
00:22:31.140 the compiler about every single
00:22:32.760 primitive like key because there's not
00:22:35.760 enough time in the world to teach the
00:22:37.500 compiler about all of those this makes it
00:22:39.299 manageable
00:22:41.580 this is a potential way forward I think
00:22:43.559 for Ruby we can move the majority of
00:22:46.020 core into Ruby we'd leave behind a
00:22:49.679 smaller better defined set of primitives
00:22:52.200 we'd create a new version of Ruby that
00:22:54.720 is like core Ruby that we could
00:22:56.580 understand much better and other
00:22:58.260 programming languages do this at the
00:22:59.460 moment for example Haskell has something
00:23:00.960 called Haskell core which is much
00:23:03.419 smaller and simpler everything can be
00:23:05.340 expressed in it but it's small enough to
00:23:07.320 get inside your head and reason about
00:23:09.000 and to write tools to reason about as
00:23:10.919 well
00:23:11.880 we could use TruffleRuby's substantial
00:23:14.039 core as a starting point we can teach
00:23:16.860 our compilers and our static analysis
00:23:18.840 tools like our typing tools our RuboCops
00:23:21.360 and things like that more about these
00:23:23.520 primitives and then let them understand
00:23:25.919 the rest of the Ruby code as it would
00:23:27.720 your user code
00:23:29.820 this would give us a smaller more
00:23:31.799 manageable more analyzable Ruby but it
00:23:34.500 works exactly the same as
00:23:35.760 before for application developers if you're
00:23:37.860 just worried about using Ruby you don't
00:23:39.360 need to worry about it it would work the
00:23:40.559 same as before
00:23:41.760 does it literally need to be a gem I've
00:23:43.679 pitched this as Ruby's core gem I don't
00:23:45.780 think it literally needs to be a gem it
00:23:47.159 could simply be bundled in the standard
00:23:48.539 version of Ruby but potentially we can
00:23:50.280 make it so you can install a newer
00:23:51.900 version if you'd like to
00:23:54.360 I want to give an attribution here to
00:23:55.799 Rubinius a lot of TruffleRuby's core
00:23:57.900 library originated from Rubinius that was
00:24:00.000 an earlier implementation of Ruby but it
00:24:03.059 has been maintained by us meaning the
00:24:04.919 TruffleRuby team for a few years now so
00:24:07.020 this is building on excellent work by
00:24:08.280 Evan Phoenix Brian Shirai and many
00:24:10.440 other people as well
00:24:13.080 here's an even more radical idea right
00:24:16.740 Ruby has this extension API that people
00:24:19.140 use to write C extensions and obviously
00:24:21.059 it is written in C
00:24:22.799 could we implement those C extension
00:24:24.960 library routines in Ruby as well TruffleRuby
00:24:27.600 does this today so there's a core C
00:24:30.179 library routine called
00:24:32.820 rb_str_new_frozen and TruffleRuby
00:24:35.820 implements that in Ruby and what it does
00:24:38.039 is the Ruby stub on the bottom right
00:24:39.720 that simply calls back into Ruby to run
00:24:41.820 that routine we get the same benefits
00:24:43.799 here rb_str_new_frozen you can look at
00:24:46.500 the C code to understand what it does
00:24:48.000 that's quite complicated you could try
00:24:50.039 and read some of the documentation that
00:24:51.960 documentation isn't always great here
00:24:53.700 you can simply read it what does it do
00:24:55.200 if the value is already Frozen it
00:24:57.000 returns it if not it duplicates and
00:24:59.340 freezes that new copy of it that's great
00:25:01.559 we can understand it again and it
00:25:02.940 optimizes in the same way again so we
00:25:04.740 could go even further potentially
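The behaviour described for rb_str_new_frozen (return the value if it is already frozen, otherwise duplicate it and freeze the copy) fits in a few lines of Ruby. The following is a minimal sketch of that description only; the method name `str_new_frozen` is made up for this example, and TruffleRuby's actual source may differ in detail.

```ruby
# Sketch of the logic described in the talk, not TruffleRuby's actual code.
# str_new_frozen is a hypothetical name chosen for this illustration.
def str_new_frozen(value)
  if value.frozen?
    value            # already frozen: hand back the same object
  else
    value.dup.freeze # otherwise freeze a copy and leave the original alone
  end
end
```

Calling it with an unfrozen string leaves the original mutable and returns a separate frozen copy, while a frozen argument comes back as the very same object.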
00:25:07.679 what conclusions can we draw from this
00:25:10.919 we've got this tower of
00:25:14.220 different parts of the libraries of Ruby
00:25:15.960 we can split the core Library into core
00:25:18.000 and we write it in Ruby and a smaller
00:25:20.039 set of Primitives on top
00:25:22.919 the part on the right hand side would become
00:25:24.659 the new shipped version of Ruby and
00:25:27.120 people who worry about Ruby
00:25:28.080 implementation would just worry about
00:25:29.220 those Primitives not the core on top of
00:25:31.260 it and someone else could worry about
00:25:32.700 the core and what we want to do to
00:25:34.080 expand that separately
00:25:37.440 is it a good idea yes there's tons of
00:25:39.779 benefits it's more understandable it's
00:25:42.240 more shareable less work for the
00:25:43.860 different Ruby implementations to do
00:25:45.179 it's more debuggable you can use your
00:25:46.919 standard tools it's more optimizable by
00:25:49.620 new tools like YJIT
00:25:52.260 it's more analyzable so tools which
00:25:54.360 look at typing and look for bugs and
00:25:56.460 look for other problems can understand
00:25:57.779 it more because it's more compact there
00:26:00.120 are some downsides it might have an
00:26:01.799 impact on startup time we think we've
00:26:03.480 got a solution for that it may have an
00:26:05.220 impact on memory usage I'm not so sure
00:26:07.260 we've got a great solution for that so
00:26:09.120 there's still some open questions and
00:26:10.980 it's surely worth trying we've already
00:26:12.900 got a core in TruffleRuby we could start
00:26:14.820 trying this out with it's going to
00:26:16.740 become more relevant as MRI gets more
00:26:18.779 sophisticated with things like YJIT I
00:26:21.240 think this could be a future of Ruby to
00:26:22.799 build it in a direction that is higher
00:26:25.200 performance has better tooling is better
00:26:27.600 able to develop and adapt over time with
00:26:29.700 a better core library
00:26:32.460 I want to give you some other things to
00:26:33.779 check out if you're interested in this
00:26:34.860 kind of work
00:26:36.059 so TruffleRuby is where this core
00:26:37.559 library is implemented at the moment
00:26:39.179 sorry for the long URL but I
00:26:41.820 encourage you to go and look at the core
00:26:43.080 library in TruffleRuby it's just Ruby
00:26:45.120 code we all know Ruby here you can all
00:26:46.980 read it and understand it
00:26:48.679 graalvm.org/ruby is the official TruffleRuby
00:26:51.240 website and my work's all at
00:26:53.460 chrisseaton.com/truffleruby
00:26:57.299 a lot of this optimization I've said is
00:26:58.919 possible is down to a technique called
00:27:00.659 splitting this is a really powerful
00:27:02.940 sophisticated optimization that we're
00:27:04.799 bringing to Ruby in TruffleRuby and
00:27:06.840 Benoit Daloze has a talk today at 3 PM
00:27:09.299 in room a and I encourage you to go and
00:27:11.279 watch
00:27:12.900 people are doing academic research on
00:27:15.120 top of some of these ideas this is a
00:27:16.980 paper that Sophie Kaleba has just
00:27:18.360 published it's cool what she does is she
00:27:20.700 analyzes what call sites and calls look
00:27:23.640 like in Ruby and she shows how powerful
00:27:26.159 our optimizations in TruffleRuby are for
00:27:28.799 optimizing Ruby code and that would
00:27:30.539 apply to the core Library as well so I'd
00:27:32.580 encourage you to go and Google this
00:27:34.080 paper and have a read of it
00:27:36.360 I said something about Rubinius
00:27:38.400 Rubinius was Ruby implemented in Ruby
00:27:41.279 as well as the core library of Ruby at
00:27:43.620 one point they were writing much of the
00:27:44.940 VM in Ruby did you know that their
00:27:46.860 garbage collector was at one point
00:27:48.120 written in Ruby you may know the term
00:27:49.799 mark sweep as a type of garbage
00:27:51.779 collector this is the sweep routine from
00:27:53.940 Rubinius implemented in Ruby what it
00:27:56.159 says is for each object if the object
00:27:58.679 has been marked by the mark phase then
00:28:01.140 leave it alone otherwise deallocate it and
00:28:03.720 again isn't that nice and understandable
00:28:05.100 if you go to rubycompilers.com/rubinius
00:28:07.980 you can read about the history of
00:28:09.299 Rubinius and how it did this and where
00:28:10.740 its core Library came from
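The sweep phase as described (for each object, leave it alone if it was marked, otherwise deallocate it) can be modelled with a toy heap. This is not the Rubinius routine shown on the slide; `ToyObject`, `ToyHeap`, and `deallocate` are hypothetical names for this sketch, "deallocation" is simulated by setting a flag, and clearing the mark bit for the next cycle is an addition of the sketch.

```ruby
# Toy model of a mark-sweep heap; names are invented for illustration.
ToyObject = Struct.new(:name, :marked, :freed)

class ToyHeap
  attr_reader :objects

  def initialize(objects)
    @objects = objects
  end

  # Stand-in for returning memory to the allocator.
  def deallocate(object)
    object.freed = true
  end

  # Sweep phase: keep marked objects, deallocate everything else.
  def sweep
    @objects.each do |object|
      if object.marked
        object.marked = false # sketch's addition: reset for the next cycle
      else
        deallocate(object)
      end
    end
    # Drop freed objects only after the loop, so the array is not
    # modified while it is being iterated.
    @objects = @objects.reject(&:freed)
    self
  end
end
```

A heap holding one marked and one unmarked object ends up with only the marked one after `sweep`, which is the behaviour the talk describes.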
00:28:12.779 and finally I want to publicize another
00:28:14.520 side project of mine it's called the Ruby
00:28:16.400 Bibliography if you're a fan of reading
00:28:18.960 about Ruby research and stuff like that
00:28:20.520 this page lists all the Ruby research
00:28:22.799 that is out there
00:28:24.539 that's the end of my talk thank you very much
00:28:35.580 I think we've got two minutes for
00:28:36.960 questions if anyone would like to ask
00:28:38.279 one
00:28:39.960 so the question is how do we decide what
00:28:41.640 primitives we have and there's
00:28:44.039 trade-offs either way and how would the
00:28:46.080 implementations agree on which primitives
00:28:47.760 that's a great question and that's
00:28:49.380 another open issue we'd have to work on
00:28:51.960 to resolve what we're doing in TruffleRuby
00:28:54.240 is we implement stuff in Ruby by default
00:28:56.100 and then we create a primitive instead
00:28:58.200 if we've got some compelling reason to
00:29:00.600 but the compelling reasons are
00:29:02.940 different based on how you implement
00:29:04.380 Ruby so TruffleRuby for example
00:29:06.960 has different ways of implementing
00:29:09.120 hashes based on how you're using them
00:29:10.919 and therefore we have a different set of
00:29:12.600 primitives than would make sense if you
00:29:14.220 had a simpler hash like MRI does so
00:29:17.460 I don't have a great solution to that
00:29:18.720 that is another open problem that we'd
00:29:20.820 have to work on and resolve but the
00:29:22.980 great thing is we've already got a
00:29:24.299 working core Library we could start from
00:29:26.159 that point of it working and then adjust
00:29:27.659 those sort of primitives over time
00:29:29.820 yeah all right I'll leave it at that come
00:29:31.679 find me if you've got any questions
00:29:32.580 afterwards thank you very much