00:00:00.000
ready for takeoff
00:00:17.340
right thanks for coming everyone I'll
00:00:19.080
get cracking so I want to talk to you
00:00:20.520
today about an idea that's been around
00:00:23.100
for a while which we can call Ruby's
00:00:25.140
core gem
00:00:26.460
I'm a big fan of giving the big idea
00:00:28.980
up front to get straight into what we're
00:00:31.080
talking about achieving
00:00:32.700
so Ruby has a core library right it's
00:00:34.500
those those methods like those classes
00:00:36.300
like array hash string that sort of
00:00:38.219
thing and currently this is implemented
00:00:40.079
in C in the standard version of Ruby so
00:00:43.559
this is the code to implement uh loop do
00:00:45.899
for example which is a core Library
00:00:47.280
routine and you can see it's written in
00:00:49.140
C has to be broken apart into a couple
00:00:51.960
of methods because that's the way things
00:00:53.879
like rescue work in the C version of
00:00:55.860
Ruby and this isn't very readable right
00:00:57.899
this is hard for us to understand as
00:00:59.640
application programmers; turns out it's also
00:01:01.860
a bit hard for the Ruby VM to understand
00:01:03.899
it to do anything meaningful with
00:01:06.299
so the big idea is let's rewrite this
00:01:09.540
into Ruby using the language we use for
00:01:12.119
our applications let's use it for the
00:01:14.280
core Library as well and you can already
00:01:16.260
see some benefits here this is much more
00:01:18.240
understandable if you want to know what
00:01:19.979
Loop do does you can simply read this
00:01:22.140
code you can see it runs a while loop
00:01:24.119
and it yields the block each iteration
00:01:26.040
you can maybe even see some things you
00:01:27.840
didn't know before like did you know if
00:01:29.400
you call Loop without a block it gives
00:01:31.560
you an infinite enumerator? Well, you can
00:01:33.720
see that from the source code even if
00:01:35.820
you couldn't see that from the C code as
00:01:37.259
easily we can also see that can you
00:01:39.479
break out of a loop do yes you can you
00:01:41.280
can use the stop iteration exception and
00:01:43.860
that's clear again from the Ruby code
00:01:45.180
and it turns out this has many benefits
00:01:47.159
not only for understandability but also
00:01:49.320
for how the VM can optimize it how you
00:01:52.020
can use tooling on it all sorts of
00:01:54.000
things and it also turns out to be a
00:01:56.100
great way to talk about the potential
00:01:57.659
future for Ruby and how we can make Ruby
00:01:59.880
much better in the longer term
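The slide itself is not in the transcript, so here is a minimal sketch of what a pure-Ruby loop could look like, based only on what is described above: a while loop that yields, an enumerator when no block is given, and StopIteration as the way out. The name my_loop is hypothetical, chosen so the real Kernel#loop is left untouched; this is not the actual core source.

```ruby
# Hypothetical stand-in for Kernel#loop, written in plain Ruby.
def my_loop
  # With no block, return an enumerator over this same method
  # (its size is reported as infinite).
  return enum_for(:my_loop) { Float::INFINITY } unless block_given?

  begin
    while true
      yield
    end
  rescue StopIteration => e
    # Raising StopIteration ends the loop; its result becomes the return value.
    e.result
  end
end

# Counts how many iterations run before StopIteration is raised.
def count_until(limit)
  count = 0
  my_loop do
    count += 1
    raise StopIteration if count == limit
  end
  count
end
```

Calling count_until(3) runs the block three times and then stops, and calling my_loop with no block hands back an Enumerator.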
00:02:02.399
bit of context about me and my work I'm
00:02:04.320
from Cheshire in the UK that's Cheshire
00:02:06.600
as in the cat I've got a PhD in
00:02:08.399
compiling Ruby I founded TruffleRuby
00:02:10.920
which is an alternative implementation
00:02:12.300
of Ruby I'm using as an example for some
00:02:14.520
of the work I talk about today I was
00:02:16.440
formerly at Oracle Labs I'm now at
00:02:17.940
Shopify uh which is a really supportive
00:02:19.800
place with great people I'm interested
00:02:22.140
specifically in optimizing idiomatic
00:02:24.420
Ruby code so I like talking about
00:02:25.739
optimizing Ruby as it is rather than
00:02:28.140
transforming Ruby to be something else
00:02:29.760
in order to be optimized I lead a
00:02:32.459
British Cavalry Squadron in my spare
00:02:33.780
time I'm interested in meeting other
00:02:35.160
Ruby reservists and Veterans if you're
00:02:37.500
out there
00:02:39.239
so one of the Core Concepts I want to
00:02:40.980
talk about is Ruby's Tower of libraries
00:02:43.019
we can talk about Ruby's different sort
00:02:44.459
of libraries and where they sit in a
00:02:46.019
tower
00:02:47.220
so we have the language the core Ruby
00:02:50.280
language that we have in the Ruby
00:02:51.720
interpreter we can talk about the core
00:02:53.220
libraries being one level above that and
00:02:55.800
then above that we have the standard
00:02:57.420
Library which is things like JSON that
00:02:59.280
you can require without installing
00:03:00.599
anything on top of that we can talk
00:03:02.099
about gems and user code
00:03:04.200
so the bottom we can also talk about
00:03:05.940
this being Ruby code and C code and the
00:03:08.760
further down the stack you are more
00:03:10.680
stuff is written in low level C and the
00:03:12.900
higher you get it's more written in Ruby
00:03:15.120
and currently the core library is
00:03:16.440
written in C almost entirely and that's
00:03:18.900
what we're going to talk about today
00:03:20.819
so at the bottom there's the language
00:03:22.379
this is a very small number of things
00:03:24.720
provided by the language Ruby it's the
00:03:26.459
Ruby language itself so that's things
00:03:28.260
like classes modules methods uh method
00:03:31.980
calls and some control structures like
00:03:33.900
if while case and or things like that
00:03:36.959
but it's actually a really small subset
00:03:39.120
very little else is provided by the
00:03:41.220
language so in the code example here
00:03:43.080
we've got an if, an and, and a method call
00:03:45.299
and that's really all the language
00:03:46.440
provides
00:03:48.299
the next level up we have the core
00:03:49.920
Library this is things like array hash
00:03:52.560
but also lower level things like numbers
00:03:54.599
and strings numbers and strings are
00:03:56.519
actually part of the core Library
00:03:57.840
they're not really provided by the
00:03:59.459
language
00:04:00.840
great thing about uh and also some
00:04:02.760
control structures so things like Loop
00:04:04.560
and um array each and hash each these
00:04:07.799
are provided by the core Library again
00:04:09.060
even though they're like control flow
00:04:10.200
structures the core library is
00:04:12.180
automatically available you don't have
00:04:13.439
to require it it's just magically always
00:04:15.120
there it's implemented as a C extension
00:04:18.720
um it's a C extension that's just built
00:04:20.400
into Ruby but it's the same API it uses
00:04:22.199
and there's around
00:04:23.960
2250 methods so it's really large
00:04:26.940
there's a lot in Ruby's batteries
00:04:29.340
included core Library so if we have
00:04:31.320
something like a hash and we do dot
00:04:33.620
values.sort.first.add values sort first
00:04:36.540
and add are all provided by the core
00:04:38.160
Library
00:04:39.540
on top of that is the standard Library
00:04:41.280
this needs to be required with some
00:04:43.620
exceptions but it's available without
00:04:45.120
installing anything so it's just there
00:04:46.800
it's part of the Ruby distribution still
00:04:48.600
we won't worry too much about the
00:04:50.340
standard library in this talk it's not
00:04:51.600
really relevant
00:04:52.919
um the code example here shows JSON for
00:04:55.080
example so JSON.generate that's a
00:04:57.060
standard Library feature something
00:04:58.440
slightly interesting about it though is
00:04:59.699
it's being lifted in the tower so over
00:05:01.860
time the standard library is becoming a
00:05:04.259
gem it's being gemified and made
00:05:06.840
something you can install separately if
00:05:08.520
you wanted to
00:05:10.380
on top of this you have your gems and
00:05:12.660
your user code this is Ruby code that's
00:05:14.580
loaded at runtime from outside the
00:05:16.680
interpreter outside the Ruby
00:05:17.759
distribution it can be from a gem or it
00:05:20.040
can be from code in your repo; that makes
00:05:21.900
a big difference to us as programmers we
00:05:23.639
think about gems as being something
00:05:24.780
separate from user code but for the VM
00:05:27.720
it doesn't really make any difference
00:05:28.740
it's all code loaded from disk at
00:05:31.080
runtime
00:05:32.880
um so for example some Rails code and a
00:05:35.220
controller that's all user code and
00:05:37.380
sometimes gems and user code are written
00:05:39.000
in C as well right a gem like Nokogiri
00:05:41.699
or like openssl there's a lot of C code
00:05:44.280
there so it can still include C code but
00:05:46.199
it's required at runtime
00:05:49.259
there's many great things about core as
00:05:51.300
it is it's always available it can't go
00:05:53.520
wrong you can't end up with the wrong
00:05:55.620
version of core or find yourself without
00:05:57.960
core installed that's great and it can
00:06:00.419
be used to build bigger things because
00:06:01.740
it's always available so RubyGems for
00:06:03.539
example requiring gem stuff like that
00:06:05.520
that's built on top of core it's
00:06:07.680
available instantly as soon as the
00:06:09.479
interpreter starts it's just there ready
00:06:10.979
to go which is great for application
00:06:12.419
boot time it can use VM internals to do
00:06:15.479
things you can't do in Ruby so a
00:06:17.820
low-level thing like file I/O that can't
00:06:20.759
be done in pure Ruby but it can be
00:06:22.740
implemented as the core library with a
00:06:24.300
file object
00:06:26.160
Compilers can be taught about it and we'll
00:06:27.960
explain more about that later because
00:06:29.280
that's a key point
00:06:30.960
but there are bad things about core as it is:
00:06:32.819
it's far too big, 2,000-some methods
00:06:36.000
that's too many methods for us to
00:06:38.220
understand and to work with as VM
00:06:40.800
implementers
00:06:42.060
there's no Ruby code that you can read
00:06:43.620
so you can't go and see what a method
00:06:45.300
does with your knowledge of how to read
00:06:47.400
Ruby and understand what the core
00:06:48.960
Library does you're off into C land
00:06:51.000
and it's not always the most
00:06:52.380
understandable code even if you
00:06:53.759
understand C there's no Ruby code you
00:06:56.100
can debug you can't step into it and
00:06:57.780
understand what it's doing there's no
00:06:59.639
Ruby code to use profiling tools or
00:07:01.680
coverage tools it's all C extension code
00:07:05.759
and the bad thing about C extension code
00:07:07.860
is did you know C code can be worse for
00:07:09.960
performance than Ruby code we'll explain
00:07:12.120
why later and this problem gets worse as
00:07:14.819
Ruby gets more sophisticated over time
00:07:16.860
with things like YJIT
00:07:20.160
can we get the best of both worlds
00:07:22.080
so the best of the advantages without
00:07:23.520
some of the disadvantages
00:07:26.460
what we're talking about doing is taking
00:07:27.900
that core Library part of the Tower and
00:07:30.300
splitting it up into two parts one which
00:07:32.940
we'll call the new core Library
00:07:34.259
implemented in Ruby and that'll be
00:07:36.419
sitting on top of a smaller set of
00:07:38.160
Primitives which are implemented as they
00:07:40.259
currently are in C or in Java in
00:07:43.020
something like JRuby or TruffleRuby
00:07:44.580
split it up into core and Primitives
00:07:48.780
this should hopefully give us the best
00:07:50.280
of both worlds we'd have the bulk of our
00:07:52.380
code in Ruby; we're Ruby programmers, we
00:07:54.960
like seeing code in Ruby we can
00:07:56.699
understand it this means it can be read,
00:07:58.740
understood debugged by anyone it also
00:08:01.620
means it can be better optimized by the
00:08:03.060
VM; we're building things like
00:08:05.280
YJIT to optimize Ruby code so the more
00:08:07.979
Ruby code we have it can optimize the
00:08:09.960
better
00:08:11.340
we'd have a small set of underlying
00:08:13.560
Primitives implemented in the same way
00:08:15.479
as C extensions and then we can teach
00:08:17.639
the compiler specifically about them
00:08:19.680
because there's a smaller set so we can
00:08:21.960
make the VM completely aware of this
00:08:23.759
small set of Primitives so it can work
00:08:25.680
with them and understand exactly what
00:08:27.180
they do
00:08:29.960
Ruby implementations already do this to
00:08:33.060
some extent
00:08:34.680
MRI also known as CRuby does it just a
00:08:37.620
tiny bit at the moment JRuby does it a bit
00:08:40.320
more and TruffleRuby does it a bit more
00:08:42.479
still we'll talk about Rubinius later
00:08:44.520
because there's some history here with
00:08:46.080
Rubinius pioneering this technique
00:08:49.920
so let's talk about how MRI or CRuby
00:08:52.380
does it to a little extent today
00:08:55.500
so this is from the MRI source code
00:08:58.500
it's a very simple method you might have
00:09:00.360
heard of called tap tap allows you to
00:09:02.640
run a block with a value and then return
00:09:04.440
the value you can use it to inject into
00:09:06.779
like a pipeline of method calls to for
00:09:09.540
example print out an intermediate value
00:09:11.100
from a chain of method calls or something
00:09:12.660
like that it's very simple and all it
00:09:14.700
does is it yields its value and then it
00:09:17.160
returns its value so you can keep using
00:09:18.839
it so we can express this in pure Ruby
00:09:21.839
and this is what MRI already does
00:09:23.399
there's no need for this to be written
00:09:24.839
in C so we have the Kernel module
00:09:26.760
written as pure Ruby code and we have
00:09:28.440
tap and it just yields self and then it
00:09:30.779
returns self we can understand that
00:09:32.519
written in Ruby and this is actual MRI
00:09:34.560
code today so part of Ruby's core is
00:09:36.839
written in Ruby
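As a rough illustration of the tap method described above (yield the receiver, then return it), here is a sketch under a hypothetical name, my_tap, in a hypothetical module, so the built-in Kernel#tap is not redefined. It mirrors the description in the talk rather than quoting the MRI source.

```ruby
# Hypothetical module and method name; the built-in Kernel#tap is untouched.
module TapSketch
  def my_tap
    yield self  # let the caller look at the value...
    self        # ...then hand the same value back so the chain continues
  end
end

Object.include(TapSketch)

# Peek at an intermediate value in a chain of method calls.
def smallest_with_peek(values, log)
  values.sort.my_tap { |sorted| log << sorted.dup }.first
end
```

With an empty array passed as log, smallest_with_peek([3, 1, 2], log) records the sorted array in log and still returns the first element of the chain.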
00:09:39.000
a more complicated example is something
00:09:40.920
like frozen? so you can ask an object if
00:09:43.200
it's frozen that's actually a method on
00:09:44.760
Kernel which all objects include so how
00:09:47.580
can we implement that because how can
00:09:49.440
you read the Frozen status if you're
00:09:51.120
trying to implement the method call to
00:09:52.740
read the Frozen status
00:09:54.420
what MRI includes is something that lets
00:09:57.300
you include C code in your Ruby code and
00:10:00.540
you can use this to write the lower
00:10:01.620
level stuff you can't do in pure Ruby so
00:10:04.200
what we're saying here is I want to run
00:10:06.240
the C code to call the C extension
00:10:08.700
method rb_obj_frozen_p, p being like a
00:10:12.540
question mark in Ruby so this means that
00:10:14.519
we can Implement more stuff in Ruby
00:10:16.080
because we can call into the
00:10:18.180
C runtime code to do it
00:10:22.200
and then that lower level primitive is
00:10:24.420
then implemented in C itself by
00:10:26.580
directly accessing the object's internals
00:10:28.200
and the great thing is this method can
00:10:29.940
be very small; the C function is only
00:10:31.740
one line it just gives us that tiny
00:10:33.180
little bit of information this is much
00:10:35.100
better C code to have left in our Ruby
00:10:37.260
VM after we've moved everything else out
00:10:39.000
to Ruby
00:10:41.160
MRI has
00:10:42.800
2194 core methods implemented in C so
00:10:46.079
the vast, vast majority is still in C
00:10:48.300
it's got 64 core Primitives which is
00:10:51.120
like a special kind of method
00:10:52.380
implemented in C; the distinction doesn't
00:10:53.940
matter too much and it's got only 31
00:10:56.040
instances of that inline C so it's a
00:10:58.200
great idea it's not being used much at
00:11:00.420
the moment it's got seven special
00:11:03.240
optimized core methods again the
00:11:05.160
distinction doesn't matter too much and
00:11:06.779
it only has 101 core methods implemented
00:11:09.300
in Ruby so it's only taking baby steps
00:11:11.459
towards re-implementing stuff in Ruby
00:11:13.140
when we talk about some of the
00:11:14.399
advantages and disadvantages we'll see
00:11:16.079
why that probably is right for the
00:11:18.060
moment
00:11:19.800
TruffleRuby takes it quite a lot further
00:11:23.279
so TruffleRuby has the same methods that
00:11:25.260
can be implemented in pure Ruby right
00:11:27.420
now so it has exactly the same tap
00:11:29.279
method coming from MRI and we can start
00:11:32.279
to see one of the benefits here and that
00:11:33.600
this code is exactly the same as MRI so
00:11:35.640
perhaps MRI, JRuby and TruffleRuby could
00:11:39.000
all start to share this code a bit
00:11:42.180
but TruffleRuby takes it much further so the
00:11:44.519
Hash class for example has a couple of
00:11:46.440
routines you might know about called key
00:11:48.120
that gives you the key for a value it's
00:11:50.279
like the opposite of looking up in a
00:11:52.260
hash right it goes from the value back
00:11:54.660
to the key rather than the key to the
00:11:56.160
value, and to_a gives you an array of
00:11:59.160
tuples for each key value in the
00:12:01.980
hash
00:12:03.000
so these can both be implemented on top
00:12:04.860
of a primitive called each_pair so to
00:12:07.800
get the key uh to implement key we can
00:12:10.260
do each_pair and then simply say if the
00:12:11.940
value is what's expected return the key
00:12:13.500
otherwise return nil; to do to_a we can
00:12:16.320
create an array and we can simply push
00:12:17.700
each key value into it by using
00:12:19.560
each_pair so we implement two methods in Ruby
00:12:22.740
nice and understandably simple compact
00:12:25.800
code and then implement a single
00:12:27.480
primitive each_pair that does the heavy
00:12:29.519
lifting
00:12:30.540
so one primitive gives us two Ruby
00:12:33.120
methods
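The two methods described above can be sketched on top of each_pair roughly as follows. The module and the names sketch_key and sketch_to_a are hypothetical, picked so the built-in Hash#key and Hash#to_a stay as they are; the real TruffleRuby source may differ in detail.

```ruby
# Hypothetical module: two Ruby methods built on the one primitive, each_pair.
module HashOnEachPair
  # Reverse lookup: return the first key whose value matches, otherwise nil.
  def sketch_key(value)
    each_pair do |k, v|
      return k if v == value
    end
    nil
  end

  # Collect every key/value pair into an array of two-element arrays.
  def sketch_to_a
    pairs = []
    each_pair { |k, v| pairs << [k, v] }
    pairs
  end
end

Hash.include(HashOnEachPair)
```

For the hash { a: 14, b: 2 }, sketch_key(14) walks the pairs and returns :a, and sketch_to_a produces the same pairs as the built-in to_a.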
00:12:34.740
in TruffleRuby there's 611 core
00:12:37.380
methods implemented in Java so we're
00:12:39.300
using Java rather than C and 353 core
00:12:42.420
Primitives implemented in Java again
00:12:44.160
it's just a slightly different type of
00:12:45.420
core method but we have 2,386 core
00:12:49.260
methods implemented in Ruby you may be
00:12:51.480
wondering why this doesn't add up to
00:12:52.680
2250 it's because there's helper methods
00:12:55.320
and stuff like that makes it quite fuzzy
00:12:56.820
but the point is the the majority of
00:12:58.800
stuff is now implemented in Ruby instead
00:13:01.260
of in Java and we apply some some
00:13:03.839
techniques to think about do we want
00:13:05.100
something in Java or do we want it in
00:13:06.540
Ruby to divide them up
00:13:08.940
JRuby does this technique as well I'm not
00:13:11.100
going to talk too much about JRuby
00:13:12.180
because I don't want to speak on their
00:13:13.079
behalf too much basically a great
00:13:14.940
example from JRuby: Integer#times so this
00:13:18.360
is a routine you might know about and
00:13:19.740
again this is a great simple
00:13:20.700
implementation it uses a while loop and
00:13:23.459
yield to implement the the times method
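A while loop plus yield is enough to express the times method described above. The following is a sketch under a hypothetical name, sketch_times, in a hypothetical module; it is not copied from JRuby, and the enumerator branch is an assumption added to match how the built-in behaves without a block.

```ruby
# Hypothetical module; the built-in Integer#times is left untouched.
module TimesSketch
  def sketch_times
    # Assumption: mirror the built-in by returning an enumerator without a block.
    return enum_for(:sketch_times) unless block_given?

    i = 0
    while i < self
      yield i   # pass the current index to the block
      i += 1
    end
    self        # like the built-in, return the receiver
  end
end

Integer.include(TimesSketch)
```

Calling 3.sketch_times with a block yields 0, 1 and 2 in order and returns 3.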
00:13:27.720
what are the advantages of doing our core in
00:13:29.820
Ruby
00:13:31.079
one advantage is understandability you
00:13:33.959
can browse the Ruby code to understand
00:13:35.519
it if you know Ruby you can go and
00:13:37.680
see how the core Library routine works
00:13:39.420
you can answer your own questions about
00:13:42.180
what core methods really do does this
00:13:44.880
method do this does it do that you can
00:13:46.620
try and read the documentation
00:13:47.660
documentation isn't always great if we
00:13:50.579
have the core implemented in Ruby it's
00:13:52.740
a single source of truth written in the
00:13:54.839
one language that as Ruby programmers
00:13:56.700
we all share and all understand
00:13:59.639
you can use your normal debugger
00:14:01.380
coverage and profiler tools it's no
00:14:03.660
longer a black box that's impenetrable
00:14:05.399
written in C and a really esoteric
00:14:07.620
version of C as well
00:14:09.899
another advantage is we can share this
00:14:11.519
code; MRI, TruffleRuby, JRuby, Artichoke,
00:14:15.300
whatever comes next
00:14:17.279
can all share the same core Library each
00:14:20.339
would Implement just a smaller set of
00:14:22.620
Primitives their own way so they can
00:14:24.480
still do things differently to make it
00:14:26.339
suit what they want to achieve but they
00:14:28.980
can share the core library on top
00:14:30.540
unmodified; VM people can focus on making
00:14:33.180
their Primitives work well while the
00:14:35.339
rest of the community worries about the
00:14:36.899
core Library making that work well it
00:14:38.880
would mean that people were more free to
00:14:40.199
make contributions to the core Library
00:14:42.000
based on what they know about from their
00:14:43.920
application
00:14:46.079
development and let the VM people worry
00:14:48.000
about the primitives underlying it
00:14:51.019
a surprising benefit is going to be
00:14:54.060
optimization
00:14:55.500
so we said that C code can sometimes be
00:14:57.540
slower than Ruby code why is that this
00:15:00.360
is from the same key routine on hash
00:15:03.139
what we have here is that in the middle
00:15:05.399
it compares the value against the
00:15:07.139
expected value so we do rb_equal
00:15:09.899
now that rb_equal routine has
00:15:12.180
to be able to handle comparing anything
00:15:13.800
for equality because it's just static
00:15:16.019
compiled C code
00:15:18.000
if we write it in Ruby instead we can
00:15:19.980
use the same techniques we use for
00:15:21.300
optimizing Ruby such as specializing for
00:15:23.459
this case I'm comparing two strings and
00:15:25.740
that's what I'll expect and I'll
00:15:26.880
optimize for that and I'll generate
00:15:28.500
special machine code for it so it can
00:15:30.240
end up being faster doing this stuff in
00:15:31.860
Ruby than it would be to do it in C
00:15:35.880
there are some disadvantages of Ruby
00:15:37.800
core though; it's not an obvious choice
00:15:40.920
that's got no downsides it does have
00:15:42.839
some disadvantages the first one is parse
00:15:45.420
time we have to parse all this Ruby code
00:15:47.880
at startup so TruffleRuby has 2,000 methods
00:15:50.459
written in Ruby that means we have to
00:15:52.260
load those two thousand methods into the
00:15:54.600
interpreter adding on to the bulk of
00:15:56.880
your application code
00:15:58.620
we said it's better for optimization but
00:16:00.660
that's only when the optimizations have
00:16:02.339
had time to run
00:16:03.959
so these optimizations take time to
00:16:05.880
apply while your application is running
00:16:08.040
and that means that it's
00:16:09.660
slower to start with before it gets
00:16:11.760
faster over time
00:16:13.320
people have already had to do things
00:16:15.180
like --disable=gems that turns off
00:16:17.820
the support for RubyGems for command
00:16:20.100
line tools to reduce startup time and
00:16:22.440
this would make it much worse so you may
00:16:24.300
be familiar with a tool called rubyfmt
00:16:27.420
by Fable; they had to disable
00:16:30.899
RubyGems in order to get faster startup time
00:16:33.000
to make sure the command line
00:16:34.980
interface wasn't too slow to be useful
00:16:37.019
and this would make it much worse
00:16:40.199
however I think we can mitigate this
00:16:42.600
MRI embeds the YARV bytecode into the
00:16:46.440
executable as data in the executable; it
00:16:48.959
doesn't actually parse it; the parser
00:16:50.579
doesn't have to run it can simply load
00:16:52.380
that YARV bytecode
00:16:53.940
TruffleRuby goes even further it embeds the
00:16:56.940
objects generated by parsing the code
00:16:59.940
into the executable so again it's not
00:17:02.040
parsing it; this is really effective at
00:17:04.260
mitigating it; TruffleRuby can actually
00:17:06.360
start up more quickly than MRI in some
00:17:08.760
situations
00:17:10.319
um due to this and Benoit Daloze has a
00:17:12.059
blog post about how this is possible so
00:17:14.339
again this is actually
00:17:15.439
counterintuitively you think Ruby code
00:17:17.400
would be slower it's not you think it'll
00:17:18.839
be slower to start up actually it could
00:17:20.280
be faster
00:17:22.860
another disadvantage is memory Ruby code
00:17:26.160
although it's more compact on the screen
00:17:28.020
is bigger in memory than compiled C code
00:17:30.960
which is really compact
00:17:32.700
the profiles, inlining and splitting,
00:17:35.640
that's a technique I'll talk a little
00:17:36.960
bit about later and things that make
00:17:38.640
Ruby code faster also take up more
00:17:41.580
memory and ends up being potentially
00:17:43.740
quite a lot more memory
00:17:45.600
and the optimizations we say that that
00:17:47.700
we get out of this also take memory
00:17:50.400
to run so the JIT compiler, things
00:17:52.620
like that that all takes memory and all
00:17:54.360
adds up to being quite a lot of memory
00:17:56.940
can we mitigate this ah actually I don't
00:17:59.580
really have any great ideas about how to
00:18:01.260
mitigate that does anybody else I'm open
00:18:03.840
to ideas it's an unsolved problem at
00:18:06.299
least it's per process not per user
00:18:08.400
right if you had a big VM instance the
00:18:11.700
cost is paid once so if you can squeeze
00:18:13.679
more users onto it you can amortize that
00:18:16.500
cost of that memory
00:18:19.620
there's an alternative we can use
00:18:21.240
instead that TruffleRuby uses to run
00:18:23.160
legacy C code and that's Sulong; Sulong
00:18:26.280
is an interpreter for C code
00:18:28.799
now that sounds really counterintuitive
00:18:30.660
isn't C A compiled language there's no
00:18:33.360
reason to divide languages between
00:18:34.799
compiled and interpreted you can
00:18:36.840
interpret any language and you can compile
00:18:39.120
any language with varying degrees of
00:18:41.039
success so Sulong is a C interpreter and
00:18:43.860
it just in time compiles your C code
00:18:45.539
TruffleRuby uses it to run C extensions; it
00:18:49.260
requires some truly heroic work to
00:18:51.539
restore the performance of native C code
00:18:53.220
so it's very slow to start up but does
00:18:55.740
mean we can optimize C code like that
00:18:59.160
rb_equal call so there is one alternative
00:19:01.140
but it's pretty heroic to make it work
00:19:05.539
here's a practical demonstration of how
00:19:07.679
some of this stuff can work
00:19:10.020
so I've got a routine here called foo it
00:19:12.600
takes a hash and a value and it uses
00:19:14.820
that key routine we saw earlier so it
00:19:16.799
does hash.key passing the value
00:19:19.799
I've got a hash which contains a to 14
00:19:22.559
so I'm going to look up 14 I'm expecting
00:19:24.780
to get the symbol a back and then I have
00:19:26.820
a loop I've run a loop in order to
00:19:28.799
trigger just in time optimization and
00:19:30.720
compilation and I just call it with the
00:19:32.760
hash and with 14.
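The demo program described above can be reconstructed roughly like this. The method name foo, the hash and the value 14 come from the talk; the bounded iteration count and the warm_up helper are assumptions made so the sketch terminates, since the original loop is not shown in the transcript.

```ruby
# foo uses the Hash#key routine discussed earlier: a reverse lookup by value.
def foo(hash, value)
  hash.key(value)
end

# Assumption: a bounded loop stands in for whatever loop the talk used.
# Calling foo many times gives a JIT the chance to compile and inline it.
def warm_up(iterations)
  hash = { a: 14 }
  result = nil
  iterations.times { result = foo(hash, 14) }
  result
end
```

Every call looks up 14 and gets the symbol :a back, so warm_up returns :a however many iterations run.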
00:19:35.660
what I can do is I can ask TruffleRuby to
00:19:38.760
explain to me how it's optimizing this
00:19:41.280
and why and we can see the benefits here
00:19:43.559
so what I've asked it to do is explain
00:19:45.000
to me what is it inlining in this case
00:19:46.980
inlining is taking one method and
00:19:48.900
inserting it into another dynamically so
00:19:51.179
that you get a single method containing
00:19:53.039
all your code and it can be all
00:19:54.480
optimized together what this tells me is
00:19:56.760
it's starting to look at foo to inline
00:19:58.919
stuff and then inlines Hash#key; why can
00:20:02.580
it inline that method because it's just
00:20:04.320
more Ruby source code right the
00:20:06.539
optimizations we wrote to teach the
00:20:08.160
compiler about how to inline Ruby code
00:20:10.860
into other Ruby code just work now for
00:20:13.260
the core Library it's no longer a
00:20:14.880
barrier to optimization it's no longer a
00:20:17.039
black box but then it keeps inlining
00:20:19.320
we said we had that primitive each_pair
00:20:21.720
and it's inlined that as well how's it
00:20:24.000
able to do that that's a primitive not
00:20:25.740
Ruby code well because there's a smaller
00:20:28.320
number of Primitives now we can teach
00:20:30.360
the compiler individually about these
00:20:32.280
Primitives and about how to optimize
00:20:34.320
those how to inline those this could
00:20:36.780
work in YJIT and other systems as well
00:20:39.480
and then actually it goes even further
00:20:41.760
so each_pair remember that primitive it
00:20:43.980
takes a block a block of Ruby code and
00:20:46.200
it can inline from that back into the
00:20:47.640
Ruby code
00:20:48.660
so we get the benefits of being able to
00:20:50.340
optimize the Ruby code and because
00:20:52.080
there's smaller number of Primitives we
00:20:53.940
can optimize that as well
00:20:56.820
we can teach the compiler about it and we
00:20:58.679
get the whole thing optimized into one
00:21:00.299
and this is possible because we've
00:21:01.860
written the core library in Ruby
00:21:04.559
so the benefits aren't just
00:21:05.580
understandability we can actually show
00:21:07.440
we can get better performance out of
00:21:08.820
this for the longer term
00:21:12.120
I'm a big fan of explaining what
00:21:13.620
compilers are doing using a data
00:21:15.120
structure called a graph so I can tell
00:21:17.100
the compiler explain to me how you
00:21:19.740
understand this program at a low level
00:21:22.559
by telling me about your data structures
00:21:24.900
and this is a graph data structure it's
00:21:26.700
a a flow chart basically of your program
00:21:28.799
and all the operations I don't go into a
00:21:31.080
huge amount of detail on the specifics
00:21:32.880
because it's fairly complicated but I'll
00:21:34.440
just zoom in
00:21:36.780
what we have here is the so the Red
00:21:38.880
Arrows represent the control flow in the
00:21:40.860
program so from one operation to the
00:21:42.659
next like going from one statement to a
00:21:44.700
next and then the green arrows represent
00:21:47.280
the data flow so how the data flows
00:21:49.020
through the program and what we have
00:21:50.640
here is we're showing that the return
00:21:52.140
value flows from the load indexed at the
00:21:55.260
top; the load indexed is loading
00:21:57.179
the value from the hash so what this is
00:21:59.520
showing us here is we've achieved taking
00:22:01.500
all that code, the user code,
00:22:04.500
the um the core Library written in Ruby
00:22:07.200
and the Primitive implemented in a low
00:22:09.179
level we've taken that combined it all
00:22:11.580
into one single thing that Tower we had
00:22:14.760
of different types of Ruby code we've
00:22:16.679
collapsed it into one we've been able to
00:22:18.659
compile it all together optimize it
00:22:20.400
together into something really low level
00:22:21.659
just reading from the hash to get the
00:22:24.000
value out of it and that's a fantastic
00:22:25.559
achievement and it's possible because
00:22:27.539
the core library is written in Ruby it
00:22:29.460
wouldn't be possible if we had to teach
00:22:31.140
the compiler about every single
00:22:32.760
primitive like key because there's not
00:22:35.760
enough time in the world to teach the
00:22:37.500
compiler about all of those this makes it
00:22:39.299
manageable
00:22:41.580
this is a potential Way Forward I think
00:22:43.559
for Ruby we can move the majority of
00:22:46.020
core into Ruby we'd leave behind a
00:22:49.679
smaller better defined set of Primitives
00:22:52.200
we'd create a new version of Ruby that
00:22:54.720
is like core Ruby that we could
00:22:56.580
understand much better and other
00:22:58.260
programming languages do this at the
00:22:59.460
moment for example Haskell has something
00:23:00.960
called Haskell core which is much
00:23:03.419
smaller and simpler everything can be
00:23:05.340
expressed in it but it's small enough to
00:23:07.320
get inside your head and reason about
00:23:09.000
and to write tools to reason about as
00:23:10.919
well
00:23:11.880
we could use TruffleRuby's substantial
00:23:14.039
core as a starting point we can teach
00:23:16.860
our compilers and our static analysis
00:23:18.840
tools like our typing tools our RuboCop
00:23:21.360
and things like that more about these
00:23:23.520
primitives and then let them understand
00:23:25.919
the rest of the Ruby code as it would
00:23:27.720
your user code
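As a rough illustration of that split, here is a minimal sketch assuming a toy hash whose storage is only ever touched by one low-level routine. `ToyPrimitives`, `hash_lookup`, `MISSING`, and `ToyHash` are hypothetical names invented for this example; they are not TruffleRuby's actual primitives or core library source. The point is only that `fetch` and `key?` are ordinary Ruby that a compiler or analysis tool could read like user code, while the primitive is the single piece it has to be taught about.

```ruby
# Hypothetical primitive layer: the only code that knows how storage works.
# In a real implementation this would be the small part written at a low level.
module ToyPrimitives
  MISSING = Object.new.freeze # sentinel meaning "no entry for this key"

  # Look a key up in an array of [key, value] pairs.
  def self.hash_lookup(storage, key)
    storage.each { |k, v| return v if k.eql?(key) }
    MISSING
  end
end

# Hypothetical core class: everything below is plain Ruby built on the primitive.
class ToyHash
  def initialize(pairs = [])
    @storage = pairs
  end

  # Return the value, else the block's result, else the default, else raise.
  def fetch(key, default = ToyPrimitives::MISSING)
    value = ToyPrimitives.hash_lookup(@storage, key)
    return value unless value.equal?(ToyPrimitives::MISSING)
    return yield(key) if block_given?
    return default unless default.equal?(ToyPrimitives::MISSING)
    raise KeyError, "key not found: #{key.inspect}"
  end

  def key?(key)
    !ToyPrimitives.hash_lookup(@storage, key).equal?(ToyPrimitives::MISSING)
  end
end

toy = ToyHash.new([[:a, 1], [:b, 2]])
toy.fetch(:a)    # => 1
toy.fetch(:z, 0) # => 0
```

Swapping the array of pairs for a different storage strategy would only change `hash_lookup`; the Ruby methods on top would stay as they are.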
00:23:29.820
this would give us a smaller more
00:23:31.799
manageable more analyzable Ruby but it
00:23:34.500
works exactly the same now as it did
00:23:35.760
before application developers if you're
00:23:37.860
just worried about using Ruby you don't
00:23:39.360
need to worry about it it would work the
00:23:40.559
same as before
00:23:41.760
does it literally need to be a gem I've
00:23:43.679
pitched this as Ruby's core gem I don't
00:23:45.780
think it literally needs to be a gem it
00:23:47.159
could simply be bundled in the standard
00:23:48.539
version of Ruby but potentially we can
00:23:50.280
make it so you can install a newer
00:23:51.900
version if you'd like to
00:23:54.360
I want to give an attribution here to
00:23:55.799
Rubinius a lot of TruffleRuby's core
00:23:57.900
library originated from Rubinius that was
00:24:00.000
an earlier implementation of Ruby but it
00:24:03.059
has been maintained by us meaning the
00:24:04.919
TruffleRuby team for a few years now so
00:24:07.020
this is building on excellent work by
00:24:08.280
Evan Phoenix Brian Shirai and many
00:24:10.440
other people as well
00:24:13.080
here's an even more radical idea right
00:24:16.740
Ruby has this extension API that people
00:24:19.140
use to write C extensions and obviously
00:24:21.059
it is written in C
00:24:22.799
could we Implement those C extension
00:24:24.960
library routines in Ruby as well TruffleRuby
00:24:27.600
does this today so there's a core c
00:24:30.179
library routine called
00:24:32.820
rb_str_new_frozen and TruffleRuby
00:24:35.820
implements that in Ruby and what it does
00:24:38.039
is the Ruby stub on the bottom right
00:24:39.720
that simply calls back into Ruby to run
00:24:41.820
that routine we get the same benefits
00:24:43.799
here rb_str_new_frozen you can look at
00:24:46.500
the C code to understand what it does
00:24:48.000
that's quite complicated you could try
00:24:50.039
and read some of the documentation that
00:24:51.960
documentation isn't always great here
00:24:53.700
you can simply read it what does it do
00:24:55.200
if the value is already Frozen it
00:24:57.000
returns it if not it duplicates and
00:24:59.340
freezes that new copy of it that's great
00:25:01.559
we can understand it again and it
00:25:02.940
optimizes in the same way again so we
00:25:04.740
could go even further potentially
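The behaviour just described (return the value if it is already frozen, otherwise duplicate it and freeze the copy) can be sketched in a few lines of Ruby. This is a minimal illustration written from that description only; the method name `str_new_frozen_sketch` is hypothetical and this is not TruffleRuby's actual source for `rb_str_new_frozen`.

```ruby
# Hypothetical Ruby-level stand-in for the C API routine rb_str_new_frozen,
# written from the spoken description rather than from any real source file.
def str_new_frozen_sketch(value)
  return value if value.frozen? # already frozen: hand back the same object
  value.dup.freeze              # otherwise copy it, then freeze the copy
end

original = String.new("hello")                  # an unfrozen string
frozen_copy = str_new_frozen_sketch(original)   # a frozen duplicate
same_object = str_new_frozen_sketch(frozen_copy) # no new copy is made
```

The caller's original string stays mutable, which is the reason for duplicating before freezing.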
00:25:07.679
what conclusions can we draw from this
00:25:10.919
uh we've got this core of this Tower of
00:25:14.220
different parts of the libraries of Ruby
00:25:15.960
we can split the core Library into core
00:25:18.000
and we write it in Ruby and a smaller
00:25:20.039
set of Primitives on top
00:25:22.919
the part on the right hand side would become
00:25:24.659
the new shipped version of Ruby and
00:25:27.120
people who worry about Ruby
00:25:28.080
implementation would just worry about
00:25:29.220
those Primitives not the core on top of
00:25:31.260
it and someone else could worry about
00:25:32.700
the core and what we want to do to
00:25:34.080
expand that separately
00:25:37.440
is it a good idea yes there's tons of
00:25:39.779
benefits it's more understandable it's
00:25:42.240
more shareable less work for the
00:25:43.860
different Ruby implementations to do
00:25:45.179
it's more debuggable you can use your
00:25:46.919
standard tools it's more optimizable by
00:25:49.620
new tools like YJIT
00:25:52.260
um it's more analyzable so tools which
00:25:54.360
look at typing and look for bugs and
00:25:56.460
look for other problems can understand
00:25:57.779
it more because it's more compact there
00:26:00.120
are some downsides it might have an
00:26:01.799
impact on startup time we think we've
00:26:03.480
got a solution for that it may have an
00:26:05.220
impact on memory usage I'm not so sure
00:26:07.260
we've got a great solution for that so
00:26:09.120
there's still some open questions and
00:26:10.980
it's surely worth trying we've already
00:26:12.900
got a core in TruffleRuby we could start
00:26:14.820
trying this out with it's going to
00:26:16.740
become more relevant as MRI gets more
00:26:18.779
sophisticated with things like YJIT I
00:26:21.240
think this could be a future of Ruby to
00:26:22.799
build it into a direction that is higher
00:26:25.200
performance has better tooling is better
00:26:27.600
able to develop and adapt over time with
00:26:29.700
a better core library
00:26:32.460
I want to give you some other things to
00:26:33.779
check out if you're interested in this
00:26:34.860
kind of work
00:26:36.059
so TruffleRuby is where this core
00:26:37.559
library is implemented at the moment uh
00:26:39.179
sorry for the long URL but I
00:26:41.820
encourage you to go and look at the core
00:26:43.080
library in TruffleRuby it's just Ruby
00:26:45.120
code we all know Ruby here you can all
00:26:46.980
read it and understand it
00:26:48.679
graalvm.org/ruby is the official TruffleRuby
00:26:51.240
website and my work's all at
00:26:53.460
chrisseaton.com/truffleruby
00:26:57.299
a lot of this optimization I've said is
00:26:58.919
possible is down to a technique called
00:27:00.659
splitting this is a really powerful
00:27:02.940
sophisticated optimization that we're
00:27:04.799
bringing to Ruby in TruffleRuby and
00:27:06.840
Benoit Daloze has a talk today at 3 p.m.
00:27:09.299
in room A and I encourage you to go and
00:27:11.279
watch
00:27:12.900
people are doing academic research on
00:27:15.120
top of some of these ideas this is a
00:27:16.980
paper that Sophie Kaleba has just
00:27:18.360
published it's cool what she does is she
00:27:20.700
analyzes what call sites and calls look
00:27:23.640
like in Ruby and she shows how powerful
00:27:26.159
our optimizations in TruffleRuby are for
00:27:28.799
optimizing Ruby code and that would
00:27:30.539
apply to the core Library as well so I'd
00:27:32.580
encourage you to go and Google this
00:27:34.080
paper and have a read of it
00:27:36.360
I said something about Rubinius
00:27:38.400
Rubinius was Ruby implemented in Ruby
00:27:41.279
as well as the core library of Ruby at
00:27:43.620
one point they were writing much of the
00:27:44.940
VM in Ruby did you know that their
00:27:46.860
garbage collector was at one point
00:27:48.120
written in Ruby you may know the term
00:27:49.799
mark-sweep as a type of garbage
00:27:51.779
collector this is the sweep routine from
00:27:53.940
Rubinius implemented in Ruby what it
00:27:56.159
says is for each object if the object
00:27:58.679
has been marked by the mark phase then
00:28:01.140
leave it alone otherwise deallocate it and
00:28:03.720
again isn't that nice and understandable
00:28:05.100
if you go to rubycompilers.com/rubinius
00:28:07.980
you can read about the history of
00:28:09.299
Rubinius and how it did this and where
00:28:10.740
its core Library came from
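The sweep step described here (for each object, keep it if the mark phase marked it, otherwise deallocate it) can be sketched in plain Ruby. This is a toy model under stated assumptions, not the Rubinius source: `ToyObject`, `sweep`, and the array standing in for the heap are all hypothetical, and "deallocating" is modelled by moving the object to a `freed` list.

```ruby
# Hypothetical heap entry: a name plus the flag the mark phase would set.
ToyObject = Struct.new(:name, :marked)

# Toy sweep phase: walk every object, keep the marked ones, free the rest.
# Returns [survivors, freed] so the effect is easy to inspect.
def sweep(heap)
  survivors = []
  freed = []
  heap.each do |object|
    if object.marked
      survivors << object # reached by the mark phase: leave it alone
    else
      freed << object     # not reached: stands in for deallocation
    end
  end
  [survivors, freed]
end

heap = [
  ToyObject.new("reachable", true),
  ToyObject.new("garbage", false),
  ToyObject.new("also reachable", true)
]
survivors, freed = sweep(heap)
```

A real collector would also reset the mark flags before the next cycle; that step is left out to keep the sketch close to the description in the talk.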
00:28:12.779
and finally I want to publicize another
00:28:14.520
side project of mine it's called the Ruby
00:28:16.400
bibliography if you're a fan of reading
00:28:18.960
about Ruby research and stuff like that
00:28:20.520
this page lists all the Ruby research
00:28:22.799
that is out there
00:28:24.539
that's the end of my talk thank you very much
00:28:35.580
I think we've got two minutes for
00:28:36.960
questions if anyone would like to ask
00:28:38.279
one
00:28:39.960
so the question is how do we decide what
00:28:41.640
primitives do we have and there's
00:28:44.039
trade-offs either way and how would the
00:28:46.080
implementations agree on which primitives
00:28:47.760
that's a great question and that's
00:28:49.380
another open issue we'd have to work on
00:28:51.960
to resolve what we're doing in TruffleRuby
00:28:54.240
is we Implement stuff in Ruby by default
00:28:56.100
and then we create a primitive instead
00:28:58.200
if we've got some compelling reason to
00:29:00.600
but the compelling reasons are
00:29:02.940
different based on how you implement
00:29:04.380
Ruby so TruffleRuby for example
00:29:06.960
um has different ways of implementing
00:29:09.120
hashes based on how you're using them
00:29:10.919
and therefore we have a different set of
00:29:12.600
primitives than would make sense if you
00:29:14.220
had a more simple hash like MRI does so
00:29:17.460
I don't have a great solution to that
00:29:18.720
that is another open problem that we'd
00:29:20.820
have to work on and resolve but the
00:29:22.980
great thing is we've already got a
00:29:24.299
working core Library we could start from
00:29:26.159
that point of it working and then adjust
00:29:27.659
those sort of primitives over time
00:29:29.820
yeah all right I'll leave it there come and
00:29:31.679
find me if you've got any questions
00:29:32.580
afterwards thank you very much