00:00:00.000
ready for takeoff
00:00:16.920
hello everyone hope you had a nice lunch
00:00:19.140
and thank you for coming to my talk
00:00:20.939
today I'm excited I'm Varun I work as a
00:00:24.420
software engineer at sourcecraft and
00:00:26.220
today I'm excited to tell you all more
00:00:27.900
about one of the projects that I've been
00:00:29.640
working on recently which is scip-ruby
00:00:31.820
it's a precise indexer for Ruby source
00:00:34.920
code and I'm gonna slowly break that
00:00:36.420
down for you on like what exactly that
00:00:38.340
means and I've come all the way from
00:00:40.620
Taipei so I'm kind of glad to meet so
00:00:42.600
many wonderful people here yeah so let's
00:00:45.719
get started so just to provide a bit of
00:00:48.239
background here at sourcecraft our core
00:00:50.700
product is a SaaS developer tool which
00:00:53.100
helps you understand and change code at
00:00:55.379
scale so part of that includes searching
00:00:58.079
code navigating code making large scale
00:01:00.840
changes setting up dashboards for
00:01:03.059
migrations alerts for code changes all
00:01:05.880
the kind of fun fun things so today I'm
00:01:08.400
going to be focusing only on one sub
00:01:10.140
part of this which is how we get the
00:01:12.360
language specific information into
00:01:14.340
Sourcegraph which is increasingly
00:01:16.380
becoming a foundational thing for
00:01:18.299
everything else because when you want to
00:01:19.619
make code changes or you want to search
00:01:21.540
code you want it to be aware of the
00:01:23.460
language semantics so before I say a
00:01:25.979
thousand more words first let me just
00:01:27.780
show you a couple of pictures on like
00:01:29.460
what we're trying to achieve with
00:01:31.439
scip-ruby
00:01:33.180
so here's a Sourcegraph screenshot
00:01:35.400
involving one of Shopify's open source
00:01:37.680
repositories as an example this
00:01:40.020
repository has been indexed by scip-ruby
00:01:42.479
before I took the screenshot so I've
00:01:44.759
hovered the cursor over sub and
00:01:48.000
clicked on find references similar to an
00:01:50.159
editor right and Below you'll see that
00:01:52.380
there's a reference panel with different
00:01:54.420
results and there's two kinds of results
00:01:56.579
search based and precise so search based
00:01:59.759
can sometimes return false positives as
00:02:02.159
you're seeing the first search-based
00:02:03.540
result is for the string replacement
00:02:06.000
method sub right which is not exactly
00:02:08.399
what we were looking for we were looking
00:02:09.899
for the property sub and the precise
00:02:13.200
result does return the correct thing
00:02:15.120
right so we need to do something more
00:02:17.340
clever than string searching right we
00:02:19.080
need to be aware of the language
00:02:20.040
semantics classes methods properties
00:02:22.319
things like that right so we want
00:02:24.780
something like this the precise thing to
00:02:26.280
work and that's powered by scip-ruby
00:02:28.040
just one more example so in this case
00:02:31.260
I'm looking at the code for a gem called
00:02:34.200
Zeitwerk and I'm trying to find
00:02:36.680
references to the module defined in that
00:02:39.660
gem right and if you look at the search
00:02:42.239
results on the left side of the
00:02:43.980
reference panel there are many different
00:02:45.720
repositories which return these results
00:02:47.459
and this includes GitHub repositories so
00:02:50.040
which means like you can search across
00:02:51.840
gems across GitHub repositories gitlab
00:02:54.000
perforce different sources right and
00:02:57.060
again like here the name is unique
00:02:58.500
enough that maybe a string search would
00:03:00.000
be fine but normally like what you want
00:03:02.519
is to capture the semantics of
00:03:04.379
dependencies between gems so that you
00:03:07.500
can understand like okay what's
00:03:08.700
connected to what right or if there's a
00:03:10.560
vulnerability then how is it affecting
00:03:12.300
your code so basically it's like we're
00:03:15.239
kind of building IDE level navigation
00:03:18.060
right in Sourcegraph
00:03:20.040
um
00:03:20.879
so this brings us back to the title of
00:03:23.159
the talk which I'm going to try to break
00:03:24.360
down bit by bit so you can think of the
00:03:26.700
indexer part as the bit which is like
00:03:28.500
the spinner in your IDE right where it's
00:03:31.319
aggregating all this information across
00:03:33.300
source files and presenting it to you in
00:03:35.519
a way where you can query it in many
00:03:36.900
different ways uh references definitions
00:03:39.420
Etc right and the thing is that
00:03:41.940
navigation needs to be fast right like
00:03:43.680
if it's slow enough then you're not
00:03:45.299
going to use it
00:03:47.640
so at a very high level right like this
00:03:49.860
is kind of where indexing fits into the
00:03:52.140
pipeline right an indexer will take in
00:03:54.360
all these uh all the source files
00:03:56.159
configuration and emit like a single
00:03:58.500
file which is in the SCIP data format
00:04:00.540
that's an open source protobuf schema
00:04:04.200
um and the index gets uploaded into a
00:04:06.360
database usually in a CI pipeline where
00:04:08.760
it's like running on each and every
00:04:10.260
commit or it can also run as a repeating
00:04:13.260
job inside Sourcegraph itself so you
00:04:15.000
don't need to set up your CI
00:04:17.400
um after that like when a client like
00:04:19.199
for example you're trying to navigate in
00:04:20.940
the web browser right and you're trying
00:04:22.919
to perform actions like find references
00:04:24.720
go to definition they will directly
00:04:26.520
query the database right they won't uh
00:04:28.800
talk to the indexer there's no running
00:04:31.080
language server process like Solargraph
00:04:33.660
for example uh there's nothing running
00:04:35.639
in the background like that because what
00:04:37.620
we've done here is we've separated out
00:04:39.300
the analysis phase from the query phase
00:04:41.460
right and we're able to do this because
00:04:43.380
the code in Sourcegraph is read only
00:04:45.300
you're not editing as you go right so
00:04:48.300
because the code is read only we just do
00:04:49.860
the analysis once right and now the
00:04:51.900
query can be optimized like a typical
00:04:53.820
database query right the other benefit
00:04:56.580
of this is that the database the backend
00:04:58.919
the client all those things they're
00:05:00.840
they're kind of mostly language agnostic
00:05:02.880
whereas uh if you were to use a language
00:05:05.460
server then the query part also needs to
00:05:08.220
be language aware and so the language
00:05:11.040
aware part is actually restricted only
00:05:13.320
to the analysis which is happening
00:05:15.060
during indexing
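
The split between an analysis phase and a query phase can be sketched roughly as follows. This is a minimal illustration, not the real SCIP schema or the Sourcegraph backend: the `Occurrence` struct, the symbol names, and the helper methods are all hypothetical.

```ruby
# Hypothetical sketch: analysis runs once at index time, queries are plain lookups.
Occurrence = Struct.new(:symbol, :file, :line, :role)

# "Indexing": stand-in for the language-aware analysis. The occurrences are
# hard-coded here; a real indexer would derive them from the source code.
def build_index(occurrences)
  index = Hash.new { |hash, key| hash[key] = [] }
  occurrences.each { |occ| index[occ.symbol] << occ }
  index
end

# "Querying": language-agnostic lookups against the stored index.
def find_references(index, symbol)
  index.fetch(symbol, []).select { |occ| occ.role == :reference }
end

def go_to_definition(index, symbol)
  index.fetch(symbol, []).find { |occ| occ.role == :definition }
end

index = build_index([
  Occurrence.new("Token#sub", "token.rb", 3, :definition),
  Occurrence.new("Token#sub", "auth.rb", 10, :reference),
  Occurrence.new("String#sub", "auth.rb", 12, :reference),
])

refs = find_references(index, "Token#sub")
defn = go_to_definition(index, "Token#sub")
```

Because the two `sub` symbols are stored under different keys, the lookup for the property does not return the string replacement call, which is the kind of false positive a text search would produce.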
00:05:17.580
so that's kind of like roughly where uh
00:05:20.100
indexing is placed right so that's the
00:05:22.500
indexer part right you can think of it
00:05:23.940
as like an ahead of time language server
00:05:26.220
instead of like running just in time
00:05:28.259
while you're navigating the code uh and
00:05:30.600
the other thing is like it prevents us
00:05:31.979
like it doesn't need us to run a
00:05:33.600
language server for every client right
00:05:35.100
because you may be browsing with
00:05:36.300
different kinds of code and there may be
00:05:38.039
millions of lines of code
00:05:40.139
so what's up with the Sorbet bit
00:05:42.539
um
00:05:43.440
so if you're unfamiliar with Sorbet it's
00:05:45.539
a type checker for Ruby by Stripe which
00:05:48.360
adds support for gradual typing and with
00:05:50.639
optional type signatures and this is
00:05:52.680
similar to TypeScript if you've written
00:05:54.180
TypeScript except Sorbet is very fast
00:05:56.699
compared to basically every other
00:05:58.560
production type checker that I know of
00:06:01.199
um so so scip-ruby is based on an open
00:06:03.240
source fork of Sorbet and this bit is
00:06:05.580
important because I'm going to be
00:06:06.660
referencing a bunch of Sorbet internals
00:06:08.699
later in the talk so um that's why uh
00:06:11.580
Sorbet internals come in and I know in
00:06:13.860
the opening keynote Matz mentioned
00:06:15.840
that performance doesn't matter that
00:06:17.580
much for some cases but in this case for
00:06:20.460
us like Sorbet's performance is an
00:06:22.259
important feature because customers can
00:06:24.419
have very large code bases right and
00:06:26.160
ideally we would index each and every
00:06:28.319
commit so you can just navigate without
00:06:30.180
like having to think oh am I on the main
00:06:32.039
branch on a different branch something
00:06:34.199
like that
00:06:36.060
um so the other reason for building on
00:06:38.280
top of Sorbet is that Sorbet already
00:06:40.199
understands the semantics of Ruby
00:06:42.419
code at a deep level so the index can be
00:06:44.699
precise like an editor right Sorbet also
00:06:46.680
has an LSP
00:06:48.120
um and so that's a precise part in the
00:06:49.860
subtitle we do not want to look at
00:06:51.660
just the syntax or like rely on some
00:06:54.000
kind of heuristics or like machine
00:06:55.380
learning or what have you and like
00:06:57.240
we don't want to compromise the
00:06:58.740
quality of the end result right like it
00:07:00.120
needs to be as good as it can be and so
00:07:02.759
that's another reason to build on
00:07:04.139
top of Sorbet
00:07:05.699
so the cross-repo bit is something
00:07:07.979
that I already kind of showed you
00:07:09.360
earlier right we want to be able to
00:07:10.979
navigate between gems GitHub repos
00:07:13.020
gitlab repos perforce Etc right we do
00:07:15.720
not want to be limited to within a
00:07:17.639
single repository or like even within a
00:07:19.740
single code host because customers are
00:07:21.599
using different kinds of code hosts
00:07:24.479
um last bit is for every Ruby developer
00:07:26.880
so sourcecraft.com it's free to use for
00:07:29.220
open source maintainers you can upload
00:07:31.139
indexes for your open source
00:07:32.880
repositories that you own and anyone can
00:07:35.699
navigate that code
00:07:37.080
um otherwise you can also get a license
00:07:38.460
for a single tenant managed instance or
00:07:40.919
a self-hosted instance
00:07:43.440
um well so that's like just the opening
00:07:45.419
slide so let's let's get into the
00:07:47.280
details now
00:07:48.840
um so earlier I showed you this abstract
00:07:50.699
picture of indexing source files Go in
00:07:52.860
index comes out right um the only tricky
00:07:55.860
part here is we need to figure out what
00:07:57.240
what goes in the middle right
00:07:59.580
um so since we've decided to build on
00:08:00.960
top of Sorbet uh we first need to
00:08:03.240
figure out like where exactly will the
00:08:05.520
indexer sort of fit inside Sorbet right
00:08:07.440
like because Sorbet is kind of not a
00:08:09.240
small code base so you need to figure
00:08:10.740
that out first
00:08:12.840
um so this is a simplified version of
00:08:15.060
Sorbet's internal pipeline we start with
00:08:17.400
the source code in the top left right
00:08:19.259
and we end up with this control flow
00:08:21.060
graph at the bottom right there's a
00:08:22.500
bunch of things in between right like
00:08:23.699
there's some tree representations if
00:08:25.680
you've attended the Talks by um the
00:08:27.840
rulebook of talk by Carla aloe vera on
00:08:30.180
the first day or the language attack by
00:08:32.760
winning stock on the first day right you
00:08:34.500
must have seen like what the parse tree
00:08:36.180
or the abstract syntax tree look like in
00:08:38.760
our case those aren't super important
00:08:40.740
because the main thing we care about
00:08:42.479
right like we want to have access to
00:08:44.760
type information so that we can show it
00:08:46.500
in like for example hover documentation
00:08:48.480
right and you'll notice that type
00:08:50.700
checking and inference that's happening
00:08:52.260
on the control flow graph right at the
00:08:54.000
bottom right so we want to be operating
00:08:56.160
at that layer after
00:08:58.560
um type checking has finished running
00:09:01.140
so that's where we'll emit the index
00:09:03.779
after we have access to type information
00:09:05.940
so that sounds reasonable right so far
00:09:08.640
so the next question is like okay what
00:09:10.980
does this control flow graph look like
00:09:12.600
right if you're going to get the index
00:09:15.240
out of it somehow
00:09:17.040
so here's like a very small
00:09:19.260
um Ruby code snippet or like somewhat
00:09:21.000
artificial right I've got a fragment of
00:09:23.519
a fizzbuzz function and you perform a
00:09:25.980
modulus operation and call plus equals if
00:09:28.920
that operation succeeds
00:09:31.260
um and so I just like so that this fits
00:09:33.300
on a slide and so now let's see what the
00:09:35.700
control flow graph for this like code
00:09:37.560
snippet looks like
00:09:39.540
so okay there's a fair bit of things
00:09:42.060
going on so let's walk through it bit by
00:09:44.040
bit right so first let's just try to
00:09:45.839
identify some patterns in the structure
00:09:48.000
first thing you'll notice is that um
00:09:50.100
this flat bit of code has like actually
00:09:52.200
been broken up into graph structure
00:09:54.120
right like there's a control flow graph
00:09:55.500
and there's these uh different blocks
00:09:57.779
which are called uh basic blocks in
00:09:59.519
compiler jargon
00:10:01.260
um and you've got these explicit arrows
00:10:03.180
depicting control flow so for example
00:10:05.399
the first block has an if at the very end
00:10:08.160
right and it's got these two edges
00:10:09.959
depicting like what happens if the if
00:10:12.360
is true or if it's false right and so
00:10:15.120
basic blocks are kind of like small
00:10:16.560
functions like within a function right
00:10:18.480
except that control flow is like not a
00:10:21.600
part of the basic block it's external to
00:10:23.760
the basic block it's a part of the edges
00:10:25.620
right the other thing to note is that
00:10:28.260
like each line inside a
00:10:30.720
basic block in Sorbet is called an
00:10:33.000
instruction and they kind of look like
00:10:34.740
Ruby code if you squint a bit right okay
00:10:37.140
there are lots of dollar signs like
00:10:38.820
there's a bunch of things like marked
00:10:40.680
with like temp right those temporary
00:10:42.600
variables it's not super clear where they
00:10:44.459
came from
00:10:45.660
um and like it's more verbose but like
00:10:47.459
if you understand Ruby you can
00:10:48.899
understand the control flow graph
00:10:50.339
structure too right and so the index it
00:10:53.160
needs to describe the source code right
00:10:54.420
like it doesn't like the control flow
00:10:56.160
graph is an implementation detail but
00:10:58.320
we're working with the control flow
00:10:59.940
graph so we need to understand the
00:11:01.140
correspondence between the source code
00:11:02.760
and the control flow graph if we were
00:11:04.680
to emit the index correctly
00:11:07.079
so first let's look at the
00:11:09.120
expression inside the if right which is
00:11:11.399
this I mod 3 equals equals zero right so
00:11:14.100
as you can see it's getting translated
00:11:15.720
into four different instructions each
00:11:18.480
literal assignment like it's literal is
00:11:21.000
becoming like a temporary variable right
00:11:22.980
and um the percentage and the equals
00:11:25.680
equals those have become method calls
00:11:28.560
right if you've implemented like
00:11:30.540
operator overloading you've seen that
00:11:32.100
percentage and equals equals are even
00:11:34.560
though they look kind of different from
00:11:36.120
method syntax they're actually just
00:11:37.740
methods right and so that's made very
00:11:40.260
explicit in the control flow graph
00:11:42.180
structure
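
The point that `%` and `==` are ordinary methods can be checked directly in Ruby. The sketch below spells out the condition from the snippet with explicit method calls and made-up temporaries; the names `tmp0` to `tmp3` are illustrative and are not what Sorbet actually generates.

```ruby
i = 9

# Operator syntax, as written in the source snippet.
direct = (i % 3 == 0)

# The same computation with explicit method calls and invented temporaries,
# roughly mirroring the four instructions described above.
tmp0 = 3
tmp1 = i.%(tmp0)      # `%` is an ordinary method on Integer
tmp2 = 0
tmp3 = tmp1.==(tmp2)  # so is `==`

# public_send makes the "it's just a method" point even more explicit.
via_send = i.public_send(:%, 3).public_send(:==, 0)
```

All three forms compute the same boolean, which is the "semantics preserving" property the flattened form has to keep.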
00:11:43.440
um the other thing is that the overall
00:11:45.300
logic has to be the same right uh if
00:11:48.600
you want to sound fancy you can say that
00:11:50.220
the translation is semantics preserving
00:11:52.140
but but the basic idea is like okay yeah
00:11:54.779
it needs to mean the same thing right
00:11:57.140
the result of the modulus operation
00:11:59.519
right which is temp one in the second
00:12:01.560
instruction it's used as a receiver for
00:12:03.720
uh equals equals the receiver is like
00:12:06.660
just before the dot sign right and so
00:12:09.300
what's happening in the source code too
00:12:11.399
like the same thing needs to happen in
00:12:13.320
the control flow graph even if like with
00:12:14.940
a bit of indirection right and so one
00:12:18.240
nice benefit of using these temporary
00:12:20.220
values is that now every method receiver
00:12:23.339
which is the thing before the dot
00:12:25.860
right as well as every method argument
00:12:27.839
now it's a variable right it's either
00:12:30.600
like a named variable like dollar I
00:12:32.640
right or it's like a temporary which we
00:12:34.680
just made up on the fly but um we don't
00:12:37.680
when we're trying to Traverse these
00:12:39.720
things later on right like we do not
00:12:41.339
need to Traverse like a tree structure
00:12:42.959
we don't need recursion we don't need a
00:12:44.820
visitor pattern right everything becomes
00:12:46.860
uh simplified when we're handling it so
00:12:48.959
this flattening is kind of very very
00:12:50.399
useful for further processing
00:12:53.399
so um the other thing as I mentioned
00:12:55.500
right like a control flow is made
00:12:57.180
explicit in the edges
00:12:59.279
um between different basic blocks so the
00:13:01.380
if has two
00:13:02.880
um edges pointing externally to bb1 and
00:13:05.399
basic block 2 right and the other thing
00:13:07.500
that's kind of interesting here is that
00:13:09.540
regardless of whether the if statement
00:13:11.639
gets executed or not right what's
00:13:14.220
happening in the rest of the function
00:13:15.600
still needs to happen right which is why
00:13:17.639
there's a further like fall through Edge
00:13:20.160
from basic block one to basic block two
00:13:22.440
because basic block 2 is still going to
00:13:24.600
get executed in the original function
00:13:26.399
regardless of whether basic block one
00:13:29.040
executed or not right and so the this
00:13:33.420
also means that there's no edges like
00:13:34.980
that go into the middle of a basic block
00:13:36.959
right that's the idea around like yeah
00:13:39.660
we'll make control flow explicit in
00:13:41.700
edges right like there's no control flow
00:13:43.440
within a node
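
One way to picture this structure is as a table of blocks, where each block holds a flat list of instructions and control flow lives only in the list of successor block ids. This is a hypothetical model for illustration; the instruction strings are invented and do not reproduce Sorbet's real output.

```ruby
# Hypothetical model of a control flow graph: straight-line instructions inside
# a block, control flow only in the edges (successors) between blocks.
BasicBlock = Struct.new(:id, :instructions, :successors)

cfg = {
  0 => BasicBlock.new(0, ['$tmp0 = 3', '$tmp1 = $i.%($tmp0)',
                          '$tmp2 = 0', '$tmp3 = $tmp1.==($tmp2)'], [1, 2]),
  1 => BasicBlock.new(1, ['$tmp4 = "fizz"', '$out = $out.+($tmp4)'], [2]),
  2 => BasicBlock.new(2, ['return $out'], []),
}

# A block ending in a conditional has more than one outgoing edge.
def branch?(block)
  block.successors.size > 1
end

# Edges name whole blocks, never a position inside a block.
def entry_only_edges?(cfg)
  cfg.values.all? { |block| block.successors.all? { |id| cfg.key?(id) } }
end

branching = cfg.values.select { |block| branch?(block) }.map(&:id)
```

Block 0 is the only branching block, and block 1 falls through to block 2, so block 2 runs whether or not the body of the `if` executed.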
00:13:45.779
so um the other thing is like okay the
00:13:48.180
body of the if statement right there's a
00:13:50.279
out plus equals fizz and so that's also
00:13:52.920
like kind of done similarly right like
00:13:55.079
the literal becomes assigned to a
00:13:56.880
temporary you call the plus equals
00:13:58.320
method even though in the
00:14:00.540
source code out is like an L value right
00:14:02.700
it's on the left hand side of the plus
00:14:04.079
equals but um from a practical
00:14:06.839
perspective it's essentially calling the
00:14:09.060
plus equals operator
00:14:11.339
um on out right and so that's what you
00:14:13.740
see is the second instruction right
00:14:16.560
um so again like the theme there's a
00:14:18.720
constant theme right of like simplifying
00:14:20.579
uh everything that's in the source code
00:14:22.139
to a handful uh of instructions and
00:14:25.200
you'll kind of see this um repeatedly in
00:14:27.779
the control flow graph where the control
00:14:28.860
flow graph actually just has 17
00:14:30.420
different kinds of instructions whereas
00:14:32.700
if you look at like a parse tree or
00:14:34.260
something is like 90 nodes or something
00:14:36.060
like that right and like Ruby syntax is
00:14:37.680
very flexible the control flow graph is
00:14:39.300
kind of very simple
00:14:41.160
so okay so that's like what the control
00:14:43.680
flow graph looks like right so how do we
00:14:45.959
emit an index for this structure right
00:14:48.300
so we can break that question up into
00:14:50.519
two sub parts right like how do we
00:14:52.199
handle each node in the graph right and
00:14:55.019
how do we handle each Edge in the graph
00:14:56.880
right so for the edges the first main
00:14:59.459
thing we need to do is Traverse the
00:15:01.079
graph in like topological order
00:15:03.779
um by that what I mean is like uh here
00:15:06.000
like I've drawn the graph with the
00:15:07.440
arrows like constantly pointing
00:15:09.000
downwards right and so that's kind of
00:15:11.399
essentially breaking up the graph into
00:15:12.839
different layers right and so
00:15:14.699
topological order all it means is like
00:15:17.040
okay you're going like traversing the
00:15:18.720
blocks from top to bottom right you
00:15:20.820
don't um for example you wouldn't
00:15:22.320
Traverse basic block 2 before basic
00:15:24.360
block one right
00:15:26.220
um that's all there is right and the
00:15:28.199
reason you do it is because definitions
00:15:29.760
they need to come before usages
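
A top-to-bottom traversal like this can be sketched with Kahn's algorithm. This is an assumption about one reasonable way to do it, not a description of how scip-ruby or Sorbet orders blocks, and it assumes the graph has no cycles (a loop in the source code would need extra handling). The block names are hypothetical.

```ruby
# Kahn's algorithm: repeatedly emit a block whose predecessors were all emitted.
# `successors` maps each block to the blocks its edges point to.
def topological_order(successors)
  in_degree = Hash.new(0)
  successors.each_key { |node| in_degree[node] += 0 } # register every block
  successors.each_value { |targets| targets.each { |t| in_degree[t] += 1 } }

  ready = in_degree.select { |_, degree| degree.zero? }.keys
  order = []
  until ready.empty?
    node = ready.shift
    order << node
    successors.fetch(node, []).each do |target|
      in_degree[target] -= 1
      ready << target if in_degree[target].zero?
    end
  end
  order
end

edges = { bb0: [:bb1, :bb2], bb1: [:bb2], bb2: [] }
order = topological_order(edges)
```

With these edges, `bb2` is only emitted after both `bb0` and `bb1`, so anything defined in an earlier block has been seen before a later block uses it.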
00:15:32.820
um so okay so that that's the edges
00:15:35.940
right and now let's look at the node so
00:15:37.560
let's just look at the first basic block
00:15:39.660
um for simplification
00:15:41.639
so um here I've got like two parameters
00:15:46.940
dollar I and dollar out uh and so what's
00:15:50.519
going to happen is like we're gonna
00:15:51.839
record that these are parameters to the
00:15:54.060
function right so uh we'll either
00:15:56.399
maintain like a couple of arrays for
00:15:58.500
definitions and references or like they
00:16:00.240
could be hashes we also need to record
00:16:02.579
like Source locations so we know like
00:16:04.500
what's the original Source location this
00:16:06.240
corresponds to so when you hover over
00:16:07.740
something we know oh it was I the
00:16:10.199
parameter or like the out parameter
00:16:12.120
right
00:16:13.500
um and so okay so iron out they're both
00:16:15.600
named variables right like they're not
00:16:17.160
Temporaries right and so that's why we
00:16:19.079
need to record them
00:16:20.880
and then essentially it's a process of
00:16:23.220
like okay iterate over each instruction
00:16:24.660
right and like you look at what kind of
00:16:27.120
instruction it is are there any named
00:16:29.279
variables or is it just Temporaries uh
00:16:32.160
and so temp zero equals three right
00:16:33.720
there's no named variables right like
00:16:35.100
there's just a temporary which is like
00:16:36.480
entirely made up right and so we don't
00:16:38.639
need to emit anything for that
00:16:41.100
um nothing to emit for three it's just
00:16:42.660
a constant value
00:16:44.519
um now we look at the next one here the
00:16:47.399
receiver dollar I right like that's a
00:16:49.440
named variable right that's a reference
00:16:51.420
to the original I in the parameter list
00:16:54.540
right and the percentage right that's an
00:16:57.480
operator right so that's again a named
00:16:59.639
uh method and like we need to emit a
00:17:02.699
reference for that so when you do find
00:17:04.020
references on percentage you can find
00:17:05.819
this call
00:17:08.100
so
00:17:09.600
um we'll add two references here
00:17:12.660
um to the references array marking the
00:17:14.819
parameter and the method and essentially
00:17:17.280
we just repeat this process where we're
00:17:19.500
like okay is this a method call look
00:17:21.480
at the receiver look at the arguments
00:17:22.799
look at the method itself
00:17:25.079
um look at the left hand side in this
00:17:26.520
case all the left hand side are
00:17:27.839
temporary so it doesn't matter
00:17:30.240
um and then again like when we get to
00:17:32.340
the fourth instruction we'll see oh
00:17:33.780
there's an equals equals so we need to emit
00:17:36.059
an extra reference for that right so in
00:17:38.940
essence right like even though Ruby code
00:17:41.520
might seem kind of complicated or like
00:17:43.140
there's just like so many different
00:17:44.220
kinds of syntax right uh we've kind of
00:17:47.220
reduced the problem of like how do we
00:17:50.460
power this navigation tool to
00:17:52.200
essentially for loops and like one
00:17:54.299
extra function which is like okay which
00:17:56.580
we'll look at like oh what what is this
00:17:58.380
a named variable is this a temporary
00:18:00.660
right and then it will optionally
00:18:03.960
emit uh a definition or a reference right
00:18:06.360
so I think like one takeaway I want
00:18:08.580
for people here like is especially like
00:18:10.620
if you're a junior like you had a
00:18:12.419
compilers course in university and that
00:18:14.220
was pretty complicated right is that
00:18:16.320
actually
00:18:17.580
um a lot of compiler stuff kind of boils
00:18:19.620
down to like finding the right
00:18:20.880
abstraction right in this case the core
00:18:23.520
indexer that I wrote like over the past
00:18:25.440
three or four months has been like about
00:18:27.480
2000 lines of code 1500 lines of test
00:18:30.000
inputs right
00:18:31.620
um and conceptually the implementation
00:18:33.360
is essentially a pure function which is
00:18:35.100
saving things into like a hash table or
00:18:37.140
something right there's not a lot more
00:18:39.419
going on right like once you understand
00:18:40.860
the core abstractions
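
The loop described above can be sketched in a few lines. Everything here is a simplified stand-in: the `Instruction` struct, the `$tmp` naming convention for temporaries, and the line numbers are assumptions for illustration, not the actual scip-ruby implementation or Sorbet's instruction types.

```ruby
# Hypothetical flattened instruction: target = receiver.method_name(*args)
# A literal assignment such as `$tmp0 = 3` has no receiver and no method.
Instruction = Struct.new(:target, :receiver, :method_name, :args, :line)

# Assumption for this sketch: temporaries are recognisable by their name.
def temporary?(name)
  name.start_with?("$tmp")
end

def index_block(params, instructions)
  # Parameters are named variables, so they are recorded as definitions.
  definitions = params.map { |name| { symbol: name, line: 1 } }
  references = []

  instructions.each do |ins|
    # Every operand is already a variable, so no recursion is needed.
    names = [ins.target, ins.receiver, *ins.args].compact
    names.each do |name|
      references << { symbol: name, line: ins.line } unless temporary?(name)
    end
    # A method call also yields a reference to the method itself.
    references << { symbol: ins.method_name, line: ins.line } if ins.method_name
  end

  [definitions, references]
end

instructions = [
  Instruction.new("$tmp0", nil, nil, [], 2),               # $tmp0 = 3
  Instruction.new("$tmp1", "$i", "%", ["$tmp0"], 2),       # $tmp1 = $i.%($tmp0)
  Instruction.new("$tmp2", nil, nil, [], 2),               # $tmp2 = 0
  Instruction.new("$tmp3", "$tmp1", "==", ["$tmp2"], 2),   # $tmp3 = $tmp1.==($tmp2)
]

definitions, references = index_block(["$i", "$out"], instructions)
```

For the first basic block this yields two definitions (the parameters) and three references: two from the modulus instruction and one more for the equality check, matching the walk-through above.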
00:18:43.500
so um that's mostly all what the indexer
00:18:46.620
does right like because uh We've
00:18:48.720
essentially traversed each and every
00:18:50.100
piece of uh Ruby code snippet right like
00:18:52.440
that there was to Traverse right
00:18:54.960
um except
00:18:56.580
not quite we need some hacks to actually
00:18:59.280
get it to work as well um as it should
00:19:01.740
right and so I'm going to describe one
00:19:03.720
of them here
00:19:05.220
um so remember the sorbet pipeline from
00:19:08.520
before right like it had all these um
00:19:10.440
different representations right and
00:19:12.480
there's like type checking going on
00:19:13.679
right so we decided to work with the
00:19:15.419
control flow graph because that's where
00:19:16.740
the type checking and inference was
00:19:18.059
happening right however in practice
00:19:20.640
Sorbet sometimes will not even do
00:19:22.500
type checking right for certain files
00:19:25.080
um so Sorbet has a concept of strictness
00:19:26.820
levels so if you don't add a magic
00:19:28.860
comment
00:19:29.940
um at the top of the file which says
00:19:31.500
typed: true then it will not even try to
00:19:33.600
type check the file right but the
00:19:36.240
problem is like some of our customers
00:19:37.679
they're not using Sorbet at all uh some
00:19:40.080
of them they've only partially adopted
00:19:41.940
Sorbet right but we want code navigation
00:19:44.039
to work for everyone right that's the
00:19:45.780
every Ruby developer thing it's not like
00:19:47.700
oh only it should only work if you if
00:19:50.100
you're using Sorbet so how do you kind
00:19:51.900
of square that Circle right because like
00:19:53.880
we want access to this control flow
00:19:55.980
graph right other options like oh do we
00:19:57.840
need a separate indexer for the parse
00:19:59.580
tree or the abstract syntax tree or do
00:20:02.160
we need like a whole new indexer which
00:20:04.020
is not based on Sorbet right and so uh
00:20:06.360
so far like our thinking has been that's
00:20:08.220
maybe perhaps too complicated a thing to
00:20:10.559
do right um instead let's add this hack
00:20:14.100
which is like literally ended up being
00:20:16.020
like a single line in the code base
00:20:17.880
which uh what this basically says is
00:20:20.220
like okay
00:20:21.600
um if we're running in scip-ruby mode
00:20:23.460
then um typed false which is like
00:20:26.400
Sorbet's
00:20:27.660
um way of saying like okay I'm not going
00:20:29.220
to type check this file uh what we do is
00:20:31.559
like even in that case we force Sorbet
00:20:34.200
to type check the file and generate the
00:20:36.720
control flow graph and everything and
00:20:38.220
just keep proceeding right and what this
00:20:41.220
means is like we do end up with a
00:20:42.720
control flow graph like it may not be
00:20:44.460
necessarily perfect but it ends up
00:20:47.220
working surprisingly well where Sorbet
00:20:49.260
can handle like lots of errors in code
00:20:51.299
anyways uh and code navigation does seem
00:20:54.360
to work um like based on my testing
00:20:58.200
like of course you're welcome to try it
00:21:00.120
out and break it and report back um I'll
00:21:02.520
share instructions at the end on how you
00:21:04.140
can do that but like this is kind of
00:21:06.299
surprising to me as well like oh and how
00:21:07.980
robust uh Sorbet is with um code which
00:21:10.980
actually doesn't type check well it can
00:21:12.960
generate thousands of errors but we can
00:21:14.400
just suppress them right because it's
00:21:15.660
not terribly interesting what what's of
00:21:17.700
interest is uh the navigation
00:21:19.860
and um so okay so that's how the indexer
00:21:23.280
works right like okay
00:21:25.020
vanilla Sorbet plus some hacks right let
00:21:28.260
me briefly describe how we test the
00:21:30.179
indexer
00:21:31.440
um so essentially we rely on what's
00:21:33.360
commonly called expect tests or
00:21:35.520
snapshot tests or golden tests right we
00:21:37.799
serialize the index into uh human
00:21:39.900
readable format uh and like annotate the
00:21:42.480
source code with like comments like this
00:21:43.980
which like show The Source ranges for
00:21:46.080
different definitions and references
00:21:47.640
right this way it becomes very easy to
00:21:49.799
identify if a patch like oh did it add
00:21:51.900
the right definition that I was
00:21:53.220
expecting it to which was getting missed
00:21:54.840
earlier
00:21:55.919
um did it actually start skipping some
00:21:57.900
definitions and like we need to fix that
00:21:59.520
right um this becomes very easy right
00:22:02.700
um and so this is kind of one of the
00:22:03.780
nice benefits of the layering that I
00:22:05.520
showed you from before right we're only
00:22:07.080
doing analysis right like we're not
00:22:08.640
concerned about what the browser is
00:22:10.020
doing right we're not concerned about
00:22:12.539
like okay should we be testing along
00:22:15.360
with the browser because like browser
00:22:16.620
tests can sometimes get flaky right
00:22:18.419
testing is now very simple it's fast and
00:22:21.000
like you can understand this right so
00:22:22.679
it's kind of very predictable as well
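
A snapshot test of this kind can be sketched as a function that interleaves the source with comment lines marking each occurrence, followed by a comparison against a stored golden string. The annotation format, the occurrence hashes, and `render_snapshot` are assumptions made for this example, not the format scip-ruby actually emits.

```ruby
# Renders source lines followed by comment lines that point at each occurrence.
# Columns are 0-based; carets line up when the column is at least 1.
def render_snapshot(source, occurrences)
  by_line = occurrences.group_by { |occ| occ[:line] }
  source.lines.each_with_index.flat_map do |text, idx|
    annotations = by_line.fetch(idx + 1, []).map do |occ|
      padding = " " * [occ[:col] - 1, 0].max
      "#" + padding + "^" * occ[:length] + " #{occ[:role]} #{occ[:symbol]}"
    end
    [text.chomp, *annotations]
  end.join("\n")
end

source = "def fizz(i, out)\n  i % 3\nend\n"
occurrences = [
  { line: 1, col: 9, length: 1, role: "definition", symbol: "i" },
  { line: 2, col: 2, length: 1, role: "reference", symbol: "i" },
]

snapshot = render_snapshot(source, occurrences)

# The "golden" output would normally live in a checked-in file.
golden = [
  "def fizz(i, out)",
  "#        ^ definition i",
  "  i % 3",
  "# ^ reference i",
  "end",
].join("\n")

matches = (snapshot == golden)
```

A patch that adds or drops an occurrence shows up as a plain text diff against the golden file, which is what makes regressions easy to spot in review.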
00:22:24.900
and so so that's um yeah that's kind of
00:22:28.860
um most of what I had to say right um so
00:22:30.720
scip-ruby is open source and it's
00:22:32.280
available on GitHub right if you're an
00:22:34.380
open source maintainer
00:22:36.179
um an existing sourcecraft customer or
00:22:38.340
like um potentially interested in
00:22:40.080
using Sourcegraph then you can find
00:22:41.880
instructions in the scip-ruby readme
00:22:43.980
which describe how to try it out I've tried
00:22:45.900
to make configuration as simple as
00:22:47.400
possible like if you're using bundler or
00:22:48.900
like some standard configuration it
00:22:50.580
should mostly just work like with a
00:22:52.620
couple of minutes of setup and you
00:22:54.659
should be able to upload an index to
00:22:56.640
sourcecraft.com for open source code uh
00:22:59.580
and just like start navigating your code
00:23:01.260
there's a couple of example repos as
00:23:03.419
well on sourcecraft.com which is this uh
00:23:06.059
Shopify one and uh Homebrew so the
00:23:08.520
Shopify one as an example is like it's
00:23:10.440
100% Sorbet in some ways
00:23:12.960
it's like kind of best case like what
00:23:14.400
you can get right I'm sure there's still
00:23:16.020
bugs there but um that's kind of like
00:23:18.059
one example the other is uh Homebrew
00:23:20.760
slash Brew so I'm sure many of you know
00:23:22.980
and love uh Homebrew and so The Homebrew
00:23:25.799
code base is kind of uh I think it's
00:23:27.480
like maybe 40% Sorbet or like 30% adoption
00:23:30.900
of Sorbet but you'll be able to navigate
00:23:33.059
code even within typed: false uh files which
00:23:35.940
aren't supposed to be type checked by
00:23:37.919
Sorbet um because of that hack I
00:23:39.720
mentioned before and so that's another
00:23:41.280
example of like what kind of
00:23:43.380
um navigation you can get even when you
00:23:45.480
only partially uh adopted Sorbet right
00:23:48.500
and to provide feedback there's a couple of
00:23:50.940
different ways to do it um the QR code
00:23:52.679
there is for our Discord uh the Sourcegraph
00:23:55.440
Discord and there's a scip-ruby
00:23:56.940
channel you can also find the Discord
00:23:58.440
link on our community website
00:24:01.260
um and the other way you can provide
00:24:02.520
feedback to me is uh yeah GitHub issues
00:24:04.679
so those are totally fine too also like
00:24:06.840
if you just have a question right like
00:24:08.280
you don't even like have an issue per se
00:24:10.500
like it's totally fine to open up a
00:24:12.120
GitHub issue
00:24:13.620
um and yeah thank you uh if you have
00:24:15.840
questions for me right now I think we
00:24:17.520
have a bit of time for questions or yeah
00:24:19.559
I can take them afterwards as well thank
00:24:21.299
you