Talks

scip-ruby - A Ruby indexer built with Sorbet

scip-ruby is an open source indexer that lets you browse Ruby code online, with IDE functionality like “Go to definition” and “Find usages”. We originally built scip-ruby to improve Ruby support in Sourcegraph, a code intelligence platform. In this talk, you will learn how we built scip-ruby on top of Sorbet, a Ruby typechecker, and how scip-ruby compares to IDEs and other online code navigation tools. Along the way, we will discuss how quintessential ideas like layering code into a functional core and an imperative shell apply to developer tools, and enable easier testing.

RubyConf 2022

00:00:00.000 ready for takeoff
00:00:16.920 hello everyone hope you had a nice lunch
00:00:19.140 and thank you for coming to my talk
00:00:20.939 today I'm excited I'm Varun I work as a
00:00:24.420 software engineer at Sourcegraph and
00:00:26.220 today I'm excited to tell you all more
00:00:27.900 about one of the projects that I've been
00:00:29.640 working on recently which is scip-ruby
00:00:31.820 it's a precise indexer for Ruby source
00:00:34.920 code and I'm gonna slowly break that
00:00:36.420 down for you on like what exactly that
00:00:38.340 means and I've come all the way from
00:00:40.620 Taipei so I'm kind of glad to meet so
00:00:42.600 many wonderful people here yeah so let's
00:00:45.719 get started so just to provide a bit of
00:00:48.239 background here at Sourcegraph our core
00:00:50.700 product is a SaaS developer tool which
00:00:53.100 helps you understand and change code at
00:00:55.379 scale so part of that includes searching
00:00:58.079 code navigating code making large scale
00:01:00.840 changes setting up dashboards for
00:01:03.059 migrations alerts for code changes all
00:01:05.880 the kind of fun fun things so today I'm
00:01:08.400 going to be focusing only on one sub
00:01:10.140 part of this which is how we get the
00:01:12.360 language specific information into
00:01:14.340 Sourcegraph which is increasingly
00:01:16.380 becoming a foundational thing for
00:01:18.299 everything else because when you want to
00:01:19.619 make code changes or you want to search
00:01:21.540 code you want it to be aware of the
00:01:23.460 language semantics so before I say a
00:01:25.979 thousand more words first let me just
00:01:27.780 show you a couple of pictures on like
00:01:29.460 what we're trying to achieve with
00:01:31.439 scip-ruby
00:01:33.180 so here's a Sourcegraph screenshot
00:01:35.400 involving one of Shopify's open source
00:01:37.680 repositories as an example this
00:01:40.020 repository has been indexed by scip-ruby
00:01:42.479 before I took the screenshot so I've
00:01:44.759 hovered the cursor over sub and
00:01:48.000 clicked on find references similar to an
00:01:50.159 editor right and Below you'll see that
00:01:52.380 there's a reference panel with different
00:01:54.420 results and there's two kinds of results
00:01:56.579 search based and precise so search based
00:01:59.759 can sometimes return false positives as
00:02:02.159 as you're seeing the first search-based
00:02:03.540 result is for the string replacement
00:02:06.000 method sub right which is not exactly
00:02:08.399 what we were looking for we were looking
00:02:09.899 for the property sub and the precise
00:02:13.200 result does return the correct thing
00:02:15.120 right so we need to do something more
00:02:17.340 clever than string searching right we
00:02:19.080 need to be aware of the language
00:02:20.040 semantics classes methods properties
00:02:22.319 things like that right so we want
00:02:24.780 something like this the precise thing to
00:02:26.280 work and that's powered by scip-ruby
00:02:28.040 just one more example so in this case
00:02:31.260 I'm looking at the code for a gem called
00:02:34.200 Zeitwerk and I'm trying to find
00:02:36.680 references to the module defined in that
00:02:39.660 gem right and if you look at the search
00:02:42.239 results on the left side of the
00:02:43.980 reference panel there are many different
00:02:45.720 repositories which return these results
00:02:47.459 and this includes GitHub repositories so
00:02:50.040 which means like you can search across
00:02:51.840 gems across GitHub repositories GitLab
00:02:54.000 Perforce different sources right and
00:02:57.060 again like here the name is unique
00:02:58.500 enough that maybe a string search would
00:03:00.000 be fine but normally like what you want
00:03:02.519 is to capture the semantics of
00:03:04.379 dependencies between gems so that you
00:03:07.500 can understand like okay what's
00:03:08.700 connected to what right or if there's a
00:03:10.560 vulnerability then how is it affecting
00:03:12.300 your code so basically it's like we're
00:03:15.239 kind of building IDE level navigation
00:03:18.060 right in Sourcegraph
00:03:20.040 um
00:03:20.879 so this brings us back to the title of
00:03:23.159 the talk which I'm going to try to break
00:03:24.360 down bit by bit so you can think of the
00:03:26.700 indexer part as the bit which is like
00:03:28.500 the spinner in your IDE right where it's
00:03:31.319 aggregating all this information across
00:03:33.300 source files and presenting it to you in
00:03:35.519 a way where you can query it in many
00:03:36.900 different ways uh references definitions
00:03:39.420 Etc right and the thing is that
00:03:41.940 navigation needs to be fast right like
00:03:43.680 if it's slow enough then you're not
00:03:45.299 going to use it
00:03:47.640 so at a very high level right like this
00:03:49.860 is kind of where indexing fits into the
00:03:52.140 pipeline right an indexer will take in
00:03:54.360 all these uh all the source files
00:03:56.159 configuration and emit like a single
00:03:58.500 file which is in the SCIP data format
00:04:00.540 that's an open source Protobuf schema
00:04:04.200 um and the index gets uploaded into a
00:04:06.360 database usually in a CI pipeline where
00:04:08.760 it's like running on each and every
00:04:10.260 commit or it can also run as a repeating
00:04:13.260 job inside Sourcegraph itself so you
00:04:15.000 don't need to set up your CI
00:04:17.400 um after that like when a client like
00:04:19.199 for example you're trying to navigate in
00:04:20.940 the web browser right and you're trying
00:04:22.919 to perform actions like find references
00:04:24.720 go to definition they will directly
00:04:26.520 query the database right they won't uh
00:04:28.800 talk to the indexer there's no running
00:04:31.080 language server process like Solargraph
00:04:33.660 for example uh there's nothing running
00:04:35.639 in the background like that because what
00:04:37.620 we've done here is we've separated out
00:04:39.300 the analysis phase from the query phase
00:04:41.460 right and we're able to do this because
00:04:43.380 the code in Sourcegraph is read only
00:04:45.300 you're not editing as you go right so
00:04:48.300 because the code is read only we just do
00:04:49.860 the analysis once right and now the
00:04:51.900 query can be optimized like a typical
00:04:53.820 database query right the other benefit
00:04:56.580 of this is that the database the backend
00:04:58.919 the client all those things they're
00:05:00.840 they're kind of mostly language agnostic
00:05:02.880 whereas uh if you were to use a language
00:05:05.460 server then the query part also needs to
00:05:08.220 be language aware and so the language
00:05:11.040 aware part is actually restricted only
00:05:13.320 to the analysis which is happening
00:05:15.060 during indexing
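The split between analysis and querying can be sketched roughly as below. This is only an illustration of the idea, not Sourcegraph's actual schema or API: the `Occurrence` struct, the symbol strings, and the method names are all made up for the example.

```ruby
# Hypothetical index record: one occurrence of a symbol in a file.
Occurrence = Struct.new(:symbol, :file, :line, :role)

# "Analysis" output, produced once (for example per commit) by an indexer.
INDEX = [
  Occurrence.new("Token#sub", "token.rb", 3, :definition),
  Occurrence.new("Token#sub", "auth.rb", 10, :reference),
  Occurrence.new("String#sub", "auth.rb", 12, :reference),
].freeze

# "Upload": group occurrences by symbol so queries are plain lookups.
BY_SYMBOL = INDEX.group_by(&:symbol).freeze

# Query phase: no type checker or language server involved, only data.
def find_references(symbol)
  BY_SYMBOL.fetch(symbol, []).select { |o| o.role == :reference }
end

def go_to_definition(symbol)
  BY_SYMBOL.fetch(symbol, []).find { |o| o.role == :definition }
end

puts find_references("Token#sub").map { |o| "#{o.file}:#{o.line}" }.inspect
```

Note that the query side never needs to know the code is Ruby, and the `String#sub` call is not returned for `Token#sub`, which is the precise-versus-search-based distinction from the earlier screenshot.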
00:05:17.580 so that's kind of like roughly where uh
00:05:20.100 indexing is placed right so that's the
00:05:22.500 indexer part right you can think of it
00:05:23.940 as like an ahead of time language server
00:05:26.220 instead of like running just in time
00:05:28.259 while you're navigating the code uh and
00:05:30.600 the other thing is like it prevents us
00:05:31.979 like it doesn't need us to run a
00:05:33.600 language server for every client right
00:05:35.100 because you may be browsing with
00:05:36.300 different kinds of code and there may be
00:05:38.039 millions of lines of code
00:05:40.139 so what's up with the Sorbet bit
00:05:42.539 um
00:05:43.440 so if you're unfamiliar with sorbet it's
00:05:45.539 a type checker for Ruby by Stripe which
00:05:48.360 adds support for gradual typing and with
00:05:50.639 optional type signatures and this is
00:05:52.680 similar to TypeScript if you've written
00:05:54.180 TypeScript except Sorbet is very fast
00:05:56.699 compared to basically every other
00:05:58.560 production type checker that I know of
00:06:01.199 um so scip-ruby is based on an open
00:06:03.240 source fork of Sorbet and this bit is
00:06:05.580 important because I'm going to be
00:06:06.660 referencing a bunch of Sorbet internals
00:06:08.699 later in the talk so um that's why uh
00:06:11.580 Sorbet internals come in and I know in
00:06:13.860 the opening keynote Matz mentioned
00:06:15.840 that performance doesn't matter that
00:06:17.580 much for some cases but in this case for
00:06:20.460 us like Sorbet's performance is an
00:06:22.259 important feature because customers can
00:06:24.419 have very large code bases right and
00:06:26.160 ideally we would index each and every
00:06:28.319 commit so you can just navigate without
00:06:30.180 like having to think oh am I on the main
00:06:32.039 branch on a different branch something
00:06:34.199 like that
00:06:36.060 um so the other reason for building on
00:06:38.280 top of Sorbet is that Sorbet already
00:06:40.199 understands the semantics of Ruby
00:06:42.419 code at a deep level so the index can be
00:06:44.699 precise like an editor right Sorbet also
00:06:46.680 has an LSP
00:06:48.120 um and so that's the precise part in the
00:06:49.860 subtitle we do not want to look at
00:06:51.660 just the syntax or like rely on some
00:06:54.000 kind of heuristics or like machine
00:06:55.380 learning or what have you and like that
00:06:57.240 we don't want to compromise the
00:06:58.740 quality of the end result right like it
00:07:00.120 needs to be as good as it can be and so
00:07:02.759 that's another reason to build on
00:07:04.139 top of Sorbet
00:07:05.699 so the cross-repo bit is something
00:07:07.979 that I already kind of showed you
00:07:09.360 earlier right we want to be able to
00:07:10.979 navigate between gems GitHub repos
00:07:13.020 GitLab repos Perforce etc right we do
00:07:15.720 not want to be limited to within a
00:07:17.639 single repository or like even within a
00:07:19.740 single code host because customers are
00:07:21.599 using different kinds of code hosts
00:07:24.479 um last bit is for every Ruby developer
00:07:26.880 so sourcegraph.com it's free to use for
00:07:29.220 open source maintainers you can upload
00:07:31.139 indexes for your open source
00:07:32.880 repositories that you own and anyone can
00:07:35.699 navigate that code
00:07:37.080 um otherwise you can also get a license
00:07:38.460 for a single tenant managed instance or
00:07:40.919 a self-hosted instance
00:07:43.440 um well so that's like just the opening
00:07:45.419 slide so let's let's get into the
00:07:47.280 details now
00:07:48.840 um so earlier I showed you this abstract
00:07:50.699 picture of indexing source files Go in
00:07:52.860 index comes out right um the only tricky
00:07:55.860 part here is we need to figure out what
00:07:57.240 what goes in the middle right
00:07:59.580 um so since we've decided to build on
00:08:00.960 top of Sorbet uh we first need to
00:08:03.240 figure out like where exactly will the
00:08:05.520 indexer sort of fit inside Sorbet right
00:08:07.440 like because Sorbet is kind of not a
00:08:09.240 small code base so you need to figure
00:08:10.740 that out first
00:08:12.840 um so this is a simplified version of
00:08:15.060 Sorbet's internal pipeline we start with
00:08:17.400 the source code in the top left right
00:08:19.259 and we end up with this control flow
00:08:21.060 graph at the bottom right there's a
00:08:22.500 bunch of things in between right like
00:08:23.699 there's some tree representations if
00:08:25.680 you've attended the talks by um the
00:08:27.840 RuboCop talk by Kyle d'Oliveira on
00:08:30.180 the first day or the language attack by
00:08:32.760 winning stock on the first day right you
00:08:34.500 must have seen like what the parse tree
00:08:36.180 or the abstract syntax tree look like in
00:08:38.760 our case those aren't super important
00:08:40.740 because the main thing we care about
00:08:42.479 right like we want to have access to
00:08:44.760 type information so that we can show it
00:08:46.500 in like for example hover documentation
00:08:48.480 right and you'll notice that type
00:08:50.700 checking and inference that's happening
00:08:52.260 on the control flow graph right at the
00:08:54.000 bottom right so we want to be operating
00:08:56.160 at that layer after
00:08:58.560 um type checking has finished running
00:09:01.140 so that's where we'll emit the index
00:09:03.779 after we have access to type information
00:09:05.940 so that sounds reasonable right so far
00:09:08.640 so the next question is like okay what
00:09:10.980 does this control flow graph look like
00:09:12.600 right if you're going to get the index
00:09:15.240 out of it somehow
00:09:17.040 so here's like a very small
00:09:19.260 um Ruby code snippet or like somewhat
00:09:21.000 artificial right I've got a fragment of
00:09:23.519 a FizzBuzz function and you perform a
00:09:25.980 modulus operation and call plus equals if
00:09:28.920 that operation succeeds
00:09:31.260 um and so I just like so that this fits
00:09:33.300 on a slide and so now let's see what the
00:09:35.700 control flow graph for this like code
00:09:37.560 snippet looks like
00:09:39.540 so okay there's a fair bit of things
00:09:42.060 going on so let's walk through it bit by
00:09:44.040 bit right so first let's just try to
00:09:45.839 identify some patterns in the structure
00:09:48.000 first thing you'll notice is that um
00:09:50.100 this flat bit of code has like actually
00:09:52.200 been broken up into graph structure
00:09:54.120 right like there's a control flow graph
00:09:55.500 and there's these uh different blocks
00:09:57.779 which are called uh basic blocks in
00:09:59.519 compiler jargon
00:10:01.260 um and you've got these explicit arrows
00:10:03.180 depicting control flow so for example
00:10:05.399 the first block has an if at the very end
00:10:08.160 right and it's got these two edges
00:10:09.959 depicting like what happens if the if
00:10:12.360 it's true or if it's false right and so
00:10:15.120 basic blocks are kind of like small
00:10:16.560 functions like within a function right
00:10:18.480 except that control flow is like not a
00:10:21.600 part of the basic block it's external to
00:10:23.760 the basic block it's a part of the edges
00:10:25.620 right the other thing to note is that
00:10:28.260 like each line inside a
00:10:30.720 basic block in Sorbet is called an
00:10:33.000 instruction and they kind of look like
00:10:34.740 Ruby code if you squint a bit right okay
00:10:37.140 there are lots of dollar signs like
00:10:38.820 there's a bunch of things like marked
00:10:40.680 with like temp right those temporary
00:10:42.600 variables it's not super clear where they
00:10:44.459 came from
00:10:45.660 um and like it's more verbose but like
00:10:47.459 if you understand Ruby you can
00:10:48.899 understand the control flow graph
00:10:50.339 structure too right and so the index it
00:10:53.160 needs to describe the source code right
00:10:54.420 like it doesn't like the control flow
00:10:56.160 graph is an implementation detail but
00:10:58.320 we're working with the control flow
00:10:59.940 graph so we need to understand the
00:11:01.140 correspondence between the source code
00:11:02.760 and the control flow graph if you were
00:11:04.680 to emit the index correctly
00:11:07.079 so first let's look at the
00:11:09.120 expression inside the if right which is
00:11:11.399 this I mod 3 equals equals zero right so
00:11:14.100 as you can see it's getting translated
00:11:15.720 into four different instructions each
00:11:18.480 literal assignment like it's literal is
00:11:21.000 becoming like a temporary variable right
00:11:22.980 and um the percentage and the equals
00:11:25.680 equals those have become method calls
00:11:28.560 right if you've done implemented like
00:11:30.540 operator overloading you've seen that
00:11:32.100 percentage and equals equals are even
00:11:34.560 though they look kind of different from
00:11:36.120 method syntax they're actually just
00:11:37.740 methods right and so that's made very
00:11:40.260 explicit in the control flow graph
00:11:42.180 structure
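The point that operators are just methods can be checked in plain Ruby. The sketch below spells out `i % 3 == 0` one step per line, mirroring the four instructions described above. The `temp` names are illustrative only; the actual control flow graph printed by Sorbet uses its own naming and formatting.

```ruby
i = 9

# The condition as written in source code.
direct = (i % 3 == 0)

# The same condition, one step per line with explicit method calls.
temp0 = 3
temp1 = i.%(temp0)       # % is an ordinary method call on i
temp2 = 0
temp3 = temp1.==(temp2)  # == is a method call on the result of %

puts direct == temp3
```

Every receiver and argument on the right-hand side is now a plain variable, which is what makes the later traversal a flat loop instead of a recursive tree walk.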
00:11:43.440 um the other thing is that the overall
00:11:45.300 logic has to be the same right uh if
00:11:48.600 you want to sound fancy you can say that
00:11:50.220 the translation is semantics preserving
00:11:52.140 but but the basic idea is like okay yeah
00:11:54.779 it needs to mean the same thing right
00:11:57.140 the result of the modulus operation
00:11:59.519 right which is temp one in the second
00:12:01.560 instruction it's used as a receiver for
00:12:03.720 uh equals equals the receiver is like
00:12:06.660 just before the DOT sign right and so
00:12:09.300 what's happening in the source code too
00:12:11.399 like the same thing needs to happen in
00:12:13.320 the control flow graph even if like with
00:12:14.940 a bit of indirection right and so one
00:12:18.240 nice benefit of using these temporary
00:12:20.220 values is that now every method receiver
00:12:23.339 which is the thing before the dot
00:12:25.860 right as well as every method argument
00:12:27.839 now it's a variable right it's either
00:12:30.600 like a named variable like dollar I
00:12:32.640 right or it's like a temporary which we
00:12:34.680 just made up on the fly but um we don't
00:12:37.680 when we're trying to Traverse these
00:12:39.720 things later on right like we do not
00:12:41.339 need to Traverse like a tree structure
00:12:42.959 we don't need recursion we don't need a
00:12:44.820 visitor pattern right everything becomes
00:12:46.860 uh simplified when we're handling it so
00:12:48.959 this flattening is kind of very very
00:12:50.399 useful for further processing
00:12:53.399 so um the other thing as I mentioned
00:12:55.500 right like a control flow is made
00:12:57.180 explicit in the edges
00:12:59.279 um between different basic blocks so the
00:13:01.380 if has two
00:13:02.880 um edges pointing externally to bb1 and
00:13:05.399 basic block 2 right and the other thing
00:13:07.500 that's kind of interesting here is that
00:13:09.540 regardless of whether the if statement
00:13:11.639 gets executed or not right what's
00:13:14.220 happening in the rest of the function
00:13:15.600 still needs to happen right which is why
00:13:17.639 there's a further like fall through Edge
00:13:20.160 from basic block one to basic block two
00:13:22.440 because basic block 2 is still going to
00:13:24.600 get executed in the original function
00:13:26.399 regardless of whether basic block one
00:13:29.040 executed or not right and so the this
00:13:33.420 also means that there's no edges like
00:13:34.980 that go into the middle of a basic block
00:13:36.959 right that's the idea around like yeah
00:13:39.660 we'll make control flow explicit in
00:13:41.700 edges right like there's no control flow
00:13:43.440 within a node
00:13:45.779 so um the other thing is like okay the
00:13:48.180 body of the if statement right there's a
00:13:50.279 out plus equals fizz and so that's also
00:13:52.920 like kind of done similarly right like
00:13:55.079 the literal becomes assigned to a
00:13:56.880 temporary you call the plus equals
00:13:58.320 method even though in the
00:14:00.540 source code out is like an L value right
00:14:02.700 it's on the left hand side of the plus
00:14:04.079 equals but um from a practical
00:14:06.839 perspective it's essentially calling the
00:14:09.060 plus equals operator
00:14:11.339 um on out right and so that's what you
00:14:13.740 see is the second instruction right
00:14:16.560 um so again like the theme there's a
00:14:18.720 constant theme right of like simplifying
00:14:20.579 uh everything that's in the source code
00:14:22.139 to a handful uh of instructions and
00:14:25.200 you'll kind of see this um repeatedly in
00:14:27.779 the control flow graph where the control
00:14:28.860 flow graph actually just has 17
00:14:30.420 different kinds of instructions whereas
00:14:32.700 if you look at like a parse tree or
00:14:34.260 something is like 90 nodes or something
00:14:36.060 like that right and like Ruby syntax is
00:14:37.680 very flexible the control flow graph is
00:14:39.300 kind of very simple
00:14:41.160 so okay so that's like what the control
00:14:43.680 flow graph looks like right so how do we
00:14:45.959 emit an index for this structure right
00:14:48.300 so we can break that question up into
00:14:50.519 two sub parts right like how do we
00:14:52.199 handle each node in the graph right and
00:14:55.019 how do we handle each Edge in the graph
00:14:56.880 right so for the edges the first main
00:14:59.459 thing we need to do is Traverse the
00:15:01.079 graph in like topological order
00:15:03.779 um by that what I mean is like uh here
00:15:06.000 like I've drawn the graph with the
00:15:07.440 arrows like constantly pointing
00:15:09.000 downwards right and so that's kind of
00:15:11.399 essentially breaking up the graph into
00:15:12.839 different layers right and so
00:15:14.699 topological order all it means is like
00:15:17.040 okay you're going like traversing the
00:15:18.720 blocks from top to bottom right you
00:15:20.820 don't um for example you wouldn't
00:15:22.320 Traverse basic block 2 before basic
00:15:24.360 block one right
00:15:26.220 um that's all there is right and the
00:15:28.199 reason you do it is because definitions
00:15:29.760 they need to come before usages
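The traversal order for the three blocks described above can be sketched with Kahn's algorithm. This assumes the graph has no cycles (a loop in the source would add a back edge, which this sketch does not handle), and the block names and the `EDGES` hash are hypothetical stand-ins, not Sorbet's data structures.

```ruby
# Successor edges: bb0 ends in the `if`, bb1 is its body,
# bb2 is the rest of the function.
EDGES = {
  bb0: [:bb1, :bb2],
  bb1: [:bb2],
  bb2: [],
}.freeze

# Kahn's algorithm: repeatedly visit a block whose predecessors
# have all been visited already.
def topological_order(edges)
  incoming = Hash.new(0)
  edges.each_value { |succs| succs.each { |s| incoming[s] += 1 } }

  ready = edges.keys.select { |block| incoming[block].zero? }
  order = []
  until ready.empty?
    block = ready.shift
    order << block
    edges[block].each do |succ|
      incoming[succ] -= 1
      ready << succ if incoming[succ].zero?
    end
  end
  order
end

p topological_order(EDGES)
```

Here bb2 is only visited after both bb0 and bb1, so anything defined in the earlier blocks has been recorded before bb2 refers to it.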
00:15:32.820 um so okay so that that's the edges
00:15:35.940 right and now let's look at the node so
00:15:37.560 let's just look at the first basic block
00:15:39.660 um for simplification
00:15:41.639 so um here I've got like two parameters
00:15:46.940 dollar I and dollar out uh and so what's
00:15:50.519 going to happen is like we're gonna
00:15:51.839 record that these are parameters to the
00:15:54.060 function right so uh we'll either
00:15:56.399 maintain like a couple of arrays for
00:15:58.500 definitions and references or like they
00:16:00.240 could be hashes we also need to record
00:16:02.579 like Source locations so we know like
00:16:04.500 what's the original Source location this
00:16:06.240 corresponds to so when you hover over
00:16:07.740 something we know oh it was I the
00:16:10.199 parameter or like the out parameter
00:16:12.120 right
00:16:13.500 um and so okay so i and out they're both
00:16:15.600 named variables right like they're not
00:16:17.160 Temporaries right and so that's why we
00:16:19.079 need to record them
00:16:20.880 and then essentially it's a process of
00:16:23.220 like okay iterate over each instruction
00:16:24.660 right and like you look at what kind of
00:16:27.120 instruction it is are there any named
00:16:29.279 variables or is it just Temporaries uh
00:16:32.160 and so temp zero equals three right
00:16:33.720 there's no named variables right like
00:16:35.100 there's just a temporary which is like
00:16:36.480 entirely made up right and so we don't
00:16:38.639 need to emit anything for that
00:16:41.100 um nothing to emit for three it's just
00:16:42.660 a constant value
00:16:44.519 um now we look at the next one here the
00:16:47.399 receiver dollar I right like that's a
00:16:49.440 named variable right that's a reference
00:16:51.420 to the original I in the parameter list
00:16:54.540 right and the percentage right that's an
00:16:57.480 operator right so that's again a named
00:16:59.639 uh method and like we need to emit a
00:17:02.699 reference for that so when you do find
00:17:04.020 references on percentage you can find
00:17:05.819 this call
00:17:08.100 so
00:17:09.600 um we'll add two references here
00:17:12.660 um to the references array marking the
00:17:14.819 parameter and the method and essentially
00:17:17.280 we just repeat this process where we're
00:17:19.500 like okay is this a method call look
00:17:21.480 at the receiver look at the arguments
00:17:22.799 look at the method itself
00:17:25.079 um look at the left hand side in this
00:17:26.520 case all the left hand side are
00:17:27.839 temporaries so it doesn't matter
00:17:30.240 um and then again like when we get to
00:17:32.340 the fourth instruction we'll see oh
00:17:33.780 there's an equals equals so we need to emit
00:17:36.059 an extra reference for that right so in
00:17:38.940 essence right like even though Ruby code
00:17:41.520 might seem kind of complicated or like
00:17:43.140 there's just like so many different
00:17:44.220 kinds of syntax right uh we've kind of
00:17:47.220 reduced the problem of like how do we
00:17:50.460 power this navigation tool to
00:17:52.200 essentially for loops and like one
00:17:54.299 extra function which is like okay which
00:17:56.580 we'll look at like oh what what is this
00:17:58.380 a named variable is this a temporary
00:18:00.660 right and then it will optionally
00:18:03.960 emit a definition or a reference right
00:18:06.360 so I think like one takeaway I want
00:18:08.580 for people here like is especially like
00:18:10.620 if you're a junior like you had a
00:18:12.419 compilers course in university and that
00:18:14.220 was pretty complicated right is that
00:18:16.320 actually
00:18:17.580 um a lot of compiler stuff kind of boils
00:18:19.620 down to like finding the right
00:18:20.880 abstraction right in this case the core
00:18:23.520 indexer that I wrote like over the past
00:18:25.440 three or four months has been like about
00:18:27.480 2000 lines of code 1500 lines of test
00:18:30.000 inputs right
00:18:31.620 um and conceptually the implementation
00:18:33.360 is essentially a pure function which is
00:18:35.100 saving things into like a hash table or
00:18:37.140 something right there's not a lot more
00:18:39.419 going on right like once you understand
00:18:40.860 the core abstractions
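The walk over the first basic block described earlier can be sketched as a loop plus one helper that tells named variables from temporaries. Everything here is a simplified assumption for illustration: the `Instr` struct, the `$temp` naming rule, and the `[name, line]` pairs are not scip-ruby's real types, and left-hand sides are ignored because they are all temporaries in this block.

```ruby
# Hypothetical instruction shape: `lhs = receiver.method_name(args...)`,
# or a literal assignment when `method_name` is nil.
Instr = Struct.new(:lhs, :receiver, :method_name, :args, :line)

# In this sketch, temporaries are the names starting with "$temp".
def temporary?(name)
  name.start_with?("$temp")
end

def index_block(params, instructions)
  definitions = params.map { |name| [name, 1] } # parameters defined on line 1
  references = []
  instructions.each do |ins|
    next if ins.method_name.nil? # literal assignment: nothing to emit
    [ins.receiver, *ins.args].each do |var|
      references << [var, ins.line] unless temporary?(var)
    end
    references << [ins.method_name, ins.line] # the called method itself
  end
  { definitions: definitions, references: references }
end

block = [
  Instr.new("$temp0", nil, nil, [], 2),               # $temp0 = 3
  Instr.new("$temp1", "$i", "%", ["$temp0"], 2),      # $temp1 = $i.%($temp0)
  Instr.new("$temp2", nil, nil, [], 2),               # $temp2 = 0
  Instr.new("$temp3", "$temp1", "==", ["$temp2"], 2), # $temp3 = $temp1.==($temp2)
]

result = index_block(["$i", "$out"], block)
p result[:definitions]
p result[:references]
```

The result contains definitions for the two parameters and references for `$i`, `%`, and `==`, matching the walkthrough: nothing is emitted for literals or temporaries.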
00:18:43.500 so um that's mostly all what the indexer
00:18:46.620 does right like because uh We've
00:18:48.720 essentially traversed each and every
00:18:50.100 piece of uh Ruby code snippet right like
00:18:52.440 that there was to Traverse right
00:18:54.960 um except
00:18:56.580 not quite we need some hacks to actually
00:18:59.280 get it to work as well um as it should
00:19:01.740 right and so I'm going to describe one
00:19:03.720 of them here
00:19:05.220 um so remember the sorbet pipeline from
00:19:08.520 before right like it had all these um
00:19:10.440 different representations right and
00:19:12.480 there's like type checking going on
00:19:13.679 right so we decided to work with the
00:19:15.419 control flow graph because that's where
00:19:16.740 the type checking and inference was
00:19:18.059 happening right however in practice Sorbet
00:19:20.640 will sometimes it will not even do
00:19:22.500 type checking right for certain files
00:19:25.080 um so Sorbet has a concept of strictness
00:19:26.820 levels so if you don't add a magic
00:19:28.860 comment
00:19:29.940 um at the top of the file which says
00:19:31.500 typed: true then it will not even try to
00:19:33.600 type check the file right but the
00:19:36.240 problem is like some of our customers
00:19:37.679 they're not using Sorbet at all uh some
00:19:40.080 of them they've only partially adopted
00:19:41.940 Sorbet right but we want code navigation
00:19:44.039 to work for everyone right that's the
00:19:45.780 every Ruby developer thing it's not like
00:19:47.700 oh only it should only work if you if
00:19:50.100 you're using Sorbet so how do you kind
00:19:51.900 of square that Circle right because like
00:19:53.880 we want access to this control flow
00:19:55.980 graph right other options like oh do we
00:19:57.840 need a separate indexer for the parse
00:19:59.580 tree or the abstract syntax tree or do
00:20:02.160 we need like a whole new indexer which
00:20:04.020 is not based on Sorbet right and so uh
00:20:06.360 so far like our thinking has been that's
00:20:08.220 maybe perhaps too complicated thing to
00:20:10.559 do right um instead let's add this hack
00:20:14.100 which is like literally ended up being
00:20:16.020 like a single line in the code base
00:20:17.880 which uh what this basically says is
00:20:20.220 like okay
00:20:21.600 um if we're running in scip-ruby mode
00:20:23.460 then um typed: false which is like
00:20:26.400 Sorbet's
00:20:27.660 um way of saying like okay I'm not going
00:20:29.220 to type check this file uh what we do is
00:20:31.559 like even in that case we force Sorbet
00:20:34.200 to type check the file and generate the
00:20:36.720 control flow graph and everything and
00:20:38.220 just keep proceeding right and what this
00:20:41.220 means is like we do end up with a
00:20:42.720 control flow graph like it may not be
00:20:44.460 necessarily perfect but it ends up
00:20:47.220 working surprisingly well where Sorbet
00:20:49.260 can handle like lots of errors in code
00:20:51.299 anyways uh and code navigation does seem
00:20:54.360 to work um like based on my testing
00:20:58.200 like of course you're welcome to try it
00:21:00.120 out and break it and report back um I'll
00:21:02.520 share instructions at the end on how you
00:21:04.140 can do that but like this is kind of
00:21:06.299 surprising to me as well like oh how
00:21:07.980 robust uh Sorbet is with um code which
00:21:10.980 actually doesn't type check well it can
00:21:12.960 generate thousands of errors but we can
00:21:14.400 just suppress them right because it's
00:21:15.660 not terribly interesting what what's of
00:21:17.700 interest is uh the navigation
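The effect of that hack can be sketched in a few lines. This is a hypothetical Ruby stand-in, not the actual change (Sorbet itself is written in C++, and the real one-line change is not shown in the talk): `declared_strictness`, `effective_strictness`, and the `indexing:` flag are invented names. The only facts assumed from Sorbet are the `# typed:` comment syntax and that a file without one is treated as `typed: false`.

```ruby
# Read the `# typed:` sigil from the leading comment lines of a file.
# Files without a sigil are treated as `false` in this sketch.
def declared_strictness(source)
  source.each_line do |line|
    break unless line.start_with?("#")
    match = line.match(/\A#\s*typed:\s*(\w+)/)
    return match[1].to_sym if match
  end
  :false
end

# Hypothetical stand-in for the hack: when indexing, a `typed: false`
# file is treated as `typed: true` so a control flow graph still gets built.
def effective_strictness(source, indexing:)
  declared = declared_strictness(source)
  return :true if indexing && declared == :false
  declared
end

untyped = "class Greeter\n  def hi; end\nend\n"
p effective_strictness(untyped, indexing: false)
p effective_strictness(untyped, indexing: true)
```

Files that already declare a stricter level are left alone; only the files that would otherwise be skipped are bumped, and any type errors reported for them are simply not shown.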
00:21:19.860 and um so okay so that's how the indexer
00:21:23.280 works right like okay
00:21:25.020 vanilla Sorbet plus some hacks right let
00:21:28.260 me briefly describe how we test the
00:21:30.179 indexer
00:21:31.440 um so essentially we rely on what's
00:21:33.360 commonly called expect tests or
00:21:35.520 snapshot tests or golden tests right we
00:21:37.799 serialize the index into uh human
00:21:39.900 readable format uh and like annotate the
00:21:42.480 source code with like comments like this
00:21:43.980 which like show The Source ranges for
00:21:46.080 different definitions and references
00:21:47.640 right this way it becomes very easy to
00:21:49.799 identify if a patch like oh did it add
00:21:51.900 the right definition that I was
00:21:53.220 expecting it to which was getting missed
00:21:54.840 earlier
00:21:55.919 um did it actually start skipping some
00:21:57.900 definitions and like we need to fix that
00:21:59.520 right um this becomes very easy right
00:22:02.700 um and so this is kind of one of the
00:22:03.780 nice benefits of the layering that I
00:22:05.520 showed you from before right we're only
00:22:07.080 doing analysis right like we're not
00:22:08.640 concerned about what the browser is
00:22:10.020 doing right we're not concerned about
00:22:12.539 like okay should we be testing along
00:22:15.360 with the browser because like browser
00:22:16.620 tests can sometimes get flaky right
00:22:18.419 testing is now very simple it's fast and
00:22:21.000 like you can understand this right so
00:22:22.679 it's kind of very predictable as well
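A snapshot test of this kind can be sketched as a function that renders the source with a comment under each occurrence, then compares the result against a stored expected string. The annotation format below is made up for the example and is not scip-ruby's actual snapshot format; `Occ` and `render_snapshot` are hypothetical names.

```ruby
# Occurrence: a 0-based column on a 1-based line, plus a role and symbol.
Occ = Struct.new(:line, :col, :length, :role, :symbol)

# Render source with one comment line per occurrence, using carets to
# mark the source range. Assumes col >= 1 so the caret fits after "#".
def render_snapshot(source, occurrences)
  out = []
  source.lines(chomp: true).each_with_index do |text, idx|
    out << text
    occurrences.select { |o| o.line == idx + 1 }.sort_by(&:col).each do |o|
      out << "#" + " " * [o.col - 1, 0].max + "^" * o.length + " #{o.role} #{o.symbol}"
    end
  end
  out.join("\n") + "\n"
end

source = <<~RUBY
  def fizz(i)
    i % 3
  end
RUBY

occurrences = [
  Occ.new(1, 4, 4, "definition", "fizz"),
  Occ.new(2, 2, 1, "reference", "i"),
  Occ.new(2, 4, 1, "reference", "%"),
]

expected = <<~SNAPSHOT
  def fizz(i)
  #   ^^^^ definition fizz
    i % 3
  # ^ reference i
  #   ^ reference %
  end
SNAPSHOT

puts render_snapshot(source, occurrences) == expected
```

Because the input is a file and the output is text, a regression shows up as an ordinary diff against the stored snapshot, with no browser or server in the loop.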
00:22:24.900 and so so that's um yeah that's kind of
00:22:28.860 um most of what I had to say right um so
00:22:30.720 scip-ruby is open source and it's
00:22:32.280 available on GitHub right if you're an
00:22:34.380 open source maintainer
00:22:36.179 um an existing Sourcegraph customer or
00:22:38.340 like um interested potentially in
00:22:40.080 using Sourcegraph then you can find
00:22:41.880 instructions in the scip-ruby README
00:22:43.980 which describe how to try it out I've tried
00:22:45.900 to make configuration as simple as
00:22:47.400 possible like if you're using Bundler or
00:22:48.900 like some standard configuration it
00:22:50.580 should mostly just work like with a
00:22:52.620 couple of minutes of setup and you
00:22:54.659 should be able to upload an index to
00:22:56.640 sourcegraph.com for open source code uh
00:22:59.580 and just like start navigating your code
00:23:01.260 there's a couple of example repos as
00:23:03.419 well on sourcegraph.com which is this uh
00:23:06.059 Shopify one and uh Homebrew so the
00:23:08.520 Shopify one as an example is like it's
00:23:10.440 100% Sorbet in some ways
00:23:12.960 it's like kind of best case like what
00:23:14.400 you can get right I'm sure there's still
00:23:16.020 bugs there but um that's kind of like
00:23:18.059 one example the other is uh Homebrew
00:23:20.760 slash Brew so I'm sure many of you know
00:23:22.980 and love uh Homebrew and so The Homebrew
00:23:25.799 code base is kind of uh I think it's
00:23:27.480 like maybe 40% Sorbet or like 30% adoption
00:23:30.900 of Sorbet but you'll be able to navigate
00:23:33.059 code even within typed: false uh files which
00:23:35.940 aren't supposed to be type checked by
00:23:37.919 sorbet um because of that hack I
00:23:39.720 mentioned before and so that's another
00:23:41.280 example of like what kind of
00:23:43.380 um navigation you can get even when you
00:23:45.480 only partially uh adopted Sorbet right
00:23:48.500 to provide feedback there's a couple of
00:23:50.940 different ways to do it um the QR code
00:23:52.679 there is for our Discord uh the Sourcegraph
00:23:55.440 Discord and there's a scip-ruby
00:23:56.940 Channel you can also find the Discord
00:23:58.440 link on our community website
00:24:01.260 um and the other way you can provide
00:24:02.520 feedback to me is uh yeah GitHub issues
00:24:04.679 so those are totally fine too also like
00:24:06.840 if you just have a question right like
00:24:08.280 you don't even like have an issue per se
00:24:10.500 like it's totally fine to open up a
00:24:12.120 GitHub issue
00:24:13.620 um and yeah thank you uh if you have
00:24:15.840 questions for me right now I think we
00:24:17.520 have a bit of time for questions or yeah
00:24:19.559 I can take them afterwards as well thank
00:24:21.299 you