RubyConf 2022

Analyzing an analyzer - A dive into how RuboCop works

To help us with aspects like linting, security or style, many of us have RuboCop analyzing our code. It's a very useful tool that is widely used and easy to set up and configure. RuboCop can even auto-correct your source code as needed. How is this even possible? It turns out that Ruby is really good at taking Ruby code as input and doing various things based on that input. In this talk, I will go through some of the internals of RuboCop to show how it analyzes and makes changes to your source code.

00:00:00.000 ready for takeoff
00:00:16.920 Hello everybody, and welcome to my talk
00:00:20.039 today: Analyzing an Analyzer, a dive into
00:00:23.039 how RuboCop works. RuboCop is quite
00:00:26.640 complex and I don't think we can do an
00:00:28.980 exhaustive study of it, so this will be
00:00:30.960 more of a regular dive, not a deep dive.
00:00:33.840 My name is Kyle d'Oliveira. I'm based out
00:00:37.680 of Vancouver, Canada.
00:00:39.600 I've been working with Ruby and Rails
00:00:43.800 for well over a decade now. I'm in love
00:00:46.200 with the language, I love the community, I
00:00:48.719 love spaces like this where we can
00:00:50.820 interact with each other and get a sense
00:00:52.860 of what the Ruby community feels like.
00:00:56.039 I'm really drawn to tools that can
00:00:58.800 benefit the entire community, and RuboCop
00:01:01.440 is no exception. My hope is
00:01:04.860 that after listening to this talk
00:01:07.020 you'll understand some of the basics
00:01:08.880 of how RuboCop can analyze and
00:01:12.540 correct code, and maybe some of you will
00:01:15.000 feel inspired to contribute
00:01:17.280 custom rules for yourself, your
00:01:19.200 organization, or even to the open source
00:01:21.540 repository, or even start playing around
00:01:23.820 with new tools that use similar
00:01:26.340 ideas or concepts.
00:01:28.920 I have been working at Aha! for the past
00:01:31.619 two years, and it is one of the best
00:01:33.780 workplaces that I've been a part of. We
00:01:36.479 are a human-centric company that's
00:01:38.220 helping other companies build the
00:01:39.659 products that matter to them with our
00:01:41.700 suite of products, and we have an amazing
00:01:44.400 team that's
00:01:46.860 distributed by design all over the world.
00:01:48.439 We have one of the best company cultures
00:01:51.119 that I've seen, and it's powered by The
00:01:53.939 Responsive Method, which gives us a
00:01:56.640 framework of shared values that we all
00:01:59.280 agree upon and embody. It really
00:02:01.619 helps empower us so that we can move
00:02:03.540 quickly and stay aligned. So if you'd
00:02:06.240 like to be part of that culture, of
00:02:07.799 course, we're hiring.
00:02:10.879 Linters are static code analysis tools
00:02:15.420 that can be used to flag
00:02:17.819 programming errors, suspicious constructs and
00:02:20.160 stylistic errors, but they can be used to
00:02:23.280 do more too. They can be used to raise alerts
00:02:25.200 about security, and they can be used as a
00:02:28.140 tool for training other engineers.
00:02:30.239 RuboCop is one of the most popular
00:02:32.400 linters for Ruby, and I'm just a little
00:02:35.640 bit curious, just from a quick show of
00:02:37.500 hands: has anyone here not
00:02:39.540 worked with RuboCop before?
00:02:43.319 I don't think I see a single hand, and
00:02:45.360 that's about what I expected.
00:02:48.120 Rails is one of those gems that is
00:02:50.760 very closely tied to Ruby,
00:02:53.280 so much so that there's often the
00:02:54.840 assumption that if you say you're working
00:02:55.860 with Ruby, you must be
00:02:57.840 working with Rails.
00:02:59.700 To put things in a little bit of
00:03:01.200 perspective, Rails has been downloaded
00:03:03.720 about 387 million times from
00:03:06.840 RubyGems and is about the 40th most popular
00:03:09.000 gem.
00:03:10.379 RuboCop has been downloaded about 270
00:03:13.319 million times and is about the 76th. While
00:03:15.360 not as popular as Rails, RuboCop is up
00:03:18.239 there: when you talk about Ruby,
00:03:20.159 you're also thinking about RuboCop.
00:03:23.220 The original creator of RuboCop
00:03:25.260 gave another talk about this in 2018 at
00:03:28.440 RubyKaigi, so after this talk, if you're
00:03:31.379 really curious about learning more about
00:03:32.879 RuboCop, this is a great resource.
00:03:35.879 You can learn more about RuboCop from
00:03:38.159 him.
00:03:39.900 I wanted to begin today's talk with
00:03:42.120 a little personal story.
00:03:43.739 I first got into using RuboCop several
00:03:46.860 years ago. It was at a point where we
00:03:48.959 were trying to figure out how to have an
00:03:50.400 agreed-upon style for all the code that
00:03:53.159 we were writing, but we would end up with
00:03:55.440 pull requests that were just full of
00:03:56.879 nitpick comments that wouldn't
00:03:59.220 really be about the content of
00:04:01.319 the pull request, but instead about
00:04:03.239 how the code looked.
00:04:05.340 Sometimes these would be fine, but
00:04:08.040 other times we would get into big and
00:04:09.659 often pointless arguments over what
00:04:11.760 seemed like the most trivial
00:04:14.040 things.
00:04:15.239 We would get into big debates over whether
00:04:17.280 we should use single quotes or
00:04:18.959 double quotes,
00:04:20.160 what the maximum line width should be,
00:04:22.740 whether or not we need to have an extra
00:04:24.120 line at the end of a guard clause.
00:04:26.699 We thought that if RuboCop could
00:04:28.979 handle our linting and styling for us, we
00:04:31.139 could focus all of the comments on pull
00:04:33.000 requests on the actual content of the
00:04:35.280 pull request, and this helped a lot,
00:04:38.460 except that it was also incredibly
00:04:40.139 frustrating to work with. We would write
00:04:42.180 code, push it up to CI, and CI would
00:04:44.460 reject it because RuboCop would have
00:04:46.560 flagged violations, and we would go back,
00:04:48.600 fix them and do it again.
00:04:51.120 I had a project in which I needed to
00:04:54.000 namespace a huge number of constant
00:04:55.800 references,
00:04:57.000 so all of the lines became longer and
00:04:59.340 the maximum line length rule became the
00:05:01.199 bane of my existence. I'd have to hunt
00:05:03.479 down from CI every line that was over
00:05:05.820 the length and figure out how to break
00:05:07.560 it up into pieces,
00:05:09.300 and I hated it.
00:05:10.979 Many engineers had a similar
00:05:12.840 experience, and the message of not liking
00:05:15.060 how RuboCop was rolled out started
00:05:17.580 getting mixed up with the message of "we
00:05:19.740 hate RuboCop."
00:05:21.600 But then we started really leaning into
00:05:23.280 RuboCop and learning how we could
00:05:25.620 use autocorrect, and a lot of the
00:05:27.419 frustration with RuboCop started to
00:05:29.580 disappear. We could have RuboCop fix our
00:05:32.340 code for us: if it detected something
00:05:35.039 that was a violation, we could have it
00:05:38.039 handle it automatically, and most of the
00:05:40.139 time those changes were perfect.
00:05:42.780 So we kept leaning into this as well.
00:05:44.699 We started writing our own custom cops
00:05:46.860 to help us migrate bad patterns to
00:05:49.259 good ones. We wrote some to keep
00:05:52.320 deprecations under control as we were
00:05:54.419 upgrading Rails, all the while
00:05:56.759 using error messages from RuboCop as a
00:05:59.220 way to explain concepts, link to
00:06:01.139 internal documentation, and give
00:06:04.520 reasons why certain decisions were being
00:06:06.900 made.
00:06:07.979 I gave a talk at RailsConf 2020
00:06:12.240 illustrating how RuboCop can be used to
00:06:14.699 communicate information about bad
00:06:16.139 patterns in code, and that is another
00:06:17.699 resource that you can look up later if
00:06:20.160 you are curious about learning more.
00:06:22.919 This year RuboCop turns 10, and in
00:06:26.940 the open source world this is a pretty
00:06:28.919 big deal. Development of RuboCop
00:06:32.100 has been pretty consistent
00:06:33.479 throughout the years, and given where
00:06:35.639 we're at, I don't see RuboCop going away
00:06:38.000 anytime soon.
00:06:40.440 But it's large enough that understanding
00:06:43.020 how RuboCop works can be really tricky.
00:06:46.199 There are thousands of commits, thousands
00:06:49.139 of closed issues, hundreds of
00:06:51.660 contributors and releases.
00:06:54.660 This is not going to be a code review of
00:06:57.660 RuboCop. I don't think I could cover
00:07:00.000 it and do it justice, and it would
00:07:01.560 probably be very overwhelming.
00:07:03.479 With the history of
00:07:05.759 RuboCop, it's a bit of a marathon, and
00:07:07.740 with 30 minutes here, this is a bit of
00:07:09.539 a sprint.
00:07:10.500 So instead, this talk is going to be a
00:07:12.960 dive into how the basics work. I'll
00:07:16.139 outline the basics and help
00:07:18.539 illustrate how some of the processes
00:07:20.340 function inside of RuboCop,
00:07:22.919 with the aim of staying as close to how
00:07:25.560 RuboCop works as possible, but I might
00:07:27.479 make some simplifications
00:07:29.280 for the purposes of the talk.
00:07:32.460 So let's dig in: how does RuboCop
00:07:36.180 work? I thought a good way to dive
00:07:38.819 into the details of how RuboCop works is
00:07:41.280 to start by looking at the command line
00:07:42.900 interface.
00:07:44.099 How does this work for a
00:07:46.680 single file and a single cop?
00:07:49.620 What happens when you run the rubocop
00:07:52.560 command?
00:07:53.759 First, it will run the executable
00:07:56.699 file, which loads up the Ruby library and
00:08:00.240 begins processing.
00:08:01.500 Then it loads some configuration, and
00:08:04.080 these two steps are the easy part. I'll
00:08:06.539 touch on them a little bit for completeness,
00:08:08.099 but there shouldn't be too many
00:08:09.660 surprises there. Then we'll get into
00:08:11.580 the meat of RuboCop: at some point
00:08:15.479 RuboCop will need to process a file,
00:08:18.660 that is, take a file that exists
00:08:20.580 and do something to it so that it can
00:08:24.000 make decisions about that file and the
00:08:26.460 code inside of it.
00:08:29.340 Once it's processed this code, it will
00:08:31.740 need to run it through a series of cops
00:08:34.260 that will decide whether or
00:08:36.180 not there are any offenses, and if there
00:08:39.539 are, it can rewrite
00:08:42.599 that source code and change the file.
00:08:45.660 It will loop back once it's finished
00:08:47.940 changing the file and potentially
00:08:50.459 start again.
00:08:51.540 This loop is important, as multiple
00:08:53.820 different cops can adjust the same line
00:08:56.100 of code, so you may need to process
00:08:58.620 the code a couple of times. Also be wary of
00:09:00.839 getting into infinite loops.
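The process-and-correct loop described above can be sketched in plain Ruby. This is a simplified illustration, not RuboCop's implementation: each hypothetical "cop" here is just a lambda that takes source text and returns corrected text (real cops work on the AST), and `MAX_PASSES` is an arbitrary cap chosen for the example.

```ruby
# Illustrative cap on correction passes (an assumption, not RuboCop's real setting)
MAX_PASSES = 10

# Re-run every cop until a full pass leaves the source unchanged.
# If a pass reproduces a source we've already seen, the cops are undoing
# each other and would loop forever, so we stop with an error instead.
def autocorrect(source, cops)
  seen = [source]
  MAX_PASSES.times do
    corrected = cops.reduce(source) { |current, cop| cop.call(current) }
    return corrected if corrected == source

    raise "infinite correction loop detected" if seen.include?(corrected)

    seen << corrected
    source = corrected
  end
  raise "no stable result after #{MAX_PASSES} passes"
end

# Hypothetical cop: collapses each pair of spaces once per pass,
# so a long run of spaces needs several passes to settle.
SQUEEZE_SPACES = ->(src) { src.gsub("  ", " ") }

# Hypothetical misbehaving cop: flips the quote style every time it runs.
TOGGLE_QUOTES = ->(src) { src.include?("'") ? src.tr("'", '"') : src.tr('"', "'") }

puts autocorrect("a    b", [SQUEEZE_SPACES]) # prints: a b (after three passes)
```

Calling `autocorrect("'a'", [TOGGLE_QUOTES])` shows why the guard matters: that cop keeps undoing its own change, so the loop raises as soon as it produces a source it has already seen.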
00:09:03.600 So we'll break this process down into
00:09:06.060 these steps.
00:09:07.920 At the start of it all is the
00:09:09.720 command line interface. This is
00:09:11.880 pretty straightforward, and I'm going to
00:09:13.440 go pretty quickly here, as this isn't a
00:09:15.420 talk about command line interfaces; this
00:09:16.980 is a talk about RuboCop.
00:09:18.899 Effectively, there's just an executable
00:09:21.839 file called rubocop. It's configured to
00:09:24.120 use Ruby, it loads the appropriate
00:09:26.339 libraries into the load path, requires
00:09:28.440 RuboCop, and then does some kind of
00:09:31.260 processing.
00:09:32.399 This is where we go from the command
00:09:35.940 line into the world of Ruby, and this is
00:09:38.040 RuboCop's entry point to doing
00:09:39.779 whatever it wants to do.
00:09:42.000 What it wants to do
00:09:44.160 next is load some configuration:
00:09:45.839 determine which cops are active and what
00:09:49.140 options we need to provide to them. In
00:09:51.839 general this is all done through YAML
00:09:54.000 files. I won't go through the options
00:09:56.279 here; they are pretty well
00:09:58.320 documented online, including which options are
00:10:00.240 available.
00:10:02.220 Effectively, there is a big YAML file
00:10:04.500 where you can give it some options that
00:10:06.420 are provided to all cops. For instance,
00:10:08.399 you could say which Ruby version
00:10:09.779 you're targeting, here are some file
00:10:11.880 patterns to include, here are some patterns
00:10:14.220 to exclude. You can also provide specific
00:10:17.100 configuration for specific cops: you can
00:10:20.100 nest the configuration under the
00:10:21.839 cop's name,
00:10:23.399 and then that configuration is
00:10:25.500 available inside of the cop itself.
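Because the configuration ends up as a plain Ruby hash, the idea can be shown with Ruby's standard `yaml` library alone. The option names below (`AllCops`, `TargetRubyVersion`, `Include`, `Exclude`, `Enabled`, `Max`) are ones RuboCop documents, but the values are made up, and the `cop_config` helper is a hypothetical illustration of looking options up by cop name, not RuboCop's own API.

```ruby
require "yaml"

# A small .rubocop.yml-style document with made-up values.
CONFIG_YAML = <<~YML
  AllCops:
    TargetRubyVersion: 3.1
    Include:
      - "app/**/*.rb"
    Exclude:
      - "db/schema.rb"
  Layout/LineLength:
    Max: 120
  Style/ArrayJoin:
    Enabled: true
YML

# Once parsed, the whole configuration is just a nested Hash keyed by cop name.
CONFIG = YAML.safe_load(CONFIG_YAML)

# Hypothetical helper: the options a single cop would see.
def cop_config(config, cop_name)
  config.fetch(cop_name, {})
end

p CONFIG.dig("AllCops", "TargetRubyVersion")     # prints 3.1
p cop_config(CONFIG, "Layout/LineLength")["Max"] # prints 120
```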
00:10:29.399 That's pretty quick:
00:10:32.040 at the end of the day, it's all just
00:10:33.420 loaded from YAML, and then it is a
00:10:37.200 hash inside the Ruby process. There's not
00:10:39.000 much fancy going on there. But this
00:10:41.760 is where we get to the more interesting
00:10:43.140 part of the whole process: processing the
00:10:45.300 code.
00:10:46.200 This is a bit of a meta topic,
00:10:48.300 because we need to talk about code that
00:10:50.519 understands code.
00:10:54.600 To illustrate how this really works,
00:10:57.300 I'm going to be leaning on this
00:10:59.459 specific example. RuboCop has a style cop
00:11:03.420 called ArrayJoin whose purpose is to
00:11:07.260 check whether the star (*)
00:11:09.660 method is being used to join the values
00:11:12.720 of an array.
00:11:14.220 Imagine we had some code and we want
00:11:16.620 to determine if any of these bad
00:11:18.779 patterns exist in the code, and if they
00:11:21.120 do, we want to flag the code as
00:11:23.279 violating this rule.
00:11:25.800 So imagine we've got an
00:11:28.079 array of
00:11:30.600 strings, foo, bar and baz, that
00:11:33.899 are being joined with the star method
00:11:36.180 and a comma. If we wanted to
00:11:38.640 ask "does this code violate the rule?", as a
00:11:43.260 human we can say yes, obviously, but how
00:11:45.540 do we write code that determines this?
00:11:48.180 One option
00:11:50.100 is that we could reach for our fancy
00:11:53.220 tool of regular expressions,
00:11:55.320 and we can start with something that
00:11:56.880 already looks pretty complicated.
00:11:58.920 It looks for an opening bracket, a run of
00:12:01.920 characters that aren't a closing bracket, a closing bracket, space,
00:12:05.700 star, space. It's pretty complicated, but
00:12:09.240 it would match this code. So in
00:12:12.000 this specific example we could use a
00:12:13.740 regular expression, but in the world of
00:12:15.240 Ruby things can change. What happens if
00:12:17.160 we use single quotes?
00:12:19.380 Well, we could update our regular
00:12:21.720 expression to handle this,
00:12:23.820 figuring out which quote you have, with a
00:12:25.800 back reference for it. Again, we can write
00:12:27.779 things in different ways: if there are no
00:12:29.220 spaces, what do we do?
00:12:31.800 We can continue adjusting the regular
00:12:34.200 expression,
00:12:35.459 but at some point we could write this in
00:12:37.920 a completely different way. We can use
00:12:39.480 %w notation to write a series of
00:12:42.540 whitespace-delimited strings. At
00:12:46.500 this point I don't really know how we
00:12:48.360 would write a regular expression to
00:12:51.360 match this kind of code, and it gets more
00:12:54.300 complex: you don't even need parentheses,
00:12:55.560 you can use almost any character to
00:12:57.600 delimit this array, so this just gets
00:13:00.120 more confusing.
00:13:03.000 We can continue down this path: what
00:13:04.860 if this code was in a comment? What
00:13:07.260 if this code was in a nested array?
00:13:09.480 What if there are dynamic strings in here?
00:13:11.120 Using a tool like regular
00:13:13.620 expressions just wouldn't be sufficient
00:13:15.060 to make decisions about code like this. I
00:13:17.700 have to include the obligatory XKCD
00:13:19.800 comic whenever talking about regular
00:13:21.300 expressions, because often you end up
00:13:23.100 with more problems.
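To make the regular-expression problem concrete, here is a sketch with a hypothetical pattern in the spirit of the one described: an opening bracket, anything except a closing bracket, a closing bracket, ` * `, then a double-quoted string. Only the first sample and the commented-out one match, so the pattern both misses equivalent code and flags code that never runs.

```ruby
# Hypothetical naive pattern for `[...] * "..."`
NAIVE_ARRAY_JOIN = /\[[^\]]*\] \* "[^"]*"/

# Equivalent (or look-alike) spellings of the same array join
SAMPLES = {
  double_quotes: '["foo", "bar", "baz"] * ","',
  single_quotes: "['foo', 'bar', 'baz'] * ','",
  no_spaces:     '["foo","bar","baz"]*","',
  percent_w:     '%w(foo bar baz) * ","',
  in_a_comment:  '# ["foo", "bar"] * ","',
}.freeze

SAMPLES.each do |name, code|
  puts "#{name}: #{NAIVE_ARRAY_JOIN.match?(code)}"
end
```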
00:13:24.300 There needs to be a different way,
00:13:27.180 and what we need to be able to do is
00:13:29.459 take Ruby code and break it into
00:13:32.339 the abstract syntax tree that it
00:13:35.040 represents.
00:13:36.600 This abstract syntax tree, or AST,
00:13:39.320 is a representation of the structure of the code.
00:13:44.040 For example, if we have a begin/rescue
00:13:46.500 block like the one in the corner, this is
00:13:48.600 what the corresponding abstract syntax
00:13:50.760 tree would look like. It is a lot, so
00:13:53.700 I will break this down.
00:13:55.980 There is a gem that can handle
00:13:58.260 converting Ruby code into an AST, called
00:14:00.540 parser. It isn't part of Ruby itself, but it comes along with RuboCop as a dependency, so you don't need
00:14:03.420 to install anything extra; you just need to require it, and you
00:14:05.459 can get a representation of what any
00:14:07.560 Ruby code looks like as an AST.
00:14:10.740 So let's come back to our array example.
00:14:14.100 What does the abstract syntax tree here
00:14:16.620 look like?
00:14:18.180 Well, first, this entire thing is a method
00:14:21.300 call,
00:14:22.200 and this is represented by the send node.
00:14:25.740 The send node has many children.
00:14:29.160 The first child is what this
00:14:32.399 method is being called on, so in this case
00:14:34.680 it's being called on an array, and that
00:14:37.079 array contains three elements: a string
00:14:39.240 foo, a string bar and a string baz.
00:14:42.899 The second child of the send node is
00:14:45.600 the name of the method that is being
00:14:47.220 called, so in this case the method name
00:14:49.380 is star,
00:14:51.060 and the rest of the children
00:14:53.760 of the send node are the arguments that
00:14:55.440 are provided to the method. In this
00:14:57.540 case there is only one argument, and
00:14:59.100 that's just a string that is a comma.
00:15:02.220 Now if we come back to what the code
00:15:04.620 looked like, and we wanted to check what
00:15:06.480 the abstract syntax tree looks like
00:15:09.180 with parser, we can run this code through
00:15:11.459 parser, and when we format it, it would
00:15:13.800 look something like this.
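The slide isn't reproduced in the transcript. The parser gem prints trees in an s-expression style, and the sketch below builds the same shape by hand so it runs without any gems. The `Node` struct is a hypothetical stand-in for `Parser::AST::Node`, and it prints on one line where parser indents across several; only the layout of the tree (type first, then children) reflects parser.

```ruby
# Hypothetical stand-in for Parser::AST::Node: a node type plus ordered children.
# Children are either other nodes or plain Ruby values (symbols, strings).
Node = Struct.new(:type, :children) do
  # Render as a one-line s-expression, e.g. (str ",")
  def to_sexp
    parts = children.map { |child| child.is_a?(Node) ? child.to_sexp : child.inspect }
    "(#{[type, *parts].join(' ')})"
  end
end

# Shorthand constructor, similar to the s(...) notation parser uses when inspecting nodes
def s(type, *children)
  Node.new(type, children)
end

# ["foo", "bar", "baz"] * ","
AST = s(:send,
        s(:array, s(:str, "foo"), s(:str, "bar"), s(:str, "baz")),
        :*,
        s(:str, ","))

puts AST.to_sexp
# (send (array (str "foo") (str "bar") (str "baz")) :* (str ","))

# Layout of a send node: receiver, method name, then the arguments
receiver, method_name, *arguments = AST.children
p method_name # prints :*
```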
00:15:16.500 What's great about this abstract
00:15:18.300 syntax tree
00:15:20.040 representation is that if we start writing
00:15:21.839 this code in a different way, if we use
00:15:23.459 single quotes instead of double quotes,
00:15:25.560 the abstract syntax tree doesn't change.
00:15:27.779 If we write it without spaces,
00:15:30.240 the abstract syntax tree doesn't change,
00:15:32.760 and this is true for the %w
00:15:34.860 notation as well. So this is what RuboCop
00:15:37.800 uses under the hood to be able to make
00:15:40.800 decisions about code.
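RuboCop relies on the parser gem for this, but the same property can be seen with CRuby's built-in `RubyVM::AbstractSyntaxTree`, swapped in here only because it needs no extra gems. Its node names differ from parser's and it only exists on CRuby, so treat this as an illustration of the idea: after stripping location information, the four spellings should reduce to a single tree shape.

```ruby
# CRuby-only: RubyVM::AbstractSyntaxTree is CRuby's own parser output, which
# RuboCop does not use; it stands in for the parser gem here.

# Strip a node down to nested arrays of [type, *children], dropping line/column info
def shape(node)
  return node unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  [node.type, *node.children.map { |child| shape(child) }]
end

SPELLINGS = [
  '["foo", "bar", "baz"] * ","', # double quotes
  "['foo', 'bar', 'baz'] * ','", # single quotes
  '["foo","bar","baz"]*","',     # no spaces
  '%w(foo bar baz) * ","',       # %w notation
].freeze

SHAPES = SPELLINGS.map { |code| shape(RubyVM::AbstractSyntaxTree.parse(code)) }

# Number of distinct tree shapes across the four spellings
puts SHAPES.uniq.size
```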
00:15:43.380 RuboCop has a utility gem available called
00:15:46.320 rubocop-ast
00:15:48.240 that extends the functionality of the
00:15:50.579 parser gem to make things a little bit
00:15:52.860 simpler and easier to read.
00:15:56.040 It has a ProcessedSource class
00:15:58.860 that you could use if you wanted to,
00:16:00.420 where you provide some code
00:16:02.519 in string form, so this could be read
00:16:04.199 directly from a file,
00:16:05.940 and the version of Ruby you are
00:16:08.220 interested in, and then you can look at the
00:16:09.720 abstract syntax tree and start to ask
00:16:11.820 questions about it.
00:16:13.560 So if we looked at this code
00:16:15.480 and we wanted to know if it violated
00:16:17.339 that ArrayJoin rule, we could start
00:16:19.500 asking some questions. We could ask, is
00:16:21.540 this a send type? And we could say yes,
00:16:24.180 it is.
00:16:25.560 We could look at the receiver of this
00:16:27.720 method and ask, is it an array type?
00:16:30.180 It is. We could check to see if the
00:16:33.180 method name is star,
00:16:35.760 and we could also look at the arguments:
00:16:37.500 is there one argument, and is that argument a
00:16:40.440 string?
00:16:41.759 So this gives us a way to start asking
00:16:44.940 meaningful questions of code in a very
00:16:48.600 repeatable manner.
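The four questions can be sketched as one predicate. With rubocop-ast you would get a real node from something like `RuboCop::AST::ProcessedSource.new(code, 3.1).ast`; to keep this runnable without gems, the hypothetical `Node` class below only imitates a few query methods whose names match rubocop-ast's (`send_type?`, `array_type?`, `str_type?`, `receiver`, `method_name`, `arguments`).

```ruby
# Hypothetical stand-in for RuboCop::AST::Node, for illustration only.
class Node
  attr_reader :type, :children

  def initialize(type, *children)
    @type = type
    @children = children
  end

  def send_type?
    type == :send
  end

  def array_type?
    type == :array
  end

  def str_type?
    type == :str
  end

  # Layout of a send node: (send receiver method_name *arguments)
  def receiver
    children[0]
  end

  def method_name
    children[1]
  end

  def arguments
    children.drop(2)
  end
end

# The questions from the talk, asked one after another
def array_join_candidate?(node)
  node.send_type? &&
    node.receiver.is_a?(Node) && node.receiver.array_type? &&
    node.method_name == :* &&
    node.arguments.size == 1 &&
    node.arguments.first.str_type?
end

# ["foo", "bar"] * ","
JOINED = Node.new(:send,
                  Node.new(:array, Node.new(:str, "foo"), Node.new(:str, "bar")),
                  :*,
                  Node.new(:str, ","))
# [1] * 3 (Array#* with an Integer repeats the array instead of joining)
REPEATED = Node.new(:send, Node.new(:array, Node.new(:int, 1)), :*, Node.new(:int, 3))
# Hypothetical `def example; ["foo", "bar"] * ","; end`
WRAPPED = Node.new(:def, :example, Node.new(:args), JOINED)

p array_join_candidate?(JOINED)   # prints true
p array_join_candidate?(REPEATED) # prints false
p array_join_candidate?(WRAPPED)  # prints false
```

The last call returns false because the top-level node is now a method definition, which is exactly the problem the next part of the talk addresses.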
00:16:51.060 Okay, so now that we have a little bit of
00:16:53.519 understanding of how RuboCop is
00:16:55.139 processing the code,
00:16:56.579 how does it actually start applying the
00:16:59.399 specific cops?
00:17:02.579 To understand that, we need to make our
00:17:04.799 example just a little bit more complex.
00:17:07.559 What if our array-joining code is
00:17:10.199 wrapped in a method?
00:17:11.819 This seems to throw a bit of a wrench
00:17:13.799 into things. We can no longer ask if this
00:17:16.439 is a send type; it's not, this is
00:17:18.600 a method definition.
00:17:20.160 We can't ask if the receiver is an array
00:17:23.459 type, because there is no receiver for
00:17:25.559 this method definition.
00:17:27.600 The method name here is not star.
00:17:30.600 We do have one argument, though, so
00:17:32.940 we do have that going for us.
00:17:35.460 What we need to do is break
00:17:37.500 this code down into its abstract
00:17:39.960 syntax tree
00:17:41.760 representation, and when we do, it looks
00:17:43.860 something like this. You can see the
00:17:45.720 original send node that we wanted to
00:17:47.340 focus on is there, but it's wrapped in
00:17:49.620 another node, this def node.
00:17:52.740 So in order to navigate and find these
00:17:55.620 various nodes, we need to walk through
00:17:57.840 the abstract syntax tree.
00:18:00.240 What we'll go through next is a very
00:18:02.280 simplified version of how to walk an
00:18:04.799 abstract syntax tree.
00:18:07.020 To do so, we could start with a method
00:18:08.820 like walk,
00:18:10.380 and the goal is: can we visit
00:18:12.240 every single node inside of this
00:18:14.820 abstract syntax tree?
00:18:16.980 To do so, we'll start by passing it the
00:18:18.960 entire abstract syntax tree,
00:18:21.240 and we'll need some sort of convention
00:18:22.860 for naming the
00:18:25.080 methods for our various nodes. For
00:18:27.720 this, we'll create a method for each node
00:18:29.640 type that's prefixed by on_,
00:18:32.280 so here we'll need a method called
00:18:35.100 on_def, because def is the top-level node of
00:18:38.760 this abstract syntax tree.
00:18:41.820 From here, what does on_def do? Well, in
00:18:45.480 order to understand this method, we need
00:18:47.820 to know what the def node in
00:18:50.100 the abstract syntax tree looks like.
00:18:53.160 It has many children. Its first child is
00:18:56.100 the name of the method that's being
00:18:57.840 defined. This is a symbol, so we don't
00:19:00.660 need to do anything further here.
00:19:02.940 The second child of the def node is
00:19:05.280 the arguments that are passed in,
00:19:07.380 so to deal with this, following our convention, we'll
00:19:09.539 create an on_args method. We'll hand it
00:19:11.580 the arguments, and it will continue to
00:19:13.559 recursively visit these nodes.
00:19:16.320 The last child is the body of the method.
00:19:19.679 It could be a few different things, but
00:19:21.780 to keep it simple we'll focus on what it
00:19:23.340 is here, so we'll need to write an on_send
00:19:26.039 method.
00:19:28.380 Now we can start looking at those
00:19:31.200 methods. What does on_args do?
00:19:34.740 The args node has many children, and each child is
00:19:38.700 an arg, so we can walk through each of
00:19:41.160 the children of this piece of the
00:19:43.260 AST and call on_arg on each.
00:19:46.559 on_arg itself doesn't need
00:19:49.200 to do anything further; it doesn't have
00:19:51.000 anything nested, so we don't need
00:19:52.679 to do anything more to navigate this part of the
00:19:54.419 tree.
00:19:56.820 Moving back over to the on_send method,
00:19:59.460 we've gone over the send node itself
00:20:01.500 before,
00:20:02.640 and we know that the first child is
00:20:04.679 the receiver, so we can call on_array
00:20:06.720 here.
00:20:08.280 The second is the name of the method
00:20:11.160 that's being called, which in this case
00:20:12.660 is star. It's a symbol, so we don't need to
00:20:14.160 do anything further. The rest of the
00:20:16.320 children are the arguments to the method,
00:20:18.780 so in this case there's only one
00:20:20.220 argument, which is the string, so we'll
00:20:21.840 call the on_str method.
00:20:25.320 Almost done here. The on_array method
00:20:28.500 looks very similar to on_args:
00:20:30.780 we go over all of the children
00:20:33.059 and call a method based on
00:20:35.280 the type of each element inside the array. So in
00:20:38.520 this case we just call on_str on each
00:20:41.520 one of the elements of the array.
00:20:45.539 Lastly, the on_str method doesn't
00:20:48.360 have any further children, so we don't
00:20:50.220 need to do anything.
00:20:52.140 Using all of these methods, we
00:20:55.200 can visit every element of this abstract
00:20:58.020 syntax tree, and based on where we are
00:21:00.960 in the tree, we can start to tie in
00:21:02.640 the behavior we want.
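Here is a runnable version of that hand-written walker, assuming the same `on_<type>` naming convention. The method being parsed, `def example(value)`, is hypothetical, the `Node` struct is a stand-in for parser's node class, and only the six node types in this one tree are handled; a real walker needs a method for every node type.

```ruby
# Hypothetical stand-in for a parser node: a type plus ordered children
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Hypothetical method:
#   def example(value)
#     ["foo", "bar", "baz"] * ","
#   end
TREE = s(:def, :example,
         s(:args, s(:arg, :value)),
         s(:send,
           s(:array, s(:str, "foo"), s(:str, "bar"), s(:str, "baz")),
           :*,
           s(:str, ",")))

class Walker
  attr_reader :visited

  def initialize
    @visited = []
  end

  # Record the node type, then dispatch to the matching on_<type> method
  def walk(node)
    @visited << node.type
    public_send(:"on_#{node.type}", node)
  end

  # (def name args body): the name is a Symbol, so only args and body are walked
  def on_def(node)
    _name, args, body = node.children
    walk(args)
    walk(body) if body
  end

  # (args arg...): every child is an arg node
  def on_args(node)
    node.children.each { |arg| walk(arg) }
  end

  # (arg name): nothing nested below
  def on_arg(_node); end

  # (send receiver method_name argument...)
  def on_send(node)
    receiver, _method_name, *arguments = node.children
    walk(receiver) if receiver
    arguments.each { |argument| walk(argument) }
  end

  # (array element...)
  def on_array(node)
    node.children.each { |element| walk(element) }
  end

  # (str value): a leaf
  def on_str(_node); end
end

walker = Walker.new
walker.walk(TREE)
p walker.visited
# [:def, :args, :arg, :send, :array, :str, :str, :str, :str]
```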
00:21:05.820 This is all still fairly complicated,
00:21:08.160 but rubocop-ast, the gem that I talked
00:21:11.460 about earlier, has abstracted this away
00:21:13.020 into a module called
00:21:15.419 RuboCop::AST::Traversal. Using this, it can traverse
00:21:18.600 any arbitrary abstract syntax tree.
00:21:22.140 It'll visit each node of the tree in a
00:21:25.200 depth-first search,
00:21:27.780 and if any of the cops that are
00:21:30.780 active
00:21:31.799 define methods that are interested in
00:21:34.080 a given piece of that abstract syntax tree,
00:21:36.240 it will hand that node over to the cop
00:21:38.460 to handle with its own logic.
00:21:42.179 What this could look like is a
00:21:44.280 class that includes this module,
00:21:46.260 which would allow it to traverse
00:21:48.659 any arbitrary abstract syntax tree.
00:21:51.659 The object could hold a reference to
00:21:53.760 whichever cops are available and enabled,
00:21:56.820 and then for each type of node it can
00:22:00.179 walk through the available cops,
00:22:01.799 determine if they respond to that method,
00:22:04.440 and if so,
00:22:05.520 give them that piece of the abstract
00:22:07.260 syntax tree.
00:22:08.640 I've only listed a single method here,
00:22:10.679 on_send, but effectively this is what
00:22:13.200 RuboCop does; it does this for every
00:22:15.659 possible type of node.
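A hypothetical, simplified version of that dispatching class might look like this (in RuboCop itself, this role belongs to `RuboCop::Cop::Commissioner`, which includes `RuboCop::AST::Traversal`). One simplification: it descends into any child that happens to be a node, whereas the real traversal knows the child layout of each node type. The two cops are made up for the example.

```ruby
# Hypothetical stand-in for a parser node
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Visits every node depth-first and hands it to each cop that defines
# a matching on_<type> method.
class MiniCommissioner
  def initialize(cops)
    @cops = cops
  end

  def walk(node)
    callback = :"on_#{node.type}"
    @cops.each do |cop|
      cop.public_send(callback, node) if cop.respond_to?(callback)
    end
    # Generic descent: any child that is itself a node is visited next
    node.children.each { |child| walk(child) if child.is_a?(Node) }
  end
end

# Hypothetical cop interested only in method calls
class SendRecorder
  attr_reader :method_names

  def initialize
    @method_names = []
  end

  def on_send(node)
    @method_names << node.children[1]
  end
end

# Hypothetical cop interested only in array literals
class ArrayCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  def on_array(_node)
    @count += 1
  end
end

# [["foo"], "bar"] * "," inside a hypothetical `def example`, with a nested array
TREE = s(:def, :example, s(:args),
         s(:send, s(:array, s(:array, s(:str, "foo")), s(:str, "bar")), :*, s(:str, ",")))

sends = SendRecorder.new
arrays = ArrayCounter.new
MiniCommissioner.new([sends, arrays]).walk(TREE)
p sends.method_names # prints [:*]
p arrays.count       # prints 2
```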
00:22:18.299 If we were to look at what the
00:22:20.640 ArrayJoin cop looks like, at the
00:22:24.360 source code for it,
00:22:26.280 while there's a lot going on
00:22:28.020 here, the main thing is that the cop
00:22:29.940 just defines an on_send method, and this
00:22:32.460 is passed every send node. So this cop, when
00:22:36.240 run through RuboCop, will see every
00:22:38.159 method call that is in your source code,
00:22:40.080 and this allows the cop to ask
00:22:42.120 specific questions of that method call:
00:22:44.760 is the receiver an array, is the method
00:22:47.580 named star? If it matches everything
00:22:49.799 that it needs to match, it can record an
00:22:52.260 appropriate offense.
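A sketch of an ArrayJoin-style cop using the same kind of stand-in `Node`. It is hypothetical: the real Style/ArrayJoin cop is built on RuboCop's cop base class, while this one just stores offenses in an array, and its message text is illustrative. The tree contains two calls to `Array#*`, and only the one with a string argument is flagged.

```ruby
# Hypothetical stand-in for a parser node
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

class ArrayJoinSketch
  MESSAGE = "Favor Array#join over Array#*." # illustrative message text
  Offense = Struct.new(:node, :message)

  attr_reader :offenses

  def initialize
    @offenses = []
  end

  # Handed every send node (method call) found in the file
  def on_send(node)
    receiver, method_name, *arguments = node.children
    return unless receiver.is_a?(Node) && receiver.type == :array
    return unless method_name == :*
    return unless arguments.size == 1 && arguments.first.is_a?(Node) && arguments.first.type == :str

    @offenses << Offense.new(node, MESSAGE)
  end
end

# Minimal depth-first walk that yields every send node, however deeply nested
def each_send(node, &block)
  yield node if node.type == :send
  node.children.each { |child| each_send(child, &block) if child.is_a?(Node) }
end

# Hypothetical method body with two calls to Array#*:
#   def example
#     ["foo", "bar"] * ","
#     [1] * 3
#   end
TREE = s(:def, :example, s(:args),
         s(:begin,
           s(:send, s(:array, s(:str, "foo"), s(:str, "bar")), :*, s(:str, ",")),
           s(:send, s(:array, s(:int, 1)), :*, s(:int, 3))))

cop = ArrayJoinSketch.new
each_send(TREE) { |send_node| cop.on_send(send_node) }
puts cop.offenses.size # prints 1
```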
00:22:55.740 This is how each cop ties into the
00:22:58.020 source code. If we were to look at
00:22:59.820 another cop example just for a quick
00:23:02.280 moment, we can see something
00:23:04.320 similar. For instance, if we look at
00:23:06.120 the MinMax cop,
00:23:08.299 it looks for when the min and the
00:23:11.700 max are returned as the two elements of the
00:23:13.980 same array. This needs to do some
00:23:16.320 sort of logic on each array that
00:23:19.020 it encounters.
00:23:20.520 The source code for this
00:23:23.400 just defines an on_array method, which
00:23:26.460 can then tie into every single array
00:23:28.440 definition and start asking
00:23:30.360 questions about it: all right, does this
00:23:32.580 match the min and the max?
00:23:34.559 This allows each cop to focus on only
00:23:37.440 the pieces of the abstract syntax tree
00:23:39.539 that it's interested in and make very
00:23:41.880 focused decisions.
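A MinMax-style check can be sketched the same way: an `on_array` method that only cares about two-element arrays whose elements are `min` and `max` calls on the same receiver (Style/MinMax suggests `foo.minmax` in place of `[foo.min, foo.max]`). The class and trees below are hypothetical stand-ins, not RuboCop's code.

```ruby
# Hypothetical stand-in for a parser node
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Flags a two-element array literal of the form [x.min, x.max]
class MinMaxSketch
  attr_reader :offenses

  def initialize
    @offenses = []
  end

  def on_array(node)
    return unless node.children.size == 2

    first, second = node.children
    # Both elements must be argument-less method calls: (send receiver name)
    return unless [first, second].all? { |e| e.is_a?(Node) && e.type == :send && e.children.size == 2 }

    first_receiver, first_method = first.children
    second_receiver, second_method = second.children
    return unless first_method == :min && second_method == :max
    # Struct equality compares the whole receiver subtree
    return unless first_receiver && first_receiver == second_receiver

    @offenses << node
  end
end

# [foo.min, foo.max], where foo is a local variable
SAME = s(:array, s(:send, s(:lvar, :foo), :min), s(:send, s(:lvar, :foo), :max))
# [foo.min, bar.max]
DIFFERENT = s(:array, s(:send, s(:lvar, :foo), :min), s(:send, s(:lvar, :bar), :max))

cop = MinMaxSketch.new
[SAME, DIFFERENT].each { |array_node| cop.on_array(array_node) }
puts cop.offenses.size # prints 1
```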
00:23:44.700 There are many, many different types
00:23:47.580 of nodes possible in abstract syntax
00:23:49.919 trees. You could have nodes for local
00:23:53.340 variable assignments, you can have nodes
00:23:56.520 for class definitions or for yield
00:23:58.620 blocks. For some of these nodes, I'm not
00:24:01.020 entirely sure how you would even write
00:24:02.880 Ruby code to produce them,
00:24:06.000 but if something can be represented in the
00:24:08.480 abstract syntax tree, RuboCop can tie
00:24:11.340 into that piece and start making
00:24:12.840 decisions about it.
00:24:15.900 So finally, we get to the point where we
00:24:17.520 can start asking how RuboCop
00:24:19.740 changes the source code
00:24:22.020 and autocorrects it.
00:24:24.059 If we were to take the method that we
00:24:25.919 looked at earlier and fix it,
00:24:27.780 we would want to do something like this,
00:24:29.100 where we remove the star, replace
00:24:30.960 it with join, and wrap the argument in parentheses.
00:24:34.860 Lucky for us, the parser gem that we
00:24:37.500 talked about earlier has a class called
00:24:39.539 TreeRewriter, and as its documentation says, it
00:24:42.240 performs the heavy lifting in
00:24:44.340 the source rewriting process, so you can
00:24:46.559 have multiple rewrites and it will
00:24:48.480 perform them all in the correct order.
00:24:51.059 The general structure would look
00:24:52.500 something like this, where you get
00:24:53.940 the abstract syntax tree for any
00:24:55.799 arbitrary source code
00:24:57.419 and pass it along to the TreeRewriter
00:24:59.460 class. But from here, what can we do?
00:25:02.280 One of the things you can do is call the
00:25:05.100 replace method, which, as the documentation
00:25:06.780 says, replaces the source code
00:25:09.780 in a specific range with new
00:25:11.940 content.
00:25:13.080 We can do other things with the tree
00:25:14.520 rewriter, such as inserting content before
00:25:16.200 or after a range, or removing content, but we'll
00:25:18.059 focus on replace for right now.
00:25:19.919 and two immediate questions should pop
00:25:22.020 up which is what is the range and how do
00:25:25.140 we determine the content
00:25:27.539 looking at the range the range is this
00:25:29.520 special class that comes from the parser
00:25:31.799 gem itself and it outlines a range of
00:25:34.260 characters that represent various Ruby
00:25:36.840 expressions
00:25:38.220 and there's a method available on all of
00:25:40.200 the code that is parsed with parser
00:25:42.000 called location
00:25:43.559 and this gives a lot of really
00:25:45.059 interesting aspects of that piece of the
00:25:47.100 abstract syntax tree but one piece of
00:25:49.380 that is the expression and that provides
00:25:51.840 the range for that entire piece of the
00:25:54.600 abstract syntax tree
00:25:56.460 so if we were to focus on that send node
00:25:59.100 which we determined was the third child
00:26:01.080 of that overall AST we can get to where
00:26:04.020 this source code is located by looking
00:26:06.000 at location.expression and this
00:26:08.520 is how we get the range of the
00:26:10.200 code that needs to change
00:26:13.500 now looking at the content
00:26:15.240 how do we determine what it
00:26:17.340 should be replaced with well we knew
00:26:19.320 that the receiver was an array type but
00:26:21.000 we can ask more questions about this
00:26:22.860 receiver for instance we can ask it
00:26:24.960 explicitly what's the source and this
00:26:27.480 will give us the source code of this
00:26:29.400 array exactly as it was written
00:26:32.640 and we can save this as a string to use
00:26:34.980 later
00:26:35.820 similarly
00:26:37.260 we can look at the arguments and look at
00:26:40.140 the source of the arguments and this
00:26:41.760 lets us get explicitly from the code
00:26:43.580 exactly what was passed in
00:26:47.880 putting this all together we can get
00:26:49.500 something like this where we set up our
00:26:51.120 code we set up our tree rewriter we
00:26:53.640 focus on that send node specifically and
00:26:56.400 we grab the location of that send node
00:26:58.500 using that location.expression we can
00:27:00.539 get the source array we can get the
00:27:03.000 source arguments and we can call this
00:27:05.340 replace method to
00:27:07.260 swap the expression for a join on the array that
00:27:11.100 wraps the arguments
00:27:12.720 and from here we can have the tree
00:27:14.279 rewriter process it and we end up with this
00:27:16.559 rewritten method exactly the way we
00:27:19.020 wanted it
00:27:20.400 and effectively this is what RoboCop
00:27:22.200 does it has its own corrector which
00:27:25.320 is an object built on top of the tree
00:27:26.820 rewriter so you don't need to deal with
00:27:28.140 details like the location
00:27:29.400 expression you can pass in the piece of
00:27:32.460 the abstract syntax tree directly but
00:27:34.860 all of the other methods such as replace
00:27:36.360 are available
00:27:38.159 and this allows each cop to handle its
00:27:41.400 own changes whenever it detects
00:27:42.900 something that violates a certain
00:27:44.580 pattern and when it's done RoboCop will
00:27:47.580 write these autocorrected changes back
00:27:49.559 to the file and if necessary reprocess
00:27:52.020 the new file and start again
00:27:55.140 okay so even after 28 minutes
00:27:58.919 this can be a lot this is a crash course
00:28:01.140 through
00:28:02.460 RuboCop we've gone through some of the
00:28:04.500 different pieces of a simplistic version
00:28:06.419 of RuboCop we've gone through pieces of
00:28:09.179 the command line interface and
00:28:10.980 configuration
00:28:12.240 we've walked through how RoboCop can
00:28:14.640 take that file and convert it to an
00:28:16.799 abstract syntax tree so it can interact
00:28:18.480 with it
00:28:19.440 we talked about how it could then
00:28:21.000 traverse that abstract syntax tree and
00:28:23.820 rewrite it and then write it back to the
00:28:25.799 file
00:28:27.299 I'm hoping that this has helped you
00:28:28.919 understand a little bit about how
00:28:30.600 RoboCop functions and I hope this has
00:28:33.299 gotten you thinking about how you can start
00:28:34.620 using this knowledge as I mentioned
00:28:37.500 earlier there are some other great
00:28:39.419 resources online if you want to learn
00:28:41.460 more but if you are interested in
00:28:44.039 building a tool that needs to analyze
00:28:46.679 source code there are probably little
00:28:47.820 bits like pulling from the abstract syntax
00:28:49.860 tree or walking through an abstract
00:28:51.539 syntax tree that could all be really
00:28:53.400 useful
00:28:55.020 I hope you found all of this really
00:28:56.279 interesting thank you for your time
00:28:58.080 I don't know if we have time for
00:29:00.179 questions so I'm just going to end it
00:29:02.039 there if you have questions for me
00:29:03.480 personally come up after the talk and
00:29:05.039 I'll be happy to answer