Ruby Video | Analyzing an analyzer - A dive into how RuboCop works

00:00:00.000 ready for takeoff

00:00:16.920 hello everybody and welcome to my talk

00:00:20.039 today analyzing an analyzer a dive into

00:00:23.039 how RuboCop works RuboCop is quite

00:00:26.640 complex and I don't think we can do an

00:00:28.980 exhaustive study on it so this will be

00:00:30.960 more of a regular dive not a deep dive

00:00:33.840 uh my name is Kyle d'Oliveira I'm based out

00:00:37.680 of Vancouver Canada

00:00:39.600 I've been working with Ruby and uh rails

00:00:43.800 for well over a decade now I'm in love

00:00:46.200 with the language I love the community I

00:00:48.719 love spaces like this where we can

00:00:50.820 interact with each other and get a sense

00:00:52.860 of what the Ruby Community feels like

00:00:56.039 I'm really drawn to tools that can

00:00:58.800 benefit the entire community and RuboCop

00:01:01.440 is no exception to that and my hope is

00:01:04.860 that after listening to this talk

00:01:07.020 you'll understand some of the basics

00:01:08.880 about how RuboCop can analyze and

00:01:12.540 correct code and maybe some of you will

00:01:15.000 feel inspired to either contribute

00:01:17.280 custom rules for yourself your

00:01:19.200 organization or even to the open source

00:01:21.540 repository or even start playing around

00:01:23.820 with new tools that utilize similar

00:01:26.340 ideas or concepts

00:01:28.920 I have been working at Aha! for the past

00:01:31.619 two years and it is one of the best

00:01:33.780 workplaces that I've been a part of we

00:01:36.479 are a human-centric company that's

00:01:38.220 helping other companies build the

00:01:39.659 products that matter for them with our

00:01:41.700 suite of products and we have an amazing

00:01:44.400 team that's uh all

00:01:46.860 distributed by design all over the world

00:01:48.439 we have one of the best company cultures

00:01:51.119 that I've seen and it's powered by the

00:01:53.939 Responsive Method which helps give us a

00:01:56.640 framework of shared values that we all

00:01:59.280 agree upon and embody and it really

00:02:01.619 helps Empower us so that we can move

00:02:03.540 quickly and stay aligned so if you'd

00:02:06.240 like to be part of that culture of

00:02:07.799 course uh Aha! is hiring

00:02:10.879 linters are a static code analysis tool

00:02:15.420 that can be used to flag

00:02:17.819 programming errors suspicious constructs

00:02:20.160 stylistic errors but it can be used to

00:02:23.280 do more too it can be used to alert

00:02:25.200 around security it could be used as a

00:02:28.140 tool for training other engineers

00:02:30.239 and RuboCop is one of the most popular

00:02:32.400 linters for Ruby and I'm just a little

00:02:35.640 bit curious just from a quick show of

00:02:37.500 hands has anyone here not

00:02:39.540 worked with RuboCop before

00:02:43.319 so I don't think I see a single hand and

00:02:45.360 that's about what I expected

00:02:48.120 um rails is one of those gems that is

00:02:50.760 very closely tied to Ruby

00:02:53.280 so much so that there's often the

00:02:54.840 Assumption if you say you're working

00:02:55.860 with Ruby they just assume you're

00:02:57.840 working with rails

00:02:59.700 and to put things in a little bit of

00:03:01.200 perspective rails has been downloaded

00:03:03.720 about 387 million times from RubyGems

00:03:06.840 and is about the 40th most popular

00:03:09.000 gem

00:03:10.379 RuboCop has been downloaded about 270

00:03:13.319 million times and is about the 76th so while

00:03:15.360 not as popular as rails RuboCop is up

00:03:18.239 there so that when you talk about Ruby

00:03:20.159 you're also thinking about RuboCop

00:03:23.220 and the original creator of RuboCop

00:03:25.260 gave another talk about this in 2018 at

00:03:28.440 RubyKaigi so after this talk if you're

00:03:31.379 really curious about learning more about

00:03:32.879 RuboCop this is a great resource

00:03:35.879 um you can learn more about RuboCop from

00:03:38.159 him

00:03:39.900 and I wanted to begin today's talk with

00:03:42.120 a little personal story

00:03:43.739 I first got into using RuboCop several

00:03:46.860 years ago it was at a point where we

00:03:48.959 were trying to figure out how to have an

00:03:50.400 agreed-upon style for all the code that

00:03:53.159 we were writing but we would end up with

00:03:55.440 pull requests that were just full of

00:03:56.879 these nitpick comments that wouldn't

00:03:59.220 really be talking about the content of

00:04:01.319 the pull request but instead talking

00:04:03.239 about how the code looked

00:04:05.340 and sometimes these would be fine but

00:04:08.040 other times we would get into big and

00:04:09.659 often pointless arguments over what

00:04:11.760 seemed like the most trivial kind of

00:04:14.040 things

00:04:15.239 we would get into big debates of whether

00:04:17.280 or not we should use single quotes or

00:04:18.959 double quotes

00:04:20.160 uh what is the maximum line width uh

00:04:22.740 whether or not we need to have an extra

00:04:24.120 line at the end of a guard clause

00:04:26.699 and we thought that if RuboCop could

00:04:28.979 handle our linting and styling for us we

00:04:31.139 could focus all of the comments of pull

00:04:33.000 requests on the actual content of the

00:04:35.280 pull request and this helped a lot

00:04:38.460 except that it was also incredibly

00:04:40.139 frustrating to work with we would write

00:04:42.180 code we'd push it up to CI and CI would

00:04:44.460 reject it because RuboCop would have

00:04:46.560 flagged violations and we would go back and

00:04:48.600 we'd fix them and do it again

00:04:51.120 um I had a project in which I needed to

00:04:54.000 namespace a huge number of constant

00:04:55.800 references

00:04:57.000 so all of the lines became longer and

00:04:59.340 the maximum line length rule became the

00:05:01.199 bane of my existence I'd have to hunt

00:05:03.479 down from CI every line that was over

00:05:05.820 the length figure out how do I break

00:05:07.560 this up into pieces

00:05:09.300 and I hated it

00:05:10.979 and many engineers had a similar

00:05:12.840 experience and the message of not liking

00:05:15.060 how RuboCop was rolled out started

00:05:17.580 getting mixed up into the message of we

00:05:19.740 hate RuboCop

00:05:21.600 but then we started really leaning into

00:05:23.280 RuboCop and learning about how we could

00:05:25.620 use autocorrect and a lot of the

00:05:27.419 frustration of RuboCop started to

00:05:29.580 disappear we could have RuboCop fix our

00:05:32.340 code for us if it detected something

00:05:35.039 that was a violation we could have it

00:05:38.039 handle it automatically and most of the

00:05:40.139 time those changes were perfect

00:05:42.780 and so we kept leaning into this as well

00:05:44.699 we started writing our own custom cops

00:05:46.860 to help us migrate uh bad patterns to

00:05:49.259 good ones we wrote some to keep

00:05:52.320 deprecations under control as we were

00:05:54.419 upgrading rails all the while we were

00:05:56.759 using error messages from RuboCop as a

00:05:59.220 way to explain concepts link to

00:06:01.139 internal documentation and give

00:06:04.520 reasons why certain decisions were being

00:06:06.900 made

00:06:07.979 I gave a talk at RailsConf in 2020 uh

00:06:12.240 illustrating how RuboCop can be used to

00:06:14.699 communicate information about bad

00:06:16.139 patterns in code and that is another

00:06:17.699 resource that you can look up later if

00:06:20.160 you are curious about learning more

00:06:22.919 and this year RuboCop turns 10 and in

00:06:26.940 the open source world this is a pretty

00:06:28.919 big deal uh development of RuboCop

00:06:32.100 has been pretty consistent

00:06:33.479 throughout the years and with where

00:06:35.639 we're at I don't see RuboCop going away

00:06:38.000 anytime soon

00:06:40.440 but it's large enough that understanding

00:06:43.020 how RuboCop works can be really tricky

00:06:46.199 there are thousands of commits thousands

00:06:49.139 of closed issues hundreds of

00:06:51.660 contributors and releases

00:06:54.660 this is not going to be a code review of

00:06:57.660 RuboCop uh I don't think I could cover

00:07:00.000 it and do it justice and it would

00:07:01.560 probably be very overwhelming

00:07:03.479 with the history of

00:07:05.759 RuboCop it's a bit of a marathon and

00:07:07.740 we've got 30 minutes here it's a bit of

00:07:09.539 a Sprint

00:07:10.500 so instead this talk is going to be a

00:07:12.960 dive into how do the basics work I'll

00:07:16.139 outline uh the basics and help

00:07:18.539 illustrate how some of the processes

00:07:20.340 function inside of RuboCop

00:07:22.919 um with the aim to stay as close to how

00:07:25.560 RuboCop works as possible but I might

00:07:27.479 make some simplifications in this talk

00:07:29.280 just for the purposes of the talk

00:07:32.460 so let's dig into this how does RuboCop

00:07:36.180 work and I thought a good way to dive

00:07:38.819 into the details of how RuboCop works is

00:07:41.280 to start by looking at the command line

00:07:42.900 interface

00:07:44.099 how does this work for a

00:07:46.680 single file and a single cop

00:07:49.620 so what happens when you run the rubocop

00:07:52.560 command

00:07:53.759 well at first it will run the executable

00:07:56.699 file which loads up the Ruby library and

00:08:00.240 processes

00:08:01.500 and then loads some configuration and

00:08:04.080 these two steps are the easy part I'll

00:08:06.539 touch on them a little bit for completeness

00:08:08.099 but there shouldn't be too many

00:08:09.660 surprises here but then we'll get into

00:08:11.580 the meat of RuboCop at some point

00:08:15.479 RuboCop will need to process a file

00:08:18.660 and that is to take a file that exists

00:08:20.580 and do something to it so that it can

00:08:24.000 make decisions about that file and the

00:08:26.460 code inside of it

00:08:29.340 once it's processed this code it will

00:08:31.740 need to run it through a series of cops

00:08:34.260 that will make decisions on whether or

00:08:36.180 not there are any offenses and if there

00:08:39.539 are it can rewrite

00:08:42.599 that source code and change the file

00:08:45.660 and it will loop back once it's finished

00:08:47.940 uh changing the file and potentially

00:08:50.459 start again

00:08:51.540 and this Loop is important as multiple

00:08:53.820 different cops can adjust the same line

00:08:56.100 of code and so you may need to process

00:08:58.620 the code a couple times also be wary of

00:09:00.839 getting into infinite loops
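
The loop just described can be sketched roughly as follows. This is only an illustration: the two "cops" are hypothetical lambdas that take source text and return corrected text, and the pass limit is an arbitrary number picked for the sketch, not a setting taken from RuboCop.

```ruby
# Hypothetical stand-ins for cops: each takes source text and returns
# (possibly corrected) source text. These are not real RuboCop cops.
DOUBLE_TO_SINGLE_QUOTES = ->(source) { source.gsub('"', "'") }
STRIP_TRAILING_SPACES   = ->(source) { source.gsub(/[ \t]+$/, "") }

MAX_PASSES = 10 # arbitrary cap, chosen only for this sketch

def correct(source, cops)
  MAX_PASSES.times do
    corrected = cops.reduce(source) { |text, cop| cop.call(text) }
    return corrected if corrected == source # nothing changed: done

    source = corrected # something changed: inspect the new text again
  end
  raise "corrections did not settle after #{MAX_PASSES} passes"
end

puts correct(%(puts "hello"   \n), [DOUBLE_TO_SINGLE_QUOTES, STRIP_TRAILING_SPACES])
```

The cap is what guards against two corrections undoing each other forever: if the text never stops changing, the sketch gives up instead of looping.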

00:09:03.600 so we'll break down this process into

00:09:06.060 these steps

00:09:07.920 and at the start of it all is the

00:09:09.720 command line interface now this is

00:09:11.880 pretty straightforward and I'm going to

00:09:13.440 go pretty quick here as this isn't a

00:09:15.420 talk about command line interfaces this

00:09:16.980 is a talk about RuboCop

00:09:18.899 effectively there's just an executable

00:09:21.839 file called rubocop it's configured to

00:09:24.120 use Ruby it loads the appropriate

00:09:26.339 libraries into the load path requires

00:09:28.440 RuboCop and then does some kind of

00:09:31.260 processing

00:09:32.399 and this is where we go from the command

00:09:35.940 line into the world of Ruby and this is

00:09:38.040 RuboCop's entry point to doing

00:09:39.779 whatever it wants to do

00:09:42.000 and the part that it wants to do

00:09:44.160 next is start loading some configuration

00:09:45.839 determine which cops are active uh what

00:09:49.140 options do we need to provide to them in

00:09:51.839 general this is all done through YAML

00:09:54.000 files I won't go through the options

00:09:56.279 here uh they are pretty well

00:09:58.320 documented online including what options are

00:10:00.240 available or not

00:10:02.220 um effectively there is a big YAML file

00:10:04.500 where you can give it some options that

00:10:06.420 are provided to all cops for instance

00:10:08.399 and you could say which Ruby version

00:10:09.779 you're looking for here are some file

00:10:11.880 patterns to include here's some patterns

00:10:14.220 to exclude you can also provide specific

00:10:17.100 configuration for specific cops you can

00:10:20.100 Nest the configuration based off the

00:10:21.839 cop's name

00:10:23.399 and then that configuration is then

00:10:25.500 available inside of the cop itself
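
As a rough illustration of that shape, here is a small configuration in the style of a .rubocop.yml file, loaded with Ruby's standard YAML library. The values are made up for the example; the point is only that shared options sit under one key, cop-specific options are nested under the cop's name, and the result is an ordinary hash.

```ruby
require "yaml"

# A small configuration in the style of a .rubocop.yml file.
# The specific values are made up for illustration.
CONFIG_YAML = <<~YAML
  AllCops:
    TargetRubyVersion: 3.1
    Include:
      - "app/**/*.rb"
    Exclude:
      - "vendor/**/*"

  Layout/LineLength:
    Max: 100
YAML

config = YAML.safe_load(CONFIG_YAML)

# Options shared by all cops live under one key...
all_cops = config["AllCops"]
# ...and cop-specific options are nested under the cop's name.
line_length = config["Layout/LineLength"]

puts all_cops["TargetRubyVersion"]
puts line_length["Max"]
```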

00:10:29.399 now that's pretty quick that's

00:10:32.040 at the end of the day it's just all

00:10:33.420 loaded with uh YAML and then it is a

00:10:37.200 hash inside the Ruby process there's not

00:10:39.000 too much fancy going on there but this

00:10:41.760 is where we get to the more interesting

00:10:43.140 part of the whole process processing the

00:10:45.300 code

00:10:46.200 and this is a bit of a meta topic

00:10:48.300 because we need to talk about code that

00:10:50.519 understands code

00:10:54.600 and to illustrate how this really works

00:10:57.300 I'm going to be leaning into this

00:10:59.459 specific example RuboCop has a style cop

00:11:03.420 called ArrayJoin whose purpose is to

00:11:07.260 check whether or not there is a star

00:11:09.660 method that is being used to join values

00:11:12.720 of an array

00:11:14.220 so imagine we had some code and we want

00:11:16.620 to determine if any of these bad

00:11:18.779 patterns exist in the code and if they

00:11:21.120 do we want to flag the code as

00:11:23.279 violating this rule

00:11:25.800 so imagine we've got our array

00:11:28.079 uh here we've got an array of

00:11:30.600 strings which contain foo bar baz that

00:11:33.899 are being joined with the star method

00:11:36.180 and joined with a comma if we wanted to

00:11:38.640 ask does this code violate the rule as a

00:11:43.260 human we can say yes obviously but how

00:11:45.540 do we write code that determines this

00:11:48.180 one option

00:11:50.100 is that we could reach for our fancy

00:11:53.220 tool of regular expressions

00:11:55.320 and we can start with something that

00:11:56.880 already looks pretty complicated

00:11:58.920 uh it looks for an open brace a bunch of

00:12:01.920 non-closing braces closing brace space

00:12:05.700 star space uh it's pretty complicated uh

00:12:09.240 this would match this code though so in

00:12:12.000 this specific example we could use a

00:12:13.740 regular expression but in the world of

00:12:15.240 Ruby things can change what happens if

00:12:17.160 we use single quotes

00:12:19.380 well we could update our regular

00:12:21.720 expression to do something with this

00:12:23.820 we'll figure out which quote you have a

00:12:25.800 back reference for it again we can write

00:12:27.779 things in different ways if there's no

00:12:29.220 spaces what do we do

00:12:31.800 we can continue adjusting the regular

00:12:34.200 expression

00:12:35.459 but at some point we could write this in

00:12:37.920 a completely different way we use a

00:12:39.480 percent W notation to write a series of

00:12:42.540 white space delimited uh strings uh at

00:12:46.500 this point I don't really know how we

00:12:48.360 would do a regular expression to to kind

00:12:51.360 of match this kind of code and to get more

00:12:54.300 complex you don't even need parentheses

00:12:55.560 you can use almost any character to

00:12:57.600 delimit this array so this just gets

00:13:00.120 more confusing
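
To make the problem concrete, here is one possible version of such a regular expression, written for this transcript rather than taken from the talk's slides. It handles both quote styles and missing spaces, but it misses the percent-w form entirely and also fires on code that is only a comment.

```ruby
# A first attempt at spotting `array * "separator"` with a regular expression:
# an opening bracket, anything up to the closing bracket, optional spaces,
# a star, optional spaces, then a quoted string (either quote style).
ARRAY_STAR = /\[[^\]]*\]\s*\*\s*(["']).*?\1/

samples = [
  '["foo", "bar", "baz"] * ","',    # matched
  "['foo','bar','baz']*','",        # matched
  '%w(foo bar baz) * ","',          # missed: no square brackets at all
  '# ["foo", "bar", "baz"] * ","'   # matched, but it is only a comment
]

samples.each do |code|
  puts "#{ARRAY_STAR.match?(code) ? 'match   ' : 'no match'}  #{code}"
end
```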

00:13:02.279 um

00:13:03.000 and we can continue down this path what

00:13:04.860 if this code was in a comment

00:13:07.260 what if this code was in uh a nested array

00:13:09.480 what if there's dynamic strings in here

00:13:11.120 and using a tool like regular

00:13:13.620 Expressions just wouldn't be sufficient

00:13:15.060 to make decisions about code like this I

00:13:17.700 have to include the obligatory XKCD

00:13:19.800 comic whenever talking about regular

00:13:21.300 Expressions because often you end up

00:13:23.100 with more problems

00:13:24.300 but there needs to be a different way

00:13:27.180 and what we need to be able to do is we

00:13:29.459 need to take Ruby code and break it into

00:13:32.339 the abstract syntax tree that it

00:13:35.040 represents

00:13:36.600 and this abstract syntax tree or the AST

00:13:39.320 is a representation of the code

00:13:44.040 so for example we have a begin rescue

00:13:46.500 block like we do in the corner this is

00:13:48.600 what the corresponding abstract syntax

00:13:50.760 tree would look like now it is a lot so

00:13:53.700 I will break this down

00:13:55.980 and Ruby has a gem that can handle

00:13:58.260 converting code into an AST called

00:14:00.540 parser it comes with Ruby you don't need

00:14:03.420 to you just need to require it and you

00:14:05.459 can get a representation of what any

00:14:07.560 Ruby code looks like as an AST

00:14:10.740 so let's come back to our array example

00:14:14.100 what does the abstract syntax tree here

00:14:16.620 look like

00:14:18.180 well first this entire thing is a method

00:14:21.300 call

00:14:22.200 and this is represented by the send node

00:14:25.740 the send node has many children

00:14:29.160 the first child of it is what is this

00:14:32.399 method being called on so in this case

00:14:34.680 it's being called on an array and that

00:14:37.079 array contains three elements a string

00:14:39.240 foo a string bar and a string baz

00:14:42.899 the second child of the send node is

00:14:45.600 the name of the method that is being

00:14:47.220 called so in this case the method name

00:14:49.380 is star

00:14:51.060 and the rest of the children

00:14:53.760 of the send node are the arguments that

00:14:55.440 are provided to the method uh so in this

00:14:57.540 case there is only one argument and

00:14:59.100 that's just a string that is a comma
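
The structure just described can be written out by hand. The sketch below does not call the parser gem: Node is a small stand-in Struct and s is a helper mirroring the s-expression notation used when ASTs are printed, both defined here only for illustration.

```ruby
# Stand-in for an AST node: a type plus an ordered list of children.
# (The parser gem has its own node class; this Struct is only for illustration.)
Node = Struct.new(:type, :children)

# Helper that mirrors the s-expression notation used when printing ASTs.
def s(type, *children)
  Node.new(type, children)
end

# Hand-built tree for: ["foo", "bar", "baz"] * ","
ast = s(:send,
        s(:array, s(:str, "foo"), s(:str, "bar"), s(:str, "baz")),
        :*,
        s(:str, ","))

receiver, method_name, *arguments = ast.children

puts receiver.type                        # the thing the method is called on
puts receiver.children.map { |c| c.children.first }.inspect
puts method_name.inspect                  # the method name, as a symbol
puts arguments.map(&:type).inspect        # everything after that is an argument
```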

00:15:02.220 now if we come back to what the code

00:15:04.620 looked like and we wanted to check what

00:15:06.480 the abstract syntax tree looked like uh

00:15:09.180 with parser we can run this code through

00:15:11.459 parser and when we format it it would

00:15:13.800 look something like this

00:15:16.500 um and what's great about this abstract

00:15:18.300 tree or abstract syntax tree

00:15:20.040 representation is if we start writing

00:15:21.839 this code in a different way we use

00:15:23.459 single quotes instead of double quotes

00:15:25.560 the abstract syntax tree doesn't change

00:15:27.779 if we write it without spaces

00:15:30.240 the abstract syntax tree doesn't change

00:15:32.760 and this is true for the percent W

00:15:34.860 notation as well so this is what RuboCop

00:15:37.800 uses under the hood to be able to make

00:15:40.800 decisions about code

00:15:43.380 they have a utility gem available called

00:15:46.320 rubocop-ast

00:15:48.240 that extends the functionality of the

00:15:50.579 parser gem to make things a little bit

00:15:52.860 more simple or easier to read

00:15:56.040 so RuboCop has this ProcessedSource class

00:15:58.860 that you could use if you wanted to

00:16:00.420 where you could provide it some code

00:16:02.519 in a string form so this could be read

00:16:04.199 directly from a file

00:16:05.940 and the version of Ruby you are

00:16:08.220 interested in and you can look at the

00:16:09.720 abstract syntax tree and start to ask

00:16:11.820 questions about it

00:16:13.560 so if we looked at this code

00:16:15.480 and we wanted to know if it violated

00:16:17.339 that ArrayJoin rule we could start

00:16:19.500 asking some questions we could ask is

00:16:21.540 this a send type and we could say yeah

00:16:24.180 it is

00:16:25.560 we could look at the receiver of this

00:16:27.720 method and ask it is it an array type

00:16:30.180 it is we could check to see if the

00:16:33.180 method name is star

00:16:35.760 and we could also look at the arguments

00:16:37.500 is there one argument is that argument a

00:16:40.440 string

00:16:41.759 so this gives us a way to start asking

00:16:44.940 meaningful questions of code in a very

00:16:48.600 repeatable manner
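
Those questions can be strung together into a single check. The sketch below asks them against hand-built stand-in nodes (the Node Struct and the s helper are defined here for illustration); the real gems offer a richer node API, which is not used here.

```ruby
Node = Struct.new(:type, :children) # illustrative stand-in, not the gem's class

def s(type, *children)
  Node.new(type, children)
end

# The questions from the talk, asked in order. `node` is expected to be a Node.
def array_joined_with_star?(node)
  return false unless node.type == :send          # is this a method call?

  receiver, method_name, *arguments = node.children
  receiver.is_a?(Node) && receiver.type == :array && # called on an array?
    method_name == :* &&                             # is the method named star?
    arguments.size == 1 &&                           # exactly one argument?
    arguments.first.type == :str                     # and is it a string?
end

join_call = s(:send, s(:array, s(:str, "foo"), s(:str, "bar")), :*, s(:str, ","))
repeat    = s(:send, s(:array, s(:str, "foo")), :*, s(:int, 3))

puts array_joined_with_star?(join_call)
puts array_joined_with_star?(repeat)
```

The second tree stands for an array multiplied by an integer, which repeats the array rather than joining it, so the check on the argument type is what keeps it from being flagged.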

00:16:51.060 okay so now that we have a little bit of

00:16:53.519 understanding about how RuboCop is

00:16:55.139 processing the code

00:16:56.579 how does it actually start applying the

00:16:59.399 specific cops

00:17:02.579 to understand that we need to make our

00:17:04.799 example just a little bit more complex

00:17:07.559 what if our array joining code is

00:17:10.199 wrapped in a method

00:17:11.819 this seems to throw a bit of a wrench

00:17:13.799 into things we can no longer ask if this

00:17:16.439 is a send type anymore it's not this is

00:17:18.600 a method definition

00:17:20.160 we can't ask if the receiver is an array

00:17:23.459 type because there is no receiver for

00:17:25.559 this method definition

00:17:27.600 the method name here is not star

00:17:30.600 we do have one argument though so that

00:17:32.940 we do have that going for us

00:17:35.460 what we need to do is we need to break

00:17:37.500 this code down into its abstract tree

00:17:39.960 representation abstract syntax tree

00:17:41.760 representation and when we do it looks

00:17:43.860 something like this you can see the

00:17:45.720 original send node that we wanted to

00:17:47.340 focus on is there but it's wrapped in

00:17:49.620 another node this def node

00:17:52.740 so in order to navigate and find these

00:17:55.620 various nodes we need to walk through

00:17:57.840 the abstract syntax tree

00:18:00.240 so what we'll go through next is a very

00:18:02.280 simplified version of how do we walk an

00:18:04.799 abstract syntax tree

00:18:07.020 to do so we could start with a method

00:18:08.820 like walk

00:18:10.380 and the goal of this will be can we visit

00:18:12.240 every single node inside of this

00:18:14.820 abstract syntax tree

00:18:16.980 to do so we'll start with passing it the

00:18:18.960 entire abstract syntax tree

00:18:21.240 and we'll need some sort of convention

00:18:22.860 on determining what do we call the

00:18:25.080 methods for our various nodes so for

00:18:27.720 this we'll create a method for each node

00:18:29.640 type that's prefixed by on_

00:18:32.280 so here we'll need a method called on_def

00:18:35.100 because def is the top level node of

00:18:38.760 this abstract syntax tree

00:18:41.820 from here what does on_def do well in

00:18:45.480 order to understand this method we need

00:18:47.820 to know what does the def node in

00:18:50.100 the abstract syntax tree look like

00:18:53.160 it has many children its first child is

00:18:56.100 the name of the method that's being

00:18:57.840 defined this is a symbol so we don't

00:19:00.660 need to do anything further here

00:19:02.940 uh the second child of the def node is

00:19:05.280 the arguments that are passed in

00:19:07.380 so to deal with this convention we'll

00:19:09.539 create an on_args method we'll hand it

00:19:11.580 the arguments and that will continue to

00:19:13.559 recursively visit these nodes

00:19:16.320 the last child is the body of the method

00:19:19.679 it could be a few different things but

00:19:21.780 to keep it simple we'll focus on what it

00:19:23.340 is here so we'll need to write an

00:19:26.039 on_send method

00:19:28.380 so now we can start looking at those

00:19:31.200 methods what does on_args do

00:19:34.740 this has many children and each child is

00:19:38.700 an arg so we could walk through each of

00:19:41.160 the children of this piece of the

00:19:43.260 AST and call on_arg on each

00:19:46.559 and on_arg itself doesn't need

00:19:49.200 to do anything further it doesn't have

00:19:51.000 anything further nested so we don't need

00:19:52.679 to do anything to navigate this entire

00:19:54.419 tree

00:19:56.820 moving back over to the on_send method

00:19:59.460 the send node itself we've gone over

00:20:01.500 before

00:20:02.640 and we know that the first child is

00:20:04.679 the receiver so we can call

00:20:06.720 on_array here

00:20:08.280 the second is the name of the method

00:20:11.160 that's being called which in this case

00:20:12.660 is star it's a symbol we don't need to

00:20:14.160 do anything further and the rest of the

00:20:16.320 children are the arguments to the method

00:20:18.780 so in this case there's only one

00:20:20.220 argument which is the string so we'll

00:20:21.840 call the on_str method

00:20:25.320 almost done here the on_array method

00:20:28.500 looks very similar to what on_args

00:20:30.780 looked like we go over all of the children

00:20:33.059 and we need to call a method based off

00:20:35.280 what type is inside of the array so in

00:20:38.520 this case we just call on_str on each

00:20:41.520 one of these uh elements of the array

00:20:45.539 and lastly the on_str method doesn't

00:20:48.360 have any further children so we don't

00:20:50.220 need to do anything

00:20:52.140 and so utilizing all of these methods we

00:20:55.200 can visit all elements of this abstract

00:20:58.020 syntax tree and based off where we are

00:21:00.960 in the tree we can start to tie in

00:21:02.640 behavior of what we want to do
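
Put together, the walk described above might look like the sketch below. It runs against hand-built stand-in nodes rather than real parser output, and the wrapped method's name and argument (join_values, items) are hypothetical, since the transcript does not show the slide.

```ruby
Node = Struct.new(:type, :children) # illustrative stand-in for a parsed node

def s(type, *children)
  Node.new(type, children)
end

# One method per node type, named on_<type>, as described above.
class Walker
  attr_reader :visited

  def initialize
    @visited = []
  end

  def walk(node)
    @visited << node.type
    public_send("on_#{node.type}", node)
  end

  def on_def(node)
    _name, args, body = node.children # the name is a symbol: nothing to visit
    walk(args)
    walk(body)
  end

  def on_args(node)
    node.children.each { |arg| walk(arg) }
  end

  def on_arg(_node); end # leaf: nothing nested

  def on_send(node)
    receiver, _method_name, *arguments = node.children
    walk(receiver) if receiver
    arguments.each { |argument| walk(argument) }
  end

  def on_array(node)
    node.children.each { |element| walk(element) }
  end

  def on_str(_node); end # leaf: nothing nested
end

# Hand-built tree for a hypothetical method wrapping the array example:
#   def join_values(items)
#     ["foo", "bar", "baz"] * ","
#   end
ast = s(:def, :join_values,
        s(:args, s(:arg, :items)),
        s(:send,
          s(:array, s(:str, "foo"), s(:str, "bar"), s(:str, "baz")),
          :*,
          s(:str, ",")))

walker = Walker.new
walker.walk(ast)
puts walker.visited.inspect
```

The visited list records the node types in the order they were reached, which makes the depth-first order of the walk visible.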

00:21:05.820 and this is all fairly complicated still

00:21:08.160 but rubocop-ast that gem that I talked

00:21:11.460 about earlier has abstracted this away

00:21:13.020 into a module called RuboCop::AST::Traversal

00:21:15.419 and using this it can traverse

00:21:18.600 over any arbitrary abstract syntax tree

00:21:22.140 it'll visit each node of the tree in a

00:21:25.200 depth first search

00:21:27.780 um and if any of the cops that are

00:21:30.780 active

00:21:31.799 Define methods that are interested in

00:21:34.080 each piece of that abstract syntax tree

00:21:36.240 it will hand that node over to the cop

00:21:38.460 to handle its own logic

00:21:42.179 so what this could look like is we could

00:21:44.280 have a class that includes this module

00:21:46.260 and this would allow it to Traverse over

00:21:48.659 any arbitrary abstract syntax tree

00:21:51.659 and the object could hold a reference to

00:21:53.760 whatever cops are available and enabled

00:21:56.820 and then for each type of node it can

00:22:00.179 walk through the cops that are available

00:22:01.799 determine if they respond to that method

00:22:04.440 and if so

00:22:05.520 give them that piece of the abstract

00:22:07.260 syntax tree

00:22:08.640 I've only listed a single method here

00:22:10.679 the on_send but effectively this is what

00:22:13.200 RuboCop does it does this for every

00:22:15.659 possible type of node
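
A rough sketch of that dispatching idea, under the same stand-in assumptions as before: the Dispatcher class and the two toy cops are hypothetical, and the traversal is a simple generic walk written here instead of the gem's traversal module.

```ruby
Node = Struct.new(:type, :children) # illustrative stand-in for a parsed node

def s(type, *children)
  Node.new(type, children)
end

# Walks every node depth-first and hands each one to any cop that defines
# a matching on_<type> method. The class and its name are hypothetical.
class Dispatcher
  def initialize(cops)
    @cops = cops
  end

  def walk(node)
    callback = "on_#{node.type}"
    @cops.each do |cop|
      cop.public_send(callback, node) if cop.respond_to?(callback)
    end
    node.children.each { |child| walk(child) if child.is_a?(Node) }
  end
end

# Two toy cops that only record which nodes they were handed.
class SendWatcher
  attr_reader :seen

  def initialize
    @seen = []
  end

  def on_send(node)
    @seen << node.children[1] # the method name
  end
end

class StringWatcher
  attr_reader :seen

  def initialize
    @seen = []
  end

  def on_str(node)
    @seen << node.children.first
  end
end

ast = s(:def, :join_values,
        s(:args, s(:arg, :items)),
        s(:send, s(:array, s(:str, "foo"), s(:str, "bar")), :*, s(:str, ",")))

sends = SendWatcher.new
strings = StringWatcher.new
Dispatcher.new([sends, strings]).walk(ast)

puts sends.seen.inspect
puts strings.seen.inspect
```

Each cop only hears about the node types it defines a method for, so neither of them needs to know how the tree is walked.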

00:22:18.299 and if we were to look at what the cop

00:22:20.640 for that ArrayJoin cop looks like the

00:22:24.360 source code for this

00:22:26.280 um while there's a lot going on

00:22:28.020 here the main thing here is that the cop

00:22:29.940 just defines an on_send method and this

00:22:32.460 is passed every send node so this cop when

00:22:36.240 run through RuboCop will see every

00:22:38.159 method call that is in your source code

00:22:40.080 and this will allow the cop to ask

00:22:42.120 specific questions to that method call

00:22:44.760 is the receiver an array is the method

00:22:47.580 named star and if it matches everything

00:22:49.799 that it needs to match it can record an

00:22:52.260 appropriate offense
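
In miniature, a cop of that shape could look like the sketch below. It is not RuboCop's source code: the class name, the offense message and the stand-in Node Struct are all made up for the example, and a real cop would record far more detail about each offense.

```ruby
Node = Struct.new(:type, :children) # illustrative stand-in for a parsed node

def s(type, *children)
  Node.new(type, children)
end

# A toy cop in the spirit of the array-join check. It is not RuboCop's
# source code; the class name and message are made up for this sketch.
class ToyArrayJoinCop
  MESSAGE = "prefer join over * for joining an array".freeze

  attr_reader :offenses

  def initialize
    @offenses = []
  end

  # Called with every send node, i.e. every method call.
  def on_send(node)
    receiver, method_name, *arguments = node.children
    return unless receiver.is_a?(Node) && receiver.type == :array
    return unless method_name == :*
    return unless arguments.size == 1 && arguments.first.type == :str

    @offenses << MESSAGE
  end
end

cop = ToyArrayJoinCop.new
cop.on_send(s(:send, s(:array, s(:str, "foo"), s(:str, "bar")), :*, s(:str, ",")))
cop.on_send(s(:send, nil, :puts, s(:str, "hello")))

puts cop.offenses.inspect
```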

00:22:55.740 and this is how each cop ties into the

00:22:58.020 source code if we were to look at

00:22:59.820 another cop example just for a quick

00:23:02.280 moment we can look we can see something

00:23:04.320 similar for instance if we look at this

00:23:06.120 cop which is the MinMax cop

00:23:08.299 it is looking for when the min and the

00:23:11.700 max are returned as two elements of the

00:23:13.980 same array this would need to do some

00:23:16.320 sort of logic on each array that

00:23:19.020 it encounters

00:23:20.520 the source code for this

00:23:23.400 just defines an on_array method which

00:23:26.460 can then tie into every single array

00:23:28.440 definition and it can start asking

00:23:30.360 questions about this array does this

00:23:32.580 match the min and the max

00:23:34.559 this allows each cop to focus on only

00:23:37.440 the pieces of the abstract syntax tree

00:23:39.539 that they're interested in and make very

00:23:41.880 focused decisions

00:23:44.700 and there are many many different types

00:23:47.580 of nodes possible in the abstract syntax

00:23:49.919 trees you could have nodes for local

00:23:53.340 variable assignments you can have nodes

00:23:56.520 for class definitions or for yield

00:23:58.620 blocks some of these nodes I'm not

00:24:01.020 entirely sure how you would even write

00:24:02.880 Ruby code to represent them

00:24:06.000 um but if you can represent them as

00:24:08.480 an abstract syntax tree RuboCop can tie

00:24:11.340 into that piece and start making

00:24:12.840 decisions about it

00:24:15.900 so finally we get to a point where we

00:24:17.520 can start asking RuboCop

00:24:19.740 how does it change the abstract syntax

00:24:22.020 tree and autocorrect it

00:24:24.059 if we were to take the method that we

00:24:25.919 looked at earlier and we were to fix it

00:24:27.780 we would want to do something like this

00:24:29.100 where we remove the star replace

00:24:30.960 it with join and wrap the arguments

00:24:34.860 lucky for us the parser gem that we

00:24:37.500 talked about earlier has a class called

00:24:39.539 TreeRewriter and as they say it

00:24:42.240 performs all of the heavy lifting for

00:24:44.340 the source rewriting process so you can

00:24:46.559 have multiple rewrites and it will

00:24:48.480 perform them all in the correct order

00:24:51.059 the general structure would look

00:24:52.500 something like this where you can get

00:24:53.940 the abstract syntax tree for any

00:24:55.799 arbitrary source code

00:24:57.419 and pass it along to the tree rewriter

00:24:59.460 class but from here what can we do

00:25:02.280 one of the things you can do is call the

00:25:05.100 replace method which as the documentation

00:25:06.780 says will replace the source code

00:25:09.780 of a specific range with new

00:25:11.940 content

00:25:13.080 we can do other things with the tree

00:25:14.520 rewriter such as adding things before

00:25:16.200 or after or removing content but we'll

00:25:18.059 focus on this for right now

00:25:19.919 and two immediate questions should pop

00:25:22.020 up which is what is the range and how do

00:25:25.140 we determine the content

00:25:27.539 looking at the range the range is this

00:25:29.520 special class that comes from the parser

00:25:31.799 gem itself and it outlines a range of

00:25:34.260 characters that represent various Ruby

00:25:36.840 expressions

00:25:38.220 and there's a method available on all of

00:25:40.200 the code that is parsed with parser

00:25:42.000 called location

00:25:43.559 and this gives a lot of really

00:25:45.059 interesting aspects of that piece of the

00:25:47.100 abstract syntax tree but one piece of

00:25:49.380 that is the expression and that provides

00:25:51.840 the range for that entire piece of the

00:25:54.600 abstract syntax tree

00:25:56.460 so if we were to focus on that send node

00:25:59.100 which we determined was the third child

00:26:01.080 of that overall AST we can get to where

00:26:04.020 this source code is located by looking

00:26:06.000 at the location dot expression and this

00:26:08.520 is how we can get the range of what

00:26:10.200 code needs to change

00:26:13.500 now when we start looking into the content

00:26:15.240 how do we determine the content

00:26:17.340 it should be replaced with well we knew

00:26:19.320 that the receiver was an array type but

00:26:21.000 we can ask more questions about this

00:26:22.860 receiver for instance we can ask it

00:26:24.960 explicitly what's the source and this

00:26:27.480 will give us exactly as it was written

00:26:29.400 the source code of this array

00:26:32.640 and we can save this as a string to use

00:26:34.980 later

00:26:35.820 similarly

00:26:37.260 we can look at the arguments and look at

00:26:40.140 the source of the arguments and this

00:26:41.760 lets us get explicitly from the code

00:26:43.580 exactly what was passed in

00:26:47.880 putting this all together we can get

00:26:49.500 something like this where we set up our

00:26:51.120 code we set up our tree rewriter we

00:26:53.640 focus on that send node specifically and

00:26:56.400 we grab the location of that send node

00:26:58.500 using that location.expression we can

00:27:00.539 get the source array we can get the

00:27:03.000 source arguments and we can call this

00:27:05.340 replace method to

00:27:07.260 wrap this array with this join that

00:27:11.100 wraps the argument

00:27:12.720 and from here we can have the tree rewriter

00:27:14.279 process it and we end up with this

00:27:16.559 rewritten method exactly the way we

00:27:19.020 wanted it
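
The steps just described (find the range of the send expression, read the source of the array and of the argument, build the replacement, rewrite) can be sketched without any gems. Everything here is an assumption for illustration: SourceRange is a hypothetical stand-in for the range objects the parser gem provides, the ranges are located with String#index rather than by parsing, and the greeting method is invented since the method shown on the slide is not in the transcript.

```ruby
# Hypothetical stand-in for a parser range: a span of characters in a
# source string that can report its own text.
SourceRange = Struct.new(:source_text, :begin_pos, :end_pos) do
  def source
    source_text[begin_pos...end_pos]
  end
end

# Locate the first occurrence of `snippet` in `code` as a range.
# A real analyzer would get this from the syntax tree instead.
def range_of(code, snippet)
  start = code.index(snippet)
  raise ArgumentError, "snippet not found: #{snippet}" if start.nil?

  SourceRange.new(code, start, start + snippet.length)
end

# Rewrite `<array> * <argument>` into `<array>.join(<argument>)`.
def rewrite_star_to_join(code, array_src, argument_src)
  expression = range_of(code, "#{array_src} * #{argument_src}") # what to replace
  receiver = range_of(code, array_src)                           # the array, as written
  argument = range_of(code, argument_src)                        # the argument, as written

  replacement = "#{receiver.source}.join(#{argument.source})"

  rewritten = code.dup
  rewritten[expression.begin_pos...expression.end_pos] = replacement
  rewritten
end

code = <<~RUBY
  def greeting
    %w[hello world] * ", "
  end
RUBY

puts rewrite_star_to_join(code, "%w[hello world]", '", "')
```

Because the replacement is built from the original source text of the receiver and argument, whatever spelling the author used for the array and the separator is carried over unchanged.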

00:27:20.400 and effectively this is what RuboCop

00:27:22.200 does they have their own corrector which

00:27:25.320 is an object that wraps the tree

00:27:26.820 rewriter so you don't need to know

00:27:28.140 certain things like the location

00:27:29.400 expression you can pass in directly the

00:27:32.460 piece of the abstract syntax tree but

00:27:34.860 all of the other methods such as replace

00:27:36.360 are available

00:27:38.159 and this allows each cop to handle its

00:27:41.400 own changes whenever it detects

00:27:42.900 something that violates a certain

00:27:44.580 pattern and when it's done RuboCop will

00:27:47.580 write these auto corrected changes back

00:27:49.559 to the file and if necessary reprocess

00:27:52.020 the new file and start again
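
The correct-then-reprocess cycle can be pictured as a loop that keeps running corrections until a pass changes nothing. The sketch below is hypothetical and is not RuboCop's implementation: each "cop" is modelled as a lambda that takes source text and returns corrected text, the regular expressions are deliberately naive, and the max_passes guard is an assumed safety limit.

```ruby
# Hypothetical autocorrect loop: run every cop over the source, and if
# anything changed, inspect the corrected source again.
def autocorrect(source, cops, max_passes: 10)
  max_passes.times do
    corrected = cops.reduce(source) { |code, cop| cop.call(code) }
    return corrected if corrected == source # nothing changed, so we are done

    source = corrected # something changed, so start again on the new text
  end
  source
end

# Two toy "cops" built on regular expressions rather than a syntax tree.
star_to_join = ->(code) { code.gsub(/(\[[^\]]*\]) \* ("[^"]*")/, '\1.join(\2)') }
prefer_single_quotes = ->(code) { code.gsub(/"([^"'\\#]*)"/, "'\\1'") }

puts autocorrect('[1, 2] * ", "', [star_to_join, prefer_single_quotes])
```

A final pass that produces no changes is what ends the loop, which is why a correction made by one cop still gets looked at by the others.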

00:27:55.140 okay so even after now 28 minutes

00:27:58.919 this can be a lot this is a crash course

00:28:01.140 through

00:28:02.460 RuboCop we've gone through some of the

00:28:04.500 different pieces of a simplistic version

00:28:06.419 of RuboCop we've gone through pieces of

00:28:09.179 the command line interface and

00:28:10.980 configuration

00:28:12.240 we've walked through how RuboCop can

00:28:14.640 take that file and convert it to an

00:28:16.799 abstract syntax tree so it can interact

00:28:18.480 with it

00:28:19.440 we talked about how it could then

00:28:21.000 traverse that abstract syntax tree and

00:28:23.820 rewrite it and then write it back to the

00:28:25.799 file

00:28:27.299 I'm hoping that this has helped you

00:28:28.919 understand a little bit about how

00:28:30.600 RuboCop functions and I hope this has

00:28:33.299 got you thinking about how you can start

00:28:34.620 utilizing this knowledge as I mentioned

00:28:37.500 earlier there's some other great

00:28:39.419 resources online if you want to learn

00:28:41.460 more but if you are interested in

00:28:44.039 building a tool that needs to analyze

00:28:46.679 source code there's probably little

00:28:47.820 bits of pulling from the abstract syntax

00:28:49.860 tree or walking through an abstract

00:28:51.539 syntax tree that could all be really

00:28:53.400 useful

00:28:55.020 um I hope you found all of this really

00:28:56.279 interesting uh thank you for your time

00:28:58.080 uh I don't know if we have time for

00:29:00.179 questions uh so I'm just going to end it

00:29:02.039 there if you have questions for me

00:29:03.480 personally come up after the talk and

00:29:05.039 I'll be happy to answer