00:00:12.799
yeah
00:00:15.150
so my name is Kir Shatrov and my talk
00:00:17.609
today is called building a ChatOps
00:00:19.439
framework a bit about myself so my name
00:00:24.090
is Kir I work on the developer
00:00:26.880
acceleration team at Shopify I'll talk
00:00:29.939
more about this team in my
00:00:33.870
talk I live in Canada
00:00:36.570
it's where Shopify is based and we may
00:00:40.500
have worked together on some
00:00:42.270
open source projects like Rails
00:00:44.700
Capistrano RubyBench and that's me
00:00:48.090
with that cat so let's start with
00:00:52.170
ChatOps please raise your hand if you've heard
00:00:55.320
about that cool so with ChatOps you can
00:01:01.350
move your technical and business
00:01:03.510
operations into chat into conversation
00:01:08.010
with your team and this term was first
00:01:11.760
introduced by GitHub and they first
00:01:15.780
started to talk about that at
00:01:17.159
conferences they made the first ChatOps
00:01:20.880
framework and yeah it's also connected to
00:01:27.600
the term conversation driven development
00:01:30.259
as you've probably heard there is
00:01:32.610
test-driven development behavior driven
00:01:34.500
development and many other kinds of
00:01:36.390
driven development and with ChatOps
00:01:40.380
and conversation driven development you
00:01:43.229
can bring all of that into a chat with
00:01:45.630
your team
00:01:48.020
and a bit about Shopify we have quite a
00:01:51.890
lot of developers more than 300 yeah so
00:01:57.110
if you don't know about Shopify it's an
00:01:59.360
e-commerce platform for small and medium
00:02:02.540
businesses and when you have so many
00:02:06.590
developers you need to build tools for
00:02:10.520
those developers so the developers could
00:02:13.010
be productive and my team where I work
00:02:16.520
is called developer acceleration and we
00:02:19.220
build internal tools for our
00:02:21.920
developers to improve their productivity
00:02:26.630
and ChatOps and all that kind of
00:02:32.630
automation is one of the things that the
00:02:35.390
developer acceleration team is working
00:02:38.239
on
00:02:43.650
so for you to have a better idea how all
00:02:48.670
of that looks let's start with an
00:02:51.640
example so at Shopify every developer is
00:02:56.530
responsible for shipping his or her own
00:03:00.040
features that means we don't have any
00:03:03.910
kind of release engineers who push
00:03:06.930
commits of other people so if you made
00:03:10.959
a feature you're responsible for deploying
00:03:13.420
this feature to see that it works and
00:03:17.349
if it doesn't work to roll it
00:03:19.510
back or do something about that so
00:03:23.079
imagine you made a pull request
00:03:25.140
you're about to merge it you merge it if
00:03:29.049
everything is ok with CI and in a
00:03:32.799
few minutes you get a message from a
00:03:36.819
chat bot that your feature your
00:03:41.500
commit in the master branch is ready to
00:03:45.010
be shipped and you tell the bot ok let's
00:03:49.060
ship it and in the group channel
00:03:53.380
in Slack we use Slack everyone will see
00:03:57.669
that you are deploying something what
00:04:00.310
commits you deploy and also the result
00:04:03.639
of this deployment so
00:04:06.940
usually it succeeds but it can also fail
00:04:09.280
like on this slide
00:04:12.800
and this is how deploys work so right
00:04:16.280
after the deploy or after you committed
00:04:20.780
something sometimes it happens that we
00:04:24.050
have an incident for instance if sign
00:04:28.370
up is down for example someone comes to
00:04:32.180
the same chat and starts an
00:04:36.050
incident an incident is a special
00:04:39.170
procedure to manage some kind of bad
00:04:45.740
thing that happens in production and
00:04:49.580
it includes actions like updating the status
00:04:52.880
page and investigating what's wrong
00:04:56.920
we also have a chat command for all of
00:05:00.740
that
00:05:01.880
another example is monitoring
00:05:06.430
the heaviest SQL queries or the
00:05:10.790
heaviest customers who bring a lot of
00:05:15.950
load on our service and another example
00:05:19.670
of automation is creating new
00:05:22.700
repositories so if you work in a small
00:05:26.540
company and a small team you probably have
00:05:30.340
a CTO or someone who is an admin in your
00:05:34.130
GitHub organization who can create a new
00:05:37.190
repo for you but if you have hundreds of
00:05:40.220
people there can be no
00:05:43.130
special person who has the
00:05:47.450
ability to create new repos for
00:05:49.280
someone and another aspect is that as a
00:05:53.600
developer you don't even know who to ask
00:05:55.640
to create a repo for you also we have
00:05:58.370
special events called hack days where we
00:06:02.750
have a lot of internal hackathons
00:06:04.700
at Shopify
00:06:06.220
and on these days we create a hundred new
00:06:09.220
repositories during a couple of days so
00:06:12.190
this is an action that should be
00:06:14.350
automated as well and speaking about
00:06:19.300
ChatOps it's also about
00:06:23.020
the interface if we
00:06:28.030
took another path we would probably
00:06:32.140
create a web interface in Bootstrap or
00:06:37.260
something else to give developers
00:06:41.850
all those actions to give developers the
00:06:45.460
ability to trigger all those actions and
00:06:48.730
scripts that are automated but with ChatOps
00:06:52.690
it's just another interface which is chat
00:06:55.320
which has a lot of advantages for
00:06:58.330
example your team will see what's
00:07:02.860
happening and what actions you are
00:07:05.320
taking to do something
00:07:11.580
now we come to the next part of my talk
00:07:15.419
which is about frameworks about ChatOps
00:07:18.819
frameworks that exist and about our own
00:07:21.819
kind of framework that we wrote and
00:07:25.810
reasons why we wrote it
00:07:28.139
the first framework is called Hubot it's
00:07:31.840
the framework invented by GitHub that I
00:07:34.479
mentioned Hubot is written in
00:07:38.650
CoffeeScript which means it's
00:07:41.620
JavaScript on Node.js and as Ruby
00:07:45.280
developers maybe some of
00:07:48.940
you don't like JavaScript
00:07:53.110
there are many Ruby developers who don't like
00:07:54.759
JavaScript but in the case of a ChatOps
00:07:57.430
framework JavaScript may be a good thing
00:08:00.699
because it brings a lot of asynchronous
00:08:06.130
support to your code which is important
00:08:09.340
in the case of a ChatOps framework because
00:08:11.500
commands have to be asynchronous and one
00:08:14.710
heavy command shouldn't block commands
00:08:17.229
from other people another framework
00:08:23.349
is called Lita it's written in Ruby and
00:08:26.289
it's very extensible it's a few years
00:08:31.240
old a very good framework and it's fair
00:08:34.390
to mention that both of these frameworks
00:08:38.320
have adapters for different chat
00:08:42.310
providers we use Slack so Slack is
00:08:46.870
the only adapter we use but if you
00:08:49.300
use some very rare chat solution
00:08:56.190
you can find an existing adapter or write
00:08:58.800
your own adapter so let's see how
00:09:03.890
chat scripts and the DSL look so
00:09:09.720
this is the Lita DSL you just define a
00:09:14.120
small Ruby class which has a macro called
00:09:19.740
route in this macro you describe a
00:09:22.860
regular expression for the command that
00:09:26.130
you would like to trigger
00:09:27.930
so with this handler if I go to slack
00:09:31.200
and write echo something the bot
00:09:34.140
will catch this phrase and reply with
00:09:38.730
the second word that comes after echo
00:09:42.240
and the Hubot syntax is
00:09:48.600
very similar to Lita we also define
00:09:52.190
the regular expression that the bot
00:09:55.470
should wait for and send a reply
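
The regex-based routing described here can be sketched in plain Ruby. This is a minimal stand-in, not the real Lita or Hubot API: the RegexRouter class and its route and dispatch methods are hypothetical names made up for this illustration.

```ruby
# Minimal stand-in for regex-based command routing.
# RegexRouter is a hypothetical class, not part of Lita or Hubot.
class RegexRouter
  Route = Struct.new(:pattern, :handler)

  def initialize
    @routes = []
  end

  # Register a handler that runs when a message matches the pattern.
  def route(pattern, &handler)
    @routes << Route.new(pattern, handler)
  end

  # Return the first matching handler's reply, or nil if nothing matched.
  def dispatch(message)
    @routes.each do |r|
      match = r.pattern.match(message)
      return r.handler.call(match) if match
    end
    nil
  end
end

router = RegexRouter.new
router.route(/\Aecho\s+(\S+)/) { |m| m[1] }

hit  = router.dispatch("echo hello") # the handler replies with the captured word
miss = router.dispatch("ehco hello") # nil: a typo is indistinguishable from "no such command"
```

With this style of routing the bot only ever knows "matched" or "did not match", which is where the limitations around typos and argument validation come from.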
00:10:05.640
if we take a closer look we will see
00:10:09.640
that both of these DSLs are based on
00:10:14.170
regular expressions and you have to write a
00:10:18.910
regular expression to tell the bot what
00:10:22.540
commands to detect and why regular
00:10:26.260
expressions because it is the easiest way to
00:10:30.930
tell the bot what command to watch for and
00:10:36.480
this approach has a few disadvantages
00:10:40.570
like it cannot detect typos it cannot
00:10:49.530
reply with this command was not found
00:10:53.320
maybe you meant something else it
00:10:57.310
also cannot do input validation like
00:11:01.450
if the command was right but the
00:11:05.920
argument was wrong then that argument
00:11:09.100
may not be matched by the regular
00:11:11.830
expression and this command won't be
00:11:13.390
found and having regular expressions in
00:11:17.710
your chat bots means that all developers
00:11:21.820
should be really good at regular
00:11:26.230
expressions and it's always easy to make
00:11:29.980
a mistake and define a regular expression
00:11:33.040
that will conflict with a different
00:11:37.030
script's regular expression so we thought
00:11:43.660
that maybe we could do something else
00:11:46.060
without regular expressions and yeah
00:11:49.810
here is an example
00:11:51.850
the first option is to write the command
00:11:56.320
syntax with a regular expression and the
00:11:58.149
second one is to write it with some kind
00:12:00.970
of pattern language and with echo the
00:12:07.329
difference is not that big but with a
00:12:09.639
bigger command like github add username
00:12:12.790
to team name the regular expression
00:12:15.820
becomes quite long and it's quite easy
00:12:19.420
to make a mistake there as I said so we
00:12:23.889
thought that maybe we can improve
00:12:25.660
that experience of writing chat
00:12:30.420
handlers and what did we want to have
00:12:36.730
from that solution we wanted it to be
00:12:39.970
friendly for both the developer and the
00:12:42.070
user being friendly for the developer
00:12:46.540
means that the developer wouldn't need to
00:12:48.190
write a regular expression and friendly
00:12:50.740
for the user means that we would suggest the
00:12:52.839
right command if the user made a mistake
00:12:55.709
we also have a lot of
00:13:02.069
infrastructure code written in Ruby at
00:13:04.480
Shopify so we decided that we want to
00:13:07.510
stick with Ruby after we tried both
00:13:11.220
Lita and Hubot in production and we
00:13:15.760
wanted a simpler and more powerful DSL
00:13:19.350
that would provide better argument
00:13:22.209
support so our solution we decided to
00:13:27.579
build it on top of Lita
00:13:29.260
with a custom command router and a custom
00:13:32.620
DSL and this is how this DSL looks
00:13:37.050
first of all it is very similar to Lita but
00:13:40.510
instead of defining the regular
00:13:42.310
expression
00:13:46.540
here you define a special pattern and
00:13:51.860
you also define help text and right after
00:13:55.520
this pattern is matched it's dispatched
00:13:57.620
to a Ruby method with a keyword argument
00:14:01.480
and in this case it's a very simple
00:14:03.980
handler it will reply with the same
00:14:06.680
command so let's take a look at a
00:14:10.280
more complex handler it has two
00:14:12.980
arguments one of them is yeah this
00:14:16.700
handler is for displaying some chart
00:14:19.520
from New Relic the first variable is
00:14:23.120
application name and the second is
00:14:25.250
format format is an enum field it can be
00:14:30.410
a daily or hourly value and help and
00:14:34.940
it should be converted into a call of a
00:14:39.110
Ruby method which is kind of simple
00:14:42.460
so this pattern would match all the
00:14:47.600
following user inputs can be my app so
00:14:52.220
hourly is the default value for the
00:14:54.950
format variable you can override it here
00:14:59.570
and here and you can also define it in
00:15:02.600
the explicit way which is useful when
00:15:06.320
you have more arguments and maybe you
00:15:11.150
don't remember the order of them so we
00:15:15.200
also wanted to have the explicit format
00:15:19.790
and to be able to work without
00:15:24.380
regular expressions we tokenize
00:15:27.680
this pattern into different kinds of
00:15:31.340
tokens the first two are static tokens so
00:15:35.770
the user input has to start with new
00:15:38.780
relic and chart and then there is a
00:15:41.890
simple variable and then there is a
00:15:43.970
variable with a default it looks like this
00:15:48.800
so this command consists of four tokens
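
Splitting a pattern into static and variable tokens could look roughly like this. The pattern syntax is an assumption, since the slides are not part of the transcript: bare words are static tokens, <name> is a variable and <name=value> is a variable with a default. Token and tokenize_pattern are hypothetical names.

```ruby
# Hypothetical pattern syntax (the real one is not shown in the transcript):
# bare words are static tokens, <name> is a variable,
# <name=value> is a variable with a default value.
Token = Struct.new(:kind, :name, :default)

def tokenize_pattern(pattern)
  pattern.split.map do |part|
    if (m = part.match(/\A<(\w+)(?:=(\w+))?>\z/))
      kind = m[2] ? :variable_with_default : :variable
      Token.new(kind, m[1].to_sym, m[2])
    else
      Token.new(:static, part, nil)
    end
  end
end

tokens = tokenize_pattern("newrelic chart <app> <format=hourly>")
kinds  = tokens.map(&:kind)
# two static tokens, one plain variable, one variable with a default
```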
00:15:53.920
our next goal is to convert the user
00:15:59.510
input of New Relic chart my app daily
00:16:02.120
into calling a Ruby method actually yeah
00:16:07.040
instantiating the New Relic handler and
00:16:09.200
calling that method with those keyword
00:16:14.060
arguments and
00:16:19.450
this seemed like a
00:16:27.220
difficult task until we discovered a
00:16:31.090
class in the Ruby standard library which is
00:16:33.820
called StringScanner yeah it's a class
00:16:36.940
in the Ruby standard library please raise
00:16:38.980
your hand if you've heard about this class
00:16:41.430
yes not too many people
00:16:47.230
StringScanner works as a scanner I have an
00:16:50.020
example now so you instantiate an object
00:16:54.630
with a string in this case the string
00:16:58.900
is the user input and there is a method
00:17:04.420
called scan and you just give a
00:17:10.240
token to that scan method and yeah so
00:17:18.240
it scans and if this user input
00:17:22.300
started with something else it wouldn't
00:17:24.840
scan the string at all so if it
00:17:29.170
started with github or some other
00:17:33.330
command it wouldn't scan then we
00:17:37.150
have the next token which is chart a
00:17:38.830
static token it is also scanned so we
00:17:43.030
can go further then we scan for variable
00:17:49.140
so that
00:17:51.510
it's scanned and then the next variable
00:17:55.810
and we get the values for these
00:18:02.230
variables so
00:18:06.640
it's not very honest to say that we
00:18:10.000
completely got rid of regular
00:18:13.000
expressions but the developer of
00:18:17.620
a handler doesn't have to write a
00:18:19.570
regular expression but we use some
00:18:22.180
regular expressions under the hood
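
The scanning steps can be sketched with StringScanner from the standard library (require "strscan"). The token list and the match_command helper are hypothetical and hard-coded for the New Relic chart example; the small regular expressions inside are the "under the hood" part.

```ruby
require "strscan"

# Tokens for a hypothetical "newrelic chart <app> <format=hourly>" pattern.
# Each entry is [kind, name, default].
TOKENS = [
  [:static, "newrelic", nil],
  [:static, "chart", nil],
  [:variable, :app, nil],
  [:variable, :format, "hourly"]
].freeze

# Returns a hash of keyword arguments, or nil when the input does not match.
def match_command(input, tokens = TOKENS)
  scanner = StringScanner.new(input.strip)
  args = {}

  tokens.each do |kind, name, default|
    scanner.skip(/\s+/)
    if kind == :static
      # A static token must appear literally, as a whole word.
      return nil unless scanner.scan(/#{Regexp.escape(name)}(?=\s|\z)/)
    else
      # A variable takes the next word, or falls back to its default.
      value = scanner.scan(/\S+/) || default
      return nil if value.nil?
      args[name] = value
    end
  end

  scanner.skip(/\s+/)
  scanner.eos? ? args : nil # leftover input means the command did not match
end

full    = match_command("newrelic chart myapp daily")
short   = match_command("newrelic chart myapp") # format falls back to "hourly"
no_scan = match_command("github chart myapp")   # first static token fails to scan
```

The resulting hash can then be splatted into a handler method as keyword arguments.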
00:18:29.050
yes so more than that we have type
00:18:34.070
coercion
00:18:35.750
so when defining a handler you can
00:18:41.540
declare the type of a variable for
00:18:44.330
instance the target this is the command
00:18:46.970
used to tell the infrastructure that
00:18:50.600
some server will go into downtime that
00:18:55.820
means that maybe we are going to restart
00:18:57.890
the server or repair it somehow and
00:19:01.930
there are two arguments one of them is
00:19:05.960
target which is a Chef node
00:19:08.870
it should be a valid Chef node
00:19:11.270
address and then duration duration can
00:19:14.750
be one minute or one second or one hour
00:19:20.740
or just any duration and we also declare
00:19:24.740
that so the first one's type is Chef node
00:19:27.680
the second's type is duration and when the
00:19:31.040
message is received so when we call this
00:19:35.540
Ruby method inside this method you'll
00:19:37.760
have duration as an ActiveSupport::Duration
00:19:41.600
and target will be a valid Chef node so we
00:19:45.890
can be sure in this method that both of
00:19:49.010
these arguments are valid so this
00:19:52.550
command will validate for the first
00:19:56.420
input but for the second input it won't
00:19:59.720
validate and it will return an error and
00:20:03.280
this method won't be called at all
00:20:06.710
so when writing code in that method you
00:20:09.740
will be sure that you get the right
00:20:13.880
input
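
A sketch of the type coercion step follows. Every name here is hypothetical. To stay free of dependencies, duration is coerced into plain seconds instead of an ActiveSupport::Duration, the "5m" style input is an assumed format, and the Chef node check is only a naming regex where a real check would ask Chef.

```ruby
DURATION_UNITS = { "s" => 1, "m" => 60, "h" => 3600 }.freeze

# Hypothetical coercers: take the raw string, return a typed value,
# or raise ArgumentError when the input is invalid.
COERCERS = {
  duration: ->(raw) {
    m = raw.match(/\A(\d+)([smh])\z/)
    raise ArgumentError, "invalid duration: #{raw}" unless m
    m[1].to_i * DURATION_UNITS.fetch(m[2]) # seconds, standing in for a duration object
  },
  chef_node: ->(raw) {
    # Assumed naming scheme; stands in for a real lookup of the node.
    raise ArgumentError, "invalid node: #{raw}" unless raw.match?(/\A[a-z0-9-]+(\.[a-z0-9-]+)+\z/)
    raw
  }
}.freeze

# Coerce every raw argument by its declared type, then call the handler.
# If any argument is invalid the handler is never called.
def invoke(handler, types, raw_args)
  typed = raw_args.to_h { |name, raw| [name, COERCERS.fetch(types.fetch(name)).call(raw)] }
  handler.call(**typed)
rescue ArgumentError => e
  "Error: #{e.message}"
end

downtime = ->(target:, duration:) { "#{target} down for #{duration}s" }
types = { target: :chef_node, duration: :duration }

ok  = invoke(downtime, types, { target: "db1.example.com", duration: "5m" })
bad = invoke(downtime, types, { target: "db1.example.com", duration: "soon" })
```

Because coercion happens before dispatch, the handler body can assume both arguments are already valid.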
00:20:21.100
after shipping this DSL to our
00:20:24.610
developers many developers could
00:20:29.679
write their own chat handlers to
00:20:32.529
automate their workflows and we
00:20:35.799
got to the number of more than 200 bot
00:20:39.039
scripts and handlers and
00:20:44.039
more than 600 command invocations on a
00:20:48.580
busy day in Slack so this became a
00:20:53.580
part of the infrastructure that we had to
00:20:56.200
scale as I mentioned we based our
00:21:01.059
framework on Lita so it was just an
00:21:06.460
extension on top of Lita and that means so
00:21:12.820
Lita is written in Ruby that means
00:21:14.649
that it doesn't have any support for an
00:21:19.330
asynchronous workflow which meant that if
00:21:22.529
you ask the bot for some command that takes
00:21:27.580
a minute it stops
00:21:32.340
working on commands from other users so
00:21:35.620
the bot was blocked for that minute and
00:21:37.570
it couldn't accept commands from other
00:21:40.630
users which was super bad especially at
00:21:44.410
scale so we decided that as
00:21:49.060
one option we can make a new thread on
00:21:52.960
every command and do all the operations
00:21:55.330
in that thread to not block receiving
00:21:58.930
new commands from Slack and Ruby threads
00:22:04.270
are not so good in
00:22:07.750
most kinds of operations but in our
00:22:10.780
case most of the handlers for chat
00:22:14.440
bots were only making HTTP queries or
00:22:17.850
they were invoking some other kinds of
00:22:21.370
systems they didn't do any
00:22:24.790
calculations on the bot side so they
00:22:28.240
just requested data from other systems
00:22:30.790
so in this case Ruby threads were
00:22:34.360
quite efficient and this approach helped
00:22:39.090
but we thought that maybe there is
00:22:43.390
some other approach and we went with
00:22:49.470
a master process and Redis and
00:22:52.210
when the master process received a
00:22:55.060
command from Slack it pushed that
00:22:57.670
command to Redis and we had a pool of
00:22:59.620
workers and we could have multiple
00:23:03.070
machines that work as workers
00:23:06.030
so we could scale that
00:23:08.580
horizontally and it works the same as a
00:23:13.930
Sidekiq or Delayed Job worker queue
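
The master and workers arrangement can be sketched in a single process. This is only an illustration under stated assumptions: Ruby's built-in Queue stands in for Redis and threads stand in for worker machines, so it shows the shape of the flow rather than the real deployment.

```ruby
# In-process stand-in: Queue plays the role of Redis,
# threads play the role of worker machines.
jobs = Queue.new
replies = Queue.new

workers = 3.times.map do
  Thread.new do
    # Each worker pops commands until it sees the :stop marker.
    while (command = jobs.pop) != :stop
      sleep 0.05 if command == "slow report" # simulate a slow external call
      replies << "done: #{command}"
    end
  end
end

# The "master" only enqueues, so it is immediately free
# to accept the next message from chat.
["slow report", "echo hi", "deploy status"].each { |c| jobs << c }
workers.size.times { jobs << :stop }
workers.each(&:join)

results = []
results << replies.pop until replies.empty?
# results holds one reply per command; the order depends on which worker finishes first
```

One slow command occupies a single worker while the others keep draining the queue, which is the property the blocking setup lacked.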
00:23:22.970
with that approach we could have
00:23:25.070
active and passive instances of the
00:23:29.540
bot server running and Slack would
00:23:32.390
make a callback to a load balancer with
00:23:35.870
the message for the bot and then the load
00:23:39.020
balancer could determine to
00:23:43.220
which machine to route the message and
00:23:47.380
that brings us to the availability problem
00:23:51.550
so if you remember at the beginning
00:23:55.400
of this year GitHub was down for like
00:23:59.030
three or four hours that was a
00:24:04.520
pretty big downtime and one of the
00:24:08.720
reasons for such a long
00:24:11.750
downtime was that GitHub relies heavily
00:24:15.770
on their ChatOps scripts and
00:24:20.140
but ChatOps was down as well because
00:24:23.000
of some network failure so they couldn't
00:24:25.370
use any of the ChatOps scripts to
00:24:27.650
recover the system because the ChatOps setup
00:24:30.950
was down as well and we know this problem
00:24:34.160
at Shopify we also had this problem when
00:24:37.640
our bot was unavailable or Slack was
00:24:40.490
down and in this case we couldn't do
00:24:43.430
anything so we decided that we would build
00:24:47.480
a special offline or rescue mode into our
00:24:51.080
bot so if you have this
00:24:55.340
bot locally on your laptop you just
00:24:58.960
launch it with a special bin command
00:25:01.400
and you will have exactly the same
00:25:03.980
interface in your command line as you
00:25:07.010
would have for the bot in chat
00:25:09.790
but it works even if Slack or something
00:25:14.810
else is down
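
One way to get such a rescue mode is to keep command dispatch independent of the chat service, so a console front end can reuse it. The sketch below assumes that design; COMMANDS, dispatch and run_console are hypothetical names, not the actual implementation.

```ruby
require "stringio"

# Hypothetical command table shared by every front end (chat or console).
COMMANDS = {
  "ping" => ->(_args) { "pong" },
  "echo" => ->(args) { args.join(" ") }
}.freeze

def dispatch(line)
  name, *args = line.split
  handler = COMMANDS[name]
  handler ? handler.call(args) : "Unknown command: #{name}"
end

# Console front end: reads commands from any IO instead of a chat service.
def run_console(input: $stdin, output: $stdout)
  input.each_line do |line|
    line = line.strip
    break if line == "exit"
    next if line.empty?
    output.puts dispatch(line)
  end
end

# Scripted session; an interactive run would simply call run_console.
out = StringIO.new
run_console(input: StringIO.new("ping\necho hello world\nexit\n"), output: out)
session = out.string
```

Since the console loop calls the same dispatch as the chat adapter would, handlers need no changes to work when chat is unavailable.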
00:25:18.100
summary this is very important so
00:25:22.570
probably you learned a bit about
00:25:26.320
ChatOps and how it can automate things and
00:25:28.710
you thought okay cool I'm going to try
00:25:32.410
that in my team in my company now but I
00:25:36.600
would like to say that it only
00:25:39.820
makes sense if you have a very big
00:25:42.340
team because when I worked in smaller
00:25:47.530
teams and smaller companies I would say
00:25:50.920
that we didn't need all of that just
00:25:53.590
because it wasn't at such a scale
00:25:57.790
that we needed to automate things with
00:26:00.610
ChatOps and in this case it's really
00:26:03.400
easier to come to your CTO and ask them to
00:26:07.060
create a new repo for you instead of
00:26:10.080
bringing more code and more
00:26:12.190
infrastructure to keep ChatOps
00:26:15.730
running so if you're interested in
00:26:19.510
working on such large scale systems
00:26:23.140
you're welcome to check Shopify careers and I
00:26:25.900
have mentioned a lot of projects
00:26:29.800
in Ruby a lot of gems and some other
00:26:33.670
things so you can go to my Twitter and
00:26:36.600
the last tweet is a gist with all
00:26:40.180
the links that I mentioned today which
00:26:42.820
you're welcome to check thank you
00:26:50.509
thank you very much Kir any questions
00:26:53.909
for
00:27:00.050
how many of you use some form of bots on
00:27:04.340
your favorite too
00:27:08.920
my favorite Slack bot is the flip
00:27:12.400
table what so what am i typing it above
00:27:15.340
it with a region guide so if there are
00:27:20.920
no questions back here
00:27:22.420
roundup father yet what it means