Shitlist-driven development and other tricks for working on large codebases

Summarized using AI


Florian Weingarten • June 23, 2017 • Singapore • Talk

The video titled "Shitlist-driven development and other tricks for working on large codebases" features Florian Weingarten, a Production Engineering Lead at Shopify, discussing strategies for enhancing productivity and efficiency in large software codebases. The talk revolves around the challenges faced when a large number of contributors work simultaneously on a monolithic codebase and how to manage those challenges effectively.

Key Points:

  • Context of Shopify's Development:

    • Shopify is built on a Ruby on Rails monolithic application, serving over 400,000 online stores, with more than 800 contributors and a continuous deployment frequency of up to 50 times a day.
  • Productivity Challenges:

    • Deployments can become bottlenecks due to the high volume of concurrent contributions needing to be integrated and released, potentially leading to longer wait times for changes to go live.
  • Automating Deployments:

    • To improve deployment speed and reduce manual involvement, they advocate for the automation of the entire CI/CD process. This includes parallelizing tests, pre-building containers, and minimizing application boot times.
    • They introduce “Shipit,” their deployment tool, which deploys changes automatically, without human intervention, once tests pass.
  • Managing Concurrent Changes:

    • Introduction of “shitlist-driven development” to manage deprecations and changes. This concept helps flag and monitor outdated practices while ensuring that new suboptimal code is not added, allowing teams to focus on gradual improvements without conflicts.
  • Dealing with Unreliable Tests:

    • The significance of recognizing and resolving flaky or leaky tests in the codebase is emphasized. Solutions include using production monitoring tools for test failures and adopting a binary search approach to identify and fix issues efficiently.
  • Conclusions and Takeaways:

    • Fast and automated deployments allow for safer, smaller changes more frequently, which enhances productivity.
    • Keeping a clear, maintainable list of deprecated methods and leveraging automated testing insights can significantly improve code quality and contributor experience.

In summary, Florian highlights that with well-structured automation processes and internal tools, organizations like Shopify can manage complex, large-scale codebases effectively, thereby improving the overall development workflow and developer satisfaction.

Shitlist-driven development and other tricks for working on large codebases
Florian Weingarten • June 23, 2017 • Singapore • Talk

Speaker: Florian Weingarten, Production Engineering Lead, Shopify

Working on large codebases is hard. Doing so with 700 people is even harder. Deploying it 50 times a day is almost impossible. We will look at productivity tricks and automations that we use at Shopify to get stuff done. We will learn how we fix the engine while the plane is running, how to quickly change code that lots of people depend on, how to automatically track down productivity killers like unreliable tests, how to maintain a level of agility that keeps developers happy and allows them to ship fast, and most importantly what the heck a "shitlist" is.

Speaker's Bio

Florian is originally from Germany, where he studied mathematics and computer science. Since moving to Canada, he has been working as a Production Engineer at Shopify in Ottawa, spending most of his time refactoring large Ruby on Rails codebases and thinking about scalability and performance problems.

Event Page: http://www.reddotrubyconf.com/

Produced by Engineers.SG

Help us caption & translate this video!

http://amara.org/v/8HYM/

Red Dot Ruby Conference 2017

00:00:03.940 Yeah, so first of all I want to apologize for the swearing in the title. It was kind of an internal name that we used, and someone on my team said "wouldn't it be funny if you got the talk accepted with this in the title", so I listened. I'm going to talk about something that we call shitlist-driven development, and I'm going to share some tricks about how to work with really large codebases. It's not going to be too Ruby-specific; I think no matter which language you work with, hopefully you can find something useful.
00:00:42.170 First, a bit of context: I work for Shopify, which is an ecommerce software-as-a-service platform headquartered in Canada. As far as I know we are one of the oldest and, I think, the largest Ruby on Rails codebases in the world; we've been using Rails since version 0.something, over ten years ago, and it's probably the biggest Ruby company in Canada. This is Ottawa, the capital of Canada, and one of the buildings on the right is the Shopify headquarters.
00:01:17.900 My job at Shopify is on one of the core architecture teams, and a lot of my work looks like this: very broad, very low-level changes, lots of maintenance work, and lots of work that affects the entire application, the entire platform. Most of the tips in this talk come from that context. Shopify is a monolithic Rails application; that doesn't mean the tips I'm giving here can't be applied to other codebases, but just so you know, this is where it all comes from.
00:02:01.130 We run a multi-tenant architecture, which means we host people's online stores, and we have more than 400,000 of them, all running in the same application, the same database, the same deployment. It's not like each shop has its own deployment; it's one application. We do about twenty to forty thousand requests per second, and our main GitHub repository has about eight hundred contributors, which includes developers, designers, content strategists, documentation writers, all that kind of stuff. There's a whole bunch of problems that come with having so many people trying to change the same thing at the same time, and so quickly.

00:02:47.030 All of those eight hundred contributors have permission to merge changes to master, and they all have permission to deploy to production. With the rate of change that we have right now, we deploy about fifty times a day, and those fifty deploys contain about fifty to a hundred PRs a day. Together, that amount of change every day gives you a bunch of interesting problems that you don't really run into with smaller applications, but yeah, it's a challenge.
00:03:21.739 I want to frame this talk from the perspective of productivity problems, so I'm going to talk about three important productivity problems we were faced with and share some tips about how we work on them.

00:03:41.530 The first one is deploys. If you have this many people working on the same code, in the same application, and they all want to deploy, the deploys actually become a bottleneck. What I mean by that is: if you hire more people, they want to ship more code, and shipping more code means you either need to deploy more often or you need bigger deploys, one of the two. For several reasons, smaller deploys are usually better: fewer changes are easier to debug, since you're changing less code at the same time; they're easier to revert; and it's easier to keep an overview of what's happening. So from that perspective we wanted more deploys, not bigger ones. Now the important observation is that if you want small deploys and you want to deploy often, you need the deploys to be fast. As an example, I said we deploy about fifty times a day, and most of our developers are in the same time zone, so that means about six deploys per business hour. If those deploys take longer than ten minutes, they become a serious productivity problem for us, because we can't ship code as fast as we want to, and that means we can't develop features as quickly as we want to.
00:05:09.669 So what do we do about it? First of all, when I say deploy, I don't only mean getting the code into production; I mean the entire pipeline that comes with that. If you use Docker, for example: building a CI container, running your tests on CI, building the production container, uploading it to wherever you have to upload it, getting it onto the servers, and restarting all of those containers and making sure everything is successful. When I say deploy, I really mean this entire sequence of steps.

00:05:48.229 An obvious one: if you have CI builds, you should parallelize them. If two people want to ship something at the same time, you shouldn't run their builds one after another, and even for a single person with many tests, you can easily run the tests in parallel. That one is pretty obvious.

00:06:05.990 Another one that was super helpful for us: you should build those containers in advance. I said before that we have about 50 deploys but about 100 PRs, which means some deploys contain more than one PR. We build production containers for every merge to master, so we build a lot of containers that never actually get deployed, but the advantage is that when someone wants to deploy one of them, it is already ready and we don't have to build it in that moment.

00:06:42.260 Another really big improvement: during the container build we would often invoke different rake tasks, and each of those rake tasks would boot the Rails application. If your application is this big, just loading the Rails environment before you can even start running the real work of the rake task often takes up to 10 seconds or so. Finding a way to combine all of those so you only have to boot once was a huge speedup for us.
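As a rough illustration of that idea (this is a minimal sketch, not Shopify's actual build code, and the invoked task names are only examples), one wrapper rake task can load the Rails environment a single time and run the other build-time tasks inside that same process, instead of paying the boot cost for every separate rake invocation:

    # Rakefile (sketch) -- the wrapper depends on :environment, so Rails boots
    # exactly once; every step invoked inside it reuses the booted process.
    task :container_prebuild => :environment do
      # Example build steps; substitute whatever your container build needs.
      %w[db:schema:cache:dump my_app:warm_caches].each do |name|
        Rake::Task[name].invoke
      end
    end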
00:07:13.970 Deploy to your servers in parallel; you don't want to do one server at a time with an application of this size. And if you look at all of those different steps, building CI containers, running tests, building the production container, restarting the application, all of them require booting the application. So if you find a way to reduce the time it takes to boot your application, that has a huge impact in many different areas; that was a really big improvement for us.

00:07:44.240 The last one, which is a little bit overlooked, is how long it takes to shut down your application. Especially if you're running a web application using Unicorn, there's a standard timeout value that says how long a request is allowed to run, and when you deploy you either have to terminate in-flight requests, which is going to lead to errors, or you have to wait for them to finish. If you wait, your deploy is going to take at least that long. So doing whatever you can to have as few long-running requests as possible can have a huge impact on the speed at which you can deploy.
00:08:22.220 The other bottleneck related to deploys is humans. There are a couple of steps you can totally get away with at a smaller company, with a smaller codebase or project, but if you want to deploy a hundred times a day, they don't work anymore. One example: smaller companies often have an ops team, and the ops team is allowed to deploy; but if you have 800 people and they all want to deploy, having all of them ask the ops team doesn't work, so you need to let people deploy on their own. Likewise, having someone decide "now is a good time to deploy" doesn't scale; asking everyone to pay attention to the status of CI on master doesn't scale; asking everyone to watch for errors during a deploy and babysit the deploy doesn't scale with this many people. In the end, even saying "every developer can deploy themselves" stops scaling at some point. So in summary: humans don't scale, and you should automate this process as much as you can.
00:09:33.350 To illustrate, I want to show you the tool that we use. There's nothing really special about this tool, you can easily write your own; the point I'm trying to make is that you should have some kind of tool, and that tool should be software, not humans. This is our deploy tool, Shipit, which is open source; you can use it, or take some ideas from it and write your own. A few important parts: here, for example, you can see it's waiting for CI, and as soon as those tests pass we deploy automatically. No human has to press a button, nobody has to say "it's okay to deploy now"; basically, we expect that if people merge to master, that means the change is good to be deployed.

00:10:18.980 Another big one is that people make mistakes: you merge something, then you figure out that something was wrong and you need to revert it. We often had problems where people had to manually keep an eye on things: "this can't get deployed, so revert it, then lock the deploys, make sure the first one doesn't go out without the second one", and so on. Automating this in software really took some of the human interaction out of the process. This feature here, for example, says: if there is a revert for a commit that hasn't been deployed yet, then nothing in that range can get deployed, and once the revert passes CI, it will get deployed automatically.
00:11:10.640 Another thing you can automate is telling people that their code is now being deployed. If you deploy automatically, it's still important that people know their changes are going out, so we have a Slack channel where people get notified when their code is on its way to production.

00:11:34.970 Another important thing is that we don't want people to merge too many commits into master; we don't want commits to pile up, so if there's a large backlog of stuff that hasn't been deployed yet, we want people to wait. In a smaller application it's okay if someone keeps an eye on that and pokes people, saying "hey, don't do this", but if you have a lot of people and the application gets really big, then this kind of educating people is also something you can automate. In our case, if someone merges to master while CI is failing, or while there are a lot of commits that haven't been deployed yet, they automatically get a notification saying "you shouldn't do that".
00:12:15.230 This one I thought was really interesting, because it's a little roundabout; I would say it's a workaround for a missing feature in GitHub. If merging to master basically means the change is going to get deployed, then merging itself also becomes a bit of a bottleneck. The workflow we actually use is: we have a browser extension that injects this button here into the GitHub UI, and people don't actually merge their PRs themselves; they just say "this is ready to be merged", and later a bot merges it for them, when some heuristic decides that now is a good time to deploy it. This means people can say "okay, this is ready" and then move on and work on the next PR; the developers, the humans themselves, don't have to orchestrate the whole deploy process. It's another step that we automated.
00:13:22.899 Okay, the next problem I want to talk about is the one the presentation title comes from, and the problem is basically: how do you deal with deprecations? Especially if you work at a very low level, on a team like the one I'm on, you do a lot of framework changes, a lot of stuff that affects not just a certain feature but the entire application, for example the team that is responsible for upgrading to a new Rails version. Basically, you have an internal API that is used by a lot of code, and your job is to migrate from the old way of doing something to the new one. The way Rails solves this internally is with ActiveSupport deprecation notices, and that's basically logging: everybody gets spammed with the log output and you just have to hope that people will fix it. The reality is that people will not fix it, because nobody feels responsible for those warnings when you have eight hundred people working on the application.

00:14:28.519 So the idea is that you go and fix all of the callers to use the new method, and now everything is fixed. The problem is that in the meantime, while you were doing that, someone else might have added a new class that also does it wrong, or, if you fix B first and then C, by the time you finish C someone might have un-fixed B and made it wrong again. It gets really annoying: if you have a lot of people and you try to make very low-level changes, you step on each other's toes all the time.
00:15:05.480 So what else can you do that is better than logging? You can try to send an email and say "hey, don't do this", or you can post a Slack announcement, basically telling everyone to use the new method and not the old one. That might work if there are five people on your team, but if you have 800, new people get hired all the time, the old people forget, or maybe they don't care; for all kinds of reasons, this doesn't really work.
00:15:36.600 The idea we had is that we need to find a way to automate this: a way to educate people about what the right behavior is, and to do that education in code, by enforcing certain rules, but without pissing everyone off and without everyone having to come to us and ask for help.

00:15:56.420 The other extreme you can go to is to just raise in the old method and say "you can't use this anymore", then run your tests, fix all of the failures, and then you know everything is good. But if you run hundreds of thousands of tests and you have a lot of code, you're basically forced to make all of those changes in one PR, and if you ship 100 PRs a day, that is definitely going to cause merge conflicts and all that kind of stuff. So you want a way to fix these things one after the other, ideally in tiny slices, one PR per change or something like that, but without people being able to undo your work.
00:16:41.730 The idea is basically: if we have two classes that both do it wrong, B and C, can we whitelist those existing callers without allowing people to add new ones? That is the idea that we jokingly, internally, call a shitlist.

00:17:01.350 A shitlist is basically a list of things that are already shitty, stuff that is doing it wrong, and it acts as a whitelist: all of those callers are allowed to keep doing it wrong, but you can't add new entries. For a second, just assume that the deprecated method knows who called it; that's one of the important ideas. If the method knows who called it, you can do something like this: you say, if the caller is either B or C, then it's okay, because those have been around forever and we're working on them, but nobody can add new ones. Now you can go fix B and remove it from the list: C is still allowed, B is fixed, nobody can accidentally un-fix it, and nobody can add new stuff either.
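A minimal sketch of that idea (the class and method names are hypothetical, not Shopify's actual code): the deprecated method checks its caller against a list that is only ever allowed to shrink.

    # Classes that already call the old method; the rule is that this list only
    # ever shrinks. Any new caller raises, so the mistake shows up in CI.
    class B; end
    class C; end
    class D; end

    SHITLIST = [B, C].freeze

    def deprecated_method(caller_class)
      unless SHITLIST.include?(caller_class)
        raise "#{caller_class} must not call deprecated_method. " \
              "Use new_method instead, or ask the core team if you are unsure."
      end
      :legacy_behaviour # stand-in for the old implementation
    end

    deprecated_method(B) # allowed: B is grandfathered in until someone migrates it
    deprecated_method(D) # raises: a brand-new offender is caught immediately

Passing the caller in explicitly is just to keep the sketch short; the point in the talk is only that the deprecated method has some way of knowing who is calling it.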
00:17:53.940 The problem with this approach is that I kind of assumed I was able to change the method I want to deprecate, but that method might be in a gem, or in Rails, or somewhere else outside of your control, or maybe you don't want to go through all of those classes and change the parameters everywhere. And maybe you want a different level of granularity: instead of saying "B and C are allowed to call this", maybe you want to say "the Shop model and the Customer model are allowed, but not the Checkout model", or "internal web requests are allowed but not external ones", or "background jobs are allowed to do it but not web requests". There are different granularities you might want, and the key to implementing that granularity is how you figure out who called it.

00:18:56.099 Something simple you can do is come up with an annotation: at the bottom here we have a controller and a job, and they register themselves with the shitlist with what I call a context. It says: the context the code is now running in is the shitty controller's foo method, or the shitty job. The shitlist itself can then say: this should raise an exception unless it's coming from the shitty job.
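A rough sketch of that context-based variant, assuming the context is tracked with thread-local state (the class and feature names are again hypothetical):

    module Shitlist
      # Contexts that are still allowed to hit the deprecated code path.
      ALLOWED_CONTEXTS = ["ShittyJob"].freeze

      # Entry points (controllers, jobs, ...) wrap their work in a context.
      def self.with_context(name)
        Thread.current[:shitlist_context] = name
        yield
      ensure
        Thread.current[:shitlist_context] = nil
      end

      # Called from inside the deprecated code path itself.
      def self.check!(feature)
        context = Thread.current[:shitlist_context]
        return if ALLOWED_CONTEXTS.include?(context)
        raise "#{feature} is deprecated and may not be used from #{context.inspect}."
      end
    end

    class ShittyJob
      def perform
        Shitlist.with_context(self.class.name) do
          Shitlist.check!("legacy_export")  # allowed: jobs are still whitelisted
        end
      end
    end

    ShittyJob.new.perform              # fine
    Shitlist.check!("legacy_export")   # raises: no whitelisted context is set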
00:19:29.359 The workflow is then: at first your whitelist, your allowed-callers constant, is empty; you run your tests and put everything in there that fails; that's one PR you can ship, and now you're confident that nobody can accidentally add anything new. Your task after that is basically: remove one item from the list, see which tests fail, fix all of them, and move on to the next one. This is really great for generating to-do lists, or for giving your team a progress indicator, because you can see the list getting smaller and smaller every day. It's super awesome for motivation, because people feel like they're making progress, the list keeps shrinking, and it's much more measurable than a log full of deprecation spam that nobody is going to look at.
00:20:26.750 To summarize: in our experience this is very valuable if what you're deprecating is very broad behavior, if you're maintaining some kind of internal API, or if you need to break a huge task down into small chunks. For my team it has become the go-to tool. It's awesome for generating to-do lists, for having something to just work through, and it's also a way to educate your team about how you want them to write code: which kinds of methods you want them to use and which kinds you want them to stay away from. This education happens at the code level, so you don't have to talk to all of those humans; you just write good error messages.
00:21:12.230 The error message should explain what you want from people. As an example, if there's certain behavior you want to deprecate, the error message should explain: what are you doing wrong, why is it wrong (it was working yesterday, why is it an error now?), how can I fix it, and who can I talk to if I don't know how to fix it. An error message like this, enforced via code, enforces good practices for whatever it is you want people to do.
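For illustration only, a message in that spirit might look like the following; the method names, replacement API, and Slack channel are all made up:

    raise <<~MESSAGE
      Shop#legacy_tax_rate is deprecated and this caller is not on the shitlist.
      What happened: the call was allowed while this class was still on the list,
      but it has since been migrated, so new calls are rejected.
      How to fix it: use TaxService.rate_for(shop) instead.
      Who to ask: the core architecture team (#core-architecture on Slack).
    MESSAGE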
00:21:53.400 Okay, the third problem I want to talk about is unreliable tests. This one might not be such a big deal if you're working on more of a service-oriented architecture, but if you're working on a monolithic Rails application, it can get really annoying really quickly. The interesting thing is that most people probably know what the problem is when I say unreliable tests, but they're usually not annoying enough to really force you to do something about them; however, the more tests and the more people you have, the more these problems become not just likely but actually common. When I say unreliable test, I mean a test that sometimes passes and sometimes fails without you making any changes to the code.

00:22:41.920 For some context: we run about 750 CI builds per day, ten minutes each, with about 70,000 tests. If only a single one of those 70,000 tests is unreliable and fails 1% of the time at random, we lose over one hour of productivity per day (roughly: 1% of 750 builds is seven or eight red builds, and each retry costs another ten-minute build). And those numbers are based on the assumption that the tests are running on your branch, so that's one hour of productivity for one person; if you apply this to master, where a failing test affects far more people, it's even worse.
00:23:19.900 There are two common types of unreliable tests I want to talk about. One is the flaky test; that's the one that is easy to spot and easy to debug. It's just a test that sometimes fails and sometimes doesn't. Often it's time-dependent: maybe you have a couple of lines in your code, and if more than a second passes between them, a calculation doesn't match anymore; or maybe the test only fails when your CI system is under load because something runs out of memory, that kind of thing.

00:23:51.550 The second category is way more sneaky, because the test that is the problem is not actually the one that is failing. Those tests are order-dependent: you might have a test A that fails only if test B ran before it. Tracking those tests down and fixing them is super important, because they can be a huge productivity killer for your team. So how do you track them down?
00:24:21.610 A lot of people probably know about software like Bugsnag, which is exception-tracking software that you use in production: every time an exception happens in your production system, you log that exception somewhere and you get data, dashboards, and all kinds of features on top of it. We thought it was a really cool and interesting idea to not only use this in production but also use it for your tests: every time a test fails on the CI system, we actually report it as an exception, and then we can use all of that data and all of those features on the test failures. You get all kinds of cool stuff: you can see when a test started failing, which PR might have caused it, and you even get alerting, so you can say "if a certain test fails more than five days in a row, notify someone or ping the author automatically", that kind of thing.

00:25:16.120 As with most problems, the very first step to fixing it is visibility: you need to figure out what is actually wrong, how bad it is, how often it happens, how many people are affected, and so on.
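One way to wire this up, as a sketch rather than Shopify's actual setup: a custom Minitest reporter that forwards every test failure to an exception tracker. This assumes Minitest 5 and the bugsnag gem; any tracker with a notify-style API would work the same way.

    require "minitest"
    require "bugsnag"

    # Reports each failing test to the exception tracker so the dashboard can
    # group failures per test, show when they started, and drive alerting.
    class FailureTrackingReporter < Minitest::AbstractReporter
      def record(result)
        return if result.passed? || result.skipped?

        failure = result.failures.first
        # Minitest wraps assertion failures and unexpected errors; #error
        # returns the underlying exception in both cases.
        Bugsnag.notify(failure.error)
      end
    end

    # Register the reporter through a Minitest plugin (see Minitest's plugin
    # docs); attaching the test name and build URL as metadata is what makes
    # the tracker's grouping and alerting useful.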
00:25:34.670 So after you've identified which tests are problematic, how do we fix them? For the first kind, the flaky test, the one that sometimes fails and sometimes passes: if you suspect a test might be flaky, you obviously want to confirm whether that's actually the case. What we do is run a little script on our CI system; each of those little green and red boxes here is one container running in parallel, and each container runs that one single test, in isolation, a thousand times. So here we basically ran the same test 64,000 times, and if the result looks like this, sometimes failing and sometimes passing, you have confirmation that the test really is a problem.
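Stripped of the container fan-out, the core of such a script can be as simple as the sketch below (the test file and test name are hypothetical); the real setup just runs many copies of this loop in parallel CI containers.

    # confirm_flaky.rb -- run one test in a fresh process many times and count
    # how often it fails. Zero failures doesn't prove the test is healthy, but
    # any failure confirms the flakiness.
    RUNS = 100
    failures = 0

    RUNS.times do
      passed = system(
        "bundle", "exec", "ruby", "-Itest", "test/models/shop_test.rb",
        "-n", "test_tax_calculation",
        out: File::NULL, err: File::NULL
      )
      failures += 1 unless passed
    end

    puts format("%d/%d runs failed", failures, RUNS)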
00:26:25.350 The other one, the leaky test, is a little sneakier and a little harder to debug. If you're not super familiar with how testing frameworks like Minitest or RSpec work (I found this confusing when I was first learning about test-driven development): they don't actually create a new process for each test, they run multiple tests in the same process. That means if a test somehow mutates global state by mistake, that mutation is still visible in the next test, so tests can affect each other. A leaky test is a test that makes another test fail by modifying global state.

00:27:08.309 A really great way to find those tests is binary search. You look into the monitoring I showed before at the list of tests that ran; the last one is the one that failed, but as I said, that one is not actually the problem. The problem is one of the tests that ran before it, because one of those caused the last one to fail. So you take that list and divide it in half: you run the first half and then the failing test. If it fails again, you know the problematic test must be in that half; if it doesn't, it's in the second half. You repeat this, repeatedly cutting the list, and basically perform a binary search through the list of candidates. At the very end, if our tool identifies a leaky test, it tells you how to reproduce it locally: here's your leaky test, and here's the test that fails because of it. You can basically track it down that way.
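The bisection itself is only a few lines. Here is a sketch in plain Ruby, where run_order_fails stands in for "run exactly these tests, in this order, in a fresh process, and report whether the failing test fails again"; how you implement that depends on your runner (the minitest-bisect gem automates the same idea for Minitest suites).

    # candidates: the tests that ran before the failure, in their original order.
    # failing_test: the test that fails, but only when a leaker ran before it.
    # run_order_fails: callable that runs the given tests followed by the failing
    #   test and returns true if the failing test fails again.
    def find_leaky_test(candidates, failing_test, run_order_fails)
      while candidates.size > 1
        first_half = candidates.first(candidates.size / 2)
        candidates =
          if run_order_fails.call(first_half + [failing_test])
            first_half                  # the leaker is somewhere in the first half
          else
            candidates - first_half     # otherwise it must be in the second half
          end
      end
      candidates.first # the test that leaks state into failing_test
    end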
00:28:09.720 Putting all those pieces together, what we do is: we have this automatic monitoring, and if a test fails too often or for too many days in a row, we can automatically confirm whether it's leaky or flaky and ping the author of the test to say "hey, this is a problem, please fix it". That is super valuable for productivity.
00:28:36.179 Okay, a quick summary at the end. I talked about three problems. The first one is deploys: if your application gets really big, and you have a lot of people who want to ship a lot of code, one of the really important things you can do is make sure your deploys are really fast, because if they are fast you can deploy more often and in smaller pieces, which is safer. Besides making them fast, also automate them; make sure there's as little human involvement as possible.

00:29:06.229 Problem two was what I call too many cooks in the kitchen: too many people trying to change the same thing, stepping on each other's toes, or accidentally undoing each other's work. I'd like to encourage you to try this approach of shitlist-driven development, which is basically a fancier version of deprecation warnings where, once you fix one caller, it's impossible to add new ones or to un-fix that one.

00:29:31.559 The last one I talked about was unreliable tests. The important thing I want you to take away is that you can actually use a lot of your production monitoring tools for your tests and get a lot of insight out of that, and that the binary search approach for finding leaky tests is really powerful.
00:30:05.109 Moderator: Okay, we're right on time, so we have time for exactly two questions.

00:30:10.700 Audience: Thanks, that was great. I've seen a lot of really powerful internal tools here, and I think for some of them the underlying philosophies are applicable across codebases even where the specific tools might not be. So I'd love for you to talk about Shopify's decision-making process for what is important to build.

Florian: You mean which internal tools are important to build?

Audience: Yeah, which internal tools are important to build, and how you can make internal tools that effectively fit with the grain of your process.

00:30:54.249 Florian: I would say the ones that affect the most people, the ones that have the most impact across the company, are probably the most important ones. For us, the test stuff I talked about was one of those: the more often you want to deploy, the more annoying unreliable tests get. So that was one where we thought, well, this is affecting 500 people; if there are too many flaky tests, then there are 500 people who can't get any work done, so it was a good candidate for something we needed to work on. I don't know if that answers the question.

Audience: Yeah, thank you.

00:31:31.850 Moderator: Okay, great. Let's try to get someone from the other side of the audience.
00:31:42.830 Audience: Hi, first of all, it was a very interesting talk. I have a question on the subject of unstable tests, because I've also had my share of tests that suddenly break one day. Can you share your experience of the most annoying and difficult unstable tests, and how you fixed them?

00:32:12.270 Florian: In my experience the flaky tests are usually pretty easy to identify and usually pretty easy to fix as well. The really annoying ones are the leaky ones, where the test that is failing is actually not the test that is the problem. What I've seen a lot is that the problem is global state being modified in one test, with some other test then somehow affected by it. I see a lot of things related to caching, where someone was trying to be smart and cached something in a global variable, and things related to Rails autoloading, where the first test causes a certain class or constant to be autoloaded and the second test then behaves differently because of that. That is often really annoying. There are also a lot of very annoying details about how transactional fixtures work in Rails: one thing a lot of people seem to run into at Shopify is that if your test makes a table modification, like an ALTER TABLE statement to add a column, those statements actually cause a database commit, which means the test does not correctly roll back its changes. All of those really intricate details that most people don't know about, and shouldn't need to know about, still affect your tests; those are, in my experience, the really annoying ones.

Audience: Thank you.

Moderator: All right, cool. Thank you.