Dylan Blakemore

Pushing to master - adopting trunk based development

Trunk based development is a scary practice to adopt for engineers used to Git Flow or GitHub Flow. But there is ample evidence to show that it leads to higher quality code and faster delivery. So why are so many resistant to pushing to master? In this talk, we'll go over why TBD can be scary, what challenges are involved in pushing for team and company adoption, and how to overcome those challenges.

RubyConf 2022

00:00:00.000 ready for takeoff
00:00:16.920 hi everyone I hope everyone's had a good start to RubyConf 2022
00:00:23.039 I'm Dylan um I'm incredibly excited to be here this is my first RubyConf and my first
00:00:29.640 conference talk uh so I'm excited a little bit nervous but I hope you all
00:00:34.860 enjoy the talk and can take something away from it I'm a software engineering manager I'm
00:00:41.520 from Cape Town South Africa where I studied mechanical engineering actually but about five years ago I dropped out of a
00:00:49.020 PhD and somehow landed a junior dev job at a company called Zappi
00:00:56.760 Zappi's the world leader in automated end-to-end market research with offices in
00:01:02.760 Cape Town Boston and London and some satellite people all over the world our stack is mainly Rails back ends with
00:01:10.860 React front ends obviously a little bit of Python and Elixir thrown in for luck we've got millions of lines of Ruby
00:01:19.320 across a number of different apps and we've also got all the tech debt and
00:01:24.600 spaghetti that comes along with that so about two years ago we started to
00:01:30.960 notice some problems with our engineering department uh delivery speed was decreasing engineering happiness was at an all-time
00:01:38.520 low and our innovation was just stagnating so it really felt like we'd forgotten
00:01:45.479 how to disrupt and we were plodding through the muck of BAU and customer requests and bugs that were happening
00:01:52.860 more and more often because we never knew when we would break something
00:01:58.740 so it obviously needed change now the problems we were facing weren't unique to us uh you know high degrees of
00:02:05.520 coupling poor software practices frustrating process these are all things that most companies at a certain scale
00:02:11.760 have had to deal with so surely there's something out there that can teach us how to move forward
00:02:21.720 let's go he's gonna do that so enter the State of DevOps report and
00:02:28.800 the DORA metrics um for those who might not know the State of DevOps report uh is an annual
00:02:35.340 important report on software engineering and it strives to understand what makes a high performing team
00:02:41.340 the researchers take a rigorous scientific approach to gathering and understanding that data and all of that
00:02:47.640 is described really well in the fantastic book Accelerate the outcome of years of research was
00:02:53.340 these DORA metrics which together are indicators of high performing software teams
00:03:00.420 deployment frequency is a measure of how often an application can be deployed to
00:03:05.879 production change lead time is the amount of time between writing some code and getting that code into production
00:03:12.959 change failure rate is the rate at which uh failures happen
00:03:19.379 and then mean recovery time is the amount of time it takes to fix any failures
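To make the four definitions concrete, here is a minimal Ruby sketch that computes them from a list of deploy records. The `Deploy` struct, its field names, and the sample data are assumptions invented for this illustration; they are not taken from the talk or from the report's methodology.

```ruby
# Hypothetical record of one production deploy (all fields are assumptions).
Deploy = Struct.new(:committed_at, :deployed_at, :failed, :recovered_at, keyword_init: true)

# Returns the four metrics described above for a list of deploys
# observed over `period_days` days.
def dora_metrics(deploys, period_days:)
  raise ArgumentError, "no deploys to measure" if deploys.empty?

  failures       = deploys.select(&:failed)
  lead_times     = deploys.map { |d| d.deployed_at - d.committed_at }   # seconds
  recovery_times = failures.map { |d| d.recovered_at - d.deployed_at }  # seconds

  {
    # deployment frequency: how often we ship to production
    deploys_per_day: deploys.size.to_f / period_days,
    # change lead time: from writing the code to running it in production
    mean_lead_time_hours: lead_times.sum / lead_times.size / 3600.0,
    # change failure rate: share of deploys that caused a failure
    change_failure_rate: failures.size.to_f / deploys.size,
    # mean recovery time: how long it took to fix those failures
    mean_recovery_hours: recovery_times.empty? ? 0.0 : recovery_times.sum / recovery_times.size / 3600.0
  }
end

# Made-up sample data: four deploys over two days, one of which failed.
SAMPLE_DEPLOYS = [
  Deploy.new(committed_at: Time.utc(2022, 11, 1, 9),  deployed_at: Time.utc(2022, 11, 1, 11), failed: false),
  Deploy.new(committed_at: Time.utc(2022, 11, 1, 13), deployed_at: Time.utc(2022, 11, 1, 14), failed: true,
             recovered_at: Time.utc(2022, 11, 1, 15)),
  Deploy.new(committed_at: Time.utc(2022, 11, 2, 8),  deployed_at: Time.utc(2022, 11, 2, 11), failed: false),
  Deploy.new(committed_at: Time.utc(2022, 11, 2, 12), deployed_at: Time.utc(2022, 11, 2, 14), failed: false)
].freeze

p dora_metrics(SAMPLE_DEPLOYS, period_days: 2)
```

The sketch only reports numbers; in keeping with the point that follows, it is meant for observing trends and not for setting targets.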
00:03:25.860 uh there has been a new metric added reliability but there's still a bit of debate around its validity so I'm not
00:03:33.360 going to touch on that too much but it's important to notice that these are metrics and not goals anytime a metric
00:03:39.060 becomes a goal it can be gamed so we don't want to look at how we can
00:03:44.159 game these and improve them just by doing anything instead you focus on the best practices and the behaviors that
00:03:50.760 have been proven to improve the metrics as a tech lead about two years ago and
00:03:57.000 an EM now there's one particular behavior that I'm very very passionate about and that's obviously trunk-based
00:04:03.540 development it's an alternative branching strategy which encourages more regular but
00:04:09.900 smaller commits to master and it directly leads to significant improvements in your deployment
00:04:15.120 frequency metric it's been proven that organizations that
00:04:20.400 practice TBD are more performant but if that's the case then why isn't everyone doing it why aren't we all
00:04:26.580 doing TBD that's what I'm here to try and answer so
00:04:31.820 let's explore TBD and why it's difficult to adopt at scale
00:04:38.639 um now TBD purists have very specific definitions of trunk-based development if you're doing it right then you should
00:04:45.780 be committing and pushing straight to master and you should be pairing whenever you're writing code
00:04:51.960 um the thing is when I hear this I like to remember one of my favorite Twitter posts by GeePaw Hill which is I
00:04:59.699 don't like or value definition wrangling and I don't care whether you call how I work TDD or not probing the boundaries
00:05:05.580 of ideas is great but axiomatizing natural language is boring and often a
00:05:10.800 form of browbeating so he's obviously speaking about test driven development here but I think the
00:05:16.620 same can be said about trunk-based development what's important is the intent and the
00:05:24.240 result if we aren't sticking to the letter of the law but we're still gaining the same benefits then it really doesn't matter
00:05:29.940 whether someone on the internet fights you about the definition
00:05:35.039 but that does mean that we have to figure out the intent and the benefits of TBD
00:05:41.820 so the intent behind TBD is to keep each of our commits smaller and commit more regularly to the trunk it avoids feature
00:05:48.960 and integration branches entirely by understanding that the only true integration branch is master
00:05:55.800 so compare the GitHub Flow or Git Flow branching strategies um they have their differences but a
00:06:03.000 common pattern to both of them is that you have moderately long-lived feature or integration branches of some sort and
00:06:09.720 one or more people are committing to those they're generally deployed to a non-production environment where they
00:06:15.600 are QA'd or fiddled around with and then at some stage when the feature is done it's merged into master and deployed to
00:06:22.259 production now the problem with that approach is that the longer-lived your branch is the
00:06:27.539 more it diverges from the trunk you may branch off from a feature branch to do
00:06:32.580 some kind of patch after QA and then merge back into the feature branch and then you merge that back into trunk and
00:06:38.400 by the time you do merge it into master master looks nothing at all like it did when you first branched off
00:06:45.300 so upon some inspection these strategies are filled with problems the probability of merge conflicts is high if you have
00:06:52.199 multiple teams working in the same code base the probability of bugs is high because what you test is not what trunk
00:07:00.360 actually looks like and you also tested probably in a non-production environment so the data
00:07:05.639 and usage is different the probability of big PRs is pretty high
00:07:11.759 you know we all know the meme a thousand lines diff one comment LGTM thumbs up 20
00:07:17.759 lines diff 20 comments it's funny because it's true but what does that say about the quality of the
00:07:24.240 reviews on these big PRs and what does it say about the team's understanding of the
00:07:29.639 code that's going into their repository personally if it takes me more than five
00:07:35.280 minutes to understand every single line of a pull request I'm not going to review it I'm not going to approve it
00:07:41.220 I'm going to go to the developer and say hey can you break this down for me and really help me understand your code
00:07:50.340 so how do we fix all this it's actually quite simple you just don't use
00:07:56.280 long running branches treat the trunk as your integration and your testing branch
00:08:01.979 and keep your commits small and precise that way the chance of merge conflicts
00:08:07.319 and divergence is massively decreased because your branches never really live for much longer than a day or two
00:08:13.500 now because you're committing so frequently and in smaller chunks your PRs and commits in general look quite
00:08:18.900 different from other branching strategies um you don't really commit full-fledged
00:08:24.419 features in one go instead the unit of value is a lot smaller if you finished writing a class and its spec then you
00:08:32.520 ship that it doesn't really matter whether it's been through QA or tested because if a class method function or
00:08:39.300 any piece of code has an associated spec and it's a good spec there's no reason for it not to exist in master
00:08:46.680 now there's two rules of thumb I like to use for determining whether my commits or pull requests are too complex the
00:08:52.800 first is number of files changed it should always be two a functional change and a test
00:09:00.120 um and then the second rule of thumb I use is about the description or your commit message if that includes the word "and"
00:09:07.440 you're probably doing a bit too much so now that we know what TBD is and a
00:09:13.380 bit of what it looks like in practice let's talk about the benefits so first is that you deliver value much
00:09:20.880 faster but in smaller increments every good commit should deliver some value even if it's just making future
00:09:27.420 development a tiny bit easier you get better feedback faster because your
00:09:32.820 commits are smaller and uh more focused so people actually discuss the functionality instead of whether you
00:09:39.959 know you're using single or double quotes or have a space after your brace um
00:09:45.420 so now this along with more emphasis on pairing creates a better sense of collective ownership within the team
00:09:53.160 uh TBD provides you with a more accurate commit history as well as simpler
00:09:59.160 merges now a merge conflict is an inherent indication that your work is
00:10:04.200 out of date if you need to resolve a merge conflict at time of merge then uh
00:10:10.560 you really should go through the entire QA process again because what you're shipping is not what was tested
00:10:17.820 and then speaking of QA it becomes simpler more reliable and has a higher quality we've all joked about testing in
00:10:24.120 prod at some point but when I say it now it's not a joke I don't see value in
00:10:30.720 testing in a non-production environment because it's always going to be a poor imitation of reality
00:10:39.060 and last but not least TBD makes you feel like an elite hacker straight out of a 90s film because
00:10:45.180 there's just something thrilling about committing 10 to 15 times a day and shipping those all
00:10:52.620 so now all of these points except possibly the last are not my opinion
00:10:57.959 they're facts they are proven the State of DevOps report has proven that organizations
00:11:03.300 that practice TBD show higher performance than those that don't so again the question is why isn't everyone
00:11:08.940 doing this in trying to push for adoption at Zappi
00:11:14.100 I've desperately tried to answer this question and what I've heard are a lot of excuses
00:11:20.339 they all seem to fall into one of three categories the first common one is that it's not applicable to my work uh I'm doing a big
00:11:27.899 refactor that has to go in as one thing or well my work crosses multiple apps that all have to be updated together or
00:11:33.540 you know I won't know it actually works until everything's running together there's only one case when TBD is not
00:11:39.899 applicable and that is open source because in that case the code owners aren't necessarily the same as the code
00:11:45.660 contributors so you do need some branching strategies every other time it's applicable
00:11:54.060 the second camp is mainly an aversion to change just in general you know this has
00:11:59.279 always worked for us so why change now this should never be a software engineer's mindset our strength is the
00:12:08.760 ability to learn new things and adapt quickly to change being stuck in your ways as a software
00:12:14.339 engineer is a fantastic way to become irrelevant
00:12:19.920 so we can cross that off too um
00:12:25.320 now the last category of excuse is that it's difficult
00:12:30.480 and I think this is an interesting one so let's talk about it this year's State of DevOps report has
00:12:37.680 made some changes to their target demographics so previously they focused a bit more on senior engineers they
00:12:44.160 had 40 percent of respondents having more than 16 years of experience this year they wanted to get feedback
00:12:52.440 from more juniors so that number dropped down to 13 percent and the results showed that the less
00:12:58.740 experienced developers showed negative results with TBD across the board with a
00:13:03.959 decrease in perceived overall experience sorry overall performance an increase in
00:13:10.560 unplanned work error proneness and change failure rate that's opposed to senior
00:13:16.860 developers who showed the exact opposite results so now this makes it seem like
00:13:22.139 difficulty is a very very valid reason for not adopting TBD but the question is is TBD itself
00:13:28.680 difficult or a poor tool for inexperienced developers I don't believe so we had a junior
00:13:35.100 engineer join my team earlier this year um Zappi was his first software job uh
00:13:40.740 now he's not committing five times a day yet but he's been vocally positive about
00:13:46.320 shorter-lived branches and testing in production I myself have less than five years of
00:13:52.200 experience in the software engineering world and TBD is one of the loves of my life
00:13:57.540 closely behind my dog and closely ahead of my girlfriend
00:14:03.540 um so years of experience can't be a real determiner for the success of TBD
00:14:09.480 and maybe that means that we're looking at it slightly wrong Shia LaBeouf will try to tell you
00:14:15.120 otherwise but you can't just do trunk-based development a trunk can't grow without roots you know roots to
00:14:21.120 stabilize the tree roots to feed it and roots to keep it healthy and these roots are software engineering best
00:14:26.940 practices I believe there's four of them which contribute primarily to the growth of
00:14:32.220 this tree the first root is design when I first joined Zappi and for about
00:14:37.500 three years or so while I was there there was never really a focus on designing your code
00:14:44.399 um and you know and documenting that design and getting feedback on it looking back it's a bit shocking to
00:14:50.459 me now um and it meant that the first time you know your teammates ever hear about your solution your proposal
00:14:57.660 is when you've spent weeks building the thing and now you have a PR with 2000
00:15:03.240 lines of code and then you know someone comes along and they say hey there's actually a way better way of doing this
00:15:09.240 but at that point you've sunk so much time into it that it's unfeasible to
00:15:14.399 start from scratch you've got deadlines you've got PMs breathing down your neck so the code goes in with the promise to
00:15:21.899 clean it up later and I imagine almost everyone has had a similar experience and I also imagine
00:15:28.019 almost everyone here has not cleaned it up every time it happens
00:15:33.300 so my team has been doing some very ad hoc design for about a year or so but
00:15:39.240 it's only within the last six months at Zappi that we've tried to formalize the design phase of software development by
00:15:45.420 introducing RFCs so RFC stands for request for comments and it's just a design doc with any level of detail but
00:15:53.339 the reason for calling it an RFC is to emphasize the collaborative nature of design
00:15:59.100 um you know just like comments on pull requests we want feedback on the design so that we can iterate quickly before
00:16:05.160 we even start coding and we found that we can save weeks of work by catching flaws after only a
00:16:12.360 couple of days you know the best RFCs come from being wrong
00:16:17.399 now I mentioned briefly that the amount of detail in an RFC uh can and should be flexible
00:16:23.699 so it becomes a little tricky to figure out when to actually write one you know as much as we want design to happen we
00:16:29.579 don't want to write an essay for a one-line bug fix uh so I try and use three indicators to figure out whether I
00:16:35.519 should write a design doc the first is if I'm at all unsure about some decision that needs to be made the second if I've
00:16:42.540 got a number of different approaches in mind but I can't really figure out which is potentially the best one
00:16:48.899 and the last is if a solution or a problem is too complex to easily explain
00:16:54.779 in a couple of paragraphs then I'll write up a nice design doc and get some feedback
00:17:00.180 now design should be a priority regardless of trunk-based development but TBD cannot happen without design
00:17:06.439 because you want to commit small and fast you kind of need to have an outline of what the Stepping Stones look like
00:17:12.419 for a story and that's only possible with design this chart shows the number of
00:17:18.540 deployments versus the number of RFCs for most of the teams at Zappi over the past year or so normalized by team
00:17:25.500 size I will let you guess which data point is my team
00:17:31.080 the data is a little bit rough you know some teams were excluded because they only got formed halfway through the year some
00:17:36.179 teams may not have uploaded all of their design documents to our database but there does seem to be a correlation of
00:17:43.080 sorts between the amount of time spent designing and the number of deploys per developer
00:17:49.020 now I'm not saying there's causation here just because you write a design doc doesn't mean you're going to ship more but it does mean that or does imply at
00:17:57.960 least that trunk-based development gets enabled by design
00:18:03.660 next up we've got test driven development you do need to be confident that your code going in is not going to
00:18:10.140 break anything when you deploy it and if you're deploying 10 times a day then manual QA is not feasible
00:18:19.740 um so you need a robust suite of specs to be confident that you won't
00:18:25.140 ship breaking changes and I'm talking about real test driven development not just unit testing you
00:18:31.980 know the strict definition is write a spec first make it fail write your code
00:18:37.020 make it pass refactor rinse and repeat I will call on GeePaw Hill again and say
00:18:43.380 that we should focus on the benefits rather than the semantics you know it's okay if once in a while you write
00:18:49.559 your code first then your spec it's not the end of the world but I believe the most valuable part of
00:18:54.840 TDD is that the spec should guide the design and there's a very strong correlation
00:19:00.600 between code that's easy to test and code that's easy to refactor and both of those signal uh good tactical design and
00:19:07.260 SOLID principles in a case study involving some big development teams uh teams reported a 15
00:19:13.620 to 35 percent uh increase or uh decrease in initial velocity uh when using TDD
00:19:21.299 and I think that can be explained by the learning curve you know 40 to 50 percent of engineers found that
00:19:27.179 adoption of TDD was quite difficult uh potentially due to a lack of upfront design
00:19:32.640 but the engineers found that in the long term their velocity increased uh
00:19:38.220 partly because the stability also increased in fact density of defects dropped by between 50 and 90 percent
00:19:45.179 when TDD practices were followed um and then engineers found that their
00:19:50.580 code quality was just better and the designs were simpler so how do we get people to do TDD I
00:19:58.860 think the best way is pairing you know have your more experienced engineers pair with the juniors and show them how
00:20:04.799 TDD works something I like to try to do when pairing with a junior is to lead
00:20:10.020 first drive first and just write the specs so I'll just sit write the specs go through what they're supposed to do
00:20:16.080 with them and then I'll hand over the driving the steering wheel to them and they can write the code and that way
00:20:22.260 they learn the value of TDD while also doing the fun part of the coding what are tests without some way to run them
00:20:28.919 the answer is the next root a great CI/CD pipeline Zappi's got an incredible SRE team so I've never really worked in
00:20:35.940 a world without good CI and CD you know on any push to any
00:20:42.360 branch our test suite runs Jenkins builds we test for visual regressions on the front end you know deploying to prod
00:20:50.160 is a simple one-liner in a shell we can even deploy from Slack but I'm
00:20:55.919 pretty sure I'm the only one who's ever used that so a great continuous integration pipeline
00:21:01.679 allows us to be more certain that things won't break and alerts us at every stage
00:21:06.900 of the process and the best part is CI pipelines are incredibly easy to set up
00:21:12.720 these days you know even for personal projects you can build and run tests with a very
00:21:19.260 simple GitHub Actions workflow and we're at the point now where there's really no excuse to not have at least
00:21:25.380 some form of automated CI continuous deployment on the other hand
00:21:31.280 is a little bit trickier there are some easily available options out there you
00:21:36.480 know GitHub Actions again can deploy applications straight to AWS but as soon as your web app and
00:21:44.640 your infrastructure become complex enough you have edge cases and strange things starting to appear
00:21:51.179 so you know adoption of continuous deployment does require a real upfront
00:21:56.640 investment but it's one of the most critical components not just for trunk-based
00:22:01.740 development but for efficient software engineering just in general then finally we've got feature flags or
00:22:09.299 feature toggles as they're also known uh these allow you to switch pieces of functionality on and off for certain
00:22:14.820 users the simplest feature flag is just an if statement which maybe enables something locally but you can also use
00:22:21.960 services like LaunchDarkly which is what we use and they provide more advanced targeting functionality
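As an illustration of the idea just described, here is a minimal Ruby sketch of a feature flag that is "just an if statement" plus a very simple form of targeting. The `FeatureFlags` class, the `:new_export` flag, and the user groups are hypothetical names invented for this example; this is an in-memory stand-in and does not use or imitate the API of any hosted flagging service.

```ruby
# Hypothetical in-memory flag store; a hosted service would replace this class.
class FeatureFlags
  def initialize
    @rules = {}
  end

  # Turn a flag on for everyone, or only for the listed user groups.
  def enable(flag, groups: nil)
    @rules[flag] = groups ? groups.map(&:to_sym) : :everyone
  end

  def disable(flag)
    @rules.delete(flag)
  end

  # Unknown flags are off, so unfinished code stays dark by default.
  def enabled?(flag, user_group: nil)
    rule = @rules[flag]
    return false if rule.nil?
    return true if rule == :everyone

    !user_group.nil? && rule.include?(user_group.to_sym)
  end
end

FLAGS = FeatureFlags.new
FLAGS.enable(:new_export, groups: [:beta]) # only beta users see the new code path

# The flag check itself is nothing more than an if statement.
def export_button_label(flags, user_group)
  if flags.enabled?(:new_export, user_group: user_group)
    "Export (new)"
  else
    "Export"
  end
end

puts export_button_label(FLAGS, :beta)
puts export_button_label(FLAGS, :customer)
```

Defaulting unknown flags to off is a design choice in this sketch: it lets half-finished work be merged to trunk without being visible to anyone until the flag is switched on.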
00:22:27.860 feature flags are what allow us to test in prod my favorite thing and it makes
00:22:33.120 QA faster and more reliable and avoids straggling PRs being blocked
00:22:39.000 because QA hasn't had time to get to the code but they're for more than just QA
00:22:45.600 um feature flags can be used to safely expose early stage features uh to alpha or beta users
00:22:51.299 they can be used for A/B feature testing and with the really advanced tools you
00:22:56.340 can even have controlled incremental rollouts to your customer base but my favorite application
00:23:02.460 of so I was supposed to put up a flag there my favorite application of feature flags is quite a selfish one and that's
00:23:09.419 that they allow the devs to ignore the bureaucratic crap that surrounds releasing a new feature
00:23:16.559 this here is a chart on our platform um it shows the behavior change score in
00:23:22.140 a survey for four different ads pretty easy stuff the behavior change question
00:23:27.240 asks How likely a respondent is to I don't know purchase the product in the ad or something along those lines and in
00:23:33.659 this case it was asked on the scale from 0 to 10 and again in this case we want to
00:23:39.720 display the result of that score as a mean so we just mean across all the respondents and get the number and that
00:23:45.960 calculation type is displayed at the top next to the behavior change label because we can switch between you know
00:23:51.900 top box mean whatever you want we thought it's a good idea to have it there but based on some feedback we got
00:23:57.919 customers can quite easily see that it's a mean so they just said it was noise it
00:24:04.200 made it messy we decided to remove that uh that little mean
00:24:10.260 so our new chart looked like this being a very small change it was done
00:24:15.419 within like a few hours and it was shipped to production and then things just got messy we had sales and customer
00:24:24.419 facing reps shouting at us that we hadn't notified them properly so now their sales pitches and their demos
00:24:30.720 were all out of date our product manager asked us to quickly
00:24:36.480 revert so that we could appease the business but we still wanted the change to go in
00:24:43.140 to get approval for this change it took six it took two months to get all the documentation together and communicate
00:24:49.679 to everyone that needed to know this that little change there from that to that took two months to get approval
00:24:57.000 um it was a one-liner I think two years ago that PR would have been lying around for those two months
00:25:03.299 in PR hell but now we have feature flags so we just added a flag to the code shipped it and
00:25:09.720 gave control of the toggle to the PM presumably at some point he got all the communication out there
00:25:16.919 and turned it on but we didn't have to be involved
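The story above, shipping the one-line change dark and letting someone else decide when it goes live, can be sketched in a few lines of Ruby. Everything here is hypothetical: the `RELEASE_TOGGLES` hash, the `:hide_calculation_label` flag name, and the title format are assumptions made for illustration and are not the actual chart code.

```ruby
# Hypothetical toggle owned by the product manager; it stays off until
# the customer-facing communication has gone out.
RELEASE_TOGGLES = { hide_calculation_label: false }.freeze

# Builds the chart heading. The new behavior (no calculation type next to
# the metric name) is already in trunk but only active when the toggle is on.
def chart_title(metric, calculation, toggles = RELEASE_TOGGLES)
  return metric if toggles[:hide_calculation_label]

  "#{metric} (#{calculation})"
end

puts chart_title("Behavior change", "mean")
puts chart_title("Behavior change", "mean", { hide_calculation_label: true })
```

The point of the sketch is that deploying the code and releasing the change become two separate decisions, so the pull request does not have to wait for sign-off.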
00:25:22.380 so getting people to use feature Flags heavily like everything else that enables TBD is not an easy task there's
00:25:28.200 a financial cost when integrating with a flagging service and devs need to be convinced to change their behavior
00:25:35.159 it does take time but eventually you get there you know at Zappi we have 69
00:25:40.380 engineers at the moment and 88 active feature flags so I think that is quite a
00:25:46.620 success so all the data shows that adoption of
00:25:51.720 these best practices while very beneficial is not always easy during the
00:25:57.240 early stages velocity and developer happiness tend to decrease you know engineering mindsets and
00:26:03.539 organizational process they all need to change and an upfront investment in Technologies and training is vital to
00:26:09.179 the success but even knowing what the roots are and
00:26:15.360 understanding their value is not quite enough to bring about changes and achieve adoption and that's where you
00:26:22.380 guys come in adoption doesn't happen without people with the drive to push for change or without
00:26:28.799 people willing to embrace it and it's not just the EMs and the tech leads of the world who have that power anyone
00:26:34.380 with the right mindset and the data to back themselves should be able to convince others
00:26:39.659 now it doesn't happen overnight you can't force it so the best bet is to lead by example start writing design
00:26:45.600 docs for any ticket that you pick up share them with your team when pairing write your specs first use simple
00:26:51.900 feature flags even if they're as basic as if Rails.env.development?
00:26:58.100 but there's even more to an effective team than just software best practices
00:27:04.440 we need to nurture our people and our culture it's been repeatedly shown that
00:27:10.080 teams and organizations with a generative culture and flexible work arrangements outperform those with
00:27:16.080 bureaucratic or pathological cultures stable teams with devs who stick around are more likely to be the norm in high
00:27:23.100 performing organizations but above all of this we need trust which is the most
00:27:29.220 vital nutrient needed to grow the tree so back to the original question why is
00:27:34.679 it difficult to adopt trunk-based development I think the answer is because trunk-based development itself is not a
00:27:41.159 goal it's a metric TBD is an indicator of good software engineering practices and team health so
00:27:48.179 your focus should be on the roots and the soil because without those the tree won't grow
00:27:54.840 so go forth and crush it and I hope you learned something thank you
00:28:07.400 I guess we have two minutes if anyone has any questions then shoot
00:28:12.720 uh so question is how long does our CI take to run um we've got a number of different apps some of them take
00:28:20.640 two minutes um a couple of them take longer up to I think up to 20 minutes is our longest
00:28:26.700 one something along those lines yep yep so the question is how do you handle
00:28:31.860 like dependent uh PRs I think
00:28:37.080 on one hand uh if a lot of different files are dependent on each other that's an
00:28:43.020 indicator of high coupling and maybe you can have a look at your design
00:28:48.480 um but the other thing I like to do or I like to think about is that everything can go
00:28:55.320 in separately until you have a kind of tie-it-together PR and that's when you bring in you know the dependencies that
00:29:01.440 one obviously comes last but yeah I do like to think of the tie-it-together PR coming at the end
00:29:08.820 yeah so the question is around how we deal with unused code um
00:29:14.400 we do have tools for it I can't remember what the tool is called but you can like track you know um you can monitor which
00:29:20.820 code is actually touched with Rails um there's some gem for it uh but a lot of the time it's just on
00:29:28.320 the developer to kind of remember and clean them up after themselves
00:29:34.559 um the question is around automating um uh rules for for I guess big commits the
00:29:41.640 answer is no we haven't automated things uh just because it's like you know it's a
00:29:48.600 judgment call at the end of the day putting too rigorous rules in place I think leads to
00:29:54.120 more drama than it does anything else so I'd rather just make a judgment all right
00:30:02.220 I think that's it right thank you