Ruby Video | Pushing to master - adopting trunk based development

00:00:00.000 ready for takeoff

00:00:16.920 hi everyone I hope everyone's had a good start to RubyConf 2022

00:00:23.039 I'm Dylan um I'm incredibly excited to be here this is my first RubyConf and my first

00:00:29.640 conference talk uh so I'm excited a little bit nervous but I hope you all

00:00:34.860 enjoy the talk and can take something away from it I'm a software engineering manager I'm

00:00:41.520 from Cape Town South Africa where I studied mechanical engineering actually but about five years ago I dropped out of a

00:00:49.020 PhD and somehow landed a junior dev job at a company called Zappi

00:00:56.760 Zappi's the world leader in automated end-to-end market research with offices in

00:01:02.760 Cape Town Boston and London and some satellite people all over the world our stack is mainly Rails back ends with

00:01:10.860 React front ends obviously a little bit of Python and Elixir thrown in for luck we've got millions of lines of Ruby

00:01:19.320 across a number of different apps and we've also got all the tech debt and

00:01:24.600 spaghetti that comes along with that so about two years ago we started to

00:01:30.960 notice some problems with our engineering department uh delivery speed was decreasing engineering happiness was at an all-time

00:01:38.520 low and our innovation was just stagnating so it really felt like we'd forgotten

00:01:45.479 how to disrupt and we were plodding through the muck of BAU and customer requests and bugs that were happening

00:01:52.860 more and more often because we never knew when we would break something

00:01:58.740 so something obviously needed to change now the problems we were facing weren't unique to us uh you know high degrees of

00:02:05.520 coupling poor software practices frustrating process these are all things that most companies at a certain scale

00:02:11.760 have had to deal with so surely there's something out there that can teach us how to move forward

00:02:21.720 let's go he's gonna do that so enter the State of DevOps report and

00:02:28.800 the DORA metrics um for those who might not know the State of DevOps report uh is an annual

00:02:35.340 important report on software engineering and it strives to understand what makes a high performing team

00:02:41.340 the researchers take a rigorous scientific approach to gathering and understanding that data and all of that

00:02:47.640 is described really well in the fantastic book Accelerate the outcome of years of research was

00:02:53.340 these DORA metrics which together are indicators of high performing software teams

00:03:00.420 deployment frequency is a measure of how often an application can be deployed to

00:03:05.879 production change lead time is the amount of time between writing some code and getting that code into production

00:03:12.959 change failure rate is the rate at which uh failures happen

00:03:19.379 and then mean recovery time is the amount of time it takes to fix any failures
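
The four metrics just described can be sketched in Ruby. This is only an illustration under assumptions: the `Deploy` record, its fields, and the sample values are hypothetical and not from the talk, and real tooling would pull this data from a deploy log or incident tracker.

```ruby
# Hypothetical deploy record: when the change was committed, when it reached
# production, whether it caused a failure, and when that failure was fixed.
Deploy = Struct.new(:committed_at, :deployed_at, :failed, :recovered_at, keyword_init: true)

def hours_between(from, to)
  (to - from) / 3600.0 # Time subtraction yields seconds as a Float
end

def dora_metrics(deploys, period_days:)
  raise ArgumentError, "need at least one deploy" if deploys.empty?

  failures = deploys.select(&:failed)
  lead_times = deploys.map { |d| hours_between(d.committed_at, d.deployed_at) }
  recovery_times = failures.map { |d| hours_between(d.deployed_at, d.recovered_at) }

  {
    deployment_frequency_per_day: deploys.size.to_f / period_days,
    change_lead_time_hours: lead_times.sum / lead_times.size,
    change_failure_rate: failures.size.to_f / deploys.size,
    # nil when nothing failed, since there was nothing to recover from
    mean_recovery_time_hours: recovery_times.empty? ? nil : recovery_times.sum / recovery_times.size
  }
end

# Made-up sample data for one day: two deploys, one of which failed.
deploys = [
  Deploy.new(committed_at: Time.utc(2022, 11, 1, 9), deployed_at: Time.utc(2022, 11, 1, 11), failed: false),
  Deploy.new(committed_at: Time.utc(2022, 11, 1, 13), deployed_at: Time.utc(2022, 11, 1, 17),
             failed: true, recovered_at: Time.utc(2022, 11, 1, 18))
]

metrics = dora_metrics(deploys, period_days: 1)
puts metrics
```

The point of the sketch is that all four numbers fall out of the same small set of timestamps, which is why they work as indicators rather than as targets to chase individually.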

00:03:25.860 uh there has been a new metric added reliability but there's still a bit of debate around its validity so I'm not

00:03:33.360 going to touch on that too much but it's important to note that these are metrics and not goals anytime a metric

00:03:39.060 becomes a goal it can be gamed so we don't want to look at how we can

00:03:44.159 game these and improve them just by doing anything instead you focus on the best practices and the behaviors that

00:03:50.760 have been proven to improve the metrics as a tech lead about two years ago and

00:03:57.000 an EM now there's one particular behavior that I'm very very passionate about and that's obviously trunk-based

00:04:03.540 development it's an alternative branching strategy which encourages more regular but

00:04:09.900 smaller commits to master and it directly leads to significant improvements in your deployment

00:04:15.120 frequency metric it's been proven that organizations that

00:04:20.400 practice TBD are more performant but if that's the case then why isn't everyone doing it why aren't we all

00:04:26.580 doing TBD that's what I'm here to try and answer so

00:04:31.820 let's explore TBD and why it's difficult to adopt at scale

00:04:38.639 um now TBD purists have very specific definitions of trunk-based development if you're doing it right then you should

00:04:45.780 be committing and pushing straight to master and you should be pairing whenever you're writing code

00:04:51.960 um the thing is when I hear this I like to remember one of my favorite Twitter posts by GeePaw Hill which is I

00:04:59.699 don't like or value definition wrangling and I don't care whether you call how I work TDD or not probing the boundaries

00:05:05.580 of ideas is great but axiomatizing natural language is boring and often a

00:05:10.800 form of browbeating so he's obviously speaking about test driven development here but I think the

00:05:16.620 same can be said about trunk-based development what's important is the intent and the

00:05:24.240 result if we aren't sticking to the letter of the law but we're still gaining the same benefits then it really doesn't matter

00:05:29.940 whether someone on the internet fights you about the definition

00:05:35.039 but that does mean that we have to figure out the intent and the benefits of TBD

00:05:41.820 so the intent behind TBD is to keep each of our commits smaller and commit more regularly to the trunk it avoids feature

00:05:48.960 and integration branches entirely by understanding that the only true integration branch is master

00:05:55.800 so compare the GitHub flow or git flow branching strategies um they have their differences but a

00:06:03.000 common pattern to both of them is that you have moderately long-lived feature or integration branches of some sort and

00:06:09.720 one or more people are committing to those they're generally deployed to a non-production environment where they

00:06:15.600 are QA'd or fiddled around with and then at some stage when the feature is done it's merged into master and deployed to

00:06:22.259 production now the problem with that approach is that the longer-lived your branch is the

00:06:27.539 more it diverges from the truth you may branch off from a feature branch to do

00:06:32.580 some kind of patch after QA and then merge back into the feature branch and then you merge that back into trunk and

00:06:38.400 by the time you do merge it into master master it looks nothing at all like it did when you first branched off

00:06:45.300 so upon some inspection these strategies are filled with problems the probability of merge conflicts is high if you have

00:06:52.199 multiple teams working in the same code base the probability of bugs is high because what you test is not what trunk

00:07:00.360 actually looks like and you also tested it probably in a non-production environment so the data

00:07:05.639 and usage is different the probability of big PRs is pretty high

00:07:11.759 you know we all know the meme a thousand lines diff one comment LGTM thumbs up 20

00:07:17.759 lines diff 20 comments it's funny because it's true but what does that say about the quality of the

00:07:24.240 reviews on these big PRs and what does it say about the team's understanding of the

00:07:29.639 code that's going into their repository personally if it takes me more than five

00:07:35.280 minutes to understand every single line of a pull request I'm not going to review it I'm not going to approve it

00:07:41.220 I'm going to go to the developer and say hey can you break this down for me and really help me understand your code

00:07:50.340 so how do we fix all this it's actually quite simple you just don't use

00:07:56.280 long running branches treat the trunk as your integration and your testing branch

00:08:01.979 and keep your commits small and precise that way the chance of merge conflicts

00:08:07.319 and divergence is massively decreased because your branches never really live for much longer than a day or two

00:08:13.500 now because you're committing so frequently and in smaller chunks your PRs and commits in general look quite

00:08:18.900 different from other branching strategies um you don't really commit full-fledged

00:08:24.419 features in one go instead the unit of value is a lot smaller if you finished writing a class and its spec then you

00:08:32.520 ship that it doesn't really matter whether it's been through QA or tested because if a class method function or

00:08:39.300 any piece of code has an associated spec and it's a good spec there's no reason for it not to exist in master
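
As an illustration of that unit of value, a single class shipped together with its spec might look like the sketch below. The `MeanScore` class is hypothetical and not from the talk, and Minitest is assumed here only because it ships with Ruby; the same idea applies with RSpec.

```ruby
require "minitest/autorun"

# Hypothetical class: one small, self-contained piece of behavior that can
# go to master on its own, before anything else uses it.
class MeanScore
  def initialize(responses)
    @responses = responses
  end

  # Returns the mean of the responses, or nil when there are none.
  def value
    return nil if @responses.empty?

    @responses.sum.to_f / @responses.size
  end
end

# The accompanying spec: this is what makes the class safe to ship.
class MeanScoreTest < Minitest::Test
  def test_returns_the_mean_of_the_responses
    assert_in_delta 7.0, MeanScore.new([4, 8, 9]).value
  end

  def test_returns_nil_without_responses
    assert_nil MeanScore.new([]).value
  end
end
```

A commit like this touches two files, the class and its test, and nothing in production calls the class yet, so merging it carries very little risk.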

00:08:46.680 now there's two rules of thumb I like to use for determining whether my commits or pull requests are too complex the

00:08:52.800 first is number of files changed it should always be two a functional change and a test

00:09:00.120 um and then second rule of thumb I use is about the description or your commit message if that includes the word "and"

00:09:07.440 you're probably doing a bit too much so now that we know what TBD is and a

00:09:13.380 bit of what it looks like in practice let's talk about the benefits so first is that you deliver value much

00:09:20.880 faster but in smaller increments every good commit should deliver some value even if it's just making future

00:09:27.420 development a tiny bit easier you get better feedback faster because your

00:09:32.820 commits are smaller and uh more focused so people actually discuss the functionality instead of whether you

00:09:39.959 know you're using single or double quotes or have a space after your brace um

00:09:45.420 so now this along with more emphasis on pairing creates a better sense of collective ownership within the team

00:09:53.160 uh TBD provides you with a more accurate commit history as well as simpler

00:09:59.160 merges now a merge conflict is an inherent indication that your work is

00:10:04.200 out of date if you need to resolve a merge conflict at time of merge then uh

00:10:10.560 you really should go through the entire QA process again because what you're shipping is not what was tested

00:10:17.820 and then speaking of QA it becomes simpler more reliable and has a higher quality we've all joked about testing in

00:10:24.120 prod at some point but when I say it now it's not a joke I don't see value in

00:10:30.720 testing in a non-production environment because it's always going to be a poor imitation of reality

00:10:39.060 and last but not least TBD makes you feel like an elite hacker straight out of a 90s film because

00:10:45.180 there's just something thrilling about committing 10 to 15 times a day and shipping them all

00:10:52.620 so now all of these points except possibly the last are not my opinion

00:10:57.959 they're facts they are proven the State of DevOps report has proven that organizations

00:11:03.300 that practice TBD show higher performance than those that don't so again the question is why isn't everyone

00:11:08.940 doing this in trying to push for adoption at Zappi

00:11:14.100 I've desperately tried to answer this question and what I've heard are a lot of excuses

00:11:20.339 they all seem to fall into one of three categories the first common one is that it's not applicable to my work uh I'm doing a big

00:11:27.899 refactor that has to go in as one thing well my work crosses multiple apps that all have to be updated together or

00:11:33.540 you know I won't know it actually works until everything's running together there's only one case when TBD is not

00:11:39.899 applicable and that is open source because in that case the code owners aren't necessarily the same as the code

00:11:45.660 contributors so you do need some branching strategies every other time it's applicable

00:11:54.060 the second camp is mainly an aversion to change just in general you know this has

00:11:59.279 always worked for us so why change now this should never be a software engineer's mindset our strength is the

00:12:08.760 ability to learn new things and adapt quickly to change being stuck in your ways as a software

00:12:14.339 engineer is a fantastic way to become irrelevant

00:12:19.920 so we can cross that off too um

00:12:25.320 now the last category of excuse is that it's difficult

00:12:30.480 and I think this is an interesting one so let's talk about it this year's State of DevOps report has

00:12:37.680 made some changes to their target demographics so previously they focused a bit more on senior engineers they

00:12:44.160 had 40 percent of respondents having more than 16 years of experience this year they wanted to get feedback

00:12:52.440 from more juniors so that number dropped down to 13 percent and the results showed that the less

00:12:58.740 experienced developers showed negative results with TBD across the board with a

00:13:03.959 decrease in perceived overall experience sorry overall performance an increase in

00:13:10.560 unplanned work error proneness and change failure rate that's opposed to senior

00:13:16.860 developers who showed the exact opposite results so now this makes it seem like

00:13:22.139 difficulty is a very very valid reason for not adopting TBD but the question is is TBD itself

00:13:28.680 difficult or a poor tool for inexperienced developers I don't believe so we had a junior

00:13:35.100 engineer join my team earlier this year um Zappi was his first software job uh

00:13:40.740 now he's not committing five times a day yet but he's been vocally positive about

00:13:46.320 shorter-lived branches and testing in production I myself have less than five years of

00:13:52.200 experience in the software engineering world and TBD is one of the loves of my life

00:13:57.540 closely behind my dog and closely ahead of my girlfriend

00:14:03.540 um so years of experience can't be a real determiner for the success of TBD

00:14:09.480 and maybe that means that we're looking at it slightly wrong Shia LaBeouf will try to tell you

00:14:15.120 otherwise but you can't just do trunk-based development a trunk can't grow without roots you know roots to

00:14:21.120 stabilize the tree roots to feed it and roots to keep it healthy and these roots are software engineering best

00:14:26.940 practices I believe there's four of them which contribute primarily to the growth of

00:14:32.220 this tree the first root is design when I first joined Zappi and for about

00:14:37.500 three years or so while I was there there was never really a focus on designing your code

00:14:44.399 um and you know and documenting that design and getting feedback on it looking back it's it's a bit shocking to

00:14:50.459 me now um and it meant that the first time you know your teammates ever hear about your solution your proposal

00:14:57.660 is when I'd spent weeks building the thing and now I have a PR with 2000

00:15:03.240 lines of code and then you know someone comes along and they say hey there's actually a way better way of doing this

00:15:09.240 but at that point you've sunk so much time into it that it's unfeasible to

00:15:14.399 start from scratch you've got deadlines you've got PMs breathing down your neck so the code goes in with the promise to

00:15:21.899 clean it up later and I imagine almost everyone has had a similar experience and I also imagine

00:15:28.019 almost everyone here has not cleaned it up every time it happens

00:15:33.300 so my team has been doing some very ad hoc design for about a year or so but

00:15:39.240 it's only within the last six months at Zappi that we've tried to formalize the design phase of software development by

00:15:45.420 introducing RFCs so RFC stands for request for comment and it's just a design doc with any level of detail but

00:15:53.339 the reason for calling it an RFC is to emphasize the collaborative nature of design

00:15:59.100 um you know just like comments on pull requests we want feedback on the design so that we can iterate quickly before

00:16:05.160 we even start coding and we found that we can save weeks of work by catching flaws after only a

00:16:12.360 couple of days you know the best RFCs come from being wrong

00:16:17.399 now I mentioned briefly that the amount of detail in an RFC uh can and should be flexible

00:16:23.699 so it becomes a little tricky to figure out when to actually write one you know as much as we want design to happen we

00:16:29.579 don't want to write an essay for a one-line bug fix uh so I try and use three indicators to figure out whether I

00:16:35.519 should write a design doc the first is if I'm at all unsure about some decision that needs to be made the second if I've

00:16:42.540 got a number of different approaches in mind but I can't really figure out which is potentially the best one

00:16:48.899 and the last is if a solution or a problem is too complex to easily explain

00:16:54.779 in a couple of paragraphs then I'll write up a nice design doc and get some feedback

00:17:00.180 now design should be a priority regardless of trunk-based development but TBD cannot happen without design

00:17:06.439 because you want to commit small and fast you kind of need to have an outline of what the stepping stones look like

00:17:12.419 for a story and that's only possible with design this chart shows the number of

00:17:18.540 deployments versus the number of RFCs for most of the teams at Zappi over the past year or so normalized by team

00:17:25.500 size I will let you guess which data point is my team

00:17:31.080 the data is a little bit rough you know some teams were excluded because they only got formed halfway through the year some

00:17:36.179 teams may not have uploaded all of their design documents to our database but there does seem to be a correlation of

00:17:43.080 sorts between the amount of time spent designing and the number of deploys per developer

00:17:49.020 now I'm not saying there's causation here just because you write a design doc doesn't mean you're going to ship more but it does mean that or does imply at

00:17:57.960 least that trunk-based development gets enabled by design

00:18:03.660 next up we've got test driven development you do need to be confident that your code going in is not going to

00:18:10.140 break anything when you deploy it and if you're deploying 10 times a day then manual QA is not feasible

00:18:19.740 um so you need a robust suite of specs to be confident that you won't

00:18:25.140 ship breaking changes and I'm talking about real test driven development not just unit testing you

00:18:31.980 know the strict definition is write a spec first make it fail write your code

00:18:37.020 make it pass refactor rinse and repeat I will call on GeePaw Hill again and say

00:18:43.380 that we should focus on the benefits rather than the semantics you know it's okay if once in a while you write

00:18:49.559 your code first then your spec it's not the end of the world but I believe the most valuable part of

00:18:54.840 TDD is that the spec should guide the design and there's a very strong correlation

00:19:00.600 between code that's easy to test and code that's easy to refactor and both of those signal uh good tactical design and

00:19:07.260 SOLID principles in a case study involving some big development teams uh teams reported a 15

00:19:13.620 to 35 percent uh decrease in initial velocity uh when using TDD

00:19:21.299 and I think that can be explained by the learning curve you know 40 to 50 percent of engineers found that

00:19:27.179 adoption of TDD was quite difficult uh potentially due to a lack of upfront design

00:19:32.640 but the engineers found that in the long term their velocity increased uh

00:19:38.220 partly because the stability also increased in fact density of defects dropped by between 50 and 90 percent

00:19:45.179 when TDD practices were followed um and then engineers found that their

00:19:50.580 code quality was just better and the designs were simpler so how do we get people to do TDD I

00:19:58.860 think the best way is pairing you know have your more experienced engineers pair with the juniors and show them how

00:20:04.799 TDD works something I like to try to do when pairing with a junior is to lead

00:20:10.020 first drive first and just write the specs so I'll just sit write the specs go through what they're supposed to do

00:20:16.080 with them and then I'll hand over the driving the steering wheel to them and they can write the code and that way

00:20:22.260 they learn the value of TDD while also doing the fun part of the coding what are tests without some way to run them

00:20:28.919 the answer is the next root a great CI/CD pipeline Zappi's got an incredible SRE team so I've never really worked in

00:20:35.940 a world without good CI and CD you know on any push to any

00:20:42.360 branch our test suite runs Jenkins builds and tests for visual regressions on the front end you know deploying to prod

00:20:50.160 is a simple one-liner in a shell we can even deploy from Slack but I'm

00:20:55.919 pretty sure I'm the only one who's ever used that so great continuous integration pipeline

00:21:01.679 allows us to be more certain that things won't break and alerts us at every stage

00:21:06.900 of the process and the best part is CI pipelines are incredibly easy to set up

00:21:12.720 these days you know even for personal projects you can build and run tests with a very

00:21:19.260 simple GitHub Actions workflow and we're at the point now where there's really no excuse to not have at least

00:21:25.380 some form of automated CI continuous deployment on the other hand

00:21:31.280 is a little bit trickier there are some easily available options out there you

00:21:36.480 know GitHub Actions again can deploy applications straight to AWS but as soon as your web app and

00:21:44.640 your infrastructure become complex enough you have edge cases and strange things starting to appear

00:21:51.179 but you know adoption of continuous deployment does require a real upfront

00:21:56.640 investment but it's one of the most critical components not just for trunk-based

00:22:01.740 development but for efficient software engineering just in general then finally we've got feature flags or

00:22:09.299 feature toggles as they're also known uh these allow you to switch pieces of functionality on and off for certain

00:22:14.820 users the most simple feature flag is just an if statement which maybe enables something locally but you can also use

00:22:21.960 services like LaunchDarkly which is what we use and they provide more advanced targeting functionality
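
A minimal sketch of the simple end of that spectrum, in plain Ruby. Everything here is hypothetical: the `FeatureFlags` class, the `APP_ENV` variable, and the flag name are illustrative only, and this is not how LaunchDarkly or any hosted flagging service is called.

```ruby
# Hypothetical in-process flag store; a hosted flagging service would
# replace this with remotely managed rules.
class FeatureFlags
  def initialize(env: ENV.fetch("APP_ENV", "production"))
    @env = env
    @rules = {}
  end

  # Enable a flag for whole environments and/or for specific user ids.
  def define(name, envs: [], user_ids: [])
    @rules[name] = { envs: envs, user_ids: user_ids }
  end

  def enabled?(name, user_id: nil)
    rule = @rules[name]
    return false if rule.nil? # unknown flags default to off

    rule[:envs].include?(@env) || rule[:user_ids].include?(user_id)
  end
end

flags = FeatureFlags.new(env: "production")
flags.define(:new_checkout, envs: ["development"], user_ids: [42])

# The flag itself is still just an if statement at the call site.
if flags.enabled?(:new_checkout, user_id: 42)
  puts "showing the new checkout"
else
  puts "showing the old checkout"
end
```

Defaulting unknown flags to off is a deliberate choice in this sketch: code guarded by a flag that nobody has defined yet stays dark in production.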

00:22:27.860 feature flags are what allow us to test in prod my favorite thing and it makes

00:22:33.120 QA faster and more reliable and avoids straggling PRs being blocked

00:22:39.000 because QA hasn't had time to get to the code but they're for more than just QA

00:22:45.600 um feature flags can be used to safely expose early stage features uh to alpha or beta users

00:22:51.299 they can be used for A/B feature testing and with the really advanced tools you

00:22:56.340 can even have controlled incremental rollouts to your customer base but my favorite application

00:23:02.460 of so I was supposed to put up a flag there my favorite application of feature flags is quite a selfish one and that's

00:23:09.419 that they allow the devs to ignore the bureaucratic crap that surrounds releasing a new feature

00:23:16.559 this here is a chart on our platform um it shows the behavior change score in

00:23:22.140 a survey for four different ads pretty easy stuff the behavior change question

00:23:27.240 asks how likely a respondent is to I don't know purchase the product in the ad or something along those lines and in

00:23:33.659 this case it was asked on the scale from 0 to 10 and again in this case we want to

00:23:39.720 display the result of that score as a mean so we just take the mean across all the respondents and get the number and that

00:23:45.960 calculation type is displayed at the top next to the behavior change label because we can switch between you know

00:23:51.900 top box mean whatever you want we thought it's a good idea to have it there but based on some feedback we got

00:23:57.919 customers can quite easily see that it's a mean so they just said it was noise it

00:24:04.200 made it messy we decided to remove that uh that little mean

00:24:10.260 so our new chart looked like this being a very small change it was done

00:24:15.419 within like a few hours and it was shipped to production and then things just got messy we had sales and customer

00:24:24.419 facing reps shouting at us that we hadn't notified them properly so now their sales pitches and their demos

00:24:30.720 were all out of date our product manager asked us to quickly

00:24:36.480 revert so that we could appease the business but we still wanted the change to go in

00:24:43.140 to get approval for this change it took two months to get all the documentation together and communicate

00:24:49.679 to everyone that needed to know this that little change there from that to that took two months to get approval

00:24:57.000 um it was a one-liner I think two years ago that PR would have been lying around for those two months

00:25:03.299 in PR hell but now we have feature flags so we just added a flag to the code shipped it and

00:25:09.720 gave control of the toggle to the PM presumably at some point he got all the communication out there

00:25:16.919 and turned it on but we didn't have to be involved
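
The shape of that change can be sketched as follows. The flag name, the `FLAGS` hash, and the title format are assumptions made for illustration; the talk does not show the real code or how the label is actually rendered.

```ruby
# Hypothetical toggle store that the PM controls; the flag ships turned off.
FLAGS = { hide_calculation_label: false }

# Builds the chart title. The old behavior appends the calculation type,
# the new behavior (behind the flag) shows the measure name only.
def chart_title(measure, calculation, flags: FLAGS)
  return measure if flags[:hide_calculation_label]

  "#{measure} (#{calculation})"
end

puts chart_title("Behavior change", "mean") # flag off: old title with the label

FLAGS[:hide_calculation_label] = true       # the PM flips the toggle later
puts chart_title("Behavior change", "mean") # flag on: label removed
```

The code merges and deploys on day one with the flag off, so shipping the change and releasing it to customers become two separate decisions.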

00:25:22.380 so getting people to use feature flags heavily like everything else that enables TBD is not an easy task there's

00:25:28.200 a financial cost when integrating with a flagging service and devs need to be convinced to change their behavior

00:25:35.159 it does take time but eventually you get there you know at Zappi we have 69

00:25:40.380 engineers at the moment and 88 active feature flags so I think that is quite a

00:25:46.620 success so all the data shows that adoption of

00:25:51.720 these best practices while very beneficial is not always easy during the

00:25:57.240 early stages velocity and developer happiness tend to decrease you know engineering mindsets and

00:26:03.539 organizational process they all need to change and an upfront investment in technologies and training is vital to

00:26:09.179 the success but even knowing what the roots are and

00:26:15.360 understanding their value is not quite enough to bring about changes and achieve adoption and that's where you

00:26:22.380 guys come in adoption doesn't happen without people with the drive to push for change or without

00:26:28.799 people willing to embrace it and it's not just the EMs and the tech leads of the world who have that power anyone

00:26:34.380 with the right mindset and the data to back themselves should be able to convince others

00:26:39.659 now it doesn't happen overnight you can't force it so the best bet is to lead by example start writing design

00:26:45.600 docs for any ticket that you pick up share them with your team when pairing write your specs first use simple

00:26:51.900 feature flags even if they're as basic as if Rails.env.development?

00:26:58.100 but there's even more to an effective team than just software best practices

00:27:04.440 we need to nurture our people and our culture it's been repeatedly shown that

00:27:10.080 teams and organizations with a generative culture and flexible work arrangements outperform those with

00:27:16.080 bureaucratic or pathological cultures stable teams with devs who stick around are more likely to be the norm in high

00:27:23.100 performing organizations but above all of this we need trust which is the most

00:27:29.220 vital nutrient needed to grow the tree so back to the original question why is

00:27:34.679 it difficult to adopt trunk-based development I think the answer is because trunk-based development itself is not a

00:27:41.159 goal it's a metric TBD is an indicator of good software engineering practices and team health so

00:27:48.179 your focus should be on the roots and the soil because without those the tree won't grow

00:27:54.840 so go forth and crush it and I hope you learned something thank you

00:28:07.400 I guess we have two minutes if anyone has any questions then shoot

00:28:12.720 uh so question is how long does our CI take to run um we've got a number of different apps some of them take

00:28:20.640 two minutes um a couple of them take longer up to I think up to 20 minutes is our longest

00:28:26.700 one something along those lines yep yep so the question is how do you handle

00:28:31.860 like dependent uh PRs I think

00:28:37.080 on one hand uh if a lot of different files are dependent on each other that's an

00:28:43.020 indicator of high coupling and maybe you can have a look at your design

00:28:48.480 um but the other thing I like to do or I like to think about is that everything can go

00:28:55.320 in separately until you have a kind of tie-it-together PR and that's when you bring in you know the dependencies that

00:29:01.440 one obviously comes last but yeah I do like to think of the tie-it-together PR coming at the end

00:29:08.820 yeah so the question is around how we deal with unused code um

00:29:14.400 we do have tools for it I can't remember what the tool is called but you can like track you know um you can monitor which

00:29:20.820 code is actually touched with Rails um there's some gem for it uh but a lot of the time it's just on

00:29:28.320 the developer to kind of remember and clean them up after themselves

00:29:34.559 um the question is around automating um uh rules for I guess big commits the

00:29:41.640 answer is no we haven't automated things uh just because it's like you know it's a

00:29:48.600 judgment call at the end of the day putting too rigorous rules in place I think leads to

00:29:54.120 more drama than it does anything else so I'd rather just make a judgment call all right

00:30:02.220 I think that's it right thank you