00:00:16.710
So, I'll be talking about developer productivity engineering. My name is Panayiotis Thomakos, and you can find me on the internet.
00:00:19.330
I work at a company called Strava. We are a GPS-based training site and social network for athletes, and we have about 50 engineers working there. I've been there for about eight years, but more recently, I've been working in productivity engineering.
00:00:26.800
Productivity engineering effectively means that my job is to try to make other people as productive as possible. Even though I'm technically a team of one, there are many other people at Strava who spend some portion of their time working on productivity-related tasks. The takeaways from today can apply to any engineering organization or even to yourself in your personal projects.
00:00:58.769
So, what does being productive mean to me? Productivity is inherently tied to happiness, which really means how engaged my mind is in any particular task. The more time I get to focus on challenging and interesting problems, and the less time I have to spend on repetitive and mindless tasks, the more productive I am.
00:01:16.060
Often, this means that I'm using automation to reduce those repetitive and mindless tasks. While productivity engineering isn't solely about automation, it is a significant part of it. Today, I want to discuss automation because it often costs engineering time and effort to build it, and it's not always obvious how we prioritize that or even make a case that it's important.
00:01:38.400
You may have found yourself in a situation where you're battling with your automation, or perhaps you have so much to automate that you don't even know where to start. Alternatively, you might be working hard on your next feature, leaving you no time to think about automation. It's okay to decide that automation is not a priority for you right now.
00:02:07.250
However, it can be a bit disconcerting to feel like you can't make that decision strategically. At Strava, I've developed a framework that helps you think systematically about automation and when it might be appropriate to dedicate time and effort to automate something. It's called Developer Productivity Engineering, or DPE, named after Site Reliability Engineering developed at Google.
00:02:27.690
Google uses Site Reliability Engineering to apply engineering practices to the problems of site reliability and operations. Similarly, Developer Productivity Engineering uses engineering practices to solve the challenges of developer productivity. It can be broken down into three steps: identify, measure, and prioritize. I will tell you about each of these.
00:02:51.140
Let's start with identifying productivity bottlenecks. Sandi Metz once said, 'Duplication is far cheaper than the wrong abstraction.' When we're writing code, this means we can't just go around de-duplicating everything, because removing duplication too eagerly introduces wrong abstractions. Wrong abstractions have a high long-term cost; they're difficult to change and maintain.
00:03:15.020
The same concept applies to productivity engineering. Just because you've done something twice does not mean it's worth automating. We don't want to end up in a situation where the work required to maintain our automation exceeds the cost of just doing the task manually.
00:03:40.489
There is a more effective heuristic, known as toil. Site Reliability Engineering gives it an exact definition, and I believe that definition applies well to Developer Productivity Engineering. In everyday usage, toil refers to hard or menial labor, but that's not a rigorous definition. For our purposes, toil is a task that satisfies a set of six criteria.
00:04:00.270
The first criterion is that the task needs to be manual. This might seem obvious, but if a machine is already doing it, our threshold for automation should be significantly higher. The second criterion is that the task must be repetitive. This means it needs to occur frequently enough—once or twice a week, month, or quarter—to warrant investment in automation.
00:04:24.320
The task should also be automatable. We must at least be able to envision a software solution and have the budget to put engineering effort into it. If we can't, it's probably not worth automating. Furthermore, the task should be tactical, not strategic, meaning it occurs in response to something measurable, like CPU load or site load.
00:04:49.650
The task should not provide enduring value. Essentially, if doing the task again and again yields the same result each time, there's no permanent improvement, which makes it a good candidate for automation. Lastly, it helps to know whether the task scales linearly with growth, or even faster. As we add more engineers or more commits, such a task becomes more cumbersome.
00:05:10.869
These are the six criteria. I will give you a simple example from my own work at Strava. We have a Ruby CLI that we run on our machines to deploy the website and API twice a day. This deployment script manages the intricate details of changing the bytes on all our EC2 servers and restarting them.
00:05:36.270
However, developers still need to be present to ensure that everything operates smoothly and that we don't need a failover or rollback. This task is indeed manual; developers must type commands at their keyboards and monitor specific metrics to ensure nothing goes wrong. It is repetitive, as we run it twice a day.
00:06:08.250
The task is automatable to some extent; we could write a cron task to initiate the deployment, and we could develop a service that pulls metrics and sends notifications when something goes wrong, rather than expecting developers to gather that information themselves.
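The monitoring half of that idea can be sketched roughly as follows. This is a minimal illustration, not Strava's tooling: `watch_deploy`, `fetch_metrics`, `notify`, and the `error_rate` metric are all hypothetical names, and the two callables stand in for whatever metrics store and notification channel a team actually uses. A scheduler such as cron would start the deployment and then call something like this.

```python
import time
from typing import Callable, Dict, List


def watch_deploy(
    fetch_metrics: Callable[[], Dict[str, float]],
    thresholds: Dict[str, float],
    notify: Callable[[str], None],
    checks: int = 5,
    interval_seconds: float = 60.0,
) -> bool:
    """Poll metrics after a deploy; notify and return False on the first breach.

    fetch_metrics and notify are placeholders (assumptions) for a real
    metrics source and a real chat or paging integration.
    """
    for attempt in range(checks):
        metrics = fetch_metrics()
        # Collect every metric that is above its configured limit.
        breaches: List[str] = [
            f"{name}={metrics[name]} (limit {limit})"
            for name, limit in thresholds.items()
            if name in metrics and metrics[name] > limit
        ]
        if breaches:
            notify("Deploy looks unhealthy: " + ", ".join(breaches))
            return False
        if attempt < checks - 1:
            time.sleep(interval_seconds)
    return True


# Usage with made-up samples: the second reading crosses the limit.
samples = iter([{"error_rate": 0.01}, {"error_rate": 0.20}])
messages: List[str] = []
healthy = watch_deploy(
    lambda: next(samples),
    {"error_rate": 0.05},
    messages.append,
    checks=2,
    interval_seconds=0,
)
```

Passing the metric source and the notifier in as functions keeps the sketch independent of any particular monitoring product, and makes it easy to exercise with fake data as above.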
00:06:24.639
It is tactical, since it happens in response to the QA suite passing, twice a day. There is no enduring value in the deployment itself: the effort that goes into writing the code provides enduring value, but changing bits and bytes on a server does not.
00:06:41.739
Moreover, the task scales linearly as we hire more people, which we are doing, increasing the number of commits and the frequency of our deployments. You should feel comfortable assessing toil. Almost anyone in your organization should feel up to the task.
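The six-criteria assessment walked through above can be written down as a simple checklist. The sketch below is one possible encoding, not a tool described in the talk: the class and field names are hypothetical, and it takes the strict reading that a task counts as toil only when all six criteria hold.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ToilAssessment:
    """One yes/no answer per criterion (field names are illustrative)."""
    manual: bool              # a person, not a machine, performs the task
    repetitive: bool          # it recurs often enough to justify investment
    automatable: bool         # we can envision and afford a software solution
    tactical: bool            # it happens in response to an event, not a strategy
    no_enduring_value: bool   # finishing it leaves no permanent improvement
    scales_with_growth: bool  # it grows linearly (or faster) with people or commits

    def unmet(self) -> List[str]:
        """Names of the criteria the task does not satisfy."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_toil(self) -> bool:
        """Assumed rule: toil means every criterion is satisfied."""
        return not self.unmet()


# The twice-daily deployment from the example satisfies all six criteria.
deploy = ToilAssessment(
    manual=True,
    repetitive=True,
    automatable=True,
    tactical=True,
    no_enduring_value=True,
    scales_with_growth=True,
)
```

Listing the unmet criteria, rather than returning only a yes or no, gives a starting point for the conversation when a task falls short on one or two of them.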
00:07:06.680
The most effective way I've found to do this is by talking to people, whether through retrospectives or weekly and bi-weekly one-on-ones. These are usually the best occasions to assess toil.
00:07:27.990
Now, let's discuss measuring productivity. First, you should not track your time; it’s a tedious process and an ineffective way to measure productivity. One of the main reasons is that productive work is inherently varied, and it’s tough to correlate long-term productivity gains with initial activities.
00:07:51.880
Instead, we take a more indirect approach: we focus on the negatives that affect productivity and try to minimize them. The first step is measuring toil, which we do by sending surveys and having conversations with the team. We ask people to estimate how much time they spend on undesirable or manual tasks.
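Survey answers like these can be rolled up into per-task totals. The following is a minimal sketch under assumptions: the talk does not describe the survey format, so the `(engineer, task, hours_per_week)` tuple shape and the function name `summarize_toil` are invented for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def summarize_toil(
    responses: Iterable[Tuple[str, str, float]],
) -> List[Tuple[str, float, float]]:
    """Turn (engineer, task, hours_per_week) estimates into per-task rows.

    Returns (task, total_hours_per_week, median_hours_per_engineer),
    sorted with the largest total first.
    """
    by_task: Dict[str, List[float]] = defaultdict(list)
    for _engineer, task, hours in responses:
        by_task[task].append(hours)
    summary = [(task, sum(hours), median(hours)) for task, hours in by_task.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)
```

The total shows what the task costs the whole team, while the median hints at whether the pain is shared or concentrated in a few people.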
00:08:26.880
Additionally, we instrument all our existing automation. If someone complains about waiting on a deploy script, it’s beneficial to know how long it takes that script to run. While this approach may seem simple and somewhat non-intuitive, it is liberating to feel that you can measure productivity and make improvements.
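Instrumenting a script can be as small as timing it and recording the result. The sketch below assumes nothing about Strava's setup: `timed` and `record` are hypothetical names, and `record` stands in for wherever the durations should go, such as a log line or a metrics service.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple


@contextmanager
def timed(name: str, record: Callable[[str, float], None]) -> Iterator[None]:
    """Measure how long the wrapped block takes and pass it to record."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Runs on failure too, so slow failed runs are measured as well.
        record(name, time.monotonic() - start)


# Usage: collect (name, seconds) pairs in a list.
durations: List[Tuple[str, float]] = []
with timed("deploy", lambda name, seconds: durations.append((name, seconds))):
    time.sleep(0.01)  # stand-in for the real deploy work
```

Using `time.monotonic()` rather than wall-clock time keeps the measurement unaffected by system clock adjustments during a long run.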
00:08:49.480
Let’s talk about prioritization. A few months down the road, your organization may have identified and measured a significant amount of toil, only to find there is so much of it that you don’t have time to automate it all.
00:09:09.910
Determining what to work on next can be straightforward. Calculate four different costs. The first is the toil cost: frame it in consistent units, such as hours per week or per month spent on the task. The second is the implementation cost: how long it will take to build the automation.
00:09:29.860
The less clear the implementation is, the higher the toil cost should be before you think about beginning. Also, consider that software maintenance incurs costs. If you spend an hour a week on upkeep, that hour should be subtracted from the time you save by no longer doing the task manually.
00:09:53.720
Finally, factor in onboarding costs. If you bring a new hire into your company, how long will it take for them to own that process? You want to avoid having one person solely responsible for a particular process.
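One way to combine the four costs into a ranking is sketched below. The talk does not give a formula, so this is an assumption: net hours saved over a planning horizon, computed as weekly toil minus weekly maintenance, multiplied by the number of weeks, minus the one-off implementation and onboarding hours. All names and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AutomationCandidate:
    name: str
    toil_hours_per_week: float         # time spent doing the task by hand today
    implementation_hours: float        # one-off estimate to build the automation
    maintenance_hours_per_week: float  # expected upkeep once it exists
    onboarding_hours: float            # assumed: time to teach new owners over the horizon

    def net_hours_saved(self, horizon_weeks: int) -> float:
        """Assumed model: recurring savings minus the one-off costs."""
        weekly_saving = self.toil_hours_per_week - self.maintenance_hours_per_week
        return (
            weekly_saving * horizon_weeks
            - self.implementation_hours
            - self.onboarding_hours
        )


def prioritize(
    candidates: Iterable[AutomationCandidate], horizon_weeks: int = 26
) -> List[AutomationCandidate]:
    """Order candidates by net hours saved over the horizon, best first."""
    return sorted(
        candidates, key=lambda c: c.net_hours_saved(horizon_weeks), reverse=True
    )


# Made-up numbers purely for illustration.
release = AutomationCandidate("release checklist", 4.0, 40.0, 1.0, 8.0)
report = AutomationCandidate("weekly report", 1.0, 60.0, 0.5, 4.0)
ranked = prioritize([report, release])
```

A negative result, as for the second candidate here, suggests the automation would not pay for itself within the chosen horizon; a longer horizon or a cheaper implementation could change that.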
00:10:17.380
We have successfully implemented Developer Productivity Engineering at Strava. Early this year, we recognized that our mobile release process was quite toilsome and calculated that it cost us approximately 20 developer hours each week.
00:10:42.000
After investing significant effort into automation, we estimate that by the end of this year, considering the implementation and maintenance costs, we will save upwards of 17 developer hours per week by automating the process.
00:11:06.960
I believe you all can apply these principles in your own lives successfully. Thank you for your attention. I work at Strava, and we are hiring! You can find my slides at this link and follow me on Twitter, GitHub, etc. Thank you.