Technical Debt

Summarized using AI

One does not simply... rebuild a product

Prakriti Mateti • April 11, 2024 • Sydney, Australia

In the presentation titled "One does not simply... rebuild a product," Prakriti Mateti, a director of engineering at Culture Amp, discusses the challenges and strategies involved in rebuilding the company's Performance product. This product, which is nine years old and has a significant customer base, is being rebuilt from the ground up due to its outdated infrastructure and accumulated technical debt. Mateti outlines the reasons for the rebuild and shares the team's journey towards creating a new framework that aligns with the company’s ambition to capture a $3 billion global market.

Key points discussed include:

- Recognition of Need for Rebuild: Mateti emphasizes the necessity of acknowledging when a rebuild is required. Despite attempts to address technical debt and optimize the existing system, the team concluded that these efforts would not suffice.

- Challenges with Monoliths: The existing system comprised two monoliths, leading to complex interdependencies and significant tech debt. Eight teams struggled to introduce new features without colliding with each other’s work, resulting in decreased ownership and accountability.

- Initial Attempts to Refactor: Before deciding on a full rebuild, the team first tried various methods to mitigate tech debt and attempted piecemeal re-architectures. However, these efforts proved too slow to keep up with market demands.

- Radical Reset: After recognizing that previous approaches would not lead to desired outcomes, the team opted for a complete overhaul of the product. Mateti discusses the importance of establishing buy-in from various stakeholders and clearly articulating the vision for the rebuild.

- Building Trust and Transparency: The presentation highlights the value of operating with transparency and trust, seeking forgiveness rather than permission throughout the rebuilding process.

- First Slice Strategy: Mateti outlines a plan to validate their new architecture by initially focusing on a single component, referred to as 'Self Reflections', rather than attempting an exhaustive rebuild all at once.

- Setting Engineering Standards: The rebuild aims not only to refresh the product technically but also to enhance the engineering culture at Culture Amp, promoting high standards, engagement, and fast-moving teams.

In conclusion, Mateti's insights provide valuable lessons on navigating product rebuilds, emphasizing the importance of recognizing the need for change, securing stakeholder buy-in, and focusing on transparent communication. The journey of rebuilding the Performance product is shared as a case study to assist others who may find themselves in similar situations, providing hope and guidance amidst challenges.

"A rebuild is never finished, only started"
"Technical rebuilds are doomed to fail"
"One does not simply walk into Mordor"

We're rebuilding Culture Amp's second largest product - Performance. It's 9 years old, came in as a Series A acquisition 5 years ago, has over 2700 customers with the largest one at 77k users. Against conventional wisdom, we're rebuilding it from the ground up with an aggressive timeline. The underlying model is outdated, slow to iterate on, and not extensible. The monoliths are riddled with tech debt, tightly coupled, patched and band-aided over many times, and won't scale to the $3b global Performance market we're targeting.

As if that wasn't challenging enough already, I'm also using this opportunity to rebuild our engineering culture: setting a high bar for engineering standards and ways of working, and hoping to improve engagement as we go.

RubyConf AU 2024

00:00:04.200 Thank you! If anyone wants to fight me on the burger sandwich conundrum, I will be available just outside. I can pack a punch; I know I look small, but I'm pretty spry. Alrighty, so let’s begin. One does not simply rebuild a product. Emy Jackson once said, 'It takes three times as long as you expect to rewrite a system.' A colleague of mine at Culture Amp kindly said, 'A rebuild is never finished, only started.' Culture Amp is an employee experience platform that does a whole bunch of things.
00:00:22.119 However, I specifically work on the performance product, which helps organizations build high-performing teams that ultimately drive better outcomes for the business. This performance product is nine years old and came to Culture Amp five years ago, in 2019, as a Series A acquisition. Initially, it was quite small, but since then we've expanded it to about 2,700 customers. On average, these customers launch around 850 performance review cycles during low periods, increasing to about 1,500 during peak times. I'm not bragging; these are not bragging numbers. I'm merely trying to give you a sense of scale.
00:01:10.479 Despite not having a huge number of customers, it's still a decent scale. Our largest performance review cycle typically involves about 40,000 employees. However, the global potential for the performance market is a whopping $3 billion. Therefore, our goal is to build a performance product that can lead in this massive global market. Until last year, we faced many compounded technical obstacles that prevented us from achieving this goal. The Series A product we acquired served us well in 2019, giving us a foothold in the market, but it can no longer support our aspirations towards becoming a market leader.
00:01:41.479 The product itself comprises two monoliths: a frontend monolith and a backend monolith. The underlying domain models, data models, and architecture are all outdated. While they were well set up nine years ago, today's needs have evolved, and our current infrastructure does not fit those needs anymore. Additionally, a variety of other modules cohabit these shared monoliths alongside our core performance models, including features for one-on-ones and goals, which are adjacent products. This crowded environment has led to numerous shared concerns like authorization, authentication, email notifications, and tasks.
00:02:04.320 These shared services and concerns are all living within the same monolith, creating a very complex and crowded system. As a result, we have accumulated significant backend and frontend tech debt, which has been compounded by attempts to aggressively scale and iterate the product since its acquisition. There are also numerous early-stage decisions that, while sensible at the time of the product's initial build, were no longer serving us well by last year.
00:02:41.320 Eight different teams were trying to work within just these two monoliths, stepping on each other's toes due to poor domain separation. Teams would undertake extensive discovery phases and spend weeks starting the implementation of new features, only to find that they collided with another team's work. This unfortunate overlap often caused projects to grind to a halt, preventing the launch of new features. The lack of ownership due to tight coupling made it challenging to have clear accountability over specific domains.
00:03:12.639 Furthermore, our teams did not benefit from any shared tooling or infrastructure that was available at Culture Amp because it was not tailored to support this acquired product. All of these factors contributed to a culture of fear within our engineering teams: teams working too slowly, lacking pride in their work, and ultimately, lacking engagement. Essentially, I am describing the opposite of a high-performance culture.
00:03:38.640 While grappling with these challenges, our customers grew frustrated and tired of waiting for us to innovate. They expected us to meet basic needs, yet we were unable to do so. To cut a long story short, we found ourselves in a very deep hole, which might look familiar to some of you. That’s why I’m here. My name is Prakriti Mateti, and I am a director of engineering. Today, I want to share our journey of trying to climb out of this deep hole through a rebuild.
00:04:27.000 I will walk you through the steps of a full rebuild and stop at each step to share some key reflections. I hope this will make you feel less alone the next time you find yourself in a similar situation, which will inevitably happen. The first step in this journey is recognizing the need for a rebuild. How did we come to this realization? Before any talk of rebuilding, we first tried everything else within our control.
00:05:12.640 Initially, there was a lot of noise about enormous tech debt; I would constantly hear comments like, 'There's so much tech debt.' However, this tech debt lacked a clear classification and was not well understood by our stakeholders. Its cost in terms of impact on our teams was only appreciated by those actually working on those teams. Therefore, we began by showing instead of telling.
00:05:58.560 We identified every last bit of tech debt we could find in both the frontend and backend monoliths. We gave it recognizable names, classifications, and impact ratings. We empowered our senior leadership, our executives, and our cross-functional partners in product and design to discuss the tech debt. We then prioritized each issue, slowly chipping away at it bit by bit. As with most tech debt remediation efforts, it was taking a long time.
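The talk does not show what this register looked like. As a rough illustration only, a register of named, classified, and impact-rated debt items that can be sorted for prioritization might be sketched as below. The field names, the 1 to 5 scales, the impact-over-effort scoring rule, and the sample entries are all hypothetical, not Culture Amp's actual scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    """One entry in a hypothetical tech debt register."""
    name: str            # recognizable name stakeholders can refer to
    classification: str  # assumed labels, e.g. "coupling"
    layer: str           # "frontend" or "backend" monolith
    impact: int          # assumed scale: 1 (minor) to 5 (blocks delivery)
    effort: int          # assumed scale: 1 (small) to 5 (very large)

    def priority(self) -> float:
        # Assumed rule: favour high impact per unit of effort.
        return self.impact / self.effort


def prioritize(items):
    """Return items ordered from highest to lowest priority.

    sorted() is stable, so items with equal priority keep their input order.
    """
    return sorted(items, key=lambda item: item.priority(), reverse=True)


# Hypothetical sample entries, invented for illustration.
register = [
    DebtItem("Shared notification module", "coupling", "backend", impact=5, effort=4),
    DebtItem("Legacy form components", "outdated dependency", "frontend", impact=3, effort=1),
    DebtItem("Untyped API responses", "missing contract", "frontend", impact=2, effort=2),
]

for item in prioritize(register):
    print(f"{item.priority():.2f}  [{item.layer}] {item.name} ({item.classification})")
```

Giving each item a stable name and an explicit rating is what lets non-engineers refer to it in planning conversations; the scoring rule itself could be swapped for whatever weighting a team agrees on.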
00:06:32.760 We realized that even if we managed to fix all the tech debt, it would still not eliminate the other problems we were facing. Our underlying domain model was still outdated, and we still would not be able to build that market-leading product for the $3 billion market we were targeting. Next, we tried a piecemeal re-architecture. Within the monoliths, we identified a domain we could re-architect while keeping the rest of the monolith functional. This plan was surprisingly effective, but it was too slow.
00:07:09.679 By the time we finished rethinking one domain, the market could have moved on, leaving us behind. So, we thought, why not extract components from the monoliths? This turned out to be a challenging task because everything was deeply intertwined within those monoliths. Unfortunately, there were no patterns at Culture Amp for doing this at the time. There were no quick solutions for establishing new services or shared tooling.
00:07:47.919 Despite these challenges, we decided to give the extraction a good go. We established a tiger team, separated them from the product teams, and tasked them with extracting an example domain. To our surprise, they succeeded, documenting the process so that other domain teams could follow suit. But this effort made us realize that none of these approaches would allow us to escape the hole at the speed we required to lead the market.
00:08:22.679 Although we could have opted for any of those approaches or a combination of them, we suspected that by the time we finished, our competitors would have far outpaced us. Therefore, we needed a radical reset. We needed to burn it all down and start over from scratch. That was the moment we understood we had to pursue a rebuild.
00:09:14.960 As I mentioned earlier, I will share some key reflections at every stage. During this initial phase, we moved from an amorphous statement about having too much tech debt to a clear breakdown with prioritization, estimation levels, and visibility across layers. This built empathy and awareness within the organization. We were able to work cross-functionally, allowing our product and design partners to advocate for tech debt concerns, even in rooms where I wasn’t present.
00:09:57.720 Additionally, we attempted everything possible, with a high level of transparency, before suggesting the rebuild. By doing this, when the time arrived to say, 'Hey, we need to rebuild this,' people were much more open to hearing that difficult message. Our team had genuinely tried everything else. However, a key lesson learned was that I should have suggested the rebuild sooner.
00:10:33.440 Deciding exactly when to make that suggestion is tough. If suggested too early, teams might dismiss it, thinking engineers are inclined towards greenfield projects and just want to work on tech debt. Conversely, if suggested too late, it leads to wasted time working on the broken product instead of rebuilding it right away. Finding that perfect timing is crucial, and it is a lesson I’m carrying with me into future projects.
00:11:13.560 Another lesson was the value of establishing trust and seeking forgiveness rather than permission. We performed much of the groundwork without explicit permission, operating with a high degree of transparency. Once we confirmed that rebuilding was essential, we needed to secure buy-in. This meant getting support from the teams who would be involved in the rebuild, as well as cross-functional leaders in product and design.
00:11:43.640 We sought buy-in from senior leadership, including directors overseeing other products and VPs of engineering and product, as well as the exec and board, because this undertaking would require significant resources from Culture Amp. Before we could secure buy-in, we first needed to articulate where the product was going. We created a radar outlining opportunities that would be unlocked post-rebuild, showcasing the promise of a better world following the rebuild.
00:12:24.920 Generating this list was key to justifying the rebuild. We initiated some blue sky architecture brainstorming sessions, gathering a group of engineers, saying to them: 'Imagine there are no constraints.' From there, we conducted many brainstorming sessions. Ultimately, smaller groups consolidated the ideas into a potential architecture for our new rebuilt product.
00:12:57.679 The goal here wasn't to finalize the architecture down to the last detail, but rather to visualize what a creative exercise would produce. If the proposed architecture looked vastly different from our existing product, it made a strong case for the need to rebuild. However, if it appeared too similar, that might suggest we should iterate rather than rebuild. Spoiler alert: our proposed architecture looked vastly different, which is why I'm here sharing this with you.
00:13:32.560 This initial sketch was more of a brainstorming session, and no one was held to any specific architectural decisions at that stage. Building that opportunity radar and conducting these exercises greatly facilitated buy-in from the team responsible for the rebuilding effort. We involved product and design from the beginning, ensuring they were aligned with the direction we were heading.
00:14:08.440 We maintained close communication with senior leadership, leveraging previous attempts to address the tech debt, piecemeal architecture modifications, and monolith extraction efforts to gain their buy-in. It was vital for us to do this groundwork before making our final appeal, as the executives were not easily convinced.
00:14:40.960 To solidify our case, we consulted a trusted external advisor who had successfully completed rebuilds and witnessed others fail. Their insights on potential failure modes proved invaluable. Armed with this input, we prepared an executive presentation that provided a comprehensive narrative covering our performance product. This last step likely helped us secure that final 20% of buy-in.
00:15:06.960 One option was to pause the core performance offering for at least a year, but that might have been unbearable for our customers, as they would see no new value during that time. Instead, we suggested that while we were rebuilding one slice, the rest of the performance offering could continue to ship critical updates to retain our customers.
00:15:57.320 We found success with this approach. One of our board members summarized our situation effectively by saying, 'It sounds like you're grinding to a halt with an unhappy team.' When I heard that, I sensed that we would receive the buy-in we needed. A few things worked well for us in this process: consulting a neutral external advisor provided credibility to our plan, validating the need for a rebuild.
00:16:39.839 Moreover, one of our product VPs did a remarkable job weaving together the holistic narrative, elucidating the entire puzzle. He emphasized that the rebuild was just one aspect of the larger performance puzzle, shifting focus back to customer needs rather than solely our technical struggles. We also took the time to work transparently with stakeholders, ensuring that all key players were involved in every conversation.
00:17:39.440 This approach meant that by the time we made our final proposal, it wasn’t a surprise to anyone involved. They had been part of the process, invested in the outcome, and were eager to support the rebuild. While we slowed down the discussions leading up to the ask, this strategy ultimately helped us move much faster later.
00:18:09.759 However, we faced challenges. We operated under the radar longer than intended, creating uncertainty among engineers who were still working on the existing performance roadmap. They sensed something was changing, leading to anxiety about the future of their projects. I wish we had either worked more openly or wrapped up the secretive aspects more quickly to alleviate their concerns.
00:18:57.520 Additionally, I realized that I needed to create a space for active debate around the technical choices being made. We did not achieve adequate buy-in on these decisions early on, which resulted in significant pivots after several weeks of work had already been performed. This deviation created uncertainty and shook the confidence of many involved in the project.
00:19:42.520 We had secured buy-in, but the proposal to undertake a total rebuild was still a huge risk. We decided to focus on rebuilding just one slice and take it all the way to our customers. Once all customers were migrated onto it, we could gradually rebuild the rest of the product. This first slice was to validate our technology choices, our new domain model, architecture, and our capability to execute in a timely manner that wouldn’t jeopardize our market position.
00:20:31.360 This came with immense pressure and high visibility. If this first slice failed, we risked losing buy-in for the entire initiative. Additionally, engineers who had previously invested significant effort into various tech debt and architecture initiatives felt a sense of loss at having to abandon their contributions when we pivoted to this rebuild.
00:21:24.640 Given the stakes involved, we needed to ensure that we initiated the first slice successfully. To do this, we established clear goals for what we aimed to achieve. Specifically, we wanted to deliver a domain model that could take us into the future, an architecture that could grow, and clear ownership for teams throughout the rebuild process.
00:22:11.680 In our ambition, we decided not only to rebuild the product but also to transform the engineering culture at Culture Amp. We aimed to create fast-moving teams with high standards, great ways of working, and high engagement, along with opportunities for learning, growth, and development. While this might seem like madness, we laid out ground rules and principles for how we would approach this rebuild.
00:22:58.640 Our first principle, 'like for like,' signified that this was primarily a technical rebuild. We wouldn't redesign the entire user experience or add numerous features. Instead, we focused on taking the existing product and standing it up on a future-ready domain model and architecture. In doing so, we would utilize the latest standards and tooling Culture Amp had to offer, eliminating legacy components we didn’t need and adopting the design system wherever possible.
00:23:52.720 We would enhance product accessibility, uphold high engineering standards, and strive for efficiency. It was essential that we minimized the project length to avoid a never-ending rebuild that dragged on for five years with continuously changing objectives and shifting expectations. Secondly, we aimed to minimize customer risk by freezing the entire core performance component of both frontend and backend monoliths once the first slice was delivered.
00:24:28.640 After shipping the first slice to production and migrating all customers onto it, the old product would be frozen. We needed to prevent the new product from diverging from the old one and eliminate any feedback mechanisms from the new to the old one. This meant we were committed to systematically decommissioning legacy features while maintaining fully functioning new products.
00:25:13.440 Lastly, we resolved to be deliberate in every step we took, investing time in setting high engineering standards and building high-performing teams throughout the process. The total scope of the rebuild, as we referred to it, consists of six slices. Although I won’t bore you with the specifics of these slices, I will reveal that for the first slice, we selected something we call 'Self Reflections.'