A Tale of Two Feature Flags

00:00:11.990 Hello everyone.

00:00:14.450 My name is Rebecca Sliter, and I'm an engineering manager at Kickstarter. Before joining Kickstarter, I was a consultant, which involved traveling to clients of all shapes and sizes to discuss software development practices.

00:00:20.280 Because of that experience, I've grown to care greatly about the tools and practices that help teams deliver better code more efficiently.

00:00:25.560 In a nutshell, I believe development should be as simple as possible. Therefore, I try to avoid writing clever code, focusing instead on clear communication of ideas, domains, or tools across the development team.

00:00:29.910 This brings me to the point of my talk: A Tale of Two Feature Flags. Specifically, we'll explore what happens when a team implements a set of features behind flags and ships them. Today, we'll discuss what makes a flag successful, comparing it to other feature flagging strategies that may not work out well.

00:01:07.710 Software development, in many ways, has never been better. Over the last few years, we have developed languages, patterns, and release strategies that abstract away much of the grunt work, allowing us to focus on delivering great products. Yet, these conventions, patterns, and languages can be misused if not guided appropriately.

00:01:21.080 While it's the best of times, it can also be the worst. Feature flags, for instance, are designed to ease our workload but can unexpectedly introduce complexity that detracts from our overall productivity.

00:01:32.190 Let's first focus on what makes feature flags powerful. Features begin with an idea, igniting the excitement of building something great and delivering value to users.

00:01:39.540 However, this journey often leads developers to face the hard truth: writing software requires significant effort. Developing a feature from start to finish can be time-consuming, necessitating coordination across the entire team.

00:01:57.420 Once the development work is complete, you're often left with a sizable bundle of code ready for deployment—and it can be risky to release that all at once.

00:02:04.880 Even if the code has been tested, many moving parts are involved, and small bugs are likely to slip through undetected. The uncertainty of how everything will perform in a live environment is daunting. What would be ideal is to test ideas on a small group of users first.

00:02:20.100 However, large monolithic releases hinder this possibility. Moreover, your implementation may depend on new databases or third-party APIs, making simultaneous deployment risky.

00:02:28.710 Communicating these risks to stakeholders can be challenging; they often push for grand features to be released. You might find yourself as the naysayer, caught between business ambitions and technical realities.

00:02:37.680 Yet, we all know the solution: put features behind a feature flag. At a high level, flags are configured in their own files—often YAML files—where you can decide which features to enable or disable.

00:02:44.600 Initially, these files are straightforward, containing boolean values that define the state of different features. As development unfolds, rather than fully implementing a feature, you query these boolean configurations to determine the user experience.

00:02:59.750 Depending on the state of the flag, users will either access the existing UI or the newly implemented feature. This method allows for iterating on your development without risking a complete overhaul upon release.
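
As a minimal sketch of the idea just described, assuming a hypothetical flag named `new_search_ui` and a plain dictionary standing in for the parsed YAML file, the lookup and the branch might look like this. None of these names come from the talk.

```python
# Hypothetical flag configuration; in practice this might be parsed from a YAML file.
FLAGS = {
    "new_search_ui": False,
    "comments": True,
}


def is_enabled(flag_name: str) -> bool:
    """Return the flag's boolean state; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def render_search_page() -> str:
    """Choose the user experience based on the flag's current state."""
    if is_enabled("new_search_ui"):
        return "new search UI"
    return "existing search UI"
```

Defaulting unknown flags to off is a design choice in this sketch: a typo in a flag name then hides the new feature rather than exposing it.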

00:03:14.850 Feature flags help address many challenges we face in software releases. If a feature is complex, we can deploy it iteratively, alleviating the burdens of massive deployments that require extensive configurations upfront.

00:03:26.460 Once a feature is ready to be shown, you can restrict access through the flag, making it available only to specific users, perhaps internal staff for testing.

00:03:39.750 As you gain confidence, you can gradually expand access until everyone can see the feature. Afterwards, you remove the flag and the surrounding conditional logic, making the feature universally available.
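
One common way to expand access gradually is to hash each user into a stable bucket and compare it to a rollout percentage. The sketch below assumes that approach; the function names, the staff allow-list, and the bucketing scheme are illustrative and are not described in the talk.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def can_see(flag_name: str, user_id: str, staff_ids: set[str], percent: int) -> bool:
    """Staff always see the feature; other users see it if their bucket is below percent."""
    if user_id in staff_ids:
        return True
    return bucket(flag_name, user_id) < percent
```

Starting with `percent=0` limits the feature to staff, and raising it step by step to 100 shows it to everyone. Because the bucket depends only on the flag name and user ID, a user who sees the feature at 10 percent still sees it at 50 percent.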

00:03:48.080 Additionally, feature flags can support A/B testing, where users are shown different versions of a feature to assess performance before making final decisions.

00:04:04.840 Another practical use of feature flags is employing them as kill switches, allowing teams to deactivate problematic components of their code without affecting the entire application.

00:04:17.739 For instance, if a page experiences a spike in load due to high traffic, you can disable specific features like commenting without taking the entire page down.
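
A kill switch of this kind could be sketched as follows. The switch name `comments_disabled` and the page-rendering function are hypothetical, chosen only to mirror the commenting example.

```python
# Hypothetical kill switch; an operator would set it to True during a traffic spike.
KILL_SWITCHES = {"comments_disabled": False}


def render_project_page(comments: list[str]) -> str:
    """Render the page, leaving out the comment section when the switch is thrown."""
    parts = ["project description"]
    if not KILL_SWITCHES["comments_disabled"]:
        parts.append(f"{len(comments)} comments")
    return " | ".join(parts)
```

The rest of the page renders either way; only the expensive component is skipped.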

00:04:28.200 These benefits make feature flags seem like the best of times. However, what happens when a team introduces too many flags or mismanages them? Let's discuss what occurs in practice when a team decides to ship code behind a feature flag.

00:04:39.820 Often, teams begin their journey with the best intentions. I worked on a project last fall involving faceted search for a retail client, where search results came from several independent sources.

00:04:51.020 Given the uncertainty surrounding the performance of these external services, we decided to encapsulate the feature behind a feature flag, allowing us to test its impact on user experience.

00:05:05.400 However, as with many projects, changes in scope emerged. The external dependency wasn't ready when we needed to move to the next facet, which required a new separate flag to enable its deployment.

00:05:20.490 What began as a tidy implementation turned into a mess with multiple feature flags. While this may not seem inherently bad, it complicated our development process significantly.

00:05:32.790 This tight coupling of features also meant that we later struggled to determine how to display the user interface should one flag be active and the other inactive.

00:05:47.540 It required extensive discussion with stakeholders and resulted in a truth table to ensure we were clear on the business logic, which ironically introduced more complexity to our code.
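
To illustrate why a truth table was needed, the sketch below enumerates every combination of two flags. The flag names and the UI rule for each combination are invented for illustration; the talk does not say what the actual business logic was.

```python
from itertools import product

FLAG_NAMES = ["facet_a", "facet_b"]  # hypothetical flag names


def search_ui(facet_a: bool, facet_b: bool) -> str:
    """Hypothetical business rule for each flag combination."""
    if facet_a and facet_b:
        return "results with both facets"
    if facet_a:
        return "results with facet A only"
    if facet_b:
        return "results with facet B only"
    return "plain results"


def truth_table() -> list[tuple[bool, bool, str]]:
    """List every flag combination together with the UI it should produce."""
    combos = product([False, True], repeat=len(FLAG_NAMES))
    return [(a, b, search_ui(a, b)) for a, b in combos]
```

Two flags give four rows, and each additional flag doubles the count, so every row becomes another case for stakeholders to decide and for QA to verify.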

00:06:05.080 Testing the resulting features became overwhelming due to the intricate dependencies, making it easy for bugs to slip through. Our QA team struggled to verify whether the end user experience matched expectations.

00:06:18.270 This situation illustrates my first key point: using several separate flags to iteratively release one feature can be exceptionally challenging.

00:06:31.650 Around this time, our team recognized we had limited options: either scrap the project and push back on the project manager or continue carefully by testing the feature extensively.

00:06:42.929 We chose to proceed with care, but it quickly became burdensome. While manual testing allows for thorough checks, it’s impossible to guarantee that everything is perfectly checked, and mistakes will inevitably happen.

00:06:55.340 We found ourselves spending significant time testing instead of obtaining valuable feedback from our clients or users, which is fundamentally what putting something behind a feature flag should accomplish.

00:07:08.850 Through this, we learned an important lesson: feature flags should simplify the process, not add unnecessary complexity. If implementing a flag consumes more time than the feature it encapsulates, something is amiss.

00:07:21.010 The key takeaway from my experience with faceted search is that we should stick to one flag at a time for a given feature to maintain simplicity and clarity.

00:07:32.360 Now, let’s delve into the implementation of one of those flags. The search feature I previously mentioned was designed to deliver different types of results we referred to as facets.

00:07:43.560 To manage these facets, we initially implemented a single feature flag backend. However, as we adopted this method, it spread throughout our codebase.

00:07:54.740 The challenge arose when each team used a different strategy for feature flags, leading to difficulties in unit testing since each test had to know the flag state.

00:08:10.240 What was supposed to be a simple implementation ended up resulting in tests coupling deeply with the flag's state, complicating maintenance and future enhancements.

00:08:23.000 Removing toggles became a laborious process every time we dialed the flag up to 100 percent. We found ourselves repeatedly updating backend code to accommodate toggles, rather than cleaning up after the implementation.

00:08:39.150 This resulted in additional time spent reviewing code and trying to evaluate the test suite, further complicating the simple task of removing a flag.

00:08:51.840 Ultimately, the flag's presence became a burden, complicating our testing landscape and delaying timely delivery of code, as we grappled with these intertwined dependencies.

00:09:05.220 Feature flags are designed to streamline our workflows, yet without proper management, they can morph into significant burdens. Making flags permanent often adds unnecessary complexity to systems.

00:09:18.529 Teams often hesitate to remove flags because they fear they might break critical functionalities, leading to a scenario where flags intended to be temporary become immortalized.

00:09:32.540 Some flags morph into kill switches, which can hinder team performance due to the lingering complexity that arises from keeping outdated flags in the codebase.

00:09:46.010 Occasionally, kill switches can be beneficial when used purposefully, such as when Stack Overflow employs a kill switch to disable posting while undergoing maintenance.

00:10:02.720 However, it's paramount that these switches are deliberately designed to be long-lived, unlike a feature flag, which is temporary by nature.

00:10:17.220 When thinking about flagging strategies for temporary feature flags, I would recommend isolating their usage within the codebase.

00:10:30.700 A flag should exclusively reference a single part of your application, reducing potential confusion around its functionality and improving removal efforts.

00:10:46.529 Additionally, treating flagged code as new code, separate from production, encourages clarity around what a flag does and when it should be removed.

00:11:03.880 When it's time to remove a flag, it should be clear exactly which code needs to go, which makes removal smoother and less likely to introduce bugs.
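
One way to keep a flag isolated is to consult it at a single decision point and keep the old and new implementations as separate functions. This is a sketch under that assumption; the function names are hypothetical and this is not presented as the structure the team actually used.

```python
def legacy_search(query: str) -> list[str]:
    """Existing behaviour; it knows nothing about any flag."""
    return [f"legacy:{query}"]


def faceted_search(query: str) -> list[str]:
    """New behaviour, written as new code; it also knows nothing about the flag."""
    return [f"faceted:{query}"]


def choose_search(faceted_enabled: bool):
    """The only place in the application that consults the flag."""
    return faceted_search if faceted_enabled else legacy_search
```

With this shape, unit tests for `legacy_search` and `faceted_search` need no flag state, and removing the flag means deleting `choose_search` and `legacy_search` and calling `faceted_search` directly.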

00:11:20.279 Now, let’s pivot to the question of maintaining clean code with feature flags. A frequent issue arises when teams neglect to remove flags after their expected lifespan.

00:11:36.320 Flags often remain in codebases, creating technical debt that can accumulate unnoticed. Over time, this adds to the mental burden on developers working in the codebase.

00:11:49.170 Teams may convince themselves to leave flags in place so they can review performance later. However, unmaintained flags are an obstacle that demands ongoing mental energy.

00:12:07.520 The example of Knight Capital shows that ignoring obsolete flags can have dramatic repercussions. Their system experienced a historic failure due to a long-lived feature flag.

00:12:18.780 In their case, a feature flag was left in the code for eight years until a different team inadvertently enabled it, leading to catastrophic financial consequences.

00:12:31.920 The actions were unintentional, but they highlight how lingering flags not only complicate management but also hide serious risks.

00:12:44.560 What became clear is that long-lived feature flags should not be ignored. Teams must adopt a proactive strategy to ensure that obsolete flags don't cause avoidable failures.

00:13:00.080 A best practice for mitigating these risks is to establish procedures that enhance visibility and accountability surrounding feature flags.

00:13:14.410 For example, introduce deadlines and assign maintainers to each flag to clarify responsibility over its state and future deployments. These actions contribute to an organized approach that promotes continual maintenance.

00:13:31.300 The process at Kickstarter involves regularly checking active flags and escalating accountability when they haven't been touched in some time.
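
Deadlines and maintainers can be recorded alongside each flag so that overdue ones are easy to find. The sketch below assumes a simple in-memory registry; the record fields and the `overdue_flags` helper are hypothetical and do not represent Kickstarter's actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagRecord:
    name: str        # flag identifier
    owner: str       # maintainer responsible for removing the flag
    remove_by: date  # deadline after which the flag counts as overdue


def overdue_flags(flags: list[FlagRecord], today: date) -> list[FlagRecord]:
    """Return the flags whose removal deadline has already passed."""
    return [flag for flag in flags if flag.remove_by < today]
```

A scheduled job could run a check like this and notify each overdue flag's owner, which is one way to make the regular review routine rather than optional.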

00:13:48.022 By following this strategy, teams can make significant strides toward cleaner code, reducing the risk of features sitting unmonitored indefinitely.

00:14:01.200 Ultimately, while feature flags allow teams to ship code quickly, they require discipline and structure. Without a robust approach to their management, they can become more of a hindrance than a help.

00:14:14.822 It’s important that teams remain vigilant about flag usage to scale effectively and foster a culture that prioritizes removal and efficiency.

00:14:28.720 In conclusion, embracing feature flags as a part of your software development strategy means striking a careful balance between their utility and their potential downsides.

00:14:41.800 As long as we recognize the complexities and adhere to a disciplined methodology, we can harness feature flags to release better code faster and more reliably.