Keynote: Seamless Releases with Feature Flags: Insights from GitHub's Experience

Ruby

Hana Harencarova

@hharen

#feature-flags

#continuous-integration-ci

The keynote presentation by Hana Harencarova at EuRuKo 2023 focuses on using feature flags to enable seamless releases, drawing on GitHub's experience. Harencarova introduces feature flags and explains how they let developers activate or deactivate features within seconds, without a full redeploy, which minimizes disruption in production. She also describes her path from academia to GitHub, and she sets the stage with a scenario showing the chaos that follows when a newly deployed feature breaks in production.

Key points discussed during the presentation include:

Definition and Benefits of Feature Flags:
- Feature flags are a mechanism to enable or disable features selectively.
- They decouple feature deployment from release, allowing for safer and more efficient shipping of code.
Implementation and Usage:
- Explanation of using Flipper, an open-source feature flag management tool, and GitHub's own extensions to it.
- Strategies for placing feature flags effectively in code to cleanly manage new features without cluttering the codebase.
Incremental Feature Development:
- Emphasizes the importance of breaking features into smaller components for easier management and code reviews.
- Allows teams to test individual parts of a feature in production before full deployment.
Best Practices:
- Suggestions on maintaining feature flags include:
  - Avoiding excessive use and removing flags promptly once a feature is stable.
  - Putting new features and risky changes behind flags, while skipping them for small, low-risk changes.
Real-World Application at GitHub:
- Walks through the default setup for code scanning, which was launched iteratively using multiple feature flags.
- Shows how teams coordinated their work when different components depended on each other.
- Highlights the complexity of managing feature flags across both a monolith and microservices, advocating caching and efficient checks.
Conclusion and Future Directions:
- Harencarova concludes by noting the ongoing work to refine how feature flags are used, optimize processes, and improve inter-team collaboration while shipping features safely and efficiently.
- She also mentions enhancing user experiences by allowing configurations at the organization level for code scanning.

Overall, the talk advocates for a strategic approach to feature flags, ensuring they contribute positively to the development process while mitigating risks associated with feature releases.

In closing, Harencarova encourages attendees to assess which feature flag practices fit their own organizations and to apply them thoughtfully.

00:00:12.120 Hello, thank you very much for this warm welcome. Today, I would like to talk about feature flags and our approach to using them.

00:00:18.060 Imagine it's Thursday afternoon. Of course, we don't deploy on Fridays, right? You've built a new feature and tested it, everything looks good, and the deploy starts. Everything is running smoothly, and suddenly you notice the page is slowing down.

00:00:30.539 Errors start appearing, customers begin filing support tickets, and it feels like everything is on fire. You think, 'Oh no, now we need to roll back.' That process takes about 40 minutes. And worst of all: is the problem even related to the last deploy?

00:00:42.059 Imagine if you had a magic wand and could instantly stop all of this chaos. Feature flags can be that magic wand. Their big advantage is that you can enable and disable features within seconds, instead of going through the whole redeploy and rollback process.

00:01:07.439 While feature flags won't save you from everything, they can save you a lot of trouble and make it easier and safer to ship features.

00:01:20.159 Okay, so I'm Hana. Six years ago, I stumbled upon a Ruby meetup in Zurich called Ruby Monstas, and that's where I learned Ruby. After conducting academic research in psychology and decision-making, I moved into teaching. I then began working as a software developer and eventually joined GitHub, where I am part of the code scanning team.

00:01:51.840 Today, I'll discuss how we use feature flags to ship our features, and I'll show the default setup for code scanning as an example. Oh, and I brought stickers. I even wrote it on my hand, because I forgot them the first two days, so if you're interested in some cute GitHub stickers, they are down on the table.

00:02:28.800 First, we will discuss feature flags in general, how to use them, and then we will look at an example of the default setup from code scanning. At the end, we will explore how to work with feature flags at scale.

00:02:58.440 Now, I have a couple of questions for you: who here already knows what feature flags are? Please raise your hands. Great! And now, who has already used feature flags? Very good! And who still uses them today in their current company?

00:03:11.120 Perfect, thank you! The main advantage is the ability to enable or disable features whenever you want, and they are decoupled from the deployment process.

00:03:40.519 You can enable a feature flag for everyone, but you can also use them for testing purposes. That means you could enable them for specific actors, for specific groups. For example, we often ship features first to staff or conduct private beta testing. You can also enable features for a percentage of users or percentage of requests.
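The audience options described here (everyone, specific actors, groups such as staff, a percentage of users) can be illustrated with a toy in-memory store. This is a hypothetical sketch in plain Ruby, not Flipper's implementation or API; the class and method names are made up. The percentage check hashes the feature name and actor id so each actor gets a stable answer, which is one common way to build percentage rollouts. Percentage of requests is not modeled.

```ruby
require "set"
require "zlib"

# Hypothetical toy flag store; NOT Flipper's API, names are illustrative only.
FlagActor = Struct.new(:id, :staff)

class ToyFlags
  # Group name => predicate deciding whether an actor belongs to the group.
  GROUPS = { staff: ->(actor) { actor.staff } }.freeze

  def initialize
    @everyone = Set.new                            # features on for everybody
    @actors   = Hash.new { |h, k| h[k] = Set.new } # feature => enabled actor ids
    @groups   = Hash.new { |h, k| h[k] = Set.new } # feature => enabled group names
    @percent  = Hash.new(0)                        # feature => 0..100
  end

  def enable(feature)
    @everyone << feature
  end

  def enable_actor(feature, actor)
    @actors[feature] << actor.id
  end

  def enable_group(feature, group)
    @groups[feature] << group
  end

  def enable_percentage_of_actors(feature, percent)
    @percent[feature] = percent
  end

  def enabled?(feature, actor = nil)
    return true if @everyone.include?(feature)
    return false if actor.nil?
    return true if @actors[feature].include?(actor.id)
    return true if @groups[feature].any? { |group| GROUPS.fetch(group).call(actor) }

    # Stable bucket 0..99 per (feature, actor): the same actor always lands
    # in the same bucket, so raising the percentage only adds actors.
    Zlib.crc32("#{feature}:#{actor.id}") % 100 < @percent[feature]
  end
end
```

For example, after enable_group(:pet_profile, :staff), a staff actor passes the check while everyone else still gets the old behavior.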

00:04:11.640 Feature flags are fundamentally like an 'if' statement. The concept is straightforward: you have one branch for the old behavior and one for the new. Let's take a fictional GitHub feature. Suppose developers who love their pets want to show them off to their fellow developers, so GitHub implements a feature where you can have a pet profile next to your user profile.

00:04:37.919 In its simplest form, you could use an 'if' statement: if this pet profile is enabled for a user, then you will show a combined user and pet profile; otherwise, you will only show the user profile. It’s straightforward.
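As a minimal sketch of that 'if' statement in Ruby, assuming a hypothetical feature_enabled? lookup backed by a hard-coded hash (a real application would ask its flag library instead):

```ruby
# Hypothetical flag data: pet_profile is enabled only for "octocat".
ENABLED_FOR = { pet_profile: ["octocat"] }.freeze

# Stand-in for a real flag check such as a Flipper lookup.
def feature_enabled?(feature, user)
  ENABLED_FOR.fetch(feature, []).include?(user)
end

def render_profile(user)
  if feature_enabled?(:pet_profile, user)
    "#{user}'s profile + pet profile" # new behavior
  else
    "#{user}'s profile"               # old behavior
  end
end

puts render_profile("octocat") # prints "octocat's profile + pet profile"
puts render_profile("hubot")   # prints "hubot's profile"
```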

00:05:05.080 We use Flipper, an open-source implementation of feature flags, and at GitHub, we have our own extension of Flipper along with our own tooling.

00:05:16.340 To define the feature check, you would write a helper method, something like pet_profile_enabled?, that asks Flipper whether the flag is enabled for a given actor. At code scanning we work with repositories, so we frequently enable feature flags for repositories or for the owners of those repositories; you can adapt this to your domain. We also use a separate disabled feature flag, and you might think, 'Isn't that the same thing?' Not quite. We typically enable a feature flag for a large group, but sometimes need to disable it for a specific repository or individual for whatever reason.
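The helper described here can be sketched as follows. This assumes a tiny hypothetical Flags module standing in for Flipper (with Flipper itself, the equivalent check is Flipper.enabled?(name, actor)), and the flag names :pet_profile and :pet_profile_disabled are illustrative. The disabled flag is checked first, so it acts as an override that wins over any broader enablement.

```ruby
# Hypothetical in-memory stand-in for a flag backend.
module Flags
  @enabled_for = Hash.new { |hash, key| hash[key] = [] }

  def self.enable_actor(name, actor)
    @enabled_for[name] << actor
  end

  def self.enabled?(name, actor)
    @enabled_for[name].include?(actor)
  end
end

Repo = Struct.new(:name, :owner)

# Enabled for a broad audience (the repository or its owner), but the
# separate "disabled" flag opts out individual repositories.
def pet_profile_enabled?(repo)
  return false if Flags.enabled?(:pet_profile_disabled, repo)

  Flags.enabled?(:pet_profile, repo) || Flags.enabled?(:pet_profile, repo.owner)
end

Flags.enable_actor(:pet_profile, "acme")                              # whole owner
Flags.enable_actor(:pet_profile_disabled, Repo.new("legacy", "acme")) # one opt-out

pet_profile_enabled?(Repo.new("web", "acme"))    # => true
pet_profile_enabled?(Repo.new("legacy", "acme")) # => false
```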

00:06:02.820 So we enable the feature flag for a larger group, and then with the disabled flag, we can still deactivate it for individual actors or repositories. The major advantage, as I mentioned, is the safety of deployments. But what do I mean by that? A big part is the decoupling of deployments and feature shipments. We can put risky changes behind feature flags, ship features continuously, and then decide when we are ready to actually ship the features and to whom.

00:06:40.780 If things go wrong or if we need to change something, we can disable feature flags in seconds. Another significant advantage is the ability to ship things incrementally, especially when dealing with large features. A feature branch for a large feature could live for months of development, and GitHub has over a thousand developers and a vast code base.

00:07:11.580 With numerous features being shipped across different parts of the code base, your feature branch and the main branch would drift apart over time, and merging back into main would be quite a hassle. With feature flags, we can instead ship features incrementally in small pull requests (PRs). These PRs are easy to review, and we can split out the code that only needs review from our own team, which speeds up those changes.

00:07:52.919 When we know we're touching files owned by other teams, we can prepare separate PRs for those changes, which makes reviews smoother. Several people, or even your entire team, may work on the same feature, and at some point another team may need to contribute as well. With one feature flag from the start, everyone can ship their code where it needs to go.

00:08:21.390 Moreover, having feature flags allows us to catch potential problems in production early. If planned well, we can test individual parts of a feature before the whole feature is ready, enabling us to test in production under realistic circumstances.

00:08:54.840 Let's move on to how to use feature flags. I created a few examples, but let's stick with the imaginary pet feature. One of the biggest questions is where to place feature flags and how to use them. Because of my academic background, I was curious whether there was research on feature flags, so I checked some studies. There are a few, and their findings suggest that a major reason people dislike feature flags is that they are hard to remove.

00:09:06.899 Feature flags can clutter code, and if teams are not diligent about maintenance, flags may become permanent fixtures. However, if you place them strategically, you can considerably reduce the workload. The first instinct might be to put every new line of code behind a feature flag, but this isn't necessary: you want to guard the new behavior, not every new line. For instance, in a method called pet_details, you could return immediately if the feature flag isn't enabled, and none of the new methods would be called.
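A minimal sketch of guarding at the entry point, using a hypothetical presenter class whose flag state is passed in to keep the example self-contained. While the flag is off, pet_details returns early, so the new helpers, which would fail for users without pet data, are never reached.

```ruby
# Hypothetical presenter; flag_on: replaces a real flag lookup for the example.
class ProfilePresenter
  def initialize(user, flag_on:)
    @user = user
    @flag_on = flag_on
  end

  # One guard at the entry point protects all the new code below it.
  def pet_details
    return nil unless pet_profile_enabled?

    { name: pet_name, species: pet_species }
  end

  private

  def pet_profile_enabled?
    @flag_on
  end

  # New helpers: they raise KeyError if the user has no pet data yet.
  def pet_name
    @user.fetch(:pet).fetch(:name)
  end

  def pet_species
    @user.fetch(:pet).fetch(:species)
  end
end
```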

00:09:52.799 This works as long as you're creating new methods specific to that feature. If the methods are shared with other features, you'll need to put the feature flag check inside those methods. Consider another example: an existing method, profile_hash, that returns data about a user. If the feature flag is enabled, we want to add the pet data to that hash. So we guard all the new code, and at the end we return the profile hash. If the feature flag is off, only the first part runs; if it's on, we also add the pet data.

00:10:49.920 It may seem logical to guard only the part where we merge the hashes, since that's where we care about the final result. You can do that, but building the pet hash is new code too, and if we query for a user's pet data, we could run into issues. Maybe the migrations haven't run yet, or the data backfill isn't complete.

00:11:04.740 Users might not have pet data associated with them yet, and because this code would run even while the feature is off, it could introduce bugs. So it's better to guard the entire section.
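Here is a hypothetical profile_hash in plain Ruby that follows this advice: both building the pet data and merging it sit inside the guarded section, so a user without pet data cannot break the old code path while the flag is off. The hash keys and the pet_profile_enabled: keyword are assumptions for the example.

```ruby
# Hypothetical: the flag state is passed in to keep the example self-contained.
def profile_hash(user, pet_profile_enabled:)
  profile = { login: user.fetch(:login), name: user.fetch(:name) }

  if pet_profile_enabled
    # Building the pet data is new code too: it assumes the pet data exists
    # (for example, that a backfill has completed), so it stays guarded.
    pet = user.fetch(:pet)
    profile = profile.merge(pet: { name: pet.fetch(:name) })
  end

  profile
end
```

If only the merge were guarded, the user.fetch(:pet) lookup would run for every user, flag on or off.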

00:11:54.240 You will often also have feature flags in views, for example to change how the profile is displayed depending on whether the new feature is enabled. Say you want to show pet information in several parts of the UI. Instead of adding multiple if/else branches, which are error-prone and hard to read, consider duplicating the markup for a while: one branch with the new behavior and one with the old. This makes removing the feature flag easier later, as you simply keep the new branch and delete the old one.
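A minimal sketch of the "one full branch per state" approach, using Ruby's standard ERB library rather than a Rails view; the template, its variables, and the render_profile_view helper are hypothetical. Each branch is complete on its own, so cleanup means deleting the else branch and the surrounding conditional.

```ruby
require "erb"

# Hypothetical template: each flag state gets its own complete branch.
# Removing the flag later means keeping the first branch and deleting the rest.
PROFILE_TEMPLATE = <<~TEMPLATE
  <% if pet_profile_enabled %>
  <header><%= user %> &amp; <%= pet %></header>
  <section>Pet: <%= pet %></section>
  <% else %>
  <header><%= user %></header>
  <% end %>
TEMPLATE

def render_profile_view(user:, pet:, pet_profile_enabled:)
  ERB.new(PROFILE_TEMPLATE).result_with_hash(
    user: user, pet: pet, pet_profile_enabled: pet_profile_enabled
  )
end
```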

00:12:53.160 Keep in mind, however, that if a feature flag stays in place for a while, another team might change the old branch without updating the new one, which could lead to problems. Before shipping the feature, always check the old branch for changes that might affect the feature or your rollout.

00:13:39.120 So if you use the same check in multiple views, instead of checking the feature flag repeatedly, you can create a single method that checks it once and memoizes the result for reuse. This prevents repeated checks of the same feature flag, and the method can be reused in many places across your components.
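A minimal sketch of the check-once-and-memoize approach in plain Ruby; the class name and the injected flag_lookup callable are hypothetical. It uses defined? rather than the common ||= idiom, because ||= would repeat the lookup whenever the memoized value is false, which is the usual case while a flag is off.

```ruby
# Hypothetical component; flag_lookup is any callable taking a flag name.
class ProfileComponent
  attr_reader :checks # how many times the underlying lookup actually ran

  def initialize(flag_lookup)
    @flag_lookup = flag_lookup
    @checks = 0
  end

  # Checks the flag once per component instance and reuses the answer.
  # `defined?` (not `||=`) so that a memoized `false` is reused as well.
  def pet_profile_enabled?
    return @pet_profile_enabled if defined?(@pet_profile_enabled)

    @checks += 1
    @pet_profile_enabled = @flag_lookup.call(:pet_profile)
  end
end
```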

00:14:24.600 Now, when should you use feature flags? We aim to place all new features behind feature flags, as well as risky changes and new bug fixes that could impact performance. However, it is essential to note that we don’t use feature flags for every little change going into the code base.

00:15:14.700 For instance, if changes are small or non-risky, feature flags are typically not necessary. Simple modifications, like altering the text of a button, can be easily tested without a feature flag. Similarly, if a bug fix is straightforward and we are confident it needs to go out immediately, we might choose not to utilize a feature flag. The critical point is that while utilizing feature flags is beneficial, applying them to everything wouldn’t make sense.

00:16:17.580 A couple of tips from this part: when writing your test cases, it’s useful to test both the old and new behaviors, verifying functionality with the feature flag enabled and disabled. Clearly stating the feature flag's state in your test names makes it easier to determine the required changes when removing it.
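One way to cover both states, sketched in plain Ruby without a test framework: a hypothetical with_flag helper forces a flag on or off for the duration of a block and restores the previous value afterwards, even if the block raises, and the test method names spell out the flag state. The FLAGS hash stands in for a real flag store.

```ruby
# Hypothetical in-memory flag store used by the code under test.
FLAGS = { pet_profile: false }

# Force a flag to a given state for the block, then restore the old value.
def with_flag(name, state)
  previous = FLAGS.fetch(name)
  FLAGS[name] = state
  begin
    yield
  ensure
    FLAGS[name] = previous
  end
end

def profile_label
  FLAGS[:pet_profile] ? "user + pet" : "user"
end

# The flag state is part of each test name, so cleanup is easy to find later.
def test_profile_label_with_pet_profile_flag_enabled
  with_flag(:pet_profile, true) { raise "expected pet label" unless profile_label == "user + pet" }
end

def test_profile_label_with_pet_profile_flag_disabled
  with_flag(:pet_profile, false) { raise "expected plain label" unless profile_label == "user" }
end
```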

00:17:14.340 Additionally, caching the feature flag checks improves performance. In our projects, we often name feature flags in a way that indicates which team they belong to, the project they’re associated with, and the specific feature.

00:17:51.420 Now, let's discuss the default setup. As I mentioned, I work on the code scanning team. What does code scanning do? It scans code for vulnerabilities and creates actionable alerts. Initially, to set up code scanning you had to commit a YAML file that GitHub generated. That meant committing, opening a pull request, getting a review, and merging, which posed hurdles, especially for non-technical users.

00:18:39.000 The solution was to let users set up code scanning directly through the UI with just a few clicks, and this is how we approached it. Users can now select the default setup of code scanning in the UI, while the advanced setup takes them to the generated YAML file, where they can make changes. You can select which languages to scan and which events trigger a scan, then simply enable the setup, and it works.

00:19:37.020 To get to this stage, we initially created an internal API solely for testing the feature. Our codebase is divided between a Rails monolith and a microservice architecture. We established the first feature flag for this API. Then, we needed to consider our workflow: there’s a monolith, there are microservices, and we figured out what can be done in parallel and what relies on other work.

00:20:12.720 To ship the first version of the default setup, we used three feature flags, including one for the internal API, which covered the necessary back-end changes, and another for the UI. While we were building the UI, every change happened behind this feature flag. GitHub has protected branches, and we wanted this to run on pull requests and also on protected branches.

00:20:56.460 Now, let's talk about testing. The first step is the tests themselves: you want unit and integration tests that cover both states, with feature flags turned on and off. Then you want to test the changes locally, which we often do using Codespaces. We also have a special CI run in which all feature flags are enabled. This ensures that if multiple feature flags are in play, we catch potential conflicts before merging.
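A CI run with every flag enabled could be wired up in many ways; one hypothetical approach is an environment variable that makes every flag check return true. The ALL_FEATURE_FLAGS_ON variable and the FlagChecker class are invented for illustration and do not represent GitHub's tooling.

```ruby
# Hypothetical flag checker with a CI override. In a normal run, values come
# from the store; in the special CI job, ALL_FEATURE_FLAGS_ON=1 forces every
# flag on so that interactions between unreleased features surface before merge.
class FlagChecker
  def initialize(store, env = ENV)
    @store = store # e.g. { pet_profile: false }
    @env = env     # injectable so the override is easy to test
  end

  def enabled?(name)
    return true if @env["ALL_FEATURE_FLAGS_ON"] == "1"

    @store.fetch(name, false)
  end
end
```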

00:21:50.520 This CI run helps avoid complications: things can work fine during development, but once someone else deploys another feature, you might face issues that are hard to debug. We also enable feature flags for specific actors or repositories, which lets us test features in a review app.

00:22:29.880 A review app is a deployed version of the current application, connected to the production database, sitting between local development and production testing. When deploying, we also conduct canary testing, rolling out the feature to a small percentage of users to monitor with observability tools. Developers can check their features in production before a full rollout.

00:23:09.179 This leads us to the feature shipping process. First, we enable the feature flag for the developers working on it. After some time, we enable it for the entire team to encourage experimentation. Dogfooding, using your own product, helps with testing because we interact with the software every day.

00:24:06.450 We typically ship larger features to all our employees first, to collect data and feedback, including problems the feature may cause in other parts of the product. We also run private betas for a select group of customers who are interested in testing the functionality, and we can enable the flag just for them.

00:25:05.340 Dark shipping means rolling out a feature to a percentage of users or a percentage of requests. The final step is what we call General Availability (GA), where we make the feature available to everyone.
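The rollout stages above can be modeled as widening audiences, where each stage includes everyone from the earlier stages. This is a hypothetical sketch: the stage names, role symbols, and in_rollout? helper are assumptions, and real flag tooling would store this state rather than take it as arguments. Only percentage of users is modeled, not percentage of requests.

```ruby
require "zlib"

# Hypothetical stages, ordered from narrowest to widest audience.
STAGES = %i[developers team staff private_beta dark_ship ga].freeze

RolloutUser = Struct.new(:id, :roles)

# True if the user is in the audience of the given stage or any earlier one.
def in_rollout?(user, stage:, percent: 0)
  index = STAGES.index(stage)
  raise ArgumentError, "unknown stage: #{stage}" if index.nil?

  audiences = [
    -> { user.roles.include?(:feature_developer) },
    -> { user.roles.include?(:team) },
    -> { user.roles.include?(:staff) },
    -> { user.roles.include?(:beta_customer) },
    -> { Zlib.crc32(user.id.to_s) % 100 < percent }, # stable per-user bucket
    -> { true }                                      # general availability
  ]
  audiences.first(index + 1).any?(&:call)
end
```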

00:25:45.540 Throughout the shipping process, whether in a review app or during deployment, it is crucial to have observability tools in place. If something goes wrong, you want to see it quickly, and monitoring errors and job and request timings helps identify issues fast.

00:26:29.280 On January 9th, we shipped the first version of the default setup. This initial iteration was a simple take-it-or-leave-it situation, which was easy to set up for repositories with specific languages but did not allow for further configurability yet.

00:27:10.110 Next comes the cleanup phase. Many teams struggle to clean up feature flags, or neglect it altogether. A practical approach is to remove a feature flag as soon as you know everything is running smoothly, typically a couple of days to a couple of weeks after release.

00:28:03.720 Running your tests before cleaning up shows which tests need to be revised or removed. You can also prepare the pull request that removes the feature flag while the work is still fresh in your mind and the required changes are easy to find. In general, you want to remove the feature flag from three places: your code, your tests, and any tooling you use.
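Finding every remaining reference before opening the removal pull request can be partly automated. The following is a hypothetical helper, not GitHub's removal script: it walks the given directories (for example code, tests, and tooling config) and lists files that still mention the flag name as a whole word.

```ruby
require "find"

# Hypothetical helper: list files under the given roots that still mention
# the flag as a whole word (so :pet_profile does not match pet_profile_disabled).
def flag_references(flag, roots)
  pattern = /\b#{Regexp.escape(flag.to_s)}\b/
  roots.flat_map do |root|
    Find.find(root).select do |path|
      # binread avoids encoding errors on non-UTF-8 files.
      File.file?(path) && File.binread(path).match?(pattern)
    end
  end.sort
end
```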

00:28:47.990 During the preparation of this presentation, I found out that GitHub has a script that automates the process of preparing a pull request for removing feature flags. I’m excited to test that!

00:29:36.420 This was just the beginning for the default setup. We launched the feature with the ability to set up code scanning through the UI in a few clicks, but we had no intention of stopping there. We aimed to provide a selectable query suite, add a public API, and support more languages.

00:30:15.780 Initially, we started with the languages that had higher success rates, and we also wanted users to be able to select which languages to scan. Having started with setup at the repository level, we ultimately moved toward the organization level, and then enabled automatic detection of language changes.

00:31:03.180 One of the final functionalities we aimed to implement was allowing users to decide how often they want to conduct code scanning, and for each part, we utilized different feature flags. This allowed us to develop these parts in parallel without interference from other teams.

00:31:51.180 Another major topic is cross-team cooperation. Within a single team, things tend to go more smoothly: there is a shared understanding of the tasks, collaboration is easier, and communication is more efficient. But some work requires collaboration across teams, such as enabling the default setup at the organization level, which involved several teams.

00:32:45.300 We decided to use two separate feature flags, with each team accountable for its own part. Our aim was to create a UI and an API for organizations: the UI feature flag was independent, while the API flag depended on the UI flag, which let us test each part without delaying development.
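The dependency between the two flags can be sketched as a check that requires both: the API flag only takes effect when the UI flag is also on for that organization. The flag names and the plain hash used as a store are hypothetical.

```ruby
# Hypothetical store: flag name => organizations it is enabled for.
def org_ui_enabled?(flags, org)
  flags.fetch(:org_setup_ui, []).include?(org)
end

# The API flag depends on the UI flag: it never takes effect on its own.
def org_api_enabled?(flags, org)
  org_ui_enabled?(flags, org) && flags.fetch(:org_setup_api, []).include?(org)
end

flags = { org_setup_api: ["acme"] }
org_api_enabled?(flags, "acme") # => false, the UI flag is still off
flags[:org_setup_ui] = ["acme"]
org_api_enabled?(flags, "acme") # => true
```

Each team can turn its own flag on for test organizations independently, while enabling only the API flag has no visible effect.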

00:33:39.180 In the end, we could ship them together, which involved less risk. For code scanning, these changes have been going into production iteratively from January until now: we first shipped the basic version, and with every iteration we systematically added functionality.

00:34:38.520 In conclusion, one of the best aspects was that we could deploy feature flags in our monolith but also in our microservices. This enabled us to work effectively across teams without interference, employing different feature flags for different components which made shipping and testing more straightforward.

00:35:00.900 As we scaled up our use of feature flags at GitHub, they were no longer limited to the monolith. Our microservices also needed to check whether feature flags were enabled, which made managing them more complex. Initially, microservices had to ask the GitHub monolith whether a feature flag was enabled, which was cumbersome.

00:36:40.680 To simplify this, we created our own tooling that lets all microservices check feature flags. This added complexity of its own, since not all microservices are built in Ruby or Rails; they use various programming languages. Caching and preloading became necessary so that frequent feature flag checks don't hurt application performance.
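One common way to keep remote flag checks cheap is a small client-side cache with a time-to-live. This is a hypothetical Ruby sketch and does not represent GitHub's tooling: the class name, the TTL, and the block that stands in for the remote call are assumptions. The clock is injectable so the expiry behavior can be checked.

```ruby
# Hypothetical TTL cache around a remote flag lookup. The block passed to
# `new` stands in for the real call to a flag service.
class CachedFlagClient
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &fetch_remote)
    @ttl = ttl_seconds
    @clock = clock
    @fetch_remote = fetch_remote
    @cache = {} # flag name => [value, fetched_at]
  end

  def enabled?(name)
    value, fetched_at = @cache[name]
    now = @clock.call
    if fetched_at.nil? || now - fetched_at >= @ttl
      value = @fetch_remote.call(name) # only hit the service when stale
      @cache[name] = [value, now]
    end
    value
  end
end
```

The trade-off is staleness: a flag turned off centrally may keep its old value in a service for up to the TTL, so the TTL bounds how fast a kill switch takes effect.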

00:37:15.060 As we worked on feature flags, we also established a way to distinguish 'big' flags, which are enabled for many users, from smaller flags that only need to be checked for a specific group or percentage of users. This categorization helped manage performance and reduce the overhead of too many requests.

00:37:48.660 Lastly, while preparing this presentation, I noticed that the number of feature flags on a single page in our application can get out of hand. For example, one request view checked 95 feature flags, 58 of them enabled, which generated 357 calls taking 68 milliseconds for that one view. This shows how important it is to manage feature flags wisely to maintain application performance.
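Numbers like these are where batch preloading and per-request memoization pay off. In this hypothetical sketch, RequestFlags and FakeFlagBackend are invented names and fetch_many stands in for whatever bulk lookup a real flag backend offers: a view's flags are loaded with one backend call, and repeated checks within the request are answered from memory.

```ruby
# Hypothetical per-request flag registry: batch-load the flags a page needs,
# then answer every repeated check from memory for the rest of the request.
class RequestFlags
  attr_reader :backend_calls

  # backend must respond to #fetch_many(names) and return { name => true/false }.
  def initialize(backend)
    @backend = backend
    @values = {}
    @backend_calls = 0
  end

  def preload(names)
    missing = names.uniq - @values.keys
    return if missing.empty?

    @backend_calls += 1
    fetched = @backend.fetch_many(missing)
    # Record a value for every requested name so unknown flags are not refetched.
    missing.each { |name| @values[name] = fetched.fetch(name, false) }
  end

  def enabled?(name)
    preload([name]) unless @values.key?(name)
    @values.fetch(name)
  end
end

# Hypothetical backend for the example: flags named "on_..." are enabled.
class FakeFlagBackend
  def fetch_many(names)
    names.to_h { |name| [name, name.to_s.start_with?("on_")] }
  end
end
```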

00:38:32.940 Every company is unique, and it’s important to assess what works best for yours. I hope you gained some insights on using feature flags that you can apply effectively in your environment. I’d like to wish you all happy feature flagging!

00:39:11.900 Before I finish, I want to mention an upcoming conference in Switzerland that several of us are organizing this year. It takes place in November, so it will be a fun event before Christmas.

00:39:40.300 I also kindly ask you to take out your phones and scan this QR code. There are three or four short questions, and I would love to hear your feedback.

00:40:15.240 Thank you once again for your attention! Let’s stay in touch, and you can find me on LinkedIn or Twitter.

EuRuKo 2023