Testing in Production: There is a Better Way

00:00:00 Today's talk is about testing in production, which often means not testing at all and just hoping the code works once it reaches production. However, that's not what we're discussing today.

00:00:05 Our next speaker, Igor Kapkov, will share the Branch by Abstraction pattern and the tools that support it, to give you confidence when shipping code.

00:00:12 Igor recently worked at Zepto, where he was named a finalist for the Finis 2022 Emerging Fintech Leader of the Year. Igor is the founder of Refactorly, a developer tool that helps teams move faster with confidence. He is an experienced CTO who has helped governments, fintech and e-commerce companies, and social networks scale. He is also a former Homebrew core maintainer and an active open source contributor.

00:00:35 Igor mentioned that he is transitioning from pairing with developers to pairing with customers by testing in production. Before we proceed, I would like to thank our speaker sponsors, especially Invisible Light, for supporting this conference and the speakers.

00:01:05 Now, let's welcome Igor Kapkov to discuss Testing in Production.

00:01:10 Hello everyone! I hope you're having just as good a time at the conference as I am. It's great to be back in front of an audience and I want to extend a huge thank you to the organizers, volunteers, MCs, and the AV team in the back.

00:01:27 My name is Igor, and I am the founder of Visible Light. I have been developing software for about 15 years and love sharing what I know. The slides for my talk are already available on my website if you wish to follow along. I'm not very active on Mastodon, but you can find me there too. I'm happy to take questions at the end of my talk; feel free to catch me afterwards or reach out to me on Twitter, as I always enjoy discussing this topic further.

00:01:55 Today, I will be discussing something near and dear to my heart: refactoring code with confidence. The concept I will present is quite simple, but to understand why and how we get there, we first need to understand the problems it solves and the benefits it brings.

00:02:32 Let's start with a bit of history: the story of the Basilica de la Sagrada Familia. It is one of the longest-running construction projects in the world. It began in 1882 under the architect Francisco de Paula del Villar, but within a year he resigned, and Antoni Gaudí took over.

00:02:55 Gaudí didn't immediately become the director of the project; he was appointed a year later. More than 40 years later, when Gaudí died in 1926, the project was still only about 25% complete.

00:03:12 In 1936, ten years after Gaudí's death, the Spanish Civil War broke out. A fire in the crypt damaged Gaudí's workshop and destroyed many of his original plans for the Sagrada Familia.

00:03:32 In 2017, the current team stated their intention to finish the project by 2026, coinciding with the 100-year anniversary of Gaudí's death, but the COVID-19 pandemic disrupted those plans and, as of now, there is no new completion date.

00:03:44 Over time, as technology has progressed, the project has changed too. Some original materials, such as sandstone, came from specific sources that are no longer accessible, which has led to visible color differences in the stone used in the construction.

00:04:01 This situation is quite familiar in our industry as well. We often see CTOs or architects join a project and leave within a year, whether for a better offer, out of frustration with management, or for other reasons. The team then continues without proper leadership.

00:04:30 As time passes, one of our peers may be promoted to lead the team and try to pursue long-term goals based on their previous experience. However, new features, customers, and unforeseen issues always disrupt long-term planning.

00:05:00 When management approaches us to implement something new, it often doesn’t align with our current architecture, or we may face scaling issues due to new investments or expansions into different regions. Making changes to the live environment that could affect many users requires a high degree of confidence.

00:05:23 Changing code may seem easy, but doing so with confidence and understanding the potential implications is the difficult part. This raises the question: why are developers often afraid to refactor?

00:05:50 Many startups are built in a rush, and once they acquire customers, they realize that customers hate having their workflows disrupted. I recall a time when a team I worked with switched JSON serializers. After the change, customers began to complain about missing keys and unexpected behavior.

00:06:06 Developers hesitate to refactor for two main reasons. First, they are unsure how certain parts of the software work, especially if they are new to the team and have not worked on that code before. Second, the code in question is often fragile and mission-critical, and no one wants to risk breaking it.

00:06:37 To increase our confidence when refactoring, we rely on tests. But how many of us here can truly say we have 100% confidence in our test suite? Please raise your hand if you do.

00:07:02 It seems like there’s only one hand raised. Even those who boast 100% code coverage may still lack confidence in their tests. It’s common for our test suites to grow organically over time as we continuously develop features and fix bugs.

00:07:22 One significant issue we face is the quality of the data we use. The tests we write are often synthetic and not truly reflective of the real-world scenarios they are supposed to represent. How often do your tests still use historical data from five years ago? These tests may miss validations and edge cases that have since been forgotten.

00:07:56 Furthermore, getting test coverage that truly represents real usage is hard. We may write tests for newly introduced cases, but without realistic data, those tests can become obsolete.

00:08:15 Moreover, running tests takes time, which hurts developer productivity and costs the business money. Even with companies sponsoring this conference, the ongoing costs of running continuous integration and delivery systems can add up.

00:08:46 Data-driven decision-making matters too. Our tests need to cover enough real use cases, including realistic input data, to show that the software behaves correctly under varied conditions.

00:09:08 To verify that our new code is actually better, or at least works correctly, we turn to observability metrics and newer DevOps practices. However, many common observability practices have shortcomings.

00:09:29 Many companies resort to testing in production. Rather than going through a traditional QA team, changes are released straight to production; we monitor metrics and exceptions closely and look for immediate feedback from our users.

00:09:56 The three primary practices people use are feature flags, canary deployments, and A/B testing. It's perplexing that the industry has reached a point where releasing untested code to a subset of users is considered acceptable.

00:10:09 These methods typically monitor performance over an extended period, which means we rely on metrics that numerous external factors can influence. I once watched my team celebrate what we thought was a significant performance improvement, only to discover later that a problematic customer had simply stopped using our product.

00:10:31 Although Agile methodologies emphasize iteration, many teams still cling to the mindset that major rewrites should happen all at once. Instead of hoping for a perfect flip of a switch in three years, I advocate a methodical approach that delivers benefits gradually.

00:11:00 The scientific method, which has stood the test of time, consists of asking a question, forming a hypothesis, testing it, and analyzing the results. It fits software development well, because we constantly form hypotheses about how our code will behave.

00:11:29 For example, when rewriting a feature that involves a data migration, you might find yourself double-writing to the old and new systems and dealing with the inconsistencies that arise. GitHub's Scientist library is designed to support this approach using the Branch by Abstraction pattern.
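Branch by Abstraction can be sketched in a few lines: callers depend on an abstraction, the legacy and the new implementation both sit behind it, and a single switch point decides which one is used. This is a minimal Python illustration, not code from the talk; the 'Search', 'LegacySearch', and 'NewSearch' names and the search behavior are hypothetical.

```python
from typing import List, Protocol


class Search(Protocol):
    """The abstraction every caller depends on."""

    def find(self, query: str) -> List[str]: ...


class LegacySearch:
    """Existing implementation: case-sensitive substring match."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def find(self, query: str) -> List[str]:
        return [d for d in self.docs if query in d]


class NewSearch:
    """Hypothetical replacement: case-insensitive substring match."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def find(self, query: str) -> List[str]:
        q = query.lower()
        return [d for d in self.docs if q in d.lower()]


def make_search(docs: List[str], use_new: bool) -> Search:
    """The single switch point between the two branches."""
    return NewSearch(docs) if use_new else LegacySearch(docs)


def handle_request(search: Search, query: str) -> List[str]:
    # Caller code is written once, against the abstraction only.
    return search.find(query)


if __name__ == "__main__":
    docs = ["Apple pie", "banana bread"]
    print(handle_request(make_search(docs, use_new=False), "apple"))
    print(handle_request(make_search(docs, use_new=True), "apple"))
```

Here the two branches disagree on the same input ("apple" matches nothing in the legacy branch but "Apple pie" in the new one), which is exactly the kind of difference you want to discover before flipping the switch.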

00:11:47 The Scientist library is not a replacement for writing tests. Instead, it is a tool that lets you experiment with new code in production without disrupting existing functionality. It runs both the existing and the new code for the same request, allowing side-by-side comparisons of results and performance on real traffic.

00:12:24 Let's consider a short code example to illustrate how this works. We define an experiment with its name and metadata, and an experiment object is passed into a block. Inside this block, we call two methods: 'use' for our existing code and 'try' for the new implementation.

00:12:55 The 'use' block executes the existing code, which we know works, while the 'try' block runs the new code. Our goal is to measure the performance improvements or other benefits the new implementation might provide.

00:13:13 Both blocks run when a user makes a request, in random order, and the user always receives the result of the 'use' block. If the new code fails, its error is swallowed (but recorded), so users remain unaware of any issues, and we can compare both implementations under the same load conditions.
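To make the 'use'/'try' flow concrete, here is a minimal Python sketch of what an experiment runner does, modeled on the behavior described above: run both behaviors in random order, time them, return only the control's result, and swallow but record any error from the candidate. It is not Scientist's code or API (Scientist itself is a Ruby library); the 'Experiment' class, its 'try_' method, and the rule that any error counts as a mismatch are simplifying assumptions.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Observation:
    """What happened when one behavior ran: its value or error, and timing."""
    label: str
    value: Any = None
    error: Optional[Exception] = None
    seconds: float = 0.0


@dataclass
class Experiment:
    name: str
    control: Optional[Callable[[], Any]] = None
    candidate: Optional[Callable[[], Any]] = None
    mismatches: List[Tuple[Observation, Observation]] = field(default_factory=list)

    def use(self, fn: Callable[[], Any]) -> None:
        self.control = fn  # existing, trusted code path

    def try_(self, fn: Callable[[], Any]) -> None:  # `try` is a Python keyword
        self.candidate = fn  # new code path under test

    @staticmethod
    def _observe(label: str, fn: Callable[[], Any]) -> Observation:
        start = time.perf_counter()
        try:
            value = fn()
        except Exception as exc:
            return Observation(label, error=exc, seconds=time.perf_counter() - start)
        return Observation(label, value=value, seconds=time.perf_counter() - start)

    def run(self) -> Any:
        if self.control is None or self.candidate is None:
            raise ValueError("both use() and try_() must be set")
        behaviors = [("control", self.control), ("candidate", self.candidate)]
        random.shuffle(behaviors)  # random order, so neither path always runs first
        observed = {label: self._observe(label, fn) for label, fn in behaviors}
        control, candidate = observed["control"], observed["candidate"]
        # Simplification: any error, or differing values, counts as a mismatch.
        if control.error or candidate.error or control.value != candidate.value:
            self.mismatches.append((control, candidate))  # recorded, never shown
        if control.error is not None:
            raise control.error  # the user sees exactly what the old code did
        return control.value


if __name__ == "__main__":
    exp = Experiment("sort-rewrite")
    exp.use(lambda: sorted([3, 1, 2]))
    exp.try_(lambda: 1 / 0)  # a deliberately broken candidate
    print(exp.run())  # the control's value; the ZeroDivisionError is only recorded
    print(len(exp.mismatches))
```

In practice the recorded mismatches and timings would be published somewhere for analysis rather than kept in a list on the object.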

00:13:43 As a result, this method gives an accurate performance comparison without the variance seen in staging environments. The library also lets you attach context metadata to the results for more granular insights.
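Attaching context pays off at analysis time, because mismatches and slowdowns can then be grouped by user, plan, or region. The sketch below shows one hypothetical shape for a published result: a JSON line carrying the experiment name, the match flag, both durations, and a free-form context dictionary. The record fields, the 'duration_ratio' name, and the list used as a sink are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ExperimentResult:
    experiment: str
    matched: bool
    control_ms: float
    candidate_ms: float
    context: Dict[str, Any] = field(default_factory=dict)  # e.g. user ID, plan, region


def publish(result: ExperimentResult, sink: List[str]) -> str:
    """Serialize one result as a JSON line and append it to `sink`.

    `sink` is a plain list so the sketch stays self-contained; a real setup
    might send the line to a log pipeline or metrics system instead.
    """
    record = asdict(result)
    # A ratio above 1.0 means the candidate was slower than the control.
    record["duration_ratio"] = (
        round(result.candidate_ms / result.control_ms, 2) if result.control_ms else None
    )
    line = json.dumps(record, sort_keys=True)
    sink.append(line)
    return line


if __name__ == "__main__":
    lines: List[str] = []
    print(publish(ExperimentResult("search", False, 12.0, 9.0, {"user_id": 42, "plan": "pro"}), lines))
```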

00:14:00 One critical aspect is that users should not experience any variations in behavior as we conduct these tests. The insights gained are invaluable, as they help us discover use cases we might not have considered before.

00:14:35 I encourage everyone to explore the Scientist library further to discover its capabilities, including conditional runs, rollouts, and comparison logic.
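Custom comparison logic is often where the real insight comes from. As an illustration tied to the JSON serializer story, this hypothetical comparator treats two serialized payloads as equal when they decode to the same data (ignoring key order and whitespace) and otherwise reports which top-level keys went missing, appeared, or changed. 'should_run' stands in for a conditional-run rule; limiting the experiment to read-only requests is an assumed policy. In Ruby Scientist the equivalent decisions go in blocks passed to 'compare' and 'run_if'; the Python functions here are not that API.

```python
import json
from typing import Any, Dict, List, Tuple


def compare_payloads(control: str, candidate: str) -> Tuple[bool, Dict[str, List[str]]]:
    """Compare two JSON documents by decoded value, not by raw text.

    Key order and whitespace are ignored. On a mismatch between two JSON
    objects, report which top-level keys are missing, extra, or changed.
    """
    old: Any = json.loads(control)
    new: Any = json.loads(candidate)
    if old == new:
        return True, {}
    if not (isinstance(old, dict) and isinstance(new, dict)):
        return False, {}  # not both objects: just report the mismatch
    return False, {
        "missing": sorted(set(old) - set(new)),
        "extra": sorted(set(new) - set(old)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


def should_run(request: Dict[str, Any]) -> bool:
    """Hypothetical conditional-run rule: only experiment on read-only requests."""
    return request.get("method") == "GET"


if __name__ == "__main__":
    print(compare_payloads('{"a": 1, "b": 2}', '{"b": 2, "a": 1}'))
    print(compare_payloads('{"id": 1, "email": "a@b.c"}', '{"id": 1}'))
```

A byte-for-byte comparison would flag harmless key reordering as a failure, while this version flags only differences a client could actually notice, such as a missing key.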

00:14:52 Now, let’s step back and consider some companies using the Scientist library. It was originally developed at GitHub and has been utilized in several major rewrites, allowing teams to benefit continuously while improving their codebase.

00:15:30 For instance, one team at GitHub completely rebuilt a crucial part of the application over three years, reaping benefits every day along the way. Another example is Trello, which maintains the most popular JavaScript port of the Scientist library.

00:16:10 The JSON serializer switch I mentioned earlier was a lesson we learned only after customers ran into issues. The Scientist library has also helped my team switch from Postgres to Elasticsearch and keep improving our approach.

00:16:30 I believe that for consultants entering a project with limited time, employing Scientist can provide immediate value through performance comparisons and safe experimentation. It enables newcomers to the project to learn and adapt without compromising existing functionality.

00:17:15 Before concluding, let’s address some myths surrounding testing in production. Some believe it can be done effectively in staging or sandbox environments, but these do not reflect real user traffic or loads, making accurate comparisons challenging.

00:17:45 Another misconception is that static typing and compiled languages save enough time and reduce enough risk to make this unnecessary. However, they still can't guarantee that production data is accurately represented during testing.

00:18:07 It's also worth noting that many teams keep using the Scientist library even under high load. Remember, you can combine it with feature flags and canary rollouts to minimize the impact when deploying new changes.
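Combining an experiment with a rollout usually means running the candidate for only a fraction of traffic. One common way to do that, sketched here under the assumption that each request carries a stable user ID, is to hash the ID into a bucket from 0 to 99, so the same user always gets the same decision and raising the percentage only adds users. The function names and the default salt are hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, 100) using a salted SHA-256 hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def experiment_enabled(user_id: str, percent: int, salt: str = "search-rewrite") -> bool:
    """True if this user's requests should also run the candidate code path.

    The same user always gets the same answer for a given salt, and raising
    `percent` only adds users; nobody who was enrolled drops out.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bucket(user_id, salt) < percent


if __name__ == "__main__":
    enrolled = sum(experiment_enabled(str(i), 10) for i in range(10_000))
    print(f"{enrolled} of 10000 users enrolled at 10%")
```

Using a different salt per experiment keeps the same users from always landing in every experiment's first few percent.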

00:18:31 I’ve also encountered the myth that Scientist is only useful for refactoring. In reality, teams can utilize it to quickly test and compare new features, choosing the one that performs better based on defined criteria.

00:19:00 Side effects are a real challenge: if the new code writes to a database or sends emails, running it alongside the old code can duplicate those effects. However, there are techniques that can help you identify observable changes caused by these side effects.
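One such technique, sketched here as an assumption rather than as the speaker's specific method, is to hand the candidate a recording stand-in instead of the real side-effecting dependency and then compare the effects each path would have produced. Only the control's effects actually happen. The 'Mailer' interface, both notification functions, and the deliberate bug in the rewrite are hypothetical, and the approach only works when the side effect goes through a dependency you can inject.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Tuple


class Mailer(Protocol):
    def send(self, to: str, subject: str) -> None: ...


@dataclass
class RecordingMailer:
    """Records every email it is asked to send, optionally forwarding it."""
    forward_to: Optional[Mailer] = None
    sent: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))
        if self.forward_to is not None:
            self.forward_to.send(to, subject)  # only the control path forwards


def notify_legacy(mailer: Mailer, users: List[Dict]) -> None:
    for user in users:
        if user["active"]:
            mailer.send(user["email"], "Your invoice")


def notify_new(mailer: Mailer, users: List[Dict]) -> None:
    # Hypothetical rewrite with a bug: it forgets the "active" check.
    for user in users:
        mailer.send(user["email"], "Your invoice")


def compare_side_effects(
    users: List[Dict], real_mailer: Mailer
) -> Tuple[bool, List[Tuple[str, str]], List[Tuple[str, str]]]:
    control = RecordingMailer(forward_to=real_mailer)  # real emails go out
    candidate = RecordingMailer()  # captured only, never delivered
    notify_legacy(control, users)
    notify_new(candidate, users)
    return control.sent == candidate.sent, control.sent, candidate.sent
```

The comparison reveals that the rewrite would have emailed an inactive user, without that email ever being sent.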

00:19:30 I've been using the Scientist library since 2016 and was so impressed that I built a company around it. It's not often that something I read fundamentally changes how I approach my work.

00:20:00 The name of my company is Visible Light, and we offer services built on top of the Scientist library. If your organization values quality, consider reaching out to us for assistance or collaboration.

00:20:39 In conclusion, this is an image of the Sagrada Familia from 1905, to illustrate that with good engineering principles, even the most ambitious construction projects can be successfully completed over time.

00:20:56 Thank you all for attending my talk. The slides are available on my website, and I’ll share them on Twitter as well. I’m happy to field any questions you may have.

00:21:24 Are there any questions about the Scientist library, testing in production, or Visible Light?

00:21:37 Great, I see a hand raised. Please introduce yourself.

00:22:00 The question is whether it's possible to execute the experiment asynchronously. It depends on the SDK you're using: the original Ruby version of the Scientist library doesn't support that directly because of the complexity involved.

00:22:46 However, there are wrappers and ports in other languages that support concurrent execution.
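Where an implementation does support it, running the behaviors concurrently looks roughly like the Python sketch below, built on the standard 'concurrent.futures' module. Both behaviors are submitted at once, the caller still receives only the control's result, and a candidate failure or timeout is recorded rather than raised. This is a hypothetical illustration, not any particular port's API. One trade-off to note: the request waits for the candidate (up to 'timeout' after the control finishes), and if the control raises, the candidate keeps running in the background.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


def run_concurrently(
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    executor: ThreadPoolExecutor,
    publish: Callable[[Dict[str, Any]], None],
    timeout: Optional[float] = 1.0,
) -> Any:
    """Run both behaviors in parallel and return only the control's result."""
    control_future = executor.submit(control)
    candidate_future = executor.submit(candidate)
    # Control errors propagate to the caller exactly as without the experiment.
    control_value = control_future.result()
    try:
        # Bound how long the request waits for a slow candidate.
        candidate_value = candidate_future.result(timeout=timeout)
        report = {"matched": candidate_value == control_value, "error": None}
    except Exception as exc:  # candidate failure or timeout: record, don't raise
        report = {"matched": False, "error": exc}
    publish(report)
    return control_value


if __name__ == "__main__":
    reports = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(run_concurrently(lambda: 2 + 2, lambda: 4, pool, reports.append))
    print(reports)
```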

00:23:00 The next question is whether you can test multiple blocks: yes, you can, and the performance of each of those blocks is compared against the existing code.
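Comparing several candidates against one control is a small extension of the same idea. In this hypothetical Python sketch, each named candidate runs after the control, and the report records for each one whether its result matched, any error it raised, and its duration relative to the control. The caller still gets only the control's value. Ruby Scientist lets you name each 'try' block for this purpose; the function below is not that API, and unlike Scientist it runs the behaviors in a fixed order for simplicity.

```python
import time
from typing import Any, Callable, Dict, Tuple


def compare_candidates(
    control: Callable[[], Any],
    candidates: Dict[str, Callable[[], Any]],
) -> Tuple[Any, Dict[str, Dict[str, Any]]]:
    """Run the control, then each named candidate, and compare every result.

    Returns the control's value (what the user gets) plus a per-candidate report.
    """
    start = time.perf_counter()
    expected = control()  # control errors propagate as usual
    control_seconds = time.perf_counter() - start

    report: Dict[str, Dict[str, Any]] = {}
    for name, fn in candidates.items():
        start = time.perf_counter()
        try:
            matched, error = fn() == expected, None
        except Exception as exc:  # a failing candidate never reaches the user
            matched, error = False, exc
        seconds = time.perf_counter() - start
        report[name] = {
            "matched": matched,
            "error": error,
            "seconds": seconds,
            "vs_control": seconds / control_seconds if control_seconds else None,
        }
    return expected, report


if __name__ == "__main__":
    value, report = compare_candidates(
        lambda: sorted([3, 1, 2]),
        {"v2": lambda: [1, 2, 3], "v3": lambda: [3, 2, 1]},
    )
    print(value, {name: r["matched"] for name, r in report.items()})
```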

00:23:15 Are there any additional questions? I'm here to discuss anything further.

00:23:36 If there are no further questions, thank you all once again for your time.