Paris.rb Conf 2020

The First Feedback We Get

Paris.rb Conf 2020

00:00:14.130 Hello everyone, I'm Nicolas. I've been a developer for a decade now, and during that time I have made a lot of mistakes. Today, I want to share one of them with you. Preparing for this talk led me to what I call the first feedback we get. This is why this talk is structured the way it is.
00:00:21.580 I will start by telling you more about that mistake. After that, we'll explore a couple of delivery processes with the goal of finding ways to avoid similar mistakes in the future. We will also see how the story ended.
00:00:32.440 I worked for a car-sharing marketplace called Getaround. One of the main features of a marketplace is to process payments and move money from one customer to another. We used Stripe to handle thousands of payments today, but at some point last year, or maybe the year before, we learned about a new regulation that would impact all those payments.
00:00:58.270 This is the regulation in Brittany that had been altered, which required us to migrate from one payment system to a completely different one. Before this regulation, we would ask our provider to charge customers, and we would immediately get an outcome: the transaction would either be a success or a failure.
00:01:14.950 After the regulation, Stripe would require an extra authentication step to be performed by the customer. This meant transactions could remain in a suspended state, and we needed to allow our customers to interact with their bank to resume the transaction and reach a final state.
00:01:53.020 None of our abstractions around payments or customer interactions supported reasonable transaction handling, which made this project quite challenging for me. I usually don't work with mobile applications, which made many of the constraints new to me. For example, maintaining backward compatibility for all clients was a significant challenge.
00:02:09.520 At the time, the company had been acquired for a few months, and stakeholders were busy with strategic integrations and rebranding from Dr. E to Getaround. Additionally, the deadlines set by the regulator were approaching rapidly, while the API from Stripe was still not fixed.
00:02:16.569 All of this was stressful, but it wasn't just about the context. I wasn't sure I could successfully manage this situation. Even if I thought I could, handling payments is very sensitive for a business, and since I frequently make mistakes, I was afraid of breaking something. Unfortunately, that is exactly what happened.
00:03:04.770 During a refactoring process, we used the Stripe API with the 'authorize' and 'purchase' methods to create a charge and passed the capture option as false. We reserved the funds for later instead of taking the money from the customers' bank accounts. The confusion arose because when the capture option is set to true, we take the money immediately.
00:03:39.660 This led us to believe that we were taking the money, but in reality, we were not. As Daniel mentioned earlier, there is always a chance for mistakes, and in this case, it was very likely. I was optimistic at the time, but I was also aware that something could go wrong. Let's consider how bad this could actually get.
00:04:21.660 Our service could end up being free for all users, which sounds great, but we would be paying other users with our own money. The car owners who enlisted their cars for sharing would need to be compensated, and if we didn't have funds, we would run into serious trouble.
00:05:04.789 You would run out of cash, unable to pay employees or partners, which could lead to lawsuits against you. Given that our directors are responsible, there could even be legal consequences. Let me be clear—I don't want that on my resume. I hope I have convinced you that I was scared.
00:05:22.359 Managing stress and reducing risks can be achieved by relying on a robust delivery process. In any company, the aim is to solve problems. In our case, we needed to comply with the new regulation.
00:05:44.960 We brainstormed solutions, negotiated, and estimated the work involved. At some point, the stakeholders aligned, and we agreed on a shared understanding of what we needed to do and what we would offer our users. My role was to communicate this understanding to our users.
00:06:16.099 To facilitate communication, I followed a series of steps that I define as the delivery process. The simplest process looks like this: I edit the code locally, I help deploy it to a production server, and voilà—it is available to all our users. Many of us have operated this way in the past, and some of you might still do so.
00:06:42.720 This workflow can be effective in some contexts, such as a personal website, but for a team of our size and for a marketplace, it looks dangerous. This chart represents my stress levels during the process of moving understanding from my mind to production.
00:07:15.330 As I mentioned, I wasn't sure I could even complete this task. When the card is ready, I usually feel a bit better because the unknowns decrease, but I still experience significant stress when I send the card to the server. I have drawn a spike around the release because I often don’t know whether the changes will work or if they might break something.
00:07:45.330 No one checks what I do, which places a lot of pressure on my shoulders. Over a certain limit, stress can be painful. I can work longer hours, skip meals, lose sleep, and become rude, and there are other symptoms too.
00:08:10.600 I think the process we just went over would lead me into a stressful situation. If I am subjected to high stress for too long, my options become limited: I can either escape, fight, or give up.
00:08:24.160 Escaping might involve asking to be reassigned to another task or simply leaving the company. Fighting would require more resources like asking for additional time or help from others. It could also mean working on improving the process with my team.
00:09:03.740 Giving up, on the other hand, looks like staying silent even if I know the project is set up to fail. In the past, I have exhibited all of these responses, and I believe many of you have too. We cannot escape stress, but we must learn how to respond to it.
00:09:30.540 The sooner I become aware of my own stress levels, the more control I have over my responses. Enjoying stress is crucial; the more I embrace stress, the less energy I have to fight back.
00:09:56.280 Instead of fleeing or giving up, I can channel my energy into combating the process that feels overly stressful. It's essential to introduce changes like staging environments, which parallel our production environment.
00:10:25.350 Before deploying new code to users, we should always release it in a safe environment where stakeholders can test it. At this point, the release peak of stress almost disappears. I am less worried because I have seen that the code works well in staging.
00:10:45.170 Additionally, this manual QA process shares responsibility between the stakeholders who test the code and myself as the coder. Sharing responsibility means less stress for everyone involved.
00:11:09.060 However, this new step is to ensure that the understanding we reach among stakeholders does not lead to regression. Unfortunately, the bugs aren’t always easy to spot during QA, particularly for transactions that appear successfully processed.
00:11:22.150 I was anxious about regression in the process but not about misunderstandings. I expected a healthy profit process to lower my stress to manageable levels, allowing me to release with confidence.
00:11:52.440 With that in mind, we now have three key areas to address—I handily labeled them on the chart: stress at the top, comfort at the bottom, and growth in the middle. If I stay in the comfort zone for too long, I won’t learn or grow much; I will become complacent.
00:12:20.310 In the growth area, I feel challenged and motivated. Thus, we want to find a balance between stress, comfort, and growth. My target is 60% growth, 30% comfort, and 10% stress.
00:12:41.590 I recognize that my teams might not aim for the exact same ratios as I do. With that delivery process in place, things improved, but I did not reach the comfortable area where I could release without worry.
00:13:06.380 Some steps can reduce stress and risk more than others. In the next delivery process, let's consider adding a more effective step. In my experience, having a good automated testing suite is the best way to prevent regression.
00:13:31.590 With a well-designed test suite that covers all use cases, we can confidently release only when the tests pass. Raise your hand if you only release when the tests pass and never experience bugs in production.
00:14:02.100 Alright, so tests are not entirely foolproof. If you look at my stress levels, you’ll see they start lower because just the idea of having a test suite relieves some of the stress. I trust the test suite to catch regressions.
00:14:36.340 When I saw that the test suite returned green, I felt relieved. That confidence allowed me to proceed without hesitation. However, that is how that bug ended up in production despite a green test.
00:15:04.740 Lower test impact on me now shows how my confidence threshold moved down. As my comfort area expands, the boundaries of high-pressure situations increase and adapt based on experience and mistakes.
00:15:31.950 Through small adjustments and mindfulness, I learn faster and adapt my development process without facing the same level of stress.
00:16:00.910 Now, what if the company had gone down? Would I still be speaking here? I hope so, but I cannot be certain. What I do know is that we managed to recover, and here’s how.
00:16:28.200 Remember, the delivery process continues beyond code deployment. Once the code is live, I still ask the same question: what could go wrong now? Is everything functioning as expected? Will a potential break occur?
00:16:45.900 The implications of failure include losing customers, revenue, data, or even my job. With this knowledge, I gather feedback early to mitigate stress. Not knowing if something has gone wrong slowly raises my stress levels.
00:17:13.720 We can minimize uncertainty and limit stress by implementing proven strategies. We can reduce the impact of each release, automate feedback loops, and build infrastructures that are more observable.
00:17:38.690 Here’s a more detailed version of my release process: code reviews can catch bugs when implemented correctly. However, my pull request was too large, and the comments were overlooked. Following best practices is essential, and speeding up the process led me to overlook crucial checks.
00:18:04.790 We also implemented gradual rollouts to limit the impact of new features by making them available to a portion of our users. Because risks are reduced, stress levels do not escalate as significantly.
00:18:39.900 With continual monitoring, I can cycle through rolling out updates, assessing responses, and gradually increasing availability. For instance, I rolled out a feature to 5% of our users and closely monitored their interactions.
00:19:04.940 Despite my efforts, the bug ended up affecting the remaining 95%. I was focused on what I believed could go wrong but failed to account for the overall project integrity.
00:19:28.100 In a couple of days, I received a crash report from Stripe indicating that I could not refund money that hadn't been taken. Although you can partially refund on Stripe, this message was misleading.
00:19:58.200 Thanks to the robust dashboard of Stripe, I spotted the issue quickly. Within a few hours, I was able to deploy a fix by adding the necessary tests. While the impact on customers was minimal, I bore the brunt of the stress.
00:20:25.230 Any mistakes push me to challenge the delivery process continually. It encourages a cultural evolution to adjust my judgment and decision-making. Improving existing steps, such as adding metrics to monitoring tools or refining tests, may not seem exciting.
00:20:50.010 Nonetheless, small improvements can compound over time, leading to a more robust process. This ultimately leads to significant differences that exceed the superficial impact of good judgment or the latest tools.
00:21:12.180 So, to recap, my story of how I made a silly mistake, likely by copying and pasting code, reflects how I blindly trusted the test suite. That consistently blinded me throughout the process.
00:21:37.560 I hope you can see that a good delivery process can both save you from potential mistakes and offer opportunities for you to learn rather than detrimentally impact the company.
00:21:57.190 The delivery process serves as a safety net, fostering growth and transforming bugs into opportunities for learning and development.
00:22:16.000 Ultimately, it is the people—our teams—who transform those opportunities into meaningful improvements. Stress and mistakes often drive our growth, but stress is deeply personal.
00:22:35.580 No two people experience the same level of pressure or confidence, nor do they utilize tools the same way. Yet, this uniqueness should be viewed as a strength.
00:22:54.590 In conclusion, stress is the first feedback we receive, and it is essential to share and leverage it within the team. By embracing stress, your team can excel.
00:23:36.460 I also want to thank my team. Creating a constructive environment is not an easy task, and since I have made a lot of mistakes, it’s a place where I can learn a great deal.
00:23:50.780 Thank you to Michèle and I get for their help during the preparation of this talk. Obviously, I'd like to extend my gratitude to the rest of the organizing team of the conference.
00:24:06.000 If you have any feedback or reactions you'd like to share, please feel free to find me after this session or connect with me online.
00:24:25.520 Also, if you have questions, perhaps it's best to discuss them with your team since I may not be the best person to answer.