Scaling Teams using Tests for Productivity and Education

00:00:00.030 Hi, my name is Julianna Dell, and I'm a production engineer at Shopify. You might have seen my colleague Edward in the last presentation; he and I work on the same team. You can find me here under the same username on both GitHub and Twitter.

00:00:05.460 I'm going to be talking about avoiding cognitive overload using tests, error messages, and other techniques to increase productivity and educate developers in your organization. So, a bit of background first.

00:00:20.279 Shopify is a multinational commerce company with over 600,000 merchants. We have one of the largest Ruby applications in the world, with over 12 years of commits from thousands of developers. Shopify started in Canada, specifically in Ottawa, where our co-founders wanted to sell snowboards. However, they found that no commerce offering really provided the online experience they envisioned, so they built the software themselves. It turned out that people wanted to buy the software more than they wanted to buy the snowboards. To build it, they chose a technology that was new at the time called Ruby on Rails, making Shopify one of the first Rails applications in the world; we started on Rails 0.18 and now run on 5.2. We have offices across the world, including one here in Tokyo, Japan.

00:01:03.300 Within Shopify, I work on a specific team called Developer Productivity. My job is to ensure developers are fast, can work efficiently, are aware of what's going on, and have everything out of their way so they can focus on developing. One of the things I've found is that patterns and bad error messages can significantly impact the experience for developers. I decided to talk about this because I think it is a very important topic.

00:01:39.960 I'm going to start by talking about mistakes. You know, everyone makes mistakes—you do, I do; it doesn't matter who you are or what level of developer you are. And I want to discuss the impact those mistakes can have on developers, specifically regarding cognitive overload. I will also talk about documented mistakes, such as a scenario where you might want to ban specific gems in your organization or certain non-idiomatic patterns.

00:02:03.710 I'll cover the concept of just-in-time education and messaging, explaining how you can write error messages that are actually helpful. Additionally, I'll mention some techniques and bundler plugins. To start this presentation, I assume your organization has a culture where continuous integration is essential. This means, on a pull request, you can't merge without the tests passing. If this isn't true, you may not get as much benefit from this talk, but you'll still learn something.

00:02:54.709 So, who thinks a user might make a mistake and forget a require statement? This may seem obvious, but if you've ever worked with autoload paths in Rails, you'll know that require statements are sometimes necessary and sometimes not, which leads to confusion. A missing require can have a significant impact if it goes into production. Users may also make mistakes such as adding a gem to the codebase when a similar one is already there. For example, adding the Faraday gem when you already have HTTParty is a mistake, because you don't want two different things doing the same job.

00:03:21.500 Another common mistake occurs when a user forgets to include a helper. Their code might run fine, but they could end up with a different API they can't introspect properly. These mistakes can happen frequently, but rather than focusing solely on the mistakes themselves, I want to highlight the impact these mistakes have on developers. The impact isn't solely that they produce errors; how we react to these errors can also affect developers.

00:04:06.889 If you constantly tell your developers, 'Hey, look, you can't do this or that,' it becomes a bit overwhelming. I've mentioned cognitive overload a few times; I want to define it to ensure everyone is on the same page. Cognitive overload refers to being given too much information at once, resulting in the inability to process or remember everything efficiently, kind of like how many of us feel during these presentations.

00:04:52.380 I have a few more examples of cognitive overload from Shopify. For instance, we often see messages such as, 'don't add this gem' or 'make sure you include a space after curly braces' to ensure consistency. This might seem arbitrary, but these guidelines help everyone read code more quickly. The reality is that people need to make around 40,000 decisions a day. If there’s even a 1% mistake rate, that results in around 400 wrong decisions every day across the organization, leading to a lot of mistakes.

00:05:22.650 We need to find ways to inform people about these mistakes, helping them avoid them without overwhelming them with long lists of 'remember this' messages. That approach quickly becomes unsustainable, which is one of the main reasons I believe in using tests to help automate this process. It's easy to illustrate why relying on memory alone isn't effective; after all, people forget things.

00:06:06.210 As developers, we create software to automate repetitive and tedious tasks. Therefore, why should we expect our developers to handle the burden of remembering all the details? We can automate away that pain. But we also need to address what happens if we don't automate.

00:06:18.420 If we don't, developers have to keep track of this extensive list of things they need to remember while making their decisions throughout the day. This obviously detracts from their primary focus of building software and crafting great code. And let's not forget, human beings are error-prone; mistakes are inevitable. Communication can also be a challenge. Even if you email everyone, there’s no guarantee that each person will read everything sent out.

00:06:54.540 In a large organization with thousands of developers and multiple time zones, it becomes increasingly difficult for everyone to read every message. As a result, mistakes happen, and applications break because simple requirements weren’t included, leading to cognitive overload. When an error or logic mistake is found in the codebase, many developers will reach for a unit test. However, when it comes to project or infrastructure mistakes, often developers turn to guidelines or documentation.

00:08:10.470 While documenting mistakes can be beneficial, people are unlikely to read lengthy guidelines. Thus, developers can become cognitively overloaded. However, we can make mistakes testable, allowing us to find out when issues arise and educate developers accordingly.

00:08:53.310 To fix the problems I just mentioned, we can follow a few key steps: Step one is to identify the problem—clearly define what needs fixing. Step two is to determine whether it can be automated. For instance, can certain outputs be detected or files be scanned effectively? Are we able to parse a Gemfile or lock file to identify dependencies?

00:09:14.880 Step three involves implementing tests to detect and educate developers on these issues. The remainder of this talk will focus on examples as we dive deeper into these steps.

00:09:51.420 To summarize briefly, people make mistakes, and that is inevitable. Informing people about what mistakes to avoid can result in cognitive overload. Mistakes will happen again, but we can use tests to detect them automatically and to educate developers at the moment they occur, which ultimately improves productivity.

00:10:04.290 I'm going to start by talking about what I like to call documented mistakes. These are real-world examples from within the Shopify organization. One specific example is related to broken maintenance tasks within our core application. We have a concept called tasks to fix broken data, and we have tests for these maintenance tasks.

00:10:21.720 These tests assume that a task named 'my_task.rb' will have an associated file located in the same subdirectory structure under the test folder named 'my_task_test.rb'. However, sometimes developers do not name them correctly, particularly if they refactor or forget to rename associated files, causing our tooling to break.

00:11:05.310 To illustrate this, we wrote a test that checks all directories to find any files ending in 'task.rb' as well as those ending in 'task_test.rb'. Then, we ensure that there is a corresponding task for each test. If there is a discrepancy, we show developers an error message that says what is wrong. It's also essential that we explain why the error occurred and suggest steps for resolution; these last two elements are crucial.

00:12:04.660 When developers receive the error message, it tells them exactly what the issue is—specifically that the maintenance task and test file names need to match for the task they’re attempting to run. Additionally, we can call attention to things such as spelling mistakes, which may otherwise go unnoticed.
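As a rough sketch of the check just described (not Shopify's actual test), the pairing can be computed from two lists of paths. The 'app/tasks' and 'test/tasks' directories and both helper names are assumptions made for illustration.

```ruby
# Hypothetical sketch: pair each maintenance task with its test file.
# The default roots ("app/tasks", "test/tasks") are assumed, not taken
# from the talk.

# Maps "app/tasks/shop/fix_data_task.rb" to
# "test/tasks/shop/fix_data_task_test.rb".
def expected_test_path(task_path, task_root: "app/tasks", test_root: "test/tasks")
  task_path
    .sub(/\A#{Regexp.escape(task_root)}/, test_root)
    .sub(/\.rb\z/, "_test.rb")
end

# Returns one actionable message per mismatch; an empty array means
# every task and test file lines up.
def task_naming_errors(task_paths, test_paths)
  expected = task_paths.to_h { |path| [expected_test_path(path), path] }

  missing = (expected.keys - test_paths).map do |test_path|
    "#{expected[test_path]} has no test at #{test_path}. " \
      "Tooling finds tests by name, so create or rename the test file to match."
  end

  orphaned = (test_paths - expected.keys).map do |test_path|
    "#{test_path} does not match any maintenance task. " \
      "Check for a spelling mistake, or rename it after the task it covers."
  end

  missing + orphaned
end
```

In a real test the two lists would likely come from globbing the task and test directories, and the test would fail with the joined messages whenever the returned array is not empty.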

00:12:49.220 Another example comes from one of our command-line applications where we've optimized for quick boot times. We need to keep the load path on boot as small as possible to ensure near-instantaneous operations. For this, we disable gems, meaning we have no Bundler load path in the application. Next, we include the required libraries and restrict boot only to those specific libraries.

00:13:36.560 With the environment set up this way, we can also print Ruby's $LOADED_FEATURES global, which lists every file that has been required so far. This gives us something concrete to test the application against.

00:14:18.760 By comparing the expected files against what is loaded, we ensure that everything required is included and that the array doesn’t become outdated. This simple process became particularly important as our team grew and contributions from outside our team increased.
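A minimal sketch of that comparison, assuming the list of loaded files has already been captured (for example by booting the tool and printing $LOADED_FEATURES). The helper names and the suffix-matching approach are my own assumptions, not the team's implementation.

```ruby
# Hypothetical sketch: compare the files loaded at boot with an explicit
# allowlist. Entries are matched by path suffix, because $LOADED_FEATURES
# holds absolute paths that differ between machines.

# Files that were loaded but are not on the allowlist.
def unexpected_features(loaded_features, allowed_suffixes)
  loaded_features.reject do |feature|
    allowed_suffixes.any? { |suffix| feature.end_with?(suffix) }
  end
end

# Allowlist entries that no longer match anything, so the list
# does not quietly go out of date.
def stale_allowlist_entries(loaded_features, allowed_suffixes)
  allowed_suffixes.reject do |suffix|
    loaded_features.any? { |feature| feature.end_with?(suffix) }
  end
end
```

A test built on these would fail when either method returns a non-empty array, naming the file and explaining that new requires slow down boot.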

00:14:59.480 You may also need to prevent the addition of specific gems or patterns. It is common for an organization to declare certain gems non-idiomatic or problematic, and over time teams may wish to consolidate on certain patterns or APIs. In such cases, we employ a concept referred to as 'lists.' With over 180 mentions in our Shopify core application, lists allow us to identify deprecated practices and avoid adding new uses of them.

00:15:38.380 We write tests to ignore existing patterns while preventing new violations, effectively helping to migrate to better practices over time without overwhelming developers. For instance, we’re currently translating our core application into Japanese, and we’ve created a list of all English-only content to enforce that new contributions adhere to our guidelines.
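The idea of ignoring existing violations while blocking new ones can be sketched as a set difference. This is an illustration under my own assumptions: the method name, the file paths, and the message wording are all hypothetical.

```ruby
# Hypothetical sketch of a list-based check. Files that already break a
# rule are recorded in a list; only violations outside that list fail.
def list_check_errors(current_violations, listed_violations)
  added = (current_violations - listed_violations).map do |file|
    "#{file} introduces a new violation. Existing cases are being migrated, " \
      "so please follow the new pattern instead of adding to the list."
  end

  # Entries that have been fixed should leave the list so they cannot regress.
  resolved = (listed_violations - current_violations).map do |file|
    "#{file} no longer violates the rule. Remove it from the list."
  end

  added + resolved
end
```

Because fixed entries must be removed, the list can only shrink over time, which is what moves the codebase toward the better practice.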

00:16:05.170 In terms of infrastructure, we recently evaluated the use of git gems, meaning gems sourced directly from a git repository, and found them to be non-performant and a source of support issues. Given this, we decided to ban them using the same principles I just mentioned. By parsing the Gemfile, our tooling can detect any new git-based gems and promptly notify developers that they should avoid them.

00:16:50.110 Our documentation clarifies why git-based gems are problematic and guides users on next steps, for example by suggesting alternatives such as packages from our private gem host.
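One way to detect gems pulled from a git source is sketched below. Note the substitution: the talk mentions parsing the Gemfile, whereas this sketch scans the text of Gemfile.lock, whose sections start with an unindented header such as GIT or GEM. The method names and message wording are hypothetical.

```ruby
# Hypothetical sketch: list the gems that a Gemfile.lock resolves from a
# git source, using only the lockfile's text layout.
def git_sourced_gems(lockfile_text)
  section = nil
  in_specs = false
  gems = []

  lockfile_text.each_line do |line|
    case line
    when /\A\S/                 # unindented line: a new section header
      section = line.strip
      in_specs = false
    when /\A  specs:/           # start of the resolved gems in this section
      in_specs = true
    when /\A    (\S+) \(/       # four-space indent: a resolved gem
      gems << Regexp.last_match(1) if section == "GIT" && in_specs
    end
  end

  gems
end

# Builds messages that say what is wrong, why, and what to do next.
def git_gem_errors(lockfile_text)
  git_sourced_gems(lockfile_text).map do |name|
    "The gem '#{name}' is installed from a git source. Git-based gems are slow " \
      "to install and hard to support; please use a released package instead."
  end
end
```

A test would read the real lockfile, subtract any gems already on an exemption list, and fail with these messages if anything remains.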

00:17:32.110 There are specific gems within our core codebase that we've decided aren't suitable due to security or performance issues. For instance, we've opted not to use RSpec because we have a sufficient replacement in Minitest. Instead of instructing developers directly not to use RSpec, we write a test that fails when an attempt is made to add RSpec. This helps maintain tooling consistency without overwhelming developers.
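A banned-gem test can be as small as an intersection between the bundle's gem names and a table of reasons. The sketch below is an assumption about how such a check might look; the method name and the reason text are illustrative only.

```ruby
# Hypothetical sketch: report banned gems found in the bundle, with a
# per-gem reason so the failure teaches instead of just rejecting.
# `banned` maps a gem name to the explanation shown to the developer.
def banned_gem_errors(gem_names, banned)
  (gem_names & banned.keys).map do |name|
    "The gem '#{name}' is not allowed in this application. #{banned[name]}"
  end
end

# Example table (illustrative wording only).
banned = {
  "rspec" => "We standardize on Minitest; please write the test with Minitest instead."
}

puts banned_gem_errors(["rake", "rspec"], banned)
```

Keeping the reason next to the gem name means the developer learns why at the moment the test fails, without anyone having to announce the rule in advance.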

00:18:07.650 To recap, we can manage mistakes through tests that enforce specific idioms, patterns, and gem usage while using lists to migrate toward improved practices. These tests help offer just-in-time education.

00:18:57.030 Unit tests are fundamental because they let you test the logic in your code. For instance, for a square root function, given an input of nine, we expect an output of three. Similarly, given an input of two, we expect approximately 1.4142.
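Written out in plain Ruby, those two expectations look roughly like this. Math.sqrt is Ruby's standard square root; the close_to? helper and its tolerance are assumptions added for the approximate comparison.

```ruby
# Plain-Ruby version of the two unit-test expectations.
# close_to? is a hypothetical helper for comparing floats approximately.
def close_to?(actual, expected, tolerance = 0.0001)
  (actual - expected).abs <= tolerance
end

raise "sqrt(9) should be 3" unless Math.sqrt(9) == 3.0
raise "sqrt(2) should be about 1.4142" unless close_to?(Math.sqrt(2), 1.4142)
```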

00:19:18.900 However, what if the input were the entire codebase? What can we do with it? We can run tests and expect a set of files to be loaded after launching the application.

00:19:35.830 More specifically, we can assert that all gem sources are reliable and that no harmful gems are present within our dependencies. We can test the code itself without executing it, which is an incredibly helpful approach.

00:20:06.450 Furthermore, we could establish a central documentation location for usage tracking as opposed to embedding links within error messages. We do this by scanning through all Ruby files and failing the test if any include a specific string, ensuring proper resolution steps.
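Scanning Ruby files for a specific string needs nothing beyond the standard library. The following is a sketch under assumptions: the method names, the example string, and the message wording are hypothetical.

```ruby
# Hypothetical sketch: find every Ruby file under a directory that
# contains a forbidden string, without executing any of the code.
def files_containing(root, forbidden)
  Dir.glob(File.join(root, "**", "*.rb")).sort.select do |path|
    File.read(path).include?(forbidden)
  end
end

# Turns each hit into a message with a resolution step.
def forbidden_string_errors(root, forbidden, advice)
  files_containing(root, forbidden).map do |path|
    "#{path} contains '#{forbidden}'. #{advice}"
  end
end
```

A test would call forbidden_string_errors on the application root and fail if the result is not empty, with the advice pointing at the preferred alternative.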

00:20:45.290 For example, if a migration references a non-existent maintenance task, we can immediately alert developers about invalid references. This testing approach has dramatically reduced the breakage in our codebase.

00:21:57.550 Prior to implementing these checks, it was possible for a removed gem to be reintroduced back into the project. Now, we explicitly monitor for certain gems we’ve decided against using for various reasons—including security issues or low adoption rates.

00:23:06.030 With proper messaging per gem, developers are educated on why specific gems are not permitted. This allows for smoother operations and reduces the communication burden on senior developers.

00:23:25.629 By automating testing mechanics, we can delegate administrative functions to tools, allowing developers to focus on delivering value. As cognitive overload is a problem, we need contextually relevant error messages that provide actionable steps, not just a broad statement telling them something is wrong.

00:24:51.270 For instance, if a developer does not include a necessary test helper in their files, we need to ensure they are reminded of its importance without drowning them in information. We can create tests that validate the inclusion of central modules, which ensure that our testing system effectively communicates through actionable error messages.
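Checking that test classes include a central module can be done with Ruby's own reflection. This sketch assumes the test classes are already collected in an array; the method names and the message wording are hypothetical.

```ruby
# Hypothetical sketch: find test classes that forgot to include a
# central helper module. Module#include? also sees modules that were
# included by a parent class.
def classes_missing_helper(test_classes, helper)
  test_classes.reject { |klass| klass.include?(helper) }
end

# A short, actionable message: what is missing, why it matters, how to fix it.
def missing_helper_message(class_name, helper_name)
  "#{class_name} does not include #{helper_name}. Without it the tests still run, " \
    "but they miss the shared setup. Add `include #{helper_name}` to the class."
end
```

The message is kept to three parts on purpose, so the reminder arrives when it is relevant without burying the developer in detail.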

00:26:32.020 Style guides can also play a pivotal role in enforcing consistency for developers. Tools like RuboCop serve as static code analyzers that can enforce community or organization-specific style guides, allowing for quick feedback without manual intervention. Their feedback can redirect user behavior based on errors or discrepancies in formatting.

00:27:52.920 The core idea behind just-in-time education is to offer relevant information precisely when developers need it, reducing cognitive costs. A practical approach is continuously evaluating if it is necessary to tell teams to remember certain patterns or guidelines.

00:29:08.460 If educated via tests that intelligently communicate the specific issues with clear context—like why something is an issue and how to solve it—everyone benefits. Errors will decrease, and developers can take actionable steps to solve problems rather than just being told they’re doing something wrong.

00:30:07.050 At the end of the day, we should always ask ourselves: Is it possible to build a better system of documentation and education without overwhelming our developers? Maintaining awareness of relevant practices without cognitive overload is key to fostering an efficient and effective developer team.

00:30:55.550 In closing, I appreciate your attention. Thank you for being here for my talk—I hope you found it insightful.

00:31:41.380 (Applause) Thank you for the talk! I was curious about the example with ImageMagick. Do you think all of these just-in-time education techniques would have to be specific to each gem, or is there hope for a generic error handling approach that could be more widespread?

00:32:03.580 Unfortunately, I believe they must be quite specific in nature. For instance, the implementation around ImageMagick is tailored to its idiosyncrasies. Although broader practices might be developed, addressing tech-specific issues generally remains important.

00:32:36.650 If someone were to venture into a microservices architecture, managing discrepancies across various services becomes complicated. However, contextual awareness of individual microservices can indeed streamline the onboarding and contribution process.

00:33:11.590 For example, maintaining styles and tests across services could either be centralized or managed based on each microservice's needs. But fundamentally, achieving a balance to uphold best practices while enhancing productivity is paramount.

00:33:57.190 As we wrap up, I want to extend my gratitude for your participation. Your engagement today is invaluable!

00:34:19.270 (Applause) Thank you!