Come on in! Making yourself at home in a new codebase

Summarized using AI

Come on in! Making yourself at home in a new codebase

Mercedes Bernard • May 17, 2022 • Portland, OR

In her talk "Come on in! Making yourself at home in a new codebase" at RailsConf 2022, Mercedes Bernard addresses the common anxieties developers face when onboarding to new codebases. The primary theme revolves around strategies to facilitate a smoother entry into unfamiliar coding environments, likening the experience to settling into a new home. Below are the key points discussed in her talk:

  • Feeling at Home in Code: Mercedes emphasizes that joining a new codebase can be likened to entering a new house. Familiarizing oneself with the layout, finding essentials, and understanding the space is crucial for comfort.
  • Documentation as Orientation: The first step in immersing oneself in a codebase is to thoroughly examine the documentation. Checking the README, contribution guidelines, and diagrams serves as a map to navigate through the unfamiliar code.
  • Setting Up the Codebase: Just as one would inspect the amenities of a new place, establishing the development environment gives insight into dependencies and the overall architecture. Mercedes highlights the importance of understanding how everything fits into the larger system.
  • Iterative Learning Process: Rather than striving to learn everything at once, starting with small contributions, such as fixing bugs or implementing features, allows developers to gradually understand the codebase. Scaffolding (breaking information into manageable pieces) and spaced learning (regular practice intervals) enhance this process.
  • Version Control Exploration: Utilizing version control history and pull requests can aid in understanding past efforts and solutions, providing context and direction for new contributions.
  • Interactive Debugging as a Tool: Engaging in interactive debugging helps developers discern what the code actually does, acting like a guided tour through the intricacies of the codebase.
  • Writing Contributions: When making changes, it is vital to document those contributions clearly for future developers, akin to leaving a welcome note for guests in a home. Clear commit messages, pull requests, and proper tests foster a sense of community and ease for future contributors.
  • Maintaining Documentation: Continuous updates to documentation and diagrams ensure that newcomers are not left in the dark, supporting ease of onboarding for future developers.

In conclusion, Mercedes stresses that by fostering a welcoming atmosphere and contributing incrementally, developers can overcome the daunting task of navigating new codebases. With patience and curiosity, we can cultivate comfortable and collaborative coding environments that benefit everyone involved.


"Welcome! We're so excited to have you 🤗 please excuse the mess." – if a codebase could talk

When we join a new team or start a new project, we have to onboard to the codebase. Diving into code that we're unfamiliar with can be stressful or make us feel like we don't know what we're doing. And the longer the codebase has been around, the more intense those feelings can be. But there are steps we can take to understand new code and start contributing quickly. In this talk, we'll cover how to build our code comprehension skills and how to make our own code welcoming to guests in the future.

RailsConf 2022

00:00:00.900 Hello, everyone!
00:00:12.679 I'm probably going to get started just a little bit early because I don't want to run long for you all. I will talk really fast because I know we've all gotten used to watching our videos and podcasts at 1.5x during the pandemic.
00:00:17.940 So, we're just gonna hit the ground running. You can find me after if you have any questions. But welcome! My name is Mercedes Bernard, and today we're going to be talking about how to get comfy in a new codebase. So kick back, relax, and make yourself at home.
00:00:31.619 I work at Cloud City as a principal software engineer, and we're hiring! So, if anything I say sounds interesting today, reach out, connect with me on Twitter, you know, come find me. If you are someone who likes to follow along with slides, I've published them on my website. They are very low resolution since the Wi-Fi is bad, so hopefully they'll work if you need them.
00:00:54.840 Four years ago, Sarah Mei gave a keynote at RailsConf about livable code, and if you haven't checked it out, I really encourage you to go watch it. She talked about how writing applications is no longer like construction or architecture. We're not focused on making buildings anymore; now we're focused on making those buildings livable for ourselves and our teams.
00:01:11.700 We organize the space and the code so that we can do the things we want to do comfortably. I really liked this metaphor about thinking of a codebase like a house that you decorate and make your own. I've been a consultant for the last ten years, which means that I don't really get to claim any codebase as home. I work in up to 20 repos a year, and just last year, I pushed production code in 18 repos.
00:01:38.460 Since I don't get to claim any codebase as my home base, I have to find ways to make myself feel at home in a place that isn't really mine. It's like I'm a reverse digital nomad, always working out of my home office but in a completely different codebase each time. I don't know about you, but as much as I want to be a world traveler, living out of a suitcase stresses me out. Even though jumping into a new codebase is something I do regularly, it always gives me a little bit of anxiety. What if this is the first one that stumps me? What if I can't find my way around?
00:02:03.420 What if this house doesn't have a blow dryer or towels? This anxiety is super normal, and just like any seasoned traveler will tell you, it gets easier the more you do it. You figure out how to quickly assess a space and see that it has what you need. You learn what bits are most important and which ones actually aren't, or at least not right away. So, on your first morning of your visit, you need to find the towels and figure out how to work the shower, but you don't need to worry about learning the fancy sound system or going through the closets to find all the cleaning supplies just yet.
00:02:43.200 If I'm going to show you how to get comfy in a brand new space, it should be a brand new space for me too. For this talk, I decided to make a contribution to an open-source repo that I've never looked at, but we all use all the time—rubygems.org. So, shout out to Ruby Central, who maintains RubyGems and who is also putting on this wonderful conference for us.
00:03:00.379 Before you can contribute to a new codebase, you have to get a high-level understanding of what's there and where things are. So, when you're staying in an Airbnb, the first thing you do is walk through the house, right? Or, if you're visiting a friend for the first time, they might give you a tour or, at the very least, they'll show you where you're sleeping and where the bathroom is.
00:03:17.399 Documentation is the place where you start your tour. I know not every codebase has the best documentation, but there's almost always something to get you started. Look at the README and follow the links that you find there. Look for contributing guides, listed dependencies, links to staging and production versions of the app, design diagrams—anything and everything that you can find.
00:03:38.100 The README for rubygems.org is pretty slim, but I learned a lot of information from it. The main helpful parts were under the contributions header, where they linked a bunch of different documents including their contribution guidelines. In those guidelines, there's this really helpful entity relationship diagram. When I find diagrams in documentation, I get super excited—it’s like the host left a tray of cookies to greet me! Diagrams are a really simple way to show a lot of information all at once.
00:04:17.400 For example, ERDs show model relationships, sequence diagrams can show you data flow, and a simple flowchart can show you expected app behavior. In this diagram, I can see that the database is pretty simple; there aren't too many models for me to keep straight, and I can start to get a sense of what I’m going to find in the code. For example, right there in the middle, you can see that users and webhooks are related, which probably means a user can set up webhooks to be notified when something happens.
00:04:42.600 Other places that I like to look for useful information when I'm jumping into new code are Slack threads and project management tickets. Both of those could turn into giant black holes of information because you could just keep scrolling, so it's important to timebox yourself. I typically only look back a sprint or two to figure out what the team has been focusing their time and energy on, and how that fits in with the repo. Any longer than a sprint or two, and you risk wasting valuable time on information that's already outdated.
00:05:09.300 The rubygems.org contribution guidelines provided everything that I needed to know about how to get the project set up, and after reading the README and all the linked documents, setup is usually where I go next. I learn a ton about a project by trying to get all the individual pieces working, so I think of this a little bit like walking through the house and finding the things you’re going to need every single day: silverware, check; remote, check; correct Ruby version, yes; database, got it!
00:05:52.620 As you're setting up the repo, pay attention to different services that you're going to need. What database is the code using? Do they have a cache layer? What is it? What web server are they using? Are there any other services that this repo is dependent on? How is the app deployed? All of this information is helpful as you form a mental model of this codebase and how it fits into the wider system.
00:06:33.600 If the project is containerized, looking at the container config is a really good place to start to find this info. But if it’s not, the act of setting the project up will probably require you to set up each of these dependencies separately, so you’ll be aware of them by the time you get the whole app running. During project setup, you’ll also find out how dev-friendly the codebase is. How much help do you need from other team members just to get the app running? Can you do it by yourself in under an hour?
00:07:09.420 If it takes three days and three different people to help you get unstuck, you should probably start to prepare for the mess waiting for you behind the door. After getting the project set up, I find it hard to absorb much more about the code in the abstract. I think about it like this: if somebody's trying to describe a floor plan to you, they often ask you to picture it. I once had a teeny tiny studio apartment where you walk in and there's a hallway with a bathroom on one side and a closet on the other.
00:07:57.960 Then, there's the main living space right ahead, and it wraps around to a kitchen in the corner with space for a dining room table. No matter how simply I tried to explain that, I bet half of you pictured it wrong. Trying to learn about a codebase by only reading it and its docs is kind of like that, so once I have the project running, I actually pick up a ticket or an issue and start working on understanding it and the changes that it requires.
00:08:22.320 This is the main takeaway of the talk, and I'm not hiding it or burying it: you don't need to learn the whole codebase to make a valuable contribution. By tackling bite-sized issues, ticket by ticket, you'll learn more quickly than if you feel pressured to understand the whole system before you get started. Learning this way takes advantage of both scaffolding and spaced learning.
00:09:07.320 Scaffolding is breaking up learning into chunks and providing a tool or structure with each chunk. In this case, that could be a specific bug to resolve or a small feature to implement. Spaced learning refers to practicing at regular intervals, which is more effective than practicing all at once. There are multiple psychological theories that back up the efficacy of learning this way.
00:09:41.040 In true scaffolding fashion, I looked for a good first issue, and in this case, there was only one, so my choice was very easy. We're going to be working on this issue, which reported a bug where sometimes when you tried to add an owner to a gem, it would silently fail, and the owner wouldn't be added.
00:10:02.520 To start working on a ticket, we get to use one of my favorite strategies for learning more about the relevant code, which is digging into version control. It's a little bit of code archeology. The version control history, the commit messages, and past pull requests tell you a lot about what's been tried before, what's worked, what hasn't, why changes were made, and who made them.
00:10:33.420 This is all valuable information when you're working with code that you've never seen before. It can save you a lot of time you might otherwise spend trying the wrong solution, and it can help you understand why part of the code is confusing or doesn't seem to make sense. For instance, someone had opened a PR for this exact issue two years ago, but they'd closed it without merging it. Their solution had been to change `create` to `create!` so that if saving failed, it would raise an error.
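The `create` versus `create!` distinction is easier to see in code. In ActiveRecord, `create` returns the object whether or not it was saved, so you have to check `persisted?` or `errors` yourself. `create!` raises `ActiveRecord::RecordInvalid` when validation fails. The sketch below is a plain-Ruby stand-in so it runs without Rails. `ToyOwnership`, its in-memory `STORE`, and `RecordInvalid` are hypothetical. The uniqueness rule (one ownership per rubygem/user pair) is the one described later in the talk.

```ruby
# Plain-Ruby stand-in for ActiveRecord's create / create! contract.
# ToyOwnership is hypothetical; it is not the rubygems.org model.
class RecordInvalid < StandardError; end

class ToyOwnership
  STORE = [] # in-memory "table" of saved records

  attr_reader :rubygem, :user, :errors

  def initialize(rubygem:, user:)
    @rubygem = rubygem
    @user = user
    @errors = []
    @id = nil
  end

  # Uniqueness rule: one ownership per rubygem/user pair.
  def valid?
    @errors = []
    taken = STORE.any? { |o| !o.equal?(self) && o.rubygem == rubygem && o.user == user }
    @errors << "User has already been taken" if taken
    @errors.empty?
  end

  def persisted?
    !@id.nil?
  end

  # Like save: returns true or false, never raises for invalid data.
  def save
    return false unless valid?
    STORE << self unless persisted?
    @id ||= STORE.size
    true
  end

  # Like create: returns the object whether or not it was saved.
  def self.create(**attrs)
    record = new(**attrs)
    record.save
    record
  end

  # Like create!: raises instead of failing silently.
  def self.create!(**attrs)
    record = new(**attrs)
    raise RecordInvalid, record.errors.join(", ") unless record.save
    record
  end
end

first = ToyOwnership.create(rubygem: "rake", user: "alice")
duplicate = ToyOwnership.create(rubygem: "rake", user: "alice")
puts first.persisted?      # prints true
puts duplicate.persisted?  # prints false; nothing was raised
p duplicate.errors         # prints ["User has already been taken"]

begin
  ToyOwnership.create!(rubygem: "rake", user: "alice")
rescue RecordInvalid => e
  puts "create! raised: #{e.message}"
end
```

The second `create` call is exactly the kind of silent failure the issue describes. `create!` turns it into an exception that someone has to handle.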
00:11:14.700 In the conversation on the PR, one of the members suggested a slight change: check the return value of the save and display errors. Overall, though, the solution and, more helpfully for me, the location of the solution were right. The PR was closed due to lack of activity. Now I know this could be part of what I need to get this issue closed for good.
00:11:44.700 Before I start making changes, I like to make sure I understand what the code is doing—not all the code, but just the bits that I'm concerned with for the ticket that I'm working on. I would love to know how the whole codebase works, but there's no way I can hold all that context in my head, especially at the beginning. So I'm starting small and intentionally scaffolding and spacing my learning.
00:12:12.480 I trace the functionality I'm working with through the call stack to see what methods get invoked and what they do. This seems really simple, but as many of you know, reading someone else's code can be extremely challenging. To get started, you need to find your entry point into the control flow. I tend to like to work from the front end down to the database, finding where a user interacts with a feature and then following the code in that direction.
00:12:44.760 However, in some cases, starting at the database and learning about the models and the associations and then climbing your way to the front end might be a little better. So let's start by looking at our model layer. This is the ownership model, and the important bits right now are the associations and the validations. We can ignore any custom class or instance methods until we encounter them while tracing the code.
00:13:14.520 It looks like here we have a few associations, and we have this one interesting validation that says we can only have a single ownership per rubygem and user combination. I wasn't familiar with how ownerships get added, so to start tracing the code, my entry point was finding all the instances where we were creating new ownerships, because I want to ensure that none of them silently fail anymore.
00:13:49.860 You'll see I'm not using any fancy IDE extensions; I'm just using a vanilla code search. To find where this is happening, I leaned on my ActiveRecord knowledge. Tapping into prior knowledge is a scaffolding technique. If anybody here was a teacher in a prior life, you probably know that it helps people retain information by connecting what they're learning to what they already know.
00:14:23.820 First, I tried to find where we’re initializing new ownerships, and apparently, nowhere. So then I thought, okay, well, what about not initializing but creating? This was really interesting because I saw that there’s a custom class method on the ownership model that’s invoked in two places, and I'm going to write those down to go look at them when I start investigating what these do.
00:14:38.460 In that method, it looks like it’s creating ownerships through the rubygem's `ownerships` association. This is also interesting because that’s not what I originally searched for, so I wonder if anywhere else in the codebase follows that pattern. But no, it looks like `ownerships.create` is only in that one ownership class method. If I go back to that pattern and search for initialization instead, I can see that we have `ownerships.new` being used in a couple of different controllers.
00:15:09.240 So, I'm going to note those two places down as well. Right now, I have four places to look at where ownerships are being created and saved to the database. Notice that I still don’t know a ton about these four use cases; neither do you. I couldn’t tell you when they get called or from where, and that’s totally okay. I’m going to keep my focus on the small task I’m trying to complete and not worry about understanding all of it.
00:15:54.960 Let’s take a look at the controllers first. This is where we have the `ownerships.new`, and this is where the changes in that closed PR that we looked at earlier were located. This code must have been updated since the issue was reported because in both of these controllers, the value of save is checked, just like the suggestion had said. So, if the ownership is invalid and it's not saved, then an error is reported to the user.
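The fix already in the controllers comes down to branching on the return value of `save`, which in ActiveRecord is `true` or `false`. Here is a minimal plain-Ruby sketch of that pattern. `FakeOwnership` and `add_owner` are hypothetical names, not the rubygems.org controllers. A real Rails action would `redirect_to` or `render` rather than return a Hash.

```ruby
# Hypothetical stand-in for a model whose save returns true or false,
# the way ActiveRecord#save does.
FakeOwnership = Struct.new(:saves, :errors) do
  def save
    saves
  end
end

# Sketch of the controller pattern: branch on save's return value and
# report errors, rather than assuming the write succeeded.
def add_owner(ownership)
  if ownership.save
    { status: :created, message: "Owner added" }
  else
    { status: :unprocessable_entity, message: ownership.errors.join(", ") }
  end
end

ok = add_owner(FakeOwnership.new(true, []))
failed = add_owner(FakeOwnership.new(false, ["User has already been taken"]))
puts ok[:status]       # prints created
puts failed[:message]  # prints User has already been taken
```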
00:16:40.680 Neither of the controllers seems to still be causing problems, which means we've narrowed down the possible remaining culprits to just the places where that custom class method `Ownership.create_confirmed` is called. My absolute number one favorite way to explore and understand new code is interactive debugging. I learn even more from it than from the version control history.
00:17:22.920 I'm not a debugging purist, so I love print debugging; I use my `puts` statements whenever the mood strikes. However, I find that the control and two-way communication that interactive debugging gives me help me learn a lot, very quickly. You can think of interactive debugging like opening all the drawers and closets to take an inventory of what you have available to you.
00:18:10.920 So in this case, I wanted to know what this `create_confirmed` class method was doing and how it handled invalid, non-persisted models. I put a couple of breakpoints in the method, assigned the return value of `confirm!` to a variable so I could inspect it, and then I cracked open the Rails console.
00:18:51.000 Here, you'll see the first thing I did was show that there are no existing ownerships in my database. Then I initialized some variables that I'm going to need to pass as parameters to the `create_confirmed` method. Thus far, this doesn't do anything that interesting, but once I run `create_confirmed`, the code's going to pause execution at my first breakpoint.
00:19:26.280 Here, we see that the breakpoint is after the `create` method and we've created an ownership. When I look at the value of it, it looks good to me; there's an ID, so I know it's persisted, and I can see all of its object properties. Now, I want to step into that `confirm!` method and see what's happening. We can see that in that method, it just calls the vanilla `update` if the model hasn’t been confirmed yet, and then it returns the value of the update.
00:20:03.120 I’m really glad that I stepped into this method because I would have assumed that `confirm!` ran `update!` and raised an error if the model was invalid, but it doesn’t. Instead, there’s no error raised; it’ll just return false. When I go to the next line, I return back to the `create_confirmed` method, where I can check the value of success, which is true, and that’s what would be returned in a successful use case.
00:20:37.440 You can see that we have one ownership in the database after running through this debugging scenario. Now I want to look at the behavior of an unsuccessful case. I'm going to do the exact same thing and try to create a duplicate confirmed ownership for the same user and Ruby gem because remember, we have that validation that this shouldn’t be allowed.
00:21:21.960 When we look at the value of the created ownership, you'll see that it doesn't have an ID, so it wasn't persisted to the database. Because I already know what the `confirm!` method does, I'm just going to step over it with `next` and verify that this method would return false, which it would: success is false. I'm also going to verify that my assumption that the ownership is invalid is indeed correct, and I can look at the errors, which match what we’d expect. You can see 'user has already been taken', so it's violating that uniqueness validation we saw earlier.
00:22:06.060 Because this `create_confirmed` method just returns the value from `confirm!`, when an ownership can't be created, it’s going to return false. So where are we using this `create_confirmed` method? We're using it in the `ownership_request` model when we are approving an ownership request, and we're also using it in another small method in the `rubygem` model.
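The behavior uncovered in that debugging session can be reproduced without Rails. This is a simplified, hypothetical reconstruction. The method names `create_confirmed` and `confirm!` come from the talk, but the bodies, the `SketchOwnership` class, and its in-memory store are assumptions. They are meant only to show how a `false` returned from `update` becomes a silent failure.

```ruby
# Simplified, hypothetical reconstruction of the call chain described in
# the talk. Method names follow the talk; bodies are plain Ruby, not the
# actual rubygems.org source.
class SketchOwnership
  SAVED = [] # in-memory "table"

  attr_reader :rubygem, :user, :errors

  def initialize(rubygem:, user:)
    @rubygem = rubygem
    @user = user
    @confirmed_at = nil
    @errors = []
  end

  def confirmed?
    !@confirmed_at.nil?
  end

  # Uniqueness rule: one ownership per rubygem/user pair.
  def valid?
    @errors = []
    clash = SAVED.any? { |o| !o.equal?(self) && o.rubygem == rubygem && o.user == user }
    @errors << "User has already been taken" if clash
    @errors.empty?
  end

  # Stand-in for save: persist if valid, return true or false.
  def save
    return false unless valid?
    SAVED << self unless SAVED.include?(self)
    true
  end

  # Stand-in for the non-bang update: returns false when invalid.
  def update(confirmed_at:)
    @confirmed_at = confirmed_at
    save
  end

  # The surprise found while debugging: confirm! calls update, not update!,
  # so an invalid record returns false instead of raising.
  def confirm!
    update(confirmed_at: Time.now) unless confirmed?
  end

  # Create (which may leave the record unsaved), confirm, and return
  # confirm!'s result. Callers that ignore it fail silently.
  def self.create_confirmed(rubygem:, user:)
    ownership = new(rubygem: rubygem, user: user)
    ownership.save
    success = ownership.confirm!
    success
  end
end

first_result = SketchOwnership.create_confirmed(rubygem: "rake", user: "alice")
second_result = SketchOwnership.create_confirmed(rubygem: "rake", user: "alice")
puts first_result                 # prints true
puts second_result                # prints false, with no exception raised
puts SketchOwnership::SAVED.size  # prints 1
```

Nothing in this chain raises. A caller that discards the return value of `create_confirmed` never learns that the second ownership was not created.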
00:22:44.460 This tiny method is only called from one location in the code, and we’re going to look at that in a minute. But all of the code we just looked at and debugged is really straightforward; it’s not doing anything complex. However, it did take us a minute to get through a couple of layers of methods, and we found that interesting convention violation, where the `confirm!` method called the regular `update` and not `update!` as most Rails developers would expect.
00:23:33.960 No matter how senior we get, whenever we move into a new codebase, we always have to start tracing the code somewhere and checking our assumptions. Tracing the code you need for one bug or feature at a time is a great way to learn the whole codebase over time. You’ll get an awareness of all the important pieces of the code and the pieces that change frequently. In this way, you're learning as needed, and you won’t waste any of your time on parts that don't matter or that very rarely change.
00:24:12.660 At this point, I’m feeling sufficiently welcome in the codebase. I hope you all are feeling just as welcome! I now have an understanding of the bits of code that I’m going to be working on to resolve this first issue. So now I can settle in and start to make the place feel a little more like my own.
00:24:47.760 At this point, I know where ownerships are getting created. These are our original four locations that we talked about, and we’ve ruled out silent failures in the two controllers. They now check the value of save, doing exactly what that suggestion said, so we're good. We verified in our debugging that trying to update and confirm an invalid ownership will return false. So it seems like approving an invalid ownership from the ownership request model will behave the way we expect it to.
00:25:41.640 It looks like there’s only one place left where create could silently fail, so let’s focus there and try to fix that. When we’re making changes to a new codebase, just start by mimicking existing examples. This code comes from the `Pusher` class, and we can see that it invokes that tiny `create_ownership` method on the `Rubygem` model, which calls the custom class method we debugged.
00:26:22.500 This code is doing a few other things, and it rescues exceptions, so I want to be careful with my code changes so that if creating an ownership fails, I don't leave open the possibility of saving a broken state. Using an ActiveRecord transaction feels like a good idea to me, but I also want to ensure that I’m matching current codebase conventions.
00:26:55.920 I looked in the code and found a few examples where they open transactions from the relevant ActiveRecord model, so I feel like I can go ahead and use this strategy for wrapping this create logic. When I was working on this, I actually couldn’t remember how ActiveRecord transactions handle different error types. Honestly, I never do—which brings me to the next strategy for settling in: documentation. Don’t be afraid to look up documentation.
00:27:35.640 Look it up for ActiveRecord classes, any other dependencies, external libraries, browser behavior, database specs, CI—whatever you could possibly need. Because even when you’ve been coding for decades, we never remember everything, and answers are only a Google search away. I looked up how ActiveRecord transactions handle exceptions.
00:28:15.420 All exceptions are re-raised except `ActiveRecord::Rollback`, which I always forget. So, I'm going to wrap this code in a transaction. I’ll keep the rescue block but then raise a rollback exception if creating an ownership fails. This way, the code's going to roll back any database operations in the event of an exception, but it will still return false as expected.
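These transaction semantics can be modeled in a few lines of plain Ruby. In a real app you would write something like `Ownership.transaction do ... end` and `raise ActiveRecord::Rollback`. The `ToyDB` class, its `Rollback` error, and `push_version` below are hypothetical stand-ins that mimic only the documented behavior: any exception rolls back the writes and propagates, except the rollback exception, which rolls back silently.

```ruby
# Toy, in-memory model of ActiveRecord transaction semantics (hypothetical;
# the real API is Model.transaction with ActiveRecord::Rollback).
class Rollback < StandardError; end

class ToyDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(row)
    @rows << row
  end

  # Snapshot the rows, run the block, and restore the snapshot on failure.
  # Rollback is swallowed (the call returns nil); other errors are re-raised.
  def transaction
    snapshot = @rows.dup
    yield
  rescue Rollback
    @rows = snapshot
    nil
  rescue StandardError
    @rows = snapshot
    raise
  end
end

# Hypothetical version of the change described in the talk: write the
# version, then raise Rollback if the ownership could not be created, so
# nothing is left half-saved and the method still returns false.
def push_version(db, &create_ownership)
  saved = false
  db.transaction do
    db.insert(:version)
    raise Rollback unless create_ownership.call(db)
    saved = true
  end
  saved
end

ok_db = ToyDB.new
ok_result = push_version(ok_db) { |db| db.insert(:ownership); true }
puts ok_result  # prints true
p ok_db.rows    # prints [:version, :ownership]

bad_db = ToyDB.new
bad_result = push_version(bad_db) { |_db| false }
puts bad_result # prints false
p bad_db.rows   # prints [] because the version insert was rolled back
```

Raising the rollback exception explicitly, rather than relying on a return value deep in another method, makes the intent visible at the call site. That matches the reasoning in the talk.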
00:29:07.800 We just added one extra false case so that if an ownership is not created, it'll also return false. I chose to explicitly raise an exception right here so that it would be clear to future contributors what my intention was, but I also could have chosen to refactor that create ownership method and do the exception handling in there. It’s really just personal preference.
00:29:51.240 While you're making a change, if you get any errors while running your code, be sure to slow down and read what your errors are telling you. This seems really obvious, but honestly, it's one of the tips I give most frequently when I’m pairing with someone new, regardless of how many years of experience they have. We have a tendency to want to go fast and be productive, especially when we're trying to prove that we're competent in unfamiliar code.
00:30:30.960 So, we tend to make a lot of assumptions about why our code is broken; we're quick to assume that we screwed up, which makes us forget to read the error message and look at what the computer is telling us. If you want to speed up your coding skills, this one small strategy—to slow down—will actually make you go significantly faster in the long run.
00:31:16.240 I've made my change, left my mark on the place, and now I share in the responsibility to make this codebase welcoming to future guests. It’s my turn to clean up and maybe put out some cookies to make the next guest feel comfy. Commit messages and PR descriptions are going to help future guests and contributors when they need to do their own code archeology.
00:31:53.640 So be sure to leave enough info in the history that someone following you six months from now, which could be you, understands what you did and why. In a commit message, you don’t have to limit yourself to the 50-character summary; you can also include a description with more in-depth info about what you did. This is particularly helpful when you need more room than the one-line summary gives you.
00:32:41.160 For instance, if you're squashing a bunch of commits, put as much as you can from those commits into the description. Or if you want to explain the rationale for a decision to future contributors, include that as well. In your PR descriptions, instead of just describing what change you made, be sure to include context about why you made it.
00:33:16.080 Link to things liberally—link to documentation, link to code samples, GitHub issues, Stack Overflow threads, anything that helped you find the answer, because you never know when that’s going to be helpful to someone later who is going to have to go digging through the history.
00:33:59.400 Then write valuable tests that describe expected behavior, rather than just testing that your code didn't raise an error or only testing happy paths. Be sure to describe and test edge cases. When I opened a PR for this change to prevent silent failures, I got a bunch of failing tests, like this one, which is a controller test to create a new version of an existing Ruby gem.
00:34:36.060 It might be a little hard to see, but the key part of the description is 'with confirmed ownership should respond with success.' The only time creating an ownership is invalid is if one already exists, and this test was added a year and a half ago, after that original issue was opened, to assert that this case fails silently while still returning success.
00:35:16.020 If I had looked at these tests before I started coding, I might have caught this. Valuable tests will not only help prevent future guests like me from breaking expected functionality, but they also serve as documentation for contributors to refer to.
00:36:02.760 So, I ended up closing my PR and letting the maintainers know that the original issue we looked at was outdated and no longer needed. Even though the code that I wrote for this issue didn't get merged, I'm not bummed! I still helped close the issue, and as I got comfy in the code, I actually opened up a couple of other PRs with small things I noticed, like strong parameters and other things that were missing.
00:36:43.200 Finally, good hosts create documentation and keep it up to date. I hope you're noticing a pattern: documentation is really, really helpful! I’m trying not to belabor the point, but this includes tests, code comments, READMEs, diagrams—everything. It doesn’t have to be long-form written docs, but make sure that if you see something is out of date, you do your part and update it.
00:37:12.600 When I was working on this issue, I noticed that the ERD we talked about earlier was out of date and missing some key models, like that ownership request that we saw. I opened a PR to update the ERD. The diagram is just an SVG inside the repo, and I didn't know how to update it, so I went digging through git history.
00:37:53.760 Big shout out to Carrie, who left the perfect commit message so that I would know how to update this two years later! After opening that PR, a contributor gave me a great suggestion to turn it into a GitHub action and make checking that the ERD is up to date part of CI since it’s so easy for new contributors to miss that step.
00:38:35.760 This kicked off this talk all over again, because then I had to go make a contribution to rails-erd so that the gem would produce consistent output for our GitHub action. Whether you're visiting a codebase for just a short period of time or moving in for a bit longer, you don't need to know it inside and out in order to make valuable contributions.
00:39:31.680 By making small contributions right away, you'll get comfy bit by bit, and you'll end up learning how all of the parts of the system fit together faster than trying to learn it all at once. You don’t need to know where all the pots and pans are or how the stove works and then go out and buy groceries on your first day. You can microwave some soup while you look around the kitchen and get your bearings.
00:40:00.600 Like I mentioned earlier, I work for Cloud City, a certified B Corp that works with socially responsible clients. So if that sounds interesting, they would love if you would come talk to me! Reach out to me on Twitter or wherever you find me. Thank you so much for coming!