RubyDay 2015

Deep diving: how to explore a new code base

by Christophe Philemotte

In his talk "Deep diving: how to explore a new code base" at RubyDay 2015, Christophe Philemotte emphasizes the importance of efficient code exploration, which he likens to scuba diving. He points out that while writing code is essential, reading and understanding code constitutes around 85% of development activities, increasing to about 95% for newcomers. To effectively navigate a new codebase, he presents a structured approach:

  • Define a Clear Goal: Establish a specific, reasonable aim such as fixing a bug, implementing a feature, or writing documentation. This goal helps focus your exploration.

  • Create a Map: Familiarize yourself with the project's structure, including its README files and contribution guidelines. These will provide essential context and resources.

  • Choose the Right Tools: Set up your development environment with suitable tools and dependencies to facilitate navigation and interaction with the code.

  • Iterate and Learn: Read the codebase in alignment with your goal, gradually expanding your understanding as you make changes or fixes.

A significant example presented is Philemotte’s investigation into a bug in the OpenURI module, where he used the TracePoint API to track method calls effectively, allowing him to identify where the bug was located in just a couple of hours of focused work.

He emphasizes the necessity of collaboration and documentation, urging developers to ask for help when stuck and to document their findings for future contributors. The session illustrates that diving into a codebase without preparation can lead to confusion, just as diving into water without planning can pose risks.

Philemotte concludes by reasserting that successful code exploration involves setting clear objectives, employing helpful resources and tools, and engaging in continuous learning, encouraging developers to adopt this structured approach for lasting productivity.

Overall, the talk provides a strategic framework that helps developers more effectively manage the complexities of new codebases while also stressing the significance of understanding and reading code over merely writing it.

00:00:13.400 Hi everybody! Let's start with the introduction. I'm Christophe, nice to meet you.
00:00:19.740 I'm very happy to be in Turin today. I co-organize several Ruby events in Belgium, and I'm the founder of PullReview, which is an automated code review service for Ruby.
00:00:27.119 If you have any questions, we can chat after the talk. Today, I would like to share with you my approach to exploring a new codebase, because I think it's very important to be efficient.
00:00:35.040 I suggest you dive with me, because I like scuba diving, and I see discovering a codebase like scuba diving.
00:00:40.379 Code needs to be written, but we forget that we read far more code than we write. Reading and understanding code is, by far, the most common development task.
00:00:48.420 Still, we picture ourselves typing on the keyboard, but if you pause for a moment and pay attention, you’ll realize that you're not typing code. You're reading it—likely the code you just typed, the code you typed two months ago, or even a colleague's code.
00:01:02.600 Sometimes, it's the code of a gem that you need to use, and you want to know how it is implemented.
00:01:11.400 We often dream of continuous typing, but in reality, most of the time we find ourselves looking at the code, trying to understand it while thinking about what we could write.
00:01:22.740 How much do we read? We can state the following: every line of code written has been read at least once.
00:01:28.200 Based on that fact, we can do a quick estimation of the amount of reading based on the number of changes we make in a codebase.
00:01:36.480 For example, in one of my projects, I spent about 300 days producing 15,000 lines of code in total.
00:01:42.400 However, when I looked at the commits I made in that repository, I found that I had made about 600,000 changes.
00:01:48.660 This means that each line in the final codebase was changed, and therefore read, about 43 times. And that doesn't account for the changes I never committed, nor the time I spent trying to understand what I had written before.
00:01:55.620 Still, this gives you a good estimation of the ratio between reading and writing.
00:02:04.080 The bottom line is that it is widely accepted that about 85 percent of development activities consist of reading. If you're new to a project and need to discover it, that can increase to around 95 percent.
00:02:15.560 Imagine you have a typical workday of 8 hours. That means you have barely 30 minutes to achieve anything of substance during the day. Thirty minutes is hardly enough.
00:02:26.799 If you want to achieve something within that time, you need to quickly find what you're looking for.
00:02:33.060 You need to learn how to deep dive into the project.
00:02:41.400 Once you've found what you're looking for, it's crucial to use that information to fix a bug or implement the feature you want.
00:02:47.700 I see this process a bit like scuba diving; you don't want to dive without any planning or clear destination, as that's the best way to get lost.
00:02:53.700 In scuba diving, lack of preparation can lead to severe issues, and in the case of coding, it can hinder your productivity.
00:03:01.140 From the simplest to the most complex dives, you will prepare for what you will do. That's the same with code.
00:03:08.040 If you wander in the codebase without a clear destination, you will waste time, which is the best way not to ship anything on day one.
00:03:15.060 You don’t need to read the entire codebase at once; you can do it step by step while simultaneously writing code, which is really valuable.
00:03:22.920 So for me, it's really a journey, and you need to plan it. I'll present an approach I use every time I need to discover a codebase, which is really about reading the code.
00:03:30.120 At the same time, you should strive to achieve something like fixing a bug or implementing a feature, because at the end of the day, that's what we are paid to do.
00:03:38.520 The plan is very basic: first, we will define a reasonable goal. Then we will get a good map to know where to go.
00:03:45.600 Next, we will choose the right equipment in order to dive into the codebase effectively and achieve what we want to do.
00:03:52.760 Finally, we will iterate and do it again.
00:03:58.920 First, you need a good goal—something reasonable that you can plan for and that motivates you to dive into the codebase.
00:04:06.000 This could be fixing a bug, implementing a feature, or writing some documentation—something that you really want to do.
00:04:13.200 For the next example, I will describe my experience from two years ago. I was browsing Stack Overflow and found a question about the OpenURI module.
00:04:21.000 Before Ruby 2.0, if you passed nil as the proxy option when opening a URI, the environment variable for the proxy was ignored.
00:04:30.000 However, in Ruby 2.0, it appeared that this was no longer the case, which surprised me since I believed it worked as expected.
00:04:38.640 The first thing I did was attempt to reproduce the problem. I wrote a small, straightforward script to open the URI for Google, passing nil as the proxy option.
00:04:47.100 It should have worked, and the environment variable HTTP_PROXY should have been ignored, but I got an error instead.
00:04:55.200 Clearly, it was broken, so for me, fixing that bug became a reasonable goal to explore the MRI codebase.
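A reproduction script along these lines can be very short. The following is a minimal sketch, not the script shown in the talk: it assumes a current Ruby, where the call is spelled URI.open (at the time it was Kernel#open), and the helper name, file name, and proxy address are hypothetical.

```ruby
require "open-uri"

# Hypothetical reproduction helper: fetch a URL while asking OpenURI to
# bypass any proxy, even if HTTP_PROXY is set in the environment.
def fetch_without_proxy(url)
  # proxy: nil is a documented OpenURI option meaning "connect directly".
  URI.open(url, proxy: nil) { |io| io.read }
end

# Example invocation (needs network access; the proxy address is made up
# so that any attempt to use it fails visibly):
#   HTTP_PROXY=http://127.0.0.1:9 ruby repro.rb
#   puts fetch_without_proxy("http://www.google.com").bytesize
```

If the option is honoured, the request goes out directly and succeeds; if the environment variable is still consulted, the request fails against the unreachable proxy.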
00:05:01.500 Having a goal is not enough in itself; you also need a map. Without a map, you're going nowhere because you won't know how to reach your goal.
00:05:08.640 It's essential to know where the project and codebase are located and where the introductory documentation is, such as the README file.
00:05:16.920 This documentation will inform you how to use the software, what it can do for you, how to install it, and where to find the contributors.
00:05:23.280 In my case, it was relatively simple; I found the code on GitHub and the issue tracker, which is still a Redmine one.
00:05:29.520 The main repository is still on Subversion. Having a map is great, but you still need to read it.
00:05:35.520 You require a legend, something that will help you understand the code.
00:05:41.760 The first thing you should look for is the contribution guidelines. This document will teach you how to contribute to the project.
00:05:49.440 It will inform you about which code conventions to follow, how to report a bug, and how to submit a patch.
00:05:56.040 If you need to write tests, that's also included in the contribution guidelines. Additionally, there is a less technical guideline called the code of conduct.
00:06:03.480 It reminds you about how to behave when contributing. Both guidelines are essential; not following them is considered rude.
00:06:09.360 If you're unsure about something, just ask. People will help.
00:06:17.730 In my case, the contribution guidelines were located in the repository. I found two important links—one to the issue tracker and another to the wiki.
00:06:26.520 I read various pages on both the issue tracker and the wiki, where I learned about the coding conventions for MRI, how to report a bug, and how to submit a patch.
00:06:33.120 Next, you need to understand the hierarchy of the directories—basically, where to find the source code, the documentation, and the tests.
00:06:40.140 You don't need to be exhaustive in this; you don't need to understand the entire hierarchy.
00:06:46.620 Just focus on what you need to find according to your goal.
00:06:54.420 For the OpenURI example, I was looking for the files containing the OpenURI module, so I searched for that.
00:07:02.520 I found the directory where the relevant code is stored, along with the tests and the documentation.
00:07:10.320 Sometimes, it can be challenging to find all this information, so don't hesitate to ask people who know.
00:07:17.520 They can guide you and provide a quick overview of the project.
00:07:24.000 For open-source software, resources like mailing lists or issue trackers can be helpful, but at work, your colleagues will be your best resource.
00:07:31.680 Just ask them during a coffee break, or maybe send an email to set up a good time to discuss.
00:07:39.960 This tip applies less to open-source projects but definitely pertains to your job.
00:07:46.680 In development meetings, take the opportunity to listen and learn.
00:07:55.020 Understand the current problems, learn how people work, and ask questions if you need help.
00:08:02.040 It’s also a good time to discuss what you can do if you're unsure.
00:08:10.680 If you find yourself stuck or blocked, don't hesitate to ask for assistance.
00:08:18.600 People are generally willing to help because eventually, you'll be contributing back.
00:08:26.040 It's important for them to help you, as you'll then be in a position to help them.
00:08:31.680 Please remember the importance of the information available to you. Someone has taken the time to create guides like contributing guidelines and README files.
00:08:39.840 It’s considerate to cultivate the habit of documenting information in return, whether it's an open-source project or not.
00:08:47.220 Next, when it comes to the codebase, reading the documentation will help you learn more about where to start in the project.
00:08:54.840 Having a good understanding will make it easier to navigate.
00:09:03.720 Of course, you'll need the right equipment to read and write code.
00:09:11.040 Make sure you have a suitable editor. I won’t spend much time on that topic.
00:09:17.520 If you're overwhelmed by too many options, stick with what works best for you—be it Emacs, Vim, or any other.
00:09:24.000 But I find that simpler editors can help you navigate your code more effectively, so choose wisely.
00:09:31.560 Also, prepare your environment before diving in; discover what dependencies you need.
00:09:37.320 Do you need a database? Any gems? For Ruby projects, especially Rails ones, you can gain a better understanding of the architecture.
00:09:44.040 If you have a Gemfile, use Bundler to get an overview of all dependencies.
00:09:51.840 For instance, in GitLab, if you run 'bundle viz', you will see a whole graph of dependencies.
00:09:59.520 You can zoom in on that graph to identify the different dependencies.
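When a rendered graph is not at hand, the same overview can be approximated in plain text. The sketch below is an illustration under stated assumptions rather than Bundler's own API: it relies only on the indentation layout of Gemfile.lock (resolved gems at four spaces, their dependencies at six), and the helper name dependency_edges is hypothetical.

```ruby
# Build a { gem name => [names of its direct dependencies] } hash from the
# text of a Gemfile.lock. Assumes the usual layout of the "specs:" sections.
def dependency_edges(lockfile_contents)
  edges = {}
  current = nil
  lockfile_contents.each_line do |line|
    case line
    when /\A {4}(\S+) \(/ # resolved gem: four-space indent, "name (version)"
      current = $1
      edges[current] = []
    when /\A {6}(\S+)/    # dependency of that gem: six-space indent
      edges[current] << $1 if current
    end
  end
  edges
end

# Print the edges for the project in the current directory, if there is one.
if File.exist?("Gemfile.lock")
  dependency_edges(File.read("Gemfile.lock")).each do |gem_name, deps|
    puts "#{gem_name} -> #{deps.join(', ')}" unless deps.empty?
  end
end
```

Gems with many incoming edges are usually the ones worth knowing before reading the application code.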
00:10:06.000 Sometimes, it's helpful to know what your project relies on.
00:10:13.080 Running tests is likely the best way to check whether you have the minimum requirement for development in the project.
00:10:20.520 Also, check if you need continuous integration. Does the team use any tool to verify the code, like RuboCop?
00:10:27.840 In my case, I found it fairly straightforward; for me, understanding how to build a patch for MRI and how to test it was crucial.
00:10:35.280 Running the application gives you insights into what the project does and helps you understand how it works under the hood.
00:10:42.240 However, you don't need to test every feature—only those that make sense according to your goal.
00:10:50.520 For my case, I didn't test every feature of Ruby; I only focused on the one that interested me.
00:10:58.320 I wrote a small script to verify that it was indeed buggy.
00:11:05.520 Now it’s time to read the codebase and dive deeper into it. This may seem basic, but it’s important to state.
00:11:14.640 It can serve as a checklist to approach different codebases, especially if you have the opportunity to coach juniors.
00:11:20.820 There are things that newcomers are often unaware of, and it’s essential to share these insights.
00:11:28.080 Thanks to all the information you've collected, you should now know more about the codebase, but you might still wonder what you should read.
00:11:36.840 If I take the OpenURI module as an example and list the number of lines per file, you'll find there are more than 3,000 lines.
00:11:44.880 Here, you have two options: you can read everything or focus on what matters to you. I prefer the second option.
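Counting lines per file is a quick way to size up a directory before deciding what to read. A minimal sketch, assuming Ruby source files under a directory you pass in; the helper name lines_per_file and the example path are hypothetical.

```ruby
# Return [path, line_count] pairs for every matching file, largest first.
def lines_per_file(dir, pattern = "**/*.rb")
  Dir.glob(File.join(dir, pattern))
     .select { |path| File.file?(path) }
     .map { |path| [path, File.foreach(path).count] }
     .sort_by { |_, count| -count }
end

# Example usage (the directory is only an illustration):
#   lines_per_file("lib").each { |path, count| puts format("%6d %s", count, path) }
```

The total matters less than the shape: a few large files tell you where most of the behaviour lives.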
00:11:53.520 First, you need to identify what you’re looking for. In my case, I needed to find where proxy management is implemented in the code.
00:12:00.780 I needed to understand how it interacts with other parts of the code, especially when opening an HTTP resource. This is crucial to addressing the bug.
00:12:08.220 The best way to discover what you need to read is to get deeper into the new codebase.
00:12:16.320 When a bug occurs, the first thing to do is start with the call stack.
00:12:23.520 However, remember that the cause of the bug is usually executed beforehand, so you may not find the solution just in the stack trace.
00:12:30.780 For my case, a quick look at the code revealed that it was trying to open the proxy variable, but it found no proxy and crashed.
00:12:37.920 My issue, however, lay in how the option was being managed, which wasn't included in the call stack.
00:12:45.300 A better option is to get the whole overview of your use case, and for that, I utilized the TracePoint API.
00:12:51.720 This has been available since Ruby 2.0 and is quite convenient.
00:12:59.940 I will explain its different parts.
00:13:06.360 First, you have a variable scope, which I used to keep track of different scopes in the various calls. Next, we create a TracePoint.
00:13:13.920 This will track two types of events: first, call events, which occur every time a method is called, and second, line events, which occur each time a line of code is executed.
00:13:20.520 You can pass a block to it, inside which you define how you want to handle the different events.
00:13:27.300 For call events, I printed information about which method is called and in which class or module.
00:13:32.480 As for line events, that's where I saved the variable scope of the call.
00:13:39.120 This is crucial because the line event occurs before the call event, allowing me to keep track of the scope.
00:13:46.680 In the block, you can access the line number, file, method, class, and more.
00:13:53.040 Finally, you can enable the tracing around a specific block of code.
00:14:01.200 In my case, this block would be the code I want to track. This allows me to see the code graph for the open method.
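The parts described above can be put together as follows. This is a minimal sketch of the technique, not the code from the slides: it records call events and uses line events to remember where each call came from, and the Greeter class stands in for the code under investigation.

```ruby
calls = []       # one entry per Ruby-level method call
last_line = nil  # location of the most recently executed line

trace = TracePoint.new(:call, :line) do |tp|
  case tp.event
  when :line
    # Line events fire before the call they trigger, so remember where we were.
    last_line = "#{File.basename(tp.path)}:#{tp.lineno}"
  when :call
    calls << { method: "#{tp.defined_class}##{tp.method_id}", called_from: last_line }
  end
end

# Hypothetical code under investigation.
class Greeter
  def greet(name)
    format_name(name)
  end

  def format_name(name)
    "Hello, #{name}!"
  end
end

# Tracing is only active while the block runs.
trace.enable { Greeter.new.greet("Turin") }

calls.each { |c| puts "#{c[:method]} (called from #{c[:called_from]})" }
```

Only methods written in Ruby raise :call events; methods implemented in C, such as the default initialize, would need the :c_call event instead.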
00:14:09.120 I wrapped everything into a gem called tracecalls, available on RubyGems, and you can find it on GitHub. I'll share the URL at the end.
00:14:16.680 After requiring the gem, enable the tracing for the desired block and then print the information.
00:14:24.000 The result is more than just strings; you get the execution graph, allowing you to navigate through it.
00:14:32.520 This structure is more complicated than I initially showed you, and it provides a complete overview of each method and the sequence of calls.
00:14:39.120 I did this for my use case, which was opening a URL.
00:14:46.920 Looking at the call graph, you'll first see calls to open.
00:14:54.720 Reading through it, you'll realize it involves parsing the URL string to create an instance of the URI object.
00:15:01.680 You'll see a lot of 'initialize' methods here, always in the same scope (URI).
00:15:09.840 Next, it addresses various options for the method, managing how you handle the different parameters.
00:15:16.920 You'll then see that it opens the HTTP resource—although you won't create the HTTP client at that moment.
00:15:24.720 Now, I know where to start my reading, focusing on the relevant parts of the codebase.
00:15:32.520 By identifying these key areas, I could find the bug, which turned out to be located in a single line of code.
00:15:40.680 The line, while looking somewhat generic, was about creating the HTTP client.
00:15:47.880 During this process, it typically checks the environment variable for the proxy; in this case, it didn't. Therefore, the default proxy detection was disabled.
00:15:56.160 By tracing through the code base for my use case, I learned a lot about the OpenURI library without having to read every line of code.
00:16:04.560 This offers valuable insights into what to search for next, allowing you to use commands to look for where the open method is called or where the HTTP class is defined.
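Such a search is usually done with a command-line tool like grep; the same idea can be sketched in Ruby. The helper name grep_source and the example directory are hypothetical, and the sketch assumes the files are valid in the default source encoding.

```ruby
# Return "path:line: text" for every line under dir that matches pattern.
def grep_source(dir, pattern, glob = "**/*.rb")
  Dir.glob(File.join(dir, glob)).sort.flat_map do |path|
    next [] unless File.file?(path)

    File.foreach(path).with_index(1)
        .select { |line, _| line.match?(pattern) }
        .map { |line, lineno| "#{path}:#{lineno}: #{line.strip}" }
  end
end

# Example usage (the directory is only an illustration):
#   puts grep_source("lib", /\bopen\(/)        # where open is called
#   puts grep_source("lib", /^\s*class HTTP\b/) # where a class named HTTP is defined
```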
00:16:12.120 As a side note, if you're working on a Ruby on Rails project or any project that has a Gemfile, you can use Bundler to open the source of a specific gem.
00:16:20.040 This technique can be useful, especially if your application relies heavily on a particular gem.
00:16:27.720 Another strategy for diving into a codebase is to look at the test cases.
00:16:36.240 Select a test, inspect what code was executed during that test case, and consider how to analyze that context.
00:16:43.560 Context is critical because everything you're reading needs to have meaning. You shouldn't simply read files randomly.
00:16:50.520 With context, you understand when it's called and how it reacts.
00:16:59.400 With all this knowledge, we're now able to read the codebase—focusing only on what we need.
00:17:06.840 Now it's time to modify the code and fix the bug. Fortunately, the bug was confined to just one line.
00:17:12.960 I read only about 100 lines out of more than 3,000, and that was enough to build the understanding I needed.
00:17:19.680 This demonstrates an effective way to navigate through your codebase and dive into it within the right context.
00:17:26.640 Then it was time to report the bug and submit the patch.
00:17:32.760 Driven by this goal, I completed both actions in tandem.
00:17:38.760 This was my first bug report on MRI, and it took about two hours to navigate the MRI structure, read what I needed, fix the bug, and submit the patch.
00:17:45.600 I even answered the corresponding Stack Overflow question.
00:17:53.280 I use this method every time I need to explore a codebase or find a bug in unknown territory.
00:18:01.080 I make it a habit to gather information about the projects and utilize tools to guide me.
00:18:08.640 Avoid blindly reading code without a clear destination.
00:18:15.840 If you find yourself stuck, don’t hesitate to ask someone for help.
00:18:22.680 Doing it once makes it easier to do it a second time.
00:18:28.920 Now I’d like to discuss what makes a good goal.
00:18:35.880 How long should a goal last? Initially, keep it short—your main objective is to learn about the codebase through doing.
00:18:42.600 It’s essential that you have a time frame that keeps you motivated. I find half a day is an ideal period, with one day maximum.
00:18:49.440 Only when you know more about the codebase can you plan for more complex goals.
00:18:56.520 If you’re unsure what to do, check the issue tracker for a possible bug or a small feature you can tackle.
00:19:04.680 Revisit the contributing guidelines; contributors usually list numerous opportunities.
00:19:11.880 You can also use tools such as RuboCop or other static analysis resources to fix reported issues.
00:19:18.840 It's a great exercise; it feels like a game, allowing you to navigate through the codebase effectively.
00:19:24.840 Keep things simple. You know how to plan your journey, and now you’re free to explore.
00:19:32.520 Engage directly in the process of writing code or documenting—avoid reading code solely for the sake of it.
00:19:39.480 And if you feel blocked, just ask for help.
00:19:47.040 Not asking is one of the most common bad habits among juniors, as they may feel intimidated or unsure.
00:19:54.600 It’s important to reassure them that you're there to assist.
00:20:01.680 To recap, if you want to dive into a new codebase, aim to define a good goal, such as fixing a small bug.
00:20:08.280 Get a map to help locate what you need, then equip yourself properly to immerse yourself in the codebase and pursue your goal.
00:20:15.840 And remember, once you’ve done it, do it again. Thank you!