00:00:13.400
Hi everybody! Let's start with the introduction. I'm Christophe, nice to meet you.
00:00:19.740
I'm very happy to be in Turin today. I organize several Ruby events in Belgium, and I'm the founder of PullReview, which is an automated code review service for Ruby.
00:00:27.119
If you have any questions, we can talk after the talk. Today, I would like to share with you my approach to exploring a new codebase, because I think it's very important to be efficient.
00:00:35.040
I suggest you dive with me, because I like scuba diving, and I see discovering a codebase like scuba diving.
00:00:40.379
Code needs to be written, but we forget that we read far more code than we write. Reading and understanding code is, by far, the most common development task.
00:00:48.420
Still, we picture ourselves typing on the keyboard, but if you pause for a moment and pay attention, you’ll realize that you're not typing code. You're reading it—likely the code you just typed, the code you typed two months ago, or even a colleague's code.
00:01:02.600
Sometimes, it's the code of a gem that you need to use, and you want to know how it is implemented.
00:01:11.400
We often dream of continuous typing, but in reality, most of the time we find ourselves looking at the code, trying to understand it while thinking about what we could write.
00:01:22.740
How much do we read? We can state the following: every line of code written has been read at least once.
00:01:28.200
From that fact, we can quickly estimate the amount of reading from the number of changes we make in a codebase.
00:01:36.480
For example, in one of my projects, I spent about 300 days producing about 15,000 lines of code in total.
00:01:42.400
However, when I looked at the commits I made in that repository, I found that I had made about 600,000 changes.
00:01:48.660
This means that for each line of code in the final result, I read about 43 times more. And that doesn't account for the changes I never committed, nor the time I spent trying to understand what I had written before.
00:01:55.620
Still, this gives you a good estimation of the ratio between reading and writing.
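As a rough sketch of that estimation, assuming "changes" means added plus deleted lines as reported by `git log --numstat --pretty=format:` (the talk does not say how the figure was measured), the arithmetic could look like this. The helper names `count_changed_lines` and `read_write_ratio` are hypothetical.

```ruby
# Sum added and deleted lines from the output of
# `git log --numstat --pretty=format:`
# (one "added<TAB>deleted<TAB>path" row per changed file).
def count_changed_lines(numstat_output)
  numstat_output.each_line.sum do |line|
    added, deleted, _path = line.chomp.split("\t", 3)
    # Binary files are reported as "-" and blank lines separate commits; skip both.
    next 0 unless added.to_s.match?(/\A\d+\z/) && deleted.to_s.match?(/\A\d+\z/)

    added.to_i + deleted.to_i
  end
end

# Ratio between everything that was changed and what remains in the end.
def read_write_ratio(changed_lines, final_lines)
  changed_lines.to_f / final_lines
end

sample = "10\t2\tlib/a.rb\n\n-\t-\tlogo.png\n3\t0\tREADME.md\n"
puts count_changed_lines(sample)         # 15
puts read_write_ratio(600_000, 15_000)   # 40.0
```

With the rounded figures quoted in the talk the ratio comes out at about 40, in the same range as the 43 mentioned above.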
00:02:04.080
The bottom line is that it is widely accepted that about 85 percent of development activities consist of reading. If you're new to a project and need to discover it, that can increase to around 95 percent.
00:02:15.560
Imagine you have a typical workday of 8 hours. That means you have barely 30 minutes to achieve anything of substance during the day. Thirty minutes is hardly enough.
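The workday arithmetic can be checked in a couple of lines. This is only a sketch; `writing_minutes` is a hypothetical helper, and the 85 and 95 percent shares are the ones quoted in the talk.

```ruby
# Minutes left for writing when a given share of the day goes to reading.
def writing_minutes(hours_per_day, reading_share)
  (hours_per_day * 60 * (1 - reading_share)).round
end

puts writing_minutes(8, 0.85) # 72
puts writing_minutes(8, 0.95) # 24
```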
00:02:26.799
If you want to achieve something within that time, you need to quickly find what you're looking for.
00:02:33.060
You need to learn how to deep dive into the project.
00:02:41.400
Once you've found what you're looking for, it's crucial to use that information to fix a bug or implement the feature you want.
00:02:47.700
I see this process a bit like scuba diving; you don't want to dive without any planning or clear destination, as that's the best way to get lost.
00:02:53.700
In scuba diving, lack of preparation can lead to severe issues, and in the case of coding, it can hinder your productivity.
00:03:01.140
From the simplest to the most complex dives, you prepare for what you are going to do. It's the same with code.
00:03:08.040
If you wander in the codebase without a clear destination, you will waste time, which is the best way not to ship anything on day one.
00:03:15.060
You don’t need to read the entire codebase at once; you can do it step by step while simultaneously writing code, which is really valuable.
00:03:22.920
So for me, it's really a journey, and you need to plan it. I'll present an approach I use every time I need to discover a codebase, which is really about reading the code.
00:03:30.120
At the same time, you should strive to achieve something like fixing a bug or implementing a feature, because at the end of the day, that's what we are paid to do.
00:03:38.520
The plan is very basic: first, we will define a reasonable goal. Then we will get a good map to know where to go.
00:03:45.600
Next, we will choose the right equipment in order to dive into the codebase effectively and achieve what we want to do.
00:03:52.760
Finally, we will iterate and do it again.
00:03:58.920
First, you need a good goal—something reasonable that you can plan for and that motivates you to dive into the codebase.
00:04:06.000
This could be fixing a bug, implementing a feature, or writing some documentation—something that you really want to do.
00:04:13.200
For the next example, I will describe my experience from two years ago. I was browsing Stack Overflow and found a question about the OpenURI module.
00:04:21.000
Before Ruby 2.0, if you passed nil as the proxy option when opening a URI, the environment variable for the proxy was ignored.
00:04:30.000
However, in Ruby 2.0, it appeared that this was no longer the case, which surprised me since I believed it worked as expected.
00:04:38.640
The first thing I did was attempt to reproduce the problem. I wrote a small, straightforward script to open the URI for Google, passing nil as the proxy option.
00:04:47.100
It should have worked, and the environment variable HTTP_PROXY should have been ignored, but I encountered this error.
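A reproduction script along those lines might look like the sketch below. It is not the script from the talk: `open_without_proxy` is a hypothetical helper, and it uses `URI.open`, the current spelling, where the Ruby 2.0 era code would have called plain `open`.

```ruby
require "open-uri"

# Hypothetical reproduction helper: open a URL while explicitly asking
# open-uri not to use any proxy, and return the HTTP status.
def open_without_proxy(url)
  # proxy: nil is meant to disable proxying, including the proxy configured
  # through the http_proxy / HTTP_PROXY environment variable.
  URI.open(url, proxy: nil) { |io| io.status }
end
```

To reproduce the situation described here, you would point the proxy environment variable at an address where no proxy is listening and then call `open_without_proxy("http://www.google.com")`. If the option is honoured, the request goes out directly; if not, it fails while trying to reach the proxy.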
00:04:55.200
Clearly, it was broken, so for me, fixing that bug became a reasonable goal to explore the MRI codebase.
00:05:01.500
Having a goal is not enough in itself; you also need a map. Without a map, you're going nowhere because you won't know how to reach your goal.
00:05:08.640
It's essential to know where the project and codebase are located and where the introductory documentation is, such as the README file.
00:05:16.920
This documentation will inform you how to use the software, what it can do for you, how to install it, and where to find the contributors.
00:05:23.280
In my case, it was relatively simple; I found the project on GitHub, along with the issue tracker, which is still a Redmine one.
00:05:29.520
The main repository is still on Subversion. Having a map is great, but you still need to read it.
00:05:35.520
You require a legend, something that will help you understand the code.
00:05:41.760
The first thing you should look for is the contribution guidelines. This document will teach you how to contribute to the project.
00:05:49.440
It will inform you about which code conventions to follow, how to report a bug, and how to submit a patch.
00:05:56.040
If you need to write tests, that's also included in the contribution guidelines. Additionally, there is a less technical guideline called the code of conduct.
00:06:03.480
It reminds you about how to behave when contributing. Both guidelines are essential; not following them is considered rude.
00:06:09.360
If you're unsure about something, just ask. People will help.
00:06:17.730
In my case, the contribution guidelines were located in the repository. I found two important links—one to the issue tracker and another to the wiki.
00:06:26.520
I read various pages on both the issue tracker and the wiki, where I learned about the coding conventions for MRI, how to report a bug, and how to submit a patch.
00:06:33.120
Next, you need to understand the hierarchy of the directories—basically, where to find the source code, the documentation, and the tests.
00:06:40.140
You don't need to be exhaustive in this; you don't need to understand the entire hierarchy.
00:06:46.620
Just focus on what you need to find according to your goal.
00:06:54.420
For the OpenURI example, I was looking for the files containing the OpenURI module, so I searched for that.
00:07:02.520
I found the directory where the relevant code is stored, along with the tests and the documentation.
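That kind of search can be done with any grep-like tool. As a minimal sketch in Ruby, with a hypothetical helper name and the assumption that a plain substring match is enough:

```ruby
# Return the files under root whose source contains the given text.
def files_mentioning(root, text, pattern: "**/*.rb")
  Dir.glob(File.join(root, pattern)).sort.select do |path|
    File.file?(path) && File.read(path).include?(text)
  end
end

# Example call, run from the root of a checkout:
#   files_mentioning("lib", "module OpenURI")
```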
00:07:10.320
Sometimes, it can be challenging to find all this information, so don't hesitate to ask people who know.
00:07:17.520
They can guide you and provide a quick overview of the project.
00:07:24.000
For open-source software, resources like mailing lists or issue trackers can be helpful, but at work, your colleagues will be your best resource.
00:07:31.680
Just ask them during a coffee break, or maybe send an email to set up a good time to discuss.
00:07:39.960
This tip applies less to open-source projects but definitely pertains to your job.
00:07:46.680
In development meetings, take the opportunity to listen and learn.
00:07:55.020
Understand the current problems, learn how people work, and ask questions if you need help.
00:08:02.040
It’s also a good time to discuss what you can do if you're unsure.
00:08:10.680
If you find yourself stuck or blocked, don't hesitate to ask for assistance.
00:08:18.600
People are generally willing to help because eventually, you'll be contributing back.
00:08:26.040
It's important for them to help you, as you'll then be in a position to help them.
00:08:31.680
Please remember the importance of the information available to you. Someone has taken the time to create guides like contributing guidelines and README files.
00:08:39.840
It’s considerate to cultivate the habit of documenting information in return, whether it's an open-source project or not.
00:08:47.220
Next, when it comes to the codebase, reading the documentation will help you learn more about where to start in the project.
00:08:54.840
Having a good understanding will make it easier to navigate.
00:09:03.720
Of course, you'll need the right equipment to read and write code.
00:09:11.040
Make sure you have a suitable editor. I won’t spend much time on that topic.
00:09:17.520
If you're overwhelmed by too many options, stick with what works best for you—be it Emacs, Vim, or any other.
00:09:24.000
But I find that simpler editors can help you navigate your code more effectively, so choose wisely.
00:09:31.560
Also, prepare your environment before diving in; discover what dependencies you need.
00:09:37.320
Do you need a database? Any gems? For Ruby projects, especially Rails ones, the dependencies also give you a better understanding of the architecture.
00:09:44.040
If you have a Gemfile, use Bundler to get an overview of all dependencies.
00:09:51.840
For instance, for GitLab, if you run 'bundle viz', you will see a whole graph of dependencies.
00:09:59.520
You can zoom in on that graph to identify the different dependencies.
00:10:06.000
Sometimes, it's helpful to know what your project relies on.
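If you would rather inspect the dependencies as data than as a picture, the resolved graph can also be read from Gemfile.lock. The following is a deliberately simplistic sketch that relies only on the indentation of the `specs:` section; `dependency_graph` is a hypothetical helper, the lockfile excerpt is illustrative, and Bundler's own parser is the more robust route for real use.

```ruby
# Build { "gem" => ["dependency", ...] } from the text of a Gemfile.lock.
def dependency_graph(lockfile_text)
  graph = {}
  current = nil
  in_specs = false

  lockfile_text.each_line do |line|
    case line
    when /\A  specs:\s*\z/
      in_specs = true
    when /\A\S/             # a new top-level section such as PLATFORMS
      in_specs = false
    when /\A    (\S+) \(/   # four spaces: a resolved gem
      next unless in_specs
      current = Regexp.last_match(1)
      graph[current] ||= []
    when /\A      (\S+)/    # six spaces: a dependency of the current gem
      next unless in_specs && current
      graph[current] << Regexp.last_match(1)
    end
  end

  graph
end

sample = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rack (2.2.3)
      sinatra (2.1.0)
        rack (~> 2.2)
LOCK

dependency_graph(sample).each do |gem_name, deps|
  puts "#{gem_name}: #{deps.join(', ')}"
end
```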
00:10:13.080
Running the tests is likely the best way to check whether you have the minimum requirements for development in the project.
00:10:20.520
Also, check if you need continuous integration. Does the team use any tool to verify the code, like RuboCop?
00:10:27.840
In my case, I found it fairly straightforward; for me, understanding how to build a patch for MRI and how to test it was crucial.
00:10:35.280
Running the application gives you insights into what the project does and helps you understand how it works under the hood.
00:10:42.240
However, you don't need to test every feature—only those that make sense according to your goal.
00:10:50.520
For my case, I didn't test every feature of Ruby; I only focused on the one that interested me.
00:10:58.320
I wrote a small script to verify that it was indeed buggy.
00:11:05.520
Now it’s time to read the codebase and dive deeper into it. This may seem basic, but it’s important to state.
00:11:14.640
It can serve as a checklist to approach different codebases, especially if you have the opportunity to coach juniors.
00:11:20.820
There are things that newcomers are often unaware of, and it’s essential to share these insights.
00:11:28.080
Thanks to all the information you've collected, you should now know more about the codebase, but you might still wonder what you should read.
00:11:36.840
If I take the OpenURI module as an example and list the number of lines per file, you'll find there are more than 3,000 lines.
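Counting lines per file is a one-liner with `wc -l`. A Ruby version, sketched here with a hypothetical helper name, could be:

```ruby
# Map each matching file under root to its number of lines.
def lines_per_file(root, pattern: "**/*.rb")
  Dir.glob(File.join(root, pattern)).sort
     .select { |path| File.file?(path) }
     .map { |path| [path, File.foreach(path).count] }
     .to_h
end

# Example call, run from the root of a checkout:
#   counts = lines_per_file("lib")
#   counts.each { |path, lines| puts "#{lines}\t#{path}" }
#   puts "total: #{counts.values.sum}"
```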
00:11:44.880
Here, you have two options: you can read everything or focus on what matters to you. I prefer the second option.
00:11:53.520
First, you need to identify what you’re looking for. In my case, I needed to find where proxy management is implemented in the code.
00:12:00.780
I needed to understand how it interacts with other parts of the code, especially when opening an HTTP resource. This is crucial to addressing the bug.
00:12:08.220
The best way to discover what you need to read is to get deeper into the new codebase.
00:12:16.320
When a bug occurs, the first thing to do is start with the call stack.
00:12:23.520
However, remember that the cause of the bug is usually executed beforehand, so you may not find the solution just in the stack trace.
00:12:30.780
For my case, a quick look at the code revealed that it was trying to open the proxy variable, but it found no proxy and crashed.
00:12:37.920
My issue, however, lay in how the option was being managed, which wasn't included in the call stack.
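Here is a small illustration of why the stack trace alone can mislead. It is an invented three-method flow, not the OpenURI code: the option is lost in one method that has already returned by the time another method raises, so the culprit never shows up in the backtrace.

```ruby
# Hypothetical flow: the option is dropped in one method,
# but the exception is raised later in another.
def normalize_options(options)
  options.reject { |key, _| key == :proxy } # the real mistake happens here
end

def connect(options)
  raise ArgumentError, "no proxy setting given" unless options.key?(:proxy)

  :connected
end

def open_resource(options)
  normalized = normalize_options(options) # has already returned when connect raises
  connect(normalized)
end

# Return the method names found in the backtrace of the failure.
def failing_frames
  open_resource(proxy: nil)
  []
rescue ArgumentError => e
  e.backtrace_locations.map(&:base_label)
end

puts failing_frames.inspect
```

The printed frames contain `connect` and `open_resource`, but not `normalize_options`, which is where the fix belongs.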
00:12:45.300
A better option is to get the whole overview of your use case, and for that, I utilized the TracePoint API.
00:12:51.720
This has been available since Ruby 2.0 and is quite convenient.
00:12:59.940
I will explain its different parts.
00:13:06.360
First, you have a variable, scope, which I used to keep track of the different scopes in the various calls. Next, we create a TracePoint.
00:13:13.920
This will track two types of events: first, call events, which occur every time a method is called, and second, line events, which occur each time a line of code is executed.
00:13:20.520
You can pass a block to it, inside which you define how you want to handle the different events.
00:13:27.300
For call events, I printed information about which method is called and in which class or module.
00:13:32.480
As for line events, that's where I saved the variable scope of the call.
00:13:39.120
This is crucial because the line event occurs before the call event, allowing me to keep track of the scope.
00:13:46.680
In the block, you can access the line number, file, method, class, and more.
00:13:53.040
Finally, you can enable the tracing around a specific block of code.
00:14:01.200
In my case, this block would be the code I want to track. This allows me to see the call graph for the open method.
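The slide itself is not part of this transcript, so the following is a reconstruction from the description above rather than the original code. It assumes the scope is tracked as the class of the current receiver; `trace_calls` is a hypothetical helper and `Greeter` is a stand-in for the code under investigation.

```ruby
# Hypothetical helper: record every Ruby-level method call made inside the block.
def trace_calls
  calls = []
  scope = nil # class of the object whose code ran last, i.e. the caller

  tracer = TracePoint.new(:call, :line) do |tp|
    case tp.event
    when :line
      # A :line event fires before the :call it leads to, so remember where we are.
      scope = tp.self.class
    when :call
      calls << { from: scope, to: "#{tp.defined_class}##{tp.method_id}",
                 path: tp.path, line: tp.lineno }
    end
  end

  tracer.enable { yield } # trace only while the block runs
  calls
end

# Tiny stand-in for the code under investigation.
class Greeter
  def greet(name)
    format_name(name)
  end

  def format_name(name)
    "Hello, #{name}"
  end
end

trace_calls { Greeter.new.greet("Ruby") }.each do |call|
  puts "#{call[:from]} -> #{call[:to]} (#{call[:path]}:#{call[:line]})"
end
```

Only methods written in Ruby produce `:call` events; methods implemented in C, such as `Class#new`, would need `:c_call` as well. For the OpenURI case the block would wrap the call that opens the URL instead of the `Greeter` example.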
00:14:09.120
I wrapped everything into a gem called trace_calls, available on RubyGems, and you can find it on GitHub. I'll share the URL at the end.
00:14:16.680
After requiring the gem, enable the tracing for the desired block and then print the information.
00:14:24.000
The result is more than just strings; you get the execution graph, allowing you to navigate through it.
00:14:32.520
This structure is more complicated than I initially showed you, and it provides a complete overview of each method and the sequence of calls.
00:14:39.120
I did this for my use case, which was opening a URL.
00:14:46.920
Looking at the call graph, you'll first see calls to open.
00:14:54.720
Reading through it, you'll realize it involves parsing the URL string to create an instance of the URI object.
00:15:01.680
You'll see a lot of 'initialize' methods here, always in the same scope (your URI).
00:15:09.840
Next, it addresses various options for the method, managing how you handle the different parameters.
00:15:16.920
You'll then see that it opens the HTTP resource—although you won't create the HTTP client at that moment.
00:15:24.720
Now, I know where to start my reading, focusing on the relevant parts of the codebase.
00:15:32.520
By identifying these key areas, I could find the bug, which turned out to be located in a single line of code.
00:15:40.680
The line, while looking somewhat generic, was about creating the HTTP client.
00:15:47.880
When the client is created, it checks the environment variable for the proxy by default, even though nil had been passed as the proxy option. The fix was therefore to disable that default proxy detection in this case.
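The behaviour behind that line can be observed with Net::HTTP directly, assuming Net::HTTP is the HTTP client in question (the talk only says "the HTTP client"). This sketch is not the actual patch; the proxy URL and target address are placeholders, `proxy_detected?` is a hypothetical helper, and no connection is made because `Net::HTTP.new` only builds the client object.

```ruby
require "net/http"

# Placeholder proxy setting for the demonstration.
ENV["http_proxy"] = "http://proxy.invalid:3128"
ENV.delete("no_proxy")
ENV.delete("NO_PROXY")

# Hypothetical helper: build a client for a placeholder address and report
# whether it would go through a proxy. Extra arguments are passed on as the
# proxy parameters of Net::HTTP.new.
def proxy_detected?(*proxy_args)
  Net::HTTP.new("192.0.2.1", 80, *proxy_args).proxy?
end

puts proxy_detected?      # true: the proxy address defaults to :ENV
puts proxy_detected?(nil) # false: an explicit nil disables the detection
```

Leaving the proxy argument out lets the client pick up the environment variable, while passing nil explicitly turns the detection off, which is the distinction the fix depends on.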
00:15:56.160
By tracing through the code base for my use case, I learned a lot about the OpenURI library without having to read every line of code.
00:16:04.560
This offers valuable insights into what to search for next, allowing you to use commands to look for where the open method is called or where the HTTP class is defined.
00:16:12.120
As a side note, if you're working on a Ruby on Rails project or any project that has a Gemfile, you can use Bundler to open the source of a specific gem.
00:16:20.040
This technique can be useful, especially if your application relies heavily on a particular gem.
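On the command line this is presumably `bundle open` followed by the gem name, which opens the gem's directory in your editor. If you only need the path, for instance to search it, RubyGems can tell you where an installed gem lives. A minimal sketch with a hypothetical helper name:

```ruby
# Return the directory where an installed gem's source lives, or nil
# when no gem with that name is installed.
def gem_source_dir(name)
  Gem::Specification.find_by_name(name).gem_dir
rescue Gem::LoadError
  nil
end

# Example call:
#   puts gem_source_dir("rack")
```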
00:16:27.720
Another strategy for diving into a codebase is to look at the test cases.
00:16:36.240
Select a test, inspect what code was executed during that test case, and consider how to analyze that context.
00:16:43.560
Context is critical because everything you're reading needs to have meaning. You shouldn't simply read files randomly.
00:16:50.520
With context, you understand when it's called and how it reacts.
00:16:59.400
With all this knowledge, we're now able to read the codebase—focusing only on what we need.
00:17:06.840
It's now your time to modify the code and fix the bug. Fortunately, the bug was confined to just one line.
00:17:12.960
I read about 100 lines, far fewer than the 3,000 in the module, and still gained the understanding I needed.
00:17:19.680
This demonstrates an effective way to navigate through your codebase and dive into it within the right context.
00:17:26.640
Then it was time to report the bug and submit the patch.
00:17:32.760
Driven by this goal, I completed both actions in tandem.
00:17:38.760
This was my first bug report on MRI, and it took about two hours to navigate the MRI structure, read what I needed, fix the bug, and submit the patch.
00:17:45.600
I even answered the corresponding Stack Overflow question.
00:17:53.280
I use this method every time I need to explore a codebase or find a bug in unknown territory.
00:18:01.080
I make it a habit to gather information about the projects and utilize tools to guide me.
00:18:08.640
Avoid blindly reading code without a clear destination.
00:18:15.840
If you find yourself stuck, don’t hesitate to ask someone for help.
00:18:22.680
Doing it once makes it easier to do it a second time.
00:18:28.920
Now I’d like to discuss what makes a good goal.
00:18:35.880
How long should a goal last? Initially, keep it short—your main objective is to learn about the codebase through doing.
00:18:42.600
It’s essential that you have a time frame that keeps you motivated. I find half a day is an ideal period, with one day maximum.
00:18:49.440
Only when you know more about the codebase can you plan for more complex goals.
00:18:56.520
If you’re unsure what to do, check the issue tracker for a possible bug or a small feature you can tackle.
00:19:04.680
Revisit the contributing guidelines; contributors usually list numerous opportunities.
00:19:11.880
You can also use tools such as RuboCop or other static analysis resources to fix reported issues.
00:19:18.840
It's a great exercise; it feels like a game, allowing you to navigate through the codebase effectively.
00:19:24.840
Keep things simple. You know how to plan your journey, and now you’re free to explore.
00:19:32.520
Engage directly in the process of writing code or documenting—avoid reading code solely for the sake of it.
00:19:39.480
And if you feel blocked, just ask for help.
00:19:47.040
Not asking is one of the most common bad habits among juniors, as they may feel intimidated or unsure.
00:19:54.600
It’s important to reassure them that you're there to assist.
00:20:01.680
To recap, if you want to dive into a new codebase, aim to define a good goal, such as fixing a small bug.
00:20:08.280
Get a map to help locate what you need, then equip yourself properly to immerse yourself in the codebase and pursue your goal.
00:20:15.840
And remember, once you’ve done it, do it again. Thank you!