Code Spelunking: teach yourself how Rails works

by Jordan Raine

In this talk presented at RailsConf 2019, Jordan Raine explores the intricacies of the Ruby on Rails framework, focusing on how developers can better understand and navigate complex codebases through a practice known as "code spelunking." Raine emphasizes that getting lost in someone else's code can often lead to deeper insights and understanding. Here are the key points covered during the session:

Understanding Developers' Time: Raine highlights research indicating developers spend approximately 82% of their time on program comprehension and navigation, with only 5% of their time devoted to editing code. This underlines the importance of effective navigation techniques.
Challenges with Rails: Raine discusses several obstacles that make Rails challenging for developers:
- Size: A new Rails application comes with a vast amount of code once dependencies are included, and the framework plus its dependencies can be larger than the application itself.
- Complexity: Rails is structured in small components, which can span many files, complicating the flow of logic.
- Dynamic Nature: Rails employs conventions and dynamic code that can be confusing if not well understood.
- Familiarity: Developers often lack familiarity with the Rails internals as they do not actively contribute to it.
Techniques for Code Spelunking: Raine presents techniques to explore and understand code effectively:
- Ask a Question: Start with a specific question that needs answering.
- Start with What You Know: Leverage existing knowledge to guide the exploration process.
- Read Relevant Code: Focus only on the code that pertains directly to answering the question.
Example Exploration: Through a practical example, Raine illustrates how to navigate a method in Rails without the documentation by using method introspection. He shows how to differentiate between similar methods (e.g., try and try!) in Rails, emphasizing the importance of understanding what each method does through the source code.
Real-World Application: Raine presents a scenario where a developer needs to track request IDs for jobs in a background processing system like Sidekiq. He details the spelunking process used to determine how to inject the request ID into job processing, showing the iterative method of following methods to gain insights.

In conclusion, Raine encourages attendees to practice code spelunking and to invest in their ability to understand unfamiliar code, asserting that small improvements in comprehension can lead to significant gains in productivity. He recommends further exploration of debugging tools like Byebug and Pry for enhanced understanding of their codebases.

00:00:20.810 Good afternoon everyone! Thank you so much for coming. I hope you're all having a really great couple of days at RailsConf so far.

00:00:26.430 I just wanted to introduce myself. I'm Jordan Raine, from Vancouver, Canada.

00:00:31.970 If you want to reach me, I'm at jranallen on most platforms. I'm a staff developer at a company called Clio.

00:00:37.290 Like I said, we're based in Vancouver, and since 2008, we've been using Rails to try to change and improve how lawyers and law firms practice law.

00:00:43.080 If you're interested in this topic, come talk to me afterwards. We are remote-friendly.

00:00:48.570 Today's talk is about how getting lost in somebody else's code is sometimes the best thing you can do.

00:01:00.390 It starts with something that I think many of us have experienced quite often.

00:01:07.740 I love to program, and if you give me a problem, I can go for hours.

00:01:15.149 But sometimes it can feel like half of our time is spent trying to figure out what to do next.

00:01:20.340 I've been reading Ruby code since the Rails 2.3 days, but before that, I spent my time writing websites in PHP.

00:01:28.920 During that time, building a website meant opening a connection to the database at the top of the file and crafting a SQL string.

00:01:37.770 Sometimes, that SQL string included user input, and then you used the result to render the page below.

00:01:46.410 There were no models at this stage, and very few classes.

00:01:53.459 This was the whole process. As odd as it seems now, adding little snippets of code into HTML files to do everything myself seemed reasonable.

00:02:05.209 Everything was laid out right in front of me, so when I became stuck, it was often easy to find a way out.

00:02:11.310 However, we know now that working like this can be limiting. The possibilities for what I can do are reduced to my ability to cram everything into a single file, and to do things correctly every time.

00:02:23.930 When it came time to build something more complex, a co-worker pointed me to a framework called Rails.

00:02:29.700 It really changed what I thought was possible. I didn't have to craft SQL by hand anymore; I could use Active Record.

00:02:37.550 I didn't need to repeatedly copy and paste my URLs everywhere; I could use URL helpers. I could organize my code into neat buckets of behavior using the Model-View-Controller architecture, making everything easier.

00:02:54.420 Rails even has generators that would write some of the code for me. However, because not everything was in front of me anymore, getting stuck now felt deeper than ever before.

00:03:18.480 When a stack trace appeared, it often included just one or two lines of my code. I didn't know how to differentiate between an application trace and a framework trace. And even if I did, I didn't know where to look for further information.

00:03:42.239 My best defense was to Google the error. If I found a fix, great; if not, I was left tweaking the code until the error disappeared.

00:03:54.750 As developers, we face problems every day, and one beautiful aspect of using Rails is that it frequently offers solutions.

00:04:07.799 However, the longer we work on an application and the more features we add, the more likely we are to come across something unusual.

00:04:12.900 It could be something that no one on the team has seen before, and you can't find any mention of it in the Rails documentation or guides.

00:04:22.140 This situation can leave you feeling trapped. During this session, I want to teach you how to navigate this predicament.

00:04:27.660 When you hit a wall, and the only way out is to dive deep within the Rails codebase, you'll know how to find the answers.

00:04:33.030 My hope is that when you sit down at your desk next week, you'll have new techniques that will help you understand how your app behaves.

00:04:38.610 You'll be capable of teaching yourself how any part of Rails works.

00:04:53.550 The talk will be divided into four parts, starting with how we spend our time.

00:05:00.479 A few years ago, a group of researchers wanted to measure how developers spend their time.

00:05:06.210 Specifically, they were interested in how much time we spend on program comprehension.

00:05:12.389 In other words, how much time do we spend trying to figure out what to do?

00:05:17.430 To gather this data, they installed software on the computers of 78 developers, tracking everything they did.

00:05:22.440 These developers worked at two companies on real projects for two weeks.

00:05:28.050 They managed to gather over 3,000 hours of data.

00:05:34.620 Once they had sufficient data, they categorized it into four groups.

00:05:41.460 The first was comprehension, defined as reading code, bug reports, and trying to understand project requirements.

00:05:50.250 This included any time spent trying things out in a browser or running tests.

00:05:55.800 The second group was navigation, defined as browsing through software.

00:06:02.789 This encompassed searching Google for answers, searching your codebase for the right method to call, or reading how something works on Stack Overflow.

00:06:09.060 The third group was editing, defined as adding, modifying, and deleting code, which is what most of us consider programming.

00:06:16.979 Finally, the fourth group was labeled 'other,' which included reading news online, shopping, or browsing social media.

00:06:23.460 For our purposes today, it makes sense to group navigation and comprehension together.

00:06:28.650 Before I show you the findings of this study, think back to your last week.

00:06:35.000 How much time did you spend stuck, reading on Stack Overflow, or searching Google?

00:06:41.550 And how much time did you actually spend writing code?

00:06:50.270 So here's what they found: on average, developers spent 82% of their time on program comprehension and navigation.

00:06:56.270 Yet, they only spent about 5% of their time editing code.

00:07:07.060 This indicates that we need to understand what our code does.

00:07:13.370 Time spent on comprehension and navigation is definitely not wasted.

00:07:20.510 However, looking at this data, I can't help but think that the less time I spend in this area, the more time I can dedicate to writing code.

00:07:25.880 The study also found several factors that can increase comprehension time.

00:07:33.050 After reviewing these, it really boils down to two things that we can do to improve matters.

00:07:38.360 Firstly, we can write code that is easier to use and change.

00:07:45.680 This benefits future you; the more time spent today adding tests, writing documentation, and carefully naming things, the less time you'll need to spend in the future figuring it out.

00:07:52.240 Secondly, we can improve our ability to find answers.

00:08:04.210 The first aspect benefits your future self, while the second helps you understand any code you come across today.

00:08:11.890 The study wasn't all negative; it revealed that the more experience someone has, the less time they spend on program comprehension.

00:08:18.590 Researchers found that experienced developers had techniques that allowed them to quickly and easily reach understanding.

00:08:30.320 It wasn't that they knew the answers; they just knew how to find them.

00:08:36.560 If we want to improve how we understand code, we need to recognize what we're up against.

00:08:43.070 This brings us back to Rails. What makes Rails difficult to understand, and what can we do to prepare for it?

00:08:53.780 Firstly, Rails is big. A new Rails 6 app comes with 77 gems, 12 of which are from the Rails repo, containing 75,000 lines of Ruby code.

00:09:07.779 When you include those 77 gems, it brings the total to over a quarter million lines of Ruby code.

00:09:16.700 For many Rails applications, the Rails framework and all its dependencies can be larger than the app itself.

00:09:23.300 Secondly, Rails is complex. While it's structured in neat buckets of behavior, consisting of small components that work together, this can be overwhelming.

00:09:37.550 For instance, if you call 'save' on an Active Record object, the process initiates several method calls across different files.

00:09:43.790 For example, the behavior for 'save' is split across four different files, making it difficult to trace without understanding the order of calls.

00:09:49.820 Finally, Rails is dynamic, or you might call it 'magic.' While it simplifies our lives, it often involves code that can be difficult to grasp.

00:10:02.240 You might come across methods whose names change depending on the value of certain variables.

00:10:11.130 Or you may encounter code that dynamically evaluates strings as Ruby code. These patterns can lead to confusion.
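The two dynamic patterns just described can be sketched in plain Ruby. This is a minimal illustration, not code from Rails: the Invoice class, its statuses, and the generated method names are all hypothetical.

```ruby
# Hypothetical class, not Rails code: it only mimics two dynamic patterns.
class Invoice
  STATUSES = %w[draft sent paid].freeze

  attr_reader :status

  def initialize(status)
    @status = status
  end

  STATUSES.each do |name|
    # Pattern 1: the method name is built from a variable, so searching the
    # codebase for "def paid?" finds nothing.
    define_method("#{name}?") { status == name }

    # Pattern 2: a string is evaluated as Ruby code. Passing __FILE__ and
    # __LINE__ keeps source_location pointing at this file.
    class_eval <<~RUBY, __FILE__, __LINE__ + 1
      def mark_#{name}!
        @status = "#{name}"
      end
    RUBY
  end
end

invoice = Invoice.new("draft")
puts invoice.draft?   # true
invoice.mark_paid!
puts invoice.paid?    # true
```

Because neither 'paid?' nor 'mark_paid!' appears literally in the file, a text search will miss them, which is one reason the introspection techniques later in the talk are useful.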

00:10:22.220 Lastly, remember that Rails is not your codebase. Think back to how long it took you to get comfortable with the codebase at your current job. Getting to know Rails can take even longer, as most of us aren't constantly opening pull requests on Rails.

00:10:49.120 Since we're not immersed in the Rails codebase day in and day out, we don't easily gain an understanding of its inner workings.

00:10:54.310 This vast size and complexity is where code spelunking comes in.

00:11:03.970 Code spelunking provides the ability to confidently explore an unfamiliar codebase, diving into tight corners and following twists and turns to find answers.

00:11:11.300 This could mean exploring Rails, another gem, or the codebase at your job.

00:11:18.610 When I go spelunking, I like to follow these three guidelines: ask a question, start with what you know, and only read code relevant to your answer.

00:11:26.650 Let’s try this on a simple question: what's the difference between 'try' and 'try!'?

00:11:35.180 A quick review: if you call 'length' on a string, it returns the number of characters in that string. If you call 'length' on nil, it raises an exception.

00:11:41.920 When you're unsure if something is nil, you can use 'try.' If you call 'try(:length)' on a string, you will get a number.

00:11:48.390 However, calling 'try(:length)' on nil will return nil, avoiding the exception.

00:11:57.660 Now, what about 'try!'? It looks similar; 'try!' on a string returns a number, while 'try!' on nil returns nil.

00:12:02.250 At first glance, they seem identical; so, why both methods?

00:12:10.630 If your first instinct is to look this up in the Rails docs, that's good.

00:12:19.670 However, let's pretend that you don't have access to the Rails documentation.

00:12:28.660 This presents a great opportunity to use method introspection.

00:12:38.860 If you're unfamiliar with method introspection, it's a very useful feature of Ruby.

00:12:43.390 It enables you to ask objects what methods they respond to and discover various attributes.

00:12:48.920 Let's look at an example. Imagine this class for a dog.

00:12:54.200 If we instantiate a new dog instance and ask for all its methods,

00:12:58.610 we would receive an overwhelming list of method names.

00:13:02.870 This can feel a bit overwhelming, as Rails and Ruby include a lot of methods by default.

00:13:10.890 To clarify this, we can take all the methods from Object.new and subtract them from the dog's methods, leaving us with only the methods we've implemented.

00:13:20.140 With method introspection, you can search through available methods, seeking something that includes a specific term.

00:13:27.290 We can request a specific method and inquire where it was defined, revealing its path and line number.

00:13:37.660 This can lead you directly to the source you're interested in.
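The introspection steps described above can be sketched as follows. The Dog class here is a hypothetical stand-in for the one on the slides; everything else is standard Ruby.

```ruby
# Hypothetical stand-in for the Dog class shown on the slides.
class Dog
  def bark
    "Woof!"
  end

  def fetch(item)
    "Fetched the #{item}"
  end
end

dog = Dog.new

# Every method the object responds to, including those inherited from Object.
all_methods = dog.methods

# Subtract the methods every plain object has to leave only Dog's own.
own_methods = (dog.methods - Object.new.methods).sort

# Search the method list for names containing a term.
matches = dog.methods.grep(/fet/)

# Ask a specific method where it was defined: [path, line number].
file, line = dog.method(:bark).source_location

puts "all: #{all_methods.size} methods"
puts "own: #{own_methods.inspect}"
puts "matching /fet/: #{matches.inspect}"
puts "bark defined at #{file}:#{line}"
```

The same calls work on any object in a Rails console, which makes them a convenient first step before opening a gem's source.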

00:13:42.990 Everything up to this point is standard Ruby, but to dig deeper, we can use a gem called method_source.

00:13:50.920 In Rails, this gem is typically included by default, but you can add it to any project if needed.

00:13:56.780 Once you call 'source' on a method object, you get back a string with the source code.

00:14:06.660 Utilizing the `.display` method will present the string to standard output.
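A minimal sketch of that step, assuming the method_source gem is installed. The Dog class is hypothetical, and the guards around the require and the lookup are additions for this illustration so the snippet still runs where the gem or the source file is unavailable.

```ruby
# method_source is a third-party gem; a default Rails app already bundles it.
# The guard lets this sketch run (and print nothing) where the gem is missing.
begin
  require "method_source"
rescue LoadError
  warn "method_source is not installed; try `gem install method_source`"
end

# Hypothetical class used only for this illustration.
class Dog
  def bark
    "Woof!"
  end
end

bark = Dog.new.method(:bark)

if bark.respond_to?(:source)
  begin
    # The gem adds Method#source, which returns the definition as a String;
    # Object#display then writes that string to standard output.
    bark.source.display
  rescue MethodSource::SourceNotFoundError => e
    # Raised when the file behind source_location cannot be read,
    # for example when the code was evaluated from a string.
    warn e.message
  end
end
```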

00:14:12.620 This method works for our code as well as for gems.

00:14:17.610 If you've ever wondered how something works in Rails, you can simply take a peek at the source.

00:14:24.680 Often, this reveals learning opportunities.

00:14:33.480 For example, while preparing for this talk, I discovered that 'create' also takes an array of attributes.

00:14:41.820 Returning to Active Record's 'save' method, we can explore its different implementations.

00:14:48.640 We find several definitions, including one from Active Record's Persistence module.

00:14:59.640 Tracing through these implementations gives us the order in which they execute at runtime.
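One way to trace layered definitions like this in plain Ruby is Method#super_method, which returns the next implementation that 'super' would reach. The sketch below uses hypothetical stand-in modules rather than Active Record itself, so the names and return values are assumptions for illustration only.

```ruby
# Hypothetical stand-ins for modules that each contribute to `save`.
module SketchPersistence
  def save
    "persisted"
  end
end

module SketchValidations
  def save
    "validated -> #{super}"
  end
end

module SketchTransactions
  def save
    "transaction -> #{super}"
  end
end

class SketchRecord
  include SketchPersistence
  include SketchValidations
  include SketchTransactions
end

# Walk the chain of implementations in the order they run.
chain = []
current = SketchRecord.new.method(:save)
while current
  chain << current.owner
  current = current.super_method # nil once there is no further `super`
end

puts chain.inspect          # [SketchTransactions, SketchValidations, SketchPersistence]
puts SketchRecord.new.save  # transaction -> validated -> persisted
```

The module included last sits first in the lookup order, which is why its 'save' runs first and the others are reached through 'super'.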

00:15:09.230 It’s important to understand that sometimes you'll encounter 'source not found' exceptions.

00:15:14.790 This is because parts of Ruby are written in C, so their source code may not be available.

00:15:22.470 However, for most cases, this isn't problematic since the majority of Rails is Ruby.
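The reason behind those errors can be seen with standard Ruby alone: a method implemented in C has no source location to report. A small sketch, where the Greeter class and the describe lambda are hypothetical helpers for this illustration:

```ruby
# Hypothetical class with a method defined in Ruby.
class Greeter
  def shout(text)
    text.upcase
  end
end

# Report where a method lives, or explain why no source can be shown.
describe = lambda do |meth|
  location = meth.source_location
  if location
    "#{meth.name}: defined at #{location.join(':')}"
  else
    "#{meth.name}: no Ruby source (likely implemented in C)"
  end
end

# String#length is implemented in C, so source_location is nil.
c_method = "hello".method(:length)

# Greeter#shout is plain Ruby, so it reports [path, line].
ruby_method = Greeter.new.method(:shout)

puts describe.call(c_method)
puts describe.call(ruby_method)
```

Checking 'source_location' first is a quick way to tell whether a source lookup is worth attempting.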

00:15:34.240 Going back to our earlier example, what's the difference between 'try' and 'try!'?

00:15:41.920 First, let's examine the relevant code. For 'try,' check the first line and eliminate any code that doesn't pertain to our inquiry.

00:15:51.880 We see the method name, nil, and a block given. Since we didn't provide a block, that line is irrelevant.

00:15:58.870 After cleaning it up, we're left with one line: if 'respond_to?(method)', then 'public_send(method)', which is straightforward.

00:16:06.320 Next, let's reflect on 'try' when called on nil.

00:16:10.740 Not all of Rails is complicated! For any value given, it returns nil.

00:16:17.480 Now, comparing 'try!' shows similarities; we only need to investigate the relevant code.

00:16:24.300 After trimming it down, we have a similar structure again. Both methods appear alike.

00:16:33.780 When comparing the two, the critical difference is that 'try!' always calls the method.

00:16:45.040 To illustrate this, let’s return to the console. If we input a method name that the string doesn't recognize, 'try' will yield nil.

00:16:52.010 Conversely, 'try!' will raise an error.

00:16:59.610 Thus, we have our answer: 'try!' is stricter than 'try.'
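The difference can be reproduced without Rails using simplified stand-ins. This is a sketch of the trimmed-down logic described above, not Active Support's actual implementation; the 'sketch_' prefix marks the methods as hypothetical, and block handling is left out.

```ruby
# Simplified stand-ins for Active Support's try / try!, written only to
# illustrate the difference. The sketch_ prefix marks them as hypothetical.
class Object
  def sketch_try(method_name, *args)
    # Lenient: only call the method if the receiver responds to it.
    public_send(method_name, *args) if respond_to?(method_name)
  end

  def sketch_try!(method_name, *args)
    # Strict: always call the method, so unknown names raise NoMethodError.
    public_send(method_name, *args)
  end
end

class NilClass
  # On nil, both variants ignore their arguments and return nil.
  def sketch_try(*)
    nil
  end

  def sketch_try!(*)
    nil
  end
end

puts "hello".sketch_try(:length)         # 5
puts "hello".sketch_try!(:length)        # 5
puts nil.sketch_try(:length).inspect     # nil
puts nil.sketch_try!(:length).inspect    # nil
puts "hello".sketch_try(:bogus).inspect  # nil

strict_result =
  begin
    "hello".sketch_try!(:bogus)
  rescue NoMethodError => e
    e.class
  end
puts strict_result                       # NoMethodError
```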

00:17:07.510 As we previously discussed, we asked a question, looked at a couple of known methods,

00:17:13.400 and skimmed through around 20 lines of code to capture our answer.

00:17:19.310 That was a relatively simple question which we could likely have answered from the Rails Docs.

00:17:25.000 But let's apply this to a more compelling situation. Imagine being at work next week when your manager approaches you.

00:17:38.540 They mention a need to track job requests and queues. Lately, they've noticed many failed jobs in the background and are struggling to identify the users who triggered them.

00:17:46.580 So, they request that you add the request ID when jobs fail.

00:17:55.000 Your familiarity with Active Job has led you to believe that calling 'perform_later' kicks off some magic.

00:18:02.430 Thus, let’s explore the code further. We have the ApplicationController with a 'before_action' taking the request ID and assigning it to current_request_id.

00:18:10.580 In the controller action, we'll queue a job.

00:18:18.370 When that job runs, we want the request ID accessible to us.

00:18:27.000 With some groundwork already laid in ApplicationJob, we have a request ID attribute.

00:18:35.000 In the job we're queuing, we print this out to the console to test our behavior.

00:18:50.170 If we set the request ID and queue the job, we verify through Sidekiq that it works.

00:18:57.900 So, how can we add the request ID to every job? Let's start with what we know and examine the 'perform_later' source code.

00:19:05.740 This code is simple: it calls 'job_or_instantiate,' then calls 'enqueue' on the result.

00:19:15.660 Next, we want to delve into 'enqueue,' but we need the return value of the previous call.

00:19:23.450 We can do this by calling 'job_or_instantiate' on our job and grabbing the 'enqueue' method.

00:19:32.020 However, when we do this, we encounter an error, as 'job_or_instantiate' is a private method.

00:19:39.080 In production, it's best to avoid calling private methods, but while spelunking, breaking this rule is sometimes necessary.

00:19:48.160 So, let's use 'send' to get around that and obtain the 'enqueue' method.
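The same move can be tried on a small stand-in class. SketchJob below is hypothetical and only mirrors the shape described in the talk (a public 'perform_later' that delegates to a private class method); it is not Active Job's code.

```ruby
# Hypothetical stand-in that mirrors the shape described in the talk.
class SketchJob
  def self.perform_later(*args)
    job_or_instantiate(*args).enqueue
  end

  def self.job_or_instantiate(*args)
    new(*args)
  end
  private_class_method :job_or_instantiate

  def initialize(*args)
    @args = args
  end

  def enqueue
    "enqueued with #{@args.inspect}"
  end
end

# Calling the private method directly fails.
error_message =
  begin
    SketchJob.job_or_instantiate(42)
  rescue NoMethodError => e
    e.message
  end
puts error_message

# send bypasses visibility checks, which is acceptable while exploring.
job = SketchJob.send(:job_or_instantiate, 42)

# With the return value in hand, grab the next method to inspect.
enqueue_method = job.method(:enqueue)
puts enqueue_method.owner                    # SketchJob
puts enqueue_method.source_location.inspect  # [path, line]
```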

00:19:58.240 This gives us more code than we've seen thus far, and it could be overwhelming.

00:20:05.020 I like to squint a little and identify natural breaks in the code.

00:20:12.670 In this case, I see three segments.

00:20:19.190 The first sets variables based on the provided options. Since we haven't provided any options, we can ignore this.

00:20:27.300 The second segment reveals a few methods of interest, so I’ll keep this to examine further.

00:20:36.130 The final section is a large conditional statement. Depending on our success, different values are returned.

00:20:43.780 However, we don't concern ourselves with the return value.

00:20:50.480 After reviewing what we've gathered, we realize there is a call to 'queue_adapter,' on which 'enqueue' is called in turn.

00:20:57.850 So, let's look up 'queue_adapter,' which can be accessed from our job.

00:21:05.520 Let’s examine the source code for an understanding of how it queues jobs.

00:21:10.310 Again, I observe three sections.

00:21:16.140 The first section is the 'enqueue' method, whose role is to push a job into Sidekiq.

00:21:22.640 The second section briefly addresses time-stamping, but scheduling is not our concern here.

00:21:28.950 The final segment features a Sidekiq job with a one-line perform method.

00:21:36.400 This method retrieves job data, performs operations, and forwards that along to 'Base.execute.'

00:21:44.090 Let's place a breakpoint here, using the included 'byebug' debugger.

00:21:51.360 After restarting Sidekiq, we queue the job, and it hits the breakpoint.

00:21:59.760 This behaves similarly to a Rails console, and we can analyze the job's data.

00:22:08.710 We find that the data is a serialized hash with our request ID included.

00:22:19.580 Let's descend into 'Base.execute' to see what it entails.

00:22:26.640 It runs a series of callbacks and calls 'deserialize' on the job data.

00:22:34.940 Exploring 'deserialize,' we can identify where job attributes and global values are assigned.

00:22:41.640 It would be beneficial to add the request ID.

00:22:48.670 Let's take the same approach we used for 'serialize' to extend its functionality.

00:22:55.600 Calling 'super' retains all existing behavior, allowing us to store the request ID.

00:23:03.470 Going back to the console, we set the request ID, call 'perform_later,' and check in Sidekiq.

00:23:09.370 Joyfully, it works! We've successfully added the request ID to every job.

00:23:16.640 We modified serialization and deserialization processes to facilitate this.
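The override pattern can be sketched without Rails. SketchBaseJob below is a hypothetical stand-in for ActiveJob::Base, reduced to 'serialize' and 'deserialize'; the payload keys and the 'request_id' accessor are assumptions for illustration, and the slides may have differed in detail.

```ruby
# Hypothetical stand-in for ActiveJob::Base, reduced to the two hooks involved.
class SketchBaseJob
  attr_accessor :job_id

  def serialize
    { "job_class" => self.class.name, "job_id" => job_id }
  end

  def deserialize(job_data)
    self.job_id = job_data["job_id"]
  end
end

# Stand-in for ApplicationJob: extend both hooks and keep existing behavior.
class SketchApplicationJob < SketchBaseJob
  attr_accessor :request_id

  def serialize
    # super keeps the existing payload; merge adds our extra key.
    super.merge("request_id" => request_id)
  end

  def deserialize(job_data)
    # super restores the built-in attributes before we read our own key.
    super
    self.request_id = job_data["request_id"]
  end
end

# Enqueue side: the request ID travels inside the serialized payload.
outgoing = SketchApplicationJob.new
outgoing.job_id = "job-1"
outgoing.request_id = "req-123"
payload = outgoing.serialize

# Worker side: a fresh job instance recovers it from the payload.
incoming = SketchApplicationJob.new
incoming.deserialize(payload)
puts incoming.request_id # req-123
```

In a real application the two overrides would sit in ApplicationJob, so every job inheriting from it carries the request ID without further changes.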

00:23:25.940 To summarize, we tackled a question, examined a method we were familiar with,

00:23:32.150 and skimmed numerous lines of code to find our answer.

00:23:42.010 We've reached the end of our spelunking journey, and I hope you've enjoyed your exploration.

00:23:52.000 Your participation has been exhilarating!

00:23:59.100 The techniques discussed today only scratch the surface; I encourage you to practice them further.

00:24:08.210 If you're interested in further learning, explore the 'byebug' debugger.

00:24:15.410 This tool allows you to step through code and dive into methods. Pry is also worth a look; it can replace the default console in your Rails app.

00:24:21.540 It also enhances Ruby spelunking with features like method introspection, syntax highlighting, and command history.

00:24:31.840 Furthermore, you can delve into C code if desired, and I recently discovered that you can inspect where objects have been monkey patched.

00:24:41.670 Navigating others' code can be confusing and frustrating, but I assure you it's not going away.

00:24:49.900 The good news is that there are skills you can learn and practice.

00:24:57.550 These will allow you to confidently explore any unfamiliar code.

00:25:05.280 Investing in your ability to understand code can yield significant results.

00:25:13.650 Remember, you don't need to know all the answers; you just need to know where to find them.

00:25:20.620 Thank you!