00:00:17.810
Good afternoon, everyone! Thank you for hanging tough until the last session before the closing keynote. My name is Jason Clark, and I work as an engineer at New Relic. This background informs a little bit of what we're going to talk about today.
00:00:25.230
I'm going to start off with a couple of different examples that you may be able to relate to. The first one is: has anyone here ever had a Ruby program that's running but stopped? It's still there but not doing what it's supposed to do; it’s deadlocked in some fashion or hung up somewhere, and you don't really know what's going on.
00:00:43.710
You look at the process, and while it is there, it's hardly turning in any CPU cycles. There's just no detectable activity. You check the log file, but it has nothing. Imagine you're in this scenario, and just to up the ante a little: perhaps your co-workers have been talking about adding some multithreading support to this service. They believe it will make it much faster if we put some threads in there.
00:01:14.670
Suddenly, you have this deep and abiding dread because you think you might have a deadlock. Your Ruby process is stuck, and you have no idea why, nor do you know where in the code this is happening. That's the sort of situation where GDB can come to your rescue.
00:01:35.190
GDB stands for the GNU Debugger. It is actually a very old tool; the first version was released in 1986. However, it is a C-level debugger, among other things, with lots of extensions. It allows you to dig into the native layer of code beneath where your Ruby is executing. We’re at RubyConf—why am I talking about a C debugger? Why do we care about this?
00:01:59.880
The majority of us use MRI, which is the reference implementation for Ruby, also known as Matz's Ruby. The truth is that Mats actually prefers we call it CRuby. The Ruby VM that executes your code is written in C, and GDB, as a tool, has visibility into that layer. It lets us see what's happening underneath as we try to solve these problems.
00:02:27.890
This talk is intended as a gentle introduction; you're not expected to know C. I will step you through how you can use GDB as a tool to understand what's happening in these under-the-hood processes.
00:02:52.820
With our stuck process, there are a couple of conventions we will follow during this talk. When there's a dollar sign at the beginning of a line, it indicates a command we will issue at our terminal. At our terminal, we're going to say 'gdb -p' followed by the process ID of the process that we care about.
00:03:05.980
GDB is going to throw out a fair amount of output. It will inform us that it's attaching to the process, and you may see it spit out information about symbols it's loading. Eventually, you'll arrive at a prompt within GDB where you can issue commands. GDB has taken the process that we were running, paused its execution, and is now asking us, 'What do you want to do?' The first thing you might want to know is where exactly the execution is paused.
00:03:50.420
So, the natural command to issue is 'bt' for backtrace. When you do this, the output may be much more extensive than I've shown here; GDB can generate some pretty large stack traces. However, this isn't something to be scared of. It looks complicated, but it is basically just a stack trace showing us the methods and files that we've stepped through. Let's try to break down what GDB has shown us and see where it is similar to what we know.
00:04:23.060
The output is lengthy, but it gives us nice line numbers as some of the lines may wrap, which will help in tracking frames. Each frame shows the function name associated with it. You can think of these frames a lot like our Ruby methods, which you would identify in your code. If the appropriate debugging symbols are available, it will even tell you where in the source code that came from.
00:05:09.740
For instance, you see something in our Unix system dependencies, and on the third one, that corresponds to MRI itself. If we reference line 42 of that file, we could look and see exactly where this is executing. This information is quite pertinent; we're suspecting some sort of deadlock, and MRI methods conventionally start with an 'rb_' in the function name. Here we have 'rb_mutex_lock,' which certainly smells like a deadlock.
00:05:55.149
This is a snapshot of what's going on in this process, but simply observing one thread in a lock doesn't provide us a complete picture. Let’s ask GDB what else is going on in the process. If we use 'info threads,' GDB will give us a list of all the threads in the current process and show us where they're actually executing.
00:06:22.999
The threads are numbered, and those numbers remain consistent across the lifetime of the thread. As seen in the backtrace, GDB provides the function names. Importantly, we can see that this 'pthread_cond_wait' was the top of the frame for that backtrace we were paused on, and here we have two separate threads both waiting on a condition in the pthread library.
00:06:59.999
This gives us further information and reinforces our theory about the deadlock, but we want to see how we arrived at those locations. GDB provides a command called 'thread apply' that allows us to select certain threads and have GDB run a specific command across those threads.
00:07:29.999
The command we can use is to show all the backtraces for all the threads. If we execute that command, we notice that thread number three has the 'rb_mutex_lock'—exactly where we had been paused before. Additionally, thread one is also paused at that same lock. With this information secured, we can approach our co-worker—who thought they were clever for adding multithreading—and point out that they likely introduced a deadlock. Hopefully, they are wise enough to fix it!
00:08:27.130
Now, to exit GDB, we type 'exit.' However, 'exit' isn’t actually defined, unfortunately, so we simply type 'quit' instead.
00:08:40.830
Before moving on, let’s review briefly what we’ve discussed. We have seen how to attach GDB to a running process. You can do this with any process on your system as long as you have the permissions to do so. However, it’s wise to avoid critical processes since it can be easy to make things stop or behave badly from within GDB. Nonetheless, it is tremendously useful to stop processes to inspect and debug them. We've explored several commands that allow us to look at the state of our processes and understand where our code is executing.
00:09:40.800
Lastly, having found the answers we needed, we can gracefully exit GDB and return to our tasks in the real world. That is a concrete example of using GDB to debug a live process. However, I have found GDB is also incredibly useful for exploration.
00:10:43.990
Next, we're going to discuss an intriguing problem that I encountered in Ruby itself. I didn’t quite understand why it was behaving the way it was, and we can utilize GDB to dig into the details and figure out what’s happening in the VM beneath us. The specific setup for this is the 'uniq!' method. You see, this method takes an array and is supposed to return the list of unique elements.
00:11:38.480
The bang methods, as you may know, are mutating methods. Typically, there is a non-bang counterpart available in the standard library, which means it should modify the original array in place. In this case, the actual array we have would modify its members. However, when we use 'uniq!', it also surprises us by returning the array of unique elements.
00:12:02.760
If you call 'uniq!' on an array that contains no duplicates, you get back 'nil,' which seems puzzling. When I first encountered this behavior, I was sure something strange was occurring. To investigate further, we will set up a little script that replicates this bug, allowing us to analyze what’s happening.
00:12:42.350
When we execute this script, we output the array that has been processed by 'uniq!' and observe that it returns the expected results. However, it also returns 'nil.' This was quite perplexing. How do we initialize this with GDB?
00:13:07.140
As expected, GDB has facilities to start an arbitrary process instead of attaching to one that is already running. You can achieve that using '--args' from the command line. We provide the same command you would normally use to run that Ruby script. GDB starts up and drops us to a prompt. The program is paused before anything happens.
00:13:43.240
To get it started, we instruct GDB to run the program. As before, we see a variety of outputs. There will be several outputs detailing the handling of threads and debugging symbols. Yet in the midst of all that, we see output to standard output, which matches what we anticipate from our program, confirming it's running correctly.
00:14:31.370
Now that we've established a process to run the same test case repeatedly from GDB, we want to analyze what's happening specifically when we invoke this method on an array. To do this, we need to find a way to instruct GDB to pause at the specific point of interest.
00:15:40.270
Fortunately, we have the Ruby documentation as a valuable resource to guide us through the bridge between Ruby code and C functions. For instance, if you lookup the 'uniq!' method in the Ruby documentation, you can toggle and expand a section showing the source code for that Ruby method.
00:16:46.790
While some methods might be implemented in Ruby, many core features of Ruby are actually implemented in C. For those unfamiliar or a bit rusty, critical pieces to understand include that C functions are key for pinpointing locations to debug in GDB.
00:17:32.650
Now, if we look at the code that appears when we expand the Ruby docs, we notice fundamental pieces resembling the C functions we discussed. For example, the return value is declared above the function name while 'value' denotes the type and refers to a Ruby object that is passed around throughout MRI.
00:18:49.290
We also see the code for 'uniq!' takes an array as the parameter and is designed to return a Ruby object. The key takeaway to progress with our debugging is the identifier 'rb_array_uniq_bang,' which GDB can utilize to halt execution when this method is accessed.
00:19:29.130
As mentioned before, the 'rb_' prefix is standard in MRI to indicate Ruby methods, while 'ary_' denotes functions related to the array class. This naming convention helps link C functions with their Ruby equivalents.
00:20:33.550
To invoke GDB so it breaks every time 'rb_array_uniq_bang' is called, we set that up during our initial run command. If we then execute this, rather than completing the program all the way through, GDB halts and presents us with a prompt, allowing us to inspect what is occurring around that location.
00:21:13.430
Now let's take a moment to survey where we are. Since we've gotten to this paused state with GDB, we can inspect the variables available to us and check their values. GDB will also show us the source code preceding and succeeding the line we're currently paused at.
00:22:01.020
Within this paused execution, we can display any variable at hand, including our array parameter. However, displaying this array yields an address in memory—essentially a pointer, indicating where in memory the array data is stored.
00:23:06.620
What we really desire is to see the actual values within this array. To comprehend how those values are organized in memory, we introduce the concept of a struct—a way that C organizes and groups data.
00:23:59.310
For instance, a struct called data might represent an integer in memory for us. When we create this struct, the layout corresponds to our specification, allowing us insights into how Ruby objects relate to their internal struct definitions in C.
00:24:49.750
Hence, every Ruby object operates with an underlying struct called 'RBasic.' This contains flags indicating the type of Ruby object and a pointer to its class type. This principle means that we can interpret object attributes yet require structural familiarity to unwrap their details.
00:25:56.900
If we execute GDB commands to examine the 'RB' structures, GDB displays their details neatly, allowing us to decipher what data is present in memory. However, this can be complicated when you're thinking about the underlying structure of arrays. Fortunately, Ruby introduces 'rb_obj_print,' a function that helps us print a Ruby object’s state.
00:26:56.550
By invoking this function, we can get a clear representation of our array. Yet as we pause and check, we notice that the array yields a string rather than the expected array elements. This prompts us to further investigate its origin.
00:27:55.250
Utilizing 'gdb list,' we can review the nearby code context as well as the values around us. Meanwhile, the 'display' command will print a neat representation of any local variable that’s accessible during our paused state.
00:28:44.550
As we explored the output, we noticed some unexpected values in our array, which raises questions of their origins. Following this exploration, MRI provides another helpful function, 'rb_backtrace,' which gives us Ruby's perspective on the current stack.
00:29:35.610
Upon inspecting the backtrace, we realize that the execution flow isn’t focused on our script but rather through Ruby gems that are invoked before our method executes. So, as per the user experience, Ruby preparations that run upon starting the script led it to call 'uniq!'
00:30:38.490
Knowing this allows us to address the root cause of our issues, and if we say '--disable-gems' when running our Ruby code, we find that the expected output is achieved, as we're now directly invoking our script.
00:31:19.820
Moving further into the unique bang method, it is practical to set breakpoints throughout without cluttering our execution flow. We start by checking the length of our array; if it's less than or equal to one, we return 'Qnil'—the C representation of 'nil'. This line shows the intention of design to return 'nil' in cases where the input is already unique.
00:32:54.950
Thereafter, the method checks if a block is given; otherwise, it maps the array into a hash based on the values presented. This leads us to realize that our intuition about memory efficiency might be misleading because it trades off memory usage by adding yet another structure.
00:34:31.380
This means that if the initial length matches the created hash size, we again return 'nil.' Thus, the behavior we saw arose from the function returning 'nil' meaningful contextually.
00:35:05.440
Having examined this method's internals, we can guide our process forward—back to our active execution! Using 'continue,' we can allow the program to resume its course.
00:35:45.540
Throughout this section, we’ve learned how to traverse methods inside Ruby while also utilizing GDB’s extensive features. Moreover, Ruby's documentation provides robust guidance on connecting Ruby's interface with its underlying native code.
00:36:39.740
Finally, as we conclude, I’d like to mention a few unconventional tricks you can implement once you get accustomed to operating under this level. Sometimes, GDB's output can be perplexing, especially if it conflicts with your application writing logs to standard output. A novel approach involves closing file descriptors for standard out and error, redirecting them to log files.
00:37:47.040
This achieves cleaner debugging, as it eliminates interleaved output that can distract during the inspection. GDB also allows executing arbitrary Ruby code via 'call' which provides further insights into the backtraces of all threads—albeit with certain risks.
00:39:10.390
GDB also allows scripting commands to automate tasks, and we have developed a script that captures thread backtraces for effective debugging. With a simple setup, you can customize your GDB environment to streamline commands and facilitating smoother operations with pleasant little features.
00:40:09.520
Let's wrap up with a recap: We started off talking about how GDB can effectively debug Ruby processes, especially in situations where deadlocks occur. Looking at a unique problem, we navigated the depths of initializing code, encompassing a variety of techniques and capabilities.
00:40:57.900
By exploring GDB alongside Ruby’s integration with C, we broadened our understanding of how low-level programming intersects with high-level debugging. If the content sparks curiosity, I hope that you'll ponder experimenting with GDB yourself and explore processes on your machines in the near future! Thank you!
00:42:53.800
So the question is, did I find out why uniq! returns nil? The answer is no, but it is clearly the specified behavior of that method.
00:42:56.510
During the Q&A, someone asked whether you need to recompile Ruby with a debugging flag, and the answer is yes; you would need to check specifics based on how you're installing Ruby. If you want a fast play environment, the official Docker images for Ruby on Docker Hub have all the necessary setups.
00:43:22.350
Regarding re-entrancy issues encountered while debugging, I haven’t personally run into that, but it is indeed a possibility.
00:43:36.570
Additionally, yes, you can disable the garbage collector or check its running status during debugging, which is indeed advisable when setting up test cases.
00:43:44.920
Thank you all for your attention!