Summarized using AI

Threads, callbacks, and execution context in Ruby

Andrey Novikov • April 12, 2024 • Sydney, Australia • Talk

In his talk "Threads, Callbacks, and Execution Context in Ruby" at RubyConf AU 2024, Andrey Novikov delves into the intricacies of Ruby's blocks and their execution contexts. He emphasizes the importance of understanding blocks for Ruby developers, especially when dealing with callback-based APIs in various Ruby gems. Here are the key points discussed in the presentation:

  • Introduction to the Speaker: Andrey Novikov, a veteran Ruby developer with nearly 15 years of experience, shares his background and the significance of the Ruby community.

  • Understanding Callbacks vs. Blocks: Novikov differentiates between callbacks and blocks in Ruby. He warns against misusing Active Record callbacks, advocating for their use only to maintain data consistency.

  • Blocks as First-Class Citizens: Blocks in Ruby serve as a unique feature different from functions in languages like JavaScript. They are more akin to closures, with an internal environment that remembers local variables and the context in which they were defined.

  • Execution Context and Binding: The complexity of blocks lies in their ability to access the environment in which they were declared. Changes to this environment, particularly the 'self' object reference, can lead to unexpected errors.

  • Real-World Implications of Blocks and Threads: Novikov highlights the potential pitfalls when using blocks within threaded environments. For instance, reliance on Thread.current can lead to erratic behavior when blocks are executed across threads, particularly when using thread pools.

  • Performance Considerations: He notes the performance benefits of using thread pools over individual threads when managing Ruby applications, citing specific examples related to the NATS Ruby gem and how they optimized message handling.

  • Conclusion and Best Practices: He wraps up with advice on avoiding common pitfalls related to blocks and threading, emphasizing the need for Ruby developers to read and understand the source code of libraries to avoid subtle bugs.

Overall, Novikov's talk serves as an insightful exploration of Ruby's blocks, equipped with practical experiences and recommendations for developers to enhance their understanding and efficiency in writing Ruby code.

When you provide a block to a function in Ruby, do you know when and where that block will be executed? What is safe to do inside a block, and what is dangerous? What is a block, after all? Blocks in Ruby are everywhere, and we’re so used to them that we may even be ignorant of all their power and complexity. Let’s take a look at various code examples and understand what dragons are hidden in Ruby dungeons.

Ruby programmers usually don't need to think much about blocks… until they have to. As a contributor to Ruby gems that have callback-based APIs for developers, I've found that the internal implementation details of these gems affect how these callbacks are executed, and the resulting behavior can sometimes be quite surprising. I think that a good Ruby developer should know the inner workings of blocks to better understand the execution flow of complex Ruby programs.

RubyConf AU 2024

00:00:03.959 Okay, hello everyone! Nice to meet you here in Sydney. My name is Andrey Novikov, and I am a software developer who has been using Ruby for almost 15 years now. I also work with various technologies around Ruby. I'm a fan of DevOps; I can deploy to Kubernetes, but I don't particularly enjoy it. Originally, I'm from Russia, but in 2022, I relocated to Japan, partly due to the unfortunate actions of Russian authorities and partly because I have a deep appreciation for Japan. I've been happily residing in Japan with my family ever since, enjoying the Japanese order, safety, and traffic while riding my moped around the neighborhood.
00:01:01.320 If you're considering moving to Japan, I’d love to chat about it after the talk, as I have some experience to share. I’m also very excited to finally be in Australia. When I saw the RubyConf Australia call for speakers, I looked at the distorted Mercator map and thought, 'Wow, Japan and Australia seem not so far from each other.' Relatively, yes, they are, but the Pacific Ocean is vast, and Australia itself is massive. I was surprised to learn that it's almost 8,000 kilometers from Sydney to Osaka. This geographical perspective distortion is due to being accustomed to maps based on the Mercator projection, which makes areas near the poles appear much larger than they actually are. This has often bothered me, and we will revisit this topic later.
00:02:02.759 I have to confess I’m an alien—an evil Martian agent undercover here on Earth, coming to take over Australia. My company, Evil Martians, is a distributed web development agency that assists companies of all sizes, from small startups to large enterprises, to thrive and prosper. We help startups grow effectively, and we assist enterprises in scaling their monolithic applications and testing hypotheses. Recently, we have been focusing on developer tools, particularly open-source projects and effective software development practices. Australia is still quite an enigma for us, and we look forward to assisting Australian companies in building better software.
00:02:16.680 If you need help with performance scaling, architecture, or implementing best practices, or if you want to strengthen your team with experienced engineers, feel free to reach out to us. We currently have a few engineers in Japan, including myself, which means we’re almost in the same time zone, which can be very helpful. One thing we truly love is open-source software; we enjoy using it and contribute back enhancements to the community. We’re here to share the results of our work, whether as a Ruby gem or an NPM package. This will not only help us reuse our own solutions but also provide assistance to others in solving their problems. Often, we will receive feedback or contributions in return, creating a win-win situation.
00:02:45.239 As Tesh mentioned in the previous talk, contributing to open source helps us gain recognition among you. It also aids in hiring, making it a beneficial initiative from all perspectives. For many years, we have created literally hundreds of open-source products, both big and small, some of which are well-known and widely used. It’s likely that your application already depends on several of our Martian gems, so take a moment to check your Gemfile and see how many of our gems you are utilizing.
00:03:15.160 For instance, I created Yabeda, a mini-framework for instrumenting Ruby applications to facilitate monitoring and observability. Some of our open-source products have even evolved into commercial projects, such as AnyCable or imgproxy, while still remaining open-source, following the open-core model. I’m very proud of this evolution. However, let’s return to the theme of worldview distortions and explore how programmers sometimes undervalue even the most basic language constructs simply because they are so accustomed to them. Let’s talk about callbacks in Ruby.
00:04:16.240 Now, what are callbacks? Not these! Many people, including myself, have had bad experiences with them. I advise against sending emails from Ruby on Rails models! Please, only use Active Record callbacks to maintain data consistency. Let’s shift our focus to Ruby blocks. They are so idiomatic and widely used that no one really realizes that while they perform their required task, they are technically very different from traditional for loops found in Ruby or other programming languages. A block in Ruby is not just a sequence of instructions between 'do' and 'end'; it is much more closely aligned with functions or methods.
00:05:00.280 First, you declare a block, and then it gets called by the times method multiple times using the iterator as an argument. Internally, executing a block is very similar to calling a method. It requires the creation of a special data structure in C, pushing it onto the stack along with any parameters, and so forth. While I won’t dive deeply into the internal details today, I strongly recommend an enlightening book titled 'Ruby Under a Microscope.' This book is truly enlightening, especially if you're familiar with how computers operate at a low level.
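To make the "a block is called like a method" point concrete, here is a minimal sketch. The `repeat` method is a hypothetical stand-in written for illustration only; it is not how `Integer#times` is implemented internally, it just shows that the receiving method decides when and how often the block runs.

```ruby
# Hypothetical helper: a hand-written loop that invokes the block it was given.
def repeat(count)
  results = []
  index = 0
  while index < count
    # `yield` invokes the block much like a method call, passing `index` as the argument.
    results << yield(index)
    index += 1
  end
  results
end

squares = repeat(3) { |i| i * i }
puts squares.inspect

# The built-in Integer#times also calls the block once per iteration.
collected = []
3.times { |i| collected << i }
puts collected.inspect
```

The block body never runs on its own; it only runs when the method it was passed to decides to call it.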
00:05:29.000 Of course, all the additional steps required to execute the block make it slightly slower than a simple for loop. However, in real-world applications that demand more than just incrementing a counter, this difference becomes negligible. Don't compromise readability by fixating on micro-benchmark results, as that does not represent the primary focus of this talk. The key point to remember is that in Ruby, blocks are used as callbacks. In contrast to languages like JavaScript or Go, where you would pass a function as an argument to another function, Ruby has a unique entity with special syntax for declaring callbacks, making them first-class citizens.
00:06:06.720 It's important to know that a block is more than just a type of function; it's also a closure. Every Ruby block consists of two parts: the code that executes and the environment in which the code executes. The first part is straightforward; it contains the instructions between 'do' and 'end,' which form the block's body. The environment is a bit more intricate.
00:06:45.000 Blocks remember all local variables defined before their declaration, as well as the object in that context, which is typically 'self.' When a block is called, it executes in this context, referencing variables that might have been defined a long way from its creation. However, this environment isn’t immutable; it can change. The most frequently modified aspect of the environment is the binding of the 'self' object.
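The two parts of a block's environment described above can be inspected directly. The following is a small sketch; the `Greeter` class and its method names are made up for this illustration.

```ruby
# Hypothetical class used only to show what a block captures.
class Greeter
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Returns a Proc that closes over the local `greeting` and over `self`.
  def build_block
    greeting = "Hello"
    proc { "#{greeting}, #{name}!" }
  end
end

greeter = Greeter.new("Sydney")
block = greeter.build_block

# `build_block` has already returned, yet the block still sees its local variable
# and can still call `name` on the object it was created in.
message = block.call
puts message

# The captured environment is exposed through Proc#binding.
captured_greeting = block.binding.local_variable_get(:greeting)
captured_self = block.binding.receiver
```

Here `captured_self` is the very same `Greeter` instance, which is why the bare call to `name` inside the block resolves.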
00:07:02.000 Generally, such changes are intentional, allowing for the creation of nice Domain Specific Languages (DSLs). Yet if you’re not aware that the method receiving your block rebinds 'self' to its own instance, you may end up scratching your head at cryptic error messages about undefined methods. For instance, consider a method that receives a block and executes it, but switches the context to its own instance. Code using this API may define a block that calls methods on the 'self' of the block's declaration context, and it will fail, producing an enigmatic error message.
00:07:45.600 Imagine a situation where you're referencing a method from within a block, but upon execution, the block finds a different 'self.' This can reshape your understanding, so be prepared. You might consider saving the current 'self' to a local variable before declaring your block, and while that would work, it isn't a reliable solution. Let’s say you have a block that uses a local variable, and you reassign that variable before calling the block again.
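The failure mode described above can be reproduced in a few lines. This is a sketch under assumptions: `ServerConfig`, `App`, and their methods are hypothetical names, and `instance_eval` is used here as one common way a DSL method rebinds 'self'.

```ruby
# Hypothetical DSL: the block is evaluated with `self` set to the new config object.
class ServerConfig
  attr_reader :settings

  def initialize
    @settings = {}
  end

  def set(key, value)
    @settings[key] = value
  end

  def self.build(&block)
    config = new
    config.instance_eval(&block) # rebinds `self` inside the block to `config`
    config
  end
end

# Hypothetical caller that expects `self` inside the block to still be the App instance.
class App
  def default_port
    8080
  end

  def configure_ok
    ServerConfig.build { set :port, 3000 }
  end

  def configure_broken
    # `default_port` is now looked up on the ServerConfig instance, not on App.
    ServerConfig.build { set :port, default_port }
  end
end

app = App.new
ok_settings = app.configure_ok.settings

error = begin
  app.configure_broken
  nil
rescue NameError => e
  e
end
puts error.message
```

The error message names the config object as the receiver, which is confusing when the block was visibly written inside an `App` method.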
00:08:13.680 You will find that the block, originally defined, now uses the new value of the variable, not the value it was created with. This behavior can be quite mind-boggling, yet it remains valid. The reasoning behind this behavior is to enable multiple blocks to share access to common variables, as illustrated in the context of working with a counter from the book 'Ruby Under a Microscope.' Now, let’s explore other components that can be included in this environment.
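A short sketch of this behavior: the block holds a reference to the variable itself rather than a snapshot of its value. The variable names are chosen for illustration; the counter part follows the shared-counter idea mentioned above.

```ruby
value = 1
reader = proc { value }

first = reader.call
value = 2            # reassign the variable after the block was created
second = reader.call # the block sees the new value

# Two separate blocks sharing one variable from the same environment.
counter = 0
increment = proc { counter += 1 }
read = proc { counter }

3.times { increment.call }
total = read.call

puts [first, second, total].inspect
```

Both `increment` and `read` operate on the same `counter`, so changes made through one block are visible through the other.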
00:08:49.360 One notable example is the current thread that executes the block. Interestingly, thread creation itself utilizes blocks to define the body of the new thread, but that isn’t the focus of today's discussion. Nevertheless, you can call a block from different threads and even multiple threads simultaneously, and closed variables will remain accessible as long as you do not attempt to mutate shared objects.
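As a rough illustration of calling one block from several threads, the sketch below first only reads a closed-over variable, then mutates a shared one. Guarding the mutation with a `Mutex` is an assumption added here, as one standard way to make the shared write safe; the names are illustrative.

```ruby
prefix = "worker"
label = proc { |n| "#{prefix}-#{n}" }

# Reading a closed-over variable from several threads is unproblematic.
threads = 4.times.map { |n| Thread.new { label.call(n) } }
labels = threads.map(&:value)
puts labels.inspect

# Mutating shared state from several threads needs synchronization.
mutex = Mutex.new
sum = 0
add = proc { |n| mutex.synchronize { sum += n } }

4.times.map { |n| Thread.new { add.call(n) } }.each(&:join)
puts sum
```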
00:09:19.080 The real challenges begin if you attempt to rely on thread-local data within a block that could execute across different threads. Specifically, if you store a value in `Thread.current` under some key and expect to read it back on a later call of the block, there's no assurance that the same thread will execute the block each time, especially when using higher-level concurrency primitives like thread pools or promises from the concurrent-ruby gem.
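The following sketch shows why thread-local data is fragile inside blocks. It uses plain `Thread.new` from the standard library instead of a real thread pool, which is a simplification; the key names `:request_id` and `:calls` are made up for the example.

```ruby
# A value stored in the current thread is invisible from any other thread.
Thread.current[:request_id] = "abc-123"
read_request_id = proc { Thread.current[:request_id] }

in_same_thread = read_request_id.call
in_other_thread = Thread.new { read_request_id.call }.value
puts in_same_thread.inspect
puts in_other_thread.inspect

# A block that keeps a per-thread call counter.
count_calls = proc do
  Thread.current[:calls] = Thread.current[:calls].to_i + 1
end

# Always the same thread: the counter accumulates as the block's author expects.
one_thread = Thread.new { 3.times.map { count_calls.call } }.value

# A different thread for every call: each one starts counting from scratch.
many_threads = 3.times.map { Thread.new { count_calls.call }.value }

puts one_thread.inspect
puts many_threads.inspect
```

With a thread pool the result lies somewhere between these two extremes and depends on scheduling, which makes such bugs hard to reproduce.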
00:09:31.760 This is not merely theoretical; let’s consider the NATS client Ruby gem. NATS is a performant and simple message broker written in Go. Anyone who has attempted to deploy Kafka on Kubernetes will recognize the pain involved. NATS is swift and uncomplicated to set up, making it an attractive option. However, for versions of the NATS Ruby client before 2.3, a new thread was created for each subscription, and each new message for the same subscription callback was processed in that designated thread.
00:10:08.480 If you used `Thread.current` within the callback, it would work at that time. However, I later changed the implementation to use a thread pool where every callback is executed in some thread, meaning you could no longer rely on `Thread.current`. Why did I make this change? Primarily for performance reasons. Context switching is an expensive operation, and creating a new thread for each subscription to manage sporadically received messages is an utter waste of resources.
00:10:52.560 There is also a maximum limit on the number of threads that can be created in a single process, especially when you have numerous subscriptions, possibly tens of thousands. In testing, Ruby crashed after surpassing around 32,000 subscriptions and threads, respectively. In contrast, with a fixed-size thread pool, you can control the number of threads, reuse them, and avoid the overhead of context switching. My benchmarks indicated that using a thread pool for tens of thousands of subscriptions results in several times better performance than using individual threads, solely by reducing that context switching overhead.
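The two dispatch strategies can be compared with a toy model. This is a hypothetical sketch built only on `Thread` and `Queue` from the standard library; it is not the NATS client's actual code, and the sizes (20 subscriptions, 3 messages each, 4 pool workers) are arbitrary assumptions.

```ruby
subscriptions = 20
messages_per_subscription = 3
pool_size = 4

# Hypothetical callback: reports which thread handled the message.
on_message = proc { |subscription, _payload| [subscription, Thread.current] }

# Strategy 1: one dedicated thread per subscription.
dedicated_results = subscriptions.times.map do |sub|
  Thread.new do
    messages_per_subscription.times.map { |i| on_message.call(sub, i) }
  end
end.flat_map(&:value)
dedicated_threads = dedicated_results.map(&:last).uniq.size

# Strategy 2: a fixed-size pool of workers pulling jobs from a shared queue.
jobs = Queue.new
handled = Queue.new
workers = pool_size.times.map do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and drained, ending the loop.
    while (job = jobs.pop)
      handled << on_message.call(*job)
    end
  end
end

subscriptions.times do |sub|
  messages_per_subscription.times { |i| jobs << [sub, i] }
end
jobs.close
workers.each(&:join)

pooled_results = Array.new(handled.size) { handled.pop }
pooled_threads = pooled_results.map(&:last).uniq.size

puts "dedicated: #{dedicated_threads} threads, pooled: #{pooled_threads} threads"
```

In the pooled variant the number of threads is bounded by `pool_size` regardless of how many subscriptions exist, but messages of one subscription may be handled by different workers, so per-thread state no longer maps to a subscription.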
00:11:41.560 Once this change was made and merged, I started to consider whether the behavior of the NATS client had changed, and I realized it certainly had. Frankly, it could be perceived as a breaking change. However, this behavior was not defined anywhere, meaning it could be treated as a minor internal implementation detail, thus negating the need to bump the major version. Thankfully, it appears that no one relied on `Thread.current` in NATS subscription callbacks, as it seemed not to be a common practice.
00:12:13.520 It’s crucial to understand that, in real-world applications, your code typically runs inside a thread pool. This could be with a Puma application server or within a Sidekiq worker. Therefore, avoid using `Thread.current` in your applications. Instead, utilize `ActiveSupport::CurrentAttributes` in Rails to achieve the desired results. However, the most frustrating aspect of this situation is the lack of an easy mechanism to determine whether a block you defined will execute in its original context, or with a different 'self' or in a separate thread.
00:12:38.520 The only way to ascertain this is to read and grok the source code of every library you use, which is a good practice, albeit challenging. We’ve reached the end of this small journey; I hope you enjoyed this exploration of Ruby blocks and gained new insights to help you avoid subtle bugs in your applications. Thank you very much for your attention.
00:13:22.560 Please check out our Martians blog, called Martian Chronicles. We have many interesting articles concerning Ruby, Rails, front-end design, and other topics. I also have some Martian stickers, like this one, so if anyone is interested, feel free to come to me after the talk to grab one. Let’s chat about anything—from Ruby to Japan, motorcycles, or anything else. Don't hesitate to reach out to me on social media and ask questions even after the conference. And if you happen to visit Japan, particularly Osaka, let me know so we can meet.
00:14:14.560 Thank you so much for your attention, and a special thanks to Ruby Australia for organizing this amazing event and for selecting my talk. I’m grateful for the opportunity to speak here, and now, let's get ready for lunch!