Evented Ruby vs Node.js

00:00:14.690 Hello everyone, my talk today is about evented Ruby versus Node.js. First, let me share a little bit about myself. My name is Jerry, and I work at GitHub on the enterprise team. I primarily work with Ruby and JavaScript, but over the past year, I've been exploring evented programming a bit more. I wanted to share some of the insights I've gained and how you can apply these concepts in your own projects.

00:00:22.500 Before diving into the talk, I want to outline some key principles I keep in mind whenever I think about performance issues. When optimizing for performance, it should always be in service of our end-users. Too often, we chase after micro-optimizations that, while technically interesting, do not justify the effort involved. It's essential to identify the low-hanging fruit regarding performance problems in your project.

00:00:40.530 Consequently, we should aim to implement sustainable changes—those that are relatively straightforward to make and easy to maintain over the long term. It can be frustrating to return to a project six months later and find yourself questioning why you made certain choices that have since become problematic.

00:01:01.830 In this talk, I will cover a variety of evented programming topics. I will begin with a high-level introduction to evented programming and discuss its applications in server-side programming, particularly in web development. Next, we will explore the similarities and differences between evented programming in JavaScript and Ruby, along with the pros and cons of each approach. Finally, I will explain how you can implement evented programming techniques in your existing applications, whether within your Rails apps or by writing specific features that work alongside your web applications.

00:01:37.169 So, what exactly is evented programming? Essentially, it involves registering callback functions for events that you care about. This is something we already do daily when working with client-side JavaScript. For instance, you might have a code snippet that changes the text color to red whenever there is a click event on the DOM. This is a simple example of the reactor pattern.

00:01:54.659 The reactor pattern consists of a system—the reactor—that listens for triggered events. In our example, the reactor is the browser. When it detects a click event or a drag event, it delivers those events to any registered callbacks. Certain domains naturally lend themselves to this evented reactor pattern, such as UI events like keyboard, mouse, and touch events. We cannot predict when these events will occur, so we define the appropriate event handlers and register them at the start of our program.
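
As a rough illustration of the pattern just described, here is a minimal sketch of a reactor in plain Ruby. The `TinyReactor` class and its `on`, `emit`, and `run` methods are hypothetical names made up for this example; a real reactor such as the browser's event loop also waits on the operating system for new events, which this toy does not.

```ruby
# A toy reactor: callbacks are registered per event name, and the loop
# delivers each queued event to every callback registered for it.
class TinyReactor
  def initialize
    @handlers = Hash.new { |hash, name| hash[name] = [] }
    @pending  = []
  end

  # Register a callback for an event we care about.
  def on(event_name, &callback)
    @handlers[event_name] << callback
    self
  end

  # Queue an event; in a browser this would come from the mouse or keyboard.
  def emit(event_name, payload = nil)
    @pending << [event_name, payload]
    self
  end

  # Drain the queue, handing each event to its registered callbacks.
  def run
    until @pending.empty?
      event_name, payload = @pending.shift
      @handlers[event_name].each { |callback| callback.call(payload) }
    end
  end
end

reactor = TinyReactor.new
reactor.on(:click) { |target| puts "#{target} turns red" }
reactor.emit(:click, "headline")
reactor.run
```

The handler is registered up front, before any event exists, and the reactor decides when to call it. That is the inversion of control the rest of the talk keeps coming back to.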

00:02:05.450 The advantage of having reusable reactors is that they save us a significant amount of time that would otherwise be spent writing mouse detection and related functionality. However, despite the intuitive nature of UI events, we can also define events for other, less obvious occurrences. For example, we might want to register an event for when a wireless connection becomes available or when data is ready to be read from the disk.

00:02:27.480 On the Node.js side, Node describes itself as using an event-driven, non-blocking I/O model. This overlaps with the terminology we've been discussing, but what does it mean in a server-side context? Unlike the browser example, you can think of Node.js as a general-purpose reactor. Instead of dealing only with mouse clicks and similar UI events, it is a reactor capable of delivering any arbitrary events that we care about.

00:02:42.260 Regarding blocking I/O, I love this quote from Christian Lange, who compares RAM to an F-18 Hornet with a maximum speed of 1,200 miles per hour and disk access to a banana slug with a top speed of 0.007 miles per hour. Intuitively, we all understand this: the CPU is faster than memory, memory is faster than disk, and disk is faster than network access. When the CPU has no data to process, it has to sit idle and wait.

00:03:02.129 Fortunately for us, the operating system and hardware caches hide these complexities. For example, when we perform file I/O to read a text file, the call looks straightforward, but under the hood the CPU might sit idle while it waits for the data, which wastes processing power. Luckily, the operating system is smart enough to switch to a different process when it notices that the current one is blocked on I/O.

00:03:23.010 This ability is evident when loading a song in iTunes; it doesn’t cause our entire system to grind to a halt. You can still operate other applications, like your browser and Rails app, concurrently. While this may not be the most optimized method, it's simple, leading to less complexity in managing concurrent processes.

00:03:50.490 Thus, what Node.js provides is an optimization of the I/O step by delegating some of the responsibilities to the application layer. Instead of merely relying on the operating system to switch between processes, now you have a reactor in the Node process that can manage I/O callbacks and switch between tasks internally.

00:04:16.320 Up to this point, we've primarily discussed evented programming. But why should we care about this for web apps, especially when we already have process concurrency? The answer lies in the fact that we're still performing a lot of blocking I/O operations that can slow things down significantly.

00:04:38.880 In my opinion, it's crucial we look towards a future direction that allows us to slim down processes and make better use of our resources. Some everyday use cases include interacting with databases, hitting external APIs, or even image processing, where waiting for other subprocesses to finish can lead to significant bottlenecks.

00:05:02.570 For instance, if we're writing a controller to save tweets, we might first create a new tweet object, then shorten any links within the text, and finally save the tweet to a database. Even in these three simple lines, we're hitting the network and touching the filesystem, which makes us block on I/O. This blocking means our CPUs cannot handle multiple requests until the existing one has finished.
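
The three steps can be sketched in plain Ruby as follows. This is not the code from the talk's slides: the `Tweet` class, the `create_tweet` method, and the `http://sho.rt/x` short URL are hypothetical stand-ins, and `sleep` simulates the network and database round trips.

```ruby
# Hypothetical stand-ins for the three steps described in the talk.
class Tweet
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # In a real app this would call a URL-shortening service over the
  # network; the sleep stands in for that blocking round trip.
  def shorten_links
    sleep 0.01
    @text = @text.gsub(%r{https?://\S+}, "http://sho.rt/x")
    self
  end

  # In a real app this would write to a database; again the sleep
  # stands in for blocking I/O.
  def save
    sleep 0.01
    true
  end
end

# The controller action boils down to three procedural lines,
# two of which block the whole process while they wait.
def create_tweet(text)
  tweet = Tweet.new(text)
  tweet.shorten_links
  tweet.save
  tweet
end

puts create_tweet("read this https://example.com/a/very/long/path").text
```

While either `sleep` is in progress, this process can do nothing else, which is why one process handles only one request at a time.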

00:05:27.360 The current workaround in Rails is to launch multiple processes, with each handling one request at a time. This approach is straightforward and functional, but it consumes a lot of memory. If this same logic were to be implemented in Node, we would substitute all blocking calls with function callbacks.

00:05:46.800 Instead of calling shorten links and saving directly, we would instruct the code on what to do upon completion—like shortening my links first, then calling a function to save my tweet to the database once that’s done. When we visualize what the concurrency looks like, we see that Node operates within a reactor.
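
To show the shape of that callback style, here is a sketch written in Ruby rather than JavaScript, so it is not Node code. `ToyLoop`, `start_io`, `shorten_links`, `save_tweet`, and `handle_request` are all hypothetical names, and the "I/O" is simulated by queueing the completion callback for a later turn of the loop.

```ruby
# A toy event loop: "I/O" is simulated by queueing the completion
# callback to run on a later turn of the loop.
class ToyLoop
  def initialize
    @queue = []
  end

  # Pretend to start an I/O operation and return immediately.
  def start_io(&on_done)
    @queue << on_done
  end

  def run
    @queue.shift.call until @queue.empty?
  end
end

# Hypothetical non-blocking helpers: each takes a block to call when done.
def shorten_links(io, text, &on_done)
  io.start_io { on_done.call(text.gsub(%r{https?://\S+}, "http://sho.rt/x")) }
end

def save_tweet(io, text, &on_done)
  io.start_io { on_done.call(true) }
end

def handle_request(io, id, text, log)
  log << "request #{id}: started"
  shorten_links(io, text) do |short_text|
    log << "request #{id}: links shortened"
    save_tweet(io, short_text) do
      log << "request #{id}: saved"
    end
  end
end

io  = ToyLoop.new
log = []
handle_request(io, 1, "first https://example.com/one", log)
handle_request(io, 2, "second https://example.com/two", log)
io.run
puts log
```

Both requests are started before either one finishes, and their steps interleave, yet only one block of code runs at any moment. The cost is already visible: the nesting grows by one level for every I/O step.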

00:06:05.850 As requests come in, Node works quickly by registering callbacks and giving control back to the reactor. Consequently, while the first request undergoes blocking I/O, the reactor thread retains control and can immediately start handling new incoming requests.

00:06:24.480 At some point, when the I/O for the first request finishes, the reactor invokes its callback, and it does the same for the other requests as their I/O completes. Notice that even though we start new requests while earlier ones wait on I/O, only one piece of code is running at any given time, which largely removes the concerns about race conditions and deadlocks that come with threads.

00:06:55.740 With Node, the reactor efficiently switches between requests internally. Each reactor can process more requests while maintaining simplicity in code execution. If one Node process becomes overwhelmed, we can still achieve process concurrency by launching additional Node processes.

00:07:15.150 It's vital to remember the distinction between latency and concurrency. The Node process may handle a higher number of requests in a single instance, but if there's a slow incoming request, Ruby will respond in just the same timeframe as JavaScript for an equivalent request.

00:07:37.010 If your responses are very slow, optimizing response latency comes first, because that is what users will notice. The trade-off with Node is that our application code has to be aware of blocking I/O.

00:08:04.870 Even though our domain model is about tweets, we find ourselves managing blocking I/O in application code, which can be jarring for those used to Ruby's readability. This leads to 'callback spaghetti,' which makes the code harder to read and maintain.

00:08:28.950 We know that Node can yield improved concurrency compared to Rails. However, it brings along the challenge of complicated callback structures. The question then arises, can we achieve similar benefits with Ruby while avoiding these drawbacks?

00:08:51.360 Ruby is a general-purpose language and is perfectly capable of evented programming. While we usually write Ruby in a procedural style, we can switch to an evented style where the problem fits, and combine the two paradigms.

00:09:03.480 The goal is to create an effective reactor. Unlike Node, where the reactor starts automatically, in Ruby, you need to explicitly initiate your reactor. One common solution is to use a gem like EventMachine, which is widely adopted within the Ruby ecosystem.

00:09:32.900 When setting up a reactor, you would call `EM.run` and pass a block of code that will run inside the reactor. While you can manually start a reactor, most app servers in Ruby are reactor-aware, meaning once you configure things correctly, your application code will run within a pre-existing reactor.

00:10:04.510 After initiating the reactor, we need to subscribe to the relevant events that matter for our application. To illustrate this, let’s take an example involving web requests. To request a webpage, we create a request object and register a callback for when this request concludes.

00:10:26.199 Sadly, this Ruby example looks very much like the earlier Node.js example. Even for something as simple as fetching a webpage, we are still managing blocking I/O and callbacks ourselves. Whichever language or framework you use, you have to deal with the inversion of control that is inherent to evented programming.

00:10:48.470 In fact, I'd argue that the callback style makes the Ruby version even worse than the Node version. The code is valid Ruby, but it doesn't feel natural to Ruby developers and strays from the language's expressive, readable style.

00:11:11.130 We would ideally want to maintain a procedural syntax while executing code in an event-driven style in the background. Conceptually, code execution would resemble the following, allowing effortless page fetching with a more intuitive syntax.

00:11:29.269 Libraries such as Faraday make this possible. Faraday hides the details of blocking I/O, so our code doesn't need callbacks. With a suitable adapter, such as the one for em-synchrony, the event callbacks stay inside the library and our application code stays clean and procedural.

00:11:57.690 With this approach, page fetching effectively runs in an evented fashion, while your code at the application layer remains intuitive. The trick is Ruby's fibers, which provide lightweight cooperative concurrency. A fiber is a block of code that can be paused and resumed, but scheduling is left to the programmer.
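
A minimal sketch of pausing and resuming a fiber, using only Ruby's core `Fiber` class. The `fiber_demo` method name is made up for this example.

```ruby
# Record the order in which the main program and the fiber run.
def fiber_demo
  steps = []

  fiber = Fiber.new do
    steps << "fiber: started"
    Fiber.yield            # pause here and hand control back to the caller
    steps << "fiber: resumed"
  end

  fiber.resume             # runs the block until the first Fiber.yield
  steps << "main: doing other work"
  fiber.resume             # continues after the yield and runs to the end
  steps
end

puts fiber_demo
```

Nothing resumes the fiber automatically. The second `resume` is the scheduling decision that is left to the programmer, or to a library acting on the programmer's behalf.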

00:12:31.600 In practice, fibers function akin to coroutines. You start a request, pause it when anticipating blocking I/O, and yield control back to the reactor. When the data is eventually ready, the paused request can be resumed.
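
The following sketch shows how a library can combine a fiber with a reactor so that application code reads procedurally. It illustrates the general technique only and is not how Faraday or em-synchrony are implemented. `FiberLoop`, `fetch_page`, and `handle_request` are hypothetical names, and the page body is faked.

```ruby
require "fiber"  # needed for Fiber.current on older Rubies; harmless on newer ones

# Toy reactor: simulated I/O completes on a later turn of the loop.
class FiberLoop
  def initialize
    @queue = []
  end

  def start_io(&on_done)
    @queue << on_done
  end

  def run
    @queue.shift.call until @queue.empty?
  end
end

# Looks synchronous to the caller, but parks the current fiber until
# the simulated response arrives.
def fetch_page(io, url)
  fiber = Fiber.current
  io.start_io { fiber.resume("<html>#{url}</html>") }
  Fiber.yield   # returns whatever the callback passes to resume
end

# Each request runs in its own fiber, so pausing one does not stop the rest.
def handle_request(io, id, url, log)
  Fiber.new do
    log << "request #{id}: fetching"
    body = fetch_page(io, url)      # no callback in application code
    log << "request #{id}: got #{body}"
  end.resume
end

io  = FiberLoop.new
log = []
handle_request(io, 1, "http://example.com/a", log)
handle_request(io, 2, "http://example.com/b", log)
io.run
puts log
```

The callback still exists, but it lives inside `fetch_page`. The request handler itself contains no callbacks, which is the property the talk is after.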

00:12:52.110 Although this setup can still feel clunky, the benefit lies in our ability to have multiple requests executing in their specific fibers; thus, we can manage the task-switching effectively as web requests are inherently independent.

00:13:10.014 A practical way to do this is to wrap each request in its own fiber using a Rack middleware called Rack::FiberPool. Configuring Rails, which is Rack-compliant, is a breeze; you simply place the middleware at the top of the stack.
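
To make the idea concrete, here is a simplified middleware in the same spirit. It is not the Rack::FiberPool implementation: there is no pool, the `FiberPerRequest` class and the `"demo.callback"` key are invented for this sketch, and a real app server has its own convention for signalling that the response will arrive later.

```ruby
require "fiber"  # needed for Fiber.current on older Rubies; harmless on newer ones

# Hypothetical middleware: runs the downstream app inside a fresh fiber
# and hands the finished response to a callback taken from env.
class FiberPerRequest
  def initialize(app)
    @app = app
  end

  def call(env)
    deliver = env.fetch("demo.callback")   # made-up key for this sketch
    Fiber.new { deliver.call(@app.call(env)) }.resume
    nil  # the response goes through the callback, possibly later
  end
end

# A hypothetical app that pauses once, as if waiting on I/O.
waiting = []
app = lambda do |_env|
  waiting << Fiber.current
  Fiber.yield
  [200, { "content-type" => "text/plain" }, ["hello"]]
end

stack = FiberPerRequest.new(app)
stack.call("demo.callback" => ->(response) { puts response.inspect })
waiting.shift.resume   # the "I/O" finished, so the response is delivered now
```

Because the middleware sits at the top of the stack, everything beneath it, including the application code, runs inside the request's fiber and can be paused without any changes of its own.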

00:13:30.070 Other frameworks that work with Rack, such as Sinatra and Grape, can also use this middleware to take advantage of evented programming without altering much of the application code. So, by selecting a reactor-aware app server and wrapping each request in its own fiber, we start to benefit from an event-driven architecture.

00:13:48.720 However, after such configurations, benchmarking might leave you disappointed, as all the underlying libraries may still operate in blocking fashion. The Ruby ecosystem was not designed initially with evented programming in mind, leading to code that blocks and stalls the reactor.

00:14:06.040 While the reactor kicks off the first request, any blocking library will hold it in place, thereby blocking the handling of new requests until the first one concludes. This scenario leads us to the same position as before: using process concurrency.

00:14:29.810 Still, we are no worse off than when we began: each Rails process continues to handle one request at a time, and we can improve from there. Our first step should be to unblock the reactor.

00:14:44.380 To that end, we need to look at external HTTP calls, data stores, and system calls. Many data store drivers support evented operation and need only minimal configuration. For HTTP, Faraday ships with adapters that work with the reactor.

00:15:13.240 For system calls, EventMachine provides its own non-blocking version of `popen`. If rewriting a section of code isn't feasible, we can call `EM.defer`, which runs the task on a separate thread and relays the result back to the main reactor.
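
The idea behind deferring blocking work can be sketched with only Ruby's standard library. This is a stand-in to show the hand-off, not EventMachine's implementation. The `DeferringLoop` class and its `defer` and `run` methods are hypothetical names, and error handling is left out.

```ruby
# Blocking work runs on a separate thread; its callback runs later on
# the thread that drives the loop, so callbacks never run concurrently.
class DeferringLoop
  def initialize
    @completed   = Queue.new   # thread-safe hand-off back to the loop thread
    @outstanding = 0
  end

  # Call this from the loop thread. `work` runs on a new thread and
  # `callback` later receives its result on the loop thread.
  def defer(work, callback)
    @outstanding += 1
    Thread.new { @completed << [callback, work.call] }
  end

  # Deliver results as workers finish, until nothing is outstanding.
  def run
    while @outstanding > 0
      callback, result = @completed.pop   # waits until a worker finishes
      @outstanding -= 1
      callback.call(result)
    end
  end
end

loop_thread = Thread.current
reactor = DeferringLoop.new
slow_task = -> { sleep 0.05; 6 * 7 }   # stands in for a blocking library call
reactor.defer(slow_task, lambda do |value|
  puts "got #{value}, on loop thread: #{Thread.current == loop_thread}"
end)
reactor.run
```

The blocking call still blocks, but it blocks a worker thread instead of the reactor, so the reactor stays free to serve other requests in the meantime.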

00:15:40.630 Taking these steps to adapt libraries for reactor awareness allows a pattern of execution increasingly reminiscent of Node.js. So, then why, and when, should you opt for Ruby? Implementing evented I/O into your Rails application will yield clear benefits.

00:16:03.360 You can reuse your existing code and keep the readable Ruby everyone appreciates, without taking a performance hit. Moreover, Ruby supports multiple paradigms, so you can write each feature in the style that suits it best.

00:16:22.450 On the other hand, the main ongoing benefit of Node is that the evented paradigm is consistent throughout the ecosystem. Every library handles blocking I/O with callbacks, because there is no other option, so libraries fit together seamlessly.

00:16:46.250 Moreover, the Node community is more robust than Ruby's evented community at this time, providing better support, documentation, and resources for those who may encounter issues. While crafting utility scripts with evented I/O is fairly straightforward in either Ruby or JavaScript, transitioning an entire application might not yield the best results.

00:17:06.400 Instead of attempting to rewrite everything in another language, whether migrating from Ruby on Rails to JavaScript or vice-versa, it is advisable to focus on developing new specific features that can harmoniously coexist with your existing systems.

00:17:27.540 For instance, at GitHub, we render all our dot-com pages through Rails; however, despite lacking the advantages of evented programming in this context, we still prioritize a fast experience for users. This is evident with features like the zip button allowing users to download a zip archive of a Git repository.

00:17:50.860 In such cases, a Node process operates alongside the Rails process. When an archive request comes through, we allow Node to generate the zip file on the fly, sending it back to the user immediately.

00:18:12.950 To wrap things up, evented programming is challenging in any language. Rather than treating it as a strict choice between Ruby and JavaScript, focus on the specific types of problems that evented programming suits.

00:18:31.490 Remember, it doesn't magically make individual requests faster, so get your response times into reasonable shape before diving into evented paradigms. For the sake of maintainability and readability, always hide I/O logic behind libraries designed for I/O management.

00:18:50.430 Avoid muddling your core business logic with callbacks for the underlying database calls. When you experiment, don't hesitate to try these ideas in a dedicated branch. Benchmark your application to find out which optimizations deliver significant improvements and which only complicate matters.

00:19:12.490 Thank you for your attention.

00:25:17.630 If you have any questions, I’d be happy to address them. One thing I mentioned was the importance of concealing the complexities of evented programming within libraries, as exemplified by Faraday. However, this leads to a point of discussion about juggling blocking I/O tasks, such as handling HTTP requests alongside disk requests.

00:26:03.820 So, while you can optimize for specific tasks, are you limited to juggling similar kinds of events at the expense of others? The key is to keep your reactor unblocked. If the libraries for your database calls and your HTTP calls are both reactor-aware, they can run independently of each other.

00:26:37.720 Does that provide any advantages with reactor management? Even with the reactor and the added complexity, you should be able to handle multiple types of calls effectively. So you can optimize specific segments without compromising overall performance.

00:27:00.000 Thank you, everyone, for listening. If there aren’t any additional questions, I appreciate your involvement and hope this gave you insights into evented programming.