Charles Nutter

Summarized using AI

High Performance Ruby

Charles Nutter • February 20, 2013 • Earth

In the talk titled "High Performance Ruby" presented by Charles Nutter at RubyConf AU 2013, the speaker addresses common misconceptions regarding Ruby's performance and outlines strategies to optimize Ruby code for efficiency. The presentation emphasizes that while Ruby may face criticism for being slow, it is essential to evaluate whether it is 'fast enough' for practical purposes. The key points discussed include:

  • Performance Optimization Approaches: Ruby can be optimized for overall execution time, memory usage, developer time, and even developer happiness. It's crucial to measure what aspects need improvement through benchmarking, while being wary of the limitations of such benchmarks.
  • Understanding JRuby: Nutter, a significant contributor to JRuby, explains how this particular Ruby implementation leverages the Java Virtual Machine (JVM) to achieve better performance. He highlights the optimizations present in JRuby and encourages developers to consider the underlying Ruby implementation to enhance their coding practices.
  • The Golden Rule of Optimization: Particularly in Ruby, doing less work is crucial. This involves avoiding repetitive tasks such as redundant object allocations and method calls. Efficient coding techniques include caching method lookups, inlining methods, and careful memory management.
  • Common Pitfalls: The presentation identifies pitfalls that hinder Ruby's performance such as excessive use of eval, creating singletons, dynamic method definitions, and altering class hierarchies at runtime. These practices complicate performance optimizations due to the dynamic nature of Ruby.
  • Memory Management: Nutter emphasizes the significance of minimizing object creation to prevent memory bloat, suggesting the use of constants and frozen literals for better memory utilization. He also advises against inefficient data handling that can lead to increased garbage collection overhead.
  • Leveraging JVM Features: Since JRuby runs on the JVM, it benefits from its optimizing capabilities. Nutter discusses how JRuby has improved its performance significantly over the years owing to advancements in JVM technology, indicating a potential 300% increase in performance simply by utilizing the JVM's evolution.

In conclusion, the talk reinforces the idea that Ruby developers have the power to enhance their application's performance significantly by understanding the language's underlying architecture, applying effective coding strategies, and avoiding common performance traps. By embracing these optimization techniques, Ruby can shed its reputation for being slow and evolve into a tool that meets high-performance requirements.

RubyConf AU 2013: http://www.rubyconf.org.au

Ruby has become a global phenomenon, and more people use it every day. But too often, Ruby is maligned for poor performance, poor scalability, and inability to efficiently parallelize. It's time we changed those impressions.
This talk will cover strategies for optimizing Ruby code, from using different Ruby implementations with better performance characteristics to generating code and avoiding excessive object churn. We'll see how one implementation, JRuby, optimizes different Ruby language constructs and how knowing a bit about your Ruby impl of choice will help you write better code. And we'll explore the future of Ruby for high performance computing and solutions to get us there.
If you don't know how to write faster, better Ruby code by the end of this talk, I'll let you buy me a beer and we'll keep talking.

RubyConf AU 2013

00:00:05.200 Thank you, thank you. All right, good morning, good morning.
00:00:11.679 We'll dive right into this here because I've got a lot of fun stuff to cover.
00:00:18.160 Basic information about me: I have been a Java developer in the past, probably since the beginning.
00:00:23.359 I've been working on JRuby a lot since about 2006. It's amazing that I've managed to convince multiple companies to pay me full-time to work on JRuby for the past seven years. Hopefully, we can keep that up.
00:00:30.800 Now I ended up at Red Hat, which seems like a nice place for us, working in the JBoss Polyglot group.
00:00:36.079 As one of the JRuby developers, I'm also one of the co-authors of the JRuby book, so if you're interested in JRuby, I recommend you buy many copies of it and give them to all your friends and family, if possible.
00:00:42.079 So let's actually get into the content here. Who would say that Ruby is fast? It's not really the question that gets answered with "yes" very often,
00:00:49.440 and in fact, anyone who isn’t in the Ruby community will say that Ruby is downright slow.
00:00:55.520 The question usually ends up becoming, is Ruby fast enough? Who feels like Ruby is fast enough for what you do?
00:01:02.559 Of course, for most people, that answer is yes; otherwise, you wouldn’t be here. If it's not fast enough for you to get your job done, you probably wouldn't be using it in the first place.
00:01:08.479 But "fast enough" is even a relative term. Fast enough for what? What are we trying to optimize for?
00:01:14.560 We have a couple of different options here: we can optimize for the overall execution time of a program; that's the most straightforward and typical way that people think about optimizing performance.
00:01:25.200 We can also optimize for memory use. Memory is cheap, but it's not free. If you're running large instances on AWS, for example,
00:01:31.360 you are going to pay for a lot of memory and for those instances running, so keeping memory use under control can be useful.
00:01:36.960 Optimizing for developer time is a different aspect or metric that isn't directly related to the hardware you're running on. It's about how long it takes you to get up and going, get your program running.
00:01:42.960 That's also an important optimization metric.
00:01:49.759 As Michael mentioned, we should also consider optimizing for developer happiness, which is a very subjective metric that's hard to benchmark.
00:01:55.200 I haven’t seen very good benchmarks for happiness, but it is something we hold very important as Ruby developers.
00:02:01.200 In order to determine what we're actually measuring or how to decide whether we've optimized enough—whether we're running fast enough or using the right amount of memory—we use benchmarks.
00:02:10.240 Well, we can measure all this stuff, but does it actually tell us what we want to know? Does it help us improve the process?
00:02:18.319 Does it help us write faster programs that use less memory and help us get more done in a shorter amount of time?
00:02:23.840 We have to take benchmarks with a grain of salt as we go. They are, again, relative measurements.
00:02:30.239 Most people have seen the benchmarks game, which is a lovely set of meaningless benchmarks that everyone seems to quote all the time.
00:02:36.560 Here is the overall score given to different languages based on this, and yeah, Ruby is not a particularly fast language.
00:02:44.400 Ruby 1.9 is down there toward the bottom, a little bit better than Python 3 but not quite as good as PHP.
00:02:49.120 If we look at these comparisons in more detail, Ruby versus Python looks like it’s a little bit faster on some benchmarks and a little bit slower on others.
00:02:55.200 Ruby and Python kind of end up being about even in performance these days.
00:03:02.000 Let’s go with another enemy of the state: PHP. It’s not looking as good here.
00:03:08.000 PHP has spent more time optimizing; it’s not as complicated a language internally so Ruby falls a little bit further behind.
00:03:16.720 Now we can go with another flavor of the day: Clojure. We're starting to see some more depressing metrics here in Ruby's straight-line performance.
00:03:22.240 Although we're doing much better on memory use, this is essentially Ruby against the JVM.
00:03:29.120 The JVM does an excellent job of optimizing straight-line performance, but you pay a cost and end up using more memory.
00:03:35.440 Of course, there’s the whole Node.js and JavaScript world, where they're among the most vocal saying that Ruby is slow.
00:03:41.600 For straight-line performance, they kind of have a point; they run these benchmarks faster than standard Ruby 1.9.
00:03:47.920 It can get even worse when we discuss Java. If you want to write Java, Ruby isn’t going to compare favorably in terms of performance.
00:03:53.440 And in the end, we all get beaten by C. Let's just write everything in C because we would use less memory and less execution time.
00:03:59.240 Of course, that's not telling the whole story if we just look at these benchmarks because the benchmarks aren't really what's important.
00:04:05.760 If we want to talk about optimizing Ruby, having a high-performance environment to run Ruby code in is essential.
00:04:11.040 We need to learn from those benchmarks how we can improve our code, improve the implementations of the language we love,
00:04:17.440 and create better applications that run faster and do more.
00:04:23.840 Now we actually get to the base of this talk: how we can optimize Ruby, what we can do in the code that we are writing,
00:04:29.120 and in the VMs that we build for Ruby.
00:04:35.440 It all boils down to the golden rule of optimization: simply do less work.
00:04:42.000 Ideally, you do only the work that is absolutely necessary to get your application running and get the job done.
00:04:48.080 In Ruby, this means looking for unnecessary work to remove, such as looking up the same data repeatedly.
00:04:55.360 This includes constantly hitting hashes, constantly going after the same methods without caching,
00:05:02.000 and frequently accessing constant or static values that don’t change.
00:05:08.240 On the other hand, allocating too many objects leads to performance issues.
00:05:14.080 Memory does play into straight-line performance as well; creating objects has a cost:
00:05:20.000 acquiring memory, initializing it, setting it to something, and then later on, garbage collecting it,
00:05:26.160 cleaning it up and returning it to the system.
00:05:32.960 These two areas are where we need to look at optimizing to improve the performance of Ruby code.
00:05:39.520 So, what is it that we want from our programs? What do we want out of Ruby that makes us happy as Ruby developers?
00:05:45.680 Firstly, the fact that we can generate code at runtime is awesome.
00:05:52.239 Every new developer that comes to Ruby usually starts by evaluating code and creating methods on the fly—doing all this meta-programming simply because it feels so powerful.
00:05:58.080 Injecting behavior when you need to is a significant advantage.
00:06:05.600 Whether it’s including a module with utility functions into any object or adding a method at runtime,
00:06:11.840 it’s a fun and fast feature of Ruby that we enjoy.
00:06:18.080 We often think that creating more objects is free because memory is free, and the garbage collector will clean it all up.
00:06:23.360 So, we tend to create more objects as needed, accumulating lots of intermediate states.
00:06:30.240 This makes it easier to follow the program, allowing us to abstract things into multiple levels of classes.
00:06:37.520 However, why do Ruby applications run slow?
00:06:44.960 Unfortunately, these are the exact same reasons.
00:06:52.679 These features, while useful, can negatively impact performance.
00:06:59.360 And while we often see that these features defeat optimizations, folks continue to use things like eval at runtime.
00:07:05.760 They wonder why their programs aren't fast or why performance isn't improving.
00:07:12.480 People continue to create new types and extend modules into objects, not understanding the performance costs.
00:07:18.080 It brings to mind something that Einstein said: insanity is doing the same thing over and over again and expecting different results.
00:07:26.000 People don’t seem to learn that there are things we can do in Ruby programs to improve performance.
00:07:32.800 There are two key areas we can look at improving in Ruby applications: execution, which involves calling methods and running code,
00:07:39.760 and allocation, which deals with memory management.
00:07:47.600 Let’s get into execution.
00:07:54.560 In Ruby, we have dynamic invocation, which means we don’t know the method we’re going to call until runtime.
00:08:01.920 This involves looking up the method based on the target object and the method name.
00:08:09.520 In some languages, we also have to look at the parameters to see which method to call.
00:08:16.240 In Ruby, it’s simpler; we only need to know the target type and the method name.
00:08:22.800 However, it’s still an expensive process.
00:08:29.200 We look up the method in the target class; if it's not there, we look at the superclass.
00:08:36.080 We keep going up the inheritance chain until we find the method we’re looking for.
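The lookup walk just described can be modeled in plain Ruby. This is a minimal sketch, not how any VM implements it: `Animal`, `Dog`, and `find_method_owner` are hypothetical names, and the sketch ignores singleton classes and `method_missing`.

```ruby
# Hypothetical classes, used only to illustrate the lookup walk.
class Animal
  def speak
    "..."
  end
end

class Dog < Animal
end

# Simplified model of dynamic dispatch: walk the ancestor chain and
# return the first class or module whose own method table has the name.
def find_method_owner(receiver, name)
  receiver.class.ancestors.find do |mod|
    mod.instance_methods(false).include?(name) ||
      mod.private_instance_methods(false).include?(name)
  end
end

puts find_method_owner(Dog.new, :speak)   # Animal
```

Every uncached call would repeat this walk, which is why the VM wants to remember the result.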
00:08:43.440 So ideally, we want to cache the method.
00:08:50.960 If we have that method in hand, we don’t want a lot of indirection; we’d like this to be as close as possible to a static call, like you'd have in C or Java.
00:08:57.680 Indirection can cause performance issues.
00:09:05.040 A lot of runtimes, both in Ruby and in other languages, inline methods, combining them into the same piece of code.
00:09:12.320 We cache it, inline it, and then a lot of optimizations come out of that.
00:09:18.160 Let’s look at how this actually functions in practice.
00:09:25.520 We have a call to a method, let’s call it foo.
00:09:30.160 Our foo class over on the other side defines some methods.
00:09:36.799 First, we go and look up the method. We ask the VM, 'I need to make this call. I have this object and the name of the method I want to look up; please find that method for me and bring it back.'
00:09:42.480 Once we do the invocation, we branch to that code and make the call, and ideally, we cache it close to the main call site.
00:09:50.480 There are other ways to cache this—across classes or with global caches, depending on how we track the last method that was called.
00:09:56.480 Once we have this method cached, we can inline and optimize since we know we’re calling the same method repeatedly.
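As a rough illustration of caching at the call site, here is a toy cache written in Ruby itself. Real VMs do this inside the interpreter or JIT; the `CallSite` class and its `misses` counter are assumptions made for this sketch, and it deliberately skips the invalidation a real cache needs when methods are redefined.

```ruby
# Toy call-site cache; all names here are hypothetical.
class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Cache miss: do the slow lookup and remember the result.
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    # Cache hit: skip the lookup and invoke the remembered method.
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
%w[a b c].each { |s| site.call(s) }
puts site.misses   # 1: one lookup served three calls

site.call(:sym)    # a different receiver class forces a new lookup
puts site.misses   # 2
```

The cache only pays off when the same receiver type keeps arriving at the same call site.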
00:10:02.320 Let’s look at a quick pseudo-example of inlining some code.
00:10:09.200 We have a little loop calling a function, here called invoker, that calls foo.
00:10:15.680 If we know that invoker always calls the same foo method, we can inline foo into invoker.
00:10:21.760 Essentially, invoker just does the same thing that foo does. We avoid the extra call and the extra cost that comes from that invocation.
00:10:28.480 We can inline invoker as well, pulling that numeric value into the loop.
00:10:35.040 Now we’ve eliminated unnecessary method calls. However, that value might not even be used.
00:10:41.840 What we can end up with is a loop that actually does nothing except increment a variable up to a certain limit.
00:10:48.920 If that variable is never read, we can optimize further by ignoring the logic entirely.
00:10:55.920 We've distilled our programs down to only the essential work that must be done.
00:11:02.880 Applying the golden rule of optimization: do less work. If we can do none at all, that's ideal.
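The inlining steps above can be written out by hand. This sketch reuses the talk's `foo` and `invoker` names; the three loop methods are hypothetical and show what an optimizer could reduce the code to, not what any particular VM emits.

```ruby
# Step 0: the code as written.
def foo
  1
end

def invoker
  foo
end

def original(limit)
  i = 0
  while i < limit
    invoker              # two dynamic calls per iteration
    i += 1
  end
  i
end

# Step 1: foo inlined into invoker, then invoker inlined into the loop.
def after_inlining(limit)
  i = 0
  while i < limit
    _unused = 1          # the value is produced but never read
    i += 1
  end
  i
end

# Step 2: the unused value is dropped; only the counter remains.
def after_dead_code_elimination(limit)
  i = 0
  i += 1 while i < limit
  i
end
```

All three methods return the same result; each step simply does less work per iteration.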
00:11:08.240 What gets in the way of this process? We have cached methods and built assumptions around them.
00:11:15.120 We assume that the cache is valid and that it always references the correct target method, so the VM must ensure this.
00:11:22.720 Typically, we do this in Ruby by checking that we are still calling the same target type.
00:11:28.240 If it's always strings, we know that we have the same methods there every time.
00:11:34.080 However, we also have to check that the method table hasn't changed.
00:11:40.160 Unlike languages like C++ or Java where the structure of the method table is mostly static,
00:11:46.960 in Ruby, the method table can change at runtime.
00:11:53.440 Every class in Ruby starts as a blank slate until it's filled with methods.
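A small sketch of why the VM has to validate its caches: any class can be reopened at runtime, so the method found at a call site yesterday may not be the one that should run today. `Greeting` is a hypothetical class.

```ruby
class Greeting
  def text
    "hello"
  end
end

g = Greeting.new
puts g.text   # hello

# Reopening the class modifies its method table at runtime.
class Greeting
  def text
    "goodbye"
  end
end

# The same object and the same call now reach a different method body,
# so anything cached for Greeting#text has to be thrown away.
puts g.text   # goodbye
```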
00:12:00.480 What does this lead to? If we have a new type every time at a given call site, we can’t cache anything.
00:12:06.800 We know that we will get a different method table, meaning the cache is now defeated and we have to look it up again.
00:12:13.680 Making modifications to the method table at runtime makes it almost impossible to optimize based on that method.
00:12:19.920 The problem with these many lovely features we want in Ruby is that we often end up in an awful cycle.
00:12:26.560 We are constantly updating the method table, creating new types, and we cannot implement optimizations.
00:12:32.240 The inability to cache the method means we can’t inline or optimize the performance of our code.
00:12:39.840 There’s only so much that VMs can do to optimize in this situation.
00:12:45.600 What are the usual suspects causing these problems? The first one is eval.
00:12:52.160 Most people know that evaluating code at runtime is not a good idea, but many still do it.
00:12:58.480 So, what hurts performance in this case? If the code is constantly changing, there’s nowhere to cache anything.
00:13:05.600 We have no static view of the methods or the code being executed, and thus there’s no optimization possible.
00:13:12.160 If the VM can't cache it, it can't see patterns that would allow for inlining.
00:13:18.160 It can't see values that are always static and optimize around them.
00:13:24.400 As a result, with eval, there’s really no optimization possible.
00:13:30.080 We can speed up eval itself by caching the parse tree, but generally, if code is evaluated repeatedly,
00:13:37.040 there’s no way to optimize it.
00:13:44.080 How do we fix this in our code if we think we need eval? The simplest way is to evaluate that code into a static structure.
00:13:50.560 You can evaluate it into a method body and then call that method.
00:13:57.920 Most of the time, evaluated code is still largely static, so we can stick it into a method or proc and call that instead.
00:14:04.080 The VM then has something it can hold onto: a method body, which never changes, allowing some caching.
00:14:10.000 In JRuby or Rubinius, if you put that into a method body, it will eventually JIT compile to native code.
00:14:16.640 With dynamic state, like some interpolation or other dynamic content, you can just pass that in as well.
00:14:22.080 Branches are much cheaper than evaluating code repeatedly.
00:14:28.000 If you can avoid eval in the hot path of your application, that would be best.
00:14:34.320 Perform evaluations up front in your code, get everything designed into classes, methods, and procs, and at runtime, only hit those.
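One way to read this advice, sketched with hypothetical names (`area_with_eval`, `Shapes.area`): evaluate the source once into a method body, then pass the dynamic values in as arguments instead of interpolating them into a fresh string on every call.

```ruby
# Slow pattern: a new source string is built, parsed, and evaluated per call.
def area_with_eval(width, height)
  eval("#{width} * #{height}")
end

# Alternative: evaluate once, up front, into a method body.
# The dynamic state arrives as ordinary arguments.
module Shapes
  module_eval <<~RUBY, __FILE__, __LINE__ + 1
    def self.area(width, height)
      width * height
    end
  RUBY
end

puts area_with_eval(3, 4)   # 12
puts Shapes.area(3, 4)      # 12
```

After the one-time `module_eval`, `Shapes.area` is an ordinary method that the VM can cache and, on implementations with a JIT, compile.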
00:14:40.000 Now, singletons are a little different.
00:14:48.160 Creating a singleton from a Ruby object creates a new anonymous synthetic type with every request.
00:14:54.080 As far as the VM is concerned, it looks like a completely new type, and all caching must be done anew.
00:15:00.880 Additionally, singletons define new methods, leading to a method table that’s entirely new from the previous one.
00:15:07.920 This leads to lots of optimizations being defeated.
00:15:14.160 In some Ruby implementations, like MRI (Ruby 1.9), creating singletons can blow the method cache for the entire system.
00:15:21.920 Since there isn't a valid per-class caching mechanism, it ends up invalidating everything.
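The "new anonymous type" can be observed directly. In this small sketch, `Greeter` is a hypothetical module; the point is that two objects of the same class end up with two distinct singleton classes once each is extended.

```ruby
module Greeter
  def greet
    "hello"
  end
end

a = Object.new
b = Object.new

# Each extend gives the object its own singleton class.
a.extend(Greeter)
b.extend(Greeter)

puts a.singleton_class.equal?(b.singleton_class)   # false: two distinct types
puts a.singleton_class.include?(Greeter)           # true
puts a.class.equal?(b.class)                       # true: both are still Object
```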
00:15:28.880 So how do we fix this problem?
00:15:36.080 It’s better to create some of these aggregate types upfront. It’s far better to have a hundred or even ten thousand classes defined at the start.
00:15:41.680 Defining them programmatically is another option, but defining them at the program's start is key.
00:15:48.120 Different design patterns and architectures can be beneficial, such as the entity-component architecture.
00:15:55.440 This separates the data of the application entities from the behavioral components.
00:16:01.920 You don’t have to inject behavior into the data objects you're passing around.
00:16:08.000 Just use those components or external representations that manipulate the data.
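A minimal sketch of the entity-component idea, with hypothetical names (`Player`, `Movement`): the entity is plain data, and behavior lives in a separate component that operates on it, so nothing is injected into the object at runtime.

```ruby
# Plain data entity: no behavior is added to it at runtime.
Player = Struct.new(:name, :x, :y)

# Behavior lives in a separate component that manipulates entities.
module Movement
  def self.move(entity, dx, dy)
    entity.x += dx
    entity.y += dy
    entity
  end
end

player = Player.new("ann", 0, 0)
Movement.move(player, 2, 3)
puts player.x   # 2
puts player.y   # 3
```

Every `Player` keeps the same class and method table, so call sites that handle players stay cacheable.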
00:16:14.640 One concrete example of this is a recurring issue I filed—still unresolved in MRI—related to introducing heavy extensions into certain objects at runtime.
00:16:22.560 This is a trivial change to make in almost any program.
00:16:29.120 Instead of extending and dynamically adding behavior, create several concrete classes.
00:16:35.680 In JRuby, for instance, we have a readable and writable class that simply reflects the behavior of the original.
00:16:41.440 We can avoid the complex extensions entirely.
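The general shape of that change can be sketched as follows. The names (`Auditable`, `Record`, `AuditableRecord`) are hypothetical and are not the classes from the issue mentioned above; the sketch only contrasts per-object `extend` with a concrete class defined once.

```ruby
module Auditable
  def audit
    "audited #{name}"
  end
end

class Record
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Per-object extension: a new singleton class for every record.
def load_with_extend(name)
  Record.new(name).extend(Auditable)
end

# Concrete class defined once, up front: every instance shares one type.
class AuditableRecord < Record
  include Auditable
end

def load_with_class(name)
  AuditableRecord.new(name)
end

puts load_with_extend("a").audit   # audited a
puts load_with_class("a").audit    # audited a
```

Both versions behave the same to the caller, but only the second keeps the receiver type stable across calls.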
00:16:48.960 Now, let's move off the execution side and look at accessing data in memory.
00:16:56.160 Most of the time, we should stick data in constants, class variables, and optimize access to that data.
00:17:02.560 Constants are defined in tables living on classes and modules with various mechanisms for looking them up.
00:17:09.840 This data is generally assigned once, so we can cache it and avoid repeated lookups.
00:17:16.160 What does this look like in practice, then?
00:17:23.440 When we ask the VM to find a value, it goes through the lexical scopes and the class hierarchy to locate it.
00:17:30.560 It brings back the value and caches it as close to the access site as possible to avoid extra overhead.
00:17:37.360 However, we need a way of validating that this cache remains correct.
00:17:43.680 Since constant lookups are both lexical and hierarchical, caching becomes complicated.
00:17:50.880 It's not as simple as saying you just need to look at a certain type or know you will get the right constant for the value.
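The two search paths can be seen in a small example. `Outer`, `Base`, and `Child` are hypothetical names; `Module.nesting` is the standard way to inspect the lexical scopes in effect at a point in the code.

```ruby
module Outer
  LIMIT = 10

  class Base
    TIMEOUT = 5
  end

  class Child < Base
    def self.limit
      LIMIT            # found lexically, in the enclosing Outer scope
    end

    def self.timeout
      TIMEOUT          # found hierarchically, through the superclass Base
    end

    def self.scopes
      Module.nesting   # the lexical scopes that are searched first
    end
  end
end

puts Outer::Child.limit            # 10
puts Outer::Child.timeout          # 5
puts Outer::Child.scopes.inspect   # [Outer::Child, Outer]
```

Because the answer depends on both where the code is written and what the class inherits from, a cached constant cannot be validated by checking a single type.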
00:17:57.760 Because of this structure, cache invalidation in many implementations uses a single global serial number.
00:18:05.680 Every time a new constant is defined, that serial number increments, and all caches must update.
00:18:11.120 Unfortunately, this means that redefining a constant is expensive. One of the reasons Rails can be slow in development mode is that
00:18:17.680 it has to reevaluate code, constantly redefining constants.
00:18:24.320 Using many new lexical scopes at runtime can also be problematic and create inefficiency.
00:18:31.040 This involves calling blocks and introducing many new class structures or method definitions.
00:18:38.000 As new scopes are encountered, caches need to be rebuilt.
00:18:45.680 Evaluated code will also contribute to cache busting, and altering class hierarchies will blow away constant caches.
00:18:52.480 If you include modules at runtime, or modify class hierarchies through extending singletons,
00:18:59.440 this can force constant cache updates across the whole system.
00:19:06.080 How do we fix this? First, don't modify constants; leave them constant.
00:19:12.960 When I talk to folks from Java and other languages, they're often surprised that Ruby's constants can be lazily defined and can be redefined.
00:19:19.600 So let's treat constants as constant to see better results.
00:19:26.000 Avoid runtime class hierarchy changes as well. Modifications have far-reaching effects that VMs have little ability to handle.
00:19:32.560 Now let’s move on to allocation.
00:19:39.760 Ruby loves objects; we all love creating them as quickly as possible.
00:19:46.320 In every single request of a Rails application, there can be hundreds or thousands of strings and arrays created.
00:19:53.600 If you look at a memory profile of a running Rails app, you'll see that its slowness likely comes from all the objects it creates.
00:19:59.040 Closure state can also be an issue, causing more objects to be created just for capturing state.
00:20:06.000 This eventually leads to less efficient memory management and slows performance.
00:20:12.040 The culprits here are literals; creating a literal string, like 'foo', generates a new object every time.
00:20:19.600 A literal array also creates a new array object every time regardless of what’s in it.
00:20:25.600 Concatenating strings without realizing you're creating throwaway objects can also be wasteful.
00:20:32.320 Chaining slice and enumeration calls is particularly inefficient, creating multiple intermediate arrays that are used once and thrown away.
00:20:39.200 So what can we do to fix these issues? Start with literals; embrace constants.
00:20:45.680 If you have data that’s mostly constant, convert those literal strings into constants and freeze them.
00:20:52.160 Most VMs excel at optimizing these, leading to immediate performance improvement.
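A short sketch of the literal-versus-constant difference, using hypothetical names (`default_roles`, `DEFAULT_ROLES`, `STATUS_OK`). It relies only on object identity checks; how much time this saves will depend on the implementation and the workload.

```ruby
# A literal array allocates a fresh object on every evaluation.
def default_roles
  [:reader, :writer]
end

# Hoisting the value into a frozen constant allocates it once.
DEFAULT_ROLES = [:reader, :writer].freeze
STATUS_OK = "ok".freeze

def default_roles_cached
  DEFAULT_ROLES
end

puts default_roles.equal?(default_roles)                 # false: two arrays
puts default_roles_cached.equal?(default_roles_cached)   # true: one shared array
puts STATUS_OK.frozen?                                   # true
```

Freezing matters because a shared object that callers could mutate would no longer be a safe replacement for a fresh literal.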
00:20:59.440 Cache your most common interpolated strings and regular expressions.
00:21:05.200 Usually, only a small number of frequently used values cause unnecessary allocations.
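For interpolated regular expressions, one possible approach is to memoize the built pattern per input value. `tagged?`, `tagged_cached?`, and `TAG_PATTERNS` are hypothetical names, and the sketch assumes the set of distinct tags is small.

```ruby
# Interpolated regexp literal: a new Regexp is built each time this runs.
def tagged?(line, tag)
  line.match?(/\[#{Regexp.escape(tag)}\]/)
end

# Memoized patterns: each distinct tag is compiled once and reused.
TAG_PATTERNS = Hash.new do |cache, tag|
  cache[tag] = /\[#{Regexp.escape(tag)}\]/
end

def tagged_cached?(line, tag)
  line.match?(TAG_PATTERNS[tag])
end

puts tagged?("[warn] disk almost full", "warn")          # true
puts tagged_cached?("[warn] disk almost full", "warn")   # true
```

The cache grows with the number of distinct tags and is shared mutable state, so it suits a small, bounded set of values.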
00:21:13.120 Study memory profiles! While tools for Ruby aren't always perfect, you'll be able to find issues using JRuby.
00:21:21.440 Fixing concatenation and copying will also improve performance, even if it leads to trade-offs regarding thread safety.
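The concatenation point can be sketched with two hypothetical helpers. `+=` builds a new String on every step, while `<<` appends to one buffer in place; the in-place version mutates shared state, which is where the thread-safety trade-off comes from.

```ruby
def join_with_plus(parts)
  result = ""
  parts.each { |part| result += part }   # each += allocates a new String
  result
end

def join_with_append(parts)
  result = +""                           # unary plus yields an unfrozen String
  parts.each { |part| result << part }   # << appends to the same object
  result
end

puts join_with_plus(%w[a b c])     # abc
puts join_with_append(%w[a b c])   # abc
```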
00:21:29.120 Fix enumeration chaining by condensing into fewer steps; avoid the chain in the first place.
00:21:40.320 As an alternative, consider the loop. You’ll often find better results by creating a simple loop over values with necessary changes.
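A small comparison of a chained pipeline and the equivalent single loop. Both method names are hypothetical; the chained version allocates an intermediate array for `map` and another for `select`, while the loop allocates none.

```ruby
# Chained version: map and select each build an intermediate array.
def sum_even_squares_chained(values)
  values.map { |v| v * v }.select(&:even?).sum
end

# Single-pass version: one loop, no intermediate arrays.
def sum_even_squares_loop(values)
  total = 0
  values.each do |v|
    square = v * v
    total += square if square.even?
  end
  total
end

puts sum_even_squares_chained([1, 2, 3, 4])   # 20
puts sum_even_squares_loop([1, 2, 3, 4])      # 20
```

The chained form is often easier to read, so the loop is worth it mainly in hot paths that a profile has pointed to.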
00:21:46.560 What do you get from making these changes in your application?
00:21:54.240 The case study I know best is JRuby's performance improvements.
00:22:01.680 We run on the JVM, giving us all its optimizations for free, allowing JRuby's performance to improve.
00:22:09.600 We’ve been applying these optimizations to the JRuby codebase since its start; everything I’ve shown you here involves similar strategies.
00:22:18.080 What if we just let the JVM developers do the work? We’d see JRuby’s performance improve significantly.
00:22:26.240 Starting from Java 1.4 back in the early days, JRuby's performance has improved by 300% just thanks to JVM progress.
00:22:33.760 By doing nothing since 2006, we would have surpassed Ruby 1.8 simply due to the JVM developers' efforts.
00:22:40.480 How do we glean improvements from the JVM?
00:22:47.840 We convert Ruby source into our AST, compiling it into JVM bytecode.
00:22:54.160 The JVM runs that bytecode through its interpreter and eventually compiles it to native code, leading to faster execution.
00:23:03.120 The JVM can improve native code over time, allowing cacheable calls and avoiding common pitfalls.
00:23:10.640 JRuby evolves, leveraging the JVM for many benefits, as seen in our evolution over time.
00:23:17.840 Working on JRuby on OpenJDK 8 has yielded an eightfold improvement in performance.
00:23:25.920 Looking at the JVM’s optimization strategies and our improvements, we can show significant gains after applying strategies effectively.
00:23:34.560 We've continually seen performance levels skyrocket by refining internal implementations and caching strategies.
00:23:42.400 For instance, invokedynamic allows the JVM to better understand Ruby's requirements.
00:23:50.560 With dynamic calls being inlined, less overhead is seen, helping improve performance across the board.
00:23:58.560 The payoff lies in how improvements can lead to faster execution, better memory management, and resource optimization.
00:24:06.240 This leads us to the ultimate lessons learnt: performance doesn’t just happen. Don’t sit down and wait for your applications to get faster.
00:24:14.080 You really need to understand how Ruby gets optimized.
00:24:21.760 Dive deep into memory profiles, so you understand where allocations are happening.
00:24:28.560 Ultimately, don’t be hindered by the myth that Ruby is slow—there’s plenty of work being done.
00:24:36.720 You can make strides in your programs to facilitate Ruby’s speed.
00:24:44.640 All right, thank you!