00:00:05.200
Thank you, thank you. All right, good morning, good morning.
00:00:11.679
We'll dive right into this here because I've got a lot of fun stuff to cover.
00:00:18.160
Basic information about me: I've been a Java developer for a long time, pretty much since Java's beginning.
00:00:23.359
I've been working on JRuby a lot since about 2006. It's amazing that I've managed to convince multiple companies to pay me full-time to work on JRuby for the past seven years. Hopefully, we can keep that up.
00:00:30.800
Now I ended up at Red Hat, which seems like a nice place for us, working in the JBoss Polyglot group.
00:00:36.079
As one of the JRuby developers, I'm also one of the co-authors of the JRuby book, so if you're interested in JRuby, I recommend you buy many copies of it and give them to all your friends and family, if possible.
00:00:42.079
So let's actually get into the content here. Who would say that Ruby is fast? It's not a question that gets answered with "yes" very often,
00:00:49.440
and in fact, most people outside the Ruby community will say that Ruby is downright slow.
00:00:55.520
The question usually ends up becoming, is Ruby fast enough? Who feels like Ruby is fast enough for what you do?
00:01:02.559
Of course, for most people, that answer is yes; otherwise, you wouldn’t be here. If it's not fast enough for you to get your job done, you probably wouldn't be using it in the first place.
00:01:08.479
But "fast enough" is itself a relative term. Fast enough for what? What are we trying to optimize for?
00:01:14.560
We have a couple of different options here: we can optimize for the overall execution time of a program; that's the most straightforward and typical way that people think about optimizing performance.
00:01:25.200
We can also optimize for memory use. Memory is cheap, but it's not free. If you're running large instances on AWS, for example,
00:01:31.360
you pay for that memory for as long as those instances are running, so keeping memory use under control matters.
00:01:36.960
Developer time is a different metric, one that isn't directly tied to the hardware you're running on. It's about how long it takes you to get up and going and get your program running.
00:01:42.960
That's also an important optimization metric.
00:01:49.759
As Michael mentioned, we should also consider optimizing for developer happiness, which is a very subjective metric that's hard to benchmark.
00:01:55.200
I haven't seen very good benchmarks for happiness, but it's something we consider very important as Ruby developers.
00:02:01.200
To decide whether we've optimized enough (whether we're running fast enough or using the right amount of memory), we use benchmarks.
00:02:10.240
Well, we can measure all this stuff, but does it actually tell us what we want to know? Does it help us improve the process?
00:02:18.319
Does it help us write faster programs that use less memory and help us get more done in a shorter amount of time?
00:02:23.840
We have to take benchmarks with a grain of salt as we go. They are, again, relative measurements.
00:02:30.239
Most people have seen the Computer Language Benchmarks Game, a lovely set of meaningless benchmarks that everyone seems to quote all the time.
00:02:36.560
Here's the overall score it gives different languages, and yeah, Ruby is not a particularly fast language.
00:02:44.400
Ruby 1.9 is down there toward the bottom, a little bit better than Python 3 but not quite as good as PHP.
00:02:49.120
If we look at the comparisons in more detail, Ruby is a little bit faster than Python on some benchmarks and a little bit slower on others.
00:02:55.200
Ruby and Python kind of end up being about even in performance these days.
00:03:02.000
Let's go with another enemy of the state: PHP. Ruby isn't looking as good here.
00:03:08.000
PHP's implementers have spent more time optimizing, and it's not as complicated a language internally, so Ruby falls a little further behind.
00:03:16.720
Now we can go with another flavor of the day: Clojure. We're starting to see some more depressing metrics here in Ruby's straight-line performance.
00:03:22.240
Although we're doing much better on memory use, this is essentially Ruby against the JVM.
00:03:29.120
The JVM does an excellent job of optimizing straight-line performance, but you pay a cost and end up using more memory.
00:03:35.440
Of course, there's the whole Node.js and JavaScript world, whose members are among the most vocal in saying that Ruby is slow.
00:03:41.600
For straight-line performance, they kind of have a point; JavaScript runs these benchmarks faster than standard Ruby 1.9 does.
00:03:47.920
It gets even worse when we compare against Java. If you're willing to write Java, Ruby isn't going to compare favorably in terms of performance.
00:03:53.440
And in the end, we all get beaten by C. Let's just write everything in C because we would use less memory and less execution time.
00:03:59.240
Of course, these benchmarks don't tell the whole story, and the benchmarks themselves aren't really what's important.
00:04:05.760
If we want to talk about optimizing Ruby, having a high-performance environment to run Ruby code in is essential.
00:04:11.040
We need to learn from those benchmarks how we can improve our code, improve the implementations of the language we love,
00:04:17.440
and create better applications that run faster and do more.
00:04:23.840
Now we actually get to the heart of this talk: how we can optimize Ruby, what we can do in the code that we write,
00:04:29.120
and in the VMs that we build for Ruby.
00:04:35.440
It all boils down to the golden rule of optimization: simply do less work.
00:04:42.000
Ideally, you do only the work that is absolutely necessary to get your application running and get the job done.
00:04:48.080
In Ruby, this means looking for unnecessary work to remove, such as looking up the same data repeatedly.
00:04:55.360
This includes constantly hitting hashes, constantly going after the same methods without caching,
00:05:02.000
and frequently accessing constant or static values that don’t change.
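A minimal sketch of removing repeated lookups from a hot loop. `CONFIG`, `total_slow`, and `total_fast` are hypothetical names chosen for illustration; the point is only that a value which never changes can be looked up once and kept in a local variable.

```ruby
# Hypothetical configuration hash; names are illustrative only.
CONFIG = { "tax" => { "rate" => 0.2 } }.freeze

# Slow shape: two hash lookups on every iteration for a value that never changes.
def total_slow(prices)
  prices.sum { |p| p * (1 + CONFIG["tax"]["rate"]) }
end

# Faster shape: do the lookups once and keep the result in a local variable.
def total_fast(prices)
  multiplier = 1 + CONFIG["tax"]["rate"]
  prices.sum { |p| p * multiplier }
end
```

Both methods compute the same result; the second simply does less work per element.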
00:05:08.240
The other side of it is allocation: allocating too many objects leads to performance issues.
00:05:14.080
Memory does play into straight-line performance as well; creating objects has a cost.
00:05:20.000
You pay for acquiring the memory, initializing it, setting it to something, and later on garbage collecting it,
00:05:26.160
cleaning it up and returning it to the system.
00:05:32.960
These two areas are where we need to look at optimizing to improve the performance of Ruby code.
00:05:39.520
So, what is it that we want from our programs? What do we want out of Ruby that makes us happy as Ruby developers?
00:05:45.680
Firstly, the fact that we can generate code at runtime is awesome.
00:05:52.239
Every new developer who comes to Ruby usually starts by evaluating code and creating methods on the fly, doing all this metaprogramming simply because it feels so powerful.
00:05:58.080
Injecting behavior when you need to is a significant advantage.
00:06:05.600
Whether it’s including a module with utility functions into any object or adding a method at runtime,
00:06:11.840
it’s a fun and fast feature of Ruby that we enjoy.
00:06:18.080
We often think that creating more objects is free because memory is free, and the garbage collector will clean it all up.
00:06:23.360
So, we tend to create more objects as needed, accumulating lots of intermediate states.
00:06:30.240
This makes it easier to follow the program, allowing us to abstract things into multiple levels of classes.
00:06:37.520
So why do Ruby applications run slowly?
00:06:44.960
Unfortunately, these are the exact same reasons.
00:06:52.679
These features, while genuinely useful, can hurt performance.
00:06:59.360
And even though we often see these features get in the way of optimization, folks continue to use things like eval at runtime.
00:07:05.760
They wonder why their programs aren't fast or why performance isn't improving.
00:07:12.480
People continue to create new types and extend modules into objects, not understanding the performance costs.
00:07:18.080
It brings to mind the line often attributed to Einstein: insanity is doing the same thing over and over again and expecting different results.
00:07:26.000
People don’t seem to learn that there are things we can do in Ruby programs to improve performance.
00:07:32.800
There are two key areas we can look at improving in Ruby applications: execution, which involves calling methods and running code,
00:07:39.760
and allocation, which deals with memory management.
00:07:47.600
Let’s get into execution.
00:07:54.560
In Ruby, we have dynamic invocation, which means we don’t know the method we’re going to call until runtime.
00:08:01.920
This involves looking up the method based on the target object and the method name.
00:08:09.520
In some languages, we also have to look at the parameters to see which method to call.
00:08:16.240
In Ruby, it’s simpler; we only need to know the target type and the method name.
00:08:22.800
However, it’s still an expensive process.
00:08:29.200
We look up the method in the target class; if it's not there, we look at the superclass.
00:08:36.080
We keep going up the inheritance chain until we find the method we’re looking for.
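To make the lookup walk concrete, here is a simplified model written in Ruby itself. It uses the real reflection methods `ancestors`, `instance_methods(false)`, and `private_instance_methods(false)` to mimic what a VM does internally; `find_owner`, `Greeting`, `Base`, and `Child` are hypothetical names, and the sketch ignores refinements, `method_missing`, and singleton classes.

```ruby
# Simplified model of method lookup: walk the ancestor chain (the class,
# then its included modules and superclasses, nearest first) until one
# of them defines the method directly.
def find_owner(klass, name)
  klass.ancestors.find do |mod|
    mod.instance_methods(false).include?(name) ||
      mod.private_instance_methods(false).include?(name)
  end
end

module Greeting
  def greet
    "hello"
  end
end

class Base
  include Greeting
end

class Child < Base; end

find_owner(Child, :greet)   # → Greeting (after checking Child and Base)
```

Every step of that walk is work a VM would rather not repeat on each call, which is why the result gets cached.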
00:08:43.440
So ideally, we want to cache the method.
00:08:50.960
If we have that method in hand, we don’t want a lot of indirection; we’d like this to be as close as possible to a static call, like you'd have in C or Java.
00:08:57.680
Indirection can cause performance issues.
00:09:05.040
A lot of runtimes, both for Ruby and for other languages, inline methods, combining the called method's code into the caller.
00:09:12.320
We cache it, inline it, and then a lot of optimizations come out of that.
00:09:18.160
Let’s look at how this actually functions in practice.
00:09:25.520
We have a call to a method, let’s call it foo.
00:09:30.160
Our foo class over on the other side defines some methods.
00:09:36.799
First, we go and look up the method. We ask the VM, 'I need to make this call. I have this object and the name of the method I want to look up; please find that method for me and bring it back.'
00:09:42.480
Once we've found the method, we branch to that code and make the call, and ideally we cache it close to the call site.
00:09:50.480
There are other ways to cache this, such as caches shared across classes or global caches, depending on how we track the last method that was called.
00:09:56.480
Once we have this method cached, we can inline and optimize since we know we’re calling the same method repeatedly.
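A toy monomorphic inline cache, written in Ruby purely for illustration (`InlineCache` and its methods are hypothetical, not any VM's API). It remembers the receiver's class and the method found for it, and repeats the full lookup only when a different class shows up. Real VMs also guard against method-table changes, which this sketch leaves out.

```ruby
# Toy inline cache for a single call site.
class InlineCache
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Miss: do the full lookup and remember the result for next time.
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    # Hit (or freshly filled cache): reuse the method we already found.
    @cached_method.bind(receiver).call(*args)
  end
end

site = InlineCache.new(:upcase)
3.times { site.call("abc") }   # one miss, then two hits
site.misses                    # → 1
```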
00:10:02.320
Let’s look at a quick pseudo-example of inlining some code.
00:10:09.200
We have a little loop calling a function, here called invoker, that calls foo.
00:10:15.680
If we know that invoker always calls the same foo method, we can inline foo into invoker.
00:10:21.760
Essentially, invoker just does the same thing that foo does. We avoid the extra call and the extra cost that comes from that invocation.
00:10:28.480
We can inline invoker as well, pulling that numeric value into the loop.
00:10:35.040
Now we’ve eliminated unnecessary method calls. However, that value might not even be used.
00:10:41.840
What we can end up with is a loop that actually does nothing except increment a variable up to a certain limit.
00:10:48.920
If that variable is never read, we can optimize further by eliminating the loop entirely.
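The same progression written out by hand in Ruby, as a sketch of what a JIT does internally (Ruby VMs don't rewrite your source like this). The names `foo` and `invoker` follow the slide; the value `42` and the choice to return the loop counter, so the three versions can be compared, are assumptions of this sketch.

```ruby
# Hypothetical stand-ins for the slide's pseudo-code.
def foo
  42
end

def invoker
  foo
end

# Original: two method calls per iteration, and the result is discarded.
def loop_original(n)
  i = 0
  while i < n
    _value = invoker
    i += 1
  end
  i
end

# After inlining foo into invoker and invoker into the loop,
# only the constant value is left in the loop body.
def loop_inlined(n)
  i = 0
  while i < n
    _value = 42 # nothing ever reads this
    i += 1
  end
  i
end

# After dead-code elimination: the unused value is gone and the loop only
# counts, so its result can be computed directly. If nothing read the
# counter either, the whole loop would disappear.
def loop_eliminated(n)
  n > 0 ? n : 0
end
```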
00:10:55.920
We've distilled our programs down to only the essential work that must be done.
00:11:02.880
Applying the golden rule of optimization: do less work. If we can do none at all, that's ideal.
00:11:08.240
What gets in the way of this process? We've cached methods and built optimizations on assumptions about them.
00:11:15.120
We assume that the cache is valid and that it always references the correct target method, so the VM must ensure this.
00:11:22.720
Typically, we do this in Ruby by checking that we are still calling the same target type.
00:11:28.240
If it's always strings, we know that we have the same methods there every time.
00:11:34.080
However, we also have to check that the method table hasn't changed.
00:11:40.160
Unlike languages like C++ or Java, where the structure of the method table is mostly static,
00:11:46.960
in Ruby, the method table can change at runtime.
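A short demonstration of why a cached method needs a validity check in Ruby: reopening a class replaces the entry in its method table, so a method object fetched earlier goes stale. `Widget` is a hypothetical class used only for this example.

```ruby
class Widget
  def size
    1
  end
end

# "Cache" the method we looked up.
cached = Widget.instance_method(:size)

# Reopen the class at runtime and redefine the method.
class Widget
  def size
    2
  end
end

w = Widget.new
w.size                 # → 2, the new definition
cached.bind(w).call    # → 1, the stale cached method
```

A VM cache that skipped the check would keep calling the old body, which is exactly why it has to watch the method table.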
00:11:53.440
Every class in Ruby starts as a blank slate until it's filled with methods.
00:12:00.480
What does this lead to? If we have a new type every time at a given call site, we can’t cache anything.
00:12:06.800
We know that we will get a different method table, meaning the cache is now defeated and we have to look it up again.
00:12:13.680
Modifying the method table at runtime also invalidates the cache, which makes it almost impossible to optimize calls to that method.
00:12:19.920
The problem with these lovely features we want in Ruby is that we often end up in an awful cycle.
00:12:26.560
We're constantly updating the method table and creating new types, so optimizations never get a chance to kick in.
00:12:32.240
The inability to cache the method means we can’t inline or optimize the performance of our code.
00:12:39.840
There’s only so much that VMs can do to optimize in this situation.
00:12:45.600
What are the usual suspects causing these problems? The first one is eval.
00:12:52.160
Most people know that evaluating code at runtime is not a good idea, but many still do it.
00:12:58.480
So, what hurts performance in this case? If the code is constantly changing, there’s nowhere to cache anything.
00:13:05.600
We have no static view of the methods or the code being executed, and thus there’s no optimization possible.
00:13:12.160
If the VM can't cache it, it can't see patterns that would allow for inlining.
00:13:18.160
It can't see values that are always static and optimize around them.
00:13:24.400
As a result, with eval, there’s really no optimization possible.
00:13:30.080
We can speed up eval itself by caching the parse tree, but generally, if code is evaluated repeatedly,
00:13:37.040
there’s no way to optimize it.
00:13:44.080
How do we fix this in our code if we think we need eval? The simplest way is to evaluate that code into a static structure.
00:13:50.560
You can evaluate it into a method body and then call that method.
00:13:57.920
Most of the time, evaluated code is still largely static, so we can stick it into a method or proc and call that instead.
00:14:04.080
The VM then has something it can hold onto: a method body, which never changes, allowing some caching.
00:14:10.000
In JRuby or Rubinius, if you put that into a method body, it will eventually JIT compile to native code.
00:14:16.640
If there's dynamic state, like some interpolation or other dynamic content, you can just pass that in as arguments.
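A sketch of that fix, assuming a hypothetical greeting that was previously built as a string and `eval`'d on every call. The code is evaluated once, at load time, into a method body with `class_eval`, and the dynamic part becomes a method argument. `greet_with_eval` and `Greeter` are illustrative names.

```ruby
# Slow shape: builds and evaluates a new string of code on every call,
# so the VM never sees a stable piece of code it can cache or compile.
def greet_with_eval(name)
  eval("'Hello, ' + #{name.inspect}")
end

# Better shape: evaluate the code once, up front, into a method body.
# The dynamic value is passed in as an argument instead of being
# baked into the source string.
class Greeter
  class_eval <<~RUBY, __FILE__, __LINE__ + 1
    def greet(name)
      "Hello, " + name
    end
  RUBY
end

GREETER = Greeter.new
GREETER.greet("Ruby")   # → "Hello, Ruby"
```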
00:14:22.080
Branches are much cheaper than evaluating code repeatedly.
00:14:28.000
If you can avoid eval in the hot path of your application, that would be best.
00:14:34.320
Perform evaluations up front in your code, get everything designed into classes, methods, and procs, and at runtime, only hit those.
00:14:40.000
Now, singletons are a little different.
00:14:48.160
Creating a singleton class for a Ruby object creates a new anonymous synthetic type every time you do it.
00:14:54.080
As far as the VM is concerned, it looks like a completely new type, and all caching must be done anew.
00:15:00.880
Additionally, singletons define new methods, leading to a method table that’s entirely new from the previous one.
00:15:07.920
This leads to lots of optimizations being defeated.
00:15:14.160
In some Ruby implementations, like MRI 1.9, creating singletons can blow the method cache for the entire system.
00:15:21.920
Since it has no per-class cache invalidation mechanism, it ends up invalidating everything.
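A small demonstration of the problem: extending each object at runtime gives every object its own anonymous singleton class, so a call site that sees many such objects sees many distinct types. `Auditable` is a hypothetical module.

```ruby
module Auditable
  def audited?
    true
  end
end

records = Array.new(3) { Object.new.extend(Auditable) }

# Each extended object now has its own singleton class...
singletons = records.map(&:singleton_class)
singletons.uniq.size                                     # → 3
# ...and each singleton class has Auditable in its ancestor chain.
singletons.all? { |k| k.ancestors.include?(Auditable) }  # → true
```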
00:15:28.880
So how do we fix this problem?
00:15:36.080
It's better to create these aggregate types up front. It's far better to have a hundred or even ten thousand classes defined at startup than to create new types at runtime.
00:15:41.680
Defining them programmatically is fine too; the key is to define them when the program starts.
00:15:48.120
Different design patterns and architectures can be beneficial, such as the entity-component architecture.
00:15:55.440
This separates the data of the application entities from the behavioral components.
00:16:01.920
You don’t have to inject behavior into the data objects you're passing around.
00:16:08.000
Just use those components or external representations that manipulate the data.
00:16:14.640
One concrete example is a recurring issue I filed that's still unresolved in MRI, related to heavily extending certain objects with modules at runtime.
00:16:22.560
This is a trivial change to make in almost any program.
00:16:29.120
Instead of extending and dynamically adding behavior, create several concrete classes.
00:16:35.680
In JRuby, for instance, we have a readable and writable class that simply reflects the behavior of the original.
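The fix, sketched with a hypothetical `Auditable` module rather than the actual JRuby classes mentioned above: define the combined type once, at load time, and instantiate it. Every instance then shares one class and one method table, which call-site caches handle well.

```ruby
module Auditable
  def audited?
    true
  end
end

# Define the combined type once, up front, instead of calling
# extend on each object at runtime.
class AuditableRecord
  include Auditable
end

records = Array.new(3) { AuditableRecord.new }
records.map(&:class).uniq.size   # → 1
records.all?(&:audited?)         # → true
```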
00:16:41.440
We can avoid the complex extensions entirely.
00:16:48.960
Now, let's move off the execution side and look at accessing data in memory.
00:16:56.160
Most of the time, we stick data in constants and class variables, so we want to optimize access to that data.
00:17:02.560
Constants are defined in tables living on classes and modules with various mechanisms for looking them up.
00:17:09.840
This data is generally assigned once, so we can cache it and avoid repeated lookups.
00:17:16.160
What does this look like in practice, then?
00:17:23.440
When we ask the VM to find a value, it goes through the lexical scopes and the class hierarchy to locate it.
00:17:30.560
It brings back the value and caches it as close to the access site as possible to avoid extra overhead.
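A small example of the two-part search, with hypothetical module and class names. Ruby checks the lexically enclosing scopes first (the list `Module.nesting` returns), then the ancestors of the innermost class, so the same constant name can resolve differently depending on how the code is nested.

```ruby
class Base
  LIMIT = :from_superclass
end

module Outer
  LIMIT = :from_lexical_scope

  class Child < Base
    def self.limit
      LIMIT # the enclosing Outer is searched before the superclass Base
    end

    def self.nesting
      Module.nesting
    end
  end
end

# Compact definition: the nesting is only [Outer::Compact], so Outer's
# LIMIT is not in lexical scope and the superclass's constant is found.
class Outer::Compact < Base
  def self.limit
    LIMIT
  end
end

Outer::Child.limit     # → :from_lexical_scope
Outer::Compact.limit   # → :from_superclass
```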
00:17:37.360
However, we need a way of validating that this cache remains correct.
00:17:43.680
Since constant lookups are both lexical and hierarchical, caching becomes complicated.
00:17:50.880
It's not as simple as checking a single type and knowing you'll get the right value for the constant.
00:17:57.760
Because of this structure, cache invalidation in many implementations uses a single global serial number.
00:18:05.680
Every time a new constant is defined, that serial number increments, and all caches must update.
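A toy model of a constant cache guarded by one global serial number. `ConstantTable` and `ConstantCache` are hypothetical names; real VMs do this internally, but the sketch shows the cost: defining any constant, even an unrelated one, forces every cached access site to look its value up again.

```ruby
# Toy model: one serial number shared by every constant cache.
class ConstantTable
  attr_reader :serial

  def initialize
    @values = {}
    @serial = 0
  end

  def define(name, value)
    @values[name] = value
    @serial += 1 # any definition invalidates every cache
  end

  def lookup(name)
    @values.fetch(name)
  end
end

# A cache sitting at one constant access site.
class ConstantCache
  attr_reader :misses

  def initialize(table, name)
    @table = table
    @name = name
    @cached_serial = nil
    @value = nil
    @misses = 0
  end

  def value
    if @cached_serial != @table.serial
      @misses += 1
      @value = @table.lookup(@name)
      @cached_serial = @table.serial
    end
    @value
  end
end

table = ConstantTable.new
table.define(:TIMEOUT, 30)
site = ConstantCache.new(table, :TIMEOUT)
site.value                   # miss
site.value                   # hit
table.define(:UNRELATED, 1)  # bumps the global serial...
site.value                   # ...so this site misses again
site.misses                  # → 2
```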
00:18:11.120
Unfortunately, this means every constant definition invalidates every cache, and that's one of the reasons Rails can be slow in development mode:
00:18:17.680
it has to reevaluate code, constantly redefining constants.
00:18:24.320
Using many new lexical scopes at runtime can also be problematic and create inefficiency.
00:18:31.040
This involves calling blocks and introducing many new class structures or method definitions.
00:18:38.000
As new scopes are encountered, caches need to be rebuilt.
00:18:45.680
Evaluated code will also contribute to cache busting, and altering class hierarchies will blow away constant caches.
00:18:52.480
If you include modules at runtime, or modify class hierarchies through extending singletons,
00:18:59.440
this can force constant cache updates across the whole system.
00:19:06.080
How do we fix this? First, don't modify constants; leave them constant.
00:19:12.960
When I talk to folks from Java and other languages, they often don't realize that Ruby's constants can be defined lazily and can be redefined.
00:19:19.600
So let's treat constants as constant to see better results.
00:19:26.000
Avoid runtime class hierarchy changes as well. Modifications have far-reaching effects that VMs have little ability to handle.
00:19:32.560
Now let’s move on to allocation.
00:19:39.760
Ruby loves objects; we all love creating them as quickly as possible.
00:19:46.320
In every single request of a Rails application, there can be hundreds or thousands of strings and arrays created.
00:19:53.600
If you look at a memory profile of a running Rails app, you'll see that much of its slowness likely comes from all the objects it creates.
00:19:59.040
Closure state can also be an issue, causing more objects to be created just for capturing state.
00:20:06.000
This eventually leads to less efficient memory management and slows performance.
00:20:12.040
The first culprits are literals: creating a literal string, like 'foo', generates a new object every time.
00:20:19.600
A literal array also creates a new array object every time regardless of what’s in it.
00:20:25.600
Concatenating strings without realizing you're creating throwaway objects can also be wasteful.
00:20:32.320
Chaining slices and enumerations is particularly inefficient, creating multiple intermediate arrays that are thrown away immediately.
00:20:39.200
So what can we do to fix these issues? Start with literals; embrace constants.
00:20:45.680
If you have data that’s mostly constant, convert those literal strings into constants and freeze them.
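A quick way to see the difference, with hypothetical method and constant names. An array literal allocates a fresh object every time it's evaluated, while a frozen constant is allocated once when the file loads and then reused. Mutable string literals behave like the array here unless the file opts into `# frozen_string_literal: true`.

```ruby
# A literal is evaluated (and allocated) every time this method runs.
def headers_literal
  ["Content-Type", "Accept"]
end

# A frozen constant is allocated once, when the file loads.
HEADERS = ["Content-Type".freeze, "Accept".freeze].freeze

def headers_constant
  HEADERS
end

headers_literal.equal?(headers_literal)     # → false: two separate arrays
headers_constant.equal?(headers_constant)   # → true: the same object
```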
00:20:52.160
Most VMs excel at optimizing these, leading to immediate performance improvement.
00:20:59.440
Cache your most common interpolated strings and regular expressions.
00:21:05.200
Usually, only a small number of frequently used values cause unnecessary allocations.
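A sketch of caching a frequently built regular expression, assuming a hypothetical `keyword_matcher` that would otherwise compile a new `Regexp` from an interpolated pattern on every call. The cache is a plain `Hash` with a default block; it isn't thread-safe and grows without bound, so it only suits a small, fixed set of keys.

```ruby
# Slow shape: interpolation builds a new Regexp object on every call.
def keyword_matcher_slow(word)
  /\b#{Regexp.escape(word)}\b/i
end

# Faster shape: build each Regexp once and reuse it afterwards.
KEYWORD_MATCHERS = Hash.new { |cache, word| cache[word] = /\b#{Regexp.escape(word)}\b/i }

def keyword_matcher(word)
  KEYWORD_MATCHERS[word]
end

keyword_matcher("ruby").match?("I like Ruby")   # → true
```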
00:21:13.120
Study memory profiles! Ruby's profiling tools aren't always perfect, but you'll be able to find these issues using JRuby.
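Before reaching for a full profiler, a rough allocation count is available in CRuby (MRI) through `GC.stat(:total_allocated_objects)`. Other implementations expose memory data differently (on JRuby, the JVM's own memory tools are the natural choice), so treat this as an MRI-only sketch; `count_allocations` is a hypothetical helper name.

```ruby
# Rough count of objects allocated while the block runs (CRuby only).
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Interpolation builds a new String on every iteration.
interpolated = count_allocations { 1_000.times { |i| "item-#{i}" } }

# A frozen literal can be shared instead of allocated each time.
shared = count_allocations { 1_000.times { "item".freeze } }
```

Comparing the two numbers points straight at the allocation-heavy version, which is the same kind of evidence a memory profile gives you at application scale.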
00:21:21.440
Fixing concatenation and copying will also improve performance, even if mutating in place comes with trade-offs for thread safety.
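A sketch of the concatenation fix, with hypothetical method names. `+=` allocates a brand-new `String` on every step and discards the old one, while `<<` appends into one mutable buffer. That buffer is the thread-safety trade-off: it's only safe if no other thread can see it while it's being built.

```ruby
# Each += allocates a new String and throws the previous one away.
def join_with_plus(parts)
  out = ""
  parts.each { |part| out += part }
  out
end

# << appends into a single mutable buffer.
def join_with_append(parts)
  out = +"" # unary plus guarantees a mutable string, even with frozen literals
  parts.each { |part| out << part }
  out
end

join_with_append(%w[a b c])   # → "abc"
```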
00:21:29.120
Fix enumeration chaining by condensing into fewer steps; avoid the chain in the first place.
00:21:40.320
As an alternative, consider the loop. You’ll often find better results by creating a simple loop over values with necessary changes.
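An example of collapsing a chain, using hypothetical order data. The chained version builds an intermediate array at each step; `filter_map` (Ruby 2.7+) selects and transforms in one pass, and a plain `each` loop also builds only the final array.

```ruby
orders = [
  { total: 5, paid: true },
  { total: 12, paid: false },
  { total: 8, paid: true }
]

# Chained: select builds an intermediate array, then map builds another.
def paid_totals_chained(orders)
  orders.select { |o| o[:paid] }.map { |o| o[:total] * 100 }
end

# One pass with filter_map: a single result array.
def paid_totals_filter_map(orders)
  orders.filter_map { |o| o[:total] * 100 if o[:paid] }
end

# Plain loop: also a single result array, built explicitly.
def paid_totals_loop(orders)
  result = []
  orders.each do |o|
    result << o[:total] * 100 if o[:paid]
  end
  result
end

paid_totals_loop(orders)   # → [500, 800]
```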
00:21:46.560
What do you get from making these changes in your application?
00:21:54.240
The case study I know best is JRuby's performance improvements.
00:22:01.680
We run on the JVM, which gives us all of its optimizations for free.
00:22:09.600
We’ve been applying these optimizations to the JRuby codebase since its start; everything I’ve shown you here involves similar strategies.
00:22:18.080
What if we just let the JVM developers do the work? We’d see JRuby’s performance improve significantly.
00:22:26.240
Starting from Java 1.4 back in the early days, JRuby's performance has improved by 300% just thanks to JVM progress.
00:22:33.760
Even if we had done nothing since 2006, we would have surpassed Ruby 1.8 simply thanks to their efforts.
00:22:40.480
How do we glean improvements from the JVM?
00:22:47.840
We parse Ruby source into our AST and compile that into JVM bytecode.
00:22:54.160
The JVM runs that bytecode in its interpreter and eventually compiles it into native code, leading to faster execution.
00:23:03.120
The JVM keeps improving that native code over time, caching calls and avoiding common pitfalls.
00:23:10.640
You can see the benefits JRuby gets from the JVM in how its performance has evolved over time.
00:23:17.840
Running JRuby on OpenJDK 8 has yielded an eightfold improvement in performance.
00:23:25.920
Combining the JVM's optimization strategies with our own improvements, we see significant gains.
00:23:34.560
We've seen performance climb dramatically as we refine our internal implementations and caching strategies.
00:23:42.400
For instance, invokedynamic allows the JVM to better understand Ruby's requirements.
00:23:50.560
Because dynamic calls can now be inlined, there's less overhead, which improves performance across the board.
00:23:58.560
The payoff is faster execution, better memory management, and better use of resources.
00:24:06.240
This leads us to the ultimate lessons learnt: performance doesn’t just happen. Don’t sit down and wait for your applications to get faster.
00:24:14.080
You really need to understand how Ruby gets optimized.
00:24:21.760
Dig deep into memory profiles so you understand where allocations are happening.
00:24:28.560
Ultimately, don't be held back by the myth that Ruby is slow; there's plenty of work being done to make it faster.
00:24:36.720
And there's plenty you can do in your own programs to help Ruby run fast.
00:24:44.640
All right, thank you!