Talks

Ruby3x3: How are we going to measure 3x?

http://rubykaigi.org/2016/presentations/MattStudies.html

To hit Ruby 3x3, we must first figure out **what** we're going to measure and **how** we're going to measure it, in order to get what we actually want. I'll cover some standard definitions of benchmarking in dynamic languages, as well as the tradeoffs that must be made when benchmarking. I'll look at some of the possible benchmarks that could be considered for Ruby 3x3, and evaluate what they're good at measuring and what they're less good at measuring, to help the Ruby community decide what the 3x goal will be measured against.

Matthew Gaudet, @MattStudies
A developer at IBM Toronto on the OMR project, Matthew Gaudet is focused on helping to Open Source IBM's JIT technology, with the goal of making it usable by many language implementations.

RubyKaigi 2016

00:00:00.320 Hi everyone. Thanks for having me back at RubyKaigi. I'm a developer at IBM working on the Eclipse OMR project, which provides cross-platform components for building high-performance, reliable language runtimes. We hope to have some news soon, so please follow us on Twitter; I can't say anything just yet. Ruby 3x3 seems like a relatively simple goal: take the performance of Ruby 2.0 and make it three times faster. But when you look at this chart, there's a hidden detail, which is what we actually mean by performance.
00:00:19.439 Over the course of this presentation, I'm going to talk about benchmarking. I'll provide some definitions, share some philosophy, discuss pitfalls, and conclude with thoughts on how we can measure Ruby 3x3. So, let's get started. Benchmarking is kind of the bane of my existence. It's this strange combination of art and science that can drive one a little insane. The problem is that while benchmarks seem scientific and objective, they are filled with judgment calls, and the science itself is quite complex.
00:00:47.360 A benchmark is simply a piece of computer code executed to gather measurements for comparison, because unless you're comparing, there's no point in running the code. There are many things you can benchmark, and not all of them relate to time. For instance, you can benchmark the execution time of different interpreters or of different options for the same interpreter, compare the execution time of algorithms, or even benchmark accuracy for machine learning algorithms. The art of benchmarking turns into a long list of questions and decisions you must navigate, all filled with judgment calls. The first question is: what do you run? I see benchmarking as a continuum: at one end we have micro-benchmarks, at the other end entire applications, with application kernels in the middle. A micro-benchmark is a small piece of code you write, typically intended to exercise one particular part of the system you're testing. Micro-benchmarks are easy to set up and run, and because they target one specific aspect they give you data quickly. However, they tend to exaggerate effects and don't generalize well.
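As a hedged illustration of how small a micro-benchmark can be, here is a minimal sketch using only Ruby's standard `benchmark` library; it exercises exactly one narrow operation (building a string) and nothing else, which is what makes it both quick and hard to generalize:

```ruby
require 'benchmark'

# A micro-benchmark: it stresses one narrow operation (growing a
# string), so it runs quickly and gives data fast, but it tells you
# nothing about whole-application performance.
N = 20_000

Benchmark.bm(12) do |bm|
  bm.report('String#+ ') { s = ''; N.times { s = s + 'x' } }
  bm.report('String#<<') { s = ''; N.times { s << 'x' } }
end
```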
00:01:45.119 On the other end of the spectrum is the full application benchmark. The advantage here is clear: if I show that the benchmark is two times faster, then the full application is twice as fast, because the benchmark encompasses the whole thing. The downside is that a full application is inherently complex: it naturally exhibits variance, small effects can be drowned out by noise, and the whole thing can be slow to run. This leads us to the middle ground of application kernels, which are parts of an application extracted specifically for benchmarking purposes. This typically involves building scaffolding around the application's core so it can be benchmarked independently.
00:03:01.440 The advantage of this approach is that kernels are real-world code sourced from actual applications, which often leads to results that generalize better. However, deciding how much of the application to include versus what to mock out remains a crucial judgment call. When designing benchmarks, numerous pitfalls can arise. One common pitfall to avoid, unless you are deliberately porting benchmarks, is un-Ruby-like code. An old adage states that you can write Fortran in any programming language, and it holds true: you can take benchmarks designed for Fortran and port them to Ruby, but they often end up looking strange and running poorly. Another example of problematic benchmark code is code that never produces garbage. Real Ruby applications rely on the garbage collector, so benchmarks should exercise it too.
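As a small sketch of that last point, the snippet below contrasts a loop that allocates nothing with one that churns through short-lived strings; only the second gives the garbage collector any work. It relies on MRI's `GC.stat` (key names assume a reasonably recent MRI), and the loop sizes are arbitrary:

```ruby
require 'benchmark'

# Count how many GC cycles a block triggers, along with its wall-clock time.
def gc_runs
  before = GC.stat[:count]
  time = Benchmark.realtime { yield }
  [GC.stat[:count] - before, time]
end

no_garbage = gc_runs { sum = 0; 1_000_000.times { |i| sum += i } }  # no allocation
garbage    = gc_runs { 1_000_000.times { |i| "object #{i}" } }      # one String per iteration

puts "no allocation: #{no_garbage[0]} GC runs in #{no_garbage[1].round(3)}s"
puts "allocating:    #{garbage[0]} GC runs in #{garbage[1].round(3)}s"
```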
00:04:22.880 Exceptions should be accounted for in the same way. A point that's often overlooked is that the input data used in your benchmark is crucial; preparing the right input data can prevent misleading conclusions. For instance, consider an MP3 compressor benchmarked on a file consisting only of silence. That's an odd input: most MP3s aren't just silence, they contain structure that the compressor normally exploits. Odd input like that skews the code paths you execute during the test and fundamentally affects how well your results generalize.
00:05:01.440 Next, when you're benchmarking, you need to decide what metrics to measure. I am particularly focused on benchmarks relevant to Ruby 3x3, so we should consider things like time, throughput, and latency. The most obvious metric (spoiler alert) is wall clock time, which is a direct measurement against an actual clock, independent of the process being timed. For example, running a command under the `time` command gives you wall clock time. This is distinct from CPU time, which measures how much CPU the process actually used. An illustrative example: sleeping for one second takes about a second of wall clock time but almost no CPU time. Each can be a helpful measure in certain situations, and each can also be misleading.
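As a hedged sketch of the distinction, Ruby's standard `Benchmark` module reports CPU time and wall clock time separately, and the sleep example plays out just as described:

```ruby
require 'benchmark'

# Benchmark.measure reports CPU time (utime/stime) and wall-clock time
# (real) separately. Sleeping burns almost no CPU but a full second of
# wall-clock time.
tms = Benchmark.measure { sleep 1 }
puts format('user: %.2fs  system: %.2fs  real: %.2fs',
            tms.utime, tms.stime, tms.real)
# => user: 0.00s  system: 0.00s  real: 1.00s (approximately)
```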
00:06:13.600 Throughput is a common metric in web applications, representing the count of operations performed within a unit of time. In contrast, latency measures the time it takes for a response to occur after an action is initiated. For instance, in a web server, this refers to how long it takes from receiving a request to serving a response. After measuring various metrics, the next question is what to report. While the raw measurements seem like the obvious choice, we want to offer comparisons, particularly speedup, since speedup is exactly what Ruby 3x3 aims for. Speedup is simply the ratio computed by comparing a baseline measurement against an experimental one.
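To make the speedup ratio concrete, here is a tiny sketch with made-up numbers (neither figure comes from the talk):

```ruby
# Speedup is just the baseline measurement divided by the experimental one.
baseline_seconds   = 30.0  # e.g. a workload on the baseline interpreter
experiment_seconds = 10.0  # the same workload on the build under test

speedup = baseline_seconds / experiment_seconds
puts "speedup: #{speedup}x"  # => speedup: 3.0x
```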
00:07:42.880 Let's delve briefly into the science of benchmarking. While studying for my master's, I spent considerable time reading academic papers and learned how easily speedup numbers can be used to mislead. An important takeaway is that whoever controls the baseline has significant power over the perceived speedup. I often encountered papers boasting linear speedup as the number of threads increased, supported by stunning graphs that suggested high performance. Yet, upon running their code or scrutinizing numbers buried in the paper, I often found that the sequential baseline was far slower than a well-written sequential program, underscoring that linear speedup, while mathematically sound, may not translate into a real-world improvement.
00:09:08.959 The distinction between relative and absolute speedup is crucial. Relative speedup refers to improvement compared to the same program's single-threaded execution. While this can be useful in some contexts, absolute speedup, measured relative to the fastest sequential version, is usually what matters in practice. Additionally, how you measure affects what you ultimately measure: two runs of the same benchmark under different conditions can tell you very different things depending on how you set them up. Warm-up is a good example; the warm-up period can introduce significant variance in the results, and the question of when an application has finished warming up is a genuinely hard one. Even without a clear definition, simply recognizing that warm-up happens lets you be careful about how you run your benchmarks.
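A hedged illustration with invented numbers shows how far apart the two figures can be:

```ruby
# Hypothetical numbers to illustrate relative vs. absolute speedup.
fast_sequential  = 10.0  # best available sequential implementation (seconds)
slow_sequential  = 50.0  # the paper's own single-threaded baseline (seconds)
parallel_8_cores = 12.5  # the parallelized version on 8 cores (seconds)

relative = slow_sequential / parallel_8_cores  # => 4.0 ("4x scaling!")
absolute = fast_sequential / parallel_8_cores  # => 0.8 (actually slower)
puts "relative: #{relative}x, absolute: #{absolute}x"
```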
00:10:53.760 Another critical aspect is run-to-run variance: identical benchmark runs do not yield identical execution times. I might run a benchmark five times and get five different results. You also have to decide how to treat warm-up; if you choose not to control for it, you may never observe the peak performance the system is capable of. To handle run-to-run variance properly, run benchmarks multiple times and present the results statistically, for example with confidence intervals. Minimizing wild variation in execution time is the ideal, but some variance is inherent to benchmarking.
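Here is a minimal sketch of that advice: repeat a measurement and report a mean with a rough 95% confidence interval (normal approximation) rather than a single number. The workload and run count are arbitrary placeholders:

```ruby
require 'benchmark'

# Run the given block several times and return [mean, half-width of a
# rough 95% confidence interval] for wall-clock time in seconds.
def measure_with_ci(runs = 10)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  mean  = times.inject(:+) / runs
  var   = times.map { |t| (t - mean)**2 }.inject(:+) / (runs - 1)
  ci    = 1.96 * Math.sqrt(var / runs)
  [mean, ci]
end

mean, ci = measure_with_ci(10) { 200_000.times { |i| i.to_s } }
puts format('%.4fs ± %.4fs (95%% CI)', mean, ci)
```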
00:12:05.760 When benchmarking a language with garbage collection like Ruby, you also need to consider the application's heap size. Using JRuby, it's easy to adjust heap size, and experiments show that reducing heap resources can impact performance. For example, an application might speed up as its JIT compiler activates, while a smaller heap leads to more frequent garbage collections, which ultimately degrades performance. This interplay between benchmarking and garbage collection became evident last year when benchmarking the Ruby OMR preview. By replacing the heap in MRI with our own and altering how objects were stored, we found significant variations in garbage collection behavior—complicating fair comparisons between versions. Even minor changes, such as the memory available on machines running the same workload, can lead to drastically different outcomes.
00:13:57.720 In conducting my master's thesis benchmarking, I often faced unexpected issues. One time, I celebrated a tenfold increase in performance only to discover I had miscalculated my input size and was accidentally running the application on one-tenth of the original data. Having a robust harness for benchmarking helps prevent these errors and keeps results reproducible. Another unexpected incident occurred while benchmarking on a laptop: a mid-day power interruption caused drastic performance variations as the machine switched between power modes. Hardware and environmental factors like these can produce perplexing performance spikes and further complicate the benchmarking process.
00:15:03.600 There are perennial hazards in this field: software features activating in the middle of a benchmark, system updates causing unexpected performance changes, screen savers interrupting computations. The authors of the paper "Virtual Machine Warmup Blows Hot and Cold" went to great lengths to mitigate these pitfalls. They disabled various hardware features and built their own benchmark runner, Krun, designed to make conditions as consistent as possible before each run. By taking all these precautions, they could accurately measure very small performance changes. My hope is that Ruby 3x3's gains will be large enough to show up clearly under practical testing conditions, with individual performance gains compounding into meaningful advances.
00:16:58.720 The journey from Ruby 2 to Ruby 3 will not be linear; it will include periods of growth interspersed with phases of slower development. Accurate performance measurement over time will be crucial in understanding how Ruby evolves. Ultimately, benchmarks drive change—the metrics we measure dictate the performance outcomes developers will pursue. There is also the inherent risk that if we neglect to measure something, it may not improve as intended. Performance enhancements can sometimes resemble squeezing a water balloon; pressure applied to one part may cause swelling in another. Consequently, it’s vital to measure associated metrics for a comprehensive view of the trade-offs involved in any development process.
00:18:35.120 As someone rooted in optimization via just-in-time compilation, I see speed as a trade-off between how fast an application starts and its peak performance after warm-up. Memory footprint has to be considered alongside that: JIT-compiled code occupies memory that could otherwise hold application data. Which trade-offs you make depends heavily on the platform; the decisions you make for a beefy server with ample resources differ from the optimizations you pursue on a limited system like a Raspberry Pi. We also see benchmarks age: over time they become less effective at measuring meaningful performance changes.
00:20:02.400 As Ruby evolves, its idiomatic practices change, so benchmarks need to adapt alongside the language. It's important to embrace new idioms and patterns when assessing performance, and benchmarks can even guide developers as they move to newer language features. For Ruby 3x3 specifically, I suggest we select nine application kernels that capture where we want CRuby to improve. The chosen benchmarks should reflect where we envision Ruby being three times faster, because selecting too many benchmarks dilutes effort across too many fronts.
00:21:47.920 Choosing nine benchmarks seems about right, since too many would scatter focus; as a VM developer, having 150 benchmarks would make it very hard to set priorities. I won't be the one to decide which nine benchmarks we ultimately choose, but I do have some thoughts on potential candidates. There should be some CPU-intensive benchmarks, since the goal of Ruby 3x3 centers on improving performance for CPU-bound applications; the Optcarrot benchmark, neural-net tests, and Monte Carlo tree searches could all serve this purpose. We also need some memory-intensive benchmarks to gauge the impact of garbage collection.
00:23:31.760 Startup time also needs monitoring, since it's an area where MRI currently has an edge over other implementations. We should pay close attention to how startup performance evolves, ensuring that speed improvements elsewhere do not disproportionately sacrifice startup. We also need benchmarks for web application frameworks, though I'm probably not the right person to design them; there are capable experts within the Ruby community who can propose methodologies for fast, reliable web-framework benchmarks. Those benchmarks should drive change in CRuby while keeping warm-up time for these applications manageable.
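As a rough sketch of watching startup time, one could simply time how long a fresh interpreter takes to run an empty program; process-spawn overhead is included, so treat the absolute number with care, and the run count here is arbitrary:

```ruby
require 'benchmark'

# Time how long it takes to launch a fresh ruby process that does
# nothing, averaged over several runs.
runs = 20
total = Benchmark.realtime { runs.times { system('ruby', '-e', '') } }
puts format('average startup: %.3fs', total / runs)
```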
00:25:12.480 I propose a standard performance harness for the RubyGems ecosystem, so that developers can ship performance benchmarks with their gems. By writing performance tests alongside their ordinary tests, gem authors would make performance easy to track. VM developers could then sample a variety of gems and aggregate the results into meaningful statements about the ecosystem, for example announcing an average performance increase across gems, and end users who run into performance problems would have a reproducible way to demonstrate them.
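No such standard harness exists yet, so the following is only a hypothetical sketch of what a gem's performance file might look like; it leans on the existing benchmark-ips gem, and the file path and the use of the standard library's JSON as a stand-in for the gem under test are both assumptions:

```ruby
# perf/parse_benchmark.rb (hypothetical layout; no standard harness exists yet)
require 'benchmark/ips'
require 'json' # stand-in for the gem being benchmarked

Benchmark.ips do |x|
  x.report('parse small document') do
    JSON.parse('{"name":"ruby","version":3}')
  end
  x.report('generate small document') do
    JSON.generate('name' => 'ruby', 'version' => 3)
  end
  x.compare!
end
```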
00:26:46.800 Thank you for your attention. If you are interested in the science of benchmarking and the myriad ways it can go awry, I encourage you to explore the Evaluate Collaboratory at the provided URL. This initiative is dedicated to enhancing reproducibility within systems evaluation in computer science, offering resources for effective benchmarking practices and cautionary tales alike.
00:28:11.840 Are there any questions? We have about five minutes.
00:29:18.720 If not, I can ask questions. I promise they won't be puns.
00:29:27.520 You mentioned Koichi's talk about guilds. What if we introduced that feature before Ruby 3.0 and executed everything in parallel? Does that count? I mean, if Matz considers Ruby 3x faster on a four-core machine because of guilds, that would be a valid outcome. From my standpoint as a JIT developer, what matters is having a clearly defined goal that resonates with the community.
00:30:24.480 For planning purposes, I find it interesting to compare performance across different Ruby implementations, such as JRuby and MRI. You can see discrepancies depending on the platform; Kevin found that JRuby was 20 times faster than MRI on Linux but only 2 times faster on macOS. That kind of variability shows how the testing environment itself can make performance comparisons inconsistent.
00:31:13.920 Let's consider the relationship between performance and power usage. Intel provides tools for measuring power consumption while running benchmarks, but what are effective methods to correlate performance improvements and their environmental impact? Academic research on this topic shows some potential. I recall projects examining Firefox's performance relative to power consumption. They measured wattage during each commit, which allowed them to assess the broad implications of changes in performance across significant user bases.
00:32:00.160 Looking at environmental impacts and cost analyses for performance regressions could provide valuable insights into how improvements affect both system performance and the ecological footprint.
00:32:44.480 Finally, I have one last question: what’s the difference between turbo boost and the turbo button? As I remember in my youth, the turbo button often slowed things down.
00:33:26.720 Thank you, Matthew.