Summarized using AI

Keynote: Ruby 2020

Evan Phoenix • December 11, 2015 • Chuo-ku, Tokyo, Japan

In the keynote titled "Ruby 2020", delivered by Evan Phoenix during RubyKaigi 2015, the focus is on the technical advancements necessary for Ruby to achieve significant performance improvements, particularly with the target of Ruby 3 being three times faster than Ruby 2. Phoenix draws from his extensive background with Ruby and related projects while establishing the need for a Just-in-Time (JIT) compiler as a crucial component for enhancing Ruby's performance.

Key Points Discussed:

  • Introduction and Credentials: Phoenix introduces himself and expresses gratitude towards the RubyKaigi organizers. He shares his journey in Ruby development, mentioning projects like Rubinius and Puma.
  • Historical Context: Reflecting on past attempts, Phoenix talks about his earlier work on Ruby, specifically a project called Sydney, which introduced native threads but was never merged into CRuby due to its size.
  • Performance Goals: He discusses Matz's vision for Ruby 3 to be significantly faster and raises questions about the community's approach to achieving this goal.
  • Defining Performance: Two aspects of performance are outlined: parallel execution and efficiency in code, stressing the complexity involved in achieving true parallelism in Ruby.
  • Micro vs Macro Optimizations: Phoenix advocates for macro-optimizations rather than relying only on micro-optimizations, emphasizing the limitations of small, incremental changes.
  • Importance of JIT: He argues that a JIT compiler is pivotal for Ruby's speed improvements, providing examples of how a JIT can radically speed up code execution.
  • Historical Research: Citing the language Self and its innovations in JIT compilation, he shows the principles still relevant today, and discusses how similar strategies can be applied to optimize Ruby.
  • Implementation Approaches: Phoenix reviews various JIT techniques including tracing JITs and method JITs, detailing the advantages and challenges inherent in each.
  • Integration with CRuby: He proposes integrating JIT capabilities into CRuby while cautioning against the pitfalls of losing visibility into the operations of core methods, suggesting a 'Canary' approach for profiling.

Conclusion and Takeaways:

  • Phoenix concludes with a call to action for the Ruby community to work collaboratively towards integrating JIT capabilities into CRuby, leveraging existing structures and focusing on practical solutions that align with the language's philosophy.
  • He emphasizes the significance of turning ideas into reality for Ruby's future, advocating for systematic changes to achieve the lofty performance goals set for Ruby 3. The keynote serves as both a technical discussion and a motivational push towards innovation within the Ruby community.
00:00:04.730 When Akira asked me to speak, I inquired if he had a specific topic in mind. He mentioned something very technical, and I agreed to it. Therefore, this talk is going to be quite technical.
00:00:12.059 I would like everyone to raise their hand at the end of this session. I might lose some of you at some point. As Akira mentioned, my name is Evan Phoenix, and I go by evanphx in most places on the internet. I have created many Ruby-related projects, such as Rubinius, Puma, and benchmark-ips, which I was pleased to see many people using in their presentations.
00:00:26.090 I am also involved with RubyGems and, by extension, a committer to CRuby. Additionally, I am the CEO of a small startup called Vektra, the Director of Ruby Central, and I function as the accountant for rubygems.org, which means I handle the bills.
00:01:03.750 I live in Hollywood, and this is what it looks like from my backyard. I enjoy spending time there with my two daughters and my wife. It is a great honor for me to be here giving the final keynote, so I want to extend my gratitude to all the organizers. Everyone, please join me in giving them a big round of applause for a fantastic RubyKaigi.
00:01:42.090 Since I’m giving a technical talk, I felt it necessary to list my credentials. Many years ago, I forked Ruby and created a version called Sydney, which introduced native threads. I also built a fully parallel implementation of Ruby with a dynamic JIT in Rubinius.
00:02:01.869 In fact, Koichi-san reminded me of Sydney just today; I had nearly forgotten about it. I thought to myself, am I really getting so old that I forgot? Then I came across my original email discussing my fully parallel Ruby implementation. I couldn’t believe it had been ten years since I wrote that.
00:02:42.340 I found it fascinating to read through the mailing list threads from ten years ago. I even noticed Nobu-san being very kind and supportive. In fact, he unknowingly saved the only copy of the Sydney implementation among the mailing list messages. I found a URL today that still references that, and, coming to this conference, it reminded me how much I had missed.
00:03:01.510 Looking back, I was struck by how large my patch was, around 500 kilobytes, which likely contributed to it never being merged into CRuby. If you plan to send a patch, remember to keep it concise. Matz has mentioned several times, both here and at RubyConf, that Ruby 3 will be three times faster than Ruby 2. I first heard this at RubyConf, and it sparked both excitement and a question in my mind.
00:03:45.639 How can we collectively accomplish that goal as a community, as Ruby core members, and as individuals who love Ruby? This consideration greatly influenced my thoughts when Akira asked me to engage in some technical discussions.
00:04:41.810 So, what do I hope all of you will take away from this talk? My goal is to convince you that CRuby needs a just-in-time (JIT) compiler and that we cannot achieve the speed improvements of Ruby 3 without one. Furthermore, now is the ideal time to develop it. We have explored various approaches in the community, and it’s time to take action.
00:05:01.010 In the first part of my talk, I would like to discuss performance. What is performance exactly? It’s crucial to understand that when someone states, 'Ruby is three times faster,' we must have a tangible understanding of what performance means. Generally, performance can be boiled down to two aspects.
00:05:38.210 One is doing the same amount of work but executing it in parallel, for example handling three web requests simultaneously instead of sequentially. The second aspect involves accomplishing the same task using one-third of the code. Both scenarios can be considered three times faster.
00:06:09.169 Matz has discussed both parallelism and algorithms extensively, so let’s start with parallelism. In CRuby, achieving parallelism is significantly more complex than merely reducing the code. Having written a 500-kilobyte patch myself, I can assure you of that.
00:06:36.589 The core library of CRuby, which includes the numerous methods that most users rely upon, like those available in Array, String, Enumerable, and Hash, was not designed for concurrent execution. While those library methods perform essential tasks, they function on the premise that they are the sole code modifying data. Changing this is a monumental task.
00:07:27.549 Matz has proposed various mechanisms for adding parallelism, like actors that do not share memory. These ideas may feel very foreign to Ruby programmers. While Ruby has the capacity to evolve, the proposed changes would introduce complexities and make comprehension difficult for most Ruby developers.
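To make the "actors that do not share memory" idea concrete, here is a minimal sketch in plain Ruby. It is not Matz's proposed design: the CounterActor class and its message names are hypothetical, and a Thread with a Queue stands in for an actor and its mailbox. The point is only that the counter is touched by one thread, and everyone else communicates by sending messages.

```ruby
require 'thread' # Queue; a no-op on modern Rubies where Queue is built in

# Hypothetical minimal actor: private state, one mailbox, one worker thread.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0 # after construction, touched only by the actor's own thread
    @thread = Thread.new { run }
  end

  # Callers send messages; they never read or write @count directly.
  def tell(message, reply_to = nil)
    @mailbox << [message, reply_to]
  end

  def stop
    tell(:stop)
    @thread.join
  end

  private

  # Process messages one at a time, in arrival order.
  def run
    loop do
      message, reply_to = @mailbox.pop
      case message
      when :increment then @count += 1
      when :get       then reply_to << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
senders = 4.times.map { Thread.new { 100.times { actor.tell(:increment) } } }
senders.each(&:join)

reply = Queue.new
actor.tell(:get, reply)
total = reply.pop
actor.stop
puts total # 400
```

No Mutex appears anywhere because nothing is shared, which is also why this style feels unfamiliar next to ordinary Ruby code that mutates objects freely.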
00:08:08.109 Moreover, these modifications would require alterations to much of the core library, thereby confusing regular users. Several times at RubyKaigi, we have discussed improving Ruby's speed through micro-optimizations—incremental performance improvements that can lead to better efficiency.
00:08:52.660 These small optimizations are beneficial, but I fear that relying solely on them will slow our progress towards Ruby 3. If we only concentrate on micro-optimizations, increments will be minimal and scattered throughout the code.
00:09:12.800 However, starting with micro-optimizations is still a good approach. There are a number of easy improvements that Ruby core developers should consider, such as addressing the numerous missing caches within the core library. For instance, every call to the 'rb_funcall' function, which appears 364 times within the core library, is slow, and every one of those call sites could benefit from caching.
00:09:38.210 In contrast to micro-optimizations, I argue that we need macro-optimizations. These optimizations must apply broadly and lead to more significant performance gains. To facilitate these large performance improvements, we might need to develop more innovative solutions.
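As a rough illustration of what a cache at a call site buys, the sketch below models one in plain Ruby. It is not CRuby's C implementation: CallSiteCache and its counters are hypothetical names, and Module#instance_method stands in for the full method lookup that an uncached 'rb_funcall' performs on every call.

```ruby
# Hypothetical single-entry call-site cache: it remembers the method found
# for the last receiver class so repeated calls can skip the full lookup.
class CallSiteCache
  attr_reader :hits, :misses

  def initialize(method_name)
    @method_name = method_name
    @klass = nil
    @method = nil
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@klass)
      @hits += 1 # fast path: same class as last time
    else
      @misses += 1
      @klass = klass
      @method = klass.instance_method(@method_name) # stand-in for the slow lookup
    end
    @method.bind(receiver).call(*args)
  end

  # A real VM must drop cached entries when methods are redefined.
  def invalidate!
    @klass = nil
  end
end

site = CallSiteCache.new(:upcase)
results = %w[a b c].map { |s| site.call(s) }
puts results.inspect                             # ["A", "B", "C"]
puts "hits=#{site.hits} misses=#{site.misses}"   # hits=2 misses=1
```

This sketch ignores singleton methods and refinements; it is only meant to show the shape of the check that replaces a lookup.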
00:10:09.360 The JIT compiler could be one such innovation. The JIT allows us to execute less code, and I believe it is the best option for achieving a threefold speed increase. By utilizing caching effectively, we can better understand our code's operations, emit superior code, and improve performance.
00:10:31.710 Some may argue that Ruby's dynamism makes it difficult to implement a JIT. However, I contest that Ruby is no more dynamic than languages like Java or Smalltalk. The low-level components of those languages are equally dynamic. When discussing Ruby's dynamism, we often think of its ability to change at runtime. Still, most method calls in typical code do not utilize polymorphism.
00:11:23.140 Allow me to demonstrate using an example. Here is a simplistic piece of code that performs a basic operation, which I certainly do not recommend for production use, but it suffices for our purposes today. The beauty of a JIT compiler is that it can optimize the code the same way a human reader would analyze it.
00:12:02.870 Everyone can observe that we are using 'fetch_rate' and the method returns a constant. Therefore, the JIT can replace the method call with the constant value it returns. This type of optimization is crucial. A particular piece of code that originally required several method calls and mathematical operations can now execute nearly instantaneously, making it a thousand times faster than the previous implementation.
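The slide's code is not part of this transcript, so the following is a hypothetical reconstruction of the idea. Only the name 'fetch_rate' comes from the talk; the Invoice class, the rate value, and the tax methods are assumptions made for illustration.

```ruby
# Hypothetical example; only the method name fetch_rate comes from the talk.
class Invoice
  # Always returns the same constant.
  def fetch_rate
    0.08
  end

  # What the programmer wrote: a method call plus arithmetic on every use.
  def tax(amount)
    amount * fetch_rate
  end

  # Roughly what a JIT could run after inlining fetch_rate: the call is gone
  # and the constant is used directly. A real JIT would also guard against
  # fetch_rate being redefined later and fall back to the original code.
  def tax_inlined(amount)
    amount * 0.08
  end
end

invoice = Invoice.new
same = invoice.tax(100) == invoice.tax_inlined(100)
puts same # true
```

Both methods compute the same value; the difference is that the second version performs no method dispatch at all.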
00:12:56.310 These drastic performance improvements are what we need to reach that ambitious 3x target. You might be wondering how we can achieve this: How exactly do we build a JIT compiler?
00:13:45.640 There is substantial research surrounding this topic, so let me briefly summarize it for you. I will begin with a language known as Self, created by the brilliant programmer David Ungar in 1987, several years before Ruby was introduced. Self was among the first languages to implement a JIT compiler, and it operates on the principle that everything is a method call: even access to local and instance variables goes through message sends.
00:14:46.440 The speed of Self stemmed from the JIT's ability to eliminate many of its dynamic features during execution. This included innovations like polymorphic inline caches, which sped up method dispatching. Research about Self is foundational to many modern dynamic language optimizations, proving over thirty years ago that a dynamic language could be fast.
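A polymorphic inline cache can be modeled in a few lines of Ruby. This is a sketch of the concept, not Self's implementation: the class name, the MAX_ENTRIES limit, and the use of Module#instance_method as the slow lookup are all assumptions.

```ruby
# Hypothetical polymorphic inline cache: a small per-call-site table of
# (receiver class -> method) entries that is checked before a full lookup.
class PolymorphicInlineCache
  MAX_ENTRIES = 4 # assumed limit; real VMs choose their own

  attr_reader :entries, :slow_lookups

  def initialize(method_name)
    @method_name = method_name
    @entries = {}
    @slow_lookups = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    method = @entries[klass]
    unless method
      @slow_lookups += 1
      method = klass.instance_method(@method_name) # stand-in for full dispatch
      @entries[klass] = method if @entries.size < MAX_ENTRIES
    end
    method.bind(receiver).call(*args)
  end
end

site = PolymorphicInlineCache.new(:to_s)
values = [1, :two, 3.0, 4, :five, 6.0]
strings = values.map { |v| site.call(v) }
puts strings.inspect    # ["1", "two", "3.0", "4", "five", "6.0"]
puts site.slow_lookups  # 3
```

Six calls with three receiver classes need only three slow lookups; every later call at this site with one of those classes is a table hit.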
00:15:32.010 The lessons learned then continue to inform our approaches today, showcasing that we can remove dynamic aspects at runtime as a technique to enhance performance. Another relevant language is Strongtalk, which extended the ideas presented in Self by introducing a type hierarchy. Interestingly, while Strongtalk implemented types, they paradoxically slowed down execution because of the required runtime checks.
00:16:26.780 The key takeaway from Strongtalk's history emphasizes that types don’t inherently enhance performance; rather, optimizing requires other mechanisms, most notably a JIT compiler. This methodology was further advanced with the introduction of the HotSpot VM by Sun Microsystems, which was also developed from the foundations of various dynamic languages, demonstrating that types can be beneficial for readability but can hinder raw performance.
00:17:18.150 Fast forwarding, let’s consider V8, a remarkable JavaScript engine found in Google Chrome. A notable figure involved in its creation, Lars Bak, also worked on Strongtalk and Self. This connection illustrates how insights and methods are carried over as programmers continue to build upon past findings.
00:18:12.400 V8 employs optimizations like hidden classes for efficient property access, showcasing how the use of tiered compilers can enhance runtime performance. The existence of a JIT compiler in dynamic programming languages is key to serious performance enhancement, which the current Ruby implementation lacks.
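The hidden-class idea can be sketched in Ruby as follows. This is a toy model, not V8's data structures: Shape, DynamicObject, and ROOT_SHAPE are hypothetical names. Objects that gain the same properties in the same order end up sharing one shape, so a property read becomes an array index instead of a per-object hash lookup.

```ruby
# Hypothetical model of hidden classes ("shapes").
class Shape
  attr_reader :slots

  def initialize(slots = {})
    @slots = slots    # property name -> slot index
    @transitions = {} # property name -> next Shape
  end

  # Adding the same property to the same shape always yields the same child.
  def add(name)
    @transitions[name] ||= Shape.new(@slots.merge(name => @slots.size))
  end
end

ROOT_SHAPE = Shape.new

class DynamicObject
  attr_reader :shape

  def initialize
    @shape = ROOT_SHAPE
    @values = []
  end

  def set(name, value)
    @shape = @shape.add(name) unless @shape.slots.key?(name)
    @values[@shape.slots[name]] = value
  end

  def get(name)
    index = @shape.slots[name]
    index && @values[index]
  end
end

a = DynamicObject.new
a.set(:x, 1)
a.set(:y, 2)
b = DynamicObject.new
b.set(:x, 10)
b.set(:y, 20)
c = DynamicObject.new
c.set(:y, 5)
c.set(:x, 6)

puts a.shape.equal?(b.shape) # true: same properties added in the same order
puts a.shape.equal?(c.shape) # false: different insertion order
puts b.get(:y)               # 20
```

Because a and b share a shape, a JIT that has seen one of them can reuse the same slot index for the other after a single shape check.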
00:19:10.790 As we build the steps toward developing our JIT compiler for Ruby, there are several approaches we can take. The three primary techniques available are tracing JITs, method JITs, and partial evaluation. Each technique has its strengths and weaknesses that need to be taken into consideration.
00:20:08.240 Tracing JITs activate when a section of code is executed frequently enough and work to streamline it by eliminating branches. However, the number of paths through any piece of code can grow exponentially, which complicates trace management. It's key to note that frequently called methods can degrade performance if we aren't careful. Method JITs, by contrast, compile the entirety of a method into machine code, allowing for consistent execution.
00:21:41.680 However, methods that exceed a certain size can lead to longer compilation times, which increases the complexity of execution. Partial evaluation is another technique; it compiles pieces of abstract syntax tree representations without needing to create full methods. While promising, it lacks the extensive research background found in tracing and method JITs.
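Both tracing and method JITs rely on some notion of code being "executed frequently enough" before compiling it. The sketch below models that trigger for a method JIT in plain Ruby. Everything in it is hypothetical: the HotMethod class, the THRESHOLD value, and the lambdas, where the "compiler" lambda simply returns a faster equivalent as a stand-in for emitting machine code.

```ruby
# Hypothetical hotness counter: run the "interpreted" path until a call
# count threshold is reached, then switch to a "compiled" fast path.
class HotMethod
  THRESHOLD = 3 # assumed; real JITs tune this value

  attr_reader :calls

  def initialize(interpreted, compiler)
    @interpreted = interpreted
    @compiler = compiler
    @fast_path = nil
    @calls = 0
  end

  def compiled?
    !@fast_path.nil?
  end

  def call(*args)
    return @fast_path.call(*args) if @fast_path

    @calls += 1
    # Compile once the method is hot; this call still runs interpreted.
    @fast_path = @compiler.call if @calls >= THRESHOLD
    @interpreted.call(*args)
  end
end

interpreted = ->(n) { (1..n).reduce(0) { |sum, i| sum + i } }
compiler    = -> { ->(n) { n * (n + 1) / 2 } } # stand-in for machine code

sum_to = HotMethod.new(interpreted, compiler)
results = 5.times.map { sum_to.call(10) }
puts results.inspect  # [55, 55, 55, 55, 55]
puts sum_to.compiled? # true
```

The threshold is the trade-off mentioned above in miniature: compile too early and time is wasted on cold code, compile too late and hot code runs slowly for longer.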
00:22:18.090 As we explore these methods for implementing our JIT, we should also be wary of how much visibility we have into the code we’ve written. CRuby’s core is written in C, which creates a limitation because it does not expose much about its internal workings to the compiler.
00:23:09.700 The core library’s numerous methods and structures act as black boxes that hinder effective JIT use in performance enhancement. Therefore, I propose what I call the 'Canary' approach. Just as a canary in a coal mine serves as an indication of danger, implementing specific code for tracking purposes can provide insights into whether the JIT is executing as intended.
00:24:07.640 We need to ensure that our JIT compiler can effectively optimize the operations of Ruby core library methods. The integration must be smooth enough that the profiling added doesn't disrupt the developer experience of writing Ruby.
00:24:56.600 So, why not try to integrate existing community efforts, such as Rubinius? While Rubinius has its merits, it lacks the performance output desired to meet speed expectations. Moreover, transitioning to JRuby brings along concerns with the Java Virtual Machine's usability and real-time performance.
00:25:47.680 This leads us back to an important question: Why should we cling to the existing C methods? Simply ignoring them and compiling Ruby code isn’t sufficient because numerous interactions occur at that level and each influences performance. Hence, even if we focused solely on optimizing Ruby code, we would fall short of achieving that performance target of Ruby 3.
00:26:38.040 Therefore, my proposal is centered on refining the current CRuby core library. Instead of starting over, we should build upon the existing foundation, converting core library C code into LLVM bitcode, an intermediate representation that allows for more dynamic optimization.
00:27:16.500 Translating C code into a representation usable by a JIT can unlock optimizations not currently available, removing restrictions imposed by static methods. This wouldn't require fundamental changes in how the core library operates, making it easier and faster to roll out updates.
00:28:17.400 This effort aligns with the pragmatic philosophy espoused by Matz. While there can often be a desire to avoid change, moving thoughtfully towards our goals ensures stability in the ecosystem while improving performance.
00:29:30.040 In conclusion, reaching the Ruby 3 goal of a 3x speed improvement is ambitious yet achievable. This journey will require systematic changes and dedication to building within our existing strengths. I encourage each of you to think deeply about how we can collectively harness the power of a JIT compiler.
00:30:30.740 The possibilities ahead are vast, and if we commit to progress and innovation, we’ll enhance Ruby’s capabilities. Each day more people and businesses rely on Ruby, serving billions through their applications. With concerted efforts, we can ensure Ruby continues to thrive and evolve.
00:31:18.099 Let us commence work on realizing these improvements for CRuby as we progress towards Ruby 3. Thank you all for your time and support!
00:49:44.510 Thank you for your attention. If there are any questions, please feel free to ask!