RubyKaigi Takeout 2021

Parsing Ruby

Since Ruby's inception, there have been many different projects that parse Ruby code. This includes everything from development tools to Ruby implementations themselves. This talk dives into the technical details and tradeoffs of how each of these tools parses and subsequently understands your applications. After, we'll discuss how you can do the same with your own projects using the Ripper standard library. You'll see just how far we can take this library toward building useful development tools.

RubyKaigi Takeout 2021: https://rubykaigi.org/2021-takeout/presentations/kddnewton.html

00:00:00.000 Hello, my name is Kevin Newton and this talk is about the history of parsing Ruby.
00:00:04.000 Back in the early days, Ruby 0.6 was released in 1994. It is the first version Matz released that I could find a changelog entry for.
00:00:06.480 The next version that I have listed here is Ruby 0.76. It's been a full year and Matz is at this point introducing new syntax with its first breaking change.
00:00:10.240 Hashes didn't used to have a literal syntax, and at this point they are referred to as 'dicts', much like Python. This version introduces braces for hash literals, which was a breaking change because arrays used to be created with braces.
00:00:14.719 From this point on, arrays are created with brackets, much like modern Ruby. A year later, Ruby 0.95 is released, and there are some interesting updates. The first entry mentions optional parentheses on method calls. The second entry includes an amusing anecdote that 'rescue' was previously spelled 'r-e-s-q-u-e'. At this point, Matz corrected this typo that had been in place since the beginning.
00:01:20.000 Ruby was becoming ready for the 1.x series by Christmas of 1996 when Matz released Ruby 1.0.0. The versioning at this point wasn't semantic; instead, it was dated. Many of the changes being introduced were making Ruby look more like the modern Ruby we know and less like C++. For example, the operator designating the superclass within a class definition changed from a colon, as in C++, to the less-than symbol that we know today.
00:01:40.320 The keyword 'continue' used to exist in Ruby, but it was renamed to 'next', as we know it today. One year later, Matz releases Ruby 1.1.0, which adds syntax for singleton classes and introduces new options for regular expressions. Ruby 1.1.0 brought options 'n', 'e', and 's' as optional suffixes, which added encoding support for regular expression literals, an interesting inclusion at a time when other languages weren't providing solid encoding support.
00:02:01.680 The next version after the 1.1 series is Ruby 1.3.0, which was a developer release bringing a couple of fascinating changes. The 'else' keyword is added to begin-rescue clauses, allowing code to run only if no error was raised. Additionally, the '<<-' syntax is introduced for indented here-docs, whose terminators previously all had to start in the leftmost column. Ruby 1.2.0 was released the following day with support for here-docs, block comments delimited by '=begin' and '=end', and the introduction of the keywords 'true' and 'false' for the first time.
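As a rough illustration of the syntax described above, written in modern Ruby rather than taken from the talk, the following sketch shows the 'else' clause, an indented here-doc, and a block comment. The method names `parse_number` and `banner` are hypothetical.

```ruby
# begin/rescue/else: the else branch runs only when no exception was raised.
def parse_number(text)
  begin
    value = Integer(text)
  rescue ArgumentError
    "invalid"
  else
    "parsed #{value}"
  end
end

# <<- allows the closing identifier to be indented.
# The body itself keeps its leading whitespace.
def banner
  <<-TEXT
    hello
  TEXT
end

=begin
Block comment: everything between =begin and =end is ignored.
Both markers must start in the leftmost column.
=end
```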
00:02:32.120 Prior to this, these keywords did not exist. Ruby 1.5 quickly followed with compile-time support for string concatenation, enabling adjacent string literals to concatenate before being evaluated. Ruby 1.6 was released a year later, adding the ability to use rescue as a modifier, similar to if, unless, until, and while.
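A minimal sketch of the two features just mentioned, using modern Ruby and made-up variable names:

```ruby
# Adjacent string literals are joined by the parser into a single string.
greeting = "Hello, " "world"

# rescue as a modifier: if the expression on the left raises a StandardError,
# the expression on the right is used instead.
number = Integer("not a number") rescue 0
```

Here `greeting` ends up as one string, and `number` falls back to 0 because `Integer` raises an `ArgumentError`.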
00:03:04.640 One major project during this time was the writing of the Ruby Pickaxe book by Dave Thomas and Andy Hunt, which is still the seminal text for learning Ruby today. They created a project called NodeDump, which was a tree-walker interpreter that produced a human-readable format for all Ruby internals. This was the first time an external project interacted with Ruby's internal node structure, and it spawned other projects interested in this subject. Soon after, a project called Ruth, standing for 'Ruby Under the Hood', was created as a C extension that also interacted with the node structure, allowing programmatic access to the tree nodes.
00:03:49.600 In that same period, Ruby 1.7 was released, adding new features to the language. The 'break' and 'next' keywords now accept a value to return, and %w array literals can now contain escaped spaces. With rising interest in Ruby, one of the individuals involved began working on the JRuby project, which aimed to port Ruby 1.6 and 1.7 directly to Java. The project rewrote Ruby's grammar file, translating the actions into Java. JRuby continues to exist today, currently implementing Ruby 2.6 syntax.
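The Ruby 1.7 additions mentioned above look roughly like this in modern Ruby. This is only a sketch with invented variable names:

```ruby
# break with a value: the value becomes the result of the interrupted call.
first_even = [1, 3, 4, 5].each { |n| break n if n.even? }

# next with a value: the value becomes the block's result for that iteration.
labels = [1, 2, 3].map do |n|
  next "even" if n.even?
  "odd"
end

# %w with an escaped space: the backslash keeps "New York" as one element.
cities = %w[Tokyo New\ York]
```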
00:05:04.000 During this time, another interesting project, Ripper, was created as an entirely separate project outside of Ruby core. Ripper takes the parse.y file and various header files and strips out the grammar actions, replacing them so that the parser dispatches events in a streaming style. Ripper still uses Bison as its parser generator, and it is indicative of a period when Ruby began to diversify. Interestingly, even today Ripper is still labeled as an early alpha version, despite its wide adoption across projects.
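To give a feel for what Ripper offers today, here is a small sketch against the version that ships in Ruby's standard library. `Ripper.sexp` and `Ripper.lex` are real entry points. The `MethodNameCollector` class is a hypothetical example of subscribing to a parser event by defining an `on_*` method.

```ruby
require "ripper"

source = "1 + 2"

# Ripper.sexp returns a nested-array syntax tree (nil if the source is invalid).
tree = Ripper.sexp(source)

# Ripper.lex returns one entry per token; the second element is the event type.
token_types = Ripper.lex(source).map { |token| token[1] }

# Subclassing Ripper and defining on_* handlers taps into the event stream.
# on_def is dispatched once per method definition.
class MethodNameCollector < Ripper
  attr_reader :names

  def initialize(*)
    super
    @names = []
  end

  def on_def(name, params, body)
    @names << name
  end
end

collector = MethodNameCollector.new("def foo; end\ndef bar; end")
collector.parse
```

After `parse` runs, `collector.names` holds the method names in the order they were defined. The handler receives whatever the child events returned, which by default is the raw token text for scanner events such as identifiers.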
00:07:13.760 Another project I wanted to mention is MetaRuby, which was not directly focused on parsing Ruby. Nonetheless, it had some interesting tangential aspects as it attempted to implement Ruby in Ruby. One part of this was releasing a schema for Ruby's abstract syntax trees (ASTs), comparable to a schema used for validating XML. This effort influenced various other projects as they progressed.
00:08:15.920 In 2003, Ruby 1.8.0 introduced capital %W word lists, allowing interpolation into the members of a word list. Nested constant assignment also improved in this version. Ruby 1.8 became a significant version as it contained many foundational features present in more modern Ruby versions. Shortly before 1.9's release, two noteworthy projects emerged: ParseTree, led by Ryan Davis, which delved into Ruby's internals to build out an abstract syntax tree, and RubyNode, which took a different approach by exposing the actual node structs.
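The difference between the lowercase and capital word-list literals can be sketched as follows (variable names are made up for illustration):

```ruby
name = "ruby"

# Lowercase %w does not interpolate: the first element is the literal text.
plain = %w[#{name} parser]

# Capital %W interpolates each member like a double-quoted string.
interpolated = %W[#{name} parser]
```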
00:09:55.040 As Ruby continued to evolve, Ruby 1.9 represented a pivotal moment, transitioning from Yacc to Bison as its parser generator. Various other parsing changes were made as well, and multiple projects struggled to adapt as core Ruby pivoted towards new techniques. Ripper became part of Ruby's standard library, and significant syntactic changes, such as lambda literals and symbol hash keys, were also introduced.
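For readers who have not seen the older forms, this short sketch contrasts the two 1.9-era additions named above with their equivalents. The variable names are illustrative only.

```ruby
# Lambda literal ("stabby lambda") next to the older lambda call.
double = ->(x) { x * 2 }
double_old = lambda { |x| x * 2 }

# Symbol hash keys: the shorthand and the hash-rocket form build the same hash.
new_style = { name: "Ruby" }
old_style = { :name => "Ruby" }
```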
00:11:31.680 In 2009, Ruby 1.9.1 was released as the first stable version from the 1.9 series, implementing the encoding pragma to formalize encoding as a first-class citizen in the language. This release addressed several syntax features introduced through the application of pattern matching, single-line methods, and the new pragma for rapid structures. This continued into what would soon become the Ruby 2.x series, introducing refinements and allowing named parameters.
00:12:57.120 The evolution of Ruby facilitated a flourishing community behind development tools and languages intended to extend Ruby's capabilities. The Language Server Protocol gained traction, advancing language infrastructure and altering how Ruby applications were developed, alongside tooling like Solargraph that builds on the available tools for Ruby introspection. Other projects emerged that used alternative parser approaches while targeting compliance with Ruby's feature set.
00:14:09.760 As the Ruby ecosystem adapted, tools like Sorbet began incorporating type systems, while the arrival of RubyVM::AbstractSyntaxTree in core Ruby was also noteworthy. Additionally, further releases such as Ruby 2.6 introduced refinements and various reduction allocations, building on the foundation laid by earlier updates. All these developments culminated in Ruby 3, where keyword arguments evolved significantly and pattern matching became a core concept.
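As a small, hedged illustration of the pattern matching mentioned above, assuming Ruby 3.0 or later, the sketch below matches array-shaped and hash-shaped values. The `describe` method and the node shapes are hypothetical and do not correspond to the output of any particular parser.

```ruby
# case/in checks each pattern in order and binds the matched parts to names.
def describe(node)
  case node
  in [:int, Integer => value]
    "integer #{value}"
  in { type: :call, name: String => name }
    "call to #{name}"
  else
    "unknown"
  end
end
```

For example, `describe([:int, 1])` takes the first branch and `describe({ type: :call, name: "puts" })` takes the second. Anything else falls through to the `else` branch.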
00:15:39.680 Ultimately, the language has grown complex with numerous paths forward, and it becomes increasingly difficult to maintain full compatibility across implementations. Each new syntax change requires updates not just to Ruby itself, but across myriad projects dependent on Ruby. Understanding the intricate logic behind Ruby's parsing is crucial for future directions, but each effort to standardize necessitates community investment to keep these essential tools robust and functional. Language servers and their integration into the Ruby ecosystem enhance the language's usability and development features.
00:17:39.040 In conclusion, the history of Ruby and its parsing evolution suggests a continuous need for community engagement and investment into the tools vital for developer productivity. This will ensure the Ruby ecosystem remains capable of building sustainable applications and powerful development tools necessary for modern programming needs.
00:18:13.600 Thank you for your attention.