Penelope Phippen

Summarized using AI

Building Rubyfmt

Penelope Phippen • August 21, 2020 • online

In this talk titled "Building Rubyfmt," Penelope Phippen discusses the development of Rubyfmt, a Ruby autoformatter she is creating. The primary focus is on how Rubyfmt differs from RuboCop, which is a popular tool for code linting and style checking in the Ruby community. The presentation highlights both the technical challenges involved in building Rubyfmt and its overarching goals.

Key Points Discussed:

- Introduction of Rubyfmt: Rubyfmt is an autoformatter that rewrites poorly formatted Ruby code into a semantically equivalent, cleanly formatted version each time the file is saved in the editor.

- Integration with Text Editors: Rubyfmt is designed to work seamlessly within editors such as Vim, Emacs, and Atom, with plans for VS Code support. The core philosophy is to allow formatting during the editing process rather than in CI/CD pipelines.

- Performance: Rubyfmt processes a 2,500 line file in about 60 milliseconds and can manage up to 30,000 lines of Ruby code per second, ensuring minimal disruption to the development workflow.

- No Configurations: Unlike RuboCop, Rubyfmt has no style-related configurations to promote a uniform formatting style. This choice minimizes disputes over formatting styles during code reviews and improves consistency across projects.

- Design as a Unix Tool: Rubyfmt operates like a Unix tool, accepting standard input and output, allowing it to integrate easily with other software tools and commands.

- Not a Gem: Rubyfmt is not intended to be a Ruby gem; instead, it will be distributed via alternative channels like GitHub releases due to its operational design.

- Technical Complexity: The talk outlines the complexities of Ruby’s syntax, which complicate building Rubyfmt. Moving the formatter to Rust improves performance, but Ruby's intricate syntax still demands extensive testing and repeated adjustments.

- Community Involvement: Phippen expresses the importance of community feedback in refining Rubyfmt’s features and addresses the need to adapt quickly to Ruby's evolving standards.

Key Examples and Case Studies:

- Phippen discussed the impact of RuboCop and how it shifted discussions in code reviews without resolving underlying nitpicking issues. She articulated the benefits of Rubyfmt in promoting a less hierarchical coding culture.

- She described the performance gains from pairing Rust with Ruby's own parser: the original pure-Ruby implementation took about 188 milliseconds on a 2,000-line file, while the current version formats a 2,500-line file in about 60 milliseconds.

Conclusions and Takeaways:

- Rubyfmt aims to set a standard for Ruby code formatting that reliably produces clean code while minimizing disputes over configuration preferences.

- It serves as a solution to enhance productivity and collaboration by eliminating nitpicking around style differences in coding practices.

Overall, Penelope Phippen’s talk showcases her commitment to delivering a high-quality Ruby autoformatter while addressing the community’s needs and feedback.

For anyone interested in enhancing their Ruby coding experience, following Rubyfmt's development via GitHub and providing constructive feedback will be key for its evolution and success.


In this talk I'll discuss Rubyfmt, a work-in-progress Ruby autoformatter: why I'm building it, and how it differs from RuboCop, the closest similar tool that you might be familiar with. I'll get deep into the weeds on some of the technical challenges of building a system like this, and what the overall goals for the project are.

Penelope Phippen makes Rubyfmt, and was previously a lead maintainer of the RSpec testing framework. She's been writing Ruby for just about a decade, and still remembers writing Ruby for 1.8.6.

Welcome to the #NoRuKo conference. A virtual unconference organized by Stichting Ruby NL.

#NoRuKo playlist with all talks and panels: https://www.youtube.com/playlist?list=PL9_A7olkztLlmJIAc567KQgKcMi7-qnjg

Recorded 21st of August, 2020.
NoRuKo website: https://noruko.org/
Stichting Ruby NL website: https://rubynl.org/

NoRuKo 2020

00:00:01.360 Welcome back! Hey, and we're back.
00:00:07.440 So, before our next talk, we have a fun interactive game for you all.
00:00:15.759 As developers, we all have opinions, and our sponsor, AppSignal, has created these really fun interactive cards called Developer Dilemmas. You will see a link to one of the Developer Dilemma cards right in the chat.
00:00:30.160 Basically, you can pick your favorite: What kind of developer are you? Are you a 'move fast and break things' person or a 'strive for zero errors' kind of developer?
00:00:36.559 Jump in and cast your vote! But wait, don't go anywhere just yet—don't touch that dial!
00:00:48.160 We still have Penelope's talk coming up, and I'm sure Ramon can tell us a bit more about her.
00:00:59.760 Please, Ramon, absolutely! I'd be happy to.
00:01:06.280 So, Penelope is a director at RubyCentral.org, which, among other things, organizes RubyConf and RailsConf. She has also led the development of RSpec, one of the major testing tools for Ruby.
00:01:18.000 She has been writing Ruby for just about a decade, dating back to around the time of Ruby 1.8.6.
00:01:23.360 She's going to tell us all about Ruby Format, so please, Penelope, take it away. The stage is yours.
00:01:30.640 Absolutely! Thank you so much, and I see you have my slides. Cool! Alright, let's go.
00:01:38.159 Hi everyone, this is Building Ruby Format. Let's get started. To build on that introduction slightly, my name is Penelope Phippen, and I go by Penelope Zone almost everywhere on the internet.
00:01:50.560 I use she/her pronouns, and as mentioned, I'm a director at Ruby Central. Ruby Central is a 501(c)(3) non-profit organization in the United States that organizes RubyConf and RailsConf.
00:01:58.479 Every year, we put on those two conferences, the largest in our community. We have RubyConf coming up in just a little while. The CFP for RubyConf is currently open, and like every conference this year, it will be entirely virtual.
00:02:13.040 So, if you have not yet submitted a talk to RubyConf, I highly encourage you to do so. This is a great way to get started, especially since you don't have to travel for this large event.
00:02:32.000 Along with my role at Ruby Central, I will also be program chairing RailsConf next year. It's not yet clear whether RailsConf in April will be virtual or in-person, but I encourage you to watch out and submit when the CFP opens.
00:02:53.680 Another thing I wanted to mention before I get too far into this is that this slide deck I have been building crashed Keynote this morning when I was working on it—an update occurred or something.
00:03:11.840 So we're all going to pray that computers work this morning, and hopefully everything will be fine. With that preamble out of the way, let's get into it and talk about what Ruby Format actually is.
00:03:19.120 The easiest answer to that question is that Ruby Format is a Ruby auto-formatter. I could finish the talk here, and that would answer the question, but it's not super useful.
00:03:30.319 Instead, let’s go straight to a demonstration. What you see here is me typing some Ruby, and the thing that you will probably notice is that this Ruby code looks nothing like the Ruby code you would write on a daily basis; it's really bad.
00:03:47.760 However, after saving the file, the code snaps into place, and this is Ruby Format working in the background.
00:03:53.200 Ruby Format is designed to sit in your editor, watch for changes, and every time you save your file, overwrite it with a formatted version.
00:04:05.519 In other words, Ruby Format is a program that consumes Ruby source code, transforms it, and outputs semantically equivalent Ruby source code with cleaner formatting. That's the core idea.
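To make "semantically equivalent" concrete, here is a small hand-written sketch. The tidy version is only what a formatter might plausibly produce; it is not actual Rubyfmt output, and the `add` method and `run_snippet` helper are made up for illustration.

```ruby
# Two spellings of the same program. The tidy one is formatted by hand for
# illustration; it is NOT real Rubyfmt output.
messy = <<~'RUBY'
  def add( a,b )
       a+b
  end
  add 1,2
RUBY

tidy = <<~'RUBY'
  def add(a, b)
    a + b
  end

  add(1, 2)
RUBY

# Evaluate a snippet inside a fresh object so that the two definitions of
# `add` cannot collide, and return the value of its last expression.
def run_snippet(source)
  Object.new.instance_eval(source)
end

# Both versions compute the same value even though they look different.
puts run_snippet(messy) == run_snippet(tidy)
```

Comparing one result is of course a much weaker check than a formatter needs, but it shows the contract: layout may change, behavior may not.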
00:04:18.799 Ruby Format is a program that transforms Ruby programs into equivalent ones with cleaner formatting. Beyond that, it is also designed to work within your editor. One of my core philosophies is that formatting should happen while you're editing, not as a back-and-forth in CI.
00:04:39.360 You should be able to see the changes in almost real-time, and from the very beginning, Ruby Format was designed to be a program that lives in your text editor. Currently, we have strong support for Vim, Emacs, Atom, and somewhat questionable support for Sublime.
00:04:59.919 As for VS Code, if you are a VS Code or Sublime user, please come to the Ruby Format issue tracker, test it out, and let us know whether it works for you. I'd really love to release this with full support for all of the text editors.
00:05:20.400 Another important aspect of Ruby Format is that it is fast—it's really, really fast. This is a necessary feature. If you are embedded in someone's text editor, you don't want to save the file and then wait an appreciable amount of time before resuming editing.
00:05:35.199 As it stands today, if you run Ruby Format over a 2,500 line file, it takes 60 milliseconds to complete. It will finish faster than you can notice that anything has happened.
00:05:54.560 This is powerful because it enables the format-on-save workflow I demonstrated earlier. The interesting thing about designing this performance as a feature is that on larger applications, you can run Ruby Format on your entire project in very little time.
00:06:07.360 At its peak, Ruby Format can handle about 30,000 lines of Ruby code per second. This means that inserting Ruby Format into even very large Ruby applications as a CI pass will barely take any time.
00:06:28.240 It will enable you to use it in your editor and your CI, which is a really cool and powerful benefit.
00:06:39.360 Ruby Format also has no style-related configuration. Unlike RuboCop, there's no .rubocop.yml, and there are no flags to change how formatting works. Ruby Format has one style, that is the style, and you cannot change it without altering the code of Ruby Format.
00:06:56.440 This decision leads to a very simple program, which I'll explain more in a bit.
00:07:14.000 Ruby Format is also designed to work like a Unix tool. It can consume standard input and output to standard output, and it includes flags for changing modes of operation.
00:07:24.800 By designing Ruby Format as a Unix tool, it can integrate with many other tools because text and pipes are a universal interface.
00:07:39.360 For example, if you run 'rubyfmt filename', it will write the formatted code to standard output, allowing you to chain it with other commands.
00:07:43.679 We have flags for in-place operations, and it works with directories, and so on. It was built from the beginning to be one of those simple Unix tools that do one thing well.
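The stdin-to-stdout shape described here can be sketched in a few lines of Ruby. This is a toy filter, not Rubyfmt: `tidy_whitespace` and `run_filter` are hypothetical names, and the "formatting" only strips trailing whitespace (which a real formatter could not do blindly, for example inside heredocs).

```ruby
require "stringio"

# Toy stand-in for a formatter: strip trailing whitespace from every line and
# end the output with exactly one newline. This is NOT how Rubyfmt works.
def tidy_whitespace(source)
  source.lines.map(&:rstrip).join("\n") + "\n"
end

# Read everything from `input`, write the transformed text to `output`.
# With the defaults this behaves like a Unix filter, so a script that calls
# run_filter with no arguments could be used as: ruby tidy.rb < in.rb > out.rb
def run_filter(input: $stdin, output: $stdout)
  output.write(tidy_whitespace(input.read))
end

# Demonstration with in-memory IO objects instead of real pipes.
out = StringIO.new
run_filter(input: StringIO.new("x = 1   \ny = 2\t\n"), output: out)
puts out.string
```

Because the filter only talks to generic IO objects, the same code works with files, pipes, or an editor plugin that feeds it a buffer.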
00:08:05.280 One notable aspect is that Ruby Format is not a gem, and it will never be a gem. This has been a point of controversy, as many Ruby developers want to access their tools via RubyGems.
00:08:16.000 However, Ruby Format is a command-line program that cannot be loaded into a Ruby project as a Ruby library, making it unreasonable to distribute as a Ruby gem. I don't want you to 'gem install rubyfmt'; instead, there will be alternate distribution channels available.
00:08:42.480 Today, we currently use GitHub releases for distribution. You may be thinking, 'Great, Penelope! I really want to use this Ruby auto-formatter that my team has been looking for. Can I have it, please?' Unfortunately, the answer is no.
00:08:56.960 While it's less of a strong 'no' than it was last time I spoke at Brighton Ruby, Ruby Format is in a state where you could download and play with it if you don't mind some sharp edges. Currently, it won’t break your code anymore. Previous versions could transform your code and produce invalid Ruby files or Ruby files that behaved differently; of course, that's not what you want from a tool like this.
00:09:27.920 So we fixed that by extensively testing it. Today, Ruby Format runs CI not only on small units of Ruby code but also on entire open-source repositories, including RSpec and some of Thoughtbot's Rails applications.
00:09:46.399 This allows us to push the boundaries of testing, giving us a high degree of confidence that if you run Ruby Format on a Ruby program, it won't break it. It will either fail to format it or produce a new Ruby program that is semantically equivalent.
00:10:05.519 However, the problem remains that some output has sharp edges. In particular, Ruby Format today deletes your comments quite often. Many developers find comments very important, and although the formatter will format your Ruby nicely, it might delete all your comments.
00:10:31.440 Hence, I do not recommend using it for day-to-day production use just yet. You may be asking yourself, 'Penelope, why are you doing this? This seems like a really hard problem. Why would you decide to build a Ruby auto-formatter?'
00:10:40.640 To answer that, I want to discuss an idea I vaguely mentioned on the Greater Than Code podcast, which is the hierarchy of nitpicking. For those of you who have been around for a while, you may remember writing Ruby before we had RuboCop.
00:11:06.720 When you would fire up a pull request, people would nitpick over code, debating whether to use the new hash syntax or the old one, or offering other small suggestions that didn't carry much weight but were argued from a place of consistency.
00:11:25.920 Then RuboCop came along. It allowed for automating some of those nitpicks, along with catching some performance issues, but almost no one liked the RuboCop defaults. Consequently, every Ruby application I've worked on has a customized RuboCop configuration.
00:11:46.640 But customizing your RuboCop configuration simply shifts those nitpicks elsewhere. You move them from pull requests to a place where the team agrees on them once, and RuboCop then enforces them in an automated fashion.
00:12:06.720 This doesn't resolve the nitpicking problem either, because code review inherently exposes social power dynamics that affect gender, racial, and other minorities.
00:12:26.560 Data shows that men generally get their pull requests approved more quickly than women, and they experience fewer comments or nitpicks during review.
00:12:47.840 With a .rubocop.yml file, changes to that configuration come more easily to those with more power, so the power dynamics of code review carry over into the configuration itself.
00:13:03.680 Building Ruby Format eliminates these arguments about configurations by requiring discussions in the open to decide the style for the Ruby language as a whole.
00:13:18.720 This project inadvertently came out of a conversation I had with a relatively junior female engineer. She expressed excitement for Ruby Format because it alleviated the need to challenge her senior colleague to improve their RuboCop configuration.
00:13:41.360 Now, the power dynamic exists between that individual and me, as the lead of the Ruby Format project. However, I feel that I am well-suited to lead these changes for the community, and this is the essence of the hierarchy of nitpicking.
00:14:01.440 Now, I'd like to briefly cover the existing solution space. Historically, no one has truly attempted to build a Ruby auto-formatter. The closest existing solution, RuboCop, isn't primarily designed for formatting.
00:14:21.920 RuboCop is an excellent Ruby linting tool, but formatting was not its core competency. It has a separate flag for fixing code, but it was not created from the ground up to accomplish this task, which is exactly what Ruby Format aims to do.
00:14:37.920 My objective with Ruby Format is to eliminate the cumbersome work of configuring RuboCop for formatting purposes and moving that responsibility to Ruby Format.
00:14:58.160 As you may gather, I believe in tiny, focused tools. Ruby Format only formats Ruby files; it cannot perform linting or check for problematic structures. Its singular function is formatting, and it does that quickly.
00:15:16.080 Additionally, Ruby Format has no style-related configuration, which means we must get it right the first time. I want to avoid huge formatting commits in the future, which is why I'm developing it so carefully.
00:15:36.640 Currently, our ambition is for Ruby Format to maintain one major version, which will always be 1.0. We'll establish that if you run the formatter, it will format everything and will not reformat any piece of code until it's altered.
00:15:52.720 To achieve this, we carefully consider our formatting decisions. I have reversed decisions along the way because I realized they were wrong after learning more about Ruby's syntax.
00:16:10.160 A perfect example is with trailing commas. I initially thought trailing commas minimized git churn on structures separated by new lines, but I encountered many cases where trailing commas were invalid in Ruby syntax.
00:16:25.120 Consequently, I opted to disallow trailing commas in Ruby Format, which simplified the code significantly. I believe this showcases that if you disagree with Ruby Format's decisions, please file an issue on GitHub, and we can discuss it.
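The talk does not list which trailing-comma cases were invalid. As an illustration only, the sketch below uses Ripper (Ruby's bundled parser interface) to check a few cases I am confident about; `parses?` is a hypothetical helper, and the chosen snippets are my own examples rather than the ones Penelope ran into.

```ruby
require "ripper"

# Ripper.sexp returns nil when the source fails to parse, so a non-nil
# result means the snippet is syntactically valid Ruby.
def parses?(source)
  !Ripper.sexp(source).nil?
end

examples = {
  "array literal"                   => "[1, 2,]",
  "call with parentheses"           => "foo(1, 2,)",
  "call ending in a block argument" => "foo(1, &blk,)",
  "method definition parameters"    => "def foo(a, b,); end"
}

# Print one row per example: where the comma sits, the code, and the verdict.
examples.each do |label, source|
  verdict = parses?(source) ? "valid" : "syntax error"
  puts format("%-34s %-22s %s", label, source, verdict)
end
```

A formatter that always emitted trailing commas would need to know about every such exception, which is one way to read the remark that dropping them simplified the code.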
00:16:45.680 The ultimate goal is to establish a minimal, consistent subset of the Ruby language. Essentially, we're compressing all possible Ruby into this minimal output.
00:17:01.440 Now, let's dive into some implementation details. Originally, Ruby Format was written entirely in Ruby, consuming a Ruby file, creating a parse tree using a library called Ripper, and generating tokens to output to a file.
00:17:17.200 The syntax trees are represented as arrays with a tag identifying what has been parsed. You can see this tag with its corresponding identifier, as well as any optional parentheses, in the output of Ripper.
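For readers who have not seen Ripper output, the following sketch prints the tree for a one-line method call. The snippet being parsed and the `tags` helper are my own choices for illustration, and the exact shape of the tree can vary slightly between Ruby versions.

```ruby
require "ripper"
require "pp"

# Ripper.sexp turns source text into nested arrays. Each node starts with a
# Symbol tag; scanner tokens such as :@ident also carry [line, column].
tree = Ripper.sexp("puts(greeting)")
pp tree

# Collect every tag that appears in the tree, depth first.
def tags(node)
  return [] unless node.is_a?(Array)

  own = node.first.is_a?(Symbol) ? [node.first] : []
  own + node.flat_map { |child| tags(child) }
end

p tags(tree).uniq
```

The `:@ident` entries are the identifiers, and the `:arg_paren` node records that the call was written with parentheses, which is the kind of detail a formatter has to preserve or normalize.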
00:17:36.960 And this process worked well for smaller files, but it was too slow for larger ones; formatting a 2000-line file took about 188 milliseconds.
00:17:50.080 To determine how fast is fast enough, we should consider modern computer performance: non-high refresh rate displays refresh about 60 times per second, or every 16 milliseconds.
00:18:05.440 Additionally, the fastest human reaction to a screen change is around 100 milliseconds, so we need to aim for a performance that fits within these constraints.
00:18:24.800 Using this knowledge, we assessed several approaches. For instance, running 'bundle exec rubocop' on a four-line file takes around 800 milliseconds, of which about 425 milliseconds is spent booting Bundler.
00:18:46.080 Thus, using Bundler was not a viable option, and the same goes for building on top of RuboCop; both approaches were simply too slow.
00:19:03.679 We found that evaluating an empty Ruby program took about 75 milliseconds, most of it spent loading RubyGems, so we needed to eliminate that loading step.
00:19:19.440 By disabling RubyGems, we reduced our boot time to just 25 milliseconds, leaving 75 milliseconds of the 100-millisecond budget to run the Ruby auto-formatter.
00:19:34.480 With that overhead gone, running Ruby Format on an empty file became fast enough for a format-on-save workflow.
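The boot-time comparison is easy to reproduce. The sketch below assumes RubyGems was switched off with the interpreter's `--disable-gems` flag (the talk does not say which mechanism was used), and `boot_time_ms` is a hypothetical helper. The numbers will differ from the talk's 75 and 25 milliseconds depending on the machine and Ruby version.

```ruby
require "rbconfig"

# Wall-clock milliseconds needed to start the current Ruby interpreter, run
# an empty program, and exit. Extra interpreter flags can be passed in.
def boot_time_ms(*flags)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  system(RbConfig.ruby, *flags, "-e", "")
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
end

# Budget taken from the talk: roughly the fastest human reaction time.
budget_ms = 100.0

with_gems    = boot_time_ms
without_gems = boot_time_ms("--disable-gems")

puts format("boot with RubyGems:    %6.1f ms", with_gems)
puts format("boot without RubyGems: %6.1f ms", without_gems)
puts format("left for formatting:   %6.1f ms", budget_ms - without_gems)
```

A single run is noisy; averaging several runs would give steadier figures, but the gap between the two modes is usually visible even from one measurement.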
00:19:51.760 The remaining 188 milliseconds runtime posed a challenge, so I wondered: what if I wrote a Ruby auto-formatter in Rust? Surely that would be easy, right?
00:20:09.600 The answer was 'no.' The Ruby parser, defined in parse.y, is not separable from the Ruby interpreter; you need a booted Ruby interpreter to execute the Ruby parser.
00:20:25.440 This limitation forced us to integrate Ruby and Rust, allowing us to benefit from the performance of Rust while still utilizing Ruby’s parsing capabilities.
00:20:43.360 Initially, we spawned a Ruby process that parsed the source file and serialized the result into a JSON blob, which the Rust program would then process.
00:21:01.840 Ultimately, this approach was inefficient. Now, instead of passing JSON, we simply pass a pointer to Ruby objects, since Ruby itself is written in C and integrates easily with Rust.
00:21:19.680 This allows Rust to navigate through Ruby objects by calling into the Ruby interpreter, which is a remarkable feat.
00:21:37.920 Sean Griffin, who is speaking later today, did much of the heavy lifting for this integration. It was a huge help.
00:21:51.200 Aside from performance reasons, working with Rust provides additional benefits. We can express all data types with static types, enabling us to understand how to work with the parse tree at compile time.
00:22:05.760 This significantly reduces mistakes in our implementation and is essential for the 60-millisecond performance number.
00:22:22.720 So, you might be wondering, 'Why aren't you finished yet, Penelope? You built this thing and converted it to Rust. Surely you must be close?'
00:22:38.320 Unfortunately, the syntax of the Ruby language is extremely complex. For instance, how we write multi-line strings using heredocs is peculiar.
00:22:55.440 They can even be defined inside other heredocs, which makes it tricky for Ruby Format to handle these cases correctly.
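As a concrete case of a heredoc inside another heredoc, the sketch below opens one within the interpolation of another. The method name and the text are made up; the point is only that the inner body and terminator sit in the middle of the outer body, which a formatter has to track while re-indenting.

```ruby
# The inner heredoc starts inside #{...}; its body ("nested") and its
# terminator (INNER) follow on the next lines, and only then does the outer
# heredoc's own text continue with "after".
def nested_heredoc
  <<~OUTER
    before
    #{<<~INNER.strip}
      nested
    INNER
    after
  OUTER
end

puts nested_heredoc
```

Reading the source top to bottom, the lines belong alternately to the outer string, the inner string, and the outer string again, so line-by-line reasoning about indentation does not work here.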
00:23:14.080 Even seemingly simple constructs, like arrays, can change structure dramatically depending on context, complicating our ability to work with them in Rust.
00:23:31.440 Furthermore, without a formal specification of the Ruby language, getting the data structures correct for Rust is exceedingly challenging.
00:23:48.640 The complexity of Ruby Format is daunting, and I've inadvertently become quite knowledgeable about the intricacies of Ruby syntax through this project.
00:24:05.920 Despite the challenges, I am committed to meticulously getting everything right, as I believe Ruby deserves a true, excellent auto-formatter.
00:24:22.560 I want to thank the RuboCop and Sorbet teams as they've proven there is an appetite for excellent tooling within the Ruby ecosystem.
00:24:39.200 I also want to mention that I work for Stripe, the company that developed Sorbet. They are not compensating me to speak positively about it; I genuinely believe it's a fantastic tool.
00:24:56.880 Without tools like Sorbet, we wouldn't have been able to create great tooling for Ruby. If you would like to check out the code for Ruby Format, it's available on GitHub: github.com/penelopezone/rubyfmt.
00:25:10.800 That's all I have! Here are my contact details, which I'll leave on screen for a few moments.
00:25:20.080 I'd be happy to take some questions.
00:25:27.440 That was wonderful! Thank you so much, Penelope. Before we get to the Q&A, let's take a short break.
00:25:36.640 I'll take a moment to share the results of the developer dilemmas.
00:25:44.480 The final tally shows 44% of you voted for 'move fast and break things,' while 56% voted for 'strive for zero errors.' Look at that!
00:25:55.440 Now, let's jump into some questions. Type them into the YouTube chat, please, and we'll bring them to Penelope as we go.
00:26:08.000 Penelope, thank you very much for your talk! We have a few questions.
00:26:14.640 One of them is: Can I use Ruby Format in RubyMine, as well as in editors like Vim?
00:26:24.560 That's a great question. I don’t know. I suspect someone would need to write a plugin for RubyMine.
00:26:33.440 I believe it has a formatter selector, so perhaps someone could reach out to the RubyMine developers to ask them to add Ruby Format.
00:26:40.960 But as of now, I suspect that integration isn't there yet.
00:26:48.800 Andy has a question: Great work, Penelope! I was wondering if Ruby Format enforces a maximum line length and how it handles overflow.
00:27:03.840 That's a fantastic question! I haven’t decided yet. Currently, we're exploring two possible modes: one is to enforce a line length limit.
00:27:22.240 If it exceeds that limit, we could break constructs from being single-line into multi-line—like turning long hashes or arrays into vertical lists.
00:27:37.760 The other idea is to respect the user's choice, allowing them to set a width as wide as they prefer.
00:27:46.240 However, if a user breaks a construct onto multiple lines, we would apply the multi-line formatting style. The formatter should adapt to how developers work.
00:28:03.839 The formatter's approach to line length could hopefully incentivize developers to work with the formatter rather than against it.
00:28:11.440 I believe that as the Ruby Format evolves, developers will learn how to better collaborate with it.
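The first mode from this answer, breaking a construct vertically once it exceeds a width limit, can be sketched as follows. This is a toy under stated assumptions: `render_array`, its `max_width` default of 80, and the two-space indent are all hypothetical, and the talk explicitly says the real behavior had not been decided.

```ruby
# Render an array literal from already-formatted element strings. Keep it on
# one line if it fits within max_width, otherwise put one element per line.
def render_array(elements, max_width: 80, indent: "  ")
  one_line = "[#{elements.join(', ')}]"
  return one_line if one_line.length <= max_width

  # No comma after the last element, in line with the talk's decision to
  # disallow trailing commas.
  "[\n" + elements.map { |element| indent + element }.join(",\n") + "\n]"
end

puts render_array(%w[1 2 3])
puts render_array(%w[:alpha :beta :gamma], max_width: 10)
```

The second mode, respecting a break the user already made, would need the original source layout as an extra input, which is why it is a different design and not just a different parameter value.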
00:28:29.040 We have time for another question: Penelope, this project is complex. Will you be able to keep it updated with Ruby's latest versions?
00:28:36.640 Great question! When Matz announced new Ruby syntax for Ruby 3, I expressed my trepidation about maintaining support.
00:28:55.440 Currently, supporting each newly introduced construct requires manual coding. Unfortunately, Ruby Format does not yet support Ruby 2.7's pattern matching.
00:29:12.160 Ruby Format currently supports Ruby 2.6 and newer versions, and I focus on recent releases for practical reasons.
00:29:32.160 As development progresses, we aim to support Ruby 3 syntax. Additionally, should Ruby introduce breaking changes, Ruby Format would adapt accordingly.
00:29:50.880 My goal is to maintain compatibility without needing major version updates, as long as it's possible to represent both older and newer Ruby syntax.
00:30:08.160 Thank you for an incredible session, Penelope. I appreciate you sharing your journey and insights!
00:30:21.680 Now, we're out of time. We'll take a tiny break and return in about three minutes for Tatiana's talk. See you then!