Programming Languages

Compiling Ruby to idiomatic code in static languages

by Alexander Ivanov and Zahary Karadjov

In the presentation titled "Compiling Ruby to idiomatic code in static languages" at RubyKaigi 2019, speakers Alexander Ivanov and Zahary Karadjov delve into translating dynamic languages, particularly Ruby, into statically typed languages like C++, C#, Go, and Java. They explore two main approaches:

  • Pseudocode-like Translation: This approach supports a subset of Ruby, focusing on generating straightforward statically typed code in various target languages. The speakers emphasize the need for idiomatic code that developers can easily understand, reinforcing the idea that automated translation should not compromise code quality.

  • Real-code Translation with rb2nim: This method infers Ruby types at runtime and translates complex codebases to Nim using a project called rb2nim. The speakers explain that the process still requires some manual adjustment, but it automates a significant portion of the work.

The duo explains the rationale behind translating from Ruby: as projects grow in complexity and user base, businesses may seek to rewrite their systems in more efficient languages. Such rewrites are often costly and risky, which prompts the search for a safer path. They cite two further motivations: writing an algorithm once and translating it into many languages, and using Ruby for rapid prototyping while still shipping an optimized, automatically translated version.

Key project developments highlighted include:
- Pseudocode Project: Originally designed to generate algorithms across multiple languages based on a common logic.
- py2nim Project: Focuses on translating Python code into Nim, successfully translating around fifty thousand lines of code while adapting to changes in the codebase over time.
- rb2nim Tool: Aimed at tracing Ruby execution to annotate and infer function types, allowing for better automatic translation into Nim, while maintaining the dynamics of Ruby’s meta-programming features.

The presentation also emphasizes the importance of testing in Ruby for successful type inference and translation accuracy. The speakers conclude by discussing their goals for language translation projects, the need for community feedback, and their openness to suggestions for future translations, ensuring their work continues to address developers’ requirements effectively. In summary, Ivanov and Karadjov aim to enhance translation capabilities to preserve the idiomatic beauty and functionality of Ruby while increasing the reach of Ruby code into other static languages.

00:00:00.860 Hello everyone, I'm Zahary and I would like to share a few things about myself.
00:00:06.420 I began my career as a game engine developer. Are there any game developers in the audience? Let me see some hands.
00:00:18.420 It’s not surprising that there are not too many, because using a language like Ruby for game development can be quite challenging. In game development, you typically need to maximize performance, which often leads to using lower-level languages like C++. Deep down, I always dreamed of writing in Ruby.
00:00:37.860 That led me to the idea of creating my own programming language, one that would be as slick as Ruby but as fast as C++. This desire is how I got involved with the Nim programming language. I joined the development team seven years ago and I am currently the second most active contributor.
00:01:19.229 Now let me introduce my friend Alexander. Alexander started out doing small projects and contributing to various languages, including Nim. Currently, he works on developer tools for Nim.
00:01:32.130 Earlier on, he was a Python and Ruby web developer and even taught Ruby at Sofia University. Alexander fell in love with Ruby during that time, working primarily with Ruby on Rails and web design. How many of you are web developers? Ah, many more than the game developers.
00:02:14.849 We are both from Bulgaria, which is an interesting place to live. Has anyone been to Bulgaria? Awesome! Though many people often highlight Sofia, I actually live in a smaller city called Plovdiv. It’s quite old and has beautiful architecture as well as delicious pancakes. It’s a European Capital of Culture this year, alongside an Italian city.
00:02:56.519 Bulgaria is well-known for its yogurt, but it also has a rich history. We were one of the first Christian countries in Eastern Europe. Our nation has seen glorious moments and faced various powerful competitors throughout history, forcing us to reinvent ourselves multiple times.
00:04:11.099 Despite all the struggles, our most significant moment of liberation involved many extraordinary figures, even samurais who fought for our freedom, which is quite fascinating. After these events, we kind of reestablished ourselves.
00:05:02.840 Now let’s get back to the main topic. Today, we will discuss the translation of dynamic languages like Ruby into statically typed ones, such as C++.
00:05:40.250 You might wonder why anyone would want to leave a comfortable language like Ruby to explore something else. Well, perhaps your project has grown considerably with many users, which is a good problem to have, but it often leads companies to consider rewriting their systems in a different language.
00:06:01.430 This process usually takes a lot of time, is expensive, and carries risks, so finding a safer, easier way to make this transition is desirable. Another potential reason could be if you'd like to implement an algorithm, for instance, a cryptographic routine, that could then be translated into multiple programming languages for broader access and adoption.
00:06:31.969 Finally, you might want to leverage the useful capabilities and superpowers of Ruby during the rapid prototyping phase of your project, but still keep the option of having an optimized, automatically-translated version ready for your end customers, streamlining your development process.
00:07:04.169 Our projects have been exploring this translation approach for several years. It all started with my friend Alexander's project called Pseudocode, which began over four years ago and is available on GitHub. The original idea was to generate algorithms in different languages from a common logic.
00:07:56.250 I faced the challenge of generating specific algorithms in a single language while aiming for an automatic conversion into as many target languages as possible. This is applicable for auto-generating API handlers, wrappers, or general-purpose algorithm libraries.
00:08:55.760 To achieve this, I utilized a very simple language like Python or Ruby, which makes writing pseudocode easy and accessible while targeting other languages. The main difference between my approach and Alexander's was that I aimed to generate idiomatic code by creating an intermediate level using a universal language and a standard library.
00:09:41.159 This universal library would cover common types, collections, functions, and methods necessary for straightforward mapping between languages, allowing for clearer translation while preserving code readability.
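To make the idea of a universal standard library concrete, here is a minimal sketch in Ruby. It assumes a lookup table from a universal operation (such as the length of a list) to the idiomatic spelling in each target language. The names `IDIOMS` and `emit_call` are hypothetical and are not taken from the speakers' project.

```ruby
# Hypothetical table: universal [type, method] => idiomatic form per target.
# All names here are illustrative assumptions.
IDIOMS = {
  [:List, :length] => {
    go:     ->(recv) { "len(#{recv})" },
    csharp: ->(recv) { "#{recv}.Count" },
    java:   ->(recv) { "#{recv}.size()" },
    cpp:    ->(recv) { "#{recv}.size()" }
  },
  [:String, :upcase] => {
    go:     ->(recv) { "strings.ToUpper(#{recv})" },
    csharp: ->(recv) { "#{recv}.ToUpper()" },
    java:   ->(recv) { "#{recv}.toUpperCase()" }
  }
}.freeze

# Returns the target-language expression for a universal method call,
# or raises when the table has no idiom for that target.
def emit_call(type, method, receiver, target)
  rule = IDIOMS.dig([type, method], target)
  raise ArgumentError, "no idiom for #{type}##{method} in #{target}" unless rule

  rule.call(receiver)
end

puts emit_call(:List, :length, "xs", :go)
puts emit_call(:List, :length, "xs", :csharp)
```

Keeping the mapping as data rather than as code per language is one possible design; it makes a missing idiom an explicit error instead of a silently unidiomatic translation.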
00:10:02.730 Although this method worked well for smaller problems, we realized that we needed a more comprehensive approach. This realization led us to the development of the py2nim project.
00:10:30.870 At the company where I work, our goal is to create one of the first implementations of Ethereum 2.0. This development has been primarily driven by a team working in Python, but we identified the need for a language that is somewhat similar to Python in terms of appearance and idioms, and that’s where Nim comes in.
00:10:55.890 Nim is an efficient language that supports our aim to create an Ethereum implementation for mobile devices and resource-constrained environments. During Nim's development, we aimed to create a translation for the Python code produced by our team.
00:11:32.430 In the early stages of developing Nimbus, our Ethereum client, we achieved a remarkable amount of automated translation — around fifty thousand lines of code, which wasn’t fully functional but needed only minimal touch-up. This initial success boosted our confidence that we were on the right track.
00:12:25.690 As time went by, our codebase underwent significant changes, allowing us to refine our methods for better efficiency. A key aspect to mention is the difference between Pseudocode and py2nim in how they interpret the program's source code.
00:13:12.509 While Pseudocode looks at a program as a static snapshot, py2nim infers information through execution tracing. py2nim is built specifically to translate Python, focusing on practical translation for real-world projects.
00:14:36.130 We are excited to announce the first release of rb2nim, a tool similar to py2nim but designed for Ruby. Both py2nim and rb2nim build on a more general system we call Languist.
00:15:06.360 We chose Ruby as our second source language because it represents a significant challenge, mainly due to its heavy use of meta-programming. That difficulty is exactly what drew us to it.
00:16:01.660 Moreover, Nim is exceptionally fast, on par with C and C++ in many scenarios but with a garbage collector, making it capable of expressing many dynamic programming features. Nim also has robust meta-programming capabilities that allow us to recreate the expressive power of Ruby.
00:16:37.370 Let me illustrate how Ruby blocks can be elegantly replicated in Nim. Starting with a simple example, we have Ruby and then a similar construct in Nim. The DSL capabilities allow creating rapid prototypes that closely mimic Ruby’s structure.
00:17:25.150 This process is driven by compile-time macros that transform pieces of your program into the target code. Nim's macro system is more flexible than what C and C++ offer, bringing much of the expressiveness of dynamic code into a static context.
00:18:07.460 Now, let’s delve into the rb2nim methodology. Although Ruby is a dynamic language, a running program manipulates specific values with identifiable types, so we can monitor execution and determine concrete types for many methods.
00:19:00.150 We achieve this by recording execution traces using the TracePoint API and creating a database. This database enables a deep understanding and allows us to automatically annotate functions with the specific types observed during execution.
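The tracing step described here can be sketched with Ruby's built-in `TracePoint` class. This is a minimal illustration, not the speakers' tool: the in-memory `TYPE_DB` hash, the `record_types` helper, and the `Shape` class are assumptions made for the example, and only required parameters are inspected.

```ruby
# Sketch: collect the classes of arguments and return values per method.
# TYPE_DB and record_types are illustrative names, not part of any tool.
TYPE_DB = Hash.new do |db, key|
  db[key] = { args: Hash.new { |args, name| args[name] = [] }, returns: [] }
end

def record_types
  tracer = TracePoint.new(:call, :return) do |tp|
    entry = TYPE_DB["#{tp.defined_class}##{tp.method_id}"]
    if tp.event == :call
      # tp.parameters needs Ruby 2.6+. Only required parameters are read,
      # to keep the sketch simple.
      tp.parameters.each do |kind, name|
        next unless name && %i[req keyreq].include?(kind)

        entry[:args][name] |= [tp.binding.local_variable_get(name).class]
      end
    else
      entry[:returns] |= [tp.return_value.class]
    end
  end
  # Tracing is active only while the given block runs.
  tracer.enable { yield }
  TYPE_DB
end

# Example class used only to exercise the tracer.
class Shape
  def scale(factor)
    10 * factor
  end
end

db = record_types do
  Shape.new.scale(2)
  Shape.new.scale(1.5)
end
p db["Shape#scale"]
```

A method that is observed with more than one class for the same parameter, as `scale` is here, is the kind of case where a translator would have to pick a union, a generic, or ask for a manual annotation.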
00:19:49.620 Beyond the rb2nim project itself, we see various applications for the systems we are developing. For example, by running your tests with ruby-deduckt, we create a database that can back a language server and improve code autocompletion.
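As a rough illustration of the autocomplete idea, the sketch below assumes that the recorded classes for a variable are already available as a plain array. The `completions` helper is hypothetical; a real language server would involve far more than this lookup.

```ruby
# Hypothetical lookup: given the classes recorded for a variable,
# suggest public method names that start with what the user typed.
def completions(observed_classes, prefix)
  observed_classes
    .flat_map(&:public_instance_methods)
    .uniq
    .select { |name| name.to_s.start_with?(prefix) }
    .sort
end

# A variable that was observed holding both Integer and Float values.
p completions([Integer, Float], "rou")
```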
00:20:44.900 Moreover, you can incorporate the tool into continuous integration pipelines to dynamically generate code documentation, which helps navigate your codebase more effectively.
00:21:39.440 It’s important to note that to ensure success with these approaches, your code needs good tests and coverage for effective tracing. As Ruby developers, it’s likely you already have test suites within your projects.
00:22:20.810 In conclusion, we are continuously working on improving translation accuracy for Ruby code. While our method can automatically translate much of the code, we've observed challenges with mock types and dynamic portions of code.
00:23:02.750 For instance, mock types are great for testing, but they often misrepresent underlying values. We're investigating additional configuration options to better address such inconsistencies.
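One conceivable form of such configuration is a list of name patterns for test doubles, so that their classes are dropped from the observed types before signatures are inferred. The sketch below is an assumption about how this might look, not a description of an existing option; `MOCK_PATTERNS`, `mock_class?`, `production_types`, and `PaymentGatewayMock` are all made-up names.

```ruby
# Hypothetical configuration: class-name patterns that identify test doubles.
MOCK_PATTERNS = [/Mock/, /Double\z/].freeze

# True when the class name matches one of the configured patterns.
# Anonymous classes have no name and are never treated as mocks here.
def mock_class?(klass)
  name = klass.name.to_s
  MOCK_PATTERNS.any? { |pattern| pattern.match?(name) }
end

# Keeps only the classes that should contribute to an inferred signature.
def production_types(observed)
  observed.reject { |klass| mock_class?(klass) }
end

# Example double used only for this illustration.
class PaymentGatewayMock; end

p production_types([String, PaymentGatewayMock, Integer])
```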
00:23:56.370 In our work on rb2nim, we aim to make translation idiomatic, preserving the essence and beauty of the original structure. Our goal is to create code that adheres closely to Ruby’s expressive style.
00:24:50.270 Regarding project translations, we utilize multiple repositories: one for the source code and another for the translated output. This structure allows us to manage their development and address any necessary updates or annotations.
00:25:34.720 In the source repository, light annotations are made, keeping the codebase manageable while syncing with upstream changes, reducing conflicts. In the output repo, we focus on minimal manual patches following automatic translations.
00:26:16.490 Translating projects often demands significant engineering efforts. We usually start by mapping the critical components to identify effective targets for manual replacements to enhance the translation process.
00:27:15.680 It’s crucial to program with idiomatic translations during this mapping phase, which allows for more coherent and robust translations throughout the project.
00:28:13.360 Moreover, our DSL assists us in easily defining rules and generating target syntax for various input language components during translations.
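To give a feel for what a rule-definition DSL can look like, here is a small Ruby sketch. It is not the speakers' DSL: the `RuleSet` class, the `rewrite` method, and the template format are assumptions for illustration. The Nim spellings on the right-hand side (`add`, `len`, `toUpperAscii`) are standard Nim procedures.

```ruby
# Illustrative rule DSL: maps a Ruby method call to a target-syntax template.
class RuleSet
  def self.define(&block)
    new.tap { |set| set.instance_eval(&block) }
  end

  def initialize
    @rules = {}
  end

  # Registers a template for calls to `method_name` on receivers of `type`.
  def rewrite(type, method_name, to:)
    @rules[[type, method_name]] = to
  end

  # Returns the translated call, or nil when no rule matches so the caller
  # can fall back to manual handling.
  def translate(type, method_name, receiver, args = [])
    template = @rules[[type, method_name]]
    return nil unless template

    format(template, recv: receiver, args: args.join(", "))
  end
end

RUBY_TO_NIM = RuleSet.define do
  rewrite :Array,  :push,   to: "%<recv>s.add(%<args>s)"
  rewrite :Array,  :length, to: "%<recv>s.len"
  rewrite :String, :upcase, to: "%<recv>s.toUpperAscii()"
end

puts RUBY_TO_NIM.translate(:Array, :push, "xs", ["1"])
puts RUBY_TO_NIM.translate(:String, :upcase, "name")
```

Returning `nil` for an unknown call, rather than raising, leaves room for the kind of manual override that a translation pipeline needs for code it cannot handle automatically.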
00:28:57.630 Additionally, we utilize a system that simplifies adding support for new languages, allowing efficient adaptation on a structured basis. Our intent is to maintain a flexible architecture that incorporates various programming paradigms.
00:29:44.549 Once the final code generation stage is reached, we have the option to tweak the process. This flexibility lets us step in on the parts of a translation that need more manual handling and adjust them as needed.
00:30:29.490 We aim to provide many escape hatches in our translation tools, along with a solid foundation to promote easy readjustments, enabling better maintenance and usage of the translated projects.
00:31:20.150 As a result, our goal remains to translate Ruby elegantly into idiomatic Nim while preserving the beauty and functionality inherent to Ruby.
00:32:01.370 Throughout our experimentation, we found it important to manage translations effectively, all while striving to minimize dynamic aspects. This involves closely analyzing different function behaviors to ensure accuracy and efficiency.
00:32:51.890 The final takeaway is our future direction; we aim to expand translation to other languages based on community feedback. We hope for active participation in feedback as we strive to make this project more robust and widely applicable.
00:33:46.720 As we progress, we remain open to suggestions about the next projects suitable for translation, considering both community interest and feasibility. Thank you for your attention.
00:34:42.090 If anyone is curious about our project or wants to discuss anything further, please find us after the talk. Thank you!