Automated Type Contracts Generation For Ruby

00:00:11.720 Hello, my name is Valentin Fondaratov. I work at JetBrains, the company that brought you IntelliJ IDEA, ReSharper, MPS, and many other tools.

00:00:16.870 Today, I would like to share some new projects we’ve started in the Ruby team. Although we’re still in the beginning stages with just a fragile prototype, I believe we’re moving in a promising direction.

00:00:30.500 Before we discuss our roadmap, I will start with a review of code verification in the Ruby world and the challenges we face. These challenges are multifaceted, with varying approaches being explored to address them.

00:00:48.950 We’ll look at different code analysis approaches that combine both running programs and static analysis. I can’t say it will completely eliminate bugs, but it may improve type safety in Ruby and help address some common code smells.

00:01:09.409 Let’s begin by discussing the tools we use. A significant part of any program's success depends on the tools available. This includes CI services, documentation, operating systems, programming languages, and even laptop manufacturers. Although not every team uses code analysis tools, it is evident that they can save you from potential pitfalls.

00:01:44.180 Code analysis tools can catch severe errors before they reach production and impact your users. They can also encourage adherence to coding standards, warn you about code that won't work before you run tests, and assist in writing cleaner, more concise code through automated refactorings.

00:02:23.830 In the Ruby ecosystem, one of the most recognized tools for code analysis is RuboCop. Many of you likely use it. While it has many community-backed inspections, or 'cops,' it primarily serves as a guideline enforcer.

00:02:54.060 RuboCop can identify code smells that lead to bugs, but it doesn’t always catch latent issues in your code. Let’s look at a code snippet checked by RuboCop. It flags spacing issues, but it may miss method calls that fail at runtime.

00:03:51.900 RubyMine, which I work on, focuses on bug detection rather than just code style consistency, although it has some capabilities in that area as well. Unlike RuboCop, it has a better understanding of the types of variables. In statically typed languages, such errors would be highlighted in the editor.

00:04:58.150 However, the dynamic nature of Ruby allows for the creation of concise domain-specific languages, which can make static analysis challenging.

00:05:06.190 Let’s explore a specific case from the Diaspora social network, where a method returns different types based on its input. Static analysis struggles with scenarios like this, so for now we can only dream about a perfect analysis tool.
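To make the problem concrete, here is a minimal sketch of a method whose return type depends on its input. It is not the Diaspora code; `find_user` and its data are hypothetical names chosen for illustration.

```ruby
# Hypothetical lookup method, written only to illustrate the problem.
def find_user(id_or_ids)
  users = { 1 => "alice", 2 => "bob" }

  # An Array of ids returns an Array of names.
  return users.values_at(*id_or_ids) if id_or_ids.is_a?(Array)

  # A single id returns a String, or nil when the id is unknown.
  users[id_or_ids]
end

single = find_user(1)
many = find_user([1, 2])
missing = find_user(3)
```

A tool that only reads the method signature cannot tell whether a caller receives a String, an Array, or nil.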

00:05:40.720 We face a dilemma: on one hand, we can’t achieve an ideal static analysis tool, and on the other, running our programs only tests a fraction of possible cases. Debugging and testing are essential for uncovering bugs, but we cannot validate all potential issues.

00:07:28.120 Thus, we must acknowledge significant limitations. Even with comprehensive test coverage, there’s no guarantee that all branches in libraries are tested adequately. Running tests provides coverage, but without full knowledge of how everything interacts, we cannot be entirely confident in our results.

00:08:41.080 Since we currently cannot achieve full verification, a more effective static analysis model could be borrowed from typed languages like Java and tools such as IntelliJ IDEA, while also improving test coverage.

00:09:13.400 I invite everyone to join us in developing this initiative, as the end goal is yet to be determined. However, even good test coverage does not imply that you are checking all edge cases.

00:09:41.640 It's crucial to explore and remember the behavior of the methods we are testing, especially in cases like our 'be' method in RSpec, which can exhibit different return types based on the input.

00:10:08.870 We can run tests to determine these types effectively. In doing so, we will understand what results can be expected from various inputs, allowing us to document our findings.

00:10:45.620 The algorithm for approaching this situation can be divided into three phases: gathering data from Ruby scripts, processing that data into human-readable and machine-readable formats, and sharing results with colleagues.

00:11:06.829 Focusing on the first phase, many people may not have used the TracePoint API. This powerful feature allows you to subscribe to events like method calls, method returns, and exceptions, and to capture local context values and parameters when they fire.
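As a minimal sketch of such a subscription, the snippet below records call, return, and raise events for a single method. The `greet` method and the `events` array are hypothetical names; only the TracePoint calls come from Ruby itself.

```ruby
# Hypothetical method used only for this illustration.
def greet(name)
  raise ArgumentError, "name is empty" if name.empty?
  "Hello, #{name}"
end

events = []

# Subscribe to Ruby-level method calls, returns, and raised exceptions.
trace = TracePoint.new(:call, :return, :raise) do |tp|
  case tp.event
  when :call
    events << [:call, tp.method_id]
  when :return
    events << [:return, tp.method_id, tp.return_value.class]
  when :raise
    events << [:raise, tp.raised_exception.class]
  end
end

# Tracing is active only inside the block.
trace.enable do
  greet("Ruby")
  begin
    greet("")
  rescue ArgumentError
    # The exception is expected; we only want the trace to see it.
  end
end
```

The `:call` and `:return` events cover methods written in Ruby; methods implemented in C are reported through the separate `:c_call` and `:c_return` events.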

00:12:00.700 For example, we can observe how TracePoint interacts with method parameters during execution. Let’s consider calling a function with mandatory and optional arguments to illustrate how TracePoint works.
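A rough sketch of that scenario, assuming Ruby 2.6 or later for `TracePoint#parameters`. The `connect` method is a hypothetical example, not code from the talk.

```ruby
# Hypothetical method with one mandatory and one optional argument.
def connect(host, port = 80)
  "#{host}:#{port}"
end

calls = []

trace = TracePoint.new(:call) do |tp|
  # tp.parameters returns pairs such as [:req, :host] and [:opt, :port].
  observed = tp.parameters.map do |kind, name|
    [kind, name, tp.binding.local_variable_get(name).class]
  end
  calls << observed
end

trace.enable do
  connect("example.org")        # relies on the default port
  connect("example.org", 8080)  # passes the port explicitly
end
```

Both calls report the same parameter list, so this data alone does not reveal whether the caller actually supplied `port`.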

00:12:23.720 Each method call involves a series of bytecode instructions, which can be examined through TracePoint. By analyzing how default parameters are handled, we gain insight into parameter values passed at runtime.
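To look at those instructions yourself, CRuby exposes `RubyVM::InstructionSequence`. The sketch below is specific to CRuby, and `connect` is again a hypothetical method. The exact listing differs between Ruby versions, so no output is shown.

```ruby
# Hypothetical method with an optional argument.
def connect(host, port = 80)
  "#{host}:#{port}"
end

# Fetch the compiled instruction sequence for the method (CRuby only).
iseq = RubyVM::InstructionSequence.of(method(:connect))

# disasm returns a human-readable listing of the bytecode,
# including the instructions that assign the default value to port.
listing = iseq.disasm
puts listing
```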

00:13:20.780 Further dissecting the bytecode reveals that in order to obtain the correct parameters, we need to walk back to the frame of the calling context using the Ruby C API. This allows us to extract the precise input used during execution.

00:14:01.880 In our endeavor to understand Ruby function behavior better, we meticulously collect the types of all incoming parameters and correlate them with return types of method calls.
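A simplified, pure-Ruby sketch of this collection step is shown below. It is not the prototype's implementation, which relies on the C API. `Calculator`, `signatures`, and `pending` are hypothetical names, and the sketch assumes a single thread.

```ruby
# Hypothetical class whose method returns different types.
class Calculator
  def half(n)
    n.even? ? n / 2 : n / 2.0
  end
end

# "Owner#method" => list of [argument classes, return class] pairs.
signatures = Hash.new { |hash, key| hash[key] = [] }
pending = [] # argument types of calls that have not returned yet

trace = TracePoint.new(:call, :return) do |tp|
  if tp.event == :call
    # Skip anonymous parameters such as a bare * splat.
    named = tp.parameters.map { |_kind, name| name }.compact
    pending.push(named.map { |name| tp.binding.local_variable_get(name).class })
  else
    arg_types = pending.pop
    key = "#{tp.defined_class}##{tp.method_id}"
    # Array union keeps each observed signature only once.
    signatures[key] |= [[arg_types, tp.return_value.class]]
  end
end

calculator = Calculator.new
trace.enable do
  calculator.half(4)
  calculator.half(3)
end
```

After the run, `signatures` holds two entries for `Calculator#half`: an Integer argument producing an Integer, and an Integer argument producing a Float.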

00:15:02.590 For instance, we can analyze the 'split' method of the String class, using its comprehensive documentation as a reference for how it is used. The goal is to transform observed behavior into actionable type contracts.
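As an illustration, the sketch below calls `split` with a few argument combinations and writes each observation down as a contract string. The `observe_split` helper and the `(args) -> result` notation are assumptions made for this example, not the format used by the project.

```ruby
# Hypothetical helper: call split and describe what was observed.
def observe_split(receiver, *args)
  result = receiver.split(*args)
  "(#{args.map(&:class).join(', ')}) -> #{result.class}"
end

contracts = [
  observe_split("a b c"),          # no arguments: split on whitespace
  observe_split("a,b,c", ","),     # String separator
  observe_split("a1b2c", /\d/),    # Regexp separator
  observe_split("a,b,c", ",", 2)   # separator plus a limit
].uniq
```

Each distinct string in `contracts` is one observed way of calling the method.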

00:16:01.080 We can automate the generation of contracts that provide clarity on parameter types and return values for functions. Building dependency graphs further helps in understanding data flows within methods.

00:16:48.400 By merging type information across different method call instances, we enhance clarity and create visual models, given that similar inputs produce common outcomes. We can develop a finite automaton model that visualizes return types based on input parameters.

00:17:57.340 Creating a comprehensive automaton simplifies type resolution, and by minimizing the number of states, we enhance efficiency in understanding method behavior, even when the same type is used for different parameters.
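One way to picture this is as a small trie-shaped automaton: each edge is labelled with an argument class, and each state records the return classes seen when a call ended there. The sketch below builds it from hypothetical observations and merges structurally identical states. It is a simplification, not the minimization algorithm used in the prototype.

```ruby
# Build a trie: edges are argument classes, :returns lists return classes.
def build_automaton(signatures)
  root = { returns: [], edges: {} }
  signatures.each do |arg_types, return_type|
    node = arg_types.reduce(root) do |current, type|
      current[:edges][type] ||= { returns: [], edges: {} }
    end
    node[:returns] |= [return_type]
  end
  root
end

# Share one object between states whose subtrees are identical.
def minimize(node, cache = {})
  edges = node[:edges].transform_values { |child| minimize(child, cache) }
  canonical = { returns: node[:returns], edges: edges }
  cache[canonical] ||= canonical
end

# Count distinct states by object identity.
def count_states(node, seen = {}.compare_by_identity)
  return seen.size if seen.key?(node)
  seen[node] = true
  node[:edges].each_value { |child| count_states(child, seen) }
  seen.size
end

# Hypothetical observations: argument classes, then return class.
observed = [
  [[String], Array],
  [[Regexp], Array],
  [[String, Integer], Array],
  [[Regexp, Integer], Array]
]

automaton = build_automaton(observed)
minimized = minimize(automaton)
states_before = count_states(automaton)
states_after = count_states(minimized)
```

In this data the String and Regexp branches behave identically, so after merging they point at the same state and the automaton shrinks from five states to three.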

00:18:59.020 For corner cases, introducing reference types for parameters allows us to track types that are inherently linked. Post-minimization, we can refine our types and understand methods far better.
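One possible reading of reference types, sketched under the assumption that a reference means "same class as an earlier parameter". The `with_references` helper and the `[:same_as, index]` marker are hypothetical. A matching class in a single observation may be a coincidence, so a real tool would need many observations before trusting the link.

```ruby
# Replace repeated classes with a reference to the first parameter
# position where that class appeared.
def with_references(arg_types, return_type)
  first_position = {}
  args = arg_types.each_with_index.map do |type, index|
    if first_position.key?(type)
      [:same_as, first_position[type]]
    else
      first_position[type] = index
      type
    end
  end
  result = first_position.key?(return_type) ? [:same_as, first_position[return_type]] : return_type
  [args, result]
end

linked = with_references([Integer, Integer], Integer)
unlinked = with_references([String, Integer], Array)
```

Here `linked` marks both the second argument and the return value as having the same class as parameter 0, while `unlinked` is left unchanged.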

00:19:59.400 During practical tests, we unravel how particular methods iterate through various test cases, examining behavior and output generation for a range of inputs. The manual annotations help illuminate the underlying logic.

00:20:59.720 Adding additional cases and generating contracts based on function calls gives rise to a method of compiling libraries with type annotations. This process allows for further refinement, resulting in utility within production environments.

00:22:45.020 Ultimately, this systematic method aims to create a global library annotation network, capturing vital runtime information from various applications to enhance the overall ecosystem.

00:23:24.180 The crucial elements for annotation pertain primarily to libraries we utilize and relevance across teams. By pooling testing data from various environments, we can address gaps, improve understanding, and form comprehensive annotations.

00:24:50.010 Now, I encourage developers to consider how this might assist in your own projects and endeavors. Generating value from testing and integrating observed patterns allows us to enhance existing Ruby tools and promote collaborative enhancement across our community.

00:25:45.390 This project is a small step towards mitigating some of the issues facing dynamic languages, particularly Ruby. The essence of improving developer happiness stems from both the language and the tools that we employ.

00:27:08.770 Thank you for your attention. Are there any questions?

00:28:07.250 Someone asked about the Ruby 3 project and optional types. While gradual typing may not appear in Ruby 3, this project aims to facilitate transforming existing projects into annotated ones.

00:28:31.670 Another question pertained to handling library methods that can be called with an unbounded number of classes. The strategy would be to use wildcards where the set of observed classes is too diverse to enumerate, which keeps the type system flexible.

00:29:57.600 We are looking to implement duck typing systems in cases where classes share method structures. Utilizing runtime data will enhance our understanding of how interfaces can be dynamically defined and utilized.
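A minimal sketch of that idea: given the classes observed for one parameter at runtime, the methods they all define hint at an implicit interface. The `Duck` and `Robot` classes and the `shared_interface` helper are hypothetical.

```ruby
# Two unrelated hypothetical classes that happen to share a method.
class Duck
  def quack
    "Quack"
  end

  def walk
    "waddle"
  end
end

class Robot
  def quack
    "Beep"
  end

  def charge
    "charging"
  end
end

# Intersect the public methods each class defines itself
# (passing false excludes inherited methods).
def shared_interface(classes)
  classes.map { |klass| klass.public_instance_methods(false) }.reduce(:&)
end

interface = shared_interface([Duck, Robot])
```

For these two classes the shared interface is just `quack`, which is what a duck-typed contract for that parameter could require.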

00:30:43.360 Thank you all once again for your questions and for being part of this presentation.