Soutaro Matsumoto

Type Checking Ruby Programs with Annotations

RubyKaigi2017
http://rubykaigi.org/2017/presentations/soutaro.html

Type inference for Ruby programs is really difficult, and no one on earth has implemented successfully yet.

Q: What if we write type annotations?
A: Much easier, but it is still not trivial.

I will explain why they are difficult, how we can have a practical type checker for Ruby, and how the programming experience will be with types.

RubyKaigi 2017

00:00:00.030 Okay, so yeah, today I'm talking about type checking Ruby programs with annotations. The last talk was about static checking, so let me start with introducing myself.
00:00:05.220 My name is Dr. Matsumoto, and you can find me on GitHub and Twitter under the handle Soutaro. Please follow me. I'll also introduce the company I'm working for and the project I'm currently involved in.
00:00:20.880 I spent several hours yesterday figuring out how to introduce myself for today, and I finally found that this should be a good self-introduction. I have implemented four type checkers over the past 12 years, so I might be the only person in the world who has tried these four times. The first type checker I attempted to implement was when I was an undergraduate student. It was based on constraint-based type inference, and the type system included some structural subtyping, which didn't work very well.
00:01:00.510 The next attempt was when I was a student about ten years ago. That one was based on the ML type inference algorithm and object typing based on an extension of polymorphic record types. Unfortunately, this also didn't work well. After that, I tried control flow analysis and implemented a safety analysis that was almost equivalent to type checking. It attempts to analyze several method definitions in Ruby programs, but I don't think it worked very well either.
00:02:04.469 Today, I'm discussing my latest static type checker. It is based on local type inference, which is the fastest method we can set up. I assume there are no type annotations in Ruby programs. My type checker is built upon local type inference and structural subtyping.
00:02:51.690 I want to start with this question: Why do we want a type checker? What can we do if we have types in Ruby? There are numerous benefits and applications of types that I want to list. First, it helps programmers find bugs. A type error usually indicates that your program is calling a method incorrectly or passing improper arguments to a method. If a type checker catches these errors while you're writing your code, you can fix them before executing your program.
00:03:36.600 The next benefit is that types provide verifiable documentation of an API. They structure the usage of APIs and ensure that the documentation is not outdated. This is much better than API documentation that is embedded in comments because you'll never forget to update the documentation. Furthermore, when type checking is integrated into IDEs like RubyMine or Emacs, it assists in code completion. When typing code, method names will pop up automatically, minimizing the need to type full names.
00:05:05.820 Type checking also aids in refactoring. When you change a method's name or signature—adding new arguments or keywords—the type checker helps identify all the places that require updates. IDEs can provide better refactoring features with type information, which can also be utilized for more advanced program analysis, such as security checking. Tools like Brakeman can be improved with type information.
00:06:09.630 Now that we agree on the need for type checking, it is essential to note that the type checking for Ruby is not a new idea—it has been attempted for at least the last 12 years. I want to present two papers published about ten years ago in this talk. The first one is 'Static Type Inference for Ruby.' You may remember the name of the author, D. MacAdobe, regarding the type checker implementation described in this paper.
00:06:45.509 The second paper is mine, discussing 'Type Inference for Big Programs' based on polymorphic record types. Both papers attempt to provide type inference for Ruby programs as they aim to type check existing Ruby programs without modifying them. As you know, Ruby is an untyped language, and no type annotations are included in the source code. Hence, to type check Ruby programs, we need to infer types from the source code.
00:08:06.659 I want to discuss the first paper, 'Static Type Inference for Ruby.' The implementation is still available at the amount of a club for Ruby. In fact, you can download the implementation and try it with your programs, but it hasn't been updated in nine years. I attempted to download it yesterday but couldn't build it.
00:08:43.460 The type system is based on structural subtyping. The advantage of structural subtyping is that it is flexible, intuitive, and powerful enough to represent the types of Ruby programs. However, the downside is due to its subsumption type system. The type inference cannot import polymorphic types in a global program.
00:10:00.800 Let me explain my paper. I presented it at RubyKaigi 2008, ten years ago. The type system is based on very type inference and polymorphic record types. Polymorphic records simulate structural subtyping, and because it is a bit weaker than structural subtyping, it is possible to implicitly infer polymorphic types using automatic record types. My implementation infers polymorphic types in the program.
00:11:22.690 However, I should note that there are serious limitations. The implementation cannot give types for some built-in methods and classes, such as String and Integer. This means it cannot type check any Ruby program or provide types for built-in methods. This limitation arises due to the type inference algorithm it uses, which does not allow polymorphic recursion.
00:12:08.390 Both papers have very significant limitations and cannot function as practical tools. This is a disappointing conclusion about type inference for Ruby programs, as we cannot construct an effective type inference algorithm for them. Picking structural subtyping does not enable polymorphic types, while choosing polymorphic type inference leaves some built-ins untyped.
00:12:53.630 Currently, there seems to be no feasible solution for these inherent issues. However, we can hope for advancements in type theory. Perhaps, some researchers will eventually find a good trade-off that maintains the flexibility needed to represent Ruby types while allowing for polymorphic type inference.
00:14:12.240 Let's take a step back when discussing type checking. When we talk about type checking, we implicitly have three requirements: correctness, static nature, and the absence of annotations. Typically, we want a type checker to be correct. The correctness of a type checker means that if it confirms a program is okay, no errors will be raised during execution.
00:15:06.040 The second requirement is static typing; we want to type check Ruby programs without executing them. This means that type checking should not depend on the exact runtime behavior of the program. This is the main difference from unit testing. The third requirement is that we want to type check existing Ruby programs without needing to add any annotations, as all Ruby programs typically lack type annotations.
00:16:16.050 However, how about relaxing these requirements? Instead of enforcing correctness, we could allow for some incorrect type checking. We might also consider performing type checking at runtime instead of statically. Alternatively, if inferring all types is infeasible, we could ask programmers to provide some annotations.
00:17:21.240 One example is TypeScript; while it is almost correct, it has some unsound typing rules, such as uncertainty regarding covariance and subtyping rules on function parameters. Its overall type theory indicates that TypeScript is incorrect, yet the typed JavaScript ecosystem has proven incredibly useful, enhancing productivity.
00:18:50.410 Another example is security analysis tools like Brakeman. They are often useful for identifying vulnerabilities in Ruby programs, even if they do not provide a strict correctness guarantee. So, when we stop enforcing strict correctness, what can we do to validate Ruby programs?
00:20:24.110 Next, let's consider how we could relax the static requirements. A paper discusses just-in-time static type checking for dynamic languages and was published last year. This algorithm type checks Ruby programs during execution, particularly at the beginning of each method.
00:21:35.330 If a Ruby program defines a method and this method calls a function that does not exist within its context, the dynamic type checker would be able to detect this mistake on the fly. This approach can find more issues than traditional dynamic type checking might catch. It also supports some meta programming, which can be significantly beneficial.
00:22:50.909 Additionally, I want to introduce my newest static type checker, known as Steep, available in my GitHub repository and installable via RubyGems. The key concepts of Steep include gradual typing: if you include type annotations in your program, each one will be checked, but if you choose not to include any, it won't enforce type checking.
00:23:21.149 Steep also performs local type inference, allowing programmers to apply type annotations only where necessary, thereby minimizing the annotation burden. Instead of requiring annotations for every variable, Steep imports types of local variables automatically. It also provides a language for defining types.
00:25:01.200 For example, here’s a Ruby program with Steep annotations. Annotations are provided as comments, specifying the types of local variables. The final type checks can help with figuring out the value of a constant, giving a picture of the state of the program.
00:27:00.990 In Steep, types are defined in another language rather than extracted from Ruby programs. For instance, an interface to define methods and types can appear very elaborate. This feature can facilitate type definitions while using Ruby. Therefore, type definitions contribute to clear method signatures and return types.
00:29:16.950 The separation of method definitions and type annotations allows for a clear representation within the programming structure. Consequently, this implementation holds substantial similarities to Objective-C, aiming to enhance development without the encumbering requirements of type definitions.
00:31:06.370 In conclusion, while type inference for Ruby is challenging, I hope Steep demonstrates significant growth in static type checking for Ruby applications.
00:31:17.170 Thank you! Do you have any questions?
00:31:59.750 One question I have is about type signatures. Is the type signature from Steep directly interpretable by Ruby, or is it implemented through a different mechanism? It seems that the algorithm could certainly have some relations to Ruby but would require additional implementation to function correctly.
00:34:13.920 Thank you for the question! Your concern is valid as the community works to generate precise type definitions for Ruby's typing system.
00:36:04.200 There are numerous existing libraries and types that could benefit from extensive type definition efforts. It is indeed crucial for the developer community to come together and aid in these developments.
00:37:17.070 In simple terms, a more collective effort to achieve similar outcomes as seen in TypeScript could vastly improve the stability and productivity of Ruby applications. Thank you!