LoneStarRuby Conf 2011
More DSL, Less Pain

by Evan Light

In the presentation titled "More DSL, Less Pain," Evan Light discusses the complexities of writing Domain-Specific Languages (DSLs) in Ruby and explores methods to simplify the process. The talk begins with an introduction to the various kinds of DSLs, particularly internal versus external types, and emphasizes the importance of contextual DSLs. Light recounts his experiences using the Cucumber tool for acceptance testing and introduces his own project, Coulda, an internal DSL for acceptance testing written in plain Ruby.

He highlights the challenges faced when creating DSLs and uses code samples to show the difference in clarity between verbose and concise examples. Light categorizes DSLs into two flavors: internal and external, specifying RSpec and state machines as examples of internal DSLs. The focus then shifts to contextual DSLs, where state is accumulated across objects, exemplified by Sinatra's mapping of HTTP requests to routes.

To illustrate building a contextual DSL, Light shares his approach of using a class for context while employing instance_eval to manage nesting within the DSL. He discusses the potential complications arising from this method and introduces the "Lispy" gem, which inspired him to think of DSLs as something more akin to a compiler. Light explains the components of a compiler, such as lexical analyzers and abstract syntax trees (ASTs), and suggests that DSL creation could benefit from these principles.

He outlines how the YourDSL gem he developed extends Lispy's capabilities, providing a structure to create and manipulate DSLs through ASTs. The benefits of using ASTs in DSL design include simplifying the interaction and allowing for flexible outputs. Light concludes by advocating for the use of compilers in internal DSLs to reduce complexity and enhance maintainability.

The key takeaways from Evan Light's presentation emphasize:
- The distinctions between internal and external DSLs.
- The benefits of contextual DSLs and state management.
- The importance of ASTs in simplifying DSL implementations and avoiding tight coupling.
- The value of using a compilation approach to enhance the clarity and usability of DSLs.

00:00:19.369 My name is Evan Light and I'm here to talk to you about DSLs and how to write them less painfully. Stephen already set me up just a little bit to talk about DSLs and, yes, we are going to discuss a bit about metaprogramming.
00:00:28.980 Here's the agenda: I'll give you a quick primer on the different kinds of DSLs you can write in Ruby. We'll specifically talk about a certain kind of DSL called contextual DSLs. This classification may have originated with Obie Fernandez. Then, we will discuss compilers, of all things, and finally, I will introduce a little gem called YourDSL.
00:00:46.980 Just pretend I'm as entertaining as James Gray. I'll make an effort, but I don't know if I'll succeed. First, let me share a little story. I spent a lot of time thinking about DSLs in Ruby through a tool called Cucumber. Can I see a show of hands for people who have used Cucumber? Okay, about 30% of the room.
00:00:59.820 Cucumber provides an external domain-specific language; it's a specification for a language that's implemented in another language. It can be implemented in various other languages and it's used for automated acceptance testing. While I appreciate some of the ideas behind it, I personally don't like using it, so I wrote something called Coulda. Coulda does something similar to what Cucumber does, except that it's built on Test::Unit. It's just Ruby code.
00:01:43.079 This means that it's an internal DSL: it's written in Ruby and runs in Ruby, so it uses nothing but Ruby. This is how I got much of my education in writing DSLs, and endured numerous challenges along the way. Coulda has been a pet project of mine for a couple of years.
00:02:05.520 I use it in the projects I can, but some of my customers prefer Cucumber. What can you do? Unfortunately, the implementation of Coulda has never quite met my standards because writing DSLs can be quite tricky. Let's talk about that now.
00:02:28.200 Which do you all prefer: A or B? Both code samples are attempting to accomplish basically the same task. On the left side, we've got WEBrick, where you're creating a new instance of a WEBrick server on port 4567. By the way, 'Say Mr.' Not sure if you all want to hear about that.
00:02:55.140 We are looking at an incoming HTTP request; if it's a GET request, the body responds with 'Hello, World!' If it's anything else, we send back a 404 response. We then attach the lambda to the WEBrick server as a proc handler servlet. After that, we listen for a couple of signals, and then we start the server. You can do all this in 20 lines of code, or you can do it in five. For those of you who said you prefer A, well, up yours!
00:03:59.450 If you don’t get that reference, just Google 'Porkchop Sandwiches' and watch the video. I'm not allowed to show it here because it’s not family-friendly. So, moving right along, what is a DSL? A domain-specific language is a fancy way of saying an API that tries to eschew ceremony and elevate the intent of what you're trying to say.
00:04:08.190 These DSLs should expose intent, as opposed to standard Ruby libraries, which can often get caught up in noise and may not read as clearly. Returning to WEBrick, this is not necessarily a DSL, but you could maybe consider it that, and I'm going to talk more about that. Now, let's discuss the flavors of DSLs, because for me, they come in two forms: internal and external.
00:04:57.920 Some examples of internal DSLs include RSpec, which we mentioned a moment ago, and state machine, which is common in Rails Active Record. Most of you probably know that one. External DSLs could include CoffeeScript, which is very popular these days, Gherkin, the text portion of Cucumber, and Haml, which is just another form of markup used in various programming languages. There's even Haml.js.
00:05:44.350 Now, we’re going to focus specifically on internal DSLs, particularly the contextual variety. Most of you are likely familiar with something like 'say' as a commendable class method DSL, where things are declared in a fairly straightforward manner. You could even argue that Net::HTTP itself is a kind of DSL, although it’s quite tailored to HTTP.
00:06:01.789 An example would be calling 'set_form_data' on an HTTP request and passing some parameters. On the other hand, Sinatra serves as an excellent example of a contextual DSL.
00:06:07.469 In a contextual DSL, we are accumulating state across several different objects, mapping HTTP verbs to routes, where these HTTP verb-route pairs actually map to code blocks. This approach obscures all the ceremony while exposing the intent. For instance, for a POST request to the root URL, we would perform an action.
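The verb-and-route mapping described above can be sketched in a few lines of plain Ruby. This is a toy illustration only, not Sinatra's actual implementation; the class name `TinyApp` and its `call` method are hypothetical names chosen for the example.

```ruby
# Toy route table: accumulates [verb, path] => block pairs.
# Not Sinatra's implementation; TinyApp and #call are made-up names.
class TinyApp
  attr_reader :routes

  def initialize(&block)
    @routes = {}
    # Run the DSL block with this object as self, so bare calls
    # to get/post inside the block land on this instance.
    instance_eval(&block) if block
  end

  # Each verb method records a handler block for a verb/path pair.
  %w[get post put delete].each do |verb|
    define_method(verb) do |path, &handler|
      @routes[[verb.upcase, path]] = handler
    end
  end

  # Look up the handler for a request; answer 404 when none matches.
  def call(verb, path)
    handler = @routes[[verb, path]]
    handler ? [200, handler.call] : [404, "Not Found"]
  end
end

app = TinyApp.new do
  get("/")  { "Hello, World!" }
  post("/") { "Created" }
end

p app.call("GET", "/")     # => [200, "Hello, World!"]
p app.call("DELETE", "/")  # => [404, "Not Found"]
```

The state (the route table) lives in one object, and the blocks only run when a matching request arrives, which is what keeps the ceremony out of the calling code.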
00:06:14.750 So, how do we go about building a contextual DSL? Previously, I found myself creating a class for context. So, as you nest, you end up creating a unique class for each block that you define, essentially for each level of nesting.
00:06:23.070 I'll provide an example in a minute. You will have an instance of that class for each block. Utilizing 'instance_eval' changes the context as you nest deeper into the DSL. For example, we have a wonderfully named method 'whatever'—please excuse the name, it came out of tests for a gem I will discuss shortly. This DSL is configured for a web server.
00:07:02.950 As such, you have a method 'set_up' that specifies some values such as workers and connections on the HTTP server. We're saying that for HTTP, we will turn off access logging. Then we nest again and specify that the server will listen on port 80. We nest once more to say that for the location of the root path, we will have a document root.
00:07:47.240 In creating these DSLs, we seek to accumulate all the state while also striving to write as little as possible to maintain clarity. To describe the context, we should have a class of type 'whatever'—but within the HTTP context, we’ll just specify HTTP methods.
00:08:04.830 When we call the server method, we instigate the creation of a server object, nesting further down in this kind of Russian doll construction. Right now, I am not doing anything productive other than collecting information and creating objects, not really retaining them. This serves to illustrate how one might approach building these nested structures.
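The class-per-context approach with 'instance_eval' can be sketched as follows. The slides are not part of this transcript, so every name here (`HttpContext`, `ServerContext`, `LocationContext`, `access_log`, `doc_root`, and the example path) is an assumption made for illustration. Unlike the version described in the talk, this sketch keeps the objects it creates so the result can be inspected.

```ruby
# One class per nesting level; each level instance_evals its block.
# All class and method names are hypothetical.
class LocationContext
  attr_reader :path, :settings

  def initialize(path, &block)
    @path = path
    @settings = {}
    instance_eval(&block)
  end

  def doc_root(dir)
    @settings[:doc_root] = dir
  end
end

class ServerContext
  attr_reader :port, :locations

  def initialize(&block)
    @locations = []
    instance_eval(&block)
  end

  def listen(port)
    @port = port
  end

  # Nesting one level deeper means building the next context class.
  def location(path, &block)
    @locations << LocationContext.new(path, &block)
  end
end

class HttpContext
  attr_reader :settings, :servers

  def initialize(&block)
    @settings = {}
    @servers = []
    instance_eval(&block)
  end

  def access_log(value)
    @settings[:access_log] = value
  end

  def server(&block)
    @servers << ServerContext.new(&block)
  end
end

config = HttpContext.new do
  access_log :off
  server do
    listen 80
    location "/" do
      doc_root "/var/www/website"
    end
  end
end

p config.servers.first.port  # => 80
```

Every new keyword or nesting level needs a new method or a new class, which is where the coupling between the language and its implementation starts to hurt.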
00:08:38.520 Yet, using multiple instance evaluations for just one task can become quite confusing. During my pursuit to enhance the code in Coulda, I stumbled upon Lispy, courtesy of Hacker News. Has anyone here not heard of Hacker News?
00:08:59.920 Lispy captures DSLs in a homoiconic manner; it essentially means 'code as data.' Upon reviewing the Git repository for Lispy, I found that not much has changed since its release. I made a few changes to it, but the original author did not want to adopt the route I had taken with it and eventually drifted away from further contact.
00:09:31.640 Therefore, I forked it. I realized that Lispy functions similarly to a compiler. This realization sparked my interest as I lacked compiler classes during my undergrad studies.
00:09:41.830 So we’re going to talk about compilers now. How many of you have studied compilers? Quite a few, cool! For those who haven't, a compiler consists of three parts: a lexical analyzer, a parser, and a code generator.
00:10:01.210 The lexical analyzer breaks down the code into individual tokens based on specific rules. When the parser takes this set of tokens, it produces what's known as an abstract syntax tree, or AST. The AST is a representation of the code that's stripped of all syntactical noise.
00:10:30.520 Let’s discuss the concept of the abstract syntax tree further. An example we have here could be the code used for a to-do list. While it doesn’t strictly have to be Ruby, this example visually represents what an AST might look like, and is somewhat similar to a Lisp-like representation.
00:11:07.660 This structure, which was initially produced by the Lispy gem, outputs nested arrays that can easily become challenging to follow. In any case, this is a noteworthy example of an AST.
00:11:40.140 Regarding code generation, given an AST, our task is to emit either machine code or some form of source code, such as CoffeeScript. Now, consider this question: What if we created a compiler for internal DSLs?
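To make the three stages concrete, here is a minimal sketch of a lexical analyzer, a parser, and a code generator for a tiny Lisp-like input. The input syntax and the function names `tokenize`, `parse`, and `generate` are assumptions made for this example; they are not taken from the talk's slides or from any gem.

```ruby
# Stage 1: lexical analysis. Split source text into tokens.
def tokenize(source)
  source.scan(/[()]|[^\s()]+/)
end

# Stage 2: parsing. Turn the flat token list into a nested-array AST.
# Consumes tokens from the front of the array as it goes.
def parse(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of input" if token.nil?

  case token
  when "("
    list = []
    list << parse(tokens) until tokens.first == ")"
    tokens.shift # discard the closing ")"
    list
  when ")"
    raise ArgumentError, "unexpected )"
  else
    # Atoms: integers become Integer, everything else a Symbol.
    token.match?(/\A-?\d+\z/) ? token.to_i : token.to_sym
  end
end

# Stage 3: code generation. Emit call-style source text from the AST.
def generate(node)
  return node.to_s unless node.is_a?(Array)

  name, *args = node
  "#{name}(#{args.map { |arg| generate(arg) }.join(', ')})"
end

tokens = tokenize("(add 1 (mul 2 3))")
ast = parse(tokens)

p ast            # => [:add, 1, [:mul, 2, 3]]
puts generate(ast) # prints add(1, mul(2, 3))
```

The AST in the middle carries no parentheses or whitespace, only structure, so the generator could just as easily emit a different target syntax.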
00:12:01.640 Ruby, in itself, acts as a lexical analyzer by interpreting its own code. In terms of parsing, I used 'method_missing' to gather data for constructing the AST I discussed earlier. The AST is typically stored in what’s called symbolic expressions.
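A minimal sketch of that idea: let Ruby parse the block, and use 'method_missing' to record every call as a nested array. The class name `ArrayRecorder` and the node layout `[name, args, children]` are assumptions for this example and are not Lispy's actual output format.

```ruby
# Records every unknown method call as [name, args] or
# [name, args, children]. Hypothetical sketch, not Lispy's format.
class ArrayRecorder
  def self.record(&block)
    recorder = new
    recorder.instance_eval(&block)
    recorder.nodes
  end

  attr_reader :nodes

  def initialize
    @nodes = []
  end

  private

  # Any DSL word that is not a real method ends up here.
  def method_missing(name, *args, &block)
    node = [name, args]
    # A block means nesting: record its calls as child nodes.
    node << self.class.record(&block) if block
    @nodes << node
    node
  end
end

ast = ArrayRecorder.record do
  todo_list "chores" do
    item "laundry"
    item "dishes", :urgent
  end
end

p ast
# => [[:todo_list, ["chores"], [[:item, ["laundry"]], [:item, ["dishes", :urgent]]]]]

# Reading a value back out means counting positions by hand:
p ast[0][2][1][1][1] # => :urgent
```

No DSL keyword had to be declared up front, but the last line shows the cost of plain arrays: consumers have to know which index holds what.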
00:12:38.650 A symbolic expression is simply a listed layout of actions or methods you are trying to perform. My approach, while modifying Lispy, has leaned towards a more Object-Oriented style.
00:13:04.410 This means I might have atypical structures. So in practice, these objects aren't really symbolic expressions; they're more akin to 'sexps.' Yet, the main advantage of my approach is avoiding iterating through arrays or having to use indices to access values in your AST.
00:13:25.020 You can end up with a much clearer and more semantic structure, much like the DSL sample I referenced earlier. Instead of extracting values using indices, you simply request arguments or scope.
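The difference can be sketched by recording calls into small objects instead of arrays. The names `Expression`, `Recorder`, `args`, and `scope` below are assumptions based on the wording in the talk; this is not the gem's actual API.

```ruby
# Each recorded call becomes an object with named accessors.
# Hypothetical sketch; not the actual gem API.
Expression = Struct.new(:name, :args, :scope)

class Recorder
  def self.record(&block)
    recorder = new
    recorder.instance_eval(&block)
    recorder.expressions
  end

  attr_reader :expressions

  def initialize
    @expressions = []
  end

  private

  def method_missing(name, *args, &block)
    # Nested blocks are recorded into the expression's scope.
    scope = block ? self.class.record(&block) : []
    expression = Expression.new(name, args, scope)
    @expressions << expression
    expression
  end
end

ast = Recorder.record do
  http do
    access_log :off
    server do
      listen 80
    end
  end
end

# Ask for what you want by name rather than by position.
http   = ast.first
server = http.scope.find { |e| e.name == :server }

p http.scope.first.args   # => [:off]
p server.scope.first.args # => [80]
```

A generator written against `name`, `args`, and `scope` stays readable even when the DSL grows deeper.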
00:13:50.690 This simplifies the interaction significantly, so when creating a generator for your internal DSL based on an AST, you get to decide what to emit, and you essentially craft it as per your needs.
00:14:12.320 I took Lispy and transformed it into a gem called YourDSL, adding some extra features along the way. YourDSL serves as a contextual DSL,
00:14:41.460 and it provides you with an AST that you can iterate through and manipulate however you want. In my case, I utilized it to execute an interpreter. When given some DSL input, you receive an AST as output.
00:15:04.130 For instance, let’s consider the HTTP DSL I mentioned earlier. With just a couple of extra lines of code to extend YourDSL, you can run it without generating any errors. Even without the implementation, the AST will remain available, waiting for you to work with it.
00:15:31.950 This structure allows for easy generation of ASTs that consist of expression objects containing nested expressions. There’s a code sample focusing on a simpler acceptance test DSL that we’ll look at.
00:15:54.610 This DSL has scopes and it traverses various scenarios to generate test steps, which execute within a test framework. Essentially, I’m using Test::Unit. So, that’s the gist. I leverage my DSL along with a compiler.
00:16:19.190 The cool aspect is that you can use your generator to create different outputs. You can take the AST and interpret it in plain English. This, too, is straightforward. So why should we utilize a compiler for internal DSLs? The most essential reason resides in separating your DSL language from its implementation.
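Swapping generators over a single AST can be sketched like this. The `Expression` struct, the hand-built acceptance-test AST, and the two generator functions `to_english` and `to_outline` are all hypothetical names and data made up for the example.

```ruby
# A hand-built AST for a small acceptance-test scenario.
# Expression and both generators are hypothetical names.
Expression = Struct.new(:name, :args, :scope)

ast = [
  Expression.new(:scenario, ["Logging in"], [
    Expression.new(:given, ["a registered user"], []),
    Expression.new(:when, ["I submit valid credentials"], []),
    Expression.new(:then, ["I see my dashboard"], [])
  ])
]

# Generator 1: emit indented plain-English lines.
def to_english(expressions, depth = 0)
  expressions.flat_map do |e|
    line = "#{'  ' * depth}#{e.name.to_s.capitalize} #{e.args.join(', ')}"
    [line, *to_english(e.scope, depth + 1)]
  end
end

# Generator 2: emit plain nested arrays from the same tree.
def to_outline(expressions)
  expressions.map { |e| [e.name, e.args, to_outline(e.scope)] }
end

puts to_english(ast)
# Scenario Logging in
#   Given a registered user
#   When I submit valid credentials
#   Then I see my dashboard

p to_outline(ast).first.first # => :scenario
```

Neither generator knows how the AST was produced, so the DSL's surface and its outputs can change independently.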
00:16:44.520 When the two are tightly coupled, code complexity increases tremendously. It's common knowledge that tight coupling is typically viewed unfavorably. Separating them also makes prototyping simpler. As you've noticed, you can implement your DSL bit by bit. It still functions without impacting the rest of the code negatively.
00:17:12.180 The AST simplifies the process; you can iterate through it and manipulate as necessary. Instead of employing multiple instance evaluations, you can condense it down to one. The flexibility of switching generators offers additional advantages.
00:17:31.870 To summarize, an AST can be a tremendous asset for your internal DSLs. An internal DSL compiler may offer significant value to others, and I believe it could help take messy code and rewrite it swiftly.
00:17:57.270 I don't attribute this success solely to myself; it's more a testament to the powerful concept behind it. ASTs serve as a proven framework for DSLs, and the YourDSL gem simplifies implementation if you'd like to fork it and experiment further.
00:18:19.440 Also, I'm a skeptic at heart, quite like Jim. I've formed a group near my home called 'Free Ocean City Thinkers,' emphasizing the notion of questioning ideas. So, don't hesitate to challenge my thoughts if you think they fall short or can be improved.
00:18:42.520 I also run an event in Northern Virginia called Ruby DCamp, which is free and has only a couple of remaining spots. It's scheduled for later in September, and it’s a great opportunity to learn and meet others.
00:19:03.390 While spots fill up quickly, cancellations often occur at the last minute, so feel free to reach out if interested. Mind you, attendance requires an invitation code to access.
00:19:27.160 Lastly, I host online office hours where I make myself available to hack together or discuss ideas. As I mentioned earlier, I reside in a remote part of Ocean City, Maryland, where there aren't many like-minded Ruby enthusiasts, and I genuinely enjoy connecting with fellow programmers.
00:19:51.700 If you feel inclined to collaborate or brainstorm technical concepts, feel free to reach out. I maintain a calendar for signing up to chat.
00:20:06.000 So, thank you! That's all I have for now. Are there any questions?
00:20:20.330 Steve mentioned that one of the things about the ASTs that he recently considered was the fact that Aaron Patterson rewrote ActiveRecord based on ASTs.
00:20:45.320 This improved performance notably, especially regarding the string concatenation being utilized previously, and he pondered whether contextual DSLs could serve a similar beneficial function given Ruby's inherent issues.
00:21:01.990 Let me reiterate that Stephen observed ActiveRecord's significant performance increase after utilizing ASTs, mentioning difficulties with Ruby’s performance related to nested scopes.
00:21:23.090 Certainly, let me emphasize the question/comment Steve posed concerning whether I had conducted performance comparisons between Coulda and the current implementations.
00:21:51.680 As far as I can tell, the performance could potentially be slightly slower at the moment. The disparity may stem from how YourDSL operates, maintaining a similar structure but factored a bit differently.
00:22:15.099 The DSL generates the AST, which then is walked through for execution, potentially doubling the traversal time compared to a visitor pattern. However, I have yet to engage significantly in performance comparisons.
00:22:43.860 Do we have any follow-up questions? David asks whether I have recommendations for gaining an education in recursive descent parsing.
00:23:01.780 I didn't personally have to delve into that detail with Lispy, as the structure was pre-existing; I just needed to tweak it, though I did take part in courses on finite automata.
00:23:29.520 For educational resources, I gathered knowledge mostly via extensive Google searches, and Wikipedia articles proved useful material. So, I would recommend starting there.
00:23:57.090 Concerning ARchitects, it’s possible to define rules using JSON, which provides ample resources. There are many tools in the ecosystem, including Racc and others, although those primarily point towards external DSLs.
00:24:29.640 This process may feel like cheating because it simply uses 'method_missing' to work with data. In this case, the AST accumulates data as interaction proceeds.
00:25:01.320 If there’s further discussion on comparing YourDSL to some other parsing methodology, I’d love to hear it. Any other questions before we wrap things up?
00:25:25.340 Alright, thank you for your time and for the great questions!