00:00:14.530
Okay, guess what? We're going to get started now. Welcome everyone to my talk, 'Stately State Machines with Ragel.' Thanks for coming.
00:00:22.039
I know there are a couple of talks to choose from, so I appreciate you all being here. I'm a little jet-lagged, so if I seem like I don't know what I'm talking about, that's probably why.
00:00:27.800
Just to gauge the audience, how many have heard of Ragel before? Okay, not many. That's fine. All right, let's get started.
00:00:34.610
The goals of this talk are to convince you that Ragel is worth trying if you haven't used it before and to give you some intuition about how it works. Additionally, I will show you how to set it up so that you can do some basic parsing with it.
00:00:40.930
A little bit about me: My name is Ian Duggan, and I'm into lots of different things. I play hockey several times a week.
00:00:47.000
This is Stormy, the pig. I come from North Carolina, where pigs, me, and hockey coexist. I play guitar, banjo, mandolin, and ukulele. I also have a fiddle that's gathering dust. Additionally, I fly; I have a Cessna 120 that I fly out of Hayward, California. It has a whopping 85 horsepower, but it's fun to putt around.
00:01:06.500
I also love my cats. You know you have to have cat pictures in a presentation, so here they are. This is Parrington and Poem Goofus. This is Minnie; she's a goofball who likes to wear outfits. She thinks they're fun.
00:01:12.500
My cats like being cozy. These ones here are messy; Minnie likes to sit in strange places. I'm a software guy. I code for the internet and the Googles.
00:01:25.100
I'm a recovering technology entrepreneur. I've been in and out of startup institutions for many years now. Currently, I work for a company that does online streaming of social video games. We're obviously hiring, and we have lots of Ruby. The whole front end is built on Ruby on Rails, while the back end is under redevelopment with Go to improve performance.
00:01:51.859
I've been using Ruby for quite some time. I started on version 1.6; a friend of mine gave me a book, and I thought it was a pretty cool language. I don't know when the transition happened, but I have just kind of stuck with it since.
00:02:22.100
Now, enough about me. This talk is about Ragel. Ragel is a really cool tool used in many high-performance text parsing applications.
00:02:28.690
Some people have discovered it and are using it for such purposes, while many others struggle with regular expressions. I believe Ragel can be extremely helpful.
00:02:39.550
It has many features that allow you to do some really cool tricks, and if you don't have it in your toolbox, I'm hoping that you'll add it.
00:02:44.770
As I mentioned, many projects use Ragel. Mongrel, Unicorn, and Okuma are among the most well-known ones. I think Zed Shaw popularized Ragel about ten years ago when he rewrote the Mongrel HTTP protocol parser with it.
00:03:00.730
There's a cool gem called Racc that actually uses Ragel to parse Ruby, along with others like RedCloth, Mail, and Hpricot which all make use of it.
00:03:15.250
So just quickly, what does Ragel look like? Before we jump in, Ragel uses a Domain-Specific Language (DSL) that allows you to define state machines along with some actions. You can convert different terms according to your specifications.
00:03:27.250
Most people are familiar with regular expressions. Regular expressions are generally easy; most folks understand them and know basic concepts like matching, grouping, and alternation.
00:03:40.420
Ruby has excellent tools for regular expressions because it has a heritage from Perl which relied heavily on them. You can do advanced things like matching and replacing text—regular expressions work great until they don't.
00:04:04.960
If you've ever opened a file and stared at a complex regex expression, then Ragel might be something worth considering. Regular expressions have their place, but sometimes you want more control.
00:04:21.190
Just a little about state machines: finite automata have states and transitions. You might have seen these depicted as circles and arrows, with input causing transitions from one state to another.
00:04:34.400
There are two basic types: deterministic finite automata and non-deterministic finite automata. The key difference is that deterministic means only one state can be active at a time, while non-deterministic allows for being in multiple states simultaneously.
00:04:43.539
In academia, there is a way to demonstrate the equivalence of regular expressions to state machines. All the complex things you do with regular expressions can be done given the right tools.
00:05:07.520
State machines are an important tool in computer programming. Ragel is a wonderful tool for creating them, and they are everywhere. Think of traffic light signals; they operate on state transitions: green, yellow, red, and back to green infinitely.
00:05:24.689
Your CPU operates under a model of state machines, where every piece of input can cause transitions from one state to another. Every electronic device, from vending machines to barcode scanners, operates under similar principles, making finite state machines incredibly useful.
00:05:57.890
They are great for many reasons: they are simple to understand, and with the right approach can produce code that is faster, easier to maintain, and more correct. State machines are very specific about what inputs are accepted and which are not. It's harder to get something to parse incorrectly.
00:06:22.780
If you're still not convinced, we should spend some time exploring what they are. First, let’s look at some vocabulary: state machines have a start state, often denoted by S0, and an accept state or final state, which signifies that the machine has accepted the input string.
00:06:45.490
Transitions between states are usually represented with arrows in diagrams, and there is a special type called an epsilon transition, which allows you to move between states without consuming any input. This can be useful in concatenating two state machines.
00:07:09.630
From what we’ve discussed, we can build some simple state machines. For example, a regex can be viewed as a state machine, where the asterisk (*) operator matches any number of input symbols. We can create multiple machines to exhibit similar behaviors.
00:07:39.750
So what is Ragel, exactly? Ragel is a finite state machine compiler with output support for multiple languages, including C++, C#, Objective-C, D, Java, and Ruby. This makes it extremely useful for developers working across different projects.
00:08:01.840
It also generates various ways of working with its states and is particularly useful for building lexical analyzers and protocol definitions. Ragel stands out because you can attach arbitrary code to different states and transitions.
00:08:30.000
This capability allows you to match part of a string, jump to a specific place within it, modify it, and then come back. Unlike regular expressions, Ragel allows for much more control and flexibility.
00:08:58.540
It's worth noting how to pronounce 'Ragel.' Adrian Thurston, who created it, clarified in an email that it's pronounced 'ra-gel,' based on his nickname, Adrian, and a reference to 'regular languages.'
00:09:12.990
At a basic level, Ragel is a DSL for creating these state machines, especially useful for parsing protocols and data formats. It offers declarative text parsing, which is a powerful and efficient approach.
00:09:41.540
The basic structure of a Ragel file is written in a host language like Ruby, with calls to Ragel pieces interspersed throughout. For instance, if you want to define a state machine for matching 'fubar,' you can name your machines and refer to them in different parts of your file.
00:10:05.780
You define a state machine with simple instantiation via ':='. Just like any programming environment, Ragel allows you to include other files, and you can refer to the manual for more detailed semantics.
00:10:30.440
There are different types of literals in Ragel, including straight literals and regular expression literals. Furthermore, numbers may be denoted in decimal or hexadecimal forms.
00:10:53.480
You can also include regular expressions within Ragel, which is interesting since regular expressions themselves compile down into state machines.
00:11:14.500
For example, you can match a range of characters from 'a' to 'z', and Ragel allows you to refer to named sections inside your expressions. There are also built-in character classes you can leverage for common matches.
00:11:47.610
At this point, we have the fundamental building blocks for more complex machines. These primitives allow us to create numerous and complex simulations, essentially building with mechanical advantages.
00:12:12.420
As I was learning about Ragel, I discovered that the best way to understand it is through experimentation. Ragel provides functionality for combining and manipulating state machines in ways that allow for composition of states.
00:12:56.620
You can manipulate state machines by performing operations like union, intersection, difference, and concatenation. Exploring these operations leads to discovering how state machines interact and how you can build more dynamic systems.
00:13:21.990
For example, performing a union of two strings would yield a state machine that traverses both input streams simultaneously, while intersection would yield a match existing in both state machines.
00:13:50.530
The difference operation allows you to match any string in the first machine that does not contain any pattern from the second state machine. These building blocks can be effectively leveraged in practical applications.
00:14:13.950
Ragel also allows for repeated patterns, which can simplify complex matching requirements. Optimization happens as Ragel compiles, reducing redundancy within the state machine.
00:14:42.350
Additionally, Ragel permits attaching actions to states and transitions, expanding the power of state machines. You can define when you enter, exit, or cross state transitions to perform specific actions, giving immense flexibility within your programs.
00:15:20.510
There are various transition types, and you choose which to apply based on the context of what you're doing. Non-determinism comes into play when multiple state machines might overlap in input, which can cause ambiguities and requires careful handling.
00:15:50.650
Occasionally, you may encounter situations where multiple matches could occur, requiring clarity on priority among transitions. Ragel provides tools to manage this by allowing transitions to take priority based on your specifications.
00:16:20.900
Ragel allows you to define structures within your patterns using special notation for guarded and concatenated operations. These operations offer precision when working with basic match tasks or building more complex parsers.
00:16:52.230
Ragel excels at constructing scanners, which recognize tokens iteratively as they come into the system while consuming input. This parser can identify a chain of tokens without backtracking or wasting processing cycles.
00:17:24.760
For instance, in parsing HTTP headers, Ragel can streamline recognizing separate headers in a sequence and provide efficient data handling without contention between states.
00:17:52.010
At this point, we've been examining things from the perspective of regular expressions, but the level of abstraction with Ragel dives deeper into state machine terminology, with states and transitions forming the core of its functionality.
00:18:21.200
In practice, you define states and transitions, directing how data flows through the state machine. You can also set function calls within Ragel, enhancing modularity within your parsing tasks.
00:19:05.550
When parsing something recursive, regular expressions struggle due to their inherent limitations, whereas Ragel can overcome these through cleverly embedding actions and maintaining a structured environment for recursive data.
00:19:36.490
The encapsulating stack becomes crucial when managing nested structures. Using this approach allows you to push and pop context during parsing to maintain clarity of structure.
00:20:12.330
Internally, Ragel operates with data buffers, setting pointers to describe which sections to process. Interaction with these variables happens as the state machine processes input, leading to efficient data consumption.
00:20:42.970
Each transition updates buffers ensuring that when you hit a final state, you can extract processed information seamlessly.
00:21:07.620
The overall architecture allows Ragel to precisely control state transitions, culminating in a highly optimized parsing performance. All these optimizations foster an environment of quick parsing and efficient execution.
00:21:34.390
When dealing with complex inputs, Ragel empowers developers to articulate parsing logic that can handle advanced scenarios. The ability to redefine intersections and unions gives flexibility back to developers.
00:22:21.020
As you start constructing detailed parsers, Ragel enables you to clarify state logic through organized function calls. This structure reduces confusion and enhances comprehension of what is happening in the parsing process.
00:23:06.030
While constructing state machines might seem daunting, Ragel simplifies the process through its declarative syntax. You encapsulate your parsing logic within manageable chunks, promoting effective code reuse.
00:23:44.230
Ragel works efficiently even in competitive environments where performance matters. By leveraging the flexibility and expressive syntax of Ragel, you can build systems optimized for speed and clarity.
00:24:21.240
Moving from theory to practice, experiencing the straightforward installation process opens pathways to using Ragel in your applications. Installation on macOS can be as simple as using a package manager.
00:25:07.470
After installation, compiling Ragel files into usable formats for your applications is an incredibly straightforward process. Debugging also becomes manageable. You can output graphs in .dot format for visual representation of state flows.
00:25:42.610
As you interact with your Ragel-driven application, observe the structures formulated dynamically; it reveals the inner workings of your parsing states.
00:26:12.480
The new machine example I've created serves to demonstrate the real-time functionality of Ragel and its ability to build highly efficient state systems.
00:26:43.890
As we step through a demonstration, pay attention to how inputs affect the active state and how Ragel accurately identifies and processes data streams.
00:27:15.690
This visibility allows both developers and users to gain an insight into the performance impact and traversal of data through the state machine.
00:27:35.480
As you connect these concepts back to practical outcomes, Ragel's architecture lays the foundation for advanced operations and performance optimizations in state machine designs.
00:28:10.150
I've shown multiple examples and states while discussing how you can organize your parsing using Ragel. The simplicity of its syntax allows for rapid exploration of ideas rather than getting bogged down in complexities.
00:28:50.200
When faced with challenges related to parsing or state management, remember the principles discussed today and use Ragel to streamline your approach.
00:29:31.600
In conclusion, Ragel proves itself to be a highly versatile tool for building state machines that are both effective and efficient. I encourage you to explore its capabilities further and find new ways to improve your development.
00:29:52.330
Thank you for your time! I'm happy to take any questions regarding Ragel and its applications.