00:00:12.160
Hello, Mountains! This has really been an excellent conference so far. We have had some amazing speakers.
00:00:17.279
Some very enlightening topics have been presented. I love how MountainWest always seems to cover a wide range of subjects.
00:00:22.400
We get to explore systems programming, plant programming, and even people's pet projects. This afternoon, we're going to hear about some exciting topics related to different kinds of systems.
00:00:29.199
It's truly awesome being here. I just want to start off by expressing my gratitude for the conference.
00:00:35.440
Today, I'm going to talk about parsing expressions, specifically using them in the Ruby programming language. Can I quickly see how many people are familiar with the concept of parsing expressions?
00:00:41.600
How many of you have heard about or used libraries like Treetop? Okay, excellent! We are speaking to a pretty good audience here.
00:00:58.399
I just want to talk a little bit about myself. I've been using Ruby for about three years professionally. I work for Paf in San Francisco, where we are a small startup working on an iPhone app and social network. I handle all of our website and APIs.
00:01:12.400
So I understand something about the Ruby community, and I also know that many of you have a dirty little secret lurking in the bowels of your apps.
00:01:18.320
How many of you have one of these lurking deep in your applications? Can anyone guess what this expression does? It can be quite confusing.
00:01:35.440
I once worked on an app that had a regex similar to this. It started with 'http' followed by an optional 's' or 'x'. Apparently, this regex supports FTP as well.
00:01:59.600
When I saw this, I was unsure whether the person who created it knew what they were doing. As we developed our app, I hit a bug regarding URL validation, which mostly resulted in a 500 error when users attempted to save accounts with Blogspot addresses.
00:02:19.760
Any guesses on what the performance of this regex was against such URLs? Twelve seconds? Ninety-seven seconds? Yeah, it was terrible.
00:02:39.840
Our Unicorn workers were being slaughtered: once a worker had spent a minute running this expression, the master process would kill it for exceeding its timeout.
00:02:54.560
The regex beast was hiding deep in my app with its maddening complexity, leading to inefficiencies and confusion.
00:03:05.040
Be cautious if you’re using a regex like this. The common tendency is to copy and paste a URL or email validation regex from the internet. You may think you’re done, without realizing the pitfalls you have just inherited.
00:03:22.000
Let's step back and discuss the problem we're trying to solve with regular expressions. The core of the problem is the diverse text we manipulate in coding.
00:03:36.560
Text comes from various sources, like user input in forms or API responses in formats like XML or JSON. For standardized formats like XML and JSON, we have good parsers, but for everything else, we often resort to regex.
00:03:54.639
Regrettably, this has made regex a common hammer for text parsing, leading us to apply it even when a different tool would be more appropriate.
00:04:07.120
Regular expressions were not designed for all tasks, and we need to consider what alternatives exist. Although regex is useful for certain types of text, other scenarios may require a more efficient solution.
00:04:30.960
For example, many standards are complex and require structured parsing rather than one-off regex solutions. We want tools that let us define a structured grammar that knows how to parse according to the specification.
00:04:57.760
When building our parsers, we want speed, simplicity, modifiability, and flexibility. If the parser’s complexity makes it difficult to maintain, then in practical terms, it becomes useless.
00:05:20.800
Most of us use regex because we are more accustomed to its mechanics than to structured parsing. However, we benefit greatly from parsing systems that are flexible, readable, and maintainable.
00:05:35.440
Parsing expressions, more formally parsing expression grammars (PEGs), were first described by Bryan Ford at MIT in 2004, offering a declarative alternative to regex.
00:05:56.800
They provide a recursive mechanism that allows for much more complex parsing in comparison to traditional regex.
00:06:18.640
Parsing expressions are easier to read and maintain than regex. This matters because a regex becomes harder to understand the larger it grows.
00:06:40.240
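Additionally, parsing expressions eliminate ambiguity: choices are ordered, so the first alternative that matches at a given position wins and the others are never reconsidered. A regex engine, by contrast, may backtrack through many alternatives, which complicates the parsing process.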
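To make the point about ambiguity concrete, here is a minimal plain-Ruby sketch of ordered choice. It is not Citrus's API; the helper names `ordered_choice`, `peg_match?`, and `regex_match?` are hypothetical and exist only for illustration. It compares a PEG-style rule `("a" / "ab") "c"` with a similar-looking regex.

```ruby
require "strscan"

# Hypothetical helper: try each literal in order and commit to the first
# one that matches at the current position (PEG-style ordered choice).
def ordered_choice(scanner, alternatives)
  alternatives.each do |literal|
    return literal if scanner.scan(Regexp.new(Regexp.escape(literal)))
  end
  nil
end

# PEG-style rule: ("a" / "ab") "c", which must consume the whole input.
def peg_match?(input)
  scanner = StringScanner.new(input)
  return false unless ordered_choice(scanner, ["a", "ab"])
  return false unless scanner.scan(/c/)
  scanner.eos?
end

# Similar-looking regex. The engine is free to go back and try "ab"
# after "a" followed by "c" fails.
def regex_match?(input)
  /\A(?:a|ab)c\z/.match?(input)
end
```

With these definitions, `peg_match?("abc")` is false because the choice commits to `"a"` and never retries `"ab"`, while `regex_match?("abc")` is true because the regex engine backtracks into the second alternative. Each input therefore has at most one way to parse under the PEG-style rule.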
00:06:57.100
Experiments suggest that parsing expressions can outperform regex for specific tasks, because they are built to handle structured data.
00:07:17.400
I was very interested in exploring parsing expressions, so I initially tried the Treetop library. However, I found it to be unmaintained and inefficient for my purposes.
00:07:40.800
This led me to create Citrus, a library designed for efficient parsing expressions in Ruby. You can install it using 'gem install citrus'.
00:08:00.000
Citrus is designed to be user-friendly and allows you to define grammars using familiar syntax that looks like Ruby.
00:08:33.280
Rule names let you define relationships easily: one rule can reference other rules within your grammar.
00:09:01.440
I prefer the intuitive layout of Citrus and its support for various expressions, enabling complex tasks without cumbersome regex.
00:09:28.480
Let’s quickly review how Citrus works syntactically. You can represent strings, define exact matches, or even use regex as a fallback when necessary.
00:09:52.640
Citrus also lets you write character classes and specify repetition with clear syntax. Repetition defaults to zero or more matches, and you can adjust the bounds to fit your use case.
00:10:15.440
You can also implement logical ordering in your parsing expressions, ensuring one match follows another in a clear sequence.
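The building blocks described so far (exact strings, character classes, repetition, and sequences) can be sketched as small combinators in plain Ruby. This is an illustration of the ideas only, not Citrus's syntax or API; `str`, `char_class`, `repeat`, `seq`, `DIGITS`, and `ADDITION` are hypothetical names. Each parser is a lambda that takes the input and a position and returns the new position, or nil on failure.

```ruby
# Match an exact string at the current position.
def str(literal)
  ->(input, pos) { input[pos, literal.length] == literal ? pos + literal.length : nil }
end

# Match one character from a character class, given as a Regexp such as /[0-9]/.
def char_class(pattern)
  ->(input, pos) { pos < input.length && input[pos].match?(pattern) ? pos + 1 : nil }
end

# Match `parser` as many times as possible; min defaults to 0 ("zero or more").
def repeat(parser, min: 0)
  lambda do |input, pos|
    count = 0
    while (next_pos = parser.call(input, pos))
      break if next_pos == pos # guard against looping forever on empty matches
      pos = next_pos
      count += 1
    end
    count >= min ? pos : nil
  end
end

# Match each parser in order, one directly after another.
def seq(*parsers)
  lambda do |input, pos|
    parsers.reduce(pos) { |current, parser| current && parser.call(input, current) }
  end
end

# Example: one or more digits, a plus sign, then one or more digits.
DIGITS = repeat(char_class(/[0-9]/), min: 1)
ADDITION = seq(DIGITS, str("+"), DIGITS)

ADDITION.call("12+34", 0) # → 5 (the position just past the match)
ADDITION.call("12-34", 0) # → nil
```

Returning a position rather than a boolean keeps the combinators composable: a sequence simply threads the position from one parser into the next and stops at the first nil.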
00:10:46.000
The match tree created during parsing lets you efficiently track which matches were made along each path.
00:11:05.760
You can assign semantic values through the block mechanism: for example, a block can convert a matched string of digits into an integer.
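The idea of attaching a block to a rule can be sketched in plain Ruby as follows. This is an assumption-laden illustration rather than Citrus's actual mechanism; the `Rule` class, its `value_of` method, and `INTEGER_RULE` are hypothetical names.

```ruby
# Hypothetical Rule class: pairs a pattern with an optional block that
# turns the matched text into a semantic value.
class Rule
  def initialize(pattern, &action)
    @pattern = pattern
    @action = action
  end

  # Returns the semantic value if the whole input matches, otherwise nil.
  def value_of(input)
    match = /\A#{@pattern}\z/.match(input)
    return nil unless match

    # Without a block, the semantic value is just the matched text.
    @action ? @action.call(match[0]) : match[0]
  end
end

# A rule for integers whose block converts the matched digits to an Integer.
INTEGER_RULE = Rule.new(/[0-9]+/) { |text| text.to_i }

INTEGER_RULE.value_of("42") # → 42
INTEGER_RULE.value_of("4x") # → nil
```

Keeping the conversion next to the pattern means callers receive a usable Ruby value and never need to know how the text was matched.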
00:11:22.960
In this way, your parsing becomes both straightforward and extensible, allowing for deeper inspection into your match trees.
00:11:47.840
I’d like to demonstrate an example that is impossible with a classical regular expression: parsing nested parentheses.
00:12:20.400
I can define a simple grammar to match open parentheses followed by characters and closed parentheses while allowing for recursion in the parsing process.
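A recursive rule of this kind can be sketched in plain Ruby with a small recursive-descent matcher. This is not the Citrus grammar from the talk; the grammar in the comment and the names `parse_group` and `balanced?` are assumptions made for illustration.

```ruby
require "strscan"

# Hypothetical grammar, in PEG notation:
#   group <- "(" (group / [^()])* ")"
def parse_group(scanner)
  start = scanner.pos
  return false unless scanner.scan(/\(/)

  # Zero or more of: a nested group, or a single non-parenthesis character.
  loop do
    break unless parse_group(scanner) || scanner.scan(/[^()]/)
  end

  return true if scanner.scan(/\)/)

  scanner.pos = start # restore the position when the rule fails
  false
end

# True when the whole input is a single, properly nested group.
def balanced?(input)
  scanner = StringScanner.new(input)
  parse_group(scanner) && scanner.eos?
end

balanced?("(a(b)c)") # → true
balanced?("((a)")    # → false
```

The recursion is the important part: `parse_group` calls itself for each nested opening parenthesis, so the nesting depth is limited only by the call stack rather than by the shape of the pattern.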
00:12:55.280
To illustrate how the parsing mechanics work, let's look at matching simple arithmetic operations, using Citrus to parse and handle mathematical expressions effectively.
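As a rough illustration of parsing and evaluating arithmetic, here is a hand-written recursive-descent calculator in plain Ruby. It does not use Citrus and is not the example from the talk; the `Calculator` class and the grammar in its comment are hypothetical. Division is integer division, because numbers are parsed with `to_i`.

```ruby
require "strscan"

# Hypothetical grammar, in PEG notation:
#   expr   <- term (("+" / "-") term)*
#   term   <- factor (("*" / "/") factor)*
#   factor <- [0-9]+ / "(" expr ")"
class Calculator
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Parse and evaluate the whole input, raising ArgumentError on bad syntax.
  def evaluate
    value = expr
    raise ArgumentError, "unexpected input at position #{@scanner.pos}" unless @scanner.eos?
    value
  end

  private

  def expr
    value = term
    while (op = token(/[+-]/))
      rhs = term
      value = op == "+" ? value + rhs : value - rhs
    end
    value
  end

  def term
    value = factor
    while (op = token(%r{[*/]}))
      rhs = factor
      value = op == "*" ? value * rhs : value / rhs
    end
    value
  end

  def factor
    if (digits = token(/[0-9]+/))
      digits.to_i
    elsif token(/\(/)
      value = expr
      raise ArgumentError, "expected ')' at position #{@scanner.pos}" unless token(/\)/)
      value
    else
      raise ArgumentError, "expected a number or '(' at position #{@scanner.pos}"
    end
  end

  # Skip leading whitespace, then try to match the pattern at the current position.
  def token(pattern)
    @scanner.skip(/\s*/)
    @scanner.scan(pattern)
  end
end

Calculator.new("1 + 2 * 3").evaluate   # → 7
Calculator.new("(1 + 2) * 3").evaluate # → 9
```

Operator precedence falls out of the grammar's layering: `expr` only ever sees complete `term` values, so multiplication binds tighter than addition without any explicit precedence table.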
00:13:11.840
Parsing expressions into Ruby objects and building logical operations on top of them is tremendous fun, and it can be done intuitively.
00:13:42.960
You can also develop more complex functionalities within the same framework, using Citrus to parse structured content from external APIs or user-generated data.
00:14:22.320
Because these capabilities integrate so seamlessly into an existing codebase, developers can build dynamic applications efficiently.
00:15:05.120
In conclusion, parsing expressions give you more of the agility and capability that modern application development demands.
00:15:32.040
Using the right tool for the job is crucial; parsing expressions in Ruby opens doors to better manipulation of data formats while avoiding the common pitfalls of traditional regex.
00:16:03.920
Thank you very much for your attention today! I'm excited to see how you implement parser techniques in your projects. If anyone has any questions, now is the time!