Generate Parsers! Prevent Exploits!

by Nick Howard

In the talk "Generate Parsers! Prevent Exploits!", presented at MountainWest RubyConf 2014, Nick Howard explores the relationship between application exploits and formal language parsing, and argues for generating parsers from precise input specifications to improve security. The central theme is that vulnerabilities arise from improper input handling in web applications, and that structured parsing can mitigate these risks.

Key points discussed include:

- Definition of Exploits: Howard outlines what exploits are, likening them to discovering vulnerabilities in a medieval fortress, which can lead to unauthorized access to sensitive data.
- Nature of Exploits: He explains that exploits can manifest through various forms, including buffer overflows and SQL injection, which all exploit undefined behaviors within an application.
- LangSec Introduction: The concept of Language-theoretic Security (LangSec) is introduced, which emphasizes the application of formal language theory to identify vulnerabilities in software systems.
- Input Validation Importance: The speaker emphasizes rigorous validation of inputs as a preventative measure against exploits, stating that bad input should be rejected outright.
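- Examples: Howard describes a past Rails vulnerability in which XML request parameters could mark content as YAML; Rails then parsed that content with Ruby's YAML loader, which can instantiate arbitrary classes, opening the door to arbitrary code execution. The case highlights the perils of flawed input validation.
- Chomsky Hierarchy: He briefly discusses the Chomsky hierarchy, explaining that the language classes below recursively enumerable (regular, context-free, and context-sensitive) are decidable and therefore safer to accept as input than recursively enumerable languages, whose recognizers may never halt.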
- Best Practices: Howard advises against using Turing complete inputs and recommends adopting strict schemas for parsing while leveraging robust language parsers to enhance security measures.
- Muskox Library: He introduces Muskox, a Ruby library he wrote to simplify parser generation and make secure input handling easier; its generated parsers give immediate feedback on parsing errors.
- Future Integration: The intention to integrate Muskox into Rails applications for better form data processing is noted, promoting safer development practices.

Concluding Remarks: Howard's presentation emphasizes that effective input handling and structured parsing are essential to prevent exploits in web applications. By applying principles of formal language theory, developers can significantly enhance the security of their applications while ensuring clearer documentation and error handling.

For those interested in diving deeper into these subjects, Howard recommends exploring resources on langsec.org and the Muskox library on GitHub.

00:00:25.840 Good morning, everybody. I'm going to be talking about parsers and exploits today.
00:00:28.640 The title of my talk could be summarized as 'Generate Parsers! Prevent Exploits!' Alternatively, it could also be called 'LangSec for Ruby Devs' or 'The Theory of Computation is Actually Relevant to Web Development.' Well, maybe.
00:00:36.079 So, what am I going to discuss? First, I'll explain what exploits are and how they relate to formal languages. Then, I'll discuss the importance of generating parsers to handle those formal languages because that's fantastic. As a fun note, I’ll also include pictures of ducks and geese as analogies.
00:00:52.160 Exploits can be quite detrimental. Having an application exploited completely ruins your day. Attackers can find vulnerabilities in your application, akin to discovering holes in your castle walls or cracks in your dams. When they succeed, they gain access to all your sensitive data. This access could include your secrets and your databases, essentially gaining root access to everything—and that really sucks.
00:01:06.000 As an aside, make sure your passwords are hashed securely, because you definitely do not want them exposed! Now, exploits often look like tricks: if you do one specific, unexpected thing, everything goes haywire. It can be hard to anticipate where these exploits will show up and how they work.
00:01:28.880 Exploits like buffer overflows, SQL injection, or cross-site scripting might seem unrelated, but at their core, they share a commonality. They represent unexpected computation within your application stemming from some control flow that you never intended to execute. This brings us to the theory of computation, which we can indeed use to analyze these exploits.
00:01:51.680 To clarify, exploits aren't just tricks; they're machines. These machines—computational machines—function similarly to standard programs. They take inputs, manipulate them, and produce outputs, sometimes resulting in malicious actions.
00:02:00.000 What sets an exploit apart is what it is built from. For an exploit to work, it must rely on undefined behavior within the application, and that undefined behavior must be reachable through the application's inputs. In the offensive security community, the unintended computational machine formed this way is called a 'weird machine', and an exploit is effectively a program written for it.
00:02:21.440 Let's visualize this: if your application is vulnerable, the attacker's input crosses your application's boundary, but instead of reaching the code you expected, it drives the exploit's weird machine, which carries out operations you never intended.
00:02:38.880 So, how do we prevent these exploits? The straightforward answer is to break that line of communication. Undefined behavior within your application is still a problem, but if no input can ever trigger it, it effectively becomes dead code. That is why validating input before any processing is key.
00:03:00.320 This brings us to the discussion of what constitutes input. The inputs represent instructions to the exploit, much like program code in the language of that exploit. For instance, in the case of a SQL injection, you're essentially crafting a program using SQL as the language.
00:03:23.040 As an illustration, recall last year, when Rails had a vulnerability in how it parsed XML parameters: an element could be marked as YAML, and Rails would then parse its contents as YAML. That is dangerous because Ruby's YAML loader can instantiate any class available in the runtime, so an attacker could craft an XML document whose embedded YAML executes arbitrary Ruby code.
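To make the YAML risk concrete, here is a minimal sketch of the defensive side using Ruby's standard Psych library; it is our own illustration, not code from the talk or from the Rails fix. It assumes a Psych version that provides `YAML.safe_load` and `Psych::DisallowedClass` (any reasonably modern Ruby), and the helper name `load_untrusted_yaml` is hypothetical. `safe_load` builds only plain data, so a document that names a Ruby class with an object tag is refused instead of instantiated.

```ruby
require "yaml"

# Hypothetical helper: parse untrusted YAML into plain data only.
# YAML.safe_load permits strings, numbers, booleans, nil, arrays, and hashes;
# symbols and arbitrary classes are rejected unless explicitly permitted.
def load_untrusted_yaml(text)
  YAML.safe_load(text)
rescue Psych::DisallowedClass => e
  # The document asked for a class we never agreed to build: reject it.
  raise ArgumentError, "rejected YAML input: #{e.message}"
end

# Plain data parses normally.
p load_untrusted_yaml("name: Ann\nroles: [admin, editor]")

# An object tag naming a Ruby class is refused instead of instantiated.
begin
  load_untrusted_yaml("--- !ruby/object:Object {}\n")
rescue ArgumentError => e
  puts e.message
end
```

The design choice here is an allowlist of types rather than a blocklist of dangerous classes: anything the loader was not told to expect is treated as invalid input.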
00:04:06.640 These scenarios highlight that the root of the problem lies in flawed input validation. To address this, we must implement stringent input validation measures to ensure that bad input never reaches the core of our application. If the input is deemed invalid, it should not be processed at all.
00:04:25.440 Within this context, I want to introduce the term 'LangSec,' which stands for language-theoretic security. This field in information security employs computational theories and formal language principles to enhance system security. By adhering to formal language models, we can make certain assumptions regarding security and identify areas that are potentially vulnerable.
00:04:49.400 Now, let's discuss some of the core concepts of LangSec. Deciding whether an input belongs to a formal language runs into fundamental limits such as the halting problem: in general, there is no way to determine whether a program will terminate or loop forever. In web applications, it's critical to know whether an input can make the application hang, because a hang paralyzes its ability to respond.
00:05:09.559 If your input language is undecidable, the recognizer for it may run indefinitely on some inputs, so you can never be sure it will give a yes or no answer. Accepting such inputs makes it impossible to secure your application reliably.
00:05:31.360 Since we're classifying input languages, I want to touch briefly on the Chomsky hierarchy, which ranks grammars and the languages they generate by expressive power. The lowest tier consists of regular languages, which are relatively safe but limited in what they can express. At the other end of the spectrum lie recursively enumerable languages, where vulnerabilities are more likely to arise.
00:05:50.080 In summary, the machines that recognize these languages are reliable to different degrees. Everything below recursively enumerable is relatively safe because it is decidable: a recognizer for it always halts with an accept or reject. The critical takeaway is that if your input language is recursively enumerable, the security implications can be severe.
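As an illustration of the two lowest tiers (our own example, not from the talk), here is a sketch of two recognizers in plain Ruby; the method names are hypothetical. The first accepts a regular language, unsigned decimal integers, with a small finite-state loop that remembers nothing but its current state. The second accepts balanced `()`, `[]` and `{}`, a classic context-free language that no finite automaton can recognize, so it needs a stack. Both make a single pass over the input and always halt with accept or reject, which is exactly what decidability buys you.

```ruby
# Regular language: one or more ASCII digits, as a small DFA.
# States: :start (nothing read yet), :digits (at least one digit read);
# returning false early plays the role of the implicit reject state.
def unsigned_integer?(input)
  state = :start
  input.each_char do |ch|
    return false unless ch.between?("0", "9") # any non-digit is rejected
    state = :digits
  end
  state == :digits # accept only if at least one digit was read
end

# Context-free language: balanced (), [] and {}.
# A finite automaton cannot track arbitrarily deep nesting; a stack can.
PAIRS = { ")" => "(", "]" => "[", "}" => "{" }.freeze

def balanced?(input)
  stack = []
  input.each_char do |ch|
    if PAIRS.value?(ch)
      stack.push(ch)                             # opening bracket
    elsif PAIRS.key?(ch)
      return false unless stack.pop == PAIRS[ch] # mismatched or unopened closer
    else
      return false                               # only brackets are in the alphabet
    end
  end
  stack.empty? # every opener must have been closed
end

p unsigned_integer?("2014") # prints true
p unsigned_integer?("20x4") # prints false
p balanced?("([]{()})")     # prints true
p balanced?("([)]")         # prints false
```

Neither recognizer can loop forever: each does a bounded amount of work per input character, so the running time is linear in the input length.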
00:06:02.560 Comparing two parser implementations also raises decidability questions, depending on the grammar. For deterministic context-free languages, you can decide whether two parsers are equivalent, that is, whether they accept exactly the same inputs, which gives you confidence that the implementations agree. For general context-free grammars, equivalence is undecidable.
00:06:19.440 As for best practices, I emphasize avoiding Turing-complete input. Such input can produce unpredictable behavior in your application, and it often comes from poorly designed command interfaces or configuration formats that accidentally accept programs as input.
00:06:37.440 It's crucial to steer clear of ad hoc validation strategies, since they lead to weak input security. In the Ruby/Rails ecosystem this is less of a concern, because Rails' conventions encourage structured input validation.
00:06:58.720 In terms of input validation and parsing, we should strive to maintain strict schemas for input formats. Employing a robust parser is essential: use a parser that is at least as powerful as the language you're parsing, so its behavior is predictable and security risks are minimized.
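Here is a minimal sketch of what a strict schema means in practice, in plain Ruby with the standard `json` library. It is our own illustration: the schema format (field name mapped to an expected class) and the names `validate_strict`, `InvalidInput` and `SIGNUP_SCHEMA` are hypothetical and are not Muskox's API. The key property is that anything the schema does not describe, including extra fields, is rejected outright rather than silently ignored.

```ruby
require "json"

# Hypothetical schema format: field name => expected Ruby class.
SIGNUP_SCHEMA = { "email" => String, "age" => Integer }.freeze

class InvalidInput < StandardError; end

# Accept input only if it matches the schema exactly:
# no missing fields, no extra fields, and every value of the declared type.
def validate_strict(schema, input)
  raise InvalidInput, "expected an object" unless input.is_a?(Hash)

  extra = input.keys - schema.keys
  raise InvalidInput, "unexpected fields: #{extra.join(', ')}" unless extra.empty?

  schema.each do |field, type|
    raise InvalidInput, "missing field: #{field}" unless input.key?(field)
    unless input[field].is_a?(type)
      raise InvalidInput, "#{field} must be #{type}, got #{input[field].class}"
    end
  end
  input
end

good = JSON.parse('{"email": "ann@example.com", "age": 30}')
p validate_strict(SIGNUP_SCHEMA, good)

bad = JSON.parse('{"email": "ann@example.com", "age": "30", "admin": true}')
begin
  validate_strict(SIGNUP_SCHEMA, bad)
rescue InvalidInput => e
  puts "rejected: #{e.message}" # the extra "admin" field is caught first
end
```

Note that this sketch validates after a general-purpose JSON parse; the point of a generated, schema-driven parser is to fold these checks into the parse itself.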
00:07:16.320 For example, when Rails processes HTML, it uses a proper parser rather than relying on regular expressions, which can end up in unexpected states. You want that kind of input handling confined to the parsing stage, so malformed input is rejected there before it can reach an exploit.
00:07:35.680 Looking back, the introduction of Strong Parameters in Rails 4 was a positive change: it places input validation closer to where the input is used. This streamlines development and improves security through clearer input handling.
00:07:50.480 I developed a library called Muskox, which aims to make parser generation easy and so simplify building secure applications. It acts as a schema-based parser generator that allows precise type specifications. If the input doesn't match what the parser expects, it fails immediately and reports the error, detailing what was unexpected.
00:08:04.080 Muskox breaks a parser down into distinct components: a tokenizer that recognizes the language, in this case JSON, passes tokens to a validator that checks them against the schema. If everything checks out, it produces the corresponding Ruby objects.
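A rough sketch of that tokenizer, then validator, then Ruby objects pipeline, in plain Ruby using the standard `strscan` library. It handles only a tiny JSON subset (flat objects whose values are escape-free strings or non-negative integers), and every name in it (`tokenize`, `parse_with_schema`, `ParseError`, the `:string`/`:integer` schema format) is hypothetical; this is not Muskox's actual API or implementation. What it shows is the shape: the validator consumes tokens against the schema and stops at the first token it did not expect, reporting what it expected and what it got.

```ruby
require "strscan"

# Raised for any input the parser was not built to accept.
class ParseError < StandardError; end

PUNCTUATION = { "{" => :lbrace, "}" => :rbrace, ":" => :colon, "," => :comma }.freeze

# Stage 1: tokenizer for a tiny JSON subset. Produces [type, value] pairs.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      # whitespace between tokens is ignored
    elsif (punct = scanner.scan(/[{}:,]/))
      tokens << [PUNCTUATION.fetch(punct), punct]
    elsif (str = scanner.scan(/"[^"\\]*"/))
      tokens << [:string, str[1..-2]]
    elsif (int = scanner.scan(/0|[1-9]\d*/))
      tokens << [:integer, Integer(int, 10)]
    else
      raise ParseError, "unexpected character #{scanner.peek(1).inspect} at offset #{scanner.pos}"
    end
  end
  tokens
end

# Stage 2: validator that walks the tokens against a schema
# (field name => :string or :integer) and stops at the first surprise.
def parse_with_schema(text, schema)
  tokens = tokenize(text)
  pos = 0
  peek_type = -> { tokens[pos] && tokens[pos][0] }
  expect = lambda do |type|
    unless peek_type.call == type
      raise ParseError, "expected #{type}, got #{peek_type.call || 'end of input'}"
    end
    pos += 1
    tokens[pos - 1][1]
  end

  result = {}
  expect.call(:lbrace)
  unless peek_type.call == :rbrace
    loop do
      key = expect.call(:string)
      raise ParseError, "unexpected field #{key.inspect}" unless schema.key?(key)
      raise ParseError, "duplicate field #{key.inspect}" if result.key?(key)

      expect.call(:colon)
      result[key] = expect.call(schema[key]) # value must have the declared type
      break unless peek_type.call == :comma

      pos += 1 # consume the comma
    end
  end
  expect.call(:rbrace)
  raise ParseError, "trailing input after the object" if pos < tokens.size

  missing = schema.keys - result.keys
  raise ParseError, "missing fields: #{missing.join(', ')}" unless missing.empty?

  result # Stage 3: a plain Ruby Hash built only from accepted tokens
end

PERSON_SCHEMA = { "name" => :string, "age" => :integer }.freeze

p parse_with_schema('{"name": "Ann", "age": 30}', PERSON_SCHEMA)
begin
  parse_with_schema('{"name": "Ann", "age": "30"}', PERSON_SCHEMA)
rescue ParseError => e
  puts "rejected: #{e.message}" # expected integer, got string
end
```

Because the schema drives the parse, there is no separate "parse anything, then check" step: input that does not fit the schema never becomes a Ruby object at all.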
00:08:19.679 Although I haven't built a Rails integration yet, the plan is to incorporate this library into Rails applications to streamline form data processing. The aim is to reduce ambiguity in inputs and promote better security practices.
00:08:33.840 If you're interested in LangSec, I recommend visiting langsec.org to find links to various talks that delve deeper into these subjects. You can also find my library Muskox on GitHub, where it is actively being developed.
00:08:56.320 Thank you for being here today. Your attention was greatly appreciated!