Katie Miller
Pursuing the Strong, Not So Silent Type: A Haskell Story

Summarized using AI

Katie Miller • February 10, 2016 • Earth

In the video titled 'Pursuing the Strong, Not So Silent Type: A Haskell Story' presented at RubyConf AU 2016, speaker Katie Miller shares her experiences transitioning from Ruby to Haskell while working at Facebook. Miller discusses the successful migration from an in-house interpreted language, FXL, to a strong and statically typed Haskell DSL called Haxl, which is used to combat spam and malicious content across Facebook's platforms.

The main points of the talk include:

  • Cultural Adaptation: Katie reflects on her experiences as an Australian adapting to UK culture, using this analogy to introduce her foreigner status within the Ruby community, as a Haskeller.

  • Haskell Language Features: The talk emphasizes two crucial aspects of Haskell: its functional programming paradigm and strong static typing. These features provide useful abstractions and correctness guarantees essential for robust software development.

  • Use Case at Facebook: Miller presents the use case for Haxl at Facebook: efficiently evaluating user actions for spam. A rule engine (Sigma) checks the legitimacy of posts in real time, leveraging Haskell to handle high request volumes.

  • Performance Improvements: The functional approach to data fetching was estimated to give a 20 times speed-up, and the migration from FXL to Haskell made some requests three times faster while improving throughput by 30%.

  • Myths Surrounding Haskell: Miller addresses common misconceptions about Haskell, such as its perception as solely an academic language, highlighting its practical applications in production environments like Facebook.

  • Learning Curve: She discusses the challenges associated with learning Haskell, especially for developers coming from imperative programming backgrounds, and outlines Facebook's organized approach to onboarding new Haskell programmers through structured courses and support groups.

  • Community Insights: Katie synthesizes valuable lessons from both the Ruby and Haskell communities, stressing the importance of community engagement and inclusive documentation in supporting new developers.

In conclusion, Miller invites the audience to explore Haskell and its functional programming concepts, asserting that learning from other programming languages can enrich one’s understanding and improve coding practices. She encourages attendees to engage with the open-source Haxl project and continues to advocate for the merits of Haskell in tackling practical software challenges.

RubyConf AU 2016: In recent months Katie's Facebook team has completely replaced an in-house interpreted language, moving to a strong and statically typed Haskell DSL called Haxl. Dozens of Facebook developers have become functional programmers, using the open-source Haxl framework to battle spam at scale. This talk will explain how Haskell shines in this context, bust a few myths about the language, and highlight lessons Rubyists and Haskellers could learn from each other.

00:00:00 Yes, I am Katie, and I am a former Gold Coast girl. I'm delighted to have an excuse to be back here, especially on the company's dime.
00:00:03 I left the coast in May last year when I landed my dream job writing Haskell at Facebook London. One thing I've learned living in the UK and working with people from around the world is that I do not, in fact, speak English; I speak Australian.
00:00:11 Those of you who have lived overseas have probably experienced something similar. I knew going over there that people would likely give me strange looks if I used words like "sunnies" and "pants," because they have vastly different meanings.
00:00:27 I quickly discovered that I could bamboozle people with all the Aussie abbreviations, things like "bikies" and "arvo" and "firies". But what surprised me even more is just how many terms that, to me, are regular speech could completely confuse my colleagues.
00:00:46 For example, one day I was at lunch, and someone asked me how I was doing. I said, "Oh yeah, I rate it a five out of ten," and they just stared at me, waiting for the rest of the sentence. They didn't understand that I was rating something.
00:01:05 Another time, I told my colleagues that I was 'paying someone out' and they had no idea what I meant. There are also very awkward moments, like when I told Brett I've been having a bit of a bad day and I "crashed the proverbial." Let me tell you, when you say this in England, people do not think 'anger'; they think 'bowel movement', and it can be a tad embarrassing.
00:01:19 These are lessons you learn as a foreigner. There can be awkward moments and blank stares from people, but I actually find talking with others about these differences in language really enjoyable, and that's pretty much what I want to achieve here today.
00:01:44 You might be wondering why I'm at RubyConf talking about Haskell. As a Haskeller, I am a foreigner here. Basically, what I'm hoping to do is give you a little bit of insight into the Haskell point of view.
00:02:00 I firmly believe that exposure to other languages and understanding them broadens our horizons and gives us lots of new ideas. So I'm hoping to do a little bit of that today and, hopefully, give you the impression that Haskell is worth learning about.
00:02:17 I think it's a language full of interesting ideas that is very different from Ruby in many ways. There are two things I want to focus on: the fact that Haskell is a purely functional language and the fact that it is a language with strong static types.
00:02:34 The functional programming approach, combined with a rich type system, gives rise to all kinds of useful abstractions and safety guarantees. These wins are among the major reasons why Haskell was chosen for the project I work on at Facebook, which is called Haxl.
00:02:52 I would like to share some of the story of how we came to be using Haskell for this at Facebook and explain why I think this language is a great match for our use case.
00:03:02 Our use case is fighting spam at scale, as well as dealing with malicious URLs, malware, and other nasty things that people want to put on the site. The process goes something like this: a user comes along and takes some action, maybe writing a status update or sending a message, which has to end up in a database somewhere on the back end.
00:03:22 Before we actually do that, though, we want to check: Is this activity legitimate? Should we let it go ahead? If the answer is no, we might want to take some other action, like saying, "Sorry, you can't post that link," or maybe presenting a CAPTCHA.
00:03:39 The system we use to do these checks is a rule engine called Sigma. Sigma is a Haskell and C++ hybrid. It runs rules written in Haskell in an embedded domain-specific language.
00:03:56 Sigma handles more than a million requests per second, so it needs to be really fast, efficient, and robust. We classify tens of billions of user actions every day across Messenger, Facebook, and Instagram.
00:04:08 The logic is expressed in one of these rules. It might look something like this: let's say we're trying to see if someone is spamming friends about Haskell. They write a post on a friend’s wall. We will check whether that post is actually about Haskell.
00:04:25 We will also check whether the two people have few friends in common and whether most of those friends like Ruby. If that is the case, we decide to block the post.
00:04:42 This is the kind of thing we can express through our rules, which include both manually written rules and machine learning implementations.
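The rule described above can be sketched as a pure Haskell function. This is only an illustration, not Facebook's rule code or the Haxl API: the `WallPost` record, its fields, and the threshold of five friends are all assumptions made up for this example, and a real rule would fetch this data from other services instead of receiving it in a record.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical, simplified view of a wall post (not a real Facebook type).
data WallPost = WallPost
  { postText        :: String
  , friendsInCommon :: Int  -- friends shared by the author and the wall owner
  , friendsChecked  :: Int  -- how many friends were examined
  , friendsLikeRuby :: Int  -- how many of the examined friends like Ruby
  }

-- Does the post mention Haskell (case-insensitive)?
isAboutHaskell :: WallPost -> Bool
isAboutHaskell post = "haskell" `isInfixOf` map toLower (postText post)

-- Assumed threshold: fewer than 5 friends in common counts as "not many".
fewFriendsInCommon :: WallPost -> Bool
fewFriendsInCommon post = friendsInCommon post < 5

-- More than half of the examined friends like Ruby.
mostFriendsLikeRuby :: WallPost -> Bool
mostFriendsLikeRuby post = 2 * friendsLikeRuby post > friendsChecked post

-- The rule: block only when all three checks hold.
shouldBlock :: WallPost -> Bool
shouldBlock post =
  isAboutHaskell post && fewFriendsInCommon post && mostFriendsLikeRuby post
```

Because every check is a pure function of its input, the rule can be tested by constructing `WallPost` values by hand.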
00:04:50 At Facebook, we need to be able to respond quickly to spam attacks; you never know when they're going to occur. Therefore, these rules are continuously deployed.
00:05:08 The code that runs in Sigma in production is the same code in our source control repository. These days that code is written in Haskell in the DSL I mentioned, but prior to that, the journey to Haskell started with another language called FXL.
00:05:25 FXL stands for Feature Extraction Language. This was an in-house language developed at Facebook; an interpreted language, and I say 'was' because we completely migrated everything over to Haskell.
00:05:44 FXL was developed specifically for the Sigma use case of spam fighting. The decision to migrate to Haskell was made because it is a functional language with strong correctness guarantees, which is exactly what we wanted.
00:06:04 Choosing a purely functional language is significant because it means functions do not have any side effects; they only return results based on their inputs. I won't go deeper into this concept, as the anchors will be talking tomorrow about functional programming, but this property is crucial for the Sigma use case.
00:06:19 It facilitates one of the key needs this rule engine must achieve: efficient data fetching. When a request comes into Sigma to check one of those particular write actions, it's usually going to need to consult other data sources to compute that result.
00:06:38 For a Haskell spammer example, it has to determine how many friends a user has, and whether those friends like Ruby. Latency is extremely important here; we want data fetching to be as efficient as possible.
00:06:52 Instead of processing these requests sequentially, it makes sense to do them concurrently. We also want to batch requests to the same data source, and that's exactly what FXL was optimized for.
00:07:10 Requests in the rule code are automatically batched and run concurrently, without the programmer having to write specific instructions about how the concurrency occurs.
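One way to picture automatic batching is a toy applicative type that collects every key a computation needs before doing a single lookup. The sketch below is not the Haxl library's implementation or API; `Fetch`, `fetch`, `runFetch`, and the key names are hypothetical, and it models batching only, not real concurrency.

```haskell
import Data.List (nub)
import Data.Maybe (fromMaybe)

type Key = String

-- A computation that needs some keys, plus a function that finishes the
-- work once all of those keys have been looked up.
data Fetch a = Fetch [Key] ([(Key, Int)] -> a)

instance Functor Fetch where
  fmap f (Fetch keys finish) = Fetch keys (f . finish)

instance Applicative Fetch where
  pure x = Fetch [] (const x)
  -- Combining two computations merges their key lists, so both sets of
  -- requests reach the data source together.
  Fetch keysF finishF <*> Fetch keysX finishX =
    Fetch (keysF ++ keysX) (\results -> finishF results (finishX results))

-- Request a single value; a key missing from the results counts as 0.
fetch :: Key -> Fetch Int
fetch key = Fetch [key] (\results -> fromMaybe 0 (lookup key results))

-- Run a computation with a data source that receives all distinct keys
-- in one batch.
runFetch :: ([Key] -> [(Key, Int)]) -> Fetch a -> a
runFetch batch (Fetch keys finish) = finish (batch (nub keys))

-- The rule author writes ordinary-looking code; no batching logic appears.
totalFriends :: Fetch Int
totalFriends = (+) <$> fetch "friends:alice" <*> fetch "friends:bob"
```

Running `runFetch source totalFriends` calls `source` once with both keys. The design point is that the applicative structure exposes which requests are independent, so the runtime, not the rule author, decides how to group them.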
00:07:27 These and other optimizations in FXL were possible due to the functional style used. For example, we can run checks lazily, ensuring that if a condition fails, subsequent checks don't need to be executed.
00:07:45 Purity also allows us to memoize results: if a calculation is repeated, we can save the result the first time and return it without re-fetching data.
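Both ideas, skipping later checks and reusing earlier results, can be shown in a few lines of plain Haskell. This is a minimal sketch and not how FXL or Haxl cache data; `slowScore`, `scoreTable`, `memoScore`, and `passes` are hypothetical names, and `slowScore` merely stands in for an expensive pure computation.

```haskell
-- Hypothetical stand-in for an expensive, pure computation.
slowScore :: Int -> Int
slowScore n = sum [1 .. n * 1000]

-- A lazily built table: each entry is computed at most once, on first use,
-- which is only safe because slowScore has no side effects.
scoreTable :: [Int]
scoreTable = map slowScore [0 ..]

-- Memoized lookup (only valid for non-negative arguments).
memoScore :: Int -> Int
memoScore n = scoreTable !! n

-- (&&) evaluates its right-hand side only when the left-hand side is True,
-- so a failed cheap check means the expensive score is never computed.
passes :: Bool -> Int -> Bool
passes cheapCheck n = cheapCheck && memoScore n > 0
```

Calling `passes False n` returns `False` without touching the table at all.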
00:08:01 Additionally, the system can safely reorder function calls since these functions don't modify any external state. This flexibility significantly contributes to the overall performance.
00:08:19 Another advantage of having pure functions is that we can safely run operations concurrently, as there are no side effects involved. This capability is invaluable in optimizing the performance of our system.
00:08:37 In the Haskell spammer example, these optimizations all apply: we check for potential spamming while fetching data efficiently.
00:08:57 So did this approach work? Yes, it did! The estimate was that FXL's approach provided a 20 times speed-up, significantly enhancing data fetching efficiency.
00:09:23 However, FXL was an interpreted language, which turned out to be much slower than we had hoped. Its CPU and memory usage were often excessive.
00:09:39 While it was statically typed, it didn't provide strong enough correctness guarantees for our needs, and it lacked abstraction facilities like modules and user-defined data types.
00:09:56 So, we turned to Haskell. I realize Haskell might seem like a surprising choice to some of you. There are a lot of myths about this language.
00:10:09 Many people are shocked to learn that Facebook has Haskell in production, as they believe it to be an academic language meant only for researchers rather than practical applications.
00:10:28 Haskell is academic in the sense that it originated from academia, but it is also a practical, general-purpose language utilized in industry by big companies like Facebook, as well as numerous startups.
00:10:45 Why are people using Haskell? I think its purity, combined with a rich type system, allows you to reason about your code effectively. You can draw conclusions about the code based solely on what is presented, without diving into other parts of the codebase.
00:11:05 This has significant benefits for testing, including property-based testing. It also helps with understanding, refactoring, and, importantly, concurrency.
00:11:22 Here's the spammer example again, this time in Haskell. The syntax may be different, but the fundamental checks still apply. We ensure that checking for spam remains efficient and effective.
00:11:38 The key point here is that Haskell has optimizations that allow us to manage those checks effectively. The implementation does not require explicit concurrency handling, making it easier to code.
00:11:55 Furthermore, we benefit from the Haskell type system that offers abstraction facilities that were lacking in FXL. We can create our own data types, enhancing our code quality and preventing classes of bugs.
00:12:10 The goal is to use types in a way that the Haskell type checker alerts you if something is amiss. A strong type system like Haskell's is essential, especially with the continuous deployment model.
00:12:27 People are constantly committing rules to the repository, and a few minutes later, those rules go live in production. We want to be confident that those rules won’t crash Sigma or interfere with each other.
00:12:42 We achieve these guarantees through the type system, which has been a significant win for us. I've discussed some of the things that types can prevent, including issues with function definitions.
00:12:57 For example, I had a function checking whether you have a pair of IDs in a graph and a relationship between them. Issues can arise if there's confusion over which ID goes where.
00:13:15 In Haskell, we can wrap existing types with new names, separating them, allowing the type checker to verify that the values passed to functions are as expected.
00:13:32 This prevents mistakes where developers erroneously flip IDs: the compiler flags the issue before the code even runs, rather than it surfacing in production.
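Wrapping an existing type under a new name is done with `newtype`. The sketch below assumes a tiny in-memory graph of "likes" edges; `UserId`, `PageId`, `LikeGraph`, and `userLikesPage` are hypothetical names chosen for this example, not types from Sigma or Haxl.

```haskell
-- Both wrap a plain Int, but the type checker treats them as distinct types.
newtype UserId = UserId Int deriving (Eq, Show)
newtype PageId = PageId Int deriving (Eq, Show)

-- Hypothetical graph: a list of "user likes page" edges.
type LikeGraph = [(UserId, PageId)]

-- Is there a "likes" edge from this user to this page?
userLikesPage :: LikeGraph -> UserId -> PageId -> Bool
userLikesPage graph user page = (user, page) `elem` graph

-- Flipping the arguments does not compile, because a PageId is not a UserId:
--   userLikesPage graph (PageId 42) (UserId 1)
```

With bare `Int` arguments the flipped call would compile and silently return the wrong answer; with the wrappers it is rejected before the code runs.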
00:13:50 Another demonstration involves defining algebraic data types in Haskell. By structuring our types thoughtfully, we can ensure that all necessary implementations are accounted for.
00:14:02 When a new case is added to one of these definitions, the compiler requires a corresponding implementation wherever that type is handled, so no one forgets a case, further promoting correctness.
00:14:18 These examples illustrate how the Haskell type system improves code quality: bugs are caught at compile time, leading to fewer runtime errors.
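A small algebraic data type shows how the compiler helps with missing cases. This is a sketch under assumptions: the `Response` type and its constructors are invented for this example, loosely based on the actions mentioned earlier in the talk, and the check relies on GHC's `-Wincomplete-patterns` warning being enabled (it only fails the build if warnings are also treated as errors).

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical set of outcomes a rule might produce.
data Response
  = Allow
  | BlockPost
  | ShowCaptcha
  deriving (Eq, Show)

-- Every constructor is handled; removing any branch makes GHC report a
-- non-exhaustive pattern match at compile time.
userMessage :: Response -> String
userMessage response = case response of
  Allow       -> "Posted."
  BlockPost   -> "Sorry, you can't post that link."
  ShowCaptcha -> "Please complete the CAPTCHA."
```

If a fourth constructor were later added to `Response`, GHC would point at `userMessage` (and every other `case` over the type) until the new case was handled.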
00:14:36 However, some may argue that Haskell is difficult to learn. Many developers coming from imperative languages may think they can quickly pick up Haskell without realizing the steep learning curve.
00:14:54 Haskell's unique approach stems from lambda calculus, making it essential to adjust your expectations and understanding while learning.
00:15:10 While functional programming concepts can be simple in Haskell, the abstractions involved require time and practice to fully grasp. It's beneficial to see many concrete examples before this abstraction becomes clear.
00:15:29 At Facebook, we've adopted a structured approach to teaching Haskell, running three-day courses to help onboard developers onto the language.
00:15:50 We also have a group called Haskell Therapy, where people can ask questions and get support. This process helped minimize the number of abstractions taught upfront, allowing learners to focus on core concepts.
00:16:08 As a result, dozens of developers are now using Haskell and Haxl at Facebook, and they are successfully committing code just as they did with FXL.
00:16:27 Another myth is that Haskell is a panacea and will resolve all your problems. Unfortunately, this is not true. We had a significant amount of code in FXL that needed to be machine translated to Haskell.
00:16:46 Some of that ended up being quite convoluted, known as 'code tornadoes'. Just using Haskell does not guarantee more elegant code, and there are still lessons to learn.
00:17:05 The outcome of our efforts has shown that Haskell can perform three times faster than FXL for some requests, with throughput improving by 30%. Moving to Haskell has been a significant win performance-wise.
00:17:26 Many developers have become comfortable with Haskell, and we're now reaching a point where they create their own types, which is fantastic.
00:17:43 We are effectively fighting spam using Haskell, and it has proven to be a suitable fit for our requirements.
00:18:00 The takeaway from this talk is that exploring and learning other programming languages is valuable. Haskell, in particular, has influenced many other languages, making it worth looking into.
00:18:14 If you have a chance, consider exploring these functional concepts in whatever language you prefer. Many of these ideas can also be applied to languages like Ruby.
00:18:30 As someone who has been part of the Ruby community in Australia for several years, I've gained insights into what the Ruby community does well.
00:18:46 One key lesson is community engagement. The Ruby community excels in creating an inclusive atmosphere and offers excellent documentation to help newcomers learn.
00:19:02 From the Haskell side, we definitely have things we can improve upon in supporting new learners, and it is something we should focus on.
00:19:16 As I conclude, I'd like to acknowledge everyone who has contributed to Haxl. This is an open-source project, and you can find more information about it on GitHub.
00:19:36 There's also a paper detailing the inner workings of Haxl, which you can check out for further reading. Cheers for giving this Haskell 'foreigner' a fair go.
00:19:54 Thank you for your attention!