Cooking with Chef - Your Servers Will Thank You

by James Golick

In a presentation titled "Cooking with Chef - Your Servers Will Thank You" delivered at the MountainWest RubyConf 2010, James Golick tackles the challenges of sysadmin work and showcases how Chef, a configuration management system, can mitigate these issues. He begins with a humorous nod to the Festivus tradition of airing grievances, specifically reflecting on the frustrations associated with sysadmin tasks that often clash with programming sensibilities.

Key Points Discussed:

  • Frustration with Sysadmin Work: Golick points out that sysadmin tasks can be tedious and manual, contrasting with the automation preferences of programmers. The variety of Unix service configuration languages adds complexity and difficulty to the process.

  • Illustration of Configuration Challenges: He shares personal experiences with configuring HAProxy for SSL, highlighting the obscure steps often required that can consume hours of time.

  • Introduction to Chef: Chef simplifies sysadmin work by standardizing configuration management with a Ruby-based Domain Specific Language (DSL). Golick emphasizes that knowing Ruby makes it much easier for programmers to pick up Chef without needing to learn another unique configuration language.

  • Central Unit of Work - Recipes: Golick explains that recipes in Chef declare resources, such as installing a MySQL server, and automatically handle platform differences, making the process less error-prone. Chef's resources are idempotent, so a run takes no action when the configuration is already correct.

  • Examples of Chef in Action: He discusses several practical applications of Chef:

    • MySQL Configuration: Chef handles the installation and service declarations across different systems seamlessly.
    • HAProxy Configuration: A simplified method to set up load balancers that include SSL features helps avoid the common pitfalls of manual configurations.
    • Security with Recipes: A recipe enforces basic security measures, namely a locked-down iptables configuration and disabled SSH password logins, automating essential security setups that are often overlooked.

  • Future of Configuration Management: Golick expresses hope that, with a community-driven approach, Chef recipes will simplify standard configurations for a vast majority of use cases in the future.

Conclusion:

Golick concludes with an encouragement for programmers to explore Chef, with resources like the wiki and IRC channel to support their learning. By leveraging Chef, sysadmin processes can become easier, more automated, and less prone to human error, making it a beneficial tool for developers managing infrastructure.

This presentation is a valuable resource for programmers looking to improve their sysadmin skills and modernize infrastructure management using configuration management tools like Chef.

00:00:14.480 Okay, so my name is James Golick, and I wanted to start out today by honoring my personal favorite Festivus tradition. No, not the feats of strength, although that might be something fun we can do later. I wanted to start out today the same way that every good Festivus starts: with the airing of grievances. Now, I don't have any sound, but I have this video of Frank Costanza explaining how the airing of grievances works. At the Festivus dinner, you gather your family around and tell them all the ways they have disappointed you over the past year. But instead of talking about all the ways that my family has disappointed me over the last year, I'm going to be talking about all the ways that sysadmin work has disappointed me forever. So let's get started.
00:01:13.760 There's a good reason most programmers hate doing sysadmin work: because it's boring. As a programmer, the idea that this horribly manual process of somebody sitting in front of a terminal and typing commands into an SSH session could possibly result in something that's production-worthy on a consistent basis just seems completely insane. Sysadmin work offends every one of my sensibilities. It's exactly the kind of process that should be automated by software, but for whatever reason, typically isn't.
00:01:47.280 And have you ever noticed that every Unix service in the universe has its own configuration language? I mean, there's no standard for that. These are some services that we use in our cluster at work, and every single one of them has its own configuration language. Unlike a real programming language or something standardized like JSON or YAML, there's no tooling support for these configuration languages. If I'm writing Ruby, or even JSON, I've got syntax coloring in my text editor. With these, there's typically no way to even verify that your config file will parse other than starting the service. Trying to work on these configuration files is extraordinarily painful.
00:02:28.239 Personally, I'm amazed that anybody can get Nagios to work on a consistent basis. But even if you've figured out the configuration files, perhaps you’re a veteran sysadmin, there is always an esoteric detail waiting around the corner to bite you. Let me tell you what I mean by that. I don't know how many of you are familiar with HAProxy, but it's a great TCP and HTTP load balancer that we use as our public-facing load balancer in our cluster. The other day, we set up SSL on our site for logins and for a few other reasons. In order to get HAProxy to load balance HTTPS traffic, I had to do not one, but two things, which took me a couple of hours of Googling.
00:03:57.439 The first was that I had to turn it off of HTTP mode because HAProxy doesn't actually handle the encryption. Since HTTP mode is protocol-aware, it gets confused by encrypted requests. The second, more obscure thing was that since HAProxy does health checks on all the back ends to make sure that it's not load balancing against dead servers, I had to set an option for SSL health checks. Otherwise, HAProxy would think that all of my backend servers were down. This is easy to explain quickly, but when you're actually trying to configure this stuff, you know it's not so easy.
00:05:01.120 I think my friend Howard summed it up nicely when he said, 'There must be a way, except it's obscure and non-obvious, and you'll forget next time.' That's the Unix way. But the good news is that things are getting better. It's now possible to use configuration management systems like Chef to alleviate some of the pain that has traditionally been associated with sysadmin work.
00:06:00.720 Chef is a configuration management system, and the briefest definition of that I could think of is that it's a place where you store information about how to configure your systems. I realize at this point there are about a million configuration management solutions, particularly in the Ruby world, where they seem to be popping up all the time. This reminds me of the test framework boom of 2009. I'm a big fan of Chef; it's got an awesome community, and the configuration DSL is pure Ruby.
00:07:52.840 So, how many of you know Ruby? Pretty much everyone, right? So you basically already know Chef; you just need to read the API docs and figure out how the system works. That's a huge benefit. Some of the other configuration management systems that shall remain unnamed force you to learn yet another configuration language that isn't a real programming language, which seems backward to me if part of the goal is to stop having to learn all of these configuration languages. Let's take a look at how Chef can alleviate some of this sysadmin pain.
00:09:22.320 The central unit of work in Chef is the recipe. Each one of these blocks here, these declarations, is about declaring a resource in Chef terms. In case it's not obvious, here is a recipe for installing MySQL server, probably on a Debian-based system just because of the package naming. I'm going to go through each of these resources and explain a little bit about how they work. The first resource says, 'package "mysql-server", action :install.' This is pretty straightforward: it installs the MySQL server package.
00:10:23.840 But this is a cross-platform representation of package management, so if you're on a Debian-based system, it's going to install it with apt-get. If you're on a Red Hat-based system, it's going to install it with yum. If you're on Solaris, I assume it would probably install it with whatever the hell Solaris uses for package management. So right away we can see that some of the especially stupid details you have to keep in your mind when you're doing sysadmin work are starting to fade away behind this Ruby DSL.
00:11:48.000 The second resource is declaring a service in Chef. Again, this is a cross-platform representation of a service. If you're on a Debian-based system, it's just going to execute the init script directly. If you're on a Red Hat-based system, it's going to run '/sbin/service' to execute the script. We tell Chef which commands this script supports, and we can tell Chef to enable that service, which means to make sure that it starts at the next boot. This is the kind of thing that I always forget how to do since I'm not really a sysadmin; it's just something I do for a few hours a week.
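The cross-platform behavior described for the package and service resources can be pictured as a lookup keyed on platform. The following is a minimal sketch in plain Ruby, not Chef's actual provider code; the table names, the function names, and the `mysqld` service name are hypothetical and exist only for illustration.

```ruby
# Hypothetical lookup tables. Chef chooses a provider per platform internally;
# this only mimics the idea of one declaration mapping to different commands.
PACKAGE_INSTALL = {
  debian: ->(pkg) { "apt-get install -y #{pkg}" },
  redhat: ->(pkg) { "yum install -y #{pkg}" }
}.freeze

SERVICE_COMMAND = {
  debian: ->(name, action) { "/etc/init.d/#{name} #{action}" },
  redhat: ->(name, action) { "/sbin/service #{name} #{action}" }
}.freeze

# Builds the shell command that would install a package on the given platform.
def install_command(platform, pkg)
  PACKAGE_INSTALL.fetch(platform).call(pkg)
end

# Builds the shell command that would run a service action on the given platform.
def service_command(platform, name, action)
  SERVICE_COMMAND.fetch(platform).call(name, action)
end

puts install_command(:debian, "mysql-server")
puts service_command(:redhat, "mysqld", :restart)
```

The caller only names the package or service; which tool gets invoked is decided in one place, which is the detail the DSL hides.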
00:13:09.760 The third resource is a template resource. Templates are typically how we manage config files in Chef. We tell Chef about the source of the template in the cookbook repository, and we can set permissions and the owner and group of the config file. The last line starts to get into a little bit more interesting functionality of Chef: it says to notify the MySQL service to restart if this template changes. To understand how this works, you have to know that Chef resources, at least the ones that are written correctly (and certainly all the ones that are bundled with it), are idempotent. If you're not familiar with that term, it describes a function that you can apply twice and get the same result as applying it once.
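The slide itself is not part of this transcript, so what follows is a reconstruction of what a recipe with these three resources might look like, not the speaker's code. The recipe portion uses Chef's resource syntax; the `ToyRecipe` class above it is a hypothetical stand-in that merely records declarations so the snippet runs without Chef installed. The service name `mysql`, the file mode, and the `"service[mysql]"` notification string (a form from later Chef releases; recipes of the talk's era typically wrote `resources(:service => "mysql")`) are assumptions.

```ruby
# Toy stand-in for Chef's recipe context so this file runs without Chef.
# It only records declarations; nothing is installed or configured.
class ToyRecipe
  Resource = Struct.new(:type, :name, :properties)

  # Collects the property calls made inside a resource block.
  class PropertyCollector
    attr_reader :properties

    def initialize
      @properties = {}
    end

    %i[action supports source owner group mode notifies].each do |property|
      define_method(property) do |*args|
        @properties[property] = args.length == 1 ? args.first : args
      end
    end
  end

  attr_reader :resources

  def initialize
    @resources = []
  end

  # Each resource type becomes a DSL method: package "name" do ... end
  %i[package service template].each do |type|
    define_method(type) do |name, &block|
      collector = PropertyCollector.new
      collector.instance_eval(&block) if block
      @resources << Resource.new(type, name, collector.properties)
    end
  end
end

recipe = ToyRecipe.new
recipe.instance_eval do
  # --- the part that would live in a cookbook's recipes/server.rb ---
  package "mysql-server" do
    action :install
  end

  service "mysql" do
    supports :status => true, :restart => true, :reload => true
    action :enable
  end

  template "/etc/mysql/my.cnf" do
    source "my.cnf.erb"
    owner "root"
    group "root"
    mode "0644"
    notifies :restart, "service[mysql]"
  end
end

recipe.resources.each { |r| puts "#{r.type}[#{r.name}]" }
```

Under real Chef the `package`, `service`, and `template` methods come from the recipe context, so only the middle section would appear in the cookbook.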
00:14:24.919 The canonical example of an idempotent function is absolute value: the absolute value of negative two is two, and the absolute value of two is two, and so on. When this template resource runs, Chef evaluates the template in memory and compares the evaluated template to whatever's on disk, in this case at '/etc/mysql/my.cnf'. If they're the same, it's a no-op, so nothing happens. But if they're different, Chef will back up the existing config file, write the new config file to disk, and call any callbacks that have been registered. In this case, that means restarting MySQL, so changing a tuning parameter in the MySQL config will cause MySQL to be restarted, which is something you don't have to remember to do.
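The compare, back up, write, notify sequence just described can be sketched in a few lines of plain Ruby. This is an illustration of the idea under simplifying assumptions, not Chef's implementation; `converge_file` and its `on_change` parameter are hypothetical names, and the demo writes to a temporary directory rather than `/etc`.

```ruby
require "fileutils"
require "tmpdir"

# Writes `rendered` to `path` only if it differs from what is on disk.
# Returns true when the file was written (and callbacks ran), false on a no-op.
def converge_file(path, rendered, on_change: [])
  return false if File.exist?(path) && File.read(path) == rendered

  FileUtils.cp(path, "#{path}.bak") if File.exist?(path) # back up the old config
  File.write(path, rendered)
  on_change.each(&:call) # for example, restart the service
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "my.cnf")
  restarts = 0
  restart = -> { restarts += 1 }

  converge_file(path, "max_connections = 100\n", on_change: [restart]) # first run writes
  converge_file(path, "max_connections = 100\n", on_change: [restart]) # same content: no-op
  converge_file(path, "max_connections = 200\n", on_change: [restart]) # changed: writes again
  puts "restarts triggered: #{restarts}"
end
```

Running the same content twice triggers the callback only once, which is what makes it safe to run the whole recipe repeatedly.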
00:15:53.680 The templates are ERB. I'm assuming everyone in here is familiar with ERB. If you've ever done any web development with Ruby, you've probably seen ERB. This snippet is from our old HAProxy recipe, and what we're doing here is iterating over an array of app server IP addresses and emitting a line of config for each app server to create this load balancer. It’s not hard to imagine that if we were on a cloud, like, say, EC2, where we have API-based provisioning, you could run a command that would provision a new instance and run the Chef recipes on that instance. When they ran, this array of app server IP addresses would be one element larger, which would mean the template would get written to disk and HAProxy would be gracefully reloaded if you have your callback set up properly.
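A template along the lines described, emitting one line of config per app server, might look like the sketch below. The IP addresses, the backend port 8080, the `app` names, and the `render_listen` helper are all assumptions for illustration; the speaker's actual template is not shown in the transcript.

```ruby
require "erb"

# ERB template that emits one HAProxy "server" line per app server.
# trim_mode "-" lets "-%>" swallow the newline after control tags.
LISTEN_TEMPLATE = ERB.new(<<~CONFIG, trim_mode: "-")
  listen app 0.0.0.0:80
  <% app_servers.each_with_index do |ip, i| -%>
    server app<%= i %> <%= ip %>:8080 check
  <% end -%>
CONFIG

# Renders the load balancer config for the given list of app server IPs.
def render_listen(app_servers)
  LISTEN_TEMPLATE.result_with_hash(app_servers: app_servers)
end

puts render_listen(["10.0.0.11", "10.0.0.12"])
puts "--- after provisioning one more instance ---"
puts render_listen(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
```

Adding an element to the array is the only change needed for the new server to appear in the rendered file.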
00:17:31.760 All of a sudden, this process of adding capacity to a cluster is a one-command affair instead of this horrible manual process involving a lot of human intervention and praying that your config files will parse correctly. The data that recipes consume are called attributes, and attributes are the unsung hero of Chef. This line here, the list of recipes, basically just says to install these recipes on the node. That's the only line in this whole JSON file specific to Chef; all the rest of this is completely arbitrary data that you can write your recipes to consume. What this means is that if you write your recipes correctly, you can really implement generic instructions on how to configure a piece of software.
00:19:12.720 For example, MySQL tuning parameters just go in the attributes. This means that someone trying to install or tune a MySQL server, if your recipes are sufficiently complete, might never have to open a MySQL configuration file and might not even have to be particularly aware of the syntax, depending on the service. You start to have these recipes that are an entire abstraction of configuring a service. Opening config files becomes a thing of the past; it's just a matter of writing a Ruby hash and running your Chef recipes.
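To make the attributes idea concrete, here is a sketch of a node JSON document and a helper that turns its tuning data into config lines. The `recipes` key follows the talk's description (later Chef versions use `run_list` for this); the `mysql`/`tuning` structure, the parameter values, and the `mysqld_section` helper are hypothetical.

```ruby
require "json"

# Only "recipes" is meaningful to Chef here; the rest is arbitrary data
# that a recipe or template can choose to consume.
NODE_JSON = <<~JSON
  {
    "recipes": ["mysql::server"],
    "mysql": {
      "tuning": {
        "max_connections": 200,
        "key_buffer": "64M"
      }
    }
  }
JSON

# Turns a hash of tuning parameters into the lines of a [mysqld] section.
def mysqld_section(tuning)
  lines = tuning.sort.map { |key, value| "#{key} = #{value}" }
  (["[mysqld]"] + lines).join("\n") + "\n"
end

node = JSON.parse(NODE_JSON)
puts mysqld_section(node["mysql"]["tuning"])
```

Someone tuning the server edits the hash and reruns the recipes; the config file syntax stays inside the template.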
00:20:56.920 All of this stuff lives in the cookbook repository. This is just a basic cookbook repository layout. The interesting points here are under the recipes directory, where you've got default.rb and server.rb. If you refer to the cookbook in a recipes list or a recipe dependency as mysql, unqualified, it will run default.rb. If you refer to it as mysql::server, it will run the recipe in server.rb. This is a really convenient way to organize your cookbooks.
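The naming rule just described, where an unqualified name maps to default.rb and a qualified one to the named file, can be written as a small function. This is a sketch of the convention only; `recipe_path` and the top-level `cookbooks` directory name are assumptions.

```ruby
# Maps a recipe reference such as "mysql" or "mysql::server" to the file
# that would hold it inside a cookbook repository.
def recipe_path(reference)
  cookbook, recipe = reference.split("::", 2)
  File.join("cookbooks", cookbook, "recipes", "#{recipe || 'default'}.rb")
end

puts recipe_path("mysql")         # cookbooks/mysql/recipes/default.rb
puts recipe_path("mysql::server") # cookbooks/mysql/recipes/server.rb
```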
00:22:06.880 For example, with something like Nagios, where you've got a host machine and a bunch of nodes that it does checks on and that you need to install services on, you can keep everything in the same cookbook, which is convenient. Under the templates directory, there's a default directory where the my.cnf.erb file resides. It's possible to specify alternate directories named after particular platforms. For instance, if you have a directory under here called ubuntu-8.04, then if you're on Hardy, the templates in the ubuntu-8.04 directory will take precedence over the ones in the default directory.
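The precedence between a platform-specific template directory and the default one amounts to checking candidate paths in order. The sketch below covers only the two levels mentioned in the talk (Chef's real lookup considers more); the function names are hypothetical and the demo builds a throwaway directory tree.

```ruby
require "fileutils"
require "tmpdir"

# Candidate locations, most specific first: templates/<platform>-<version>/,
# then templates/default/.
def template_candidates(cookbook_dir, name, platform, version)
  [
    File.join(cookbook_dir, "templates", "#{platform}-#{version}", name),
    File.join(cookbook_dir, "templates", "default", name)
  ]
end

# Returns the first candidate that exists on disk, or nil if none does.
def find_template(cookbook_dir, name, platform, version)
  template_candidates(cookbook_dir, name, platform, version).find { |path| File.exist?(path) }
end

Dir.mktmpdir do |dir|
  %w[default ubuntu-8.04].each do |sub|
    FileUtils.mkdir_p(File.join(dir, "templates", sub))
    File.write(File.join(dir, "templates", sub, "my.cnf.erb"), "# from #{sub}\n")
  end

  puts File.read(find_template(dir, "my.cnf.erb", "ubuntu", "8.04")) # platform directory wins
  puts File.read(find_template(dir, "my.cnf.erb", "freebsd", "8.0")) # falls back to default
end
```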
00:23:39.760 This is very convenient because a lot of people run FreeBSD as a firewall, and if you've got multiple platforms in your cluster, this is a really convenient way to organize things. Now let's look at some examples. HAProxy: I alluded to that earlier. Let's examine how we can solve the problem of setting up an SSL load balancer with HAProxy. The attributes for my HAProxy recipe look like this. Basically, we're setting up two listen directives. The first is non-secure, on port 80, balancing to 70 and 71 on port 81. My HAProxy recipe defaults to HTTP mode, so we don't have to specify that. The second listen directive is a little more interesting because it says SSL = true and it's on port 443 balancing to the same two machines on port 4443.
00:25:04.660 The interesting part is this SSL = true, because that isn't syntax from the HAProxy configuration file; it's something that I coded into my recipe. So the next time I have to do this, I won't have to spend three hours Googling around for what the hell's wrong with my setup when I'm getting all these cryptic errors out of HAProxy. The implementation of this is really simple; this is the template where I'm iterating over those listen directives. Anyway, all this is kind of esoteric, but the interesting part is that if the option SSL is set, then set it to TCP mode and set this SSL health check option. It's really simple, but you know, this is going to save me a lot of time.
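The recipe-level `ssl` flag could be implemented in a template roughly as follows. This is a sketch, not the speaker's template: the attribute layout, listener names, and the full IP addresses are assumptions (the talk mentions only "70 and 71" and the ports). `mode tcp`, `mode http`, and `option ssl-hello-chk` are HAProxy directives; the last is the SSL health check option.

```ruby
require "erb"

# One "listen" section per listener; the ssl flag switches the section to
# TCP mode and enables HAProxy's SSL hello health check.
HAPROXY_TEMPLATE = ERB.new(<<~CONFIG, trim_mode: "-")
  <% listeners.each do |listener| -%>
  listen <%= listener[:name] %> 0.0.0.0:<%= listener[:port] %>
  <% if listener[:ssl] -%>
    mode tcp
    option ssl-hello-chk
  <% else -%>
    mode http
  <% end -%>
  <% listener[:servers].each_with_index do |address, i| -%>
    server <%= listener[:name] %><%= i %> <%= address %> check
  <% end -%>
  <% end -%>
CONFIG

# Renders the HAProxy listen sections for an array of listener hashes.
def render_haproxy(listeners)
  HAPROXY_TEMPLATE.result_with_hash(listeners: listeners)
end

# Hypothetical attributes mirroring the two listeners described in the talk.
LISTENERS = [
  { name: "web", port: 80, servers: ["10.0.0.70:81", "10.0.0.71:81"] },
  { name: "secure", port: 443, ssl: true, servers: ["10.0.0.70:4443", "10.0.0.71:4443"] }
].freeze

puts render_haproxy(LISTENERS)
```

The knowledge that SSL traffic needs both TCP mode and a different health check is captured once in the template instead of being rediscovered each time.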
00:26:41.600 Since recipes are shareable, if they're sufficiently generic, I hope that in the future you'll be able to just grab a great open-source recipe that does a lot of this stuff for you. For 80% of cases, you won't even have to worry about the esoteric details associated with configuring Unix services. My second example is Heartbeat. If you're not familiar with Heartbeat, it's a service for managing virtual IP addresses in a cluster. It's usually used to create high-availability configurations. In our case, we have three public load balancers, but only one of them is actually active at any given time. They’re in a heartbeat, so if the active one goes down, one of the others will grab the IP address, resulting in very little downtime.
00:28:59.960 The attributes for Heartbeat are very simple: what interface we're heartbeating on, which nodes are in the heartbeat (this is a mapping of fully qualified domain names to private IP addresses), and we set a password for the heartbeat because you have to. The resources are the actual IP addresses being shared in the cluster. This is a mapping of the fully qualified domain name of the machine that should own that IP address, with the virtual IP address. This is really straightforward. The Heartbeat configuration is not complicated, but it's got at least three configuration files depending on how you set it up.
00:30:41.560 I can never remember which of these directives goes in which configuration file, so I wrote the Chef recipe and it’s just a matter of writing this simple Ruby code or data structure, and the Heartbeat gets set up for me automatically. We use this all over our cluster, and it's super convenient. Anytime we want to set up a heartbeat, I simply write out this little bit of Ruby and it's done. I don't have to think about any of the configuration.
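As an illustration of one data structure fanning out into several config files, the sketch below renders three files from a hash shaped like the attributes described above. The attribute keys, host names, addresses, and the `heartbeat_files` helper are hypothetical. The file names (ha.cf, authkeys, haresources) and directives follow classic Heartbeat configuration and should be checked against the Heartbeat documentation before use.

```ruby
# Hypothetical attributes, loosely following the talk's description.
HEARTBEAT = {
  interface: "eth1",
  nodes: {
    "lb1.example.com" => "10.0.0.1",
    "lb2.example.com" => "10.0.0.2"
  },
  password: "change-me",
  resources: { "lb1.example.com" => "203.0.113.10" }
}.freeze

# Returns a hash of file name => file contents.
def heartbeat_files(attrs)
  peers = attrs[:nodes].values.map { |ip| "ucast #{attrs[:interface]} #{ip}" }
  names = attrs[:nodes].keys.map { |fqdn| "node #{fqdn}" }
  owners = attrs[:resources].map { |fqdn, vip| "#{fqdn} #{vip}" }

  {
    "ha.cf" => (peers + names).join("\n") + "\n",          # who is in the heartbeat
    "authkeys" => "auth 1\n1 sha1 #{attrs[:password]}\n",  # shared secret
    "haresources" => owners.join("\n") + "\n"              # which node owns which virtual IP
  }
end

heartbeat_files(HEARTBEAT).each do |name, contents|
  puts "# #{name}"
  puts contents
end
```

The person writing the hash never needs to recall which directive belongs in which file.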
00:32:20.000 The last example is really simple, but I think it's almost the most powerful, and it obviously has to do with security. This recipe has no attributes; just run the recipe, and it will do its thing. The recipe simply sets dependencies on two other recipes: the iptables recipe and the sshd recipe. The iptables recipe, by default, just locks the machine down. Of course, you can override that with attributes, but by default, the machine is locked down completely: all ports are closed.
00:34:14.680 The sshd recipe only turns off password authentication. This is really powerful because these two elements are like the basics of reasonable security on a Linux machine with a public IP address. A lot of people forget to do this. I’ve forgotten it many times; I’ve been in multiple companies that had popular websites where this wasn’t done. Most operating systems don’t ship with any method for detecting a brute-force attack on sshd, so if one of your users has passwords enabled and sshd is allowing password logins, you’re going to get brute-forced. It’s just a matter of time because you have no way of detecting it.
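Turning off password authentication comes down to making sure sshd_config contains a single `PasswordAuthentication no` setting. The sketch below does this as a pure text transformation; `disable_password_auth` is a hypothetical helper, not the speaker's recipe, and it ignores complications such as `Match` blocks. It places the setting at the top of the file so that it applies globally.

```ruby
# Returns sshd_config text with password logins disabled. Any existing
# PasswordAuthentication line (commented out or not) is dropped and a single
# explicit setting is placed first.
def disable_password_auth(config)
  kept = config.lines.map(&:chomp).reject do |line|
    line.match?(/\A\s*#?\s*PasswordAuthentication\b/)
  end
  (["PasswordAuthentication no"] + kept).join("\n") + "\n"
end

original = <<~SSHD
  Port 22
  #PasswordAuthentication yes
  PermitRootLogin no
SSHD

hardened = disable_password_auth(original)
puts hardened
puts disable_password_auth(hardened) == hardened # applying it again changes nothing
```

Because a second application leaves the text unchanged, the transformation is safe to run on every provisioning pass.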
00:35:55.560 We set this recipe to run by default on all new machines that we provision, and it’s just set and forget. We never have to think about it; it’s really simple. But it’s just so easy to forget to do. We don’t have to remember anymore. If you want to learn more about Chef, the wiki is a pretty good documentation source. I mean, it’s Ruby, so you could get started relatively easily. The IRC channel is super awesome; the Opscode guys, who write and maintain Chef, are amazing and really helpful.
00:37:08.160 They’re really eager to help people get started and get through problems and file bug reports. So definitely check out the IRC channel if you’re exploring Chef.
00:38:05.840 Are there any questions?
00:38:55.560 Yes, I mean, you can certainly write a recipe to build something from source. We do that by writing a pretty simple recipe to compile and install it somewhere, or we compile it in advance and just untar it in /opt or something. That’s typically the easiest way to do it because you’re not dependent on being able to get the file from somewhere, waiting for it to compile when you bring a new machine online. How many machines do you manage? Well, hundreds to thousands.
00:40:45.680 Currently, we don’t have hundreds to thousands of machines; we have like one hundred to two thousand. The biggest cluster we’ve had so far while on virtualized hardware was about thirty-five machines. The way that Chef Server and Chef Client work is that the recipes are distributed when they’re run on a cron job at the same time. I would suspect that if you had hundreds to thousands of machines, that might be a problem.
00:42:52.120 They tend to run on a node, so you can think about publishing recipes to the Chef server. You take the recipes and, as the clients come in, they’ll pull the recipes down from the server.
00:43:52.520 So, yeah, it just works. Any other questions?
00:44:18.080 Thanks.