Power Rake

In this talk we will cover the "hidden" features of Rake that are not typically used by the casual Rake user. We will learn about the convenience of file lists, dynamic generation of tasks, rule based file generation and more.

GoRuCo 2012

00:00:16.560 Thank you! I’m glad to be here. It took quite a lot to get here; I don't know if you were following my travel adventures yesterday, with all the travel plans and whatnot. I think I visited most of the major cities between here and Cincinnati, including a few that are actually between those two locations. I don’t normally do shout-outs, but today I want to. If it weren't for my travel buddy Phil, I wouldn't have made it; I think he literally bent time and space to get me here, so big thanks to Phil!
00:00:49.360 Now, why do I want to talk about Rake? Rake has been around since about 2004; I think it was actually at RubyConf 2003 where I first gave a presentation on Rake, which was my very first presentation at a Ruby event. What I’ve noticed is that many people who use Rake come to it through the Rails community, viewing Rake as simply an easy place to dump scripts. However, they often do not fully understand the advantages that Rake provides. So, I want to discuss this a bit. I’m dividing this talk into two parts; the basic Rake boot camp I presented at the Rails conference covered some simple concepts, while this one, titled 'Power Rake,' will focus on more advanced topics.
00:01:24.320 I’ll assume that you already know a little bit about Rake, and we will go into detail about what those misconceptions are. To start, here’s exhibit A.
00:01:59.119 I didn't even know there was such a thing as a power rake! Have you guys ever seen a power rake? This is awesome! Look at that—look at the dirt being moved with that machine. If I had the sound here, you could hear the tractor running. It's amazing! But wait, there’s more. This next one is for sand flea raking. I'm curious where this terminology comes from; the English used here is rather stilted, as if it’s not written by a native speaker. Notice that I'm on a blind carbon copy, so it’s not like they're emailing me directly.
00:03:00.800 So, what in the world are sand fleas? It turns out that sand fleas are tiny little crustaceans that live in the sand on ocean beaches. They dig down into the sand at the water's edge as the tides come in and out. You can use a sand flea rake to catch them; you put it in the sand, and as the water washes away, the sand fleas remain behind, making them great bait for surf fishing. I bet you didn’t expect to learn that today! Now, you might wonder where the Rake icon comes from; it is actually modeled after a fire rake, which is used to dig up the ground. The teeth are very sharp, and it helps create fire breaks when fighting forest fires.
00:04:15.599 Let me quickly review what I assume you already know about Rake. First of all, Rake is designed to automate tasks with dependencies. For example, if you wanted to make mac and cheese, you’d need to boil water, buy pasta, and buy cheese. You declare the dependencies between these tasks in a Rakefile using the task syntax, specifying each task's name and its dependencies. This is all basic stuff, which I assume you are familiar with. In the basic talk, we also discussed the command line and how we describe tasks, as well as the file utilities that allow for easy file manipulation. If you want to learn more about basic Rake, you can check out my talk on Confreaks, which is available as a video.
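As a refresher, the mac and cheese example might look roughly like the sketch below. The task names are made up for illustration, and the `require "rake"` line is only there so the snippet also runs as plain Ruby; a Rakefile gets the DSL without it.

```ruby
require "rake" # a Rakefile gets the DSL automatically; needed only to run this as plain Ruby

# Illustrative task names; each one just prints what it would do.
task :boil_water do
  puts "Boiling water"
end

task :buy_pasta do
  puts "Buying pasta"
end

task :buy_cheese do
  puts "Buying cheese"
end

# The hash form reads "mac_and_cheese depends on these three tasks".
task mac_and_cheese: [:boil_water, :buy_pasta, :buy_cheese] do
  puts "Making mac and cheese"
end
```

Running `rake mac_and_cheese` against a Rakefile like this runs the three prerequisites first, each at most once, and then the task itself.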
00:05:21.120 With that in mind, let’s work through a more complex example than what we covered in the basic Rake talk. This example is provided by our user Bert, who wants us to create a directory of thumbnail images based on a directory of regular images. The project directory will contain an images directory filled with various image files: PNG, JPEG, GIF, etc. Our Rake file is located right under the project directory, and we want our Rake task to generate a set of thumbnail images that are 32 by 32 pixels, derived from the larger image files.
00:06:13.199 We'll start with something called a file list. I find file lists to be one of the more powerful features of Rake. In fact, I’ve been known to simply require Rake just so I can access the file list object. We create a list of image files by using the FileList constructor, which accepts any glob pattern. Every file name that matches that glob will be collected in the file list. Essentially, a file list is an array that knows it contains file names, allowing you to do interesting things. For example, you can fill a file list with multiple globs, and you can even use a double star pattern to search all directories beneath the starting point, enabling multi-level matching. You can also specify that you only want PNG or JPEG files, enabling basic matching. After creating a file list, you can include more items and even exclude files that match certain patterns.
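A small sketch of those FileList features follows. The sample file names and the scratch directory are assumptions added so the globs have something to match; in a real project the lists would simply point at the existing images directory.

```ruby
require "rake"      # provides FileList; not needed inside a Rakefile
require "tmpdir"
require "fileutils"

# Work in a throwaway directory (an assumption of this sketch) and clean it up on exit.
scratch = Dir.mktmpdir
at_exit { Dir.chdir(Dir.tmpdir); FileUtils.remove_entry(scratch) }
Dir.chdir(scratch)

FileUtils.mkdir_p("images/icons")
FileUtils.touch(%w[
  images/gem.png images/logo.jpg images/notes.txt
  images/icons/star.png images/icons/star-backup.png
])

# Several globs in one list: top-level PNG and JPEG files only.
top_level = FileList["images/*.png", "images/*.jpg"]

# Double star searches every directory below images/; braces pick the extensions.
all_images = FileList["images/**/*.{png,jpg}"]
all_images.include("images/**/*.gif") # add more patterns later
all_images.exclude(/backup/)          # drop names matching a pattern

puts top_level.to_a.inspect
puts all_images.to_a.inspect
```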
00:07:06.560 What's really interesting about a file list is that it is lazy; after you've created it and specified its contents, it does not go out to grab those files until it is specifically requested to do so. If you print out the file list or ask for a specific file within it, that is when it will go out and search the directory. This means you can create multiple file lists at the top of your Rake file for use in several different tasks, without hitting the file system until necessary, thus saving time and resources.
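One way to see the laziness is to declare the list before the file exists. This is a minimal sketch; the scratch directory and the file name `late.png` are assumptions made up for the demonstration.

```ruby
require "rake"      # provides FileList; not needed inside a Rakefile
require "tmpdir"
require "fileutils"

# Throwaway working directory, removed again on exit.
scratch = Dir.mktmpdir
at_exit { Dir.chdir(Dir.tmpdir); FileUtils.remove_entry(scratch) }
Dir.chdir(scratch)
FileUtils.mkdir_p("images")

pngs = FileList["images/*.png"]    # declared, but the glob has not run yet
FileUtils.touch("images/late.png") # the file appears after the declaration
found = pngs.to_a                  # first real use: the glob runs now

puts found.inspect
```

Because the directory is only searched at the `to_a` call, `found` picks up `images/late.png` even though the file did not exist when the list was declared.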
00:08:01.360 Now that we have our list of images, I want to make a list of target file names where I want to store the thumbnails. To do this, I will use the pathmap method, which is a FileList method that applies a mapping string to each file name in the list. For this, I’ll create a list of names that begin with the target directory, followed by the original base name with -thumb appended, followed by the original filename extension. This way, I can easily derive the target file names from the original file names.
00:08:29.920 In our example, if we have a name like images/gem.png, we construct a new name that points to thumbs, using percent-n for the original file name and appending -thumb, followed by the original file extension. This method allows us to easily construct our target file names using a simple mapping string. Now that we have a list of our images and the corresponding thumbnail files, we need to think about how we can actually convert one image to its thumbnail form. We can utilize a command line tool called ImageMagick. The command would essentially create the thumbs directory and use the convert command followed by the desired geometry setting of 32 by 32 pixels for the thumbnail, along with the source and destination file paths.
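The mapping just described can be sketched as follows. The list is built from two explicit names (the second one is an invented example) so the snippet does not depend on files being on disk.

```ruby
require "rake" # provides FileList and String#pathmap; not needed inside a Rakefile

# Explicit names stand in for FileList["images/*"]; logo.jpg is an invented example.
images = FileList["images/gem.png", "images/logo.jpg"]

# %n is the base name without its extension, %x is the extension including the dot.
thumbs = images.pathmap("thumbs/%n-thumb%x")

puts thumbs.to_a.inspect
# pathmap also works on a single string:
puts "images/gem.png".pathmap("thumbs/%n-thumb%x")
```

With this mapping string, `images/gem.png` becomes `thumbs/gem-thumb.png`.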
00:09:19.200 To set this up in a Rake file, we need to create a file task. File tasks are special because they are designed to work with files specifically. Regular tasks in Rake are fully aware of dependencies; they will run all their dependencies when executed. However, file tasks only run if their target file does not exist or is considered out of date compared to its dependencies. This means a file task can skip unnecessary work if the target file is up to date. To create a file task, we declare the target filename, pass the list of source files needed, and include the shell command to run ImageMagick.
00:10:10.960 Just to reiterate, the name of a file task is the target file you wish to create. File tasks do not respect namespaces, because their names are tied directly to the file system. A file task only triggers if the target file is missing or if its timestamp indicates that one of its prerequisites is newer than the target, meaning it needs to be rebuilt. In this sense, a file task can be viewed as a recipe for generating a target file from a list of source files.
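A single file task for one thumbnail might look like the sketch below. The exact ImageMagick invocation is an assumption (the `-thumbnail 32x32` option is one way to express the 32 by 32 geometry), and the sketch assumes the thumbs directory already exists.

```ruby
require "rake" # not needed inside a Rakefile

# The task name is the file to build; the prerequisite is the file it is built from.
# Assumes thumbs/ already exists and that ImageMagick's convert is on the PATH.
file "thumbs/gem-thumb.png" => "images/gem.png" do |t|
  sh "convert", "-thumbnail", "32x32", t.source, t.name
end
```

Asking Rake for `thumbs/gem-thumb.png` runs the block only when the thumbnail is missing or older than `images/gem.png`.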
00:10:55.440 Now that we know how to tell Rake how to build a single thumbnail, you may wonder how to build multiple thumbnails. One straightforward approach would be to declare multiple file tasks by hand, but this becomes unwieldy once you get beyond two or three tasks. Instead, we can use Ruby to iterate over the target and source name pairs and generate the thumbnail tasks dynamically. We can use the zip method, which pairs up the elements of two arrays, making it easy to create a file task for each target and source file.
00:11:14.240 By zipping together the target and source files, we can dynamically define file tasks that handle the conversion using the shell command with the respective files involved. Notably, this method is efficient because if one of the source files changes, only its corresponding thumbnail will be rebuilt, rather than rebuilding every single thumbnail. After defining the thumbnail tasks, we add a task called 'convert' that depends on all the thumbnail files. When this 'convert' task is run, it will rebuild all the specified thumbnails for you. Lastly, we can set a default task for Rake, so if a user simply calls 'rake', it automatically runs the convert task.
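Put together, the dynamically generated tasks could look something like this. The explicit file names stand in for a real glob, and the `convert` arguments are the same assumed invocation as before, so treat it as a sketch rather than the exact Rakefile from the slides.

```ruby
require "rake" # not needed inside a Rakefile

images = FileList["images/gem.png", "images/logo.jpg"] # stand-in for FileList["images/*"]
thumbs = images.pathmap("thumbs/%n-thumb%x")

# One file task per (thumbnail, image) pair, so each thumbnail is rebuilt independently.
thumbs.zip(images).each do |target, source|
  file target => source do
    sh "convert", "-thumbnail", "32x32", source, target
  end
end

desc "Build every thumbnail"
task convert: thumbs

# Plain `rake` with no arguments runs convert.
task default: :convert
```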
00:12:17.520 We're able to take a complex operation of processing a large directory of images and build thumbnails for them, while ensuring that we only rebuild what is necessary. Now, Bert, our example user, requested that all those thumbnails be merged into a single image file. While the reasoning behind that is not entirely clear, it fits well into our example. We will introduce another file task that combines all the thumbnails into a final.png file, located in the main project directory. This final image will depend on all the individual thumbnails that we have previously created.
00:13:28.240 When we run the task, it will use ImageMagick's convert command to append each thumbnail to the final image. If we run it, we will see that the system attempts to build the output image file without first creating the thumbnails directory. It’s a bit disappointing, because I expected the ImageMagick command to fail loudly if the target directory didn’t exist, but it appears to fail silently without generating an error. When we run the Rake task to convert thumbnails again, it says it completed the task, but where is the thumbnails directory? We forgot to create that! This oversight needs to be corrected.
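The merge step at the start of this passage could be sketched as one more file task. The thumbnail names are placeholders, and `-append` is an assumption about which ImageMagick option does the combining.

```ruby
require "rake" # not needed inside a Rakefile

# Placeholder list; in the full Rakefile this is the pathmapped thumbnail list.
thumbs = FileList["thumbs/gem-thumb.png", "thumbs/logo-thumb.jpg"]

# final.png is rebuilt whenever any thumbnail is newer than it.
file "final.png" => thumbs do |t|
  # -append stacks the inputs top to bottom; +append would lay them out side by side.
  sh "convert", *t.prerequisites, "-append", t.name
end
```

On its own this snippet only declares the task; actually building `final.png` also needs the per-thumbnail file tasks so that Rake knows how to produce each prerequisite.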
00:14:52.159 To resolve this, we can create a directory task, which is similar to a file task. The directory task is smart enough to create the directory structure as needed. If it’s asked to build a nested directory structure, it’ll check and create all parent directories as necessary. Once the thumbs directory is declared as a prerequisite of each thumbnail file, the conversion for a thumbnail cannot run until the thumbs directory has been created. This adjustment allows the thumbnail generation to proceed smoothly, and running the `convert` task now builds the thumbs directory and converts the images as anticipated.
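With the directory task added, the generated tasks might look like the following sketch. As before, the file names and the `convert` arguments are assumptions for illustration.

```ruby
require "rake" # not needed inside a Rakefile

images = FileList["images/gem.png", "images/logo.jpg"] # stand-in for a real glob
thumbs = images.pathmap("thumbs/%n-thumb%x")

# Creates thumbs/ (and any missing parent directories) only if it is not already there.
directory "thumbs"

thumbs.zip(images).each do |target, source|
  # Listing "thumbs" as a prerequisite makes sure the directory exists before convert runs.
  file target => ["thumbs", source] do
    sh "convert", "-thumbnail", "32x32", source, target
  end
end

task convert: thumbs
```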
00:15:42.240 After running the update, we find that we have successfully created the thumbs directory and converted our images! With Rake's approach, we only regenerate the thumbnails that need updating. Next, Bert has a few additional requirements: he wants to be able to clean up the project directory by removing all the intermediate files and generated thumbnails once the final image has been built.
00:16:31.840 We can get a 'clean' task in our Rake file by requiring a library called rake/clean. This pre-built functionality provides us with two tasks: clean and clobber. The clean task is meant to remove intermediate files, while clobber goes further and removes the final generated files as well. Together they give us a neat way to return the project to a pristine state when needed. Calling the clean task alone deletes only the intermediate files, while the clobber task wipes out everything that was generated.
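A minimal sketch of wiring this up, assuming the thumbs directory counts as intermediate output and final.png as the final product:

```ruby
require "rake/clean" # defines the CLEAN and CLOBBER file lists plus the clean and clobber tasks

# Intermediate output: removed by `rake clean` (and therefore by `rake clobber` too).
CLEAN.include("thumbs")

# Final product: removed only by `rake clobber`.
CLOBBER.include("final.png")
```

Since clobber depends on clean, `rake clobber` removes both the thumbnails and the final image.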
00:17:21.680 Bert chimes in with a new request, stating that the images are not only in the main images directory, but also spread across multiple subdirectories. He wants us to include those additional images while generating the thumbs directory and ensuring proper directory mapping. We’ll adapt our Rake file once again to accommodate this change, either by adjusting our file lists to capture all image files, including nested directories, or implementing logic to maintain the same directory structure in the thumbs directory after processing images. Utilizing a double star glob pattern will help us match all image files within their respective subdirectories.
00:18:40.960 Next, we’ll revise our pathmap string to replace the source directory ‘images’ with ‘thumbs’ while keeping the rest of the original path intact. We also adapt the dependency handling so that each thumbnail depends on its own subdirectory under thumbs, which lets the conversion respect the original directory structure. When we run the Rake task this time, it builds the thumbs directory tree and creates thumbnails from images in both the main directory and the subdirectories, keeping everything well-organized.
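One way to express that mapping is pathmap's `%{pattern,replacement}` form, as in the sketch below. The file names are invented, and the exact mapping string is an assumption rather than the one shown on the slides.

```ruby
require "rake" # not needed inside a Rakefile

# Stand-in for FileList["images/**/*.{png,jpg}"]; the names are invented examples.
images = FileList["images/gem.png", "images/icons/star.png"]

# %{^images,thumbs}d takes the directory part and swaps a leading "images" for "thumbs".
thumbs = images.pathmap("%{^images,thumbs}d/%n-thumb%x")

thumbs.zip(images).each do |target, source|
  target_dir = target.pathmap("%d") # directory part of the thumbnail path
  directory target_dir              # safe to declare the same directory more than once

  file target => [target_dir, source] do
    sh "convert", "-thumbnail", "32x32", source, target
  end
end

task convert: thumbs
```

Here `images/icons/star.png` maps to `thumbs/icons/star-thumb.png`, and that thumbnail depends on the `thumbs/icons` directory task.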
00:19:48.960 Let’s take a moment for questions before moving on to the next significant section. Feel free to ask anything about what we’ve covered so far in regard to Rake and our examples, as I’m happy to clarify any points of confusion you might have.
00:20:34.960 In the next section, we will delve into using Rake rules—essentially the process of managing rules and creating streamlined workflows in Rake. If there are many image files involved, translating these into dynamically generated tasks can lead to performance overhead—especially as the number of tasks significantly increases. An alternative approach is to set up rules effectively. When invoking Rake, if it encounters a request for a specific file and there isn’t an explicit task present, it will check for an existing rule that matches the file’s name and corresponding source file. Rules allow for powerful abstractions that streamline automation processes while minimizing the complexity of multiple tasks.
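As a rough illustration of the idea, a single rule can stand in for all the generated thumbnail tasks. The `source_for` helper and the pattern below are hypothetical names and choices for this sketch, not the code from the talk.

```ruby
require "rake" # not needed inside a Rakefile

# Hypothetical helper: maps thumbs/icons/star-thumb.png back to images/icons/star.png.
source_for = lambda do |thumb|
  thumb.sub(%r{\Athumbs/}, "images/").sub(/-thumb(\.\w+)\z/, '\1')
end

# Applies to any requested thumbs/...-thumb.<ext> file that has no explicit task,
# provided the computed source file exists.
rule %r{\Athumbs/.*-thumb\.\w+\z} => [source_for] do |t|
  mkdir_p t.name.pathmap("%d") # make sure the target directory exists
  sh "convert", "-thumbnail", "32x32", t.source, t.name
end
```

With a rule like this, Rake only creates a task for a thumbnail when something actually asks for that file, instead of defining one task per image up front.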
00:31:48.960 As we transition further into rules, I will explain how traditional assumptions around building complex projects can be simplified into manageable methods via Rake. I’ll share examples drawn from my experiences, demonstrating how to utilize rules for building extensive workflows. Finally, I’ll offer recommendations for effectively integrating Rake into your project lifecycle. Please hold your questions until we reach the end, as I want to ensure we have enough time to thoroughly cover all sections and key points. This will help you leverage the complete power of Rake as a build tool efficiently. Thank you for your attention, and I hope you find this information enlightening.
00:45:35.920 So, to wrap up, Rake offers flexible solutions for automating workflows of varying complexity in Ruby projects, and I'm excited to share more about these techniques!