Crushing It With Rake Tasks

by Barrett Clark

In the video "Crushing It With Rake Tasks," Barrett Clark discusses the utility of Rake, a Ruby scripting tool primarily used in Ruby on Rails projects for automating repetitive tasks. Clark highlights that while database migration is a critical feature of Rails, Rake offers broader capabilities for managing project workflows and simplifying processes through custom task creation. The video covers several key points:

  • Introduction to Rake: Rake is presented as a Ruby equivalent of the Unix Make tool, allowing script automation through a Ruby-based Rakefile.
  • Database Management: Clark emphasizes the significance of Rake in handling database migrations, detailing commands like 'bundle exec rake db:migrate' and strategies for managing changes to migrations, including reverting and redoing migrations.
  • Database Seeding: He discusses advanced techniques for database seeding, using 'first_or_create' to prevent duplication and ensure unique entries upon seeding. This is especially relevant for scenarios where re-running seeds could lead to multiple identical entries.
  • Custom Rake Tasks: There’s a focus on how users can create their own Rake tasks for tasks like importing data from CSV files. Clark explains how a custom task can manage importing airport data by checking for existing records to avoid unnecessary duplication.
  • Task Management: The video illustrates the ability to run multiple Rake tasks sequentially, optimizing workflow and efficiency by combining operations like migrations and log clearing into a single command.
  • Helpful Rake Commands: Instructions for handy Rake commands are provided, such as clearing logs and locating TODOs and FIXMEs in the codebase.
  • Broader Applicability: Clark stresses that Rake is not limited to Rails applications; it can be utilized in any Ruby project, showcasing its flexibility.

The conclusion reinforces that Rake is a vital tool for developers, enhancing productivity and ensuring processes are streamlined. Overall, the material presented in this session from RailsConf 2016 encourages developers to leverage Rake for both standard and custom tasks, highlighting its powerful nature in Ruby project management.

00:00:10.599 All right, thank you all for coming after lunch. I know there are lots of good choices, and it’s hard with so many tracks and workshops going at the same time. I appreciate you being here; it's good to see so many seats taken. I'm Barrett Clark, and I've been working with Ruby for about a decade. I like to run, and I currently work at Sabre Labs, where I do research and build prototypes to try to make travel suck less. Sabre is a travel company. I also like to be outside as much as possible. But we're not here to talk about me; today we're here to talk about Rake.
00:00:24.960 So let's get started. Rake is a Ruby scripting utility used to streamline repetitive, tedious tasks. It can be considered a Ruby version of Make, the Unix build automation tool. Similar to Make, we have a Rakefile, much like a Makefile, but the Rakefile is written in Ruby. This allows you to perform all these tasks in Ruby. Rake was created by Jim Weirich, who unfortunately passed away in February of 2014. I still miss him; I miss his spirit, his influence, and his energy. This was one of his last tweets: there was a new TV show called Rake, but it was confusing because it wasn’t about Ruby or build automation, which is unfortunate because it could have been a good show.
00:00:56.840 Okay, Rake can help us manage our databases. If you’ve done any Rails development, you’ve probably used database migrations, and in that context, you’ve used Rake. In my opinion, this is probably the biggest killer feature in Rails: database migrations. Here we have a migration from the Rails 5 beta. So, of course, we’re going to create a Blog. Oops, it works! We've got our timestamps for created_at and updated_at. We’ll have two fields: a title and a body, and wouldn’t it be great if all blogs were as short as these? These are just strings, so they’re varchar(255). To do that, we'll create our table using the command: 'bundle exec rake db:migrate'. You'll note the inclusion of 'bundle exec' because Rake is a Ruby scripting utility.
00:02:06.200 You could just run Rake, but we're in a Rails app, and we want this to work in the context of our Rails app's Gemfile, and Bundler can help us with that. We run 'bundle exec rake db:migrate', and you'll see I’m going to run the create post migration. It's going to create a table named posts, and we’re good. But what if I changed my mind? Maybe you’re indecisive like me and you want to change fields, change the names, or add fields without knowing exactly where this is going quite yet. I want to add a permalink to my table because we want to tweet about all the things we have to say and we want people to find our blog.
00:04:43.920 We could create a new migration, which is reasonable, but we can also edit our existing migration. I’ll discuss in a minute why and when we make those choices, but let's just say it’s totally okay to do so. We're going to add this new field in there. There are a couple of ways we can revert the migration and rerun it. The first option is 'db:rollback', specifically 'bundle exec rake db:rollback', which reverses the migration. ActiveRecord understands how to undo many operations, including creating tables.
00:05:12.360 If it’s not a reversible migration, you have to specify the up and down methods separately, and it would run the down method in that case. To revert our create post migration, for instance, we would drop the table, which means you're going to delete the table and all its data too, but that's okay since this is development. You wouldn’t do this in production. The other thing we can do is use 'bundle exec rake db:migrate:redo'. This is great; it rolls it back and then rolls it forward again. For example, we'll drop the table for create post and then recreate it. This process deletes the data but recreates the table fresh.
00:07:22.000 It’s a handy feature I use a lot. Oh, I changed my mind again. Let’s add an index to the posts table on the permalink field, so we avoid table scans every time we look up that field, since table scanning is inefficient. However, introducing a new database object means that when we try to roll it back, ActiveRecord will try to undo everything the edited migration describes, dropping the index first and then the table. Dropping a table deletes all of its objects anyway, but ActiveRecord is thorough and tries to drop the index on its own first. That index was never created, so the rollback fails and we’re going to have a bad time. Don’t worry; that guy survived, he’s fine.
00:10:01.080 We need a different strategy: we can drop the database, recreate it, and then remigrate. We would run 'bundle exec rake db:drop db:create db:migrate', which allows us to run multiple tasks in sequence. Yes, this approach is heavy-handed and considered scorched earth, but again, we’re in development and in early stages, where we just created this table and are not quite sure about our way ahead. You might ask, "Barrett, when can I change a migration? Can I just change a migration and rerun it?" The answer in computer science is always "it depends".
00:12:59.040 Adding new fields is totally fine, and you don't have to worry about existing database objects when you rerun the migration. If you add new objects like indices or foreign keys, or for whatever reason create new tables in that migration, that's when you have to drop the database and start over. If you've already committed and pushed your changes, don’t change the migration; that's bad practice. I will tweak a migration and rerun it during initial development when I am still figuring things out. But once I’ve set that in stone, it’s fixed, and I will need to create new migrations.
00:15:44.000 But I don’t want to create ten migrations while I’m still trying to decide what needs to be in this table. So let’s keep going with Rake and database management. We’re going to discuss advanced database seeding. When you create a new Rails app, you get a seeds file, and it looks like this, with comments outlining tasks you can run, such as 'db:setup'. When you run 'db:seed' (or 'db:setup', which also creates the database and loads the schema), it executes everything in that file. An example scenario would be creating a couple of movies and then a movie character.
00:19:13.600 However, the problem is that every time we run this, it will rerun everything that’s in there, which is not ideal. Star Wars and Lord of the Rings are trilogies, but we don’t need multiple copies of Star Wars movies in the database. ActiveRecord has our back; instead of just creating, we can use 'first_or_create'. With this command, if the condition is not met (meaning the record doesn’t exist), it will create it; otherwise, it will return whatever instance it finds. A plain 'first' returns the object or nil if nothing is found, whereas 'first_or_create' always returns a record, either the existing one or the one it just created.
00:21:59.000 Postgres 9.5 recently shipped upsert, allowing us to update or insert records. So perhaps we should look out for updates in ActiveRecord to support that. But for now, 'first_or_create' allows us to safely rerun the 'db:seed' task multiple times, since it won't duplicate entries that already exist.
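The rerun-safe seeding described above can be sketched in plain Ruby. This is a minimal sketch, not the speaker's seeds file: the `Movie` class below is an in-memory stand-in (an assumption, so the example runs without Rails) that mimics only the `where(...).first_or_create` call chain. In a Rails app, the two seed lines would run against a real ActiveRecord model instead.

```ruby
# In-memory stand-in for an ActiveRecord model so the sketch runs without Rails.
# Only where(...).first and where(...).first_or_create are mimicked.
class Movie
  @records = []

  class << self
    attr_reader :records

    # Mimics Model.where(conditions) by returning a small scope object.
    def where(conditions)
      Scope.new(self, conditions)
    end
  end

  Scope = Struct.new(:model, :conditions) do
    # Returns the first matching record, or nil when nothing matches.
    def first
      model.records.find do |record|
        conditions.all? { |key, value| record[key] == value }
      end
    end

    # Returns the existing record, or stores and returns a new one.
    def first_or_create
      first || conditions.dup.tap { |record| model.records << record }
    end
  end
end

# Seed lines in the style of db/seeds.rb; running them twice stands in for
# running the seed task twice.
2.times do
  Movie.where(title: "Star Wars").first_or_create
  Movie.where(title: "Lord of the Rings").first_or_create
end

puts Movie.records.size
```

Because the lookup happens before the insert, the second pass finds both titles and creates nothing.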
00:23:58.640 So why not rake all the things all the time? We can run multiple tasks sequentially, ensuring our database is up to date each time. We can safely run the 'db:migrate' task multiple times; if there is nothing to be done, it simply won’t execute anything. We can also safely rerun our seeds task, which will do nothing if there’s nothing new to create.
00:27:02.520 We can also create a custom Rake task. For instance, this one will take a CSV file and load it in, where the CSV file has a header row matching the fields in the table. I have a custom converter in place; any field that is an empty string will convert to a NULL that the database understands. Let’s break this down: we have a Rake task in the 'db' namespace to import airports, which will look to see if there are any airports already loaded. If there are records there, we assume this task has already been run and don’t need to repeat it.
00:29:01.560 CSV from the Ruby Standard Library is quite handy. For each filename, we open it line by line, record by record. We tell it there’s a header row, so each row becomes a hash where the header field names serve as keys, and the hash values contain data. I’ll convert them to symbols so that the keys have that format, as I prefer symbol keys for hashes. If a field has data that's not NULL but is simply an empty string, we’ll convert it to NULL before sending that hash to 'Airport.create()'. This works because the field names are the same, allowing a hash to be directly pumped into ActiveRecord and quickly create multiple records for us.
00:30:48.160 Again, we could have used 'first_or_create', but I prefer to avoid instantiating 8,000 objects and generating meaningless log output for records that already exist when we can simply skip the import if the data is already there.
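The import task walked through above might look roughly like the following. It is a sketch under stated assumptions rather than the code from the talk: the task name `db:import_airports`, its `path` argument, and the column names in the test data are hypothetical, and `Airport` is an in-memory stand-in so the example runs without Rails. The `CSV.foreach` options (`headers`, `header_converters`, `converters`) come from Ruby's standard CSV library.

```ruby
require "csv"
require "rake"
extend Rake::DSL # harmless in a Rakefile; makes task/desc/namespace available in a plain script

# In-memory stand-in for the Airport model so the sketch runs without Rails.
class Airport
  @records = []

  class << self
    attr_reader :records

    def exists?
      @records.any?
    end

    def create(attributes)
      @records << attributes
      attributes
    end
  end
end

# Custom converter: an empty string becomes nil, which the database stores as NULL.
blank_to_nil = ->(field) { field == "" ? nil : field }

namespace :db do
  desc "Import airports from a CSV file that has a header row"
  # In a Rails app this would also depend on :environment, e.g.
  #   task :import_airports, [:path] => :environment
  task :import_airports, [:path] do |_task, args|
    next if Airport.exists? # assume an earlier run already loaded the data

    CSV.foreach(args[:path], headers: true, header_converters: :symbol,
                             converters: [blank_to_nil]) do |row|
      Airport.create(row.to_h) # header names match the field names
    end
  end
end
```

With the hypothetical name above, the task would be run as `bundle exec rake "db:import_airports[path/to/file.csv]"`. The early `next` is what makes a second run a no-op.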
00:32:50.320 We just looked at a Rake task that loads the CSV file and imports it. To invoke this custom Rake task, we use 'bundle exec' to ensure it runs with our Rails app’s Gemfile context. We have 'rake db:seed', part of the 'db' namespace. Let’s imagine our logs are getting out of hand; too much logging might indicate we need to clean up that process. There’s also a Rake task for that in Rails. You can look at your log directory and see that the test log is significantly bigger than the development log.
00:34:33.840 If you run tests frequently, perhaps using Guard, which reruns tests when you save, the test log can grow quickly. Running 'bundle exec rake log:clear' opens and truncates each file in your log directory. However, it won’t print anything to tell you which files were cleared, which would be nice to know, but Ruby on Rails is open source, so there's always a way to modify that. I invite you to find me afterward if you're interested in discussing improvements.
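As an illustration of the improvement mentioned above, here is a minimal sketch of a log-clearing task that reports what it truncated. It is not Rails' `log:clear` implementation; the task name `logs:clear_verbose` and its `dir` argument are hypothetical.

```ruby
require "rake"
extend Rake::DSL # harmless in a Rakefile; makes task/desc/namespace available in a plain script

namespace :logs do
  desc "Truncate every .log file in a directory and report each one"
  task :clear_verbose, [:dir] do |_task, args|
    args.with_defaults(dir: "log") # default to the app's log directory

    Dir.glob(File.join(args[:dir], "*.log")).sort.each do |path|
      bytes = File.size(path)
      File.truncate(path, 0) # empty the file but keep it in place
      puts "Cleared #{path} (#{bytes} bytes)"
    end
  end
end
```

Truncating rather than deleting keeps the file in place, so a process that already has the log open can keep writing to it.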
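If the same three tasks are chained often, they could be wrapped in a single task that lists them as prerequisites. This is a sketch, not something shown in the talk: the task name `dev:refresh` is hypothetical, and the stand-in tasks exist only so the example runs outside a Rails app, where 'db:migrate', 'db:seed', and 'log:clear' are not defined.

```ruby
require "rake"
extend Rake::DSL # harmless in a Rakefile; makes task/desc/namespace available in a plain script

# Stand-ins so the sketch runs outside a Rails app. They are skipped when a
# task with that name is already defined; drop them in a real app.
%w[db:migrate db:seed log:clear].each do |name|
  next if Rake::Task.task_defined?(name)

  task(name) { puts "(stand-in) #{name}" }
end

namespace :dev do
  desc "Migrate, seed, and clear logs in one go (hypothetical task name)"
  task refresh: %w[db:migrate db:seed log:clear] # prerequisites run in this order
end
```

With this in place, `bundle exec rake dev:refresh` would do the same work as listing the three tasks on the command line.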
00:38:24.240 Remember, documentation goes beyond simple comments; your app can communicate its capabilities and usage. The routes file is a vital resource, which I always check when dealing with legacy Rails applications. The routes file shows the available routes and actions for your app, providing clear insights into how users can interact with it.
00:39:15.200 Finally, I’d like to point out that Rake not only benefits Rails projects; it’s a Ruby scripting utility that can assist in non-Rails Ruby projects as well. Simply include the Rake gem and create a Rakefile.
00:39:59.520 We can create simple Rake tasks with specific descriptions. Here’s a simple example task that outputs 'hello' when invoked. Adding the 'world' task next, we can create them in a namespace to group similar tasks together. The third task depends on the first two, ensuring they run in sequence. It outputs 'Hello, world' as it processes those tasks.
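The three tasks described above might be written like this. The slide code is not in the transcript, so the namespace name `greet` and the task names are assumptions; `desc`, `task`, and `namespace` are standard Rake DSL methods.

```ruby
require "rake"
extend Rake::DSL # harmless in a Rakefile; makes task/desc/namespace available in a plain script

namespace :greet do
  desc "Print the first half of the greeting"
  task :hello do
    print "Hello, "
  end

  desc "Print the second half of the greeting"
  task :world do
    puts "world"
  end

  desc "Run hello, then world"
  task all: [:hello, :world] # prerequisites run first, in the order listed
end
```

Run from a Rakefile, `rake greet:all` runs `greet:hello` and then `greet:world`. The `all` task has no body of its own; its prerequisites do all the work.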
00:41:36.320 We also added a fourth task that takes parameters, such as a name. In this case, it outputs a greeting including that name. Rake tasks can also be invoked inside Ruby scripts. To do so, include Rake in the script and call the Rake task name. You can even pass in parameters in parentheses.
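A task that takes a parameter, and a call to it from Ruby code, can be sketched as follows. The names `greet:person` and `name` are assumptions, since the slide code is not in the transcript; `Rake::Task[...]`, `invoke`, and `with_defaults` are part of Rake itself.

```ruby
require "rake"
extend Rake::DSL # harmless in a Rakefile; makes task/desc/namespace available in a plain script

namespace :greet do
  desc "Greet someone by name"
  task :person, [:name] do |_task, args|
    args.with_defaults(name: "world") # used when no name is passed
    puts "Hello, #{args[:name]}!"
  end
end

# Invoking the task from Ruby code, passing the parameter in parentheses.
Rake::Task["greet:person"].invoke("Barrett")
```

From the command line the argument goes in square brackets, for example `rake "greet:person[Barrett]"`; the quotes keep the shell from interpreting the brackets. A task runs at most once per process unless `reenable` is called on it before the next `invoke`.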
00:44:29.080 Remember to set a default task within your Rakefile so that running Rake without specifying a task yields a friendly response. Rake executed without arguments runs that default task, so it is a good place to display the available tasks and their descriptions (the same list 'rake -T' prints), helping users know what's available.
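One way to get that friendly response is a default task that prints every described task. This is a sketch of one possible approach, not the Rakefile from the talk. It assumes that Rake keeps task descriptions only when `Rake::TaskManager.record_task_metadata` is enabled before the tasks are defined, which `rake -T` does on its own but a plain `rake` run may not.

```ruby
require "rake"
extend Rake::DSL # harmless in a Rakefile; makes task/desc/namespace available in a plain script

# Ask Rake to keep task descriptions (assumption: a plain run may not keep them).
Rake::TaskManager.record_task_metadata = true

desc "Print hello"
task(:hello) { puts "hello" }

desc "Print world"
task(:world) { puts "world" }

# Hypothetical default task: list every task that has a description.
task :default do
  Rake::Task.tasks.each do |t|
    next unless t.comment # skip tasks without a description

    puts "rake #{t.name.ljust(10)} # #{t.comment}"
  end
end
```

The default task has no description of its own, so it leaves itself out of the list.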
00:47:30.800 Furthermore, Rails ships with a plethora of Rake tasks, all of which are accessible for review in the Rails gems. By examining these, you can gain insight into their function and potentially contribute your own improvements. For instance, with the log task, it simply truncates log files; modifying its functionality could be possible based on your needs.
00:50:29.920 Ultimately, Rake is a powerful automation tool that can help streamline common tasks. Whether managing development processes or ensuring your database is clean and up-to-date, creating custom Rake tasks can greatly enhance your workflow. If you have any questions, or want to discuss data visualization, let me know; my upcoming book on that topic will be available online soon.