How to migrate to Active Storage without losing your mind

by Colleen Schnettler

The video titled "How to migrate to Active Storage without losing your mind," presented by Colleen Schnettler at RailsConf 2019, explores the challenges and solutions involved in migrating a production application from Paperclip to Active Storage in Ruby on Rails. Colleen shares her journey of migrating a Rails application utilizing Amazon S3 for storage and provides insights into the inner workings of Active Storage.

Key Points Discussed:
  • Introduction to Active Storage:

    Active Storage allows Rails applications to easily attach files to Active Record objects and store these files in cloud-based storage providers. It's important to migrate because Active Storage is now the default solution for handling file uploads in Rails, and Paperclip has been deprecated.

  • Migration Steps:

    1. Install Active Storage and configure:
      Colleen explains the installation process, configuring cloud storage, and creating necessary tables in the database.
    2. Moving Data:
      The process of migrating data from the existing user table (where Paperclip stores attachment metadata) to the new Active Storage tables (attachments and blobs) is critical.
    3. Writing a Rake Task:
      She discusses writing a Rake task to facilitate this migration, emphasizing understanding the structure of the existing data and mapping it correctly to the new tables.
  • Common Pitfalls:

    Colleen elaborates on potential issues, particularly focusing on the correct handling of keys and checksums when moving files to Amazon S3. She highlights how data relationships defined by Paperclip need careful consideration to avoid errors during migration.

  • Testing the Migration:

    After running the Rake task, it’s essential to verify the migration's success by checking that the correct number of records exists in the new tables and optionally peeking into the database to ensure data integrity.

  • Variant Processing:

    Active Storage's support for image variants offers new possibilities for image sizing, although she notes that the new image processing tools had some limitations compared to the previous ones.

  • Conclusion and Best Practices:

    Colleen concludes with a summary of her steps and stresses the necessity of validating each phase of the migration. She points out that while Active Storage may seem straightforward to use, it requires a solid understanding of different configurations and careful handling of existing data to ensure a successful transition.

By the end of this talk, attendees are equipped with knowledge on how to approach their own migrations confidently, avoiding common pitfalls and ensuring a smooth transition to Active Storage.

Main Takeaways:

- Active Storage is a robust solution for handling file attachments in modern Rails applications.
- Migration requires careful planning, awareness of database structures, and writing scripts for data mapping.
- Ensuring data integrity during the migration process is essential for a successful transition.

00:00:21.260 Hi everyone.
00:00:22.920 How about this amazing RailsConf 2019?
00:00:26.789 I hope you all have had an amazing week like I have, and I really appreciate you coming to my talk because I know you're probably tired.
00:00:34.110 My name is Colleen, and I run a Ruby on Rails consulting business.
00:00:36.629 Today, I'm here to share my adventures, or misadventures as they were, migrating a production application from Paperclip to Active Storage using Amazon S3 storage.
00:00:40.980 Actually, I used Shrine when I did this for my client, but for the purposes of this talk, I'm going to use Paperclip.
00:00:43.170 The five people that responded to my Twitter poll said they use Paperclip more than Shrine.
00:00:47.940 But first, I'd like to start with a little story.
00:00:51.449 So how did I get here?
00:00:54.149 I was contacted by a cool new startup looking for a Rails developer to do just that—migrate their solution from Shrine to Active Storage.
00:00:56.489 I was excited to work with this company, and this was the first time I was going to get to use Active Storage.
00:01:02.250 I had actually attended the Active Storage talk, I believe it was at RailsConf last year, so I felt quite confident in my ability to migrate this application.
00:01:06.509 For those of you who are not yet on Rails 5.2, let's start with what Active Storage is.
00:01:11.940 Active Storage is an easy way to attach files to Active Record objects and store those files in cloud-based storage.
00:01:17.729 Have you ever needed to add an avatar to a user or maybe a resume to an applicant? Active Storage helps you take care of all of those file attachment needs.
00:01:22.920 Well, that's great, Colleen, but Paperclip is working fine for me. Why should I go through the trouble of switching?
00:01:26.789 That's a good question. Why should you migrate to Active Storage?
00:01:30.209 The first and possibly most important reason is that Active Storage is now the built-in solution for handling file uploads to cloud storage in Rails.
00:01:34.560 It supports Amazon, Google, and Microsoft, and here's something fun: there are no additional migrations needed!
00:01:40.100 You may remember that with Paperclip, every time you added a new attachment, you had to write a new migration.
00:01:45.660 Active Storage is different—it doesn't work that way.
00:01:47.520 And if I still haven't convinced you, Paperclip is deprecated, so you're out of luck!
00:01:51.449 So, I accepted the contract, and the first thing I did was I went and looked at the Active Storage docs.
00:01:56.209 In my experience, the documentation for Rails is usually excellent, and Active Storage appeared to be no different.
00:02:01.530 Step one: install Active Storage.
00:02:04.530 Step two: configure cloud storage.
00:02:08.610 Step three: add an attachment to a model.
00:02:11.310 Step four: let the magic of Rails abstract away all of the heavy lifting for you.
00:02:14.129 And it just works!
00:02:17.069 Well, has anyone tried to migrate an application to Active Storage by simply following these steps?
00:02:22.920 If you have tried, you might know that implementing Active Storage in a new application is relatively easy.
00:02:27.600 But migrating to Active Storage can be quite challenging.
00:02:30.720 Why is that?
00:02:34.560 Well, Active Storage is fundamentally different from Paperclip.
00:02:37.200 Paperclip works by attaching file data to the user table.
00:02:41.520 For example, here we have an avatar for a user.
00:02:46.050 So if we add an avatar to our user using Paperclip, it's going to change the users table.
00:02:50.040 It adds four columns to your users table.
00:02:53.670 Active Storage, however, creates two new tables: the active storage attachments table and the active storage blobs table.
00:02:56.910 So if we revisit our steps, I would say that step three, adding an attachment to a model, needs to be changed.
00:03:02.400 That's because Active Storage is not going to be able to access the data, since there is currently nothing in your Active Storage tables.
00:03:06.209 However, we can still perform step one and step two.
00:03:10.590 Step one is to install Active Storage, create the tables, and configure your cloud storage.
00:03:14.430 The way this is set up right here is we have an Amazon S3 bucket that acts as our production storage and another Amazon bucket as our dev storage.
00:03:17.630 I created this contrived example for this talk so you can see I came up with a clever bucket name.
00:03:21.600 When we set this up on our production application, this is how it was done.
00:03:26.100 It's going to depend on your setup, but I would highly recommend testing this on a dev bucket in your cloud storage provider.
00:03:29.700 After you configure it in storage.yml, you then have to configure it on a per-environment basis.
00:03:33.920 What I'm showing you here is development, which is configured to use Amazon dev, and production that uses Amazon S3.
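The two-level setup described here (named services in storage.yml, then one service chosen per environment) can be sketched in plain Ruby so that it runs without a Rails app. This is only an illustration: the service names, region, and bucket names are hypothetical, and credentials are left out because they would normally come from Rails credentials or environment variables.

```ruby
require "yaml"

# Hypothetical contents of config/storage.yml: one S3 service per bucket.
# Service names, region, and bucket names are invented for illustration.
STORAGE_YML = <<~YAML
  amazon:
    service: S3
    region: us-east-1
    bucket: my-app-production
  amazon_dev:
    service: S3
    region: us-east-1
    bucket: my-app-dev
YAML

# Per-environment choice. In a Rails app this is a line such as
# `config.active_storage.service = :amazon_dev` in
# config/environments/development.rb.
SERVICE_FOR_ENV = {
  "development" => "amazon_dev",
  "production"  => "amazon"
}.freeze

# Return the bucket that a given environment would read from and write to.
def bucket_for(env, services: YAML.safe_load(STORAGE_YML), mapping: SERVICE_FOR_ENV)
  service_name = mapping.fetch(env)
  services.fetch(service_name).fetch("bucket")
end

puts bucket_for("development")
puts bucket_for("production")
```

Keeping a separate dev bucket means a mistake in the migration script only ever touches test files.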
00:03:38.130 Great, that took just one minute!
00:03:42.030 At this point, you already have Active Storage installed and your Active Storage tables exist in your database.
00:03:47.840 Now, let's talk about step three.
00:03:50.760 I've changed step three to say: move avatar data from the user table to the Active Storage tables.
00:03:56.280 How do we move data from one table to another in our database? A rake task.
00:04:00.300 So we are going to write a rake task together.
00:04:03.600 Let's talk about this rake task: we're going to be moving a substantial amount of data.
00:04:08.400 It's not one-to-one because we have one user table and two Active Storage tables.
00:04:10.800 We'll also be mapping some data.
00:04:13.080 Understanding what we are trying to do is essential.
00:04:16.470 So we are moving this data from the users table to the Active Storage attachments and blobs, and we're technically copying it.
00:04:24.600 Reaching into my database to change records on a production application can be a bit scary.
00:04:28.800 I was told I wouldn't have to write any SQL, but unfortunately, that doesn't seem to be the case here.
00:04:32.640 Before we jump into the rake task, let's discuss the Active Storage tables.
00:04:36.810 The first table I want to talk about is the Active Storage attachments table.
00:04:42.420 We'll start with a name, which is the name of your attachment—in this case, 'avatar'.
00:04:45.960 Then you have your polymorphic association columns: record type and record ID, followed by your blob ID.
00:04:52.500 Now, the second table is the blobs table.
00:05:00.120 If we look at the blobs table, the key is the location of your current file in Amazon S3 storage.
00:05:02.700 Then you have your file name, your content type, the byte size, and your checksum.
00:05:05.340 Now, how do these tables relate to one another?
00:05:08.520 On your left is the users table, and on your right is the Active Storage attachments table.
00:05:12.300 The User model becomes our record type, the user's ID becomes our record ID, and the name becomes 'avatar'.
00:05:15.030 You can see that the avatar file name from our users table will go to our blob as the file name.
00:05:19.740 The avatar content type will go to the content type, and file size will map to byte size.
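The column mapping just described can be written down as two small functions. This is a minimal sketch using plain hashes rather than database rows. The sample user, the placeholder key and checksum, and the helper names `blob_row` and `attachment_row` are assumptions made for illustration, and the timestamp columns on the real tables are omitted for brevity.

```ruby
# One row of a hypothetical users table, holding Paperclip's avatar columns.
SAMPLE_USER = {
  id: 42,
  avatar_file_name: "colleen.png",
  avatar_content_type: "image/png",
  avatar_file_size: 20_480
}.freeze

# Build the values for one active_storage_blobs row. `key` and `checksum`
# are passed in because how they are computed depends on your Paperclip
# and S3 configuration.
def blob_row(user, attachment, key:, checksum:)
  {
    key: key,
    filename: user.fetch(:"#{attachment}_file_name"),
    content_type: user.fetch(:"#{attachment}_content_type"),
    byte_size: user.fetch(:"#{attachment}_file_size"),
    checksum: checksum
  }
end

# Build the values for one active_storage_attachments row, which links
# the record (type and ID) to its blob.
def attachment_row(user, attachment, record_type:, blob_id:)
  {
    name: attachment,
    record_type: record_type,
    record_id: user.fetch(:id),
    blob_id: blob_id
  }
end

p blob_row(SAMPLE_USER, "avatar", key: "placeholder-key", checksum: "placeholder-checksum")
p attachment_row(SAMPLE_USER, "avatar", record_type: "User", blob_id: 1)
```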
00:05:27.930 Now we can start working on that rake task.
00:05:35.790 The good people at ThoughtBot put together the skeleton of a task that serves as an excellent starting point.
00:05:41.520 As I mentioned, they actually use a migration, but I advocate using a rake task.
00:05:44.130 If we look at this, we get our blob ID, and these two statements define our insert statements.
00:05:47.040 This is all cut and paste for you. After this, we're looping through all of the models and pulling out the attachment names.
00:05:52.410 The important thing to realize here is that this code used to pull out the attachment name is specific to Paperclip.
00:05:56.310 That's how Paperclip names the files on your user table.
00:06:00.870 So the string avatar_file_name is what we're looking for.
00:06:03.600 If you have one or two models with attachments, you do not have to do all of this.
00:06:07.290 You can directly call out the model and the attachment name instead of looping through every model.
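Because Paperclip names its columns after the attachment, the attachment names can be recovered from a model's column names. The sketch below works on a plain array of column names, which in a Rails app could come from `User.column_names`. The helper name and the sample columns are hypothetical.

```ruby
# Paperclip adds a "<name>_file_name" column for each attachment, so the
# attachment names can be recovered from a model's column names.
def paperclip_attachment_names(column_names)
  column_names.map { |column| column[/\A(.+)_file_name\z/, 1] }.compact
end

# Hypothetical column list for a users table with a single avatar attachment.
USER_COLUMNS = %w[
  id email
  avatar_file_name avatar_content_type avatar_file_size avatar_updated_at
].freeze

p paperclip_attachment_names(USER_COLUMNS)
```

With only one or two known attachments, hard-coding the model and the name "avatar" is simpler than scanning every model.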
00:06:10.590 Now, this instance represents your user, and in our example, we have the user avatar.
00:06:14.430 The statement user.avatar.path is important because it relies on the Paperclip relationship.
00:06:17.730 It is essential to note that this process requires two deploys.
00:06:22.050 Why does this process require two deploys?
00:06:25.920 The rake task we are building needs that user avatar relationship defined by Paperclip.
00:06:29.550 It also needs the Active Storage tables because it requires a destination for the data.
00:06:33.550 Now, Active Storage needs data in those tables before it can find your files, so you cannot switch your models over to Active Storage without first running the rake task.
00:06:39.300 And the rake task is dependent on Paperclip.
00:06:42.120 Let’s return to our rake task.
00:06:45.780 What we have here is the blob insert statement.
00:06:48.960 The key and checksum methods are important and need to be written by you.
00:06:52.500 I did not include my specific solution because yours will be specific to your Paperclip and Amazon S3 configuration.
00:06:55.680 The key is how Active Storage will look for your files.
00:07:00.840 Before I move on, I have to mention a potential pitfall.
00:07:05.640 Coming from Paperclip, I assumed the key would simply be user.avatar.path, but that can lead to issues.
00:07:10.380 Make sure your path does not return a leading forward slash unless that's the intention.
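One way to guard against this pitfall is to normalise the path before storing it as the blob key. This is a sketch under the assumption that the object's location in the bucket has no leading slash. The helper name and the sample path are hypothetical, and the correct key depends on your own Paperclip and S3 configuration.

```ruby
# Turn a Paperclip-style path into a storage key by removing any leading
# slashes. Paths that are already relative are returned unchanged.
def storage_key(paperclip_path)
  paperclip_path.sub(%r{\A/+}, "")
end

puts storage_key("/users/avatars/42/original/colleen.png")
puts storage_key("users/avatars/42/original/colleen.png")
```

Comparing a few generated keys against the object names actually listed in the bucket is a quick check before running the task against every record.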
00:07:15.600 Now, concerning checksums: when I did this on production, we had about 80,000 images.
00:07:21.990 I ran each image through the MD5 process; some gems might provide the checksum automatically.
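Active Storage stores the checksum as a base64-encoded MD5 digest, so the value written to the blobs table has to be in that form. A minimal sketch using Ruby's standard library follows. The helper names are hypothetical, and the temporary file stands in for an image that has already been downloaded.

```ruby
require "digest/md5"
require "tempfile"

# Checksum for data that is already in memory.
def checksum_for_data(data)
  Digest::MD5.base64digest(data)
end

# Same checksum, computed by reading the file from disk in chunks rather
# than loading it into memory in one go.
def checksum_for_file(path)
  Digest::MD5.file(path).base64digest
end

# Demonstrate that both approaches agree, using a throwaway file.
Tempfile.create("avatar") do |file|
  file.binmode
  file.write("pretend image bytes")
  file.flush
  puts checksum_for_file(file.path) == checksum_for_data("pretend image bytes")
end
```

For tens of thousands of remote images, fetching the bytes dominates the run time, so it helps if the task can be restarted without redoing finished records.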
00:07:26.070 The last step is writing the records to your attachments table, which includes your model name and instance ID.
00:07:30.600 That's the entire rake task.
00:07:36.150 Next, after you run your rake task, determine if it worked.
00:07:40.200 The quickest way to do this is to see if the correct number of blob and attachment records were created.
00:07:43.620 If you're feeling adventurous, you can check individual records in your database.
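The count check can be pushed a little further by also listing which users were missed and which attachments point at a blob that does not exist. The sketch below runs on in-memory sample data. In a real application the three collections would be read from the users, attachments, and blobs tables, and the helper name `migration_report` is hypothetical.

```ruby
# Compare source rows with migrated rows for one attachment name.
def migration_report(users, attachments, blobs, name: "avatar", record_type: "User")
  expected_ids = users.select { |u| u[:"#{name}_file_name"] }.map { |u| u[:id] }
  migrated = attachments.select { |a| a[:name] == name && a[:record_type] == record_type }
  blob_ids = blobs.map { |b| b[:id] }

  {
    expected: expected_ids.size,
    attachments: migrated.size,
    blobs: blobs.size,
    missing_user_ids: expected_ids - migrated.map { |a| a[:record_id] },
    orphaned_attachments: migrated.reject { |a| blob_ids.include?(a[:blob_id]) }.size
  }
end

# Sample data: user 2 has no avatar, and user 3 was not migrated.
USERS = [
  { id: 1, avatar_file_name: "a.png" },
  { id: 2, avatar_file_name: nil },
  { id: 3, avatar_file_name: "c.png" }
].freeze
ATTACHMENTS = [
  { name: "avatar", record_type: "User", record_id: 1, blob_id: 10 }
].freeze
BLOBS = [{ id: 10 }].freeze

p migration_report(USERS, ATTACHMENTS, BLOBS)
```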
00:07:47.760 I feel like I sped through a lot of code there, so let's do a brief overview.
00:07:50.880 We have created the Active Storage tables by installing Active Storage and running the migrations.
00:07:53.160 We set up the cloud storage through storage.yml at the configuration level.
00:07:57.540 We wrote the rake task to create user avatar records in the attachments and blobs tables.
00:08:03.420 We sourced the data from the user table, and hopefully, we've confirmed that records were created.
00:08:06.240 But we don't know if they're correct unless we check our database.
00:08:11.880 Before moving on, I recommend checking out a new branch.
00:08:14.880 You don't have to do this; you can push a branch, run your rake task, and then push another for Active Storage.
00:08:18.030 But it's easier to work with a new branch to ensure everything operates smoothly.
00:08:22.260 This was my preferred method; I made mistakes with the key initially.
00:08:26.520 So, I had one branch with Paperclip and another for Active Storage.
00:08:29.940 If it doesn't work, you can clear the Active Storage records, fix the rake task, and try again.
00:08:34.560 After successfully installing Active Storage, we need to alter our code, models, and views to use this functionality.
00:08:40.200 This is why it appears easy in the documentation; you just add 'has_one_attached'.
00:08:43.560 It only works if there is data present.
00:08:49.049 For instance, if you're using multiple sizes of images, you’ll utilize something called variants.
00:08:53.520 What’s cool about variants is that you can pick your image size on the fly without being constrained by predefined sizes.
00:08:56.150 If you're working with images, Active Storage does lazy transformations on the original blobs and caches the variants.
00:09:01.470 I was working for a client to migrate this application and had a rake task, ensuring it was working.
00:09:08.760 But around thirty percent of our images were blurry, which was quite worrisome.
00:09:13.200 Why were thirty percent of our images blurry?
00:09:15.900 They were blurry because Active Storage uses MiniMagick for image transformation.
00:09:20.400 Unfortunately, MiniMagick does not support the advanced image processing we had previously utilized with Shrine.
00:09:27.240 This became a significant pain point for us.
00:09:30.420 Fortunately, there's a happy ending.
00:09:34.710 This experience was about eight months to a year ago.
00:09:38.160 I feel we might have been a bit early adopters of Active Storage, mainly due to that image processing issue.
00:09:43.560 However, Rails 6 should be addressing this specific issue.
00:09:48.600 Active Storage in Rails 6 has deprecated MiniMagick and is now utilizing the Image Processing gem.
00:09:53.250 Fortunately, now the resize functions that didn’t work with MiniMagick should function properly.
00:09:59.160 We have already discussed the steps we took: deploy with Paperclip and the Active Storage tables in place, then run the rake task.
00:10:05.040 The next step is to deploy with the Active Storage models and views.
00:10:08.180 If that works, then you have made good progress on your migration to Active Storage.
00:10:11.520 Let’s revisit all of our steps.
00:10:16.000 We installed Active Storage, configured cloud storage, and moved avatar data from the user table into the Active Storage tables.
00:10:19.950 Now Active Storage can perform its magic.