Jessica Dillon
Implementing a Visual CSS Testing Framework

In this talk at RailsConf 2015, Jessica Dillon walks through implementing a visual CSS testing framework that uses automatic screenshot comparison to catch style regressions in a large CSS codebase. The framework grew out of the pain of manually checking numerous pages after every CSS change, a process that is slow and error-prone.

Key points include:

  • Introduction to Bugsnag: Jessica begins by introducing herself and her role at Bugsnag, a company specializing in error monitoring tools.
  • Challenges with CSS testing: Working with extensive CSS code requires rigorous testing due to potential visual regressions that can arise from even minor code modifications.
  • Searching for existing frameworks: The team experimented with libraries such as Huxley and Quixote but found them unreliable or a poor fit for their needs.
  • Development of a custom framework: After assessing the shortcomings of existing tools, the team decided to build its own solution. This involved defining requirements for taking screenshots and comparing visual output between feature branches and production (built from the master branch).
  • Technical implementation:
    • They decided on using RSpec for testing, allowing them to keep visual tests separate from main tests to improve build speeds.
    • Selenium was chosen for browser automation to capture full-page screenshots, while ImageMagick was utilized for comparing screenshots to generate diffs.
    • The integration with AWS for storing screenshots aimed to make them easily accessible through a custom viewing tool in their admin dashboard.
  • Final thoughts and potential improvements: Jessica concludes with ideas for future enhancements, such as better handling of diffs, automatic traversal of pages, and metrics capture from ImageMagick.

Overall, the implementation of this visual testing framework aims to streamline the CSS development process at Bugsnag, allowing developers to confidently manage and refactor CSS without the fear of introducing regressions.

00:00:12.240 Hi everyone! I got a response. Hi!
00:00:19.000 This is awesome! Okay, thanks for having me here. I'm going to get started.
00:00:26.000 Today, I'm talking about implementing a visual CSS testing framework. We're going to be using automatic screenshot comparison to catch style regressions.
00:00:32.160 My name is Jessica. Just to introduce myself, I'm jessicart on most of the internet. I work at a company called Bugsnag based in San Francisco.
00:00:37.480 Bugsnag is an exception monitoring tool, and I'm a software engineer there, primarily working in Ruby and JavaScript. Our stack includes many languages, and we also provide error notifiers for various languages and frameworks.
00:00:48.360 We support things like .NET, Objective-C, and Angular, enabling people to monitor their errors and crashes from all their different applications all in the same place. We are currently hiring, so please get in touch with me either here or at our booth at RailsConf.
00:01:07.439 If you're interested in working with developer tools at a small company, we have mugs and stickers at the booth, so feel free to grab those!
00:01:18.200 I also wanted to let you know that I have a written version of this talk available, as I might talk fast or some of the slides might go by quickly. If you're interested, you can find it on the Bugsnag blog.
00:01:29.600 Now, back to implementing a visual CSS testing framework. What am I even talking about? Writing, reading, and reviewing CSS can be pretty intense. Refactoring, especially, can be quite a challenge.
00:01:48.320 Generally, this is what my face looks like when I'm working with CSS.
00:01:53.759 At Bugsnag, we decided to tackle a huge multi-week project that involved an entire organizational and code refactor. We wanted a way to test that our site looked the same despite making significant code changes.
00:02:07.759 Unfortunately, that didn't always work out for us.
00:02:13.160 We went through many iterations of refactoring and realized we needed a tool to help us test the pages automatically. Otherwise, our testing process would look something like, 'Did you visit all the pages? Have you clicked on that? What about that border?'
00:02:25.080 This method was wasting a lot of developer time, so we needed a better solution.
00:02:31.040 We started hunting for a way to test our CSS. We wanted to know if there was a tool already built that could do what we desired, but we weren't exactly sure what we wanted yet.
00:02:42.280 After some digging online, we found several libraries that performed similar functions. It took quite a bit of effort to compile a comprehensive list.
00:02:54.400 From the list we found, we decided to try a few and see what happened. One of the first frameworks we stumbled upon was Facebook's open-source library, Huxley.
00:03:07.599 Huxley's README states that it takes screenshots while you browse and notifies you when they change. That sounded amazing and seemed like something we might be looking for.
00:03:19.640 However, I noticed that it hadn't been updated in over a year, which wasn't promising. I thought maybe it was completely done, and there were no bugs in it.
00:03:31.799 After spending a solid day fiddling around with it, I found it did work sometimes, but it was a bit too buggy for our needs. We didn't want to deal with additional bugs when we were trying to find existing ones.
00:03:47.400 It would have random failures and sometimes wouldn't take screenshots. So, we realized this wasn't the tool we were looking for and moved on.
00:04:02.360 Next, we tried a different library called Quixote. This tool lets you make assertions about your page's elements and how they are styled in the browser.
00:04:14.360 It sounded interesting, but upon checking the example code, I quickly decided I didn't want to use a library that required manually checking pixel distances, like whether an element was 10 pixels away.
00:04:28.919 Designs iterate quickly, and we don't want to be manually checking all those different heights. I was looking for a smarter framework, so this one didn't work.
00:04:42.240 We tried a few more libraries, but the same failures occurred. We were still unable to find what we wanted.
00:04:54.560 As I mentioned, there are a lot of frameworks out there, so before you rush home to write your own, check out existing ones to see if they meet your needs.
00:05:05.800 But I decided I needed to clarify what I really wanted. I started thinking about what would best fit how Bugsnag is built.
00:05:17.360 I wanted a visual tool to take screenshots rather than manually measure everything. We wanted a way to compare our production site with our local development.
00:05:30.800 For instance, let's say we have a feature branch that we just committed to; we would want to compare how our homepage looks on that branch versus how it looks in production.
00:05:50.759 In that context, we wanted to highlight any differences automatically in a way that was easy to visualize.
00:06:09.319 At Bugsnag, our web dashboard is built with Rails. This, combined with the fact that I wanted to take screenshots for testing, influenced my decision to write my own framework.
00:06:22.479 Many existing CSS testing frameworks are based on JavaScript, but I wanted to leverage our Rails environment.
00:06:40.760 At Bugsnag, we use Git for source control in a feature branch workflow. This means we have a master branch that is always deployable and stable.
00:06:58.639 When we create a feature, we branch off of master until the feature is complete, at which point we merge it back.
00:07:15.479 Considering the tools available to us and examining some of the screenshot libraries, I realized there wasn't actually that much code to them.
00:07:27.520 So, I decided to write my own framework. However, just as a disclaimer, this talk is not about promoting a gem; it's about walking you through my process.
00:07:41.240 In fact, this isn't a gem or open source; I do have a blog post available if you're interested in the code.
00:07:49.600 First, I needed to develop a process for how I wanted my tests to work. I required a way to automatically visit the pages of our site through an actual browser.
00:08:09.600 Once the test visited the page, I wanted it to take a screenshot of the entire page, not just the current viewport.
00:08:27.800 This was important in case changes occurred below the footer; we needed to capture those areas as well.
00:08:46.480 Next, I had to establish a storage method for the screenshots. I needed a way to upload and download these screenshots from that storage.
00:08:59.200 Using Git, I planned to upload a screenshot every time I pushed to a branch.
00:09:05.800 My feature branch screenshots would need to upload to a storage area and allow me to download the already uploaded master screenshots.
00:09:30.480 After uploading my screenshots, I needed a way to create diffs between them.
00:09:47.680 This involved comparing the latest screenshot from my feature branch with the one downloaded from the master branch, and marking the visual differences.
00:10:08.440 Finally, I wanted to display these diffs in an accessible way so everyone on the project could view the differences depending on the commit.
00:10:27.680 Now that I had a plan, I could start building our framework based on these requirements.
00:10:44.600 First, we needed to write tests that run automatically after each push, so we decided to use RSpec.
00:10:55.440 RSpec is a testing tool for Ruby, and we already use it for our tests in the Rails app.
00:11:09.840 We aimed for our specs to look simple, where we could navigate to a local URL and save a screenshot of that page.
00:11:22.160 We didn't want complex assertions in these specs; they should only fail when something goes wrong technically, such as a page failing to load.
00:11:39.160 Furthermore, we needed to keep these tests separate from our main tests, so we marked them with the visual tag in RSpec.
00:11:55.520 This way, our visual specs wouldn't run with our main specs unless we specifically wanted them to.
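A minimal sketch of that tagging setup, assuming standard RSpec; the file paths and the `VISUAL` environment variable are my own illustrative choices, not Bugsnag's actual configuration:

```ruby
# spec/spec_helper.rb (sketch): exclude any example tagged `visual: true`
# from the default run unless explicitly requested.
RSpec.configure do |config|
  config.filter_run_excluding visual: true unless ENV["VISUAL"]
end

# spec/visual/homepage_spec.rb (sketch)
RSpec.describe "marketing homepage", visual: true do
  it "looks the same as master" do
    # navigate to the local URL and save a screenshot of the page
  end
end
```

You can also run only the visual specs with `rspec --tag visual`, which is what a separate CI step would invoke.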
00:12:12.400 Separating these specs also helped with local build speed since we didn't want our tests bogged down by waiting for visual specs to complete.
00:12:27.200 This approach allowed us to iterate on our main specs quickly and push more often.
00:12:40.960 In Continuous Integration (CI), we wanted our main specs to be fast, enabling us to merge non-visual pull requests without waiting.
00:12:54.720 At Bugsnag, we use Buildkite for our CI, which allows us to add steps to our tests, separating our main specs from the visual ones.
00:13:09.600 Next, we needed a way to visit web pages and take screenshots with our RSpec tests, for which we decided to use Selenium.
00:13:21.440 Selenium is a tool for automating browsers for testing purposes, and we specifically needed to use their web driver API.
00:13:38.560 This API allows us to drive a browser natively on local or remote machines. We needed access to an actual browser since CI doesn’t come with built-in browsers.
00:13:55.920 To achieve this, we decided to use a service like BrowserStack. Before running our visual tests, we needed to start up our proxy to BrowserStack alongside a forked Rails server.
00:14:14.240 After that, we would create an instance of our Selenium web driver and, following our tests, terminate these services.
00:14:29.440 We also enabled WebMock in our visual tests to prevent outside web requests during tests, ensuring we could run real requests with our local server.
00:14:41.520 To set up our Selenium web driver, we passed it the desired capabilities like the browser name and version we wanted.
00:14:57.440 Unfortunately, I found out that the capability of taking full-page screenshots only worked with Firefox, which limited our options.
00:15:14.920 While we couldn't use this tool as a browser compatibility tool, it worked well for our current needs.
00:15:29.600 After setting up our Selenium driver and Rails server, we could then save our screenshots in our tests by navigating to our localhost URL.
00:15:44.480 We established a local screenshot directory for a clean area to store all the screenshots temporarily.
00:15:58.080 Once this was set up, we utilized our driver to save screenshots to the designated path with an appropriate naming convention.
00:16:10.360 When we wrote tests for static pages like our homepage, we anticipated issues with dynamic data producing false positives in diffs.
00:16:27.600 To mitigate this, we set up fixture data for our RSpec tests and adjusted any other data not covered by those fixtures using Selenium's JavaScript support.
00:16:40.440 Now that we could take screenshots, it was time to figure out how to create a diff between two screenshots.
00:16:53.760 ImageMagick was perfect for this, despite its somewhat outdated website.
00:17:07.680 ImageMagick is a tool used for converting, editing, and composing images, primarily through its command-line tools.
00:17:23.760 One of these tools is 'compare,' which, with the necessary options, allows us to shell out and produce diff screenshots based on two other images.
00:17:39.320 For example, if we made a simple change like altering the header, ImageMagick would be able to spot those differences and provide a visual diff.
00:17:55.040 There are several options we can utilize with ImageMagick's compare tool, which I will explain next.
00:18:09.760 The compare tool will visually annotate differences between an image and its reconstruction, essentially producing a diff for us.
00:18:25.040 The compare tool can output a metric that gives us a measure of the differences according to a specified metric.
00:18:39.760 For instance, we used 'PAE,' which stands for Peak Absolute Error, to identify how much of a fuzz factor was necessary to make all the pixels similar.
00:18:55.440 This fuzz factor helps in cases where we want to ignore minor changes, such as when gradients render slightly differently in different browsers.
00:19:12.640 Right now, we don't utilize this output, but it could be valuable if we wanted to make our assertions fail meaningfully.
00:19:24.800 However, we don't strictly require failures upon detecting a diff, as it doesn't necessarily indicate an issue.
00:19:42.400 While working on the specs, I noticed occasions where diffs were not produced because the two screenshots unexpectedly had different heights.
00:20:04.160 For instance, if we accidentally removed a footer, ImageMagick wouldn't allow for a default comparison, so we utilized sub-image searching.
00:20:20.480 Sub-image searching makes ImageMagick search for the best location of a small image within a larger image.
00:20:36.560 This process can be slow, but it typically doesn’t happen often in our tests since we don’t usually modify layouts that drastically.
00:20:54.720 Another issue we encountered was when screenshots were completely different, leading to ImageMagick not providing a diff.
00:21:07.960 To address this, we found the 'dissimilarity threshold' option that allows us to determine how different two images can be to produce a diff.
00:21:19.680 The dissimilarity threshold defaults to 0.2 (20%); we raised it to 1 (100%) so that even substantially different images would still produce a diff.
00:21:34.800 This change, however, may slow down tests considerably, so we used it only when necessary.
00:21:47.120 The last arguments to ImageMagick are simply the paths to the current screenshot, master image, and where we want to save the diff.
00:22:05.520 Now that we have our screenshots and diffs, we needed a place to store them online so we could retrieve them in our Rails app.
00:22:20.760 We decided to use Amazon Web Services (AWS) for cloud storage, leveraging their Ruby API.
00:22:39.360 We created a bucket called 'BugsnagShots' where we would host our screenshots.
00:22:56.720 Within our specs, we called 'save_shot,' which managed our screenshots directory, taking snapshots and uploading them to AWS.
00:23:12.560 The 'save_shot' method was responsible for getting the current screenshot, master screenshot, and diff screenshot to AWS.
00:23:30.640 We would first find the correct area within our AWS bucket and upload the current screenshot.
00:23:50.480 Next, we would download the master screenshot needed to produce the diff.
00:24:06.760 After verifying we had the master screenshot, we would execute our ImageMagick comparison to generate the diff image.
00:24:28.880 Once that was complete, we would upload both the master screenshot and the newly generated diff to our AWS bucket.
00:24:44.880 We used a naming pattern incorporating the commit SHA, area of the site, page name, and image type for files in AWS.
00:25:00.400 An example could be a commit SHA of 'A1A1A1', indicating the marketing section of our site on the index page, uploading the diff for that page.
00:25:15.360 The image types could include the current screenshot, master screenshot, and possibly the diff.
00:25:30.480 Now that we had our images on AWS, we needed to view them, but accessing screenshots from a bucket directly was far from ideal.
00:25:43.440 We didn't want the entire team fiddling with the bucket, so we set up a custom viewing page in our admin dashboard.
00:25:57.440 We created a page that provided a list of current branches, along with their last three commits.
00:26:12.760 The controller action retrieves our remote branches and verifies they're available in AWS, then formats the branch names for the view.
00:26:28.200 We loop through the branches and offer a way to prune remote branches for cleanliness.
00:26:45.120 When you click into a specific area, you can view all screenshots and diffs corresponding to that section.
00:27:02.480 The controller action fetching images for our view grabs these from our AWS bucket, allowing easy visualization.
00:27:21.120 At this point, we've completed our tool, but I think improvements can still be made.
00:27:40.480 Currently, tests pass regardless of whether there is a diff, only failing for execution issues.
00:27:50.680 It might be worth considering if a diff should trigger a failure in the future, but this needs careful consideration.
00:28:05.000 We should also consider if we should avoid uploading diff images entirely when there is no visual change.
00:28:18.440 This could save space on AWS and reduce clutter on our admin dashboard.
00:28:34.960 We might even explore automatically linking these diffs to relevant GitHub PRs, but this could create excessive notifications.
00:28:50.920 Currently, we only diff between the latest commit on a branch and the most recent commit on master.
00:29:05.680 It could be useful to differentiate between the most recent and the previous commits on the same feature branch.
00:29:22.400 This would help in identifying changes pushed directly to master as well.
00:29:37.680 Additionally, I'd love to connect our tool to more browsers to establish it as a backward compatibility test.
00:29:55.440 Anyway, that's all I have! Feel free to find me online or during the conference to discuss this further or just say hi.
00:30:06.640 I also wanted to mention that I’m working on a book with Just Enough Media about a command-line tutorial series I wrote a while ago.
00:30:22.720 It's in its early stages, so feel free to visit that page for updates. Thanks for having me!
00:30:41.840 Do any of you have any questions?
00:30:52.760 So the first question was whether I’m telling the tool which pages to visit or if it traverses the site automatically.
00:31:04.320 Right now, I specify which pages to visit but we use Selenium to click around to check different states.
00:31:18.760 So, currently, there’s no automatic traversal, but that could be something to implement.
00:31:32.440 The next question was whether we’re capturing any metrics from ImageMagick.
00:31:51.160 While it provides some metrics, we aren’t capturing them yet, but it would be easy to implement that.
00:32:03.280 Finally, there was a question about committing images to check diffs in Git.
00:32:14.960 Until recently, committing images to Git wasn't recommended, because binary files quickly bloat the repository.
00:32:33.000 However, GitHub's new Git LFS addresses those problems, which we need to explore.
00:32:47.760 Thank you!