Getting Ready for I18n, Shopify's case study

00:00:00.030 Hello everyone! I hope you're all enjoying the day and that you've had a great time during the conference. Personally, I've had a wonderful time. I’d like to thank all the speakers for the quality of their talks; they were truly amazing. I learned a lot.

00:00:20.939 As the last talk of the day, I know I'm probably the last obstacle between you and that tasty Bulgarian beer! If you haven’t tried rakia yet, it’s really good but be careful; it’s pretty strong!

00:00:26.369 Earlier, I was thinking about presenting a really interesting topic, but I was worried you might fall asleep. So, let’s make this more exciting. Today, we're going to talk about localization, specifically focusing on the process we went through at Shopify to make our platform more accessible.

00:00:38.010 When discussing localization, the first thing to consider is translating your product so that users from different countries can use it. In our case, we needed to translate the admin area of our platform.

00:00:47.470 This involved literally translating hundreds of thousands of words, if not millions. Initially, we faced two main challenges: detecting hard-coded strings visible to users and preventing new hard-coded strings from being added.

00:00:59.760 To tackle this, we developed a tool to identify hard-coded strings and sent out an email to all developers, reminding them that any new content added must be translation-ready. However, we knew we couldn’t just rely on everyone reading their emails.

00:01:12.180 Due to time constraints, we initially decided to focus mainly on Ruby files and HTML, where we recognized most of the issues were surfacing. After analyzing our code base, we realized that a significant number of hard-coded strings were scattered throughout the application.

00:01:25.290 The majority came from various sources, including HTML and embedded Ruby in views, as well as in Ruby helper files and controllers. Surprisingly, we also found hard-coded strings within CSS files, which was an unexpected discovery. Today, we will focus on the challenges we faced in this localization effort.

00:01:48.370 In order to proceed effectively, we needed a strategy to localize our products. This started with a straightforward need: translating every product and ensuring users can understand the content based on their language and region.

00:02:04.990 To summarize, at Shopify, we realized that our administrative interface had to be fully translated. This necessitated a tremendous effort, as we were looking at processing hundreds of thousands of terms that needed localization.

00:02:20.069 Many developers think they can solve this quickly with a simple regex, but that approach proved more complicated than anticipated. Ultimately, we needed to build a tool that would accurately detect hard-coded strings across our application.

00:02:36.059 The first step was to identify the strings visible to users, as opposed to strings used only in internal logic. Then we had to make sure that developers did not inadvertently add new hard-coded strings. Implementing this change required a comprehensive understanding of our development process.

00:02:51.220 To manage the content that was being added, we focused on Ruby on Rails and HTML files. Initially, this approach seemed adequate, as most of the strings were embedded in these areas. We prioritized where most hard-coded strings were coming from.

00:03:03.780 To implement this, we used static analysis tools.

00:03:05.320 The main rationale behind our choice of a static analysis tool is that developers typically add content directly to the HTML body or through Rails helper methods. For example, when using the link_to method, the first argument is a string that will be visible to the user.
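The idea of flagging a string literal passed as the first argument of a helper can be sketched with Ruby's standard library alone. This is a token-level approximation built on Ripper, not the RuboCop-based tool described in the talk. The helper list and the method name hardcoded_first_args are hypothetical.

```ruby
require "ripper"

# Hypothetical list of helpers whose first positional argument is shown to the user.
VISIBLE_FIRST_ARG_HELPERS = %w[link_to button_to].freeze

# Returns one hash per helper call whose first argument is a string literal.
def hardcoded_first_args(source)
  # Drop whitespace tokens so the helper name, "(" and the opening quote are adjacent.
  tokens = Ripper.lex(source).reject { |tok| tok[1] == :on_sp }
  violations = []

  tokens.each_with_index do |(pos, type, text), i|
    next unless type == :on_ident && VISIBLE_FIRST_ARG_HELPERS.include?(text)

    j = i + 1
    j += 1 if tokens[j] && tokens[j][1] == :on_lparen # parentheses are optional
    next unless tokens[j] && tokens[j][1] == :on_tstring_beg

    content = tokens[j + 1]
    next unless content && content[1] == :on_tstring_content

    violations << { line: pos[0], helper: text, string: content[2] }
  end

  violations
end

source = <<~RUBY
  link_to "Home", root_path
  link_to t(".home"), root_path
  button_to("Delete", item_path)
RUBY

hardcoded_first_args(source).each do |v|
  puts "line #{v[:line]}: #{v[:helper]} receives hard-coded #{v[:string].inspect}"
end
```

Only the first and third calls are reported, because the second one already goes through the translation helper. A real cop works on the syntax tree rather than on tokens, which is what makes it robust to multi-line calls and nested expressions.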

00:03:16.650 In short, our goal was to flag a hard-coded string whenever a developer passed one to a place where it would end up visible to the user.

00:03:26.650 This meant leveraging our knowledge of method signatures and flagging a violation only when a hard-coded string was passed to a helper argument that needs translation.

00:03:35.100 Static analysis also allowed us to create meaningful checks within our application. We utilized a well-known tool called RuboCop, which was originally designed to enforce consistent code style in Ruby.

00:03:43.570 RuboCop now goes further and also checks coding conventions, best practices, and even performance issues in Ruby applications.

00:03:52.560 To demonstrate how we customized this tool, we created our own set of RuboCop rules, also known as cops, which enforced translation requirements within our code. Although it may seem like overkill to create around 100 of these custom cops, doing so ensured that each helper method was adequately checked.

00:04:12.330 Every method we checked has its own signature, and we had to account for each one to avoid false positives. One important aspect was correctly distinguishing positional from keyword arguments.
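A per-method signature table is one way to picture the positional versus keyword distinction. The sketch below is an assumption-heavy illustration using Ripper tokens from the standard library. The SIGNATURES table, its contents, and the method names are hypothetical, and it only inspects a single call written on one line.

```ruby
require "ripper"

# Hypothetical signature table: which arguments of each helper reach the user.
SIGNATURES = {
  "link_to"        => { first_arg: true,  keywords: %w[title] },
  "text_field_tag" => { first_arg: false, keywords: %w[placeholder] }
}.freeze

# Returns the literal text if tokens[index] opens a plain string literal.
def string_literal_at(tokens, index)
  opening, content = tokens[index], tokens[index + 1]
  return nil unless opening && content
  return nil unless opening[1] == :on_tstring_beg && content[1] == :on_tstring_content

  content[2]
end

# Collects the user-visible string literals in one helper call.
def visible_strings(call_source)
  tokens = Ripper.lex(call_source).reject { |tok| tok[1] == :on_sp }
  start = tokens.index { |tok| tok[1] == :on_ident && SIGNATURES.key?(tok[2]) }
  return [] unless start

  signature = SIGNATURES.fetch(tokens[start][2])
  args = tokens.drop(start + 1)
  args = args.drop(1) if args.first && args.first[1] == :on_lparen
  found = []

  # Positional check: only when the signature says the first argument is visible.
  if signature[:first_arg]
    literal = string_literal_at(args, 0)
    found << literal if literal
  end

  # Keyword check: a visible keyword immediately followed by a string literal.
  args.each_with_index do |tok, i|
    next unless tok[1] == :on_label
    next unless signature[:keywords].include?(tok[2].chomp(":"))

    literal = string_literal_at(args, i + 1)
    found << literal if literal
  end

  found
end

p visible_strings('text_field_tag "email", nil, placeholder: "Your email"')
p visible_strings('link_to "Home", root_path, title: "Go home", class: "button"')
```

The first positional argument of text_field_tag is a field name, so it is ignored, while its placeholder keyword is reported. For link_to the opposite holds for the first argument, and the class keyword is never reported because it is not user-visible text.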

00:04:27.270 Through these efforts, we were able to detect hard-coded strings within Ruby files. However, we still had to address the next challenge: identifying strings within HTML content.

00:04:39.620 The solution we developed involved reading our HTML files, parsing them with Nokogiri, and identifying all text nodes in our documents. This allowed us to flag hard-coded strings wherever they appeared in the HTML structure.

00:04:51.070 To ensure accuracy, we had to handle various edge cases to determine when strings should be flagged as violations. For instance, strings nested within HTML tags like 'style' or 'script' should not trigger a violation. Following a similar approach to our Ruby static analysis, we used a tool called ERB Lint, which integrates with RuboCop, to analyze our HTML files.

00:05:03.760 ERB Lint allowed us to reuse the RuboCop rules that we had already created for our Ruby files, thus streamlining our efforts.
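The text-node rule, including the 'style' and 'script' exception, can be illustrated without any gem. The sketch below uses StringScanner from the standard library as a deliberately naive stand-in for a real parser such as Nokogiri. The method name visible_text_nodes is hypothetical, and the scanner handles plain HTML only, not ERB tags.

```ruby
require "strscan"

# Tags whose content is code or styling, never user-facing copy.
IGNORED_TAGS = %w[script style].freeze

# Naive stand-in for an HTML parser: walks tags and text runs in order and
# collects the text that sits outside <script> and <style> elements.
def visible_text_nodes(html)
  scanner = StringScanner.new(html)
  ignored_depth = 0
  texts = []

  until scanner.eos?
    if scanner.scan(/<!--.*?-->/m)
      next # comments are not shown to the user
    elsif scanner.scan(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/)
      next unless IGNORED_TAGS.include?(scanner[2].downcase)

      # An empty first group means an opening tag, "/" means a closing tag.
      ignored_depth += scanner[1].empty? ? 1 : -1
    else
      # Either a run of text, or a stray "<" that did not start a tag.
      text = (scanner.scan(/[^<]+/) || scanner.getch).strip
      texts << text if ignored_depth.zero? && !text.empty?
    end
  end

  texts
end

html = <<~HTML
  <div class="banner">
    <h1>Welcome back</h1>
    <style>.banner { color: red; }</style>
    <script>console.log("loaded");</script>
    <!-- internal note -->
    <p>Your order has shipped</p>
  </div>
HTML

p visible_text_nodes(html)
```

Only the heading and the paragraph text are returned. With a real parser the same rule amounts to visiting every text node and skipping those whose ancestor is a 'script' or 'style' element.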

00:05:18.650 Next, our focus shifted towards ensuring that we properly captured all cases of hidden violations. We recognized that merely relying on static analysis would not be entirely foolproof since developers could add helper methods that wouldn’t get caught by our initial checks.

00:05:31.070 In light of this, we wanted to make our translation team more efficient by minimizing the amount of manual detection required. With static analysis, we aimed to detect about 60 to 70 percent of hard-coded strings automatically.

00:05:44.050 Additionally, we included an auto-correction feature via RuboCop. This feature allowed us to replace detected hard-coded strings automatically, generating keys and storing them in translation files.

00:06:01.160 We generated these keys from a mix of the file name and part of the hard-coded string, sparing developers from extracting each string by hand.
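A key built from the file name and part of the string might look like the sketch below. The exact scheme is an assumption made for illustration (path segments joined with dots, the first three words joined with underscores), and translation_key is a hypothetical name, not the method used by the actual auto-correction.

```ruby
# Builds a translation key from a view path and the first words of the string.
# The scheme is an assumption: dots between path parts, underscores between words.
def translation_key(file_path, string, max_words: 3)
  scope = file_path
    .sub(%r{\Aapp/views/}, "")               # keys are relative to the views folder
    .sub(/\..*\z/, "")                       # drop extensions such as .html.erb
    .split("/")
    .map { |part| part.delete_prefix("_") }  # partial files start with an underscore
    .join(".")

  words = string.downcase.scan(/[a-z0-9]+/).first(max_words)
  "#{scope}.#{words.join('_')}"
end

key = translation_key("app/views/orders/_summary.html.erb", "Your order has shipped!")
puts key # orders.summary.your_order_has

# The extracted string would then be stored under that key in a translation file.
translations = { key => "Your order has shipped!" }
p translations
```

Using only the first few words keeps keys short while leaving them recognizable when someone reads the view. Strings without any ASCII letters or digits would need a fallback that this sketch does not provide.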

00:06:08.210 An important takeaway is that if you’re a Rails developer, it’s highly recommended to use the Rails t view helper rather than calling I18n.t directly. This ensures that you get the additional functionality that Rails provides on top of the i18n library.

00:06:22.120 Now that we had a reliable approach to identifying hard-coded strings in our Ruby files, we also wanted to share best practices for retaining localization integrity throughout the development process.

00:06:40.020 With this in place, the translation team can detect and extract hard-coded strings effectively, and our checks ensure that violations are flagged before code is merged.

00:06:56.430 As a result, opening a pull request triggers a RuboCop run to check compliance with our localization standards. If any violations are found, the pull request is blocked from merging.

00:07:10.430 When it comes to translation tools, there are several options. You'll typically choose between Ruby i18n and gettext. Ruby i18n is the standard library supported by Rails and is utilized by many developers.

00:07:25.620 However, Ruby i18n has some drawbacks, including its reliance on translation keys, making it less convenient to search for specific strings across your codebase.

00:07:41.350 Moreover, performance can become an issue with a large number of translation files, because all the .yml files have to be parsed at boot time, which can be particularly slow.

00:07:55.550 At Shopify, we have over 3000 translation files, and loading them takes longer than we would like. This is why it’s essential to consider which library to use when dealing with translations.

00:08:15.360 As a takeaway, think about the future of your application. If you are starting an application today, it would be beneficial to choose a character set that supports the full UTF-8 range, including four-byte characters such as emoji. Fortunately, newer versions of MySQL support this through the utf8mb4 character set.

00:08:30.730 Managing character sets leads to numerous inconsistencies, especially between systems like Ruby and MySQL, as they interpret UTF-8 encoding differently.
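One concrete source of such mismatches is that Ruby strings hold any UTF-8 character, while MySQL's older utf8 character set stores at most three bytes per character. The sketch below shows how to spot the characters that need four bytes. The method names are hypothetical, and the check is an illustration rather than a description of how Shopify handled this.

```ruby
# Characters outside the Basic Multilingual Plane take four bytes in UTF-8.
def four_byte_chars(text)
  text.each_char.select { |char| char.bytesize == 4 }
end

# True when every character fits in a three-byte-per-character column.
def fits_three_byte_utf8?(text)
  four_byte_chars(text).empty?
end

accented = "caf\u00E9"            # the accented e takes two bytes
emoji    = "Thanks \u{1F600}"     # the emoji takes four bytes

puts fits_three_byte_utf8?(accented) # true
puts fits_three_byte_utf8?(emoji)    # false
p four_byte_chars(emoji).map { |char| format("U+%X", char.ord) }
```

A check like this is useful in tests that feed unusual characters through the application, since it makes clear which inputs depend on the database column using the utf8mb4 character set.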

00:08:48.680 The more edge cases you are exposed to, the better you become at spotting potential issues in your application. To familiarize your team with localization issues, consider writing tests that include unusual characters.

00:09:06.250 Localization is distinct from translation, which is merely the substitution of words from one language to another. Localization accounts for regional adaptations and aligns your content with cultural preferences.

00:09:25.920 A prime example is how the logo for McDonald's may look different across regions to cater to local tastes and preferences. Understanding these nuances helps to ensure your product resonates with users.

00:09:43.390 To emphasize localization, one particular pain point we experienced was around date and time formats. Many countries format their date and time differently, such as the US using the MM/DD/YYYY format, while countries like Japan use YYYY/MM/DD.
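The two formats mentioned here can be expressed as strftime patterns keyed by country. This is a minimal sketch using only the standard library. The DATE_FORMATS table and format_date are hypothetical names, and a Rails application would more likely keep such patterns in its locale files.

```ruby
require "date"

# Patterns for the two formats mentioned above, keyed by country code.
DATE_FORMATS = {
  "US" => "%m/%d/%Y", # MM/DD/YYYY
  "JP" => "%Y/%m/%d"  # YYYY/MM/DD
}.freeze

# Formats a date for a country; raises KeyError for an unknown country.
def format_date(date, country)
  date.strftime(DATE_FORMATS.fetch(country))
end

date = Date.new(2019, 3, 4)
puts format_date(date, "US") # 03/04/2019
puts format_date(date, "JP") # 2019/03/04
```

The US output shows why this matters: without knowing the format, 03/04/2019 could be read as either the 4th of March or the 3rd of April.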

00:09:58.320 Navigating these variances became a challenge, particularly when other areas, like calendars and scheduling features, needed to align with users' local expectations.

00:10:15.460 For example, Google Calendar adjusts its defaults based on user preference, starting the week on Sunday for the US and on Monday for France. Without that kind of careful handling, calendars end up inconsistent with what users expect.
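The week-start difference comes down to a small piece of date arithmetic. The sketch below covers only the two countries named here. WEEK_START and start_of_week are hypothetical names, and a complete table would have to come from real locale data.

```ruby
require "date"

# First day of the week per country, using Date#wday numbering (0 is Sunday).
WEEK_START = { "US" => 0, "FR" => 1 }.freeze

# Returns the date on which the week containing the given date starts.
def start_of_week(date, country)
  offset = (date.wday - WEEK_START.fetch(country)) % 7
  date - offset
end

wednesday = Date.new(2019, 3, 6)
puts start_of_week(wednesday, "US") # 2019-03-03, a Sunday
puts start_of_week(wednesday, "FR") # 2019-03-04, a Monday
```

The modulo keeps the offset between 0 and 6, so a Sunday in France correctly maps back to the Monday six days earlier rather than forward to the next one.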

00:10:31.510 In addition to date and time, it's crucial to consider how we format addresses. I had a personal experience where my banking information was not accepted due to improper formatting. Ensuring that formats are adaptable to local standards helps prevent such issues.

00:10:44.830 Ultimately, collecting data for various countries will empower our applications to adapt correctly. We can offer developers a single endpoint containing varying formats for essentials, like phone numbers or date formats, to ensure localized experiences stay consistent.

00:11:01.030 As we conclude, remember that the challenges of localization extend far beyond mere words. Every element of the user experience, from phone numbers to geographical nuances, must be examined with internationalization in mind.

00:11:14.890 Navigating the landscape of global platforms requires ongoing diligence and practice in addressing localization needs strategically. Thank you all for your attention.