00:00:00.030
Hello everyone! I hope you're all enjoying the day and that you've had a great time during the conference. Personally, I've had a wonderful time. I’d like to thank all the speakers for the quality of their talks; they were truly amazing. I learned a lot.
00:00:20.939
As the last talk of the day, I know I'm probably the last obstacle between you and that tasty Bulgarian beer! If you haven’t tried rakia yet, it’s really good, but be careful; it’s pretty strong!
00:00:26.369
Earlier, I was thinking about presenting a really interesting topic, but I was worried you might fall asleep. So, let’s make this more exciting. Today, we're going to talk about localization, specifically focusing on the process we went through at Shopify to make our platform more accessible.
00:00:38.010
When discussing localization, the first thing to consider is translating your product so that people in different countries can use it. In our case, we needed to translate the admin area of our platform.
00:00:47.470
This involved literally translating hundreds of thousands of words, if not millions. Initially, we faced two main challenges: detecting hard-coded strings visible to users and preventing new hard-coded strings from being added.
00:00:59.760
To tackle this, we developed a tool to identify hard-coded strings and sent out an email to all developers, reminding them that any new content added must be translation-ready. However, we knew we couldn’t just rely on everyone reading their emails.
00:01:12.180
Due to time constraints, we initially decided to focus on Ruby and HTML files, where we knew most of the issues were. After analyzing our codebase, we found a significant number of hard-coded strings scattered throughout the application.
00:01:25.290
They came from many places, including HTML and embedded Ruby (ERB) in views, Ruby helper files, and controllers. To our surprise, we also found hard-coded strings in CSS files. Today, we will focus on the challenges we faced in this localization effort.
00:01:48.370
To proceed effectively, we needed a strategy for localizing our product. It started with a straightforward need: translating the product so that users can understand the content in their own language and region.
00:02:04.990
To summarize, at Shopify, we realized that our administrative interface had to be fully translated. This necessitated a tremendous effort, as we were looking at processing hundreds of thousands of terms that needed localization.
00:02:20.069
Many developers think they can solve this quickly with a simple regex, but that turned out to be far more complicated than anticipated. Ultimately, we needed to build a tool that could accurately detect hard-coded strings across our application.
00:02:36.059
The first step was to identify the strings that are visible to users, as opposed to strings used internally by the code. The second was to make sure developers did not inadvertently add new hard-coded strings. Implementing this required a thorough understanding of our development process.
00:02:51.220
To manage the content being added, we focused on Ruby and HTML files. Initially, this approach seemed adequate, since most of the strings were embedded there; we prioritized the places where most hard-coded strings were coming from.
00:03:03.780
To implement this, we used static analysis tools.
00:03:05.320
The main rationale behind our choice of a static analysis tool is that developers typically add content directly to the HTML body or through Rails helper methods. For example, when using the link_to method, the first argument is a string that will be visible to the user.
00:03:16.650
In short, our goal was to flag hard-coded strings only when developers passed them to a method whose argument ends up visible to users.
00:03:26.650
This meant leveraging our knowledge of method signatures, so that we only flagged a violation when a literal string was passed to a helper method whose user-facing arguments require translation.
00:03:35.100
Static analysis also allowed us to create meaningful checks within our application. We utilized a well-known tool called RuboCop, which was originally designed to enforce consistent code style in Ruby.
00:03:43.570
RuboCop has since grown to cover much more, such as coding conventions, best practices, and even performance issues in Ruby applications.
00:03:52.560
To customize it for our needs, we wrote our own set of RuboCop rules, which RuboCop calls cops, to enforce translation requirements in our code. Creating around 100 custom cops may seem like overkill, but it ensured that each helper method was checked properly.
00:04:12.330
Each cop had to know the exact signature of the method it checked in order to avoid false positives. One important aspect was correctly distinguishing positional arguments from keyword arguments.
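A real cop depends on the rubocop gem, so the sketch below swaps in Ruby's standard-library Ripper parser to illustrate the same idea: look up each helper's signature and flag only string literals passed in user-facing positions, treating positional and keyword arguments separately. The signature table (which helpers, which argument indexes, which keywords) is a hypothetical example, not Shopify's actual cop configuration.

```ruby
require "ripper"

# Hypothetical signature table: which arguments of each helper reach the user.
# Positional indexes are 0-based; keywords are option names such as placeholder:.
USER_FACING_ARGS = {
  "link_to"        => { positional: [0], keywords: [] },
  "submit_tag"     => { positional: [0], keywords: [] },
  "label_tag"      => { positional: [1], keywords: [] },
  "text_field_tag" => { positional: [],  keywords: ["placeholder"] },
}.freeze

Offense = Struct.new(:method_name, :text, :line)

# Parse the source and return every hard-coded string in a user-facing argument.
def find_hardcoded_strings(source)
  offenses = []
  walk(Ripper.sexp(source), offenses)
  offenses
end

# Depth-first walk over Ripper's S-expression tree.
def walk(node, offenses)
  return unless node.is_a?(Array)

  name_node, args_node =
    case node[0]
    when :command # link_to "Home", root_path
      [node[1], node[2]]
    when :method_add_arg # link_to("Home", root_path)
      fcall = node[1]
      fcall.is_a?(Array) && fcall[0] == :fcall ? [fcall[1], node[2]] : nil
    end
  check_call(name_node, args_node, offenses) if name_node
  node.each { |child| walk(child, offenses) }
end

# Ripper nests the argument list differently with and without parentheses.
def unwrap_args(node)
  return [] unless node.is_a?(Array)

  case node[0]
  when :arg_paren then unwrap_args(node[1])
  when :args_add_block then node[1]
  when Array then node
  else []
  end
end

# Check positional and keyword arguments against the helper's signature.
def check_call(name_node, args_node, offenses)
  signature = USER_FACING_ARGS[name_node[1]]
  return unless signature

  args = unwrap_args(args_node)
  hashes, positional = args.partition { |arg| arg.is_a?(Array) && arg[0] == :bare_assoc_hash }

  signature[:positional].each { |index| report(name_node, positional[index], offenses) }

  hashes.flat_map { |bare_hash| bare_hash[1] }.each do |assoc|
    next unless assoc[0] == :assoc_new && assoc[1][0] == :@label

    keyword = assoc[1][1].delete_suffix(":")
    report(name_node, assoc[2], offenses) if signature[:keywords].include?(keyword)
  end
end

# Record an offense only when the argument is a string literal.
def report(name_node, arg, offenses)
  return unless arg.is_a?(Array) && arg[0] == :string_literal

  offenses << Offense.new(name_node[1], string_text(arg), name_node[2][0])
end

# Concatenate the literal text of a string node (a simplification for plain literals).
def string_text(node)
  return node[1] if node[0] == :@tstring_content

  node.select { |child| child.is_a?(Array) }.map { |child| string_text(child) }.join
end

source = <<~RUBY
  link_to "Home", root_path
  link_to(t(".home"), root_path)
  text_field_tag :q, nil, placeholder: "Search"
  submit_tag t(".save"), class: "btn"
RUBY
find_hardcoded_strings(source).map(&:to_a)
# => [["link_to", "Home", 1], ["text_field_tag", "Search", 3]]
```

Because the check is driven by the signature table, a string passed as class: "btn" is ignored while placeholder: "Search" is flagged, which is the kind of false positive a plain regex cannot avoid.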
00:04:27.270
With these cops, we could detect hard-coded strings in Ruby files. However, we still had to address the next challenge: identifying strings in HTML content.
00:04:39.620
The solution we developed involved reading our HTML files, parsing them with Nokogiri, and collecting all the text nodes in each document. This allowed us to flag strings hard-coded directly in the markup.
00:04:51.070
To ensure accuracy, we had to handle various edge cases to decide when a string should be flagged as a violation. For instance, text nested inside 'style' or 'script' tags should not trigger a violation. Following the same approach as our Ruby static analysis, we used a tool called ERB Lint, which integrates with RuboCop, to analyze our HTML files.
00:05:03.760
ERB Lint allowed us to reuse the RuboCop rules we had already written for our Ruby files, which streamlined our efforts.
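The text-node approach can be sketched in a few lines. To stay dependency-free, this version swaps Nokogiri for Ruby's REXML, which means it only accepts well-formed markup (no raw ERB tags); the wrapper element and the list of ignored tags are assumptions for illustration.

```ruby
require "rexml/document"

# Text inside these elements is code or styling, not user-facing copy.
IGNORED_TAGS = %w[script style].freeze

# Return the non-blank text nodes of an HTML fragment, skipping ignored tags.
def hardcoded_text_nodes(html)
  # Wrap the fragment so REXML sees a single root element.
  doc = REXML::Document.new("<root>#{html}</root>")
  found = []
  collect_text(doc.root, found)
  found
end

def collect_text(element, found)
  element.children.each do |child|
    case child
    when REXML::Text
      text = child.value.strip
      found << text unless text.empty? # whitespace between tags is not copy
    when REXML::Element
      collect_text(child, found) unless IGNORED_TAGS.include?(child.name)
    end
  end
end

html = <<~HTML
  <div>
    <h1>Welcome back</h1>
    <script>var greeting = "hi";</script>
    <style>p { color: red; }</style>
    <a href="/orders">Orders</a>
  </div>
HTML
hardcoded_text_nodes(html) # => ["Welcome back", "Orders"]
```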
00:05:18.650
Next, we focused on catching violations that could slip past these checks. We recognized that relying on static analysis alone would not be foolproof, since developers could add new helper methods that our initial checks wouldn’t catch.
00:05:31.070
With that in mind, we set out to help our translation team work more efficiently by minimizing the amount of manual detection required. With static analysis, we aimed to detect about 60 to 70 percent of hard-coded strings automatically.
00:05:44.050
Additionally, we included an auto-correction feature via RuboCop. This feature allowed us to replace detected hard-coded strings automatically, generating keys and storing them in translation files.
00:06:01.160
We generated these keys from a mix of the file name and part of the hard-coded string, so developers no longer had to remember to extract every string manually.
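A minimal sketch of that key-generation step, assuming view files live under app/views and that a key is the view path plus the first few words of the string. The naming scheme, the four-word limit, and the helper names translation_key and add_translation are hypothetical; the exact scheme Shopify used isn't described in the talk.

```ruby
require "yaml"

# Hypothetical scheme: view path segments plus a slug of the first few words.
def translation_key(path, text, max_words: 4)
  # "app/views/orders/_form.html.erb" -> ["orders", "form"]
  scope = path.delete_prefix("app/views/").sub(/\..*\z/, "").split("/")
  scope = scope.map { |part| part.delete_prefix("_") }
  # "Create order" -> "create_order"
  slug = text.downcase.scan(/[a-z0-9]+/).first(max_words).join("_")
  (scope + [slug]).join(".")
end

# Insert text at a dotted key inside a nested hash, creating levels as needed.
def add_translation(store, key, text)
  *parents, leaf = key.split(".")
  node = parents.reduce(store) { |level, part| level[part] ||= {} }
  node[leaf] = text
end

store = {}
key = translation_key("app/views/orders/index.html.erb", "Create order")
add_translation(store, "en.#{key}", "Create order")
puts store.to_yaml
# ---
# en:
#   orders:
#     index:
#       create_order: Create order
```

The auto-correct would then rewrite the call site to t("orders.index.create_order"); inside that particular view, Rails' lazy lookup also accepts the shorter t(".create_order").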
00:06:08.210
An important takeaway: if you’re a Rails developer, use the Rails t view helper (short for translate) rather than calling I18n.t directly. The helper adds functionality that Rails layers on top of the i18n gem, such as lazy lookup of keys scoped to the current view and HTML-safe handling of keys ending in _html.
00:06:22.120
Once we had a reliable way to identify hard-coded strings in our Ruby files, we also wanted to keep the codebase localization-ready throughout the development process.
00:06:40.020
This let the translation team detect and extract hard-coded strings efficiently, and enforcing these checks on every change ensured that violations were flagged before code was merged.
00:06:56.430
In practice, every pull request triggers a RuboCop run to check compliance with our localization standards. If any violations are found, the pull request is blocked from merging.
00:07:10.430
When it comes to translation libraries, there are several options, but you’ll typically choose between Ruby i18n and gettext. The i18n gem is the default used by Rails, and many developers rely on it.
00:07:25.620
However, Ruby i18n has some drawbacks. Because code refers to translation keys rather than to the source text, you can’t simply search your codebase for a string you see in the UI.
00:07:41.350
Performance can also become an issue with a large number of translation files, since all the .yml files have to be parsed at boot time, which gets slow as the number of files grows.
00:07:55.550
At Shopify, we have over 3000 translation files, and loading them takes longer than we would like. This is why it’s essential to consider which library to use when dealing with translations.
00:08:15.360
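As a takeaway, think about the future of your application. If you are starting an application today, choose a character set that supports the full range of UTF-8. In MySQL, that means utf8mb4: the legacy utf8 character set stores at most three bytes per character and cannot hold four-byte characters such as emoji. Fortunately, newer versions of MySQL support utf8mb4, and MySQL 8.0 uses it by default.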
00:08:30.730
Mismatched character sets lead to numerous inconsistencies, especially between systems like Ruby and MySQL, because they interpret "UTF-8" differently: Ruby’s UTF-8 covers every Unicode character, while MySQL’s legacy utf8 does not.
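To see the mismatch concretely: MySQL's legacy utf8 (utf8mb3) stores at most three bytes per character, while a Ruby UTF-8 string can contain four-byte characters. The helper below is a hypothetical sketch, not part of Rails or any MySQL adapter, that reports whether a string would fit in a utf8mb3 column.

```ruby
# True if every character fits in MySQL's 3-byte utf8mb3 encoding.
# Characters outside the Basic Multilingual Plane (e.g. most emoji) need 4 bytes.
def fits_utf8mb3?(text)
  text.encode("UTF-8").each_char.all? { |char| char.bytesize <= 3 }
end

fits_utf8mb3?("Café")             # => true  ("é" is 2 bytes)
fits_utf8mb3?("東京")             # => true  (each character is 3 bytes)
fits_utf8mb3?("Thanks \u{1F389}") # => false (U+1F389 is 4 bytes)
```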
00:08:48.680
In general, the more edge cases you are exposed to, the better you become at spotting potential issues in your application. To help your team get familiar with localization issues, consider writing tests that use non-ASCII characters, such as accented letters, non-Latin scripts, and emoji.
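As an illustration of the kind of bug such tests catch, consider a helper that trims text to a byte budget, for example to respect a column limit. The helper truncate_bytes and the fixture strings are hypothetical; the point is that ASCII-only fixtures would pass even a naive byteslice version, while multibyte fixtures expose it.

```ruby
# Hypothetical helper: trim text to at most max_bytes without splitting a character.
def truncate_bytes(text, max_bytes)
  result = +""
  text.each_char do |char|
    break if result.bytesize + char.bytesize > max_bytes

    result << char
  end
  result
end

# Fixtures mixing accents, CJK, right-to-left text and emoji.
TRICKY_STRINGS = ["Crème brûlée", "東京都渋谷区", "مرحبا", "Party \u{1F389}"].freeze

# Try every possible byte limit for every fixture.
TRICKY_STRINGS.each do |fixture|
  (0..fixture.bytesize).each do |limit|
    trimmed = truncate_bytes(fixture, limit)
    raise "invalid UTF-8 for #{fixture.inspect} at #{limit}" unless trimmed.valid_encoding?
    raise "over budget for #{fixture.inspect} at #{limit}" if trimmed.bytesize > limit
  end
end

# The naive version fails the same check: "è" is 2 bytes, so this cuts it in half.
"Crème".byteslice(0, 3).valid_encoding? # => false
```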
00:09:06.250
Localization is distinct from translation, which is merely the substitution of words from one language to another. Localization accounts for regional adaptations and aligns your content with cultural preferences.
00:09:25.920
A prime example is how the logo for McDonald's may look different across regions to cater to local tastes and preferences. Understanding these nuances helps to ensure your product resonates with users.
00:09:43.390
Turning to localization itself, one particular pain point for us was date and time formats. Many countries format dates and times differently: the US uses MM/DD/YYYY, while countries like Japan use YYYY/MM/DD.
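A minimal sketch of per-locale date formatting. The locale codes and the hard-coded pattern table are assumptions for illustration; in a Rails app these patterns would normally live in locale files, where I18n.l reads them from date.formats.

```ruby
require "date"

# Illustrative strftime patterns per locale.
DATE_FORMATS = {
  "en-US" => "%m/%d/%Y", # MM/DD/YYYY
  "ja-JP" => "%Y/%m/%d", # YYYY/MM/DD
  "fr-FR" => "%d/%m/%Y", # DD/MM/YYYY
}.freeze

def localized_date(date, locale)
  date.strftime(DATE_FORMATS.fetch(locale))
end

date = Date.new(2024, 3, 9)
localized_date(date, "en-US") # => "03/09/2024"
localized_date(date, "ja-JP") # => "2024/03/09"
localized_date(date, "fr-FR") # => "09/03/2024"
```

Note that "03/09/2024" means March 9 to a US reader but 3 September to a French one, which is why the pattern has to follow the user's locale rather than the developer's.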
00:09:58.320
Handling these differences became a challenge, particularly when other features, like calendars and scheduling, needed to match users’ local expectations.
00:10:15.460
Google Calendar, for example, adjusts its defaults to the user’s locale: the week starts on Sunday in the US and on Monday in France. Without careful handling, differences like this lead to inconsistencies.
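Finding the start of a week depends on the locale's first weekday. Here is a sketch with a hypothetical two-entry table that uses the same numbering as Date#wday (0 = Sunday, 1 = Monday); in Rails, ActiveSupport's beginning_of_week with a configurable start day covers similar ground.

```ruby
require "date"

# Hypothetical table: first day of the week per locale, numbered like Date#wday.
WEEK_STARTS_ON = { "en-US" => 0, "fr-FR" => 1 }.freeze

# Step back to the most recent day that starts a week in this locale.
def start_of_week(date, locale)
  first_day = WEEK_STARTS_ON.fetch(locale)
  date - ((date.wday - first_day) % 7)
end

sunday = Date.new(2024, 3, 10)
start_of_week(sunday, "en-US").to_s # => "2024-03-10" (Sunday opens the week)
start_of_week(sunday, "fr-FR").to_s # => "2024-03-04" (Sunday closes the previous week)
```

The same Sunday belongs to two different weeks depending on the locale, so any week-based view or report has to agree on which rule it is using.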
00:10:31.510
In addition to date and time, it's crucial to consider how we format addresses. I had a personal experience where my banking information was not accepted due to improper formatting. Ensuring that formats are adaptable to local standards helps prevent such issues.
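Address formatting can be handled the same way, with one template per country. The templates and sample addresses below are a simplified, hypothetical sketch (real address data covers many more fields and countries), but they show the kind of difference involved: a US address puts the city before the ZIP code, while a French address puts the postal code first.

```ruby
# Hypothetical per-country templates; each string is one line of the address.
ADDRESS_FORMATS = {
  "US" => ["%{address1}", "%{city}, %{province} %{zip}", "%{country}"],
  "FR" => ["%{address1}", "%{zip} %{city}", "%{country}"],
}.freeze

# Fill the country's template; Kernel#format raises KeyError if a field is missing.
def format_address(address, country_code)
  ADDRESS_FORMATS.fetch(country_code).map { |line| format(line, address) }.join("\n")
end

puts format_address(
  { address1: "1 Main St", city: "Springfield", province: "IL", zip: "62701", country: "United States" },
  "US"
)
# 1 Main St
# Springfield, IL 62701
# United States

puts format_address({ address1: "10 Rue Exemple", zip: "75001", city: "Paris", country: "France" }, "FR")
# 10 Rue Exemple
# 75001 Paris
# France
```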
00:10:44.830
Ultimately, collecting format data for each country lets our applications adapt correctly. We can offer developers a single endpoint that provides the formats for essentials like phone numbers and dates, so localized experiences stay consistent.
00:11:01.030
As we conclude, remember that the challenges of localization extend far beyond words. Every element of the user experience, from phone numbers to regional conventions, must be reviewed with internationalization in mind.
00:11:14.890
Navigating the landscape of global platforms requires ongoing diligence and practice in addressing localization needs strategically. Thank you all for your attention.