How I’m trying to fix localization, and what you can do to help

by Eemeli Aro

In the video titled How I’m trying to fix localization, and what you can do to help, Eemeli Aro discusses the complexities of localization in programming, specifically in the context of JavaScript and Ruby. He emphasizes how basic localization approaches can become intricate, necessitating a deeper understanding of languages and locales.

Key points include:

- Definition of Language and Locale: Aro explains that a language code (like 'fi' for Finnish) combined with a country code (like 'FI' for Finland) creates a locale. These codes help identify regional language variations essential for proper localization in software.
- Importance of Locale Codes: He highlights the significance of using standardized language tags (IETF BCP 47) for effective communication across programming platforms. For instance, using 'en-FI' for English as spoken in Finland reflects a specific locale that requires distinct formatting considerations.
- Role of the Common Locale Data Repository (CLDR): The CLDR plays a vital role in standardizing language data to avoid redundancy for different regional English variations. Aro underscores the necessity of efficient data structures to represent messages across multiple locales.
- Challenges in Message Formatting: Aro details the difficulties involved in structuring messages for distinct audiences, especially in managing number and date formats. He notes that messages must consider context to maintain clarity, avoiding confusion for both users and translation teams.
- Collaboration with Translators: It is essential for developers to understand that effective localization goes beyond programming syntax. Integrating thoughtful communication with translators ensures that localized messages maintain their intended meaning and context.
- Practical Application of Localization Standards: Aro discusses ongoing efforts to develop guidelines for message formatting that all programming environments can understand uniformly. He stresses the importance of flexibility in message representation to enhance internationalization efforts.
- Encouragement to Engage with Standardization Efforts: Aro concludes with an encouragement for developers to actively participate in discussions regarding localization standards by utilizing the CLDR's issue tracker, reiterating that collaborative feedback can lead to significant improvements in localization practices.

The overall takeaway from Aro's session is the importance of consolidating localization practices through standardized message structuring and a continuous feedback loop with the programmer and translator communities to create a seamless international experience.

00:00:19.480 Hello everyone! I'm going to try and present this project. Hopefully, you can all hear me. Please try to ignore the occasional blinking to black that my slides are doing; they have come back every time so far, but we'll see how it goes.
00:00:24.000 Hi, I'm Eemeli Aro. Perhaps I should lead with this: I'm not a Ruby developer at my core; I am a JavaScript developer. I got into JavaScript when I was quite young, just as the web and JavaScript itself were relatively new. My first job where I was paid to do JavaScript was in 1998, so I've been at this for a while.
00:00:40.399 Some of the things I'm currently working on revolve around localization, which is really the rabbit hole I have managed to find for myself. The solutions for JavaScript are at low enough levels that they apply to everyone. What I'm presenting this time is effectively an amalgamation of two different talks that I've recently given, one that focuses on understanding what a language and a locale are, and another that discusses messaging and message formatting.
00:01:00.439 These talks aim to give you takeaways that are agnostic of any specific programming language. Most of this is ultimately defined by standardization bodies that are not tied to any particular programming language. Now, the first question I want to discuss is: what is a language, and what is a locale?
00:01:21.600 The code 'fi' stands for Finnish. If you ask for the 'fi' locale, you typically get Finnish as spoken in Finland, which is the place where Finnish is predominantly spoken. However, Finnish is not the only language spoken in Finland; there's also Swedish. So the locale code has to identify both the language and the place where it is spoken, and this holds true in Ruby and any other programming language.
00:01:54.679 Locale codes are made up of a couple of different standardized parts. The first part is an ISO 639-1 language code, and the second part is usually an ISO 3166-1 alpha-2 country code. Together, these make up what is known as an IETF Best Current Practice (BCP) 47 language tag, which computers and programming languages understand. This is important for various reasons.
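To make the tag structure described above concrete, here is a minimal TypeScript sketch that splits a BCP 47 language tag into its language and region subtags. It relies on the standard `Intl.Locale` class from the ECMAScript Internationalization API; the `describeTag` helper and the `TagParts` shape are hypothetical names introduced for illustration and are not from the talk.

```typescript
// Hypothetical result shape, used only for this illustration.
interface TagParts {
  canonical: string;                 // the tag in canonical casing
  language: string;                  // language subtag, e.g. "fi"
  region: string | undefined;        // region subtag, if the tag has one
  likelyRegion: string | undefined;  // region filled in from likely-subtags data
}

// Parse a BCP 47 language tag with the built-in Intl.Locale class.
function describeTag(tag: string): TagParts {
  const locale = new Intl.Locale(tag);
  return {
    canonical: locale.toString(),
    language: locale.language,
    region: locale.region,
    // maximize() adds the most likely script and region for the tag.
    likelyRegion: locale.maximize().region,
  };
}

// A bare language code carries no region of its own,
// so the runtime has to pick a likely one.
console.log(describeTag("fi"));
console.log(describeTag("en"));

// The same region combined with different languages.
console.log(describeTag("sv-FI"));
console.log(describeTag("en-FI"));
```

The `likelyRegion` field shows why a bare language code is ambiguous: the region is not part of the tag and is supplied by the runtime's locale data.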
00:02:10.679 For instance, when you ask for English, you typically get American English because that's the default. However, American English is not necessarily the best baseline for when you're building and communicating in English in contexts outside of the United States.
00:02:30.559 Because of historical practice, asking for English usually means defaulting to American English conventions, including month-day-year date formatting, a format that is not widely used outside of the United States. Yet there are all sorts of variations of English that people use.
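The effect of that default on dates can be checked with a short TypeScript sketch. It uses the standard `Intl.DateTimeFormat.prototype.formatToParts` method to report the order in which a locale places day, month, and year; the `datePartOrder` helper is a hypothetical name, and the exact formatted strings depend on the locale data shipped with the runtime.

```typescript
// Return the order of the date fields ("day", "month", "year")
// that the runtime's locale data uses for the given language tag.
function datePartOrder(tag: string): string[] {
  const formatter = new Intl.DateTimeFormat(tag, {
    year: "numeric",
    month: "numeric",
    day: "numeric",
    timeZone: "UTC", // fixed time zone so the result does not depend on the host
  });
  const sample = new Date(Date.UTC(2024, 11, 31)); // 31 December 2024
  return formatter
    .formatToParts(sample)
    .filter((part) => part.type !== "literal") // drop separators such as "/" or "."
    .map((part) => part.type);
}

console.log(datePartOrder("en-US")); // month, day, year
console.log(datePartOrder("en-GB")); // day, month, year
console.log(datePartOrder("en-FI")); // depends on the runtime's data for en-FI
```

Comparing field order instead of full strings keeps the example independent of separator and padding details, which vary between locale data versions.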
00:02:44.440 For example, 'en-FI' is an actual locale for English as spoken in Finland, and there is data available that defines how messages should be formatted in that context. The fact remains that English varies by region, and each locale can have its own specific standards.
00:03:02.919 These definitions are created in places such as the Unicode Consortium's Common Locale Data Repository (CLDR). How many of you here have ever heard of the CLDR? Let's see... okay, I've counted a couple of hands raised, which is honestly a bit more than I was expecting.
00:03:17.000 This repository standardizes language data and allows for the definition of various language and locale combinations. You get complicated structures that can represent variations, like English as spoken in different countries. The idea is to prevent the need to define separate locales for every regional variant of English when they can often share characteristics.
00:03:38.560 For example, when defining English as used in Belgium, Finland, and Germany, it’s crucial to avoid unnecessary duplication of data. Instead, you can define a system where the shared parts are handled by standard locale codes, using values like UN M.49 area codes in place of a country code.
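One way to picture this sharing is as a fallback chain: a lookup for `en-FI` falls through to a regional English, then to a world English, then to plain `en`. The TypeScript sketch below is an illustration under stated assumptions. The `explicitParents` table is a small hand-copied excerpt modeled on CLDR's parent-locale data (`150` is the UN M.49 code for Europe and `001` for the world), and `fallbackChain` and `truncateTag` are hypothetical helpers, not CLDR or library APIs.

```typescript
// Hand-copied excerpt for illustration only; the real parent-locale
// table lives in CLDR's supplemental data and is much larger.
const explicitParents: Record<string, string> = {
  "en-150": "en-001", // European English inherits from world English
  "en-BE": "en-150",
  "en-DE": "en-150",
  "en-FI": "en-150",
};

// Default rule: drop the last subtag ("fi-FI" becomes "fi").
function truncateTag(tag: string): string | undefined {
  const cut = tag.lastIndexOf("-");
  return cut === -1 ? undefined : tag.slice(0, cut);
}

// List the locales whose data would be consulted, most specific first.
// The root locale at the end of every chain is omitted for brevity.
function fallbackChain(tag: string): string[] {
  const chain: string[] = [tag];
  let current: string = tag;
  for (;;) {
    const parent: string | undefined =
      explicitParents[current] ?? truncateTag(current);
    if (parent === undefined) break;
    chain.push(parent);
    current = parent;
  }
  return chain;
}

console.log(fallbackChain("en-FI")); // en-FI, en-150, en-001, en
console.log(fallbackChain("fi-FI")); // fi-FI, fi
```

With this arrangement the data for `en-FI`, `en-BE`, and `en-DE` only needs to record what differs from `en-150`, which is the duplication the talk warns against.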
00:03:58.560 It’s critical that bodies like the CLDR establish these definitions to maintain consistency across different applications and systems. The English spoken in Finland or elsewhere can have its own locale data that should be recognized, yet the tools built on top of that data still need work before they help us translate messages effectively.
00:04:13.639 Ultimately, this work has shown that a data model can be sufficient to represent all the messages in any of your systems for localization. How these messages are written down as syntax becomes a secondary concern once that model is properly defined.
00:04:29.599 Now, as I mentioned earlier, a message can have the same data structure despite the fact that the locale and language tagging differ. However, the syntax for elements like numbers and dates will become a critical second layer concern. This pairs with the concept of localization in that many of these systems provide a way to communicate to users that a message belongs to a certain category.
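The number side of that second layer can be illustrated with a minimal TypeScript sketch: the same value goes through the same code path, and only the locale tag decides the decimal separator. It uses the standard `Intl.NumberFormat.prototype.formatToParts` method; `decimalSeparator` is a hypothetical helper name.

```typescript
// Return the decimal separator that the runtime's locale data
// uses for the given language tag.
function decimalSeparator(tag: string): string {
  const parts = new Intl.NumberFormat(tag).formatToParts(1234.5);
  const decimal = parts.find((part) => part.type === "decimal");
  return decimal ? decimal.value : "";
}

console.log(decimalSeparator("en-US")); // "."
console.log(decimalSeparator("fi-FI")); // ","
console.log(decimalSeparator("de-DE")); // ","

// The full formatted strings also differ in their grouping separators.
console.log(new Intl.NumberFormat("en-US").format(1234567.89));
console.log(new Intl.NumberFormat("fi-FI").format(1234567.89));
```

Because the separator comes from locale data and not from the message, a translator never has to hand-write a formatted number.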
00:04:44.319 For example, when someone gets a notification indicating the receipt of new messages, it can be confusing when the syntax is not communicated well to the translator or localization team. This is precisely where the lack of clarity in formatting can lead to issues.
00:05:06.959 Looking back at my experience, it's crucial to streamline these interactions so that the people who handle translations can efficiently manage the nuances of number formats and distinguish between the messages needed for zero, one, three, or many items.
00:05:22.039 This is why it’s important to ensure that, as we define these messages, we keep in mind that localization goes far beyond what a computer recognizes: it involves human translators who need coherent context to produce meaningful translations. I hope your collaborations with translators have been constructive and challenge-free!
00:05:42.079 These implementations are already working well in many areas, and through the Unicode work we're refining standards for message formatting that help clarify expectations and lead to much more effective message presentation.
00:06:02.520 As I mentioned, a lot of our work over the past four or five years deals with developing standardization within the Unicode Consortium, which includes defining essential guidelines for message formatting in localization scenarios. This becomes especially pertinent when structuring messages about new notifications.
00:06:20.559 How do you effectively say 'you have no new messages,' or 'you have one new message,' or 'you have three new messages' in a programmatic way that all systems can understand?
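As a rough illustration of the question, the TypeScript sketch below stores one message as a set of variants and picks a variant from the count. It uses the standard `Intl.PluralRules` class to map a number to a plural category for the locale. The `CountMessage` shape, the `{count}` placeholder, and the `formatCount` helper are hypothetical and exist only for this example; they are not the syntax or data model that the Unicode work defines.

```typescript
// Hypothetical message shape: exact-number variants plus
// variants keyed by plural category ("one", "other", ...).
interface CountMessage {
  exact: Record<string, string>;
  plural: Record<string, string>; // expected to include "other"
}

const newMessages: CountMessage = {
  exact: { "0": "You have no new messages" },
  plural: {
    one: "You have {count} new message",
    other: "You have {count} new messages",
  },
};

function formatCount(locale: string, message: CountMessage, count: number): string {
  // An exact match (such as 0) takes priority over the plural category.
  const exact = message.exact[String(count)];
  if (exact !== undefined) return exact;

  // Ask the locale data which plural category this number belongs to.
  const category = new Intl.PluralRules(locale).select(count);
  const pattern = message.plural[category] ?? message.plural["other"] ?? "";

  // Format the number itself according to the locale as well.
  return pattern.replace("{count}", new Intl.NumberFormat(locale).format(count));
}

console.log(formatCount("en", newMessages, 0)); // You have no new messages
console.log(formatCount("en", newMessages, 1)); // You have 1 new message
console.log(formatCount("en", newMessages, 3)); // You have 3 new messages
```

A translation into a language with more plural categories would add keys such as `few` or `many` to `plural` without any change to `formatCount`, which is the kind of separation between message structure and code that the talk argues for.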
00:06:40.679 In preparing these guidelines, we recognize that various programming languages and environments will utilize diverse methods for formatting these messages, and we must ensure that the core idea remains consistent across all culture-centric formats.
00:06:58.679 At the crux of it is knowing how to represent a message and capturing its fundamental structure. For translators, it's critical to have these standardized formats to communicate effectively about the intended structure without ambiguity.
00:07:15.280 We understand there are different ways of expressing messages that cater to different audiences, and the challenge is to balance these expressions so that communication stays clear and effective.
00:07:34.200 Recognizing the limitations in current representation systems has driven the demand for a standardized way of managing translated messages so that the intricacies of language do not create rifts in understanding.
00:07:45.840 By extending our data models, we're moving beyond just the message content, understanding that practical translation must account for contextual nuances to ensure actionable meaning is fully captured.
00:08:05.360 These strategies serve to create a kind of ideal where messages essentially become flexible, meaning they can be molded into contexts that best fit each situation, enhancing how we approach internationalization as a whole.
00:08:25.639 Shifting perspectives can help us draw connections between types of messages and how they represent various cultural nuances, leading to more enriched interactions in our applications.
00:08:42.639 In conclusion, I hope you take away from this session the importance of structuring your messages in a way that embraces internationalization through effectively formatted localization practices.
00:09:04.159 The central message here is that even the most complex variations of language can be simplified with a unified approach that all developers can adopt, leading to more cohesive communications and a universal understanding of their content.
00:09:21.480 As we continue to shape our systems for better international integration, let’s ensure our mindset remains adaptable so we can overcome the unique challenges of localization while also harnessing opportunities to create seamless user experiences across borders.
00:09:41.919 If you care about making these changes, be aware that the CLDR also has an issue tracker where you can raise your concerns. Emailing or otherwise reaching out about what your region expects on matters such as date formats can prompt further discussion and drive improvements.
00:10:03.159 This engaged approach may very well be what continues pushing the dialogue forward, helping drive the changes we've identified as necessary in the global technical landscape.
00:10:15.679 Finally, I want to express my thanks to you all for your attention during my talk; discussions like these help illuminate common pathways for effective internationalization strategies. Thank you!