00:00:19.480
Hello everyone! I'm going to try and present this project. Hopefully, you can all hear me. Please try to ignore my slides occasionally blinking to black; they have come back every time so far, but we'll see how it goes.
00:00:24.000
Hi, I'm Eemeli Aro. Perhaps I should lead with this: I'm not a Ruby developer at my core; I am a JavaScript developer. I got into JavaScript when I was quite young, just as the world and JavaScript itself were relatively new. My first job where I was paid to do JavaScript was in 1998, so I've been at this for a while.
00:00:40.399
Some of the things I'm currently working on revolve around localization, which is really the rabbit hole I have managed to find for myself. The solutions for JavaScript are at low enough levels that they apply to everyone. What I'm presenting this time is effectively an amalgamation of two different talks that I've recently given, one that focuses on understanding what a language and a locale are, and another that discusses messaging and message formatting.
00:01:00.439
These talks aim to provide you with ideas about what to take away that are sufficiently agnostic of any specific programming languages. Most of this stuff is ultimately defined by various standardization bodies that do not necessarily relate to any specific programming languages. Now, the first question or point I want to discuss is: what is a language, and what is a locale?
00:01:21.600
The code 'fi' stands for Finnish. If you ask for the 'fi' locale, you typically get Finnish as spoken in Finland, which is the place where Finnish is predominantly spoken. However, Finnish is not the only language spoken in Finland; there's also Swedish. So a locale code has to say which language you mean, not just which place, and this holds true in Ruby and any other programming language.
00:01:54.679
Locale codes are made up of a couple of different standardized parts. The first part is an ISO 639-1 language code, and the second part is usually an ISO 3166-1 alpha-2 country code. Joined with a hyphen, as in 'fi-FI' or 'sv-FI', these make up what is known as an IETF Best Current Practice (BCP) 47 language tag, which is the form that computers and programming languages understand. This matters because the region part changes what you get back, even when the language stays the same.
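The structure described here can be sketched in a few lines of Python. This is a simplified illustration rather than a full BCP 47 parser: the names `LanguageTag` and `parse_tag` are made up for this example, and only a language subtag plus an optional region subtag are handled.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: a 2-3 letter language subtag, optionally followed by a
# 2-letter country code or a 3-digit area code. Real BCP 47 tags also allow
# script, variant, and extension subtags, which this sketch ignores.
_TAG = re.compile(r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}|[0-9]{3}))?$")


class LanguageTag(NamedTuple):
    language: str
    region: Optional[str]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple language tag into its language and region parts."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language, region = match.groups()
    # By convention the language is lowercase and the region is uppercase.
    return LanguageTag(language.lower(), region.upper() if region else None)


print(parse_tag("fi"))     # Finnish, no region given
print(parse_tag("sv-FI"))  # Swedish as used in Finland
print(parse_tag("en_fi"))  # underscore and lowercase are normalized
```

Accepting an underscore as a separator is a convenience assumed here because many systems write locales that way; the hyphen is the BCP 47 form.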
00:02:10.679
For instance, when you ask for English, you typically get American English because that's the default. However, American English is not necessarily the best baseline for when you're building and communicating in English in contexts outside of the United States.
00:02:30.559
Largely for historical reasons, that default brings American conventions along with it, such as month-day-year date formatting, which is uncommon outside of the United States. There are indeed all sorts of variations of English that people use.
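To make the date problem concrete, here is a small Python sketch that prints one date in three field orders. The `PATTERNS` table is hand-written for illustration and is an assumption of this example; a real application would take its patterns from locale data instead of hard-coding them.

```python
from datetime import date

# Hand-written patterns for illustration only; real applications should take
# these from locale data such as CLDR rather than hard-coding them.
PATTERNS = {
    "month-day-year": "%m/%d/%Y",
    "day-month-year": "%d/%m/%Y",
    "year-month-day": "%Y-%m-%d",
}


def format_date(value: date, order: str) -> str:
    """Format a date using one of the illustrative field orders above."""
    return value.strftime(PATTERNS[order])


sample = date(2024, 3, 5)
for order in PATTERNS:
    print(order, format_date(sample, order))
```

The first two results contain the same digits in a different order, so a reader who expects the other convention will read the 5th of March as the 3rd of May without noticing anything is wrong.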
00:02:44.440
For example, 'en-FI' is an actual locale for English as used in Finland, and there is data available that defines how formatting should work in that context. The fact remains that English varies by region, and each locale can have its own specific conventions.
00:03:02.919
The place where these definitions are maintained is the Unicode Consortium's Common Locale Data Repository (CLDR). How many of you here have ever heard of the CLDR? Let's see... okay, I've counted a couple of hands raised, which is honestly a bit more than I was expecting.
00:03:17.000
This repository standardizes language data and allows for the definition of various language and locale combinations. You get complicated structures that can represent variations, like English as spoken in different countries. The idea is to prevent the need to define separate locales for every regional variant of English when they can often share characteristics.
00:03:38.560
For example, when defining Belgian English, Finnish English, and German English, it’s crucial to avoid unnecessary duplication of data. Instead, you can define a system where the shared parts live in a common parent locale, identified with a UN M.49 area code such as 150 for Europe or 001 for the world, and each regional variant only records what differs.
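The idea of sharing data between regional variants can be sketched as a fallback chain. In the Python below, the `PARENTS` table is written by hand to mirror the kind of inheritance described here, and the values in `DATA` are invented placeholders, not actual CLDR content; the function names are also assumptions of this example.

```python
from typing import Dict, List, Optional

# Illustrative parent table, written by hand for this sketch. 150 is the
# UN M.49 code for Europe and 001 is the code for the world. The table is
# not loaded from CLDR itself.
PARENTS: Dict[str, str] = {
    "en-BE": "en-150",
    "en-DE": "en-150",
    "en-FI": "en-150",
    "en-150": "en-001",
    "en-001": "en",
}

# Invented placeholder data: each locale stores only what differs from its parent.
DATA: Dict[str, Dict[str, str]] = {
    "en": {"date_order": "month-day-year", "clock": "12-hour"},
    "en-001": {"date_order": "day-month-year"},
    "en-150": {"clock": "24-hour"},
}


def fallback_chain(locale: str) -> List[str]:
    """Return the locale followed by each of its parents, most specific first."""
    chain = [locale]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def lookup(locale: str, key: str) -> Optional[str]:
    """Find a value by walking up the chain until some locale defines it."""
    for candidate in fallback_chain(locale):
        value = DATA.get(candidate, {}).get(key)
        if value is not None:
            return value
    return None


print(fallback_chain("en-FI"))
print(lookup("en-FI", "date_order"))
print(lookup("en-DE", "clock"))
```

With this layout, 'en-BE', 'en-DE', and 'en-FI' need no entries of their own in `DATA`; they only need one if they differ from the European English parent.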
00:03:58.560
It’s critical that projects like the CLDR establish these definitions to maintain consistency across different applications and systems. English as used in Finland or elsewhere can have its own conventions that should be recognized. Yet the tools built on top of this data still need work, particularly in how we translate messages effectively.
00:04:13.639
Ultimately, the endeavor has led to the discovery that there are sufficient data models to represent all messages in any of your systems for localization. How these messages are structured becomes a secondary concern once you have that model defined properly.
00:04:29.599
Now, as I mentioned earlier, a message can have the same data structure despite the fact that the locale and language tagging differ. However, the syntax for elements like numbers and dates will become a critical second layer concern. This pairs with the concept of localization in that many of these systems provide a way to communicate to users that a message belongs to a certain category.
00:04:44.319
For example, when someone gets a notification indicating the receipt of new messages, it can be confusing when the syntax is not communicated well to the translator or localization team. This is precisely where the lack of clarity in formatting can lead to issues.
00:05:06.959
Looking back at my experience, it's crucial to streamline these interactions so that the people who handle translations can efficiently manage the nuances of number formats and see which variant of a message applies to a count of zero, one, three, or many.
00:05:22.039
This is why it’s important to ensure that, as we define these messages, we keep in mind that localization goes far beyond what a computer recognizes: it involves human translators who need coherent context to produce meaningful translations. I hope your collaborations with translators have been constructive and challenge-free!
00:05:42.079
We are seeing these implementations work well in many areas, and through the Unicode work we're refining standards for message formatting that help clarify expectations and lead to much more effective message presentation.
00:06:02.520
As I mentioned, a lot of our work over the past four or five years has gone into standardization within the Unicode Consortium, which includes defining essential guidelines for message formatting in localization scenarios. This becomes especially pertinent when structuring messages about new notifications.
00:06:20.559
How do you effectively say 'you have no new messages,' or 'you have one new message,' or 'you have three new messages' in a programmatic way that all systems can understand?
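One common way to answer that question is to store the message as a set of variants and pick one based on the count. The Python below is a minimal sketch for English only: the `=0` key convention, the `NEW_MESSAGES` table, and both function names are assumptions made for this example, not the format defined by the Unicode work mentioned in this talk.

```python
from typing import Dict


def english_plural_category(count: int) -> str:
    """Cardinal plural category for English whole numbers: 'one' or 'other'."""
    return "one" if count == 1 else "other"


# Hypothetical message variants. A key like "=0" is an exact match on the
# count and takes priority over the plural categories.
NEW_MESSAGES: Dict[str, str] = {
    "=0": "You have no new messages.",
    "one": "You have {count} new message.",
    "other": "You have {count} new messages.",
}


def format_message(variants: Dict[str, str], count: int) -> str:
    """Pick the variant for this count and fill in the number."""
    key = f"={count}"
    if key not in variants:
        key = english_plural_category(count)
    return variants[key].format(count=count)


for n in (0, 1, 3):
    print(format_message(NEW_MESSAGES, n))
```

Other languages need different category functions and often more variants than English's two, which is why the selection logic belongs in shared locale data instead of in each translated string.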
00:06:40.679
In preparing these guidelines, we recognize that various programming languages and environments will utilize diverse methods for formatting these messages, and we must ensure that the core idea remains consistent across all culture-centric formats.
00:06:58.679
At the crux of it is knowing how to represent a message and capturing its fundamental structure. For translators, it's critical to have these standardized formats to communicate effectively about the intended structure without ambiguity.
00:07:15.280
We understand there are different ways of expressing messages that cater to different audiences, and the challenge is to balance these various expressions in a conducive manner to achieve clarity and effectiveness in communication.
00:07:34.200
Recognizing the limitations in current representation systems has driven the demand for a standardized way of managing translated messages so that the intricacies of language do not create rifts in understanding.
00:07:45.840
By extending our data models, we're moving beyond just the message content, understanding that practical translation must account for contextual nuances to ensure actionable meaning is fully captured.
00:08:05.360
These strategies serve to create a kind of ideal where messages essentially become flexible, meaning they can be molded into contexts that best fit each situation, enhancing how we approach internationalization as a whole.
00:08:25.639
Shifting perspectives can help us draw connections between types of messages and how they represent various cultural nuances, leading to more enriched interactions in our applications.
00:08:42.639
In conclusion, I hope you take away from this session the importance of structuring your messages in a way that embraces internationalization through effectively formatted localization practices.
00:09:04.159
The central message here is that even the most complex variations of language can be simplified with a unified approach that all developers can adopt, leading to more cohesive communications and a universal understanding of their content.
00:09:21.480
As we continue to shape our systems for better international integration, let’s ensure our mindset remains adaptable so we can overcome the unique challenges of localization while also harnessing opportunities to create seamless user experiences across borders.
00:09:41.919
If you care about making these changes, be aware that the CLDR also has an issue tracker where you can voice your concerns. Reaching out about what people in your region actually expect, on matters such as date formats, can indeed prompt further discussion and drive improvements.
00:10:03.159
This engaged approach may very well be what continues pushing the dialogue forward, helping drive the changes we've identified as necessary in the global technical landscape.
00:10:15.679
Finally, I want to express my thanks to you all for your attention during my talk; discussions like these help illuminate common pathways for effective internationalization strategies. Thank you!