Internationalization (I18n)
The Power ⚡ and Responsibility 😓 of Unicode Adoption ✨

Summarized using AI

Katie McLaughlin • February 09, 2017 • Earth

In the talk titled "The Power ⚡ and Responsibility 😓 of Unicode Adoption ✨" presented at RubyConf AU 2017, speaker Katie McLaughlin discusses the significance of adopting Unicode over more limited character sets like ASCII and Latin-1. McLaughlin highlights the flexibility of UTF-8 encoding, illustrating how it can represent diverse characters across various languages, including emojis.

Key points discussed in the video include:

  • The importance of adopting Unicode, specifically UTF-8, to ensure compatibility and versatility in representing global characters.
  • The history of emoji inclusion in the Unicode standard, beginning in the late '90s in Japan, and their subsequent explosion in popularity after Apple's iPhone incorporated them.
  • Various examples of new emojis that have been adopted over the years, such as the chipmunk, burrito, and unicorn, and the importance of continuous updates to emoji libraries.
  • The role of different platforms in rendering emojis; discrepancies in emoji representation among Apple, Android, and Windows are explored, emphasizing how interpretations can differ.
  • The process by which new emojis can be proposed to the Unicode Consortium, highlighting the criteria for requesting new emojis.
  • The complications of using emojis in digital communication, including issues of "mojibake" when older systems fail to process Unicode correctly.
  • Suggestions for improving emoji accessibility on the web, including implementing fallback images, mouseover text, and ARIA labels to aid screen readers.
  • The upcoming additions to the emoji catalog and current proposals for new representations that enhance diversity, such as gender recognition.

McLaughlin concludes by underlining the responsibility that comes with Unicode adoption, emphasizing the power it holds to transcend communication while also noting the potential for misinterpretations if not used carefully. The talk encourages viewers to embrace Unicode fully while being mindful of its complexities and implications in digital interactions.

http://www.rubyconf.org.au

RubyConf AU 2017

00:00:10.480 I think that's the best introduction I've ever had. Hi, I'm Katie, and I love Emoji. I love how broken they are. Now, TL;DR: everyone should be using Unicode. Don't use ASCII. Don't use Latin-1. Use Unicode! If you're familiar with Unicode, you would know that there are UTF-8, UTF-32, and UTF-16. What you need to use is UTF-8. UTF-8 is a completely flexible encoding set that everyone should adopt.
00:00:21.519 If you're using Unicode, great! You're using Unicode! In Unicode, if I want to encode the character 'a,' I would use this encoding method. If I wanted to encode the Japanese 'a,' I could just use this. While this is a little bit verbose, we can simplify it with \u escapes, allowing us to show exactly what we mean in either 16-bit or 32-bit. We need curly braces for the longer form because of Ruby's syntax. If I wanted to show an 'a,' it would look like this. For example, if I wanted to combine 'e' with an accent, I simply combine the two code points, and it all works magically. This method works for most languages except for Japanese. The Unicode standard does include katakana and hiragana, but until a few years ago, it didn't have Emoji.
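The slides are not part of the transcript, so the following is only a minimal sketch of the escapes described above, not the speaker's exact code. The constant names are made up for illustration.

```ruby
# Four hex digits after \u cover code points up to U+FFFF.
LATIN_A    = "\u0061"    # the letter "a"
HIRAGANA_A = "\u3042"    # the hiragana "a"

# Curly braces are needed once a code point has more than four hex digits.
GEM_STONE = "\u{1F48E}"

# 'e' followed by U+0301 (combining acute accent) renders as one accented letter,
# but the string still holds two code points.
E_ACUTE = "e\u0301"

# NFC normalization folds the pair into the single precomposed code point U+00E9.
E_ACUTE_NFC = E_ACUTE.unicode_normalize(:nfc)
```

Comparing E_ACUTE with E_ACUTE_NFC using == returns false even though both look identical on screen, which is one reason to normalize text before comparing it.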
00:01:30.040 Emoji became a big phenomenon in Japan in the late '90s. Everyone had cool little pagers. Do you remember what pagers were? They were like mobile phones that fit in your pocket, and they beeped. These pagers had all these really cool characters that were extremely helpful. This was around the time that the Unicode standard was being proposed. People began asking, 'Hey, can we include these cool characters in the Unicode standard?' And initially, the Unicode Consortium said, 'No.' They tried again in 2007, and this time, they said yes. Now, in the Unicode standard, we can have penguins, apples, and maple leaves.
00:02:20.119 Nothing really happened until Apple decided to create an iPhone that would work in Japan. They incorporated all the Emoji into the iPhone, and the Japanese market was thrilled, thinking, 'Oh, I can have my Emoji. This is awesome!' They did all this work for Japanese phones, and later, they ported it over to U.S. phones. The story goes that this feature was a hidden setting, and you had to find and enable it. One day, someone played with their new fancy iPhone and discovered they could send a picture of poo to their friends. They sent it, and someone else asked, 'How'd you do that?' This set off a chain reaction of Emoji sharing, and by around 2010, Emoji exploded.
00:03:00.680 Since then, many new emojis have been added to the Unicode standard. In 2014, we saw the addition of a chipmunk, a satellite, and even a man in a business suit levitating — yes, it's now official! Remember Wingdings? Every character from those two fonts (Wingdings and Webdings) is now part of the Unicode standard, including the levitating businessman. In 2015, there was a whole slew of new emojis, including a burrito, a taco, and a unicorn—because who doesn't love tacos and unicorns? Recently, we saw the addition of an egg, a chicken, a duck, and even a cowboy. Most of these now work on various platforms, so if your phone doesn't display these emojis, consider upgrading. You also get new emojis and security patches when you update!
00:03:42.680 One emoji of particular note is the face palm emoji with Fitzpatrick Modifier 5. This is a combination of two code points, the face palm and the Fitzpatrick skin tone modifier, so the emoji can look like you, which is fantastic! When working with emoji in Ruby, you can output a Unicode emoji using curly braces within your UTF-8 encoded string. You can even reverse the process by getting the ordinal, converting it to a string in base 16, and then converting that back to the original string. Ruby also allows you to do something very interesting that Python does not: you can use emoji as variable names.
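A minimal sketch of the round trip described above, assuming the face palm is U+1F926 and Fitzpatrick type 5 is U+1F3FE. The constant names are illustrative and do not come from the talk.

```ruby
FACEPALM = "\u{1F926}"
TONE_5   = "\u{1F3FE}"            # Fitzpatrick type 5 modifier

# A modifier directly after a base emoji changes its skin tone on platforms that support it.
FACEPALM_TONE_5 = FACEPALM + TONE_5

# Character -> ordinal -> hexadecimal string ...
HEX = FACEPALM.ord.to_s(16)

# ... and back from the hexadecimal string to the character.
ROUND_TRIP = HEX.to_i(16).chr(Encoding::UTF_8)

# Ruby accepts non-ASCII identifiers, so an emoji can be a local variable name.
💎 = "gem stone"
puts 💎
```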
00:04:26.080 For example, you can create a variable named with the gem stone emoji that represents a gem. This flexibility makes programming so much fun! If you're looking for actual descriptions for emojis, you can use the Gemoji gem from GitHub. By finding the emoji by Unicode, you can retrieve its description. For instance, if you wanted to find the snake emoji, simply search by its alias, 'snake'. Here's where it gets complicated. Remember how I mentioned that everyone was sending emojis to their friends? Well, Apple got a head start because no one else was doing this. Meanwhile, the Android implementations were rushed.
00:05:02.440 The yellow heart emoji, for instance, had a peculiar appearance. The image on the left is what it looks like in the standard, while the speckling effect that appears was used in British heraldry to denote gold, as they didn’t have colors back then. So the different stripes and spots symbolize different colors. This is supposed to be the yellow heart emoji, yet the first version of Android had something entirely different! And it doesn't stop there. Take the flushed face emoji, for example. Most representations reflect a sense of shock, like being spotted, whereas Android's interpretation showcases something quite different. These emojis can have completely different meanings based on the platform you are using.
00:05:57.760 But that’s not all. Can anyone guess what this is supposed to be? Microservices? No, it’s supposed to represent hugs! Coming from the old MSN Messenger and Yahoo Messenger days, I expected hugs to feel like an embrace emoji—but only Android's interpretation makes sense to me. Speaking of clapping, when you clap, you usually put your thumbs and palms together. The one on the end is what Windows looks like. But it’s not solely the different vendors that can’t agree on standards; even under the same vendor, things get confusing. Take these two separate emojis, for example—one is a Grimace, but can you tell which is which? There's an entire scientific paper from the University of Minnesota that proves just how confusing these can be. The emoji on the right is Grimace, by the way.
00:07:29.600 We have all of these emojis, and they’re decent, but how do you go about getting new ones? It’s simple—there’s a checklist of criteria that you need to fulfill, and it’s completely open to the public. You can submit a proposal to the Unicode Consortium emoji subcommittee. If you have a valid submission, you can request the emoji you'd like to see included, as long as it is within reason. The things that will get you accepted are if it will help with compatibility.
00:08:21.600 Look at the top line here: that is Yahoo Messenger! And the one at the bottom is the standard version. See? Now we have a cowboy emoji because there was no cowboy before, but Yahoo Messenger had one. Moving on, completeness also counts: one version of the Unicode spec didn’t include the entire zodiac set, which is awkward as people can be born in every month. Additionally, frequently requested emojis, such as unicorns, will also be considered. However, overly specific requests might not be accepted. For example, we already have a wine emoji, a cocktail emoji, a tropical drink emoji, and a whiskey emoji; thus, requesting a Manhattan emoji may not be necessary.
00:09:25.120 It's important to note that you are never allowed to submit emojis that represent brands—think Apple, for instance. You're also discouraged from having fads, memes, or anything similar represented, so you won't ever see a Star Trek emoji because that was an early proposal. As new emojis are introduced, you must update your systems, as these are literally new characters being added. The process can be tricky, especially if you have both old and new phones, leading to issues known as 'mojibake.' If your phone is outdated, you may see some weird characters instead of the new emojis.
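Mojibake in the classic sense comes from reading bytes with the wrong encoding. The talk does not show code for it, so this is only a hedged sketch with an accented letter instead of an emoji: it decodes UTF-8 bytes as ISO-8859-1 (Latin-1), then undoes the damage. The constant names are invented for the example.

```ruby
ORIGINAL = "café"

# Pretend the UTF-8 bytes are Latin-1, then transcode to UTF-8:
# each byte of the two-byte "é" becomes its own character.
MANGLED = ORIGINAL.dup
                  .force_encoding(Encoding::ISO_8859_1)
                  .encode(Encoding::UTF_8)

# Reversing the two steps recovers the original text,
# as long as no bytes were lost along the way.
REPAIRED = MANGLED.encode(Encoding::ISO_8859_1)
                  .force_encoding(Encoding::UTF_8)

puts MANGLED    # cafÃ©
puts REPAIRED   # café
```

String#valid_encoding? can help detect the opposite problem, where bytes are labeled as UTF-8 but are not valid UTF-8.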
00:10:17.760 Regardless of whether you own a new phone or an older model, if your system cannot process Unicode, you may see issues. For reference, there is the banner that was shown earlier, highlighting various speakers today. In the middle, you could see tweets, but something seems off. This was captured by Rob earlier, and it’s marvelous because someone captured a screenshot. This string came up that included some 'U's—which we've already established signal a Unicode issue. If we were to break that down and convert it, we find that what was sent was utterly mangled.
00:11:01.440 The story goes that the application used for generating tweets doesn't handle Unicode properly. However, URLs do handle Unicode just fine. This is a valid URL: the spoon emoji is incorporated. If it weren't, we wouldn’t be able to create any URL that wasn't in Latin-1. An entire RFC exists about formatting Unicode in URLs. The spoon emoji URL could look something like this. This is an encoding known as Punycode, a method to convert any Unicode into an ASCII-only form. The algorithm behind this conversion is quite elaborate and goes beyond the scope of a 20- or 25-minute talk, but we can touch on it briefly.
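For readers curious about the algorithm, here is a compact sketch of the Punycode encoder following the procedure in RFC 3492. The module name PunycodeSketch is made up; it skips the overflow checks and the IDNA preprocessing that a real library performs, so treat it as an illustration rather than production code.

```ruby
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  # Map a digit value 0..35 to its character: a..z, then 0..9.
  def self.digit(value)
    value < 26 ? (97 + value).chr : (22 + value).chr
  end

  # Bias adaptation from RFC 3492, section 6.1.
  def self.adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + (BASE - TMIN + 1) * delta / (delta + SKEW)
  end

  def self.encode(input)
    code_points = input.codepoints
    basic = code_points.select { |cp| cp < 0x80 }
    output = basic.pack("U*")
    handled = basic_count = basic.length
    output << "-" if basic_count > 0

    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS

    while handled < code_points.length
      # Next smallest code point that has not been encoded yet.
      m = code_points.select { |cp| cp >= n }.min
      delta += (m - n) * (handled + 1)
      n = m

      code_points.each do |cp|
        delta += 1 if cp < n
        next unless cp == n

        # Emit delta as a variable-length integer.
        q = delta
        k = BASE
        loop do
          t = if k <= bias then TMIN
              elsif k >= bias + TMAX then TMAX
              else k - bias
              end
          break if q < t
          output << digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit(q)
        bias = adapt(delta, handled + 1, handled == basic_count)
        delta = 0
        handled += 1
      end

      delta += 1
      n += 1
    end
    output
  end
end

puts PunycodeSketch.encode("bücher")          # bcher-kva
puts "xn--" + PunycodeSketch.encode("\u{1F944}")
```

In internationalized domain names, each encoded label carries the "xn--" prefix, which is why the last line prepends it to the encoded spoon emoji.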
00:12:00.680 If you want to enter emojis, remember that your keyboard only has a specific set of characters, primarily letters A through Z and some numbers. On mobile phones, you might discover a smiley face that you can tap to access a small emoji keyboard. On a Mac, if you press Control + Command + Space, you can bring up a handy little search editor to find emojis like snake or gem. Alternatively, you could consider creating an emoji keyboard! Tom Scott, a brilliant hacker from the UK, connected 14 mechanical keyboards together to create a dedicated emoji keyboard using a mix of AutoHotkey and Lua scripting, and it worked!
00:12:50.560 This is the same chap who developed an emoji-only network called Emojli. The platform initially gained attention, but it was designed as a joke and shut down after receiving inquiries from venture capitalists due to its surprisingly large subscriber base. Now, if you want emoji on the web, it's crucial to accommodate all users, whether they are on desktop or mobile devices. For instance, Twitter has rolled out a feature allowing you to search for emojis, although it's a bit tedious for every website to create its own implementation.
00:13:40.560 Instead, you might want to consider short codes. It’s intriguing to think about finding an emoji pun for every single word! If you want to use an emoji like 'cake' in Slack, you type ':cake:' and get a cake emoji. However, in HipChat, you need to use parentheses, like (cake), which will yield the same emoji. This highlights how short codes are essentially pseudo-standards that don’t work uniformly across platforms. Take the party parrot, for instance; it is not an emoji! The ship it squirrel is not an official Unicode character, either.
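As a rough illustration of Slack-style short codes, here is a sketch with a tiny hypothetical lookup table. The SHORTCODES hash and expand_shortcodes method are made up for this example; real platforms each keep their own, larger lists.

```ruby
# Hypothetical table; every platform maintains its own mapping.
SHORTCODES = {
  "cake"  => "\u{1F370}",
  "snake" => "\u{1F40D}"
}.freeze

# Replace :name: with the matching emoji and leave unknown codes untouched.
def expand_shortcodes(text)
  text.gsub(/:([a-z0-9_+-]+):/) { |match| SHORTCODES.fetch(match[1..-2], match) }
end

puts expand_shortcodes("Have some :cake:!")
puts expand_shortcodes("No such emoji: :partyparrot:")
```

Leaving unknown codes as typed mirrors the cross-platform problem described above: a custom code such as :partyparrot: has no Unicode character to fall back to.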
00:14:43.680 If you customize your Slack interface, that will work on your own setup but not on anyone else’s. This uncertainty can lead to a lack of cross-compatibility, as users often expect that their custom emojis will be recognized by everyone. Furthermore, if you're incorporating auto-completion or short codes, please give us an option to disable auto-correct! When I type a simple smiley, it sometimes gets changed to an overly enthusiastic version of happiness or, when embarrassed, turns into a vomiting emoji. It can get out of hand!
00:15:39.760 Now, regarding the topic of how we get emerging emojis back out again, we can tackle this process digitally. I work in the web space, where you have complete control over how things are rendered. One great suggestion for enhancing emoji compatibility is to use fallback images. Just don't rely solely on system defaults because these can vary significantly across devices, leading to inconsistencies in presentation—some may be seeing an embarrassed face, while others view just the blank box.
00:16:09.680 I’m utilizing reveal.js for slides, and there’s a particular slide that contains emojis, but they don’t seem to be rendering correctly. On this slide, there are two text characters that are supposed to represent emojis, but after my recent update to 10.4, it finally displays a kiwi fruit emoji properly. Prior to this update, it just showed a blank square, which is incredibly frustrating. There’s an Elixir emoji as well! If you want to use something that has already been created, you can utilize Twemoji, which is used by WordPress.com. It works by replacing Unicode characters with fallback images.
00:17:13.160 We can take this a step further. There’s a difference between changing a character into an image and the issue of small image sizes. When using tiny images, they may not represent what an emoji is supposed to mean. Thus, I propose that we implement highlights and mouseovers to enhance web accessibility. If you embed a source image as a PNG—because not every platform supports SVG yet—you can ensure your alternative text contains the actual emoji character. This way, when you copy and paste it, you retain the actual character for use.
00:18:11.480 Additionally, setting title text for each emoji enables mouseover functionality. As users hover, they can see the name of the emoji, like 'tumbler glass' or 'hugs,' giving them clarity. For bonus credit, you could also use ARIA labels, which allow screen readers to articulate what the emoji represents. This practice may seem a bit complex, but I’ve made a basic Ruby function available on GitHub that splits any string into characters, looks up each character using Gemoji’s find_by_unicode, and substitutes in the corresponding image.
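The speaker's function is not reproduced in the transcript. The sketch below shows the same idea under stated assumptions: a small hypothetical EMOJI_NAMES table stands in for the Gemoji lookup, and the image path layout (code point in hex plus .png) is invented for illustration.

```ruby
require "cgi"

# Hypothetical stand-in for an emoji database such as Gemoji.
EMOJI_NAMES = {
  "\u{1F943}" => "tumbler glass",
  "\u{1F917}" => "hugs"
}.freeze

# Replace each known emoji with an <img> that keeps the character as alt text,
# exposes the name on mouseover (title), and labels it for screen readers (aria-label).
def emoji_to_img(text, base_url: "/images/emoji")
  text.each_char.map do |char|
    name = EMOJI_NAMES[char]
    next CGI.escapeHTML(char) unless name

    hex = char.ord.to_s(16)
    %(<img src="#{base_url}/#{hex}.png" alt="#{char}" title="#{name}" aria-label="#{name}">)
  end.join
end

puts emoji_to_img("Hi \u{1F917}")
```

Because the alt text holds the real character, copying the rendered page still yields the emoji itself. Splitting with each_char works per code point, so multi-code-point sequences such as skin tone variants would need grapheme-aware splitting instead.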
00:19:38.920 This feature will work seamlessly in Gemoji 3.0, which was released in late December; if you're still using an earlier version, it’s time for an upgrade. Also, if you're involved with anything web-related, please utilize a UTF-8 character set, as the default HTML 4 standard is ASCII. This doesn't suffice for most of the world. You really should embrace Unicode—it's so much more accommodating!
00:20:46.720 Looking ahead, guess what? There will be new emojis released this year! These include potential candidates like broccoli, hedgehogs, brontosaurus, stegosaurus, and the mind-blown emoji—yes, you heard it! These additions are expected to be announced around June. You might also notice updates for existing emojis. For example, remember the blushing face from Android? Now, Android 4.5 and S have a more accurate portrayal of that emoji. Curious as to why we had a duck face emoji? Well, it is indeed present, amongst all the new improvements. Windows has also updated their emoji, with the newer versions looking like actual emojis, improving upon the previous monochromatic designs.
00:23:21.440 It's an exciting time to explore all these changes! Apple has also updated their emoji library recently. If you've seen the new designs, they are all much shinier and high-resolution. Look at that cute owl, the adorable fox, and the avocados! However, sometimes the vendors do not get the emojis right. For example, Samsung created a princess bride emoji, which raised a few eyebrows. I know I have just a few minutes left, so let's not forget other things we can do with Emoji—like concatenation of existing emojis to create new ones.
00:24:16.760 Zero-width joiners enable us to combine emojis—for instance, blending the white flag and rainbow emoji into a pride flag. As of recently, there's been a push for gender recognition in emojis, featuring a male and female police officer, male and female construction workers, and the opportunity to express diversity. Now, if you combine a woman's figure with your choice of skin tone and a laptop, you can finally create an emoji that might resemble you.
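A small sketch of the zero-width joiner sequences mentioned above, built from their individual code points. The constant names are illustrative; whether a sequence renders as a single glyph depends on the font and platform.

```ruby
ZWJ  = "\u200D"   # zero-width joiner
VS16 = "\uFE0F"   # variation selector-16, requests emoji presentation

# White flag + VS16 + ZWJ + rainbow = pride flag.
PRIDE_FLAG = "\u{1F3F3}" + VS16 + ZWJ + "\u{1F308}"

# Woman + medium skin tone modifier + ZWJ + laptop = woman technologist.
WOMAN_TECHNOLOGIST = "\u{1F469}" + "\u{1F3FD}" + ZWJ + "\u{1F4BB}"

puts PRIDE_FLAG
puts WOMAN_TECHNOLOGIST
puts PRIDE_FLAG.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

On a system without support for a given sequence, the parts usually fall back to being drawn side by side, so the text stays readable.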
00:24:34.640 There are proposals currently under consideration for new emojis, including those to represent redheads. This might take the form of adding a skin tone option or new colors altogether, providing enhanced representation for various individuals. Last but not least, if you're interested in more information and regular updates regarding emojis, visit Emojipedia, which is maintained by Jeremy Burge, our local Melbourne gem who now lives in London. This website aggregates all the updates for different vendors.
00:24:49.640 Thank you all for your attention! Remember, there is immense power in Unicode, and with that comes great responsibility. We must be cautious, as misinterpretations can lead to misunderstandings, but ultimately, emojis make communication fun! Thank you very much!