Implementing Globalization (G11N) in Software Applications
This article discusses why globalization (G11N) matters in software applications and what to watch for when implementing it. Globalization generally refers to the combination of localization (L10N) and internationalization (I18N).
The terms localization and internationalization are often used interchangeably, but they are two distinct processes.
- Localization (L10N) refers to the translation and linguistic work associated with making a product functional in a particular locale.
- Internationalization (I18N) involves higher-level architectural and product design decisions that set up an application to allow for localization in any locale.
Set up internationalization no matter where you’re building your software
It is a good idea to consider I18N early on for most applications, even if there is no initial requirement to support more than one country. For one thing, many countries have more than one widely spoken language. You don’t want to exclude entire demographics from your target market because they can’t read your product in their native language.
Setting up a product for I18N off the bat will enable you to easily expand to more markets. This in turn makes it easier to scale, which can be a huge differentiator.
Too often, I18N is treated as an afterthought. This leads to additional work to retrofit I18N onto a mature codebase in which it was never considered. Addressing it early on and throughout the design and development process makes it much less daunting.
If you do this well, you’ll make it a part of the process, considering internationalization whenever new features are designed and developed. With this approach, you will end up with a more flexible and scalable product that will be more appealing to more customers and investors.
“Internationalization is not a feature, it’s an architecture.”
– Addison Phillips (Senior Principal SDE for Internationalization at Amazon)
What to look for when choosing an I18N solution
There are many different I18N solutions out there, and the best one(s) for your application will depend on your tech stack, business needs, and general preference. Regardless of the solution you go with, there are a few things you will always want to keep in mind.
Let’s discuss the main goals of an I18N solution.
The first step in internationalizing a codebase is to abstract out all user-facing copy. Rather than having hard-coded copy everywhere, use utility methods to fetch the copy from the dictionary for the appropriate locale.
A dictionary will generally be a file or set of files containing key/value datasets (e.g. JSON, YAML, PO, etc.), where the key will be the same across all locales, and the value will be the translated copy string.
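As a rough illustration of the idea, here is a minimal Python sketch of such a utility method. The dictionary contents, the key names, and the `translate` function are all hypothetical, and falling back to a default locale when a key is missing is an assumption rather than something every framework does.

```python
# Hypothetical in-memory dictionaries. In practice these would be loaded
# from per-locale JSON, YAML, or PO files.
DICTIONARIES = {
    "en": {
        "cart.checkout": "Check out",
        "cart.empty": "Your cart is empty.",
    },
    "fr": {
        # "cart.checkout" is intentionally missing to show the fallback.
        "cart.empty": "Votre panier est vide.",
    },
}

DEFAULT_LOCALE = "en"  # assumption: English is the fallback locale


def translate(key: str, locale: str) -> str:
    """Return the copy for `key` in `locale`, falling back to the default locale."""
    for candidate in (locale, DEFAULT_LOCALE):
        value = DICTIONARIES.get(candidate, {}).get(key)
        if value is not None:
            return value
    # Returning the key itself makes missing copy easy to spot during testing.
    return key
```

The same key is used for every locale, so calling code only ever refers to `cart.empty` and never to the copy itself.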
Here are the key tips for managing your dictionaries:
- Avoid duplicate keys – People can be quick to add a new key to the dictionary before checking whether it already exists, resulting in duplicates. Avoid this, because duplicated copy can end up being translated inconsistently. There are various ways to prevent it, one of which is the next point:
- Alphabetization – Sounds like a given, but I have seen randomly ordered dictionaries which are very difficult to navigate. Obviously, everything is easier to find in alphabetical order. Someone adding a new copy string will easily see if it already exists, since there’s only one place it could logically be. This could even be enforced by some automated linting!
- Namespaces – This approach involves splitting the dictionary into smaller, more discernible pieces based on context. Keep in mind that this approach can make duplicate keys more likely because someone might add a key to one namespace even though it already exists in another. I suggest using a common namespace that acts as a fallback if a more specific namespace is missing the given key. Encourage people to check if the key already exists in the common namespace before adding it to a more specific namespace. This could potentially also be reinforced by linting.
- Pluralization – Plural rules vary greatly across languages. Most I18N frameworks have some functionality built in to help (if yours doesn’t, you might want to choose a different framework). The key is to actually use it: rather than hard-coding copy with pluralized nouns, pass in the count as an argument and make sure the dictionary can handle the different plural forms.
- Punctuation – Seems obvious, but punctuation is part of the copy. Make sure to keep all punctuation within the dictionary files themselves to prevent issues with languages that use different types of punctuation.
- Dynamic data – Make sure you have the capability of passing in variables to your translation methods. Don’t break sentences apart and piece them together manually to interpolate dynamic data. The sentence structure will most likely be significantly different in other languages. Instead, you should pass any dynamic data to your translation method to have it be interpolated in the dictionary itself. This relates directly to the next point:
- Avoid string concatenation – Word order and sentence structure differ greatly across languages. Store strings in their full form in the dictionary so they can be accurately translated in one place. If you have a full paragraph of copy, store it as a single entry rather than storing each sentence separately. Otherwise you will end up in a difficult situation where the sentences need to be translated but don’t split up the same way in the new language.
- Currency – Different countries use different currencies, so the currency symbol and number format need to be chosen dynamically based on the locale. Also, always store currency values as integers in the currency’s smallest unit (for example, cents rather than dollars) so that arithmetic and conversions avoid rounding errors.
- Addresses – State and ZIP code are US-centric address fields. Other countries use different fields, such as provinces or prefectures, and postal codes in other formats. Keep in mind that validations and data structures around addresses will need to be flexible if you are dealing with different countries.
- Measurements – Different locales can use different measurement systems. Be conscious that anywhere that you display measurements, it will need to be dynamic based on locale. This means both displaying the correct unit of measurement as well as ensuring it has been converted to the right amount.
- Dates and Times – These formats vary greatly across locales. Make sure to store timestamps in a consistent format on the backend, and then format them according to the locale when displaying them on the frontend.
- Right-to-left Text – Some languages (e.g. Arabic, Hebrew, Persian, Urdu) are written from right to left instead of left to right. If you need to localize for a right-to-left language, this will require significant design thought, because the layout of the interface generally needs to be mirrored along with the text.
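The tips on duplicate keys and alphabetization both mention automated linting. A minimal sketch of such a check, assuming each dictionary is a flat JSON file, might look like the following. The `lint_dictionary` name is hypothetical, and the ordering check uses Python’s default string comparison, which is case-sensitive.

```python
import json


def lint_dictionary(raw_json: str) -> list[str]:
    """Return a list of problems found in one flat JSON dictionary file."""
    problems = []
    keys = []

    def collect_pairs(pairs):
        # json.loads silently keeps only the last duplicate, so record every
        # key before the pairs are collapsed into a dict.
        keys.extend(key for key, _ in pairs)
        return dict(pairs)

    json.loads(raw_json, object_pairs_hook=collect_pairs)

    seen = set()
    for key in keys:
        if key in seen:
            problems.append(f"duplicate key: {key}")
        seen.add(key)

    if keys != sorted(keys):
        problems.append("keys are not in alphabetical order")
    return problems
```

A check like this could run in continuous integration and fail the build whenever the returned list is not empty.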
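The pluralization, dynamic data, and string concatenation tips all come down to the same thing: hand the count and any variables to the translation method and let the dictionary hold the complete sentence. The sketch below illustrates this with hand-written plural rules for English and Polish only. A real I18N framework would normally supply plural rules for every locale, and the message keys, the `plural_category` helper, and the `translate_plural` function are hypothetical names.

```python
def plural_category(locale: str, count: int) -> str:
    """Pick a plural category for a non-negative whole number.

    Only Polish gets special handling here; every other locale uses the
    English-style rule. This is a simplification for illustration.
    """
    if locale == "pl":
        if count == 1:
            return "one"
        if count % 10 in (2, 3, 4) and count % 100 not in (12, 13, 14):
            return "few"
        return "many"
    return "one" if count == 1 else "other"


# Full sentences live in the dictionary, with placeholders for dynamic data.
MESSAGES = {
    "en": {
        "files.deleted": {
            "one": "Deleted {count} file from {folder}.",
            "other": "Deleted {count} files from {folder}.",
        },
    },
    "pl": {
        "files.deleted": {
            "one": "Usunięto {count} plik z folderu {folder}.",
            "few": "Usunięto {count} pliki z folderu {folder}.",
            "many": "Usunięto {count} plików z folderu {folder}.",
        },
    },
}


def translate_plural(key: str, locale: str, count: int, **variables: str) -> str:
    """Select the plural form for `count`, then interpolate the variables."""
    forms = MESSAGES[locale][key]
    template = forms[plural_category(locale, count)]
    return template.format(count=count, **variables)
```

Because the calling code never assembles the sentence itself, a translator is free to reorder the placeholders or add plural forms that English does not have.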
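For the currency and date tips, the common thread is to store one neutral representation and only produce a locale-specific one at display time. The following sketch assumes amounts are stored as integers in the currency’s smallest unit and timestamps as ISO 8601 strings in UTC. The `MINOR_UNIT_DIGITS` and `DATE_PATTERNS` tables are small hypothetical stand-ins for the locale data a real I18N library would provide, and time zone conversion is deliberately left out.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Number of decimal digits in the smallest unit of a few currencies
# (illustrative subset only).
MINOR_UNIT_DIGITS = {"USD": 2, "EUR": 2, "JPY": 0}

# Hypothetical numeric date patterns per locale.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "de-DE": "%d.%m.%Y",
    "ja-JP": "%Y/%m/%d",
}


def to_major_units(amount_minor: int, currency: str) -> Decimal:
    """Convert a stored integer amount (e.g. cents) to an exact decimal value."""
    digits = MINOR_UNIT_DIGITS[currency]
    return Decimal(amount_minor).scaleb(-digits)


def store_timestamp(moment: datetime) -> str:
    """Serialize an aware datetime as an ISO 8601 string in UTC."""
    return moment.astimezone(timezone.utc).isoformat()


def format_date(stored: str, locale: str) -> str:
    """Format a stored UTC timestamp as a date using the locale's pattern."""
    return datetime.fromisoformat(stored).strftime(DATE_PATTERNS[locale])
```

Using `Decimal` instead of `float` for display keeps the stored integer exact, which matters once amounts are summed or converted between currencies.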
Other advice for implementing G11N
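- Make sure to use standardized locale identifiers. A language code alone (see the List of ISO 639-1 codes) does not distinguish regional variants, so pair it with a region code, as in BCP 47 language tags such as fr-CA or pt-BR.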
- Test often and understand Unicode. For a deep dive into Unicode and character sets, check out this article by Joel Spolsky (co-founder and former CEO of Stack Overflow, among other things).
- Consult with experts in each locale, because you might be making bad assumptions due to cultural differences or other nuances.
- Consider how Icons and Colors can have different meanings depending on locale.
- Example: Recall the original PlayStation controller, where O (red) was used as a confirm action and X (blue) was used as a cancel button. This made sense in Japanese culture, where O means “right” and X means “wrong”, but was confusing in western culture, where X can be considered confirmation (“X marks the spot”) and red is commonly associated with “Stop”.
- Check out this interesting article on the history behind the original PlayStation controller design.
- Don’t use images or country flags for language selection. Many countries have multiple languages. For example, Canadians from Quebec are not from France, and their French is different than what is spoken in France. Portuguese is another good example. Here’s an article addressing the difference between Brazilian and European Portuguese.
- Don’t use regex without understanding Unicode. Here is a (very old) Unicode regex tutorial. Also, for more on Unicode I again recommend the article by Joel Spolsky about character sets.
- Images that contain text should be avoided. Separate the two. The text will need to be localized, and it will be a lot easier if you don’t have to render different images depending on the locale.
- Don’t write copy that is “punny” or idiomatic. This will likely not translate well.
- Don’t always trust the browser to know the locale.
- Don’t forget about emails, API documentation, and other user-facing copy outside of the main user-facing app.
- Keep in mind that words and sentences can vary greatly in length depending on the locale. This needs to be considered when designing UIs. For example, Japanese often takes up less space than English, whereas German often takes up considerably more.
- Consider user preferences (browser, emails, etc), as this affects all aspects of the application
- Make sure everyone who tests the software (engineers, QA, and UAT) is testing changes in all locales.
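Several of the points above (standardized locale identifiers, regional variants such as Canadian French, not blindly trusting the browser, and honoring user preferences) meet in the code that decides which locale to serve. One possible approach, sketched below, is to prefer a locale the user has explicitly saved, then fall back to the browser’s Accept-Language header, then to a default. The supported locale list, the default, and all function names are assumptions for illustration, and the matching is much simpler than what a full BCP 47 implementation would do.

```python
from typing import Optional

SUPPORTED_LOCALES = ["en-US", "fr-CA", "fr-FR", "pt-BR", "pt-PT"]  # hypothetical
DEFAULT_LOCALE = "en-US"  # hypothetical


def parse_accept_language(header: str) -> list[str]:
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        weighted.append((quality, tag))
    # sorted() is stable, so tags with equal weight keep their header order.
    ranked = sorted(weighted, key=lambda item: -item[0])
    return [tag for quality, tag in ranked if quality > 0]


def match_supported(tag: str) -> Optional[str]:
    """Match a tag exactly first, then by primary language only."""
    wanted = tag.lower()
    for locale in SUPPORTED_LOCALES:
        if locale.lower() == wanted:
            return locale
    language = wanted.split("-")[0]
    for locale in SUPPORTED_LOCALES:
        if locale.lower().split("-")[0] == language:
            return locale
    return None


def resolve_locale(saved_preference: Optional[str], accept_language: str) -> str:
    """Prefer the user's saved locale, then the browser's, then the default."""
    candidates = [saved_preference] if saved_preference else []
    candidates += parse_accept_language(accept_language)
    for tag in candidates:
        match = match_supported(tag)
        if match:
            return match
    return DEFAULT_LOCALE
```

Keeping region-specific locales such as `fr-CA` and `pt-PT` as separate entries means a language picker can list them by name instead of relying on country flags.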