How to normalize locale identifiers to standard form

Convert locale identifiers to canonical format with correct casing and component ordering

Introduction

Locale identifiers can be written in many different ways while referring to the same language and region. A user might write EN-us, en-US, or en-us, and all three represent American English. When storing, comparing, or displaying locale identifiers, these variations create inconsistency.

Normalization converts locale identifiers to a standard canonical form. This process adjusts the casing of components, orders extension keywords alphabetically, and produces a consistent representation that you can rely on throughout your application.

JavaScript provides built-in methods to normalize locale identifiers automatically. This guide explains what normalization means, how to apply it in your code, and when normalized identifiers improve your internationalization logic.

What normalization means for locale identifiers

Normalization transforms a locale identifier into its canonical form according to the BCP 47 standard and Unicode specifications. The canonical form has specific rules for casing, ordering, and structure.

A normalized locale identifier follows these conventions:

  • Language codes are lowercase
  • Script codes are title case with the first letter capitalized
  • Region codes are uppercase
  • Variant codes are lowercase
  • Extension keywords are sorted alphabetically
  • Extension attributes are sorted alphabetically

These rules create a single standard representation for each locale. No matter how a user writes a locale identifier, the normalized form is always the same.

Understanding the normalization rules

Each component of a locale identifier has a specific casing convention in the canonical form.

Language casing

Language codes always use lowercase letters:

en (correct)
EN (incorrect, but normalizes to en)
eN (incorrect, but normalizes to en)

This applies to both two-letter and three-letter language codes.
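As a quick check of the three-letter case, the sketch below runs one two-letter and one three-letter language code through Intl.getCanonicalLocales(), which this guide covers in a later section. The codes de and fil (Filipino) are chosen only for illustration.

```javascript
// Both two-letter and three-letter language subtags are lowercased.
const twoLetter = Intl.getCanonicalLocales("DE")[0];
const threeLetter = Intl.getCanonicalLocales("FIL")[0];

console.log(twoLetter);   // "de"
console.log(threeLetter); // "fil"
```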

Script casing

Script codes use title case, where the first letter is uppercase and the remaining three letters are lowercase:

Hans (correct)
hans (incorrect, but normalizes to Hans)
HANS (incorrect, but normalizes to Hans)

Common script codes include Latn for Latin, Cyrl for Cyrillic, Hans for Simplified Han characters, and Hant for Traditional Han characters.

Region casing

Region codes always use uppercase letters:

US (correct)
us (incorrect, but normalizes to US)
Us (incorrect, but normalizes to US)

This applies to the two-letter country codes used in most locale identifiers.
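Region subtags can also be three-digit area codes, such as 419 for Latin America. Digits have no case, so they pass through unchanged. A minimal sketch, with example tags chosen only for illustration:

```javascript
// Letter-based regions are uppercased; digit-based regions stay as written.
const regionInputs = ["pt-br", "ES-419"];
const regionOutputs = Intl.getCanonicalLocales(regionInputs);

console.log(regionOutputs);
// ["pt-BR", "es-419"]
```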

Extension ordering

The Unicode extension, introduced by the u singleton, contains keywords that specify formatting preferences. In the canonical form, these keywords appear in alphabetical order by their key:

en-US-u-ca-gregory-nu-latn (correct)
en-US-u-nu-latn-ca-gregory (incorrect, but normalizes to first form)

The calendar key ca comes before the numbering system key nu alphabetically, so ca-gregory appears first in the normalized form.

Using Intl.getCanonicalLocales to normalize

The Intl.getCanonicalLocales() method normalizes locale identifiers and returns them in canonical form. This is the primary method for normalization in JavaScript.

const normalized = Intl.getCanonicalLocales("EN-us");
console.log(normalized);
// ["en-US"]

The method accepts a locale identifier with any casing and returns the properly cased canonical form.
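Note that the return value is always an array, even when you pass a single string. The sketch below wraps the call in a small helper that returns one string; the name canonicalizeLocale is hypothetical and not part of the Intl API.

```javascript
// Hypothetical helper: return one canonical string instead of an array.
function canonicalizeLocale(tag) {
  // getCanonicalLocales always returns an array, even for a single string.
  const [canonical] = Intl.getCanonicalLocales(tag);
  return canonical;
}

const isArray = Array.isArray(Intl.getCanonicalLocales("EN-us"));
const single = canonicalizeLocale("EN-us");
const empty = Intl.getCanonicalLocales();

console.log(isArray); // true
console.log(single);  // "en-US"
console.log(empty);   // [] (calling with no argument gives an empty array)
```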

Normalizing language codes

The method converts language codes to lowercase:

const result = Intl.getCanonicalLocales("FR-fr");
console.log(result);
// ["fr-FR"]

The language code FR becomes fr in the output.

Normalizing script codes

The method converts script codes to title case:

const result = Intl.getCanonicalLocales("zh-HANS-cn");
console.log(result);
// ["zh-Hans-CN"]

The script code HANS becomes Hans, and the region code cn becomes CN.

Normalizing region codes

The method converts region codes to uppercase:

const result = Intl.getCanonicalLocales("en-gb");
console.log(result);
// ["en-GB"]

The region code gb becomes GB in the output.

Normalizing extension keywords

The method sorts extension keywords alphabetically:

const result = Intl.getCanonicalLocales("en-US-u-nu-latn-hc-h12-ca-gregory");
console.log(result);
// ["en-US-u-ca-gregory-hc-h12-nu-latn"]

The keywords reorder from nu-latn-hc-h12-ca-gregory to ca-gregory-hc-h12-nu-latn because ca comes before hc and hc comes before nu alphabetically.

Normalizing multiple locale identifiers

The Intl.getCanonicalLocales() method accepts an array of locale identifiers and normalizes all of them:

const locales = ["EN-us", "fr-FR", "ZH-hans-cn"];
const normalized = Intl.getCanonicalLocales(locales);
console.log(normalized);
// ["en-US", "fr-FR", "zh-Hans-CN"]

Each locale in the array is converted to its canonical form.

Removing duplicates

The method removes duplicate locale identifiers after normalization. If multiple input values normalize to the same canonical form, the result contains only one copy:

const locales = ["en-US", "EN-us", "en-us"];
const normalized = Intl.getCanonicalLocales(locales);
console.log(normalized);
// ["en-US"]

All three inputs represent the same locale, so the output contains a single normalized identifier.

This deduplication is useful when processing user input or merging locale lists from multiple sources.
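Deduplication keeps each locale at the position of its first occurrence, so a priority-ordered list stays in priority order. A short sketch with an illustrative preference list:

```javascript
// Duplicates are dropped; the first occurrence decides the position.
const preferred = ["fr-ca", "EN-us", "FR-CA", "en-US", "de-de"];
const uniquePreferred = Intl.getCanonicalLocales(preferred);

console.log(uniquePreferred);
// ["fr-CA", "en-US", "de-DE"]
```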

Handling invalid identifiers

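If any locale identifier in the array is structurally invalid, the method throws a RangeError and returns nothing, even for the valid entries:

try {
  Intl.getCanonicalLocales(["en-US", "en_US", "fr-FR"]);
} catch (error) {
  console.error(error.name);
  // "RangeError"
  console.error(error.message);
  // The exact wording varies by JavaScript engine
}

The check is structural only. A well-formed but unregistered tag such as "invalid" does not throw, because a sequence of five to eight letters fits the pattern of a language subtag. An underscore, as in en_US, is not allowed and does throw.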

When normalizing user-provided lists, validate or catch errors for each locale individually to identify which specific identifiers are invalid.
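One way to do this is to canonicalize each entry in its own try block and collect the failures. The helper name partitionLocales and its return shape are assumptions made for this sketch, and it assumes the input is an array of strings.

```javascript
// Hypothetical helper: canonicalize each tag separately so one bad
// entry does not discard the rest of the list.
function partitionLocales(tags) {
  const valid = [];
  const invalid = [];

  for (const tag of tags) {
    try {
      const [canonical] = Intl.getCanonicalLocales(tag);
      // Skip tags that normalize to an entry we already have.
      if (!valid.includes(canonical)) {
        valid.push(canonical);
      }
    } catch (error) {
      // RangeError signals a structurally invalid tag; rethrow anything else.
      if (!(error instanceof RangeError)) throw error;
      invalid.push(tag);
    }
  }

  return { valid, invalid };
}

const report = partitionLocales(["EN-us", "en_US", "fr-FR", "en-us"]);
console.log(report.valid);   // ["en-US", "fr-FR"]
console.log(report.invalid); // ["en_US"]
```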

Using Intl.Locale for normalization

The Intl.Locale constructor also normalizes locale identifiers when creating locale objects. You can access the normalized form through the toString() method.

const locale = new Intl.Locale("EN-us");
console.log(locale.toString());
// "en-US"

The constructor accepts any valid casing and produces a normalized locale object.

Accessing normalized components

Each property of the locale object returns the normalized form of that component:

const locale = new Intl.Locale("ZH-hans-CN");

console.log(locale.language);
// "zh"

console.log(locale.script);
// "Hans"

console.log(locale.region);
// "CN"

console.log(locale.baseName);
// "zh-Hans-CN"

The language, script, and region properties all use the correct casing for the canonical form.
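Components that are absent from the identifier are reported as undefined, and baseName leaves out any extension keywords. A brief sketch with illustrative tags:

```javascript
// A tag with only a language: script and region are undefined.
const bare = new Intl.Locale("EN");
console.log(bare.language); // "en"
console.log(bare.script);   // undefined
console.log(bare.region);   // undefined

// baseName covers language, script, and region, but not the extension.
const extended = new Intl.Locale("en-us-u-ca-gregory");
console.log(extended.baseName); // "en-US"
console.log(extended.calendar); // "gregory"
```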

Normalizing with options

When you create a locale object with options, the constructor normalizes both the base identifier and the options:

const locale = new Intl.Locale("EN-us", {
  calendar: "gregory",
  numberingSystem: "latn",
  hourCycle: "h12"
});

console.log(locale.toString());
// "en-US-u-ca-gregory-hc-h12-nu-latn"

The extension keywords appear in alphabetical order in the output, even though the options object does not specify any particular order.
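If the identifier already carries a keyword that also appears in the options, the option value takes precedence in the normalized result. The tag below is only an example:

```javascript
// The calendar option replaces the ca keyword already present in the tag;
// the nu keyword is untouched because no numberingSystem option is given.
const overridden = new Intl.Locale("en-US-u-ca-buddhist-nu-latn", {
  calendar: "gregory"
});

console.log(overridden.toString());
// "en-US-u-ca-gregory-nu-latn"
console.log(overridden.calendar);
// "gregory"
```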

Why normalization matters

Normalization provides consistency across your application. When you store, display, or compare locale identifiers, using the canonical form prevents subtle bugs and improves reliability.

Consistent storage

When storing locale identifiers in databases, configuration files, or local storage, normalized forms prevent duplication:

const userPreferences = new Set();

function saveUserLocale(identifier) {
  const normalized = Intl.getCanonicalLocales(identifier)[0];
  userPreferences.add(normalized);
}

saveUserLocale("en-US");
saveUserLocale("EN-us");
saveUserLocale("en-us");

console.log(userPreferences);
// Set { "en-US" }

Without normalization, the set would contain three entries for the same locale. With normalization, it correctly contains one.

Reliable comparison

Comparing locale identifiers requires normalization. Two identifiers that differ only in casing represent the same locale:

function isSameLocale(locale1, locale2) {
  const normalized1 = Intl.getCanonicalLocales(locale1)[0];
  const normalized2 = Intl.getCanonicalLocales(locale2)[0];
  return normalized1 === normalized2;
}

console.log(isSameLocale("en-US", "EN-us"));
// true

console.log(isSameLocale("en-US", "en-GB"));
// false

Direct string comparison of unnormalized identifiers can report two spellings of the same locale as different. For example, "en-US" === "EN-us" is false.
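Lowercasing both sides before comparing is not a full substitute, because extension keywords may also appear in a different order. The following sketch uses two illustrative spellings of the same locale:

```javascript
const first = "en-US-u-nu-latn-ca-gregory";
const second = "EN-us-u-ca-gregory-nu-latn";

// Plain and case-insensitive comparisons both miss the match,
// because the extension keywords are in a different order.
const plainEqual = first === second;
const lowerEqual = first.toLowerCase() === second.toLowerCase();

// Canonical forms agree on both casing and keyword order.
const canonicalEqual =
  Intl.getCanonicalLocales(first)[0] === Intl.getCanonicalLocales(second)[0];

console.log(plainEqual);     // false
console.log(lowerEqual);     // false
console.log(canonicalEqual); // true
```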

Consistent display

When showing locale identifiers to users or in debugging output, normalized forms provide consistent formatting:

function displayLocale(identifier) {
  try {
    const normalized = Intl.getCanonicalLocales(identifier)[0];
    return `Current locale: ${normalized}`;
  } catch (error) {
    return "Invalid locale identifier";
  }
}

console.log(displayLocale("EN-us"));
// "Current locale: en-US"

console.log(displayLocale("zh-HANS-cn"));
// "Current locale: zh-Hans-CN"

Users see properly formatted locale identifiers regardless of the input format.

Practical applications

Normalization solves common problems when working with locale identifiers in real applications.

Normalizing user input

When users enter locale identifiers in forms or settings, normalize the input before storing it:

function processLocaleInput(input) {
  try {
    const normalized = Intl.getCanonicalLocales(input)[0];
    return {
      success: true,
      locale: normalized
    };
  } catch (error) {
    return {
      success: false,
      error: "Please enter a valid locale identifier"
    };
  }
}

const result = processLocaleInput("fr-ca");
console.log(result);
// { success: true, locale: "fr-CA" }

This ensures consistent formatting in your database or configuration.
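Form input often arrives with stray whitespace or POSIX-style underscores (for example pt_BR), both of which make Intl.getCanonicalLocales() throw. One possible pre-processing step is sketched below; the helper name cleanLocaleInput and its cleanup rules are assumptions, not part of the Intl API.

```javascript
// Hypothetical pre-processing step; the cleanup rules are assumptions.
function cleanLocaleInput(raw) {
  // Trim surrounding whitespace and accept underscores as separators.
  const candidate = String(raw).trim().replace(/_/g, "-");
  try {
    return Intl.getCanonicalLocales(candidate)[0];
  } catch (error) {
    return null; // still not a structurally valid tag after cleanup
  }
}

const cleaned = cleanLocaleInput("  pt_br ");
const rejected = cleanLocaleInput("en US");

console.log(cleaned);  // "pt-BR"
console.log(rejected); // null
```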

Building locale lookup tables

When creating lookup tables for translations or locale-specific data, use normalized keys:

const translations = new Map();

function addTranslation(locale, key, value) {
  const normalized = Intl.getCanonicalLocales(locale)[0];

  if (!translations.has(normalized)) {
    translations.set(normalized, {});
  }

  translations.get(normalized)[key] = value;
}

addTranslation("en-us", "hello", "Hello");
addTranslation("EN-US", "goodbye", "Goodbye");

console.log(translations.get("en-US"));
// { hello: "Hello", goodbye: "Goodbye" }

Both calls to addTranslation use the same normalized key, so the translations are stored in the same object.
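Reads benefit from the same treatment as writes. The sketch below normalizes the locale at lookup time so callers can pass any spelling; the messages map and the getMessage name are hypothetical stand-ins for your own table.

```javascript
// Hypothetical table keyed by canonical locale identifiers.
const messages = new Map([["en-US", { hello: "Hello" }]]);

function getMessage(locale, key) {
  // Normalize the requested locale so it matches the canonical keys.
  const [normalized] = Intl.getCanonicalLocales(locale);
  const entry = messages.get(normalized);
  return entry ? entry[key] : undefined;
}

const found = getMessage("EN-us", "hello");
const missing = getMessage("fr-FR", "hello");

console.log(found);   // "Hello"
console.log(missing); // undefined
```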

Merging locale lists

When combining locale identifiers from multiple sources, normalize and deduplicate them:

function mergeLocales(...sources) {
  const allLocales = sources.flat();
  const normalized = Intl.getCanonicalLocales(allLocales);
  return normalized;
}

const userLocales = ["en-us", "fr-FR"];
const appLocales = ["EN-US", "de-de"];
const systemLocales = ["en-US", "es-mx"];

const merged = mergeLocales(userLocales, appLocales, systemLocales);
console.log(merged);
// ["en-US", "fr-FR", "de-DE", "es-MX"]

The method removes duplicates and normalizes casing across all sources.

Creating locale selection interfaces

When building dropdown menus or selection interfaces, normalize locale identifiers for display:

function buildLocaleOptions(locales) {
  const normalized = Intl.getCanonicalLocales(locales);

  return normalized.map(locale => {
    const localeObj = new Intl.Locale(locale);
    const displayNames = new Intl.DisplayNames([locale], {
      type: "language"
    });

    return {
      value: locale,
      label: displayNames.of(localeObj.language)
    };
  });
}

const options = buildLocaleOptions(["EN-us", "fr-FR", "DE-de"]);
console.log(options);
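// [
//   { value: "en-US", label: "English" },
//   { value: "fr-FR", label: "français" },
//   { value: "de-DE", label: "Deutsch" }
// ]

Because each Intl.DisplayNames instance is created for the option's own locale, every label appears in that locale's language. The normalized values provide consistent identifiers for form submissions.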

Validating configuration files

When loading locale identifiers from configuration files, normalize them during initialization:

function loadLocaleConfig(config) {
  const validatedConfig = {
    defaultLocale: null,
    supportedLocales: []
  };

  try {
    validatedConfig.defaultLocale = Intl.getCanonicalLocales(
      config.defaultLocale
    )[0];
  } catch (error) {
    console.error("Invalid default locale:", config.defaultLocale);
    validatedConfig.defaultLocale = "en-US";
  }

  config.supportedLocales.forEach(locale => {
    try {
      const normalized = Intl.getCanonicalLocales(locale)[0];
      validatedConfig.supportedLocales.push(normalized);
    } catch (error) {
      console.warn("Skipping invalid locale:", locale);
    }
  });

  return validatedConfig;
}

const config = {
  defaultLocale: "en-us",
  supportedLocales: ["EN-us", "fr-FR", "not_a_locale", "de-DE"]
};

const validated = loadLocaleConfig(config);
console.log(validated);
// {
//   defaultLocale: "en-US",
//   supportedLocales: ["en-US", "fr-FR", "de-DE"]
// }

This catches configuration errors early and ensures your application uses valid normalized identifiers.

Normalization and locale matching

Normalization is important for locale matching algorithms. When finding the best locale match for a user preference, compare normalized forms:

function findBestMatch(userPreference, availableLocales) {
  const normalizedPreference = Intl.getCanonicalLocales(userPreference)[0];
  const normalizedAvailable = Intl.getCanonicalLocales(availableLocales);

  if (normalizedAvailable.includes(normalizedPreference)) {
    return normalizedPreference;
  }

  const preferenceLocale = new Intl.Locale(normalizedPreference);

  const languageMatch = normalizedAvailable.find(available => {
    const availableLocale = new Intl.Locale(available);
    return availableLocale.language === preferenceLocale.language;
  });

  if (languageMatch) {
    return languageMatch;
  }

  return normalizedAvailable[0];
}

const available = ["en-us", "fr-FR", "DE-de"];
console.log(findBestMatch("EN-GB", available));
// "en-US"

Normalization ensures the matching logic works correctly regardless of input casing.

Normalization does not change meaning

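Normalization only affects the representation of a locale identifier. It does not change which language, script, or region the identifier represents. Beyond casing and ordering, canonicalization may also replace a deprecated subtag with its preferred equivalent (for example, iw becomes he for Hebrew), which likewise leaves the meaning unchanged.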

const locale1 = new Intl.Locale("en-us");
const locale2 = new Intl.Locale("EN-US");

console.log(locale1.language === locale2.language);
// true

console.log(locale1.region === locale2.region);
// true

console.log(locale1.toString() === locale2.toString());
// true

Both identifiers refer to American English. Normalization simply ensures they are written the same way.

This is different from operations like maximize() and minimize(), which add or remove components and can change the specificity of the identifier.
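To make the contrast concrete, here is a short sketch of maximize() and minimize() on two illustrative locales. Unlike normalization, both calls produce an identifier with a different set of components:

```javascript
// maximize() adds the most likely script and region.
const shortForm = new Intl.Locale("en");
const maximized = shortForm.maximize().toString();

// minimize() removes components that can be inferred.
const longForm = new Intl.Locale("zh-Hans-CN");
const minimized = longForm.minimize().toString();

console.log(shortForm.toString()); // "en"
console.log(maximized);            // "en-Latn-US"
console.log(minimized);            // "zh"
```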

Browser support

The Intl.getCanonicalLocales() method works in all modern browsers. Chrome, Firefox, Safari, and Edge provide full support.

Node.js also supports Intl.getCanonicalLocales() in every currently maintained release.

The Intl.Locale constructor and its normalization behavior work in all browsers that support the Intl.Locale API. This includes modern versions of Chrome, Firefox, Safari, and Edge.
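If you need to run in an environment where support is uncertain, you can check for the APIs before calling them. This is a minimal sketch; the function names are hypothetical, and returning the input unchanged is just one possible fallback.

```javascript
// Feature checks that do not throw when Intl itself is missing.
function supportsCanonicalLocales() {
  return typeof Intl === "object" &&
    typeof Intl.getCanonicalLocales === "function";
}

function supportsIntlLocale() {
  return typeof Intl === "object" && typeof Intl.Locale === "function";
}

// Hypothetical fallback: return the input unchanged when the API is missing.
function normalizeIfSupported(tag) {
  return supportsCanonicalLocales() ? Intl.getCanonicalLocales(tag)[0] : tag;
}

console.log(supportsCanonicalLocales());
console.log(supportsIntlLocale());
console.log(normalizeIfSupported("EN-us")); // "en-US" where the API exists
```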

Summary

Normalization converts locale identifiers to their canonical form by applying standard casing rules and sorting extension keywords. This creates consistent representations that you can store, compare, and display reliably.

Key concepts:

  • Canonical form uses lowercase for languages, title case for scripts, and uppercase for regions
  • Extension keywords are sorted alphabetically in the canonical form
  • The Intl.getCanonicalLocales() method normalizes identifiers and removes duplicates
  • The Intl.Locale constructor also produces normalized output
  • Normalization does not change the meaning of a locale identifier
  • Use normalized identifiers for storage, comparison, and display

Normalization is a foundational operation for any application that works with locale identifiers. It prevents bugs caused by inconsistent casing and ensures your internationalization logic handles locale identifiers reliably.