How to remove redundant information from locale identifiers

Use the minimize method to create compact locale identifiers without losing meaning

Introduction

Locale identifiers like en-Latn-US and zh-Hans-CN contain multiple components that specify language, script, and region. However, not all of these components are necessary to identify a locale. Some components are redundant because they can be inferred from other parts of the identifier.

The minimize() method removes these redundant components to create the shortest equivalent locale identifier. This produces compact identifiers that preserve meaning while reducing storage size and improving readability.

Understanding redundancy in locale identifiers

A locale identifier contains redundancy when it explicitly states information that its other components already imply. This happens because each language has a likely default script and region.

Consider the identifier en-Latn-US. This identifier specifies:

  • Language: English (en)
  • Script: Latin (Latn)
  • Region: United States (US)

English is normally written in the Latin script, and when no region is specified, the likely-subtags data resolves English to the United States. Both the script and region components are redundant because they match the likely defaults for English. The identifier en conveys the same information.

The same principle applies to other languages. Korean (ko) is written in the Korean script (Kore, which covers Hangul together with Han characters) and is primarily spoken in South Korea (KR). The identifier ko-Kore-KR contains redundant information because ko alone implies these defaults.

How the minimize method works

The minimize() method is available on Intl.Locale instances. It analyzes the locale identifier and removes components that match likely default values.

const locale = new Intl.Locale("en-Latn-US");
const minimized = locale.minimize();

console.log(minimized.baseName);
// Output: "en"

The method returns a new Intl.Locale instance with redundant subtags removed. It does not modify the original locale object.
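As a quick check of that behavior, the following sketch minimizes a locale and then inspects both objects. The variable names are illustrative only.

```javascript
// minimize() returns a new Intl.Locale; the object it is called on is unchanged.
const originalLocale = new Intl.Locale("en-Latn-US");
const minimizedLocale = originalLocale.minimize();

console.log(originalLocale.baseName);
// Output: "en-Latn-US"

console.log(minimizedLocale.baseName);
// Output: "en"

console.log(minimizedLocale === originalLocale);
// Output: false
```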

The minimization process follows the Unicode CLDR "Remove Likely Subtags" algorithm. This algorithm uses a database of likely subtag associations to determine which components can be removed without losing information.
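To make the idea concrete, here is a rough sketch of how such an algorithm can be expressed in terms of maximize(). It is not the engine's implementation: the function name removeLikelySubtagsSketch is hypothetical, the trial order (language, then language plus region, then language plus script) reflects one reading of the CLDR specification, and the sketch assumes the input has no variant subtags and ignores extensions.

```javascript
// Simplified sketch of "Remove Likely Subtags", built on maximize().
// Assumptions: no variant subtags in the input; extensions are ignored.
function removeLikelySubtagsSketch(tag) {
  const max = new Intl.Locale(tag).maximize();
  const { language, script, region } = max;

  // Candidate short forms, tried in order.
  const trials = [
    [language],
    [language, region],
    [language, script],
  ];

  for (const parts of trials) {
    // Drop any subtag that maximize() could not supply.
    const trial = parts.filter(Boolean).join("-");
    const expanded = new Intl.Locale(trial).maximize();

    // A trial is acceptable if maximizing it restores the full identifier.
    if (expanded.baseName === max.baseName) {
      return trial;
    }
  }

  // No shorter form expands back to the same identifier.
  return max.baseName;
}

console.log(removeLikelySubtagsSketch("en-Latn-US"));
// Output: "en"

console.log(removeLikelySubtagsSketch("en-Latn-GB"));
// Output: "en-GB"
```

In real code, call minimize() directly. The sketch only illustrates why a subtag counts as redundant: it is redundant when the shorter identifier maximizes back to the same full identifier.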

Components affected by minimize

The minimize() method only affects core locale components: language, script, and region. It does not remove or modify Unicode extension subtags that specify formatting preferences.

const locale = new Intl.Locale("en-Latn-US-u-ca-gregory-nu-latn");
const minimized = locale.minimize();

console.log(minimized.toString());
// Output: "en-u-ca-gregory-nu-latn"

The calendar (ca-gregory) and numbering system (nu-latn) extensions remain intact. Only the redundant script (Latn) and region (US) components are removed.
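The preserved extensions are also available through the usual Intl.Locale properties on the minimized object. The short sketch below reads them back; the variable names are illustrative.

```javascript
// Extension keywords survive minimization and stay readable as properties.
const withExtensions = new Intl.Locale("en-Latn-US-u-ca-gregory-nu-latn");
const compactLocale = withExtensions.minimize();

console.log(compactLocale.baseName);
// Output: "en"

console.log(compactLocale.calendar);
// Output: "gregory"

console.log(compactLocale.numberingSystem);
// Output: "latn"
```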

Examples of minimization

Different locale identifiers minimize to different lengths depending on which components are redundant.

Removing script and region

When both script and region match defaults, both are removed:

const english = new Intl.Locale("en-Latn-US");
console.log(english.minimize().baseName);
// Output: "en"

const korean = new Intl.Locale("ko-Kore-KR");
console.log(korean.minimize().baseName);
// Output: "ko"

const japanese = new Intl.Locale("ja-Jpan-JP");
console.log(japanese.minimize().baseName);
// Output: "ja"

Keeping non-default regions

When the region differs from the default, it remains in the minimized identifier:

const britishEnglish = new Intl.Locale("en-Latn-GB");
console.log(britishEnglish.minimize().baseName);
// Output: "en-GB"

const canadianFrench = new Intl.Locale("fr-Latn-CA");
console.log(canadianFrench.minimize().baseName);
// Output: "fr-CA"

const mexicanSpanish = new Intl.Locale("es-Latn-MX");
console.log(mexicanSpanish.minimize().baseName);
// Output: "es-MX"

The script component is removed because it matches the default, but the region is preserved because it specifies a non-default variant of the language.
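To see which components a given identifier loses, you can compare the locale before and after minimization. The helper below is a hypothetical sketch (removedSubtags is not part of the Intl API) that reports the dropped core subtags by name.

```javascript
// Hypothetical helper: lists which core subtags minimize() dropped.
function removedSubtags(tag) {
  const before = new Intl.Locale(tag);
  const after = before.minimize();
  const removed = [];

  if (before.script && !after.script) removed.push("script");
  if (before.region && !after.region) removed.push("region");

  return removed;
}

console.log(removedSubtags("en-Latn-US"));
// Output: ["script", "region"]

console.log(removedSubtags("en-Latn-GB"));
// Output: ["script"]

console.log(removedSubtags("en-GB"));
// Output: []
```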

Keeping non-default scripts

When the script differs from the default for the language, it remains in the minimized identifier:

const serbianLatin = new Intl.Locale("sr-Latn-RS");
console.log(serbianLatin.minimize().baseName);
// Output: "sr-Latn"

const uzbekCyrillic = new Intl.Locale("uz-Cyrl-UZ");
console.log(uzbekCyrillic.minimize().baseName);
// Output: "uz-Cyrl"

const traditionalChinese = new Intl.Locale("zh-Hant-TW");
console.log(traditionalChinese.minimize().baseName);
// Output: "zh-TW"

Serbian defaults to the Cyrillic script, so sr-Cyrl-RS minimizes all the way to sr, while Serbian in Latin script must keep Latn. Chinese behaves similarly: zh-Hans-CN minimizes to zh because Simplified script and China are the likely defaults. Traditional Chinese for Taiwan minimizes to zh-TW rather than zh-Hant because the region alone already implies the Traditional script, and the algorithm tries the language-region form before the language-script form.

Already minimal identifiers

When a locale identifier is already minimal, the method returns an equivalent locale without changes:

const minimal = new Intl.Locale("fr");
console.log(minimal.minimize().baseName);
// Output: "fr"

Relationship to maximize

The minimize() method is the counterpart of maximize(). The maximize() method adds likely subtags to create a complete identifier, while minimize() removes redundant subtags to create a compact identifier.

These methods form a pair that allows bidirectional conversion between complete and compact forms:

const compact = new Intl.Locale("en");
const complete = compact.maximize();
console.log(complete.baseName);
// Output: "en-Latn-US"

const compactAgain = complete.minimize();
console.log(compactAgain.baseName);
// Output: "en"

The round trip from compact to complete and back to compact produces the original form.

However, not all locales return to their exact original form after a round trip. The method produces a canonical minimal form rather than preserving the original structure:

const locale = new Intl.Locale("en-US");
const maximized = locale.maximize();
console.log(maximized.baseName);
// Output: "en-Latn-US"

const minimized = maximized.minimize();
console.log(minimized.baseName);
// Output: "en"

The original identifier en-US stated its region explicitly, but after maximization and minimization it becomes en. This occurs because the United States is the likely default region for English, so the region was redundant all along.
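If your code depends on whether an identifier comes back unchanged, you can test for it directly. The following sketch uses a hypothetical helper, survivesRoundTrip, that compares the base name before and after a maximize and minimize round trip.

```javascript
// Hypothetical helper: true when maximize() followed by minimize()
// yields the same base name the locale started with.
function survivesRoundTrip(tag) {
  const original = new Intl.Locale(tag);
  const roundTripped = original.maximize().minimize();
  return roundTripped.baseName === original.baseName;
}

console.log(survivesRoundTrip("en"));
// Output: true

console.log(survivesRoundTrip("en-US"));
// Output: false

console.log(survivesRoundTrip("en-GB"));
// Output: true
```

Identifiers that return false were not already in minimal form.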

When to use minimize

Use minimize() when you need compact locale identifiers that remain unambiguous. Several scenarios benefit from minimization.

Storing locale preferences

Minimized identifiers reduce storage space in databases, local storage, or configuration files:

function saveUserLocale(localeString) {
  const locale = new Intl.Locale(localeString);
  const minimized = locale.minimize().toString();

  localStorage.setItem("userLocale", minimized);
}

saveUserLocale("en-Latn-US");
// Stores "en" instead of "en-Latn-US"

This reduces the stored data size without losing information.
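The full form can be recovered when the value is read back, by calling maximize() on the stored identifier. The sketch below shows one possible save and load pair. The helper names and the "userLocale" key are assumptions, and the storage object is passed in so the same code works with window.localStorage or with the in-memory stand-in used here.

```javascript
// Hypothetical helpers. `storage` is any object with getItem/setItem,
// such as window.localStorage in a browser.
function saveLocale(storage, tag) {
  storage.setItem("userLocale", new Intl.Locale(tag).minimize().toString());
}

function loadLocale(storage, fallbackTag = "en") {
  const stored = storage.getItem("userLocale");
  try {
    // maximize() restores the script and region implied by the compact form.
    return new Intl.Locale(stored ?? fallbackTag).maximize();
  } catch {
    // Stored value was malformed; fall back to the default.
    return new Intl.Locale(fallbackTag).maximize();
  }
}

// In-memory stand-in for localStorage, for demonstration only.
const memory = new Map();
const fakeStorage = {
  getItem: (key) => (memory.has(key) ? memory.get(key) : null),
  setItem: (key, value) => memory.set(key, String(value)),
};

saveLocale(fakeStorage, "en-Latn-GB");
console.log(fakeStorage.getItem("userLocale"));
// Output: "en-GB"

console.log(loadLocale(fakeStorage).baseName);
// Output: "en-Latn-GB"
```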

Creating readable URLs

Minimized identifiers produce cleaner URLs for language selection:

function createLocalizedURL(path, localeString) {
  const locale = new Intl.Locale(localeString);
  const minimized = locale.minimize().baseName;

  return `/${minimized}${path}`;
}

const url = createLocalizedURL("/products", "en-Latn-US");
console.log(url);
// Output: "/en/products"

The URL /en/products is more readable than /en-Latn-US/products.
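Going the other way, a router needs to map the short prefix back to a supported locale. One possible approach, sketched below with hypothetical helper names, is to build a lookup table keyed by the minimized form of each supported locale and match only against those known prefixes.

```javascript
// Hypothetical helpers for mapping URL prefixes back to supported locales.
function buildLocalePrefixes(supportedTags) {
  const prefixes = new Map();
  for (const tag of supportedTags) {
    // Key: compact prefix used in URLs. Value: the configured identifier.
    prefixes.set(new Intl.Locale(tag).minimize().baseName, tag);
  }
  return prefixes;
}

function resolveLocaleFromPath(pathname, prefixes) {
  // "/en-GB/products" splits into ["", "en-GB", "products"].
  const firstSegment = pathname.split("/")[1] ?? "";
  return prefixes.get(firstSegment) ?? null;
}

const prefixes = buildLocalePrefixes(["en-Latn-US", "en-Latn-GB", "fr-Latn-FR"]);

console.log([...prefixes.keys()]);
// Output: ["en", "en-GB", "fr"]

console.log(resolveLocaleFromPath("/en-GB/products", prefixes));
// Output: "en-Latn-GB"

console.log(resolveLocaleFromPath("/products", prefixes));
// Output: null
```

Matching against a fixed table avoids treating an arbitrary first path segment as a locale.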

Comparing locale identifiers

Minimization helps determine if two locale identifiers represent the same locale:

function areLocalesEquivalent(locale1String, locale2String) {
  const locale1 = new Intl.Locale(locale1String).minimize();
  const locale2 = new Intl.Locale(locale2String).minimize();

  return locale1.toString() === locale2.toString();
}

console.log(areLocalesEquivalent("en", "en-Latn-US"));
// Output: true

console.log(areLocalesEquivalent("en-US", "en-Latn-US"));
// Output: true

console.log(areLocalesEquivalent("en-US", "en-GB"));
// Output: false

Minimization produces a canonical form that enables direct comparison.
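Note that toString() includes Unicode extensions, so two identifiers that differ only in formatting preferences compare as different with the function above. If only language, script, and region should matter, comparing baseName is one option. The helper name sameBaseLocale below is hypothetical.

```javascript
// Hypothetical helper: compares only language, script, and region,
// ignoring Unicode extensions such as -u-hc-h12.
function sameBaseLocale(tagA, tagB) {
  const a = new Intl.Locale(tagA).minimize();
  const b = new Intl.Locale(tagB).minimize();
  return a.baseName === b.baseName;
}

console.log(sameBaseLocale("en-US-u-hc-h12", "en"));
// Output: true

console.log(sameBaseLocale("en-US", "en-GB"));
// Output: false
```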

Normalizing user input

When accepting locale identifiers from users or external systems, minimize them to a standard form:

function normalizeLocale(localeString) {
  try {
    const locale = new Intl.Locale(localeString);
    return locale.minimize().toString();
  } catch (error) {
    return null;
  }
}

console.log(normalizeLocale("en-US"));
// Output: "en"

console.log(normalizeLocale("en-Latn-US"));
// Output: "en"

console.log(normalizeLocale("en-GB"));
// Output: "en-GB"

This function accepts various forms of the same locale and returns a consistent representation.
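The same idea extends to lists, for example when cleaning up a set of locale identifiers gathered from several sources. The sketch below uses a hypothetical helper, normalizeLocaleList, that minimizes each entry, skips malformed ones, and removes duplicates.

```javascript
// Hypothetical helper: minimizes every identifier, drops malformed
// entries, and removes duplicates while keeping first-seen order.
function normalizeLocaleList(tags) {
  const seen = new Set();
  for (const tag of tags) {
    try {
      seen.add(new Intl.Locale(tag).minimize().toString());
    } catch {
      // Skip anything Intl.Locale rejects.
    }
  }
  return [...seen];
}

console.log(normalizeLocaleList(["en-US", "en", "en-Latn-US", "en-GB", "not a locale"]));
// Output: ["en", "en-GB"]
```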

Combining minimize with other locale operations

The minimize() method works with other Intl.Locale features to create flexible locale handling.

Minimizing after modifying locale properties

When constructing a locale from components, minimize it to remove unnecessary parts:

const locale = new Intl.Locale("en", {
  region: "US",
  script: "Latn"
});

const minimized = locale.minimize();
console.log(minimized.baseName);
// Output: "en"

This ensures the final identifier is as compact as the input components allow.

Preserving extensions while minimizing

Extensions remain intact during minimization, allowing you to minimize core components while keeping formatting preferences:

function createCompactLocaleWithPreferences(language, region, preferences) {
  const locale = new Intl.Locale(language, {
    region: region,
    ...preferences
  });

  return locale.minimize().toString();
}

const localeString = createCompactLocaleWithPreferences("en", "US", {
  hourCycle: "h23",
  calendar: "gregory"
});

console.log(localeString);
// Output: "en-u-ca-gregory-hc-h23"

The core components minimize to en, but the calendar and hour cycle extensions remain.