How to add likely subtags to incomplete locales

Use JavaScript to complete partial locale identifiers with the most probable script and region

Introduction

When working with locale identifiers, you sometimes receive incomplete information. A user might specify only a language code like ja without indicating the script or region. While this partial identifier is valid, it lacks the specificity needed for certain operations like comparing locales or determining formatting conventions.

JavaScript provides a way to complete these partial identifiers by adding the most likely missing components. This process uses language data to infer the script and region that users of that language typically use.

This guide explains what likely subtags are, how JavaScript determines them, and when to use this feature in your applications.

What are likely subtags

Likely subtags are the script and region codes that most commonly appear with a given language. These associations come from real-world language usage data maintained by the Unicode Consortium.

For example, English is typically written in Latin script and most commonly used in the United States. If you have just the language code en, the likely subtags are Latn for the script and US for the region, giving you the complete identifier en-Latn-US.

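These associations reflect speaker populations and established usage patterns. For a given input and version of the data, the result is deterministic: the algorithm looks up the single most likely combination rather than weighing alternatives at run time.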
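As a minimal sketch of the en example, the maximized locale exposes each inferred subtag as its own property. The variable name is illustrative, and the output assumes a runtime that ships current Unicode data.

```javascript
// Expand the bare language code "en" and inspect each inferred subtag.
const english = new Intl.Locale("en").maximize();

console.log(english.language); // "en"
console.log(english.script);   // "Latn"
console.log(english.region);   // "US"
console.log(english.baseName); // "en-Latn-US"
```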

Why add likely subtags

Partial locale identifiers work for most formatting operations. The Intl.DateTimeFormat and Intl.NumberFormat APIs accept en and will apply sensible defaults. However, there are situations where having complete identifiers is necessary.

Comparing locale identifiers

When comparing two locale identifiers to see if they refer to the same language and region, partial identifiers create ambiguity. Does en mean the same thing as en-US, or is it different because one specifies a region and one does not?

Adding likely subtags removes this ambiguity. Both en and en-US maximize to en-Latn-US, making them directly comparable.

const locale1 = new Intl.Locale("en");
const locale2 = new Intl.Locale("en-US");

console.log(locale1.baseName === locale2.baseName);
// false - they look different

const maximized1 = locale1.maximize();
const maximized2 = locale2.maximize();

console.log(maximized1.baseName === maximized2.baseName);
// true - both are "en-Latn-US"
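
The comparison can be wrapped in a small helper. The sketch below uses a hypothetical function name, isSameLocale, and compares only the maximized base name, so any differences in extension tags are ignored.

```javascript
// Hypothetical helper: two identifiers are treated as equal when their
// maximized base names are identical.
function isSameLocale(a, b) {
  const left = new Intl.Locale(a).maximize().baseName;
  const right = new Intl.Locale(b).maximize().baseName;
  return left === right;
}

console.log(isSameLocale("en", "en-US"));      // true
console.log(isSameLocale("en", "en-GB"));      // false
console.log(isSameLocale("zh-TW", "zh-Hant")); // true, both are "zh-Hant-TW"
```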

Storing canonical forms

When storing locale identifiers in databases or configuration files, using complete forms ensures consistency. A bare fr becomes fr-Latn-FR and a bare ja becomes ja-Jpan-JP, while an identifier that already names a region, such as fr-CA, keeps it and becomes fr-Latn-CA.

This consistency makes searching, filtering, and grouping by locale more reliable.
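
One way to picture the effect on grouping is the sketch below. The stored values are hypothetical, and the Map-based grouping is only one possible approach. Note that an explicit region is kept, so fr-CA lands in its own group.

```javascript
// Hypothetical stored values that mix partial and complete identifiers.
const stored = ["fr", "fr-FR", "fr-CA", "ja", "ja-JP"];

// Group the original values by their maximized base name.
const groups = new Map();
for (const tag of stored) {
  const key = new Intl.Locale(tag).maximize().baseName;
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(tag);
}

for (const [key, tags] of groups) {
  console.log(`${key}: ${tags.join(", ")}`);
}
// fr-Latn-FR: fr, fr-FR
// fr-Latn-CA: fr-CA
// ja-Jpan-JP: ja, ja-JP
```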

Determining script-specific behavior

Some languages use multiple scripts, and the script affects text rendering, font selection, and collation. Chinese can be written in simplified or traditional characters, and Serbian can use Cyrillic or Latin script.

Adding likely subtags makes the script explicit. If a user provides zh without specifying a script, maximizing it produces zh-Hans-CN, indicating simplified Chinese characters are expected.
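
A sketch of how application code might act on the inferred script follows. The fontsByScript table and the fontFor helper are hypothetical and exist only for illustration; the font names are placeholders for whatever your project uses.

```javascript
// Hypothetical font stacks keyed by script code.
const fontsByScript = {
  Hans: '"Noto Sans SC", sans-serif',
  Hant: '"Noto Sans TC", sans-serif',
  Cyrl: '"Noto Sans", sans-serif',
  Latn: '"Noto Sans", sans-serif',
};

// Resolve the script of a possibly partial identifier, then pick a font.
function fontFor(tag) {
  const { script } = new Intl.Locale(tag).maximize();
  return fontsByScript[script] ?? "sans-serif";
}

console.log(new Intl.Locale("zh").maximize().script);    // "Hans"
console.log(new Intl.Locale("zh-TW").maximize().script); // "Hant"
console.log(new Intl.Locale("sr").maximize().script);    // "Cyrl"
console.log(fontFor("zh-TW")); // "Noto Sans TC", sans-serif
```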

How the algorithm works

The Add Likely Subtags algorithm uses a database of language usage information to determine missing components. This database is maintained by the Unicode Consortium as part of the Common Locale Data Repository.

The algorithm examines what information you provide and fills in the gaps:

  • If you provide only a language, it adds the most common script and region for that language
  • If you provide a language and script, it adds the most common region for that combination
  • If you provide a language and region, it adds the most common script for that combination
  • If you provide all three components, they remain unchanged

The choices come from statistical data about language usage worldwide. Subtags that you supply are kept as they are; only the missing ones are inferred.
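
The four cases listed above can be sketched with Chinese identifiers, where the script and region depend on each other. The cases array and its labels are illustrative only.

```javascript
// Each entry pairs an input with the case from the list that it exercises.
const cases = [
  ["zh", "language only"],
  ["zh-Hant", "language and script"],
  ["zh-TW", "language and region"],
  ["zh-Hant-HK", "all three components"],
];

const results = cases.map(([tag, description]) => {
  const maximized = new Intl.Locale(tag).maximize().baseName;
  return `${description}: ${tag} -> ${maximized}`;
});

console.log(results.join("\n"));
// language only: zh -> zh-Hans-CN
// language and script: zh-Hant -> zh-Hant-TW
// language and region: zh-TW -> zh-Hant-TW
// all three components: zh-Hant-HK -> zh-Hant-HK
```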

Using the maximize method

The maximize() method is available on Intl.Locale objects. It returns a new locale object with likely subtags added to the base name.

const locale = new Intl.Locale("ja");
const maximized = locale.maximize();

console.log(locale.baseName);
// "ja"

console.log(maximized.baseName);
// "ja-Jpan-JP"

The method does not modify the original locale object. It creates and returns a new one.

Examples with different languages

Different languages have different likely subtags based on where they are primarily spoken and which scripts they use.

European languages

French maximizes to France with Latin script:

const french = new Intl.Locale("fr");
const maximized = french.maximize();

console.log(maximized.baseName);
// "fr-Latn-FR"

German maximizes to Germany with Latin script:

const german = new Intl.Locale("de");
const maximized = german.maximize();

console.log(maximized.baseName);
// "de-Latn-DE"

Languages with non-Latin scripts

Japanese maximizes to Japan with the Jpan script code, which covers the combination of kanji, hiragana, and katakana:

const japanese = new Intl.Locale("ja");
const maximized = japanese.maximize();

console.log(maximized.baseName);
// "ja-Jpan-JP"

Arabic maximizes to Egypt with Arabic script:

const arabic = new Intl.Locale("ar");
const maximized = arabic.maximize();

console.log(maximized.baseName);
// "ar-Arab-EG"

Chinese without a script maximizes to simplified characters and China:

const chinese = new Intl.Locale("zh");
const maximized = chinese.maximize();

console.log(maximized.baseName);
// "zh-Hans-CN"

Partial identifiers with regions

When you provide a language and region but no script, the algorithm adds the script:

const britishEnglish = new Intl.Locale("en-GB");
const maximized = britishEnglish.maximize();

console.log(maximized.baseName);
// "en-Latn-GB"

The region remains as specified. Only the missing script is added.

Partial identifiers with scripts

When you provide a language and script but no region, the algorithm adds the most common region for that script:

const traditionalChinese = new Intl.Locale("zh-Hant");
const maximized = traditionalChinese.maximize();

console.log(maximized.baseName);
// "zh-Hant-TW"

Traditional Chinese characters are used in Taiwan, Hong Kong, and Macau. The likely subtags data treats Taiwan as the most probable region, so TW is added.

Extension tags are preserved

Unicode extension tags specify formatting preferences like calendar systems, numbering systems, and hour cycles. These tags appear after -u- in the locale identifier.

The maximize() method does not change extension tags. It only affects the language, script, and region components.

const locale = new Intl.Locale("fr", {
  calendar: "gregory",
  numberingSystem: "latn",
  hourCycle: "h23"
});

console.log(locale.toString());
// "fr-u-ca-gregory-hc-h23-nu-latn"

const maximized = locale.maximize();

console.log(maximized.toString());
// "fr-Latn-FR-u-ca-gregory-hc-h23-nu-latn"

The base name changes from fr to fr-Latn-FR, but the extension tags remain identical.
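
Extension tags can also arrive inside the identifier string rather than through an options object. The following sketch, with illustrative variable names, suggests the behavior is the same in that case.

```javascript
// Extension keywords embedded in the identifier string itself.
const thai = new Intl.Locale("th-u-ca-buddhist-nu-thai");
const maximizedThai = thai.maximize();

console.log(maximizedThai.baseName);        // "th-Thai-TH"
console.log(maximizedThai.toString());      // "th-Thai-TH-u-ca-buddhist-nu-thai"
console.log(maximizedThai.calendar);        // "buddhist"
console.log(maximizedThai.numberingSystem); // "thai"
```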

When to use maximize

Use the maximize() method when you need complete locale identifiers for consistency or comparison purposes.

Normalizing user input

Users might enter locales in various forms. Some might type en, others en-US, and others en-Latn-US. Maximizing all inputs creates a consistent format:

function normalizeLocale(input) {
  try {
    const locale = new Intl.Locale(input);
    const maximized = locale.maximize();
    return maximized.baseName;
  } catch (error) {
    return null;
  }
}

console.log(normalizeLocale("en"));
// "en-Latn-US"

console.log(normalizeLocale("en-US"));
// "en-Latn-US"

console.log(normalizeLocale("en-Latn-US"));
// "en-Latn-US"

All three inputs produce the same normalized form.
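
Input from forms or older systems sometimes uses underscores, as in en_US, which Intl.Locale rejects with a RangeError. The sketch below adds a pre-processing step before maximizing. The helper name normalizeLooseLocale and the decision to accept underscores and surrounding whitespace are assumptions about your input, not part of the Intl API.

```javascript
// Hypothetical helper: tolerate surrounding whitespace and underscore
// separators, then return the maximized base name or null if invalid.
function normalizeLooseLocale(input) {
  if (typeof input !== "string") return null;
  const cleaned = input.trim().replace(/_/g, "-");
  try {
    return new Intl.Locale(cleaned).maximize().baseName;
  } catch (error) {
    // Intl.Locale throws a RangeError for malformed identifiers.
    return null;
  }
}

console.log(normalizeLooseLocale("en_US"));        // "en-Latn-US"
console.log(normalizeLooseLocale(" zh_TW "));      // "zh-Hant-TW"
console.log(normalizeLooseLocale("EN-us"));        // "en-Latn-US"
console.log(normalizeLooseLocale("not a locale")); // null
```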

Building locale fallback chains

When a specific locale is not available, applications fall back to more general locales. Maximizing helps build these chains correctly:

function buildFallbackChain(localeString) {
  const locale = new Intl.Locale(localeString);
  const maximized = locale.maximize();

  const chain = [maximized.toString()];

  if (maximized.script && maximized.region) {
    const withoutRegion = new Intl.Locale(
      `${maximized.language}-${maximized.script}`
    );
    chain.push(withoutRegion.toString());
  }

  if (maximized.region) {
    chain.push(maximized.language);
  }

  // Add the default language only if it is not already in the chain.
  if (!chain.includes("en")) {
    chain.push("en");
  }

  return chain;
}

console.log(buildFallbackChain("zh-TW"));
// ["zh-Hant-TW", "zh-Hant", "zh", "en"]

This creates a proper fallback from the most specific to the most general locale.
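
A chain like this is typically consumed by walking it until a resource is found. The sketch below is self-contained and assumes a hypothetical bundles map plus the helper names candidates and resolveBundle; the default of "en" mirrors the chain above.

```javascript
// Hypothetical message bundles keyed by locale identifier.
const bundles = new Map([
  ["zh-Hant", { greeting: "你好" }],
  ["en", { greeting: "Hello" }],
]);

// List candidates from most to least specific, based on the maximized form.
function candidates(tag) {
  const { language, script, region } = new Intl.Locale(tag).maximize();
  const parts = [language, script, region].filter(Boolean);
  // ["zh", "Hant", "TW"] becomes ["zh-Hant-TW", "zh-Hant", "zh"].
  return parts.map((_, i) => parts.slice(0, parts.length - i).join("-"));
}

// Return the first candidate that has a bundle, or the assumed default "en".
function resolveBundle(tag) {
  return candidates(tag).find((key) => bundles.has(key)) ?? "en";
}

console.log(resolveBundle("zh-TW")); // "zh-Hant"
console.log(resolveBundle("zh-HK")); // "zh-Hant"
console.log(resolveBundle("fr"));    // "en"
```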

Matching user preferences to available locales

When you have a set of available translations and need to find the best match for a user preference, maximizing both sides enables accurate comparison:

function findBestMatch(userPreference, availableLocales) {
  const userMaximized = new Intl.Locale(userPreference).maximize();

  const matches = availableLocales.map(available => {
    const availableMaximized = new Intl.Locale(available).maximize();

    let score = 0;
    if (userMaximized.language === availableMaximized.language) score += 1;
    if (userMaximized.script === availableMaximized.script) score += 1;
    if (userMaximized.region === availableMaximized.region) score += 1;

    return { locale: available, score };
  });

  matches.sort((a, b) => b.score - a.score);

  return matches[0].locale;
}

const available = ["en-US", "en-GB", "fr-FR", "de-DE"];
console.log(findBestMatch("en", available));
// "en-US"

The function expands the user preference en to en-Latn-US and finds the closest match.
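
The scoring in findBestMatch awards a point for a shared script even when the languages differ, so it always returns some locale. A stricter variant is sketched below under the assumption that a match must share the user's language. The name findLanguageMatch and the weights are hypothetical.

```javascript
// Hypothetical variant: only candidates that share the user's language are
// considered; script and region matches are then used to rank them.
function findLanguageMatch(userPreference, availableLocales) {
  const user = new Intl.Locale(userPreference).maximize();
  let best;
  let bestScore = -1;

  for (const available of availableLocales) {
    const candidate = new Intl.Locale(available).maximize();
    if (candidate.language !== user.language) continue;

    // Weight script above region so a Traditional Chinese user is not
    // matched to a Simplified Chinese translation.
    let score = 0;
    if (candidate.script === user.script) score += 2;
    if (candidate.region === user.region) score += 1;

    if (score > bestScore) {
      best = available;
      bestScore = score;
    }
  }

  return best; // undefined when no language matches
}

const offered = ["en-US", "en-GB", "zh-CN", "zh-TW"];
console.log(findLanguageMatch("zh-HK", offered)); // "zh-TW"
console.log(findLanguageMatch("en-AU", offered)); // "en-US"
console.log(findLanguageMatch("ja", offered));    // undefined
```

Ties keep the earliest entry, so the order of the available list acts as a priority order.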

When not to use maximize

You do not need to maximize locales before passing them to formatting APIs. The Intl.DateTimeFormat, Intl.NumberFormat, and other formatting constructors handle partial identifiers correctly.

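// A date-only string is parsed as midnight UTC, so format in UTC as well
// to avoid showing the previous day in time zones behind UTC.
const date = new Date("2025-03-15");

const partial = new Intl.DateTimeFormat("fr", { timeZone: "UTC" }).format(date);
const maximized = new Intl.DateTimeFormat("fr-Latn-FR", { timeZone: "UTC" }).format(date);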

console.log(partial);
// "15/03/2025"

console.log(maximized);
// "15/03/2025"

Both produce identical output. The additional specificity does not change formatting behavior in this case.

Use maximize() when you need the explicit information for your own logic, not when passing locales to built-in formatters.
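
The split between application logic and formatting might look like the following sketch. The CSS class names are hypothetical, and the formatted output is not shown because it depends on the locale data available in the runtime.

```javascript
// The identifier as received from the user stays partial.
const userTag = "sr";

// Application logic: maximize to learn the script, for example to choose
// a hypothetical CSS class for Cyrillic text.
const { script: userScript } = new Intl.Locale(userTag).maximize();
const cssClass = userScript === "Cyrl" ? "text-cyrillic" : "text-default";

// Formatting: pass the original identifier straight to the formatter.
const formattedAmount = new Intl.NumberFormat(userTag).format(1234.5);

console.log(cssClass); // "text-cyrillic"
console.log(formattedAmount);
```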

Browser support

The maximize() method is available in all modern browsers. Chrome, Firefox, Safari, and Edge all support it as part of the Intl.Locale API.

Node.js supports maximize() starting from version 12, with full support in version 14 and later.
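
If older runtimes are a concern, a feature check can guard the call. This is a minimal sketch; the maximizeIfSupported wrapper and its choice to return the input unchanged when the method is missing are assumptions, not an established pattern.

```javascript
// Feature-detect Intl.Locale and its maximize() method before relying on them.
const supportsMaximize =
  typeof Intl !== "undefined" &&
  typeof Intl.Locale === "function" &&
  typeof Intl.Locale.prototype.maximize === "function";

// Hypothetical wrapper: fall back to the unmodified input when unsupported.
function maximizeIfSupported(tag) {
  return supportsMaximize ? new Intl.Locale(tag).maximize().baseName : tag;
}

console.log(supportsMaximize);
console.log(maximizeIfSupported("ja"));
```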

Summary

Likely subtags complete partial locale identifiers by adding the most common script and region for a given language. The maximize() method of Intl.Locale objects implements the Unicode Add Likely Subtags algorithm to perform this expansion.

Key points:

  • Likely subtags are based on real-world language usage data
  • The maximize() method adds missing script and region codes
  • Extension tags for calendars and numbering systems remain unchanged
  • Use maximization for normalizing user input and comparing locales
  • Formatting APIs do not require maximized locales

The maximize() method provides a standardized way to work with complete locale identifiers when your application logic requires explicit script and region information.