How to extract the language, country, and script from a locale

Use JavaScript to parse locale identifiers and access their individual components

Introduction

Locale identifiers like en-US, fr-CA, and zh-Hans-CN encode multiple pieces of information in a single string. These components tell you the language being used, the region where it is spoken, and sometimes the writing system.

When building internationalized applications, you often need to extract these individual components. You might want to display only the language name to users, group locales by region, or check which script a locale uses. You do not need to parse the strings manually with regular expressions, because JavaScript provides the Intl.Locale API to extract components reliably.

This guide explains what components exist in locale identifiers, how to extract them using the Intl.Locale API, and when you would need to use these components in practice.

What components exist in locale identifiers

Locale identifiers follow the BCP 47 standard, which defines a structure for describing languages and regional variations. A complete locale identifier can contain several components separated by hyphens.

The three most common components are:

  • Language: The primary language being used, like English, Spanish, or Chinese
  • Region: The geographic area where the language is used, like the United States, Canada, or China
  • Script: The writing system used to represent the language, like Latin, Cyrillic, or Han characters

A simple locale identifier contains only a language code:

en

Most locale identifiers include a language and region:

en-US
fr-CA
es-MX

Some locale identifiers include a script when the language can be written in multiple writing systems:

zh-Hans-CN
zh-Hant-TW
sr-Cyrl-RS
sr-Latn-RS

Understanding these components helps you make decisions about language fallback, content selection, and user interface customization.

Using Intl.Locale to extract components

The Intl.Locale API converts locale identifier strings into structured objects. Once you create a locale object, you can read its components through properties.

Create a locale object by passing the identifier to the constructor:

const locale = new Intl.Locale("en-US");

console.log(locale.language); // "en"
console.log(locale.region); // "US"

The locale object exposes properties that correspond to each component of the identifier. These properties provide structured access without requiring string parsing.
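
The constructor throws a RangeError when the string is not a structurally valid identifier, so input that comes from users or URLs usually needs a guard. The following is a minimal sketch; parseLocale is a hypothetical helper name, not part of the Intl API.

```javascript
// Hypothetical helper: returns an Intl.Locale, or null when the
// identifier is not structurally valid.
function parseLocale(identifier) {
  try {
    return new Intl.Locale(identifier);
  } catch (error) {
    // Intl.Locale throws RangeError for malformed identifiers
    if (error instanceof RangeError) {
      return null;
    }
    // Anything else is unexpected, so let it propagate
    throw error;
  }
}

console.log(parseLocale("en-US")?.region); // "US"
console.log(parseLocale("not a locale")); // null
```

Returning null rather than throwing is only one possible design; it keeps the calling code free of try/catch blocks at the cost of an extra null check.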

Extracting the language code

The language property returns the language component of the locale identifier. This is the two or three letter code that identifies the primary language.

const english = new Intl.Locale("en-US");
console.log(english.language); // "en"

const french = new Intl.Locale("fr-CA");
console.log(french.language); // "fr"

const chinese = new Intl.Locale("zh-Hans-CN");
console.log(chinese.language); // "zh"

Language codes follow the ISO 639 standard. Common codes include en for English, es for Spanish, fr for French, de for German, ja for Japanese, and zh for Chinese.

The language code is always present in a valid locale identifier. It is the only required component.

const languageOnly = new Intl.Locale("ja");
console.log(languageOnly.language); // "ja"
console.log(languageOnly.region); // undefined
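
Intl.Locale also normalizes casing, so the components come back in their canonical form even when the input does not use it. A short illustration, assuming the identifier arrives with inconsistent casing:

```javascript
// Input with non-canonical casing
const messy = new Intl.Locale("EN-us");

console.log(messy.language); // "en"
console.log(messy.region); // "US"
console.log(messy.toString()); // "en-US"
```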

When you extract the language code, you can use it to select translations, determine text processing rules, or build language selectors for users.

Extracting the region code

The region property returns the region component of the locale identifier. This is usually the two letter code that identifies the geographic area where the language is used.

const americanEnglish = new Intl.Locale("en-US");
console.log(americanEnglish.region); // "US"

const britishEnglish = new Intl.Locale("en-GB");
console.log(britishEnglish.region); // "GB"

const canadianFrench = new Intl.Locale("fr-CA");
console.log(canadianFrench.region); // "CA"

Region codes follow the ISO 3166-1 standard. They use two uppercase letters to represent countries and territories. Common codes include US for United States, GB for United Kingdom, CA for Canada, MX for Mexico, FR for France, and CN for China. BCP 47 also allows three digit UN M.49 codes for areas larger than a country, such as 419 for Latin America in es-419.

The region code changes how dates, numbers, and currencies are formatted. American English writes dates as month-day-year, while British English writes them as day-month-year. Both use a period as the decimal separator and a comma as the thousands separator, but other regions differ: German as used in Germany (de-DE) swaps the two.
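
A quick way to see regional differences is to format the same date and number with different locales. This sketch pins the time zone to UTC so the date does not depend on the machine running it; the sample values are arbitrary.

```javascript
// Same instant, pinned to UTC so the output does not depend on the local time zone
const date = new Date(Date.UTC(2024, 0, 15));
const options = { timeZone: "UTC" };

console.log(new Intl.DateTimeFormat("en-US", options).format(date)); // "1/15/2024"
console.log(new Intl.DateTimeFormat("en-GB", options).format(date)); // "15/01/2024"

// Same number, different separators
const amount = 1234567.89;

console.log(new Intl.NumberFormat("en-US").format(amount)); // "1,234,567.89"
console.log(new Intl.NumberFormat("de-DE").format(amount)); // "1.234.567,89"
```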

Region codes are optional in locale identifiers. When a locale has no region, the region property returns undefined:

const genericSpanish = new Intl.Locale("es");
console.log(genericSpanish.region); // undefined

When you extract the region code, you can use it to customize regional formatting, select region-specific content, or display location information to users.

Extracting the script code

The script property returns the script component of the locale identifier. This is the four letter code that identifies the writing system used to represent the language.

const simplifiedChinese = new Intl.Locale("zh-Hans-CN");
console.log(simplifiedChinese.script); // "Hans"

const traditionalChinese = new Intl.Locale("zh-Hant-TW");
console.log(traditionalChinese.script); // "Hant"

const serbianCyrillic = new Intl.Locale("sr-Cyrl-RS");
console.log(serbianCyrillic.script); // "Cyrl"

const serbianLatin = new Intl.Locale("sr-Latn-RS");
console.log(serbianLatin.script); // "Latn"

Script codes follow the ISO 15924 standard. They use four letters with the first letter capitalized. Common codes include Latn for Latin script, Cyrl for Cyrillic script, Hans for Simplified Han characters, Hant for Traditional Han characters, and Arab for Arabic script.

Most locales omit the script code because each language has a default writing system. English defaults to Latin script, so you write en instead of en-Latn. Russian defaults to Cyrillic, so you write ru instead of ru-Cyrl.

Script codes appear when a language can be written in multiple ways. Chinese uses both Simplified and Traditional characters. Serbian uses both Cyrillic and Latin alphabets. In these cases, the script code disambiguates which writing system to use.

When the locale has no explicit script code, the script property returns undefined:

const english = new Intl.Locale("en-US");
console.log(english.script); // undefined

When you extract the script code, you can use it to select fonts, determine text rendering, or filter content by writing system.
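
When the script is implicit, the maximize() method of Intl.Locale can fill in the most likely one from the engine's locale data. The sketch below uses a hypothetical getLikelyScript helper; for languages the engine has no data for, the result may still be undefined.

```javascript
// Hypothetical helper: prefer the explicit script, otherwise use the
// likely script that maximize() infers from the engine's locale data.
function getLikelyScript(identifier) {
  const locale = new Intl.Locale(identifier);
  return locale.script ?? locale.maximize().script;
}

console.log(getLikelyScript("en-US")); // "Latn"
console.log(getLikelyScript("ru")); // "Cyrl"
console.log(getLikelyScript("zh-TW")); // "Hant"
console.log(getLikelyScript("sr-Latn-RS")); // "Latn"
```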

Understanding when components are undefined

Not all locale identifiers include all components. The language code is required, but region and script are optional.

When a component is not present in the identifier, the corresponding property returns undefined:

const locale = new Intl.Locale("fr");

console.log(locale.language); // "fr"
console.log(locale.region); // undefined
console.log(locale.script); // undefined

This behavior lets you check whether a locale specifies a region or script before using those values:

const locale = new Intl.Locale("en-US");

if (locale.region) {
  console.log(`Region-specific formatting for ${locale.region}`);
} else {
  console.log("Using default formatting");
}

You can use the nullish coalescing operator to provide default values:

const locale = new Intl.Locale("es");
const region = locale.region ?? "ES";

console.log(region); // "ES"

When building locale fallback chains, checking for undefined components helps you construct alternatives:

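function buildFallbackChain(identifier) {
  const { language, script, region } = new Intl.Locale(identifier);
  const fallbacks = [identifier];

  // Drop the region first, keeping the script if one was given,
  // so "zh-Hant-TW" falls back to "zh-Hant" before "zh"
  if (region && script) {
    fallbacks.push(`${language}-${script}`);
  }

  // Finish with the bare language when anything more specific was given
  if (region || script) {
    fallbacks.push(language);
  }

  return fallbacks;
}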

console.log(buildFallbackChain("fr-CA")); // ["fr-CA", "fr"]
console.log(buildFallbackChain("fr")); // ["fr"]

This creates a list of locale identifiers ordered from most specific to most general.
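
One way such a chain might be used is to find the first available translation catalog. The sketch below assumes a hypothetical catalogs map keyed by locale identifier and a hypothetical resolveCatalog helper. It builds a simple language and region chain inline so that it runs on its own, and it ignores the script for brevity.

```javascript
// Hypothetical catalogs keyed by locale identifier
const catalogs = new Map([
  ["fr", { greeting: "Bonjour" }],
  ["en", { greeting: "Hello" }]
]);

// Hypothetical helper: walk from most specific to most general,
// then fall back to a default catalog.
function resolveCatalog(identifier, available, defaultKey = "en") {
  const { language, region } = new Intl.Locale(identifier);
  const chain = region ? [`${language}-${region}`, language] : [language];

  const match = chain.find(key => available.has(key));
  return available.get(match ?? defaultKey);
}

console.log(resolveCatalog("fr-CA", catalogs).greeting); // "Bonjour"
console.log(resolveCatalog("de-DE", catalogs).greeting); // "Hello"
```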

Practical use cases for extracting components

Extracting locale components solves several common problems when building internationalized applications.

Grouping locales by language

When displaying a list of available languages, group locales that share the same language code:

const locales = ["en-US", "en-GB", "fr-FR", "fr-CA", "es-ES", "es-MX"];

const grouped = locales.reduce((groups, identifier) => {
  const locale = new Intl.Locale(identifier);
  const language = locale.language;

  if (!groups[language]) {
    groups[language] = [];
  }

  groups[language].push(identifier);
  return groups;
}, {});

console.log(grouped);
// {
//   en: ["en-US", "en-GB"],
//   fr: ["fr-FR", "fr-CA"],
//   es: ["es-ES", "es-MX"]
// }

This organization helps users find their preferred regional variation within a language.

Building locale selectors

When building a user interface for language selection, extract components to display meaningful labels:

function buildLocaleSelector(identifiers) {
  return identifiers.map(identifier => {
    const locale = new Intl.Locale(identifier);

    const languageNames = new Intl.DisplayNames([identifier], {
      type: "language"
    });

    const regionNames = new Intl.DisplayNames([identifier], {
      type: "region"
    });

    return {
      value: identifier,
      language: languageNames.of(locale.language),
      region: locale.region ? regionNames.of(locale.region) : null
    };
  });
}

const options = buildLocaleSelector(["en-US", "en-GB", "fr-FR"]);
console.log(options);
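// Each label is written in the locale's own language, because the
// identifier itself is passed to Intl.DisplayNames as the display locale.
// [
//   { value: "en-US", language: "English", region: "United States" },
//   { value: "en-GB", language: "English", region: "United Kingdom" },
//   { value: "fr-FR", language: "français", region: "France" }
// ]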

This provides human-readable labels for each locale option.
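
The first argument to Intl.DisplayNames controls the language of the labels themselves. If a selector should show every option in one language, such as the current interface language, a single display locale can be passed in instead. A minimal sketch, where labelFor and its displayLocale parameter are hypothetical names:

```javascript
// Hypothetical helper: label a locale in the language given by displayLocale
function labelFor(identifier, displayLocale) {
  const locale = new Intl.Locale(identifier);
  const languageNames = new Intl.DisplayNames([displayLocale], { type: "language" });
  const regionNames = new Intl.DisplayNames([displayLocale], { type: "region" });

  const language = languageNames.of(locale.language);

  // Append the region in parentheses only when the identifier has one
  return locale.region
    ? `${language} (${regionNames.of(locale.region)})`
    : language;
}

console.log(labelFor("fr-FR", "en")); // "French (France)"
console.log(labelFor("fr-FR", "fr")); // "français (France)"
console.log(labelFor("es", "en")); // "Spanish"
```

Showing each language in its own name helps users who cannot read the current interface language, while a single display language keeps the list visually consistent. Which one fits depends on the audience.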

Filtering by region

When you need to show content specific to a region, extract the region code to filter locales:

function filterByRegion(identifiers, targetRegion) {
  return identifiers.filter(identifier => {
    const locale = new Intl.Locale(identifier);
    return locale.region === targetRegion;
  });
}

const allLocales = ["en-US", "es-US", "en-GB", "fr-FR", "zh-CN"];
const usLocales = filterByRegion(allLocales, "US");

console.log(usLocales); // ["en-US", "es-US"]

This helps you select locales appropriate for users in a specific country.

Checking script compatibility

When selecting fonts or rendering text, check the script to ensure compatibility:

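function selectFont(identifier) {
  const locale = new Intl.Locale(identifier);

  // Most identifiers omit the script, so fall back to the likely script
  // that maximize() infers (for example, "ar-SA" becomes "ar-Arab-SA")
  const script = locale.script ?? locale.maximize().script;

  if (script === "Hans" || script === "Hant") {
    return "Noto Sans CJK";
  } else if (script === "Arab") {
    return "Noto Sans Arabic";
  } else {
    // Noto Sans covers Latin, Cyrillic, and Greek
    return "Noto Sans";
  }
}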

console.log(selectFont("zh-Hans-CN")); // "Noto Sans CJK"
console.log(selectFont("ar-SA")); // "Noto Sans Arabic"
console.log(selectFont("en-US")); // "Noto Sans"

This ensures text renders correctly for each writing system.

Implementing language fallback

When the user's preferred locale is not available, fall back to the base language:

function selectBestLocale(userPreference, supportedLocales) {
  const user = new Intl.Locale(userPreference);

  if (supportedLocales.includes(userPreference)) {
    return userPreference;
  }

  const languageMatch = supportedLocales.find(supported => {
    const locale = new Intl.Locale(supported);
    return locale.language === user.language;
  });

  if (languageMatch) {
    return languageMatch;
  }

  return supportedLocales[0];
}

const supported = ["en-US", "fr-FR", "es-ES"];

console.log(selectBestLocale("en-GB", supported)); // "en-US"
console.log(selectBestLocale("fr-CA", supported)); // "fr-FR"
console.log(selectBestLocale("de-DE", supported)); // "en-US"

This provides graceful fallback when exact matches are not available.
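
Matching on language alone can pick the wrong writing system, for example Simplified Chinese for a reader of Traditional Chinese. One possible refinement, sketched here with a hypothetical selectByScript helper, compares the likely script from maximize() before settling for a language-only match:

```javascript
// Hypothetical helper: prefer a supported locale with the same language
// and script, then the same language, then the first supported locale.
function selectByScript(userPreference, supportedLocales) {
  const user = new Intl.Locale(userPreference).maximize();

  // Keep only supported locales that share the user's language
  const candidates = supportedLocales
    .map(identifier => ({
      identifier,
      locale: new Intl.Locale(identifier).maximize()
    }))
    .filter(({ locale }) => locale.language === user.language);

  const sameScript = candidates.find(({ locale }) => locale.script === user.script);

  return (sameScript ?? candidates[0])?.identifier ?? supportedLocales[0];
}

const supportedWithChinese = ["en-US", "zh-Hans-CN", "zh-Hant-HK"];

console.log(selectByScript("zh-TW", supportedWithChinese)); // "zh-Hant-HK"
console.log(selectByScript("zh-SG", supportedWithChinese)); // "zh-Hans-CN"
console.log(selectByScript("de-DE", supportedWithChinese)); // "en-US"
```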