How to work with language tags like en-US and fr-CA

Introduction

Language tags are standardized codes that identify specific languages and their regional variations. These tags appear throughout internationalization work. When you detect a user's preferred language, the browser returns language tags. When you format dates or numbers, you pass language tags to the Intl API. When you load translations, you use language tags to determine which content to display.

Understanding how these tags work helps you make better decisions about language selection, fallback behavior, and content organization. This lesson explains the structure of language tags and shows you how to work with them in JavaScript.

What language tags are

A language tag is a string like en, en-US, or zh-Hans-CN that identifies a language and optionally specifies the script and region. These tags follow the BCP 47 standard, which is published by the Internet Engineering Task Force (IETF). The subtags that the standard permits are recorded in the Language Subtag Registry, which is maintained by the Internet Assigned Numbers Authority (IANA).

BCP 47 stands for Best Current Practice 47. The standard defines how to construct language tags from smaller components called subtags. Each subtag represents a specific aspect of the language, such as which language is being used, which writing system it uses, or which country it is associated with.

Most modern programming languages and internationalization libraries accept BCP 47 tags. This consistency means you can use the same language identifiers across your entire application, from browser detection to server-side formatting to translation file names.

Structure of language tags

Language tags are composed of subtags separated by hyphens. The three most common subtags are language, script, and region. These subtags always appear in this specific order when present.

The language subtag comes first and is the only required component. It uses a two or three letter code from ISO 639. For example, en represents English, fr represents French, and zh represents Chinese.

The script subtag comes second when present. It uses a four letter code from ISO 15924 that identifies the writing system. For example, Latn represents the Latin alphabet, Cyrl represents Cyrillic, and Hans represents simplified Chinese characters.

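The region subtag comes last when present. It is usually a two letter code from ISO 3166-1 that represents a country. For example, US represents the United States, CA represents Canada, and CN represents China. Less commonly, it is a three digit UN M.49 code for a wider area, such as 419 for Latin America and the Caribbean in es-419.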
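
The ordering rule can be checked with a short sketch. It assumes a runtime that provides Intl.Locale, and it passes the script and region as constructor options in the "wrong" order to show that the resulting tag still follows the standard order. The variable name built is only illustrative.

```javascript
// Build a tag from separate components. The options are listed
// region-first on purpose: the resulting tag still places the
// subtags in the standard order (language, script, region).
const built = new Intl.Locale("zh", { region: "CN", script: "Hans" });

console.log(built.toString());
// Output: "zh-Hans-CN"
```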

Examples of common language tags

Here are examples that demonstrate the different levels of specificity you can express with language tags.

Simple tags with only a language:

en - English (no specific region or script)
fr - French (no specific region or script)
es - Spanish (no specific region or script)

Tags with language and region:

en-US - English as used in the United States
en-GB - English as used in the United Kingdom
fr-CA - French as used in Canada
es-MX - Spanish as used in Mexico

Tags with language, script, and region:

zh-Hans-CN - Chinese using simplified characters in China
zh-Hant-TW - Chinese using traditional characters in Taiwan
sr-Latn-RS - Serbian using Latin script in Serbia
sr-Cyrl-RS - Serbian using Cyrillic script in Serbia

The level of specificity you need depends on your application. If you only translate text, you might only need language and region. If you work with languages that use multiple writing systems, you need script subtags.

Case conventions for language tags

Language tags are case insensitive. The tags en-US, EN-US, en-us, and En-Us all represent the same language. However, there are conventional capitalization patterns that make tags more readable.

Language subtags conventionally use lowercase letters. Write en, not EN or En.

Script subtags conventionally use title case with the first letter capitalized. Write Latn, not latn or LATN.

Region subtags conventionally use uppercase letters. Write US, not us or Us.

Following these conventions makes your tags easier to read and matches the format used in documentation and specifications. However, your code should accept language tags regardless of capitalization, because the format is officially case insensitive.
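
One way to accept any capitalization and still store tags in the conventional form is to normalize them on input. The sketch below uses the built-in Intl.getCanonicalLocales function. The input tags are made-up examples.

```javascript
// Differently cased spellings, as they might arrive from user input
const inputs = ["EN-us", "zh-hans-cn", "SR-LATN-RS"];

// Intl.getCanonicalLocales returns each tag with conventional casing:
// lowercase language, title case script, uppercase region.
const canonical = Intl.getCanonicalLocales(inputs);

console.log(canonical);
// Output: ["en-US", "zh-Hans-CN", "sr-Latn-RS"]
```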

Parsing language tags with JavaScript

JavaScript provides the Intl.Locale constructor to parse language tags and extract their components. This constructor accepts a language tag string and returns an object with properties for each subtag.

const locale = new Intl.Locale("en-US");

console.log(locale.language);
// Output: "en"

console.log(locale.region);
// Output: "US"

The Intl.Locale object has properties for each component of the language tag. The language property is always set. The script and region properties return undefined if the corresponding subtag is not present in the original tag.

const simple = new Intl.Locale("fr");
console.log(simple.language);
// Output: "fr"

console.log(simple.region);
// Output: undefined

You can parse tags with script subtags the same way.

const complex = new Intl.Locale("zh-Hans-CN");

console.log(complex.language);
// Output: "zh"

console.log(complex.script);
// Output: "Hans"

console.log(complex.region);
// Output: "CN"

This parsing capability is useful when you need to make decisions based on specific components of a language tag. For example, you might want to load different fonts based on the script, or show different content based on the region.
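
As a minimal sketch of a script-based decision, the helper below maps the script subtag to a CSS class name. The classByScript table, the class names, and the fontClassFor function are all hypothetical and only illustrate the pattern.

```javascript
// Hypothetical mapping from script subtag to a CSS class name
const classByScript = {
  Hans: "font-chinese-simplified",
  Hant: "font-chinese-traditional",
  Cyrl: "font-cyrillic",
};

// Hypothetical helper: choose a class from the tag's script subtag,
// falling back to a default when the tag has no script or an unlisted one.
function fontClassFor(tag) {
  const { script } = new Intl.Locale(tag);
  return classByScript[script] ?? "font-default";
}

console.log(fontClassFor("zh-Hant-TW"));
// Output: "font-chinese-traditional"

console.log(fontClassFor("en-US"));
// Output: "font-default"
```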

When to use specific versus general tags

Choosing the right level of specificity for language tags depends on what aspects of language and culture your application needs to handle.

Use language-only tags like en or fr when you have a single translation that works for all speakers of that language. This is common for applications with limited localization budgets or languages with minimal regional variation.

Use language and region tags like en-US or fr-CA when you need to account for regional differences in vocabulary, spelling, or cultural conventions. British English and American English use different spellings for many words. Canadian French and European French have different vocabulary and expressions.

Use language, script, and region tags like zh-Hans-CN when you work with languages that use multiple writing systems. Chinese can be written with simplified or traditional characters. Serbian can be written with Latin or Cyrillic alphabets. The script subtag distinguishes these variants.
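
Tags you receive often omit the script even for these languages. The sketch below uses the maximize method of Intl.Locale, which fills in the most likely missing subtags. The result comes from the locale data bundled with the runtime, so treat it as a best guess rather than a guarantee.

```javascript
const short = new Intl.Locale("zh-TW");

console.log(short.script);
// Output: undefined

// maximize() adds the likely script based on the language and region
const full = short.maximize();

console.log(full.script);
// Output: "Hant"

console.log(full.toString());
// Output: "zh-Hant-TW"
```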

Extracting language codes for translation files

Many translation systems organize files by language code. You can extract just the language and region from a full language tag to determine which translation file to load.

const userLanguage = "zh-Hans-CN";
const locale = new Intl.Locale(userLanguage);

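// Omit the region when the tag does not include one,
// so "fr" yields "fr" rather than "fr-undefined".
const translationKey = locale.region
  ? `${locale.language}-${locale.region}`
  : locale.language;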
console.log(translationKey);
// Output: "zh-CN"

This approach works even if the user's language tag includes components you do not need for file selection.

Some applications use only the language code without the region.

const userLanguage = "fr-CA";
const locale = new Intl.Locale(userLanguage);

const translationKey = locale.language;
console.log(translationKey);
// Output: "fr"

The structure you choose for translation file names should match how you extract components from language tags.
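
If you keep translation files at several levels of specificity, one possible approach is to try candidates from most specific to least specific. The fallbackChain helper and the available set below are hypothetical and only sketch the idea.

```javascript
// Hypothetical helper: list candidate translation keys for a tag,
// ordered from most specific to least specific.
function fallbackChain(tag) {
  const { language, script, region } = new Intl.Locale(tag);
  const candidates = [];
  if (script && region) candidates.push(`${language}-${script}-${region}`);
  if (script) candidates.push(`${language}-${script}`);
  if (region) candidates.push(`${language}-${region}`);
  candidates.push(language);
  return candidates;
}

console.log(fallbackChain("zh-Hans-CN"));
// Output: ["zh-Hans-CN", "zh-Hans", "zh-CN", "zh"]

// Hypothetical set of translation files that exist in the project
const available = new Set(["en", "fr", "zh-Hans"]);

// Pick the first candidate that has a translation file
const match = fallbackChain("zh-Hans-CN").find((key) => available.has(key));

console.log(match);
// Output: "zh-Hans"
```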

Using language tags with the Intl API

The Intl API accepts language tags directly in all its constructors. You do not need to parse the tag yourself unless you need to inspect specific components.

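const date = new Date("2025-03-15");

// A date-only string is parsed as midnight UTC, so format in UTC
// to get the same calendar day in every time zone.
const usFormat = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" }).format(date);
console.log(usFormat);
// Output: "3/15/2025"

const gbFormat = new Intl.DateTimeFormat("en-GB", { timeZone: "UTC" }).format(date);
console.log(gbFormat);
// Output: "15/03/2025"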

The Intl API uses the language tag to determine which formatting conventions to apply. Different regions format dates, numbers, and currencies differently, even when they speak the same language.
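
Numbers show the same effect. In this short sketch, the same value is formatted for two English-speaking regions that group digits differently. It assumes a runtime with full locale data, which current browsers and Node.js releases include.

```javascript
const amount = 1234567.891;

// United States: digits grouped in threes
const usNumber = new Intl.NumberFormat("en-US").format(amount);
console.log(usNumber);
// Output: "1,234,567.891"

// India: the last three digits form a group, then groups of two
const inNumber = new Intl.NumberFormat("en-IN").format(amount);
console.log(inNumber);
// Output: "12,34,567.891"
```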

You can pass the language tag you get from the browser directly to Intl constructors.

const userLanguage = navigator.language;
const formatter = new Intl.NumberFormat(userLanguage);

console.log(formatter.format(1234.5));
// Output varies by language
// For "en-US": "1,234.5"
// For "de-DE": "1.234,5"

This is the most common pattern in client-side internationalization. Detect the user's language, then use that language tag throughout your application to format content appropriately.

Handling invalid language tags

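The Intl.Locale constructor throws a RangeError if you pass a tag that is not structurally valid. It checks only the structure, so a well-formed tag with unregistered subtags, such as xx-YY, does not throw. You should handle this error when working with language tags from untrusted sources.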

try {
  const locale = new Intl.Locale("invalid-tag-format");
} catch (error) {
  console.log(error.name);
  // Output: "RangeError"

  console.log(error.message);
  // Output varies by JavaScript engine, for example:
  // "invalid language tag: invalid-tag-format"
}

Most language tags from browsers are valid, but user input or external data sources might contain malformed tags. Wrapping the constructor in error handling prevents these invalid tags from crashing your application.
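
To avoid repeating the try and catch at every call site, you could wrap the constructor in a small helper. The parseLocale function and its default fallback of "en" are hypothetical choices for this sketch. The tag en_US is used as the malformed input because underscores are a common mistake and the constructor rejects them.

```javascript
// Hypothetical helper: parse a tag, falling back to a default
// when the tag is not structurally valid.
function parseLocale(tag, fallback = "en") {
  try {
    return new Intl.Locale(tag);
  } catch (error) {
    if (error instanceof RangeError) {
      return new Intl.Locale(fallback);
    }
    // Rethrow anything else, such as a TypeError for non-string input
    throw error;
  }
}

console.log(parseLocale("fr-CA").toString());
// Output: "fr-CA"

console.log(parseLocale("en_US").toString());
// Output: "en"
```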