How to convert text to uppercase or lowercase by locale rules

Use JavaScript to change text case correctly for different languages and writing systems

Introduction

When you convert text between uppercase and lowercase, you might assume this operation works the same way for all languages. It does not. Different writing systems follow different case conversion rules, and these rules can produce unexpected results if you do not account for them.

JavaScript provides the standard toUpperCase() and toLowerCase() methods, which apply Unicode's default, locale-independent case mappings. These work for English and most other languages but produce incorrect results for languages with their own rules, such as Turkish. The locale-aware methods toLocaleUpperCase() and toLocaleLowerCase() apply the language-specific case conversion rules of the locale you specify.

This lesson explains why case conversion varies across languages, demonstrates specific problems that arise with standard methods, and shows how to use locale-aware methods to handle case conversion correctly for international applications.

Why case conversion varies by locale

The uppercase and lowercase versions of letters are not universal concepts that work identically across all writing systems. Different languages developed different rules for case conversion based on their historical writing conventions and typographical practices.

In English, case conversion is straightforward. The letter i becomes I when uppercased, and I becomes i when lowercased. This relationship holds for the entire English alphabet.

Other languages have more complex rules. Turkish has four distinct letter i characters instead of two. German has the letter ß (sharp s), which has specific rules for uppercase conversion. Greek has different forms of the letter sigma depending on whether it appears at the end of a word.

When you use standard JavaScript methods like toUpperCase() and toLowerCase(), the conversion follows Unicode's default case mappings, which ignore the language of the text. This produces incorrect results for languages such as Turkish whose rules differ from the defaults. Locale-aware methods apply the rules of the locale you pass in.

The Turkish i problem

Turkish provides the clearest example of why locale matters for case conversion. Unlike English, Turkish has four distinct letters related to i:

  • Lowercase dotted i: i (U+0069)
  • Uppercase dotted İ: İ (U+0130)
  • Lowercase dotless ı: ı (U+0131)
  • Uppercase dotless I: I (U+0049)

In Turkish, the lowercase dotted i becomes the uppercase dotted İ. The lowercase dotless ı becomes the uppercase dotless I. These are two separate letter pairs with distinct pronunciations and meanings.

Standard JavaScript methods use the default mappings and convert dotted i to dotless I. This changes the meaning of Turkish words and produces incorrect text.

const turkish = "istanbul";

console.log(turkish.toUpperCase());
// Output: "ISTANBUL" (incorrect - uses dotless I)

console.log(turkish.toLocaleUpperCase("tr"));
// Output: "İSTANBUL" (correct - uses dotted İ)

The city name Istanbul contains the dotted i character. When converted to uppercase using Turkish rules, it becomes İSTANBUL with a dotted İ. Using standard toUpperCase() produces ISTANBUL with a dotless I, which is incorrect in Turkish.

The same issue occurs in reverse when converting uppercase Turkish text to lowercase.

const uppercase = "İSTANBUL";

console.log(uppercase.toLowerCase());
// Output: "i̇stanbul" (incorrect - creates i with combining dot above)

console.log(uppercase.toLocaleLowerCase("tr"));
// Output: "istanbul" (correct - produces dotted i)

The dotted İ should become the dotted i when lowercased in Turkish. Standard toLowerCase() instead produces a lowercase i followed by a combining dot above (U+0307). The result looks similar on screen but is two code points long and does not compare equal to a plain i.
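The difference between the two results is hard to see on screen, so it can help to print the code points. The following is a minimal sketch; codePoints is a hypothetical helper written for this illustration, not a built-in.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const dottedCapital = "\u0130"; // İ (uppercase dotted I)

console.log(codePoints(dottedCapital.toLowerCase()));
// [ 'U+0069', 'U+0307' ]  (i followed by a combining dot above)

console.log(codePoints(dottedCapital.toLocaleLowerCase("tr")));
// [ 'U+0069' ]  (plain dotted i)
```

A string comparison such as result === "istanbul" fails for the first form even though both render almost identically.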

Other locale-specific case rules

Turkish is not the only language with special case conversion rules. Several other languages require locale-specific handling.

German has the letter ß (sharp s), which traditionally had no uppercase form. Unicode added the uppercase ẞ character (U+1E9E) in version 5.1 in 2008, and official German orthography accepted it in 2017, but the standard case mapping still converts ß to SS when uppercasing.

const german = "Straße";

console.log(german.toUpperCase());
// Output: "STRASSE" (converts ß to SS)

console.log(german.toLocaleUpperCase("de"));
// Output: "STRASSE" (also converts ß to SS)

Both methods produce the same result for German text because Unicode defines no German-specific case mapping. The locale parameter does not change the output here, but passing it records which language the text is in and keeps the call consistent with the rest of your locale-aware code.

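Greek has three forms of the letter sigma: the uppercase Σ and two lowercase forms. Lowercase text uses σ at the start or in the middle of a word and ς at the end of a word. Both lowercase forms convert to the same uppercase Σ, so lowercasing Σ depends on its position in the word.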
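The sigma rule is part of the default Unicode mappings, so the standard methods already apply it and no locale argument is needed. A short sketch, assuming an engine that implements the context-sensitive final sigma rule (current browsers and Node.js do):

```javascript
const upperGreek = "ΣΟΦΟΣ"; // starts and ends with uppercase sigma
const lowerGreek = upperGreek.toLowerCase();

console.log(lowerGreek);
// "σοφος"

// The first sigma becomes σ (U+03C3), the word-final one becomes ς (U+03C2).
console.log(lowerGreek[0] === "\u03C3");                     // true
console.log(lowerGreek[lowerGreek.length - 1] === "\u03C2"); // true

// Both lowercase forms map back to the same uppercase Σ.
console.log(lowerGreek.toUpperCase() === upperGreek);        // true
```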

Lithuanian has special rules for dotted letters. A lowercase i keeps its dot when it carries an additional accent above, so lowercasing an accented capital I inserts a combining dot above (U+0307), and uppercasing removes that dot again. This affects how the locale-aware methods handle specific character combinations.
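The Lithuanian behavior is easiest to see with escape sequences, because the characters involved are combining marks. This is a minimal sketch that assumes the runtime ships Lithuanian case rules, which engines with full internationalization data do:

```javascript
const lowerWithDot = "i\u0307"; // i + COMBINING DOT ABOVE

// Lithuanian rules drop the explicit dot when uppercasing.
console.log(lowerWithDot.toLocaleUpperCase("lt") === "I");    // true

// The default mapping keeps the combining dot.
console.log(lowerWithDot.toUpperCase() === "I\u0307");        // true

const capitalIGrave = "\u00CC"; // Ì

// Lithuanian rules insert a dot between the i and the grave accent.
console.log(capitalIGrave.toLocaleLowerCase("lt") === "i\u0307\u0300"); // true

// The default mapping produces the precomposed ì.
console.log(capitalIGrave.toLowerCase() === "\u00EC");        // true
```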

Using toLocaleUpperCase for locale-aware uppercase conversion

The toLocaleUpperCase() method converts a string to uppercase using locale-specific case mapping rules. You call it on a string and optionally pass a locale identifier as an argument.

const text = "istanbul";

const result = text.toLocaleUpperCase("tr");
console.log(result);
// Output: "İSTANBUL"

This converts the string to uppercase using Turkish rules. The dotted i becomes the dotted İ, which is correct for Turkish.

You can convert the same text using different locale rules.

const text = "istanbul";

console.log(text.toLocaleUpperCase("tr"));
// Output: "İSTANBUL" (Turkish rules - dotted İ)

console.log(text.toLocaleUpperCase("en"));
// Output: "ISTANBUL" (English rules - dotless I)

The locale parameter determines which case conversion rules apply. Turkish rules preserve the dot on the i, while English rules do not.

If you call toLocaleUpperCase() without arguments, it uses the system locale determined by the JavaScript runtime environment.

const text = "istanbul";

const result = text.toLocaleUpperCase();
console.log(result);
// Output depends on system locale

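The output depends on the default locale of the JavaScript environment, which typically reflects the browser or operating system language settings. Because this differs between machines, pass an explicit locale when you need predictable output.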
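If you want to know which default the runtime would pick, one common approach is to read it from an Intl formatter. This is a sketch, not the only way to do it, and the printed value depends on the machine it runs on:

```javascript
// Ask an Intl formatter which locale it resolved when none was requested.
const defaultLocale = new Intl.DateTimeFormat().resolvedOptions().locale;

console.log(defaultLocale);
// For example "en-US" or "tr-TR", depending on the environment

// Passing that tag explicitly makes the dependency visible in the code.
console.log("istanbul".toLocaleUpperCase(defaultLocale));
```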

Using toLocaleLowerCase for locale-aware lowercase conversion

The toLocaleLowerCase() method converts a string to lowercase using locale-specific case mapping rules. It works the same way as toLocaleUpperCase() but converts to lowercase instead of uppercase.

const text = "İSTANBUL";

const result = text.toLocaleLowerCase("tr");
console.log(result);
// Output: "istanbul"

This converts the uppercase Turkish text to lowercase using Turkish rules. The dotted İ becomes the dotted i, producing the correct lowercase form.

Without the locale parameter, standard toLowerCase() or toLocaleLowerCase() with default locale settings may not handle Turkish characters correctly.

const text = "İSTANBUL";

console.log(text.toLowerCase());
// Output: "i̇stanbul" (incorrect - i with combining dot above)

console.log(text.toLocaleLowerCase("tr"));
// Output: "istanbul" (correct - dotted i)

The Turkish dotted İ requires Turkish case rules to convert correctly. Using the locale-aware method with the tr locale ensures correct conversion.

You can also handle the dotless I in Turkish, which should remain dotless when lowercased.

const text = "IRAK";

console.log(text.toLocaleLowerCase("tr"));
// Output: "ırak" (Turkish rules - dotless ı)

console.log(text.toLocaleLowerCase("en"));
// Output: "irak" (English rules - dotted i)

The word IRAK (Iraq in Turkish) uses the dotless I. Turkish case rules convert it to lowercase dotless ı, while English rules convert it to the dotted i.

Specifying locale identifiers

Both toLocaleUpperCase() and toLocaleLowerCase() accept locale identifiers in BCP 47 format. These are the same language tags used throughout the Intl API and other internationalization features.

const text = "Straße";

console.log(text.toLocaleUpperCase("de-DE"));
// Output: "STRASSE"

console.log(text.toLocaleUpperCase("de-AT"));
// Output: "STRASSE"

console.log(text.toLocaleUpperCase("de-CH"));
// Output: "STRASSE"

These examples use different German locales for Germany, Austria, and Switzerland. Case conversion rules are generally consistent across regional variants of the same language, so all three produce the same output.
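Region subtags matter little for case conversion because the special rules are keyed on the language. The sketch below runs the same word through several tags; it assumes a runtime with Turkish and Azerbaijani case rules available, which is the norm in current browsers and Node.js.

```javascript
const word = "istanbul";

for (const tag of ["tr", "tr-TR", "tr-CY", "az", "en-US", "de-CH"]) {
  console.log(tag, word.toLocaleUpperCase(tag));
}
// tr, tr-TR, tr-CY and az print "İSTANBUL" (Azerbaijani shares the dotted and dotless i rules)
// en-US and de-CH print "ISTANBUL"
```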

You can also pass an array of locale identifiers. The method uses the first locale in the array.

const text = "istanbul";

const result = text.toLocaleUpperCase(["tr", "en"]);
console.log(result);
// Output: "İSTANBUL"

The method applies Turkish rules because tr is the first locale in the array. Only the first entry is consulted: if that language has no special case rules, the method uses the default mappings rather than trying later entries.
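Two details of the locales argument are worth checking in code: the order of the array decides the outcome, and a malformed tag throws a RangeError instead of being ignored. The sketch below shows both; safeUpper is a hypothetical wrapper introduced only for this example.

```javascript
const cityName = "istanbul";

console.log(cityName.toLocaleUpperCase(["tr", "en"]));
// "İSTANBUL" (first entry is Turkish)

console.log(cityName.toLocaleUpperCase(["en", "tr"]));
// "ISTANBUL" (first entry is English, the Turkish entry is not used)

// Hypothetical wrapper: fall back to the default mapping when the tag is malformed.
function safeUpper(value, locales) {
  try {
    return value.toLocaleUpperCase(locales);
  } catch (error) {
    if (error instanceof RangeError) {
      return value.toUpperCase();
    }
    throw error;
  }
}

console.log(safeUpper(cityName, "tr_TR"));
// "ISTANBUL" (the underscore makes the tag invalid, so the default mapping is used)
```

Whether to swallow the error or surface it is a design choice. Failing loudly is often better during development, because a silently ignored tag hides the bug.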

Using the browser's locale preferences

In web applications, you can use the user's browser locale preferences to determine which case conversion rules to apply. The navigator.language property returns the user's preferred language.

const userLocale = navigator.language;

const text = "istanbul";
const result = text.toLocaleUpperCase(userLocale);

console.log(result);
// Output varies by user's locale
// For Turkish users: "İSTANBUL"
// For English users: "ISTANBUL"

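This applies the case rules of the user's language. Turkish users see text converted using Turkish rules, English users see text converted using English rules, and so on. The result is only correct when the text is actually written in that language, so prefer the known language of the content whenever you have it.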

You can also pass the entire array of locale preferences.

const text = "istanbul";
const result = text.toLocaleUpperCase(navigator.languages);

console.log(result);

The method uses the first locale from the user's preferences, so the result matches passing navigator.languages[0]. Later entries in the array are not used as fallbacks for case conversion.
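When the language of the content is known, for example from a lang attribute or a stored field, it is usually a better choice than the viewer's preference. The following sketch shows one way to express that priority. upperForContent is a hypothetical helper, and contentLang is assumed to be a BCP 47 tag supplied by your application.

```javascript
// Hypothetical helper: prefer the content's language, then the user's
// preferred language, then the locale-independent default mapping.
function upperForContent(text, contentLang) {
  const userLang =
    typeof navigator !== "undefined" ? navigator.language : undefined;
  const locale = contentLang || userLang;
  return locale ? text.toLocaleUpperCase(locale) : text.toUpperCase();
}

console.log(upperForContent("istanbul", "tr"));
// "İSTANBUL"

console.log(upperForContent("title", "en"));
// "TITLE"
```

With this ordering, an English heading stays correct for a Turkish-speaking visitor, because the heading's own language decides the rules.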

Comparing standard and locale-aware methods

The standard toUpperCase() and toLowerCase() methods work correctly for English but can fail for languages with special rules. The locale-aware methods toLocaleUpperCase() and toLocaleLowerCase() apply the rules of the locale you pass, so they give correct results when that locale matches the language of the text.

const turkish = "Diyarbakır";

// Standard methods (incorrect for Turkish)
console.log(turkish.toUpperCase());
// Output: "DIYARBAKIR" (dotless I - incorrect)

console.log(turkish.toUpperCase().toLowerCase());
// Output: "diyarbakir" (dotted i - lost the dotless ı)

// Locale-aware methods (correct for Turkish)
console.log(turkish.toLocaleUpperCase("tr"));
// Output: "DİYARBAKIR" (dotted İ and dotless I - correct)

console.log(turkish.toLocaleUpperCase("tr").toLocaleLowerCase("tr"));
// Output: "diyarbakır" (preserves both i types - correct)

The Turkish city name Diyarbakır contains both types of i. Standard methods cannot preserve this distinction when converting back and forth between cases. Locale-aware methods maintain the correct characters in both directions.
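A round-trip check makes this comparison testable. The sketch below uses a hypothetical helper, survivesRoundTrip, which reports whether uppercasing and then lowercasing a string gives the same result as lowercasing it directly.

```javascript
// Hypothetical helper: pass a locale to use the locale-aware methods,
// or omit it to use the standard ones.
function survivesRoundTrip(text, locale) {
  const upper = (s) => (locale ? s.toLocaleUpperCase(locale) : s.toUpperCase());
  const lower = (s) => (locale ? s.toLocaleLowerCase(locale) : s.toLowerCase());
  return lower(upper(text)) === lower(text);
}

console.log(survivesRoundTrip("Diyarbakır"));
// false (the dotless ı is lost)

console.log(survivesRoundTrip("Diyarbakır", "tr"));
// true

console.log(survivesRoundTrip("Straße", "de"));
// false (ß becomes SS and cannot be restored)
```

The last case shows that a matching locale does not guarantee a lossless round trip. Some mappings discard information in every locale.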

For text that only contains characters with simple case rules, both approaches produce identical results.

const english = "Hello World";

console.log(english.toUpperCase());
// Output: "HELLO WORLD"

console.log(english.toLocaleUpperCase("en"));
// Output: "HELLO WORLD"

English text converts the same way with either method. The locale-aware version is not necessary for English-only text, and passing "en" does not help with text in other languages: the locale you pass must match the language of the text for its special rules to apply.

When to use locale-aware case conversion

Use locale-aware methods when you display or transform text whose language you know, such as user-generated content tagged with a language. Pass that language as the locale so the matching rules apply.

function normalizeUsername(username, locale) {
  return username.toLocaleLowerCase(locale);
}

Usernames, search terms, and other user input in a known language can use locale-aware conversion, which handles Turkish and other special cases. If the normalized value is stored or compared across users, apply the same locale every time, because relying on each machine's default locale makes the same input normalize differently. Email addresses and other ASCII identifiers are better served by the standard methods.

Use standard methods when the text consists of ASCII identifiers or when the result must not depend on any locale. Standard methods may also run slightly faster because they skip the locale lookup, although the difference is rarely significant.

const htmlTag = "<DIV>";
const normalized = htmlTag.toLowerCase();
// normalized is "<div>"

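HTML tag names, CSS properties, protocol schemes, and other technical identifiers use ASCII characters and do not require locale awareness. Standard methods work correctly for this content, and locale-aware conversion can corrupt it: under Turkish rules, an uppercase I lowercases to the dotless ı.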
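The sketch below shows what goes wrong if technical identifiers pass through Turkish case rules. The identifiers are arbitrary examples chosen because they contain the letter i.

```javascript
const tagName = "<TITLE>";

console.log(tagName.toLowerCase());
// "<title>"

console.log(tagName.toLocaleLowerCase("tr"));
// "<tıtle>" (dotless ı, no longer a valid tag name)

const headerName = "content-id";

console.log(headerName.toUpperCase());
// "CONTENT-ID"

console.log(headerName.toLocaleUpperCase("tr"));
// "CONTENT-İD" (dotted İ, no longer ASCII)
```

The same corruption happens with toLocaleLowerCase() and no argument on a machine whose default locale is Turkish, which is why identifier handling should use the standard methods.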

How character length can change after conversion

Case conversion is not always a one-to-one character mapping. Some characters expand into multiple characters when converted to uppercase, which affects the string length.

const german = "groß";

console.log(german.length);
// Output: 4

const uppercase = german.toLocaleUpperCase("de");
console.log(uppercase);
// Output: "GROSS"

console.log(uppercase.length);
// Output: 5

The German word groß has four characters. When converted to uppercase, the ß becomes SS, producing GROSS with five characters. The string length increases by one character during conversion.

This affects operations that depend on string length or character positions. Do not assume the uppercased or lowercased version of a string has the same length as the original.

const text = "Maße";
const positions = [0, 1, 2, 3];

const uppercase = text.toLocaleUpperCase("de");
// "MASSE" (5 characters)

// Original position mapping no longer valid

The ß at position 2 becomes SS in the uppercase version, shifting all subsequent characters. Character positions from the original string do not correspond to positions in the converted string.
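If you need to translate positions from the original string into the converted one, you can build an offset table while converting. This is a minimal sketch under one stated assumption: it converts each character on its own, so context-sensitive rules such as the Greek final sigma are not applied. mapOffsets is a hypothetical helper.

```javascript
// Hypothetical helper: uppercase character by character and record where
// each original UTF-16 code unit lands in the converted string.
function mapOffsets(text, locale) {
  const offsets = [];
  let converted = "";
  for (const ch of text) {
    // One entry per code unit, so surrogate pairs keep the table aligned.
    for (let i = 0; i < ch.length; i++) {
      offsets.push(converted.length);
    }
    converted += ch.toLocaleUpperCase(locale);
  }
  return { converted, offsets };
}

const mapped = mapOffsets("Maße", "de");

console.log(mapped.converted);
// "MASSE"

console.log(mapped.offsets);
// [ 0, 1, 2, 4 ]  (the e moved from position 3 to position 4)
```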

Reusing locale parameters

If you need to convert multiple strings using the same locale, you can store the locale identifier in a variable and reuse it. This makes your code more maintainable and ensures consistent locale handling.

const userLocale = navigator.language;

const city = "istanbul";
const country = "türkiye";

console.log(city.toLocaleUpperCase(userLocale));
console.log(country.toLocaleUpperCase(userLocale));

This approach keeps the locale selection in one place. If you need to change which locale you use, you only need to update the variable definition.

For applications that process large amounts of text, storing the locale in a variable does not provide a performance benefit. Each call to toLocaleUpperCase() or toLocaleLowerCase() performs the conversion independently. Unlike the Intl API formatters, there is no formatter object to reuse.
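Although there is no built-in formatter object, you can get a similar calling style by binding the locale in a closure. The sketch below is one possible shape. makeCaseConverter is a hypothetical helper, and it validates the tag once with Intl.getCanonicalLocales, which throws a RangeError for malformed tags.

```javascript
// Hypothetical helper: bind a locale once and return small conversion functions.
function makeCaseConverter(locale) {
  // Canonicalize up front so a malformed tag fails here, not at each call site.
  const [canonical] = Intl.getCanonicalLocales(locale);
  return {
    locale: canonical,
    upper: (text) => text.toLocaleUpperCase(canonical),
    lower: (text) => text.toLocaleLowerCase(canonical),
  };
}

const turkishCase = makeCaseConverter("tr-tr");

console.log(turkishCase.locale);
// "tr-TR"

console.log(turkishCase.upper("istanbul"));
// "İSTANBUL"

console.log(turkishCase.lower("IRAK"));
// "ırak"
```

This is a convenience for readability and early validation only. It does not make the conversions themselves any faster.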