How do you find where to break text at character or word boundaries?
Locate safe text break positions for truncation, wrapping, and cursor operations
Introduction
When you truncate text, position a cursor, or handle clicks in a text editor, you need to find where one character ends and another begins, or where words start and end. Breaking text at the wrong position splits emoji, cuts through combining characters, or divides words incorrectly.
JavaScript's Intl.Segmenter API provides the containing() method to find the text segment at any position in a string. This tells you which character or word contains a specific index, where that segment starts, and where it ends. You can use this information to find safe break points that respect grapheme cluster boundaries and linguistic word boundaries across all languages.
This article explains why breaking text at arbitrary positions fails, how to find text boundaries with Intl.Segmenter, and how to use boundary information for truncation, cursor positioning, and text selection.
Why you cannot break text at any position
JavaScript strings consist of code units, not complete characters. A single emoji, accented letter, or flag can span multiple code units. If you cut a string at an arbitrary index, you risk splitting a character in the middle.
Consider this example:
const text = "Hello π¨βπ©βπ§βπ¦ world";
const truncated = text.slice(0, 10);
console.log(truncated); // "Hello π¨βοΏ½"
The family emoji uses 11 code units. Cutting at position 10 splits the emoji, producing broken output with a replacement character.
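The gap between code units and visible characters can be checked directly. The sketch below compares length with the number of grapheme segments. The helper name countGraphemes is illustrative rather than part of any API, and the emoji is written with escape sequences so its code points are explicit.

```javascript
// Count user-perceived characters (grapheme clusters) in a string.
// `countGraphemes` is a hypothetical helper name used for illustration.
function countGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

// Family emoji: four people joined by three zero-width joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(family.length);          // 11 code units
console.log(countGraphemes(family)); // 1 grapheme cluster
```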
For words, breaking at the wrong position creates fragments that do not match user expectations:
const text = "Hello world";
const fragment = text.slice(0, 7);
console.log(fragment); // "Hello w"
Users expect text to break between words, not in the middle of a word. Finding the boundary before or after position 7 produces better results.
Finding the text segment at a specific position
The containing() method returns information about the text segment that includes a specific index:
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const text = "Hello 👋🏽";
const segments = segmenter.segment(text);
const segment = segments.containing(6);
console.log(segment);
// { segment: "👋🏽", index: 6, input: "Hello 👋🏽" }
The emoji at position 6 spans four code units (from index 6 to 9). The containing() method returns:
- segment: the complete grapheme cluster as a string
- index: where this segment starts in the original string
- input: reference to the original string
This tells you that position 6 is inside the emoji, the emoji starts at index 6, and the complete emoji is "👋🏽".
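The returned object has no end property, but the end follows from index plus the segment's length. A minimal sketch, assuming a hypothetical helper named graphemeRangeAt, shows that any index inside the cluster maps to the same range:

```javascript
// Return the [start, end) range of the grapheme that covers `index`.
// `graphemeRangeAt` is a hypothetical helper; it returns null when
// `index` is outside the string, where containing() yields undefined.
function graphemeRangeAt(text, index, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const found = segmenter.segment(text).containing(index);
  if (found === undefined) {
    return null;
  }
  return { start: found.index, end: found.index + found.segment.length };
}

// Waving hand plus skin tone modifier: four code units at indices 6..9.
const greeting = "Hello \u{1F44B}\u{1F3FD}";

console.log(graphemeRangeAt(greeting, 6));  // { start: 6, end: 10 }
console.log(graphemeRangeAt(greeting, 8));  // { start: 6, end: 10 }
console.log(graphemeRangeAt(greeting, 10)); // null
```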
Finding safe truncation points for text
To truncate text without breaking characters, find the grapheme boundary before your target position:
function truncateAtPosition(text, maxIndex) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(maxIndex);
  // containing() returns undefined when maxIndex is outside the string
  if (!segment) {
    return maxIndex < 0 ? "" : text;
  }
  // Truncate before this segment to avoid breaking it
  return text.slice(0, segment.index);
}
truncateAtPosition("Hello 👨‍👩‍👧‍👦 world", 10);
// "Hello " (stops before the emoji, not in the middle)
truncateAtPosition("café", 3);
// "caf" (stops before é)
This function finds the segment at the target position and truncates before it, ensuring you never split a grapheme cluster.
To truncate after the segment instead of before:
function truncateAfterPosition(text, minIndex) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(minIndex);
  // containing() returns undefined when minIndex is outside the string
  if (!segment) {
    return minIndex < 0 ? "" : text;
  }
  const endIndex = segment.index + segment.segment.length;
  return text.slice(0, endIndex);
}
truncateAfterPosition("Hello 👨‍👩‍👧‍👦 world", 10);
// "Hello 👨‍👩‍👧‍👦" (includes the complete emoji, ends at index 17)
This includes the entire segment that contains the target position.
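When the limit is expressed in visible characters rather than code units, one option is to count grapheme segments while iterating and cut at the start of the first one that does not fit. The helper name truncateToGraphemes is an assumption made for this sketch.

```javascript
// Keep at most `maxGraphemes` user-perceived characters.
// `truncateToGraphemes` is an illustrative name, not a built-in.
function truncateToGraphemes(text, maxGraphemes, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const { index } of segmenter.segment(text)) {
    if (count === maxGraphemes) {
      // `index` is the start of the first grapheme that does not fit.
      return text.slice(0, index);
    }
    count++;
  }
  return text; // the whole string fits
}

const sample = "Hi \u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}!";
console.log(truncateToGraphemes(sample, 3));        // "Hi "
console.log(truncateToGraphemes(sample, 4).length); // 14 ("Hi " plus the 11-unit emoji)
```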
Finding word boundaries for text wrapping
When wrapping text at a maximum width, you want to break between words, not in the middle of a word. Use word segmentation to find the word boundary before your target position:
function findWordBreakBefore(text, position, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(position);
  // If we're in a word, break before it
  if (segment?.isWordLike) {
    return segment.index;
  }
  // If we're in whitespace or punctuation (or outside the string), break here
  return position;
}
const text = "Hello world";
findWordBreakBefore(text, 7, "en");
// 6 (the start of "world")
const textZh = "你好世界";
findWordBreakBefore(textZh, 3, "zh");
// 2 (the boundary before "世界", assuming the engine's dictionary splits the text into "你好" and "世界")
This function finds the start of the word that contains the target position. If the position is already in whitespace, it returns the position unchanged.
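The same lookup can return the end of the word instead of its start, which is useful when a break should fall after the word under the target position. This is a sketch only; findWordBreakAfter is a hypothetical counterpart, and positions outside the string are clamped into range as an assumed convention.

```javascript
// Counterpart sketch: the end of the word that contains `position`.
// `findWordBreakAfter` is a hypothetical name introduced for this example.
function findWordBreakAfter(text, position, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const found = segmenter.segment(text).containing(position);
  if (found === undefined) {
    // Outside the string: clamp to the nearest end (assumed convention).
    return Math.max(0, Math.min(position, text.length));
  }
  // Inside a word, break after it; in whitespace or punctuation, break here.
  return found.isWordLike ? found.index + found.segment.length : position;
}

console.log(findWordBreakAfter("Hello world", 2, "en")); // 5
console.log(findWordBreakAfter("Hello world", 5, "en")); // 5
console.log(findWordBreakAfter("Hello world", 7, "en")); // 11
```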
For text wrapping that respects word boundaries:
function wrapTextAtWidth(text, maxLength, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));
  const lines = [];
  let currentLine = "";
  for (const { segment } of segments) {
    const potentialLine = currentLine + segment;
    if (potentialLine.length <= maxLength) {
      currentLine = potentialLine;
    } else {
      if (currentLine) {
        lines.push(currentLine.trim());
      }
      // Start the next line with this segment, dropping only whitespace
      // so that punctuation is not lost at a line break
      currentLine = segment.trim() === "" ? "" : segment;
    }
  }
  if (currentLine) {
    lines.push(currentLine.trim());
  }
  return lines;
}
wrapTextAtWidth("Hello world from JavaScript", 12, "en");
// ["Hello world", "from", "JavaScript"]
wrapTextAtWidth("你好世界欢迎使用", 4, "zh");
// ["你好世界", "欢迎使用"]
This function splits text into lines that break only at segment boundaries. Line length is measured in code units, and a single word longer than the maximum still gets a line of its own.
Finding which word contains a cursor position
In text editors, you need to know which word the cursor is in to implement features like double-click selection, spell checking, or contextual menus:
function getWordAtPosition(text, position, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(position);
  // segment is undefined when position is outside the string
  if (!segment?.isWordLike) {
    return null;
  }
  return {
    word: segment.segment,
    start: segment.index,
    end: segment.index + segment.segment.length
  };
}
const text = "Hello world";
getWordAtPosition(text, 7, "en");
// { word: "world", start: 6, end: 11 }
getWordAtPosition(text, 5, "en");
// null (position 5 is the space, not a word)
This returns the word at the cursor position along with its start and end indices, or null if the cursor is not in a word.
Use this for implementing double-click text selection:
function selectWordAtPosition(text, position, locale) {
  const wordInfo = getWordAtPosition(text, position, locale);
  if (!wordInfo) {
    return { start: position, end: position };
  }
  return { start: wordInfo.start, end: wordInfo.end };
}
selectWordAtPosition("Hello world", 7, "en");
// { start: 6, end: 11 } (selects "world")
Finding sentence boundaries for navigation
For document navigation or text-to-speech segmentation, find which sentence contains a specific position:
function getSentenceAtPosition(text, position, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(position);
  // segment is undefined when position is outside the string
  if (!segment) {
    return null;
  }
  return {
    sentence: segment.segment,
    start: segment.index,
    end: segment.index + segment.segment.length
  };
}
const text = "Hello world. How are you? Fine thanks.";
getSentenceAtPosition(text, 15, "en");
// { sentence: "How are you? ", start: 13, end: 26 }
This finds the complete sentence that contains the target position, including its boundaries.
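For navigation features such as "next sentence" or a reading-position indicator, it can help to compute every sentence range up front. The following sketch uses a hypothetical helper named listSentences. Note that each segment includes its trailing whitespace.

```javascript
// List every sentence with its [start, end) range.
// `listSentences` is an illustrative helper name.
function listSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), ({ segment, index }) => ({
    sentence: segment,
    start: index,
    end: index + segment.length,
  }));
}

const ranges = listSentences("Hello world. How are you? Fine thanks.", "en");
console.log(ranges.map((r) => [r.start, r.end]));
// [[0, 13], [13, 26], [26, 38]]
```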
Finding the next boundary after a position
To move forward by one grapheme, word, or sentence, iterate through segments until you find one that starts after your current position:
function findNextBoundary(text, position, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = Array.from(segmenter.segment(text));
  for (const segment of segments) {
    if (segment.index > position) {
      return segment.index;
    }
  }
  return text.length;
}
const text = "Hello 👨‍👩‍👧‍👦 world";
findNextBoundary(text, 0, "grapheme", "en");
// 1 (boundary after "H")
findNextBoundary(text, 6, "grapheme", "en");
// 17 (boundary after the family emoji)
findNextBoundary(text, 0, "word", "en");
// 5 (boundary after "Hello")
This finds where the next segment begins, which is the safe position to move the cursor or truncate text.
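An alternative that avoids building an array of every segment relies on containing(): the segment covering the current position ends exactly where the next one begins. The name nextBoundaryViaContaining is hypothetical, and the sketch assumes the position lies inside the string (anything else falls back to the text length).

```javascript
// Next boundary via containing(): end of the segment covering `position`.
// `nextBoundaryViaContaining` is a hypothetical name for this sketch.
function nextBoundaryViaContaining(text, position, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const found = segmenter.segment(text).containing(position);
  if (found === undefined) {
    return text.length; // position is outside the string
  }
  return found.index + found.segment.length;
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const sampleText = "Hello " + family + " world";

console.log(nextBoundaryViaContaining(sampleText, 0, "grapheme", "en")); // 1
console.log(nextBoundaryViaContaining(sampleText, 8, "grapheme", "en")); // 17
console.log(nextBoundaryViaContaining(sampleText, 0, "word", "en"));     // 5
```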
Finding the previous boundary before a position
To move backward by one grapheme, word, or sentence, find the segment before your current position:
function findPreviousBoundary(text, position, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = Array.from(segmenter.segment(text));
  let previousIndex = 0;
  for (const segment of segments) {
    if (segment.index >= position) {
      return previousIndex;
    }
    previousIndex = segment.index;
  }
  return previousIndex;
}
const text = "Hello 👨‍👩‍👧‍👦 world";
findPreviousBoundary(text, 17, "grapheme", "en");
// 6 (boundary before the family emoji)
findPreviousBoundary(text, 11, "word", "en");
// 6 (position 11 is inside the emoji, so this is the start of the emoji segment)
This finds where the previous segment starts, which is the safe position to move the cursor backward.
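The backward direction can also be expressed with containing(): the segment that covers the code unit just before the position starts at the previous boundary. This is a sketch under that assumption; previousBoundaryViaContaining is a hypothetical name.

```javascript
// Previous boundary via containing(): start of the segment that covers
// the code unit immediately before `position`.
// `previousBoundaryViaContaining` is a hypothetical name for this sketch.
function previousBoundaryViaContaining(text, position, granularity, locale) {
  if (position <= 0) {
    return 0;
  }
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const clamped = Math.min(position, text.length);
  const found = segmenter.segment(text).containing(clamped - 1);
  return found === undefined ? 0 : found.index; // undefined only for empty text
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const sampleText = "Hello " + family + " world";

console.log(previousBoundaryViaContaining(sampleText, 17, "grapheme", "en")); // 6
console.log(previousBoundaryViaContaining(sampleText, 11, "word", "en"));     // 6
console.log(previousBoundaryViaContaining(sampleText, 23, "grapheme", "en")); // 22
```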
Implementing cursor movement with boundaries
Combine boundary finding with cursor position to implement proper cursor movement:
function moveCursorForward(text, cursorPosition, locale) {
  return findNextBoundary(text, cursorPosition, "grapheme", locale);
}
function moveCursorBackward(text, cursorPosition, locale) {
  return findPreviousBoundary(text, cursorPosition, "grapheme", locale);
}
function moveWordForward(text, cursorPosition, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));
  for (const segment of segments) {
    if (segment.index > cursorPosition && segment.isWordLike) {
      return segment.index;
    }
  }
  return text.length;
}
function moveWordBackward(text, cursorPosition, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));
  let previousWordIndex = 0;
  for (const segment of segments) {
    if (segment.index >= cursorPosition) {
      return previousWordIndex;
    }
    if (segment.isWordLike) {
      previousWordIndex = segment.index;
    }
  }
  return previousWordIndex;
}
const text = "Hello 👨‍👩‍👧‍👦 world";
moveCursorForward(text, 6, "en");
// 17 (moves over the entire emoji)
moveWordForward(text, 0, "en");
// 18 (skips the spaces and the emoji, which are not word-like, and moves to the start of "world")
These functions implement standard text editor cursor movement that respects grapheme and word boundaries.
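Deletion follows the same pattern as cursor movement. The sketch below removes the whole grapheme that ends at the cursor, as a Backspace key might. The name deleteBackward is hypothetical, and the code assumes the cursor already sits on a grapheme boundary.

```javascript
// Backspace sketch: remove the whole grapheme that ends at the cursor.
// `deleteBackward` is a hypothetical helper; it assumes `cursor` is
// already on a grapheme boundary.
function deleteBackward(text, cursor, locale = "en") {
  if (cursor <= 0 || text.length === 0) {
    return { text, cursor: 0 };
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const end = Math.min(cursor, text.length);
  // The grapheme covering the code unit before the cursor starts here.
  const start = segmenter.segment(text).containing(end - 1).index;
  return { text: text.slice(0, start) + text.slice(end), cursor: start };
}

const line = "Hi \u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}!";
console.log(deleteBackward(line, 14)); // { text: "Hi !", cursor: 3 }
```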
Finding all break opportunities in text
To find every position where you can safely break text, iterate through all segments and collect their start indices:
function getBreakOpportunities(text, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = Array.from(segmenter.segment(text));
  return segments.map(segment => segment.index);
}
const text = "Hello 👨‍👩‍👧‍👦 world";
getBreakOpportunities(text, "grapheme", "en");
// [0, 1, 2, 3, 4, 5, 6, 17, 18, 19, 20, 21, 22]
getBreakOpportunities(text, "word", "en");
// [0, 5, 6, 17, 18]
This returns the start index of every segment. The end of the text (text.length) is also a valid break position, although it is not the start of any segment. Use this for implementing advanced text layout or analysis features.
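One use for a list of break positions is snapping a click or an arbitrary offset to the closest safe position. A minimal sketch, assuming a hypothetical helper named nearestBoundary that also treats the end of the text as a boundary:

```javascript
// Snap an arbitrary index to the closest boundary in either direction.
// `nearestBoundary` is an illustrative name; ties resolve to the earlier boundary.
function nearestBoundary(text, position, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const boundaries = Array.from(segmenter.segment(text), (s) => s.index);
  boundaries.push(text.length); // the end of the text is also a boundary
  let best = 0;
  for (const boundary of boundaries) {
    if (Math.abs(boundary - position) < Math.abs(best - position)) {
      best = boundary;
    }
  }
  return best;
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const sampleText = "Hello " + family + " world";

console.log(nearestBoundary(sampleText, 8, "grapheme", "en"));  // 6
console.log(nearestBoundary(sampleText, 14, "grapheme", "en")); // 17
```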
Handling edge cases with boundaries
When the position equals the length of the text, or is otherwise outside the string, containing() returns undefined rather than the last segment:
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const text = "Hello";
const segments = segmenter.segment(text);
console.log(segments.containing(5));
// undefined (valid positions are 0 through 4)
console.log(segments.containing(4));
// { segment: "o", index: 4, input: "Hello" }
The last valid position is text.length - 1. To get the final grapheme, pass that index instead of text.length.
When the position is 0, containing() returns the first segment. Negative positions return undefined:
const segment = segments.containing(0);
console.log(segment);
// { segment: "H", index: 0, input: "Hello" }
For empty strings, there are no segments, so every call to containing() returns undefined. Check that the position is inside the string before using the result:
function safeContaining(text, position, granularity, locale) {
  // containing() returns undefined outside 0..text.length - 1,
  // which also covers the empty string
  if (position < 0 || position >= text.length) {
    return null;
  }
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = segmenter.segment(text);
  return segments.containing(position);
}
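If you would rather map an out-of-range position to the nearest segment, for example a cursor placed at the very end of the text, one option is to clamp the position before the lookup. This is a sketch under that assumption; containingClamped is a hypothetical name.

```javascript
// Clamp `position` into 0..text.length - 1 before calling containing(),
// so the end of the text maps to the last segment.
// `containingClamped` is a hypothetical helper, not part of Intl.Segmenter.
function containingClamped(text, position, granularity, locale) {
  if (text.length === 0) {
    return null; // an empty string has no segments
  }
  const clamped = Math.min(Math.max(position, 0), text.length - 1);
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return segmenter.segment(text).containing(clamped);
}

console.log(containingClamped("Hello", 5, "grapheme", "en").index);  // 4
console.log(containingClamped("Hello", -3, "grapheme", "en").index); // 0
console.log(containingClamped("", 0, "grapheme", "en"));             // null
```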
Choosing the right granularity for boundaries
Use different granularities based on what you need to find:
- Grapheme: Use when implementing cursor movement, character deletion, or any operation that needs to respect what users see as single characters. This prevents splitting emoji, combining characters, or other complex grapheme clusters.
- Word: Use for word selection, spell checking, word count, or any operation that needs linguistic word boundaries. This works across languages, including those without spaces between words.
- Sentence: Use for sentence navigation, text-to-speech segmentation, or any operation that processes text sentence by sentence. This respects abbreviations and other contexts where periods do not end sentences.
Do not use word boundaries when you need character boundaries, and do not use grapheme boundaries when you need word boundaries. Each serves a specific purpose.
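To see how the three granularities divide the same text, the following sketch counts the segments each one produces. The helper name countSegments is illustrative. Note that word granularity also returns the spaces and punctuation as segments.

```javascript
// Count how many segments a given granularity produces.
// `countSegments` is an illustrative helper name.
function countSegments(text, granularity, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text)).length;
}

const sample = "Hi there. Bye!";

console.log(countSegments(sample, "grapheme")); // 14
console.log(countSegments(sample, "word"));     // 7: "Hi", " ", "there", ".", " ", "Bye", "!"
console.log(countSegments(sample, "sentence")); // 2: "Hi there. " and "Bye!"
```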
Browser support for boundary operations
The Intl.Segmenter API and its containing() method reached Baseline status in April 2024. Current versions of Chrome, Firefox, Safari, and Edge support it. Older browsers do not.
Check for support before using:
if (typeof Intl.Segmenter !== "undefined") {
  const segmenter = new Intl.Segmenter("en", { granularity: "word" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(position);
  // Use segment information
} else {
  // Fallback for older browsers
  // Use approximate boundaries based on string length
}
For applications targeting older browsers, provide fallback behavior using approximate boundaries, or use a polyfill that implements the Intl.Segmenter API.
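As one possible shape for such a fallback, the sketch below only guarantees that a cut never lands between the two halves of a surrogate pair. It is an approximation: multi-code-point clusters such as ZWJ emoji or letters with combining marks can still be split. The name codePointSafeIndex is hypothetical.

```javascript
// Fallback sketch for engines without Intl.Segmenter: move a cut index
// back by one if it would split a UTF-16 surrogate pair.
// This does NOT keep multi-code-point grapheme clusters together.
// `codePointSafeIndex` is a hypothetical helper name.
function codePointSafeIndex(text, maxIndex) {
  if (maxIndex <= 0) {
    return 0;
  }
  if (maxIndex >= text.length) {
    return text.length;
  }
  const before = text.charCodeAt(maxIndex - 1);
  const after = text.charCodeAt(maxIndex);
  const isHigh = before >= 0xd800 && before <= 0xdbff;
  const isLow = after >= 0xdc00 && after <= 0xdfff;
  return isHigh && isLow ? maxIndex - 1 : maxIndex;
}

const smile = "a\u{1F600}b"; // "a", a surrogate pair at indices 1..2, "b"
console.log(codePointSafeIndex(smile, 2)); // 1 (index 2 would split the pair)
console.log(codePointSafeIndex(smile, 3)); // 3
```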
Common mistakes when finding boundaries
Do not assume every code unit is a valid break point. Many positions split grapheme clusters or words, producing invalid or unexpected results.
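Do not treat string.length as a character count. It counts UTF-16 code units, so it overstates the length of text that contains emoji or combining marks. As a boundary it is safe: the end of the last segment always equals string.length.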
Do not forget to check isWordLike when working with word boundaries. Non-word segments like spaces and punctuation are also returned by the segmenter.
Do not assume word boundaries are the same across languages. Use locale-aware segmentation for correct results.
Do not call containing() repeatedly for performance-critical operations. If you need multiple boundaries, iterate through segments once and build an index.
Performance considerations for boundary operations
Segmenting takes time proportional to the length of the text, so walking through every segment of a very long string on each call is wasteful. For operations that need multiple boundaries, segment once and cache the result:
class TextBoundaryCache {
  constructor(text, granularity, locale) {
    this.text = text;
    const segmenter = new Intl.Segmenter(locale, { granularity });
    this.segments = Array.from(segmenter.segment(text));
  }
  containing(position) {
    for (const segment of this.segments) {
      const end = segment.index + segment.segment.length;
      if (position >= segment.index && position < end) {
        return segment;
      }
    }
    return this.segments[this.segments.length - 1];
  }
  nextBoundary(position) {
    for (const segment of this.segments) {
      if (segment.index > position) {
        return segment.index;
      }
    }
    return this.text.length;
  }
  previousBoundary(position) {
    let previous = 0;
    for (const segment of this.segments) {
      if (segment.index >= position) {
        return previous;
      }
      previous = segment.index;
    }
    return previous;
  }
}
const cache = new TextBoundaryCache("Hello world", "grapheme", "en");
cache.containing(7);
cache.nextBoundary(7);
cache.previousBoundary(7);
This segments the text once and reuses the result for every lookup, so repeated calls avoid re-segmenting. Each lookup is still a linear scan over the cached array.
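Because segment start indices are sorted in ascending order, a cached array also allows binary search. The sketch below is one way to do that; IndexedBoundaries and lastStartAtOrBefore are hypothetical names, and the methods are meant to return the same values as a linear scan over segment starts.

```javascript
// Index of the last entry in `starts` that is <= position, or -1 if none.
function lastStartAtOrBefore(starts, position) {
  let low = 0;
  let high = starts.length - 1;
  let answer = -1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (starts[mid] <= position) {
      answer = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return answer;
}

// Hypothetical cache that stores only the sorted segment start indices.
class IndexedBoundaries {
  constructor(text, granularity, locale) {
    const segmenter = new Intl.Segmenter(locale, { granularity });
    this.length = text.length;
    this.starts = Array.from(segmenter.segment(text), (s) => s.index);
  }
  // First segment start strictly after `position`, or the text length.
  nextBoundary(position) {
    const i = lastStartAtOrBefore(this.starts, position);
    return i + 1 < this.starts.length ? this.starts[i + 1] : this.length;
  }
  // Last segment start strictly before `position`, or 0.
  previousBoundary(position) {
    const i = lastStartAtOrBefore(this.starts, position - 1);
    return i < 0 ? 0 : this.starts[i];
  }
}

const boundaryIndex = new IndexedBoundaries("Hello world", "grapheme", "en");
console.log(boundaryIndex.nextBoundary(7));     // 8
console.log(boundaryIndex.previousBoundary(7)); // 6
```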
Practical example: text truncation with ellipsis
Combine boundary finding with truncation to build a function that cuts text at the last complete word before a maximum length:
function truncateAtWordBoundary(text, maxLength, locale) {
  if (text.length <= maxLength) {
    return text;
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));
  let lastWordEnd = 0;
  for (const segment of segments) {
    const segmentEnd = segment.index + segment.segment.length;
    if (segmentEnd > maxLength) {
      break;
    }
    if (segment.isWordLike) {
      lastWordEnd = segmentEnd;
    }
  }
  if (lastWordEnd === 0) {
    return "";
  }
  return text.slice(0, lastWordEnd).trim() + "…";
}
truncateAtWordBoundary("Hello world from JavaScript", 15, "en");
// "Hello world…"
truncateAtWordBoundary("你好世界欢迎使用", 4, "zh");
// "你好世界…"
This function finds the last complete word before the maximum length and adds an ellipsis, producing clean truncated text that does not cut words.