---
title: "How do you find where to break text at character or word boundaries?"
subtitle: "Locate safe text break positions for truncation, wrapping, and cursor operations"
---

## Introduction

When you truncate text, position a cursor, or handle clicks in a text editor, you need to find where one character ends and another begins, or where words start and end. Breaking text at the wrong position splits emoji, cuts through combining characters, or divides words incorrectly.

JavaScript's `Intl.Segmenter` API splits strings into graphemes, words, or sentences. The `Segments` object returned by its `segment()` method has a `containing()` method that finds the segment at any position in the string. It tells you which character or word contains a specific index, where that segment starts, and what text it covers, so you can compute where it ends. You can use this information to find safe break points that respect grapheme cluster boundaries and locale-aware word boundaries, including in languages that do not separate words with spaces.

This article explains why breaking text at arbitrary positions fails, how to find text boundaries with `Intl.Segmenter`, and how to use boundary information for truncation, cursor positioning, and text selection.

## Why you cannot break text at any position

JavaScript strings are sequences of UTF-16 code units, not user-perceived characters. A single emoji, accented letter, or flag can span multiple code units. If you cut a string at an arbitrary index, you risk splitting a character in the middle.

Consider this example:

```javascript
const text = "Hello 👨‍👩‍👧‍👦 world";
const truncated = text.slice(0, 10);
console.log(truncated); // "Hello 👨‍�"
```

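The family emoji is four person emoji joined by three zero-width joiners, which takes 11 code units (indices 6 through 16). Cutting at index 10 splits the emoji and leaves a lone surrogate at the end of the string, which typically renders as a replacement character.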
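To see why index arithmetic fails, it helps to measure the same emoji three ways. This minimal sketch compares UTF-16 code units (what `length` and `slice()` use), code points (what the string iterator yields), and grapheme clusters (what a user sees):

```javascript
const family = "👨‍👩‍👧‍👦";

// UTF-16 code units: what .length, indexing, and slice() count
const codeUnits = family.length;

// Unicode code points: what Array.from() and for...of yield
const codePoints = Array.from(family).length;

// Grapheme clusters: what a user perceives as characters
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = Array.from(segmenter.segment(family)).length;

console.log(codeUnits, codePoints, graphemes);
// 11 7 1
```

Only the grapheme count matches what a reader sees, which is why the rest of this article works with segments rather than raw indices.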

For words, breaking at the wrong position creates fragments that do not match user expectations:

```javascript
const text = "Hello world";
const fragment = text.slice(0, 7);
console.log(fragment); // "Hello w"
```

Users expect text to break between words, not in the middle of a word. Cutting at the word boundary before position 7 (index 6, giving `"Hello "`) or after it (index 11, giving `"Hello world"`) produces better results.

## Finding the text segment at a specific position

The `containing()` method returns information about the text segment that includes a specific index:

```javascript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const text = "Hello 👋🏽";
const segments = segmenter.segment(text);

const segment = segments.containing(6);
console.log(segment);
// { segment: "👋🏽", index: 6, input: "Hello 👋🏽" }
```

The emoji at position 6 spans four code units (from index 6 to 9). The `containing()` method returns:

- `segment`: the complete grapheme cluster as a string
- `index`: where this segment starts in the original string
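- `input`: reference to the original string
- `isWordLike`: present only with `"word"` granularity; `true` for words such as letters, numbers, and ideographs, and `false` for spaces and punctuation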

This tells you that position 6 is inside the emoji, the emoji starts at index 6, and the complete emoji is "👋🏽".
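Because `containing()` accepts any index inside a segment, not only its first code unit, a small helper can turn a position into a start and end range. This is a minimal sketch. `graphemeRangeAt` is a hypothetical name, and the end index is computed as `index + segment.length`:

```javascript
// Hypothetical helper: the [start, end) range of the grapheme cluster
// that contains `index`, or null when `index` is outside the text
function graphemeRangeAt(text, index) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const segment = segmenter.segment(text).containing(index);

  // containing() returns undefined for index < 0 or index >= text.length
  if (segment === undefined) {
    return null;
  }

  return {
    start: segment.index,
    end: segment.index + segment.segment.length
  };
}

graphemeRangeAt("Hello 👋🏽", 8);
// { start: 6, end: 10 } (index 8 is in the middle of the emoji)

graphemeRangeAt("Hello 👋🏽", 10);
// null (index 10 equals the string length)
```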

## Finding safe truncation points for text

To truncate text without breaking characters, find the grapheme boundary before your target position:

```javascript
function truncateAtPosition(text, maxIndex) {
  // containing() only accepts indices inside the string
  if (maxIndex <= 0) {
    return "";
  }
  if (maxIndex >= text.length) {
    return text;
  }

  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const segments = segmenter.segment(text);

  // The grapheme cluster that contains maxIndex
  const segment = segments.containing(maxIndex);

  // Truncate before this segment to avoid breaking it
  return text.slice(0, segment.index);
}

truncateAtPosition("Hello 👨‍👩‍👧‍👦 world", 10);
// "Hello " (stops before the emoji, not in the middle)

truncateAtPosition("cafe\u0301", 4);
// "caf" (stops before "é", written as "e" plus a combining accent)

This function finds the segment at the target position and truncates before it, ensuring you never split a grapheme cluster.

To truncate after the segment instead of before:

```javascript
function truncateAfterPosition(text, minIndex) {
  // containing() only accepts indices inside the string
  if (minIndex < 0) {
    return "";
  }
  if (minIndex >= text.length) {
    return text;
  }

  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const segments = segmenter.segment(text);

  const segment = segments.containing(minIndex);
  const endIndex = segment.index + segment.segment.length;

  return text.slice(0, endIndex);
}

truncateAfterPosition("Hello 👨‍👩‍👧‍👦 world", 10);
// "Hello 👨‍👩‍👧‍👦" (includes the complete emoji and stops at index 17)
```

This includes the entire segment that contains the target position.
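Both functions above take a code-unit index. When a limit is expressed in user-perceived characters instead (for example, "at most 7 characters"), you can count grapheme clusters directly. This is a minimal sketch: `truncateGraphemes` is a hypothetical helper, and the locale is fixed to `"en"` as in the functions above.

```javascript
// Hypothetical helper: keep at most `maxGraphemes` user-perceived characters
function truncateGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  let count = 0;

  for (const { index } of segmenter.segment(text)) {
    // `index` is where grapheme number count + 1 starts; stop before it
    if (count === maxGraphemes) {
      return text.slice(0, index);
    }
    count++;
  }

  // The text has maxGraphemes or fewer graphemes
  return text;
}

truncateGraphemes("Hello 👨‍👩‍👧‍👦 world", 7);
// "Hello 👨‍👩‍👧‍👦" (six graphemes for "Hello " plus the whole emoji)
```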

## Finding word boundaries for text wrapping

When wrapping text at a maximum width, you want to break between words, not in the middle of a word. Use word segmentation to find the word boundary before your target position:

```javascript
function findWordBreakBefore(text, position, locale) {
  // containing() only accepts indices inside the string
  if (position <= 0) {
    return 0;
  }
  if (position >= text.length) {
    return text.length;
  }

  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = segmenter.segment(text);

  const segment = segments.containing(position);

  // If we're in a word, break before it
  if (segment.isWordLike) {
    return segment.index;
  }

  // If we're in whitespace or punctuation, break here
  return position;
}

const text = "Hello world";
findWordBreakBefore(text, 7, "en");
// 6 (the start of "world")

const textZh = "你好世界";
findWordBreakBefore(textZh, 3, "zh");
// 2 (the boundary before "世界", assuming the engine splits 你好 | 世界)
```

This function finds the start of the word that contains the target position. If the position is already in whitespace, it returns the position unchanged.
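The mirror operation finds the end of the word instead, which is useful when you would rather overshoot a limit than cut a word short. This is a minimal sketch with the hypothetical name `findWordBreakAfter`, and it uses the same word segmentation:

```javascript
// Hypothetical counterpart: the end of the word containing `position`,
// or `position` itself when it is not inside a word
function findWordBreakAfter(text, position, locale) {
  // containing() only accepts indices inside the string; clamp others
  if (position < 0 || position >= text.length) {
    return Math.min(Math.max(position, 0), text.length);
  }

  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segment = segmenter.segment(text).containing(position);

  if (segment.isWordLike) {
    // Break after the whole word
    return segment.index + segment.segment.length;
  }

  // Whitespace or punctuation: breaking here does not split a word
  return position;
}

findWordBreakAfter("Hello world", 7, "en");
// 11 (after "world")

findWordBreakAfter("Hello world", 5, "en");
// 5 (already at the space)
```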

For text wrapping that respects word boundaries:

```javascript
function wrapTextAtWidth(text, maxLength, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));

  const lines = [];
  let currentLine = "";

  for (const { segment, isWordLike } of segments) {
    const potentialLine = currentLine + segment;

    if (potentialLine.length <= maxLength) {
      currentLine = potentialLine;
    } else {
      if (currentLine) {
        lines.push(currentLine.trim());
      }
      currentLine = isWordLike ? segment : "";
    }
  }

  if (currentLine) {
    lines.push(currentLine.trim());
  }

  return lines;
}

wrapTextAtWidth("Hello world from JavaScript", 12, "en");
// ["Hello world", "from", "JavaScript"]

wrapTextAtWidth("你好世界欢迎使用", 4, "zh");
// ["你好世界", "欢迎使用"] (with typical dictionary-based splits 你好 | 世界 | 欢迎 | 使用)
```

This function splits text into lines that break only at word boundaries. It measures length in UTF-16 code units, and a single word longer than `maxLength` still gets a line of its own.

## Finding which word contains a cursor position

In text editors, you need to know which word the cursor is in to implement features like double-click selection, spell checking, or contextual menus:

```javascript
function getWordAtPosition(text, position, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = segmenter.segment(text);

  const segment = segments.containing(position);

  // containing() returns undefined when position is outside the text
  if (!segment || !segment.isWordLike) {
    return null;
  }

  return {
    word: segment.segment,
    start: segment.index,
    end: segment.index + segment.segment.length
  };
}

const text = "Hello world";
getWordAtPosition(text, 7, "en");
// { word: "world", start: 6, end: 11 }

getWordAtPosition(text, 5, "en");
// null (position 5 is the space, not a word)
```

This returns the word at the cursor position along with its start and end indices, or `null` if the cursor is not in a word.

Use this for implementing double-click text selection:

```javascript
function selectWordAtPosition(text, position, locale) {
  const wordInfo = getWordAtPosition(text, position, locale);

  if (!wordInfo) {
    return { start: position, end: position };
  }

  return { start: wordInfo.start, end: wordInfo.end };
}

selectWordAtPosition("Hello world", 7, "en");
// { start: 6, end: 11 } (selects "world")
```
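A cursor placed right after a word, for example at the end of the text, is not inside that word: `containing(cursor)` finds the following space, or returns `undefined` when the cursor is at `text.length`. Many editors still treat that cursor as touching the word. This is a minimal sketch of that policy under a hypothetical name, `getWordTouchingCursor`. It checks the segment after the cursor first, then the one before it:

```javascript
// Hypothetical variant of getWordAtPosition(): a cursor that touches a
// word on either side counts as being in that word
function getWordTouchingCursor(text, cursor, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = segmenter.segment(text);

  // Try the segment after the cursor first, then the one before it.
  // containing() returns undefined for indices outside the text.
  for (const index of [cursor, cursor - 1]) {
    const segment = segments.containing(index);
    if (segment && segment.isWordLike) {
      return {
        word: segment.segment,
        start: segment.index,
        end: segment.index + segment.segment.length
      };
    }
  }

  return null;
}

getWordTouchingCursor("Hello world", 11, "en");
// { word: "world", start: 6, end: 11 }

getWordTouchingCursor("Hello world", 5, "en");
// { word: "Hello", start: 0, end: 5 }
```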

## Finding sentence boundaries for navigation

For document navigation or text-to-speech segmentation, find which sentence contains a specific position:

```javascript
function getSentenceAtPosition(text, position, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const segments = segmenter.segment(text);

  const segment = segments.containing(position);

  // containing() returns undefined when position is outside the text
  if (!segment) {
    return null;
  }

  return {
    sentence: segment.segment,
    start: segment.index,
    end: segment.index + segment.segment.length
  };
}

const text = "Hello world. How are you? Fine thanks.";
getSentenceAtPosition(text, 15, "en");
// { sentence: "How are you? ", start: 13, end: 26 }
```

This finds the complete sentence that contains the target position, including its trailing whitespace.
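For text-to-speech or a sentence outline, you often want every sentence at once rather than one position at a time. This is a minimal sketch that maps each sentence segment to a range, using the same `index + length` arithmetic. `getSentences` is a hypothetical name:

```javascript
// Hypothetical helper: every sentence with its [start, end) range
function getSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });

  return Array.from(segmenter.segment(text), ({ segment, index }) => ({
    sentence: segment,
    start: index,
    end: index + segment.length
  }));
}

getSentences("Hello world. How are you? Fine thanks.", "en");
// [
//   { sentence: "Hello world. ", start: 0, end: 13 },
//   { sentence: "How are you? ", start: 13, end: 26 },
//   { sentence: "Fine thanks.", start: 26, end: 38 }
// ]
```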

## Finding the next boundary after a position

To move forward by one grapheme, word, or sentence, iterate through segments until you find one that starts after your current position:

```javascript
function findNextBoundary(text, position, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = Array.from(segmenter.segment(text));

  for (const segment of segments) {
    if (segment.index > position) {
      return segment.index;
    }
  }

  return text.length;
}

const text = "Hello 👨‍👩‍👧‍👦 world";
findNextBoundary(text, 0, "grapheme", "en");
// 1 (boundary after "H")

findNextBoundary(text, 6, "grapheme", "en");
// 17 (boundary after the family emoji)

findNextBoundary(text, 0, "word", "en");
// 5 (boundary after "Hello")
```

This finds where the next segment begins, which is the safe position to move the cursor or truncate text.
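`findNextBoundary()` builds an array of every segment on each call. Because the next boundary is simply the end of the segment that contains `position`, you can also ask `containing()` directly and skip materializing the array. This sketch uses the hypothetical name `findNextBoundaryWithContaining` and returns the same results as `findNextBoundary()`; whether it is measurably faster depends on the engine:

```javascript
// Alternative sketch: the next boundary is the end of the segment that
// contains `position`, so no full scan of the segments is needed
function findNextBoundaryWithContaining(text, position, granularity, locale) {
  // containing() only accepts indices inside the string
  if (position < 0) {
    return 0;
  }
  if (position >= text.length) {
    return text.length;
  }

  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segment = segmenter.segment(text).containing(position);

  return segment.index + segment.segment.length;
}

const text = "Hello 👨‍👩‍👧‍👦 world";
findNextBoundaryWithContaining(text, 6, "grapheme", "en");
// 17 (boundary after the family emoji)

findNextBoundaryWithContaining(text, 0, "word", "en");
// 5 (boundary after "Hello")
```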

## Finding the previous boundary before a position

To move backward by one grapheme, word, or sentence, find the segment before your current position:

```javascript
function findPreviousBoundary(text, position, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = Array.from(segmenter.segment(text));

  let previousIndex = 0;

  for (const segment of segments) {
    if (segment.index >= position) {
      return previousIndex;
    }
    previousIndex = segment.index;
  }

  return previousIndex;
}

const text = "Hello 👨‍👩‍👧‍👦 world";
findPreviousBoundary(text, 17, "grapheme", "en");
// 6 (boundary before the family emoji)

findPreviousBoundary(text, text.length, "word", "en");
// 18 (boundary before "world")
```

This finds where the previous segment starts, which is the safe position to move the cursor backward.
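The same shortcut works backward. The previous boundary is the start of the segment that covers the code unit just before `position`, so one `containing()` call replaces the loop. This sketch uses the hypothetical name `findPreviousBoundaryWithContaining` and returns the same values as `findPreviousBoundary()`:

```javascript
// Alternative sketch: the previous boundary is the start of the segment
// that covers the code unit just before `position`
function findPreviousBoundaryWithContaining(text, position, granularity, locale) {
  // Nothing lies before the start of the text
  if (position <= 0 || text.length === 0) {
    return 0;
  }

  const segmenter = new Intl.Segmenter(locale, { granularity });
  // Clamp so that positions past the end still find the last segment
  const segment = segmenter
    .segment(text)
    .containing(Math.min(position, text.length) - 1);

  return segment.index;
}

const text = "Hello 👨‍👩‍👧‍👦 world";
findPreviousBoundaryWithContaining(text, 17, "grapheme", "en");
// 6 (boundary before the family emoji)

findPreviousBoundaryWithContaining(text, text.length, "word", "en");
// 18 (boundary before "world")
```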

## Implementing cursor movement with boundaries

Combine boundary finding with cursor position to implement proper cursor movement:

```javascript
function moveCursorForward(text, cursorPosition, locale) {
  return findNextBoundary(text, cursorPosition, "grapheme", locale);
}

function moveCursorBackward(text, cursorPosition, locale) {
  return findPreviousBoundary(text, cursorPosition, "grapheme", locale);
}

function moveWordForward(text, cursorPosition, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));

  for (const segment of segments) {
    if (segment.index > cursorPosition && segment.isWordLike) {
      return segment.index;
    }
  }

  return text.length;
}

function moveWordBackward(text, cursorPosition, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));

  let previousWordIndex = 0;

  for (const segment of segments) {
    if (segment.index >= cursorPosition) {
      return previousWordIndex;
    }
    if (segment.isWordLike) {
      previousWordIndex = segment.index;
    }
  }

  return previousWordIndex;
}

const text = "Hello 👨‍👩‍👧‍👦 world";
moveCursorForward(text, 6, "en");
// 17 (moves over the entire emoji)

moveWordForward(text, 0, "en");
// 18 (skips the space and the emoji, which is not word-like, and moves to the start of "world")
```

These functions implement standard text editor cursor movement that respects grapheme and word boundaries.
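Deletion follows the same rule as cursor movement: Backspace should remove the whole grapheme cluster before the cursor, not a single code unit. This is a minimal sketch of that policy. `deleteBackward` is a hypothetical helper. Platforms differ here, and some editors remove only the last code point of certain sequences, so treat this as one reasonable choice rather than the standard.

```javascript
// Hypothetical helper: delete the grapheme cluster before the cursor
function deleteBackward(text, cursor) {
  if (cursor <= 0 || text.length === 0) {
    return { text, cursor: 0 };
  }

  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  // The grapheme that covers the code unit just before the cursor
  const segment = segmenter
    .segment(text)
    .containing(Math.min(cursor, text.length) - 1);

  return {
    text: text.slice(0, segment.index) + text.slice(cursor),
    cursor: segment.index
  };
}

deleteBackward("Hello 👨‍👩‍👧‍👦 world", 17);
// { text: "Hello  world", cursor: 6 }
```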

## Finding all break opportunities in text

To find every position where you can safely break text, iterate through all segments and collect their start indices:

```javascript
function getBreakOpportunities(text, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = Array.from(segmenter.segment(text));

  return segments.map(segment => segment.index);
}

const text = "Hello 👨‍👩‍👧‍👦 world";
getBreakOpportunities(text, "grapheme", "en");
// [0, 1, 2, 3, 4, 5, 6, 17, 18, 19, 20, 21, 22]

getBreakOpportunities(text, "word", "en");
// [0, 5, 6, 17, 18]
```

This returns the start index of every segment. The end of the text (`text.length`) is also a valid break position but is not included, so add it yourself if you need it. Use this list for custom text layout or analysis features.
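If you mainly need to check whether a given index is a safe break, an array makes each check a linear search. This is a minimal sketch that collects the boundaries into a `Set` and includes the end of the text. `getBoundarySet` is a hypothetical name:

```javascript
// Hypothetical helper: every boundary, including text.length, in a Set
// so that checking an arbitrary index is a constant-time lookup
function getBoundarySet(text, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const boundaries = new Set([text.length]);

  for (const { index } of segmenter.segment(text)) {
    boundaries.add(index);
  }

  return boundaries;
}

const graphemeBoundaries = getBoundarySet("Hello 👨‍👩‍👧‍👦 world", "grapheme", "en");
graphemeBoundaries.has(10);
// false (inside the family emoji)

graphemeBoundaries.has(17);
// true (right after the emoji)
```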

## Handling edge cases with boundaries

When the position is at the end of the text, `containing()` returns `undefined`, because no segment covers that index:

```javascript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const text = "Hello";
const segments = segmenter.segment(text);

console.log(segments.containing(5));
// undefined (index 5 equals text.length)
```

To get the last segment, ask for the segment containing `text.length - 1`. Position 0 returns the first segment, and negative positions also return `undefined`:

```javascript
console.log(segments.containing(0));
// { segment: "H", index: 0, input: "Hello" }

console.log(segments.containing(-1));
// undefined
```

For empty strings, there are no segments, so `containing()` returns `undefined` for every position. Check the bounds before using `containing()`:

```javascript
function safeContaining(text, position, granularity, locale) {
  // containing() only accepts 0 <= position < text.length
  if (position < 0 || position >= text.length) {
    return null;
  }

  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments = segmenter.segment(text);
  return segments.containing(position);
}
```

## Choosing the right granularity for boundaries

Use different granularities based on what you need to find:

- **Grapheme**: Use when implementing cursor movement, character deletion, or any operation that needs to respect what users see as single characters. This prevents splitting emoji, combining characters, or other complex grapheme clusters.

- **Word**: Use for word selection, spell checking, word count, or any operation that needs linguistic word boundaries. This works across languages, including those without spaces between words.

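- **Sentence**: Use for sentence navigation, text-to-speech segmentation, or any operation that processes text sentence by sentence. The Unicode sentence rules do not break at the period in a decimal number such as "3.14" or before a lowercase continuation, but an abbreviation followed by a capitalized word, such as "Mr. Smith", may still be split.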

Do not use word boundaries when you need character boundaries, and do not use grapheme boundaries when you need word boundaries. Each serves a specific purpose.
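Word count, mentioned above, shows why `isWordLike` matters with word granularity: the segmenter also returns spaces and punctuation. This is a minimal sketch that counts only word-like segments. `countWords` is a hypothetical name:

```javascript
// Hypothetical helper: count words, skipping the spaces and punctuation
// that the word segmenter also returns
function countWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;

  for (const { isWordLike } of segmenter.segment(text)) {
    if (isWordLike) {
      count++;
    }
  }

  return count;
}

countWords("Hello, world! How are you?", "en");
// 5
```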

## Browser support for boundary operations

The `Intl.Segmenter` API and its `containing()` method reached Baseline status in April 2024. Current versions of Chrome, Firefox, Safari, and Edge support it. Older browsers do not.

Check for support before using:

```javascript
if (typeof Intl.Segmenter !== "undefined") {
  const segmenter = new Intl.Segmenter("en", { granularity: "word" });
  const segments = segmenter.segment(text);
  const segment = segments.containing(position);
  // Use segment information
} else {
  // Fallback for older browsers
  // Use approximate boundaries based on string length
}
```

For applications targeting older browsers, provide fallback behavior using approximate boundaries, or use a polyfill that implements the `Intl.Segmenter` API.
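One cheap approximation for grapheme boundaries is to split on code points, which the string iterator already does. It never leaves a lone surrogate, but it still separates combining marks and emoji ZWJ sequences into several pieces. This is a minimal sketch under that assumption. `codePointStarts` and `graphemeStarts` are hypothetical names:

```javascript
// Approximation: every code point starts a new "character"
function codePointStarts(text) {
  const starts = [];
  let index = 0;
  for (const codePoint of text) {
    starts.push(index);
    index += codePoint.length; // 1 or 2 UTF-16 code units
  }
  return starts;
}

// Use real grapheme boundaries when available, else the approximation
function graphemeStarts(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
    return Array.from(segmenter.segment(text), (segment) => segment.index);
  }
  return codePointStarts(text);
}

codePointStarts("e\u0301👍");
// [0, 1, 2] (the combining accent is treated as its own character)

graphemeStarts("e\u0301👍");
// [0, 2] with Intl.Segmenter support
```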

## Common mistakes when finding boundaries

Do not assume every code unit is a valid break point. Many positions split grapheme clusters or words, producing invalid or unexpected results.

Do not assume a segment is one code unit long. To find where a segment ends, use `segment.index + segment.segment.length`.

Do not forget to check `isWordLike` when working with word boundaries. Non-word segments like spaces and punctuation are also returned by the segmenter.

Do not assume word boundaries are the same across languages. Use locale-aware segmentation for correct results.

Do not create a new segmenter and re-segment the same text for every lookup in performance-critical code. Reuse the `Intl.Segmenter` instance, and if you need many boundaries in the same text, iterate through the segments once and build an index.
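Most helpers in this article construct a new `Intl.Segmenter` on every call, which keeps them self-contained but repeats work. This is a minimal sketch of reusing instances through a small cache keyed by locale and granularity. `getSegmenter` and `segmenterCache` are hypothetical names:

```javascript
// Hypothetical cache: one Intl.Segmenter per locale and granularity
const segmenterCache = new Map();

function getSegmenter(locale, granularity) {
  const key = `${locale}|${granularity}`;
  let segmenter = segmenterCache.get(key);

  if (!segmenter) {
    segmenter = new Intl.Segmenter(locale, { granularity });
    segmenterCache.set(key, segmenter);
  }

  return segmenter;
}

// Both lookups share one Intl.Segmenter instance
getSegmenter("en", "word").segment("Hello world").containing(7);
// { segment: "world", index: 6, input: "Hello world", isWordLike: true }

getSegmenter("en", "word").segment("Hello again").containing(2);
// { segment: "Hello", index: 0, input: "Hello again", isWordLike: true }
```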

## Performance considerations for boundary operations

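Constructing an `Intl.Segmenter` has a cost, and helpers that call `Array.from()` on every lookup walk the entire text each time. For operations that need many boundaries in the same text, segment the text once and cache the result:

```javascript
class TextBoundaryCache {
  constructor(text, granularity, locale) {
    this.text = text;
    const segmenter = new Intl.Segmenter(locale, { granularity });
    // Segment once; segments come back ordered by start index
    this.segments = Array.from(segmenter.segment(text));
  }

  // Binary search for the array position of the segment that covers
  // `position`. Returns -1 when the position is outside the text.
  segmentIndexAt(position) {
    if (position < 0 || position >= this.text.length) {
      return -1;
    }
    let low = 0;
    let high = this.segments.length - 1;
    while (low < high) {
      const mid = Math.ceil((low + high) / 2);
      if (this.segments[mid].index <= position) {
        low = mid;
      } else {
        high = mid - 1;
      }
    }
    return low;
  }

  // Same contract as the built-in containing(): undefined out of range
  containing(position) {
    const i = this.segmentIndexAt(position);
    return i === -1 ? undefined : this.segments[i];
  }

  // Start of the segment after the one containing position
  nextBoundary(position) {
    if (position < 0) {
      return 0;
    }
    const i = this.segmentIndexAt(position);
    if (i === -1 || i === this.segments.length - 1) {
      return this.text.length;
    }
    return this.segments[i + 1].index;
  }

  // Start of the segment that covers the code unit before position
  previousBoundary(position) {
    const i = this.segmentIndexAt(Math.min(position, this.text.length) - 1);
    return i === -1 ? 0 : this.segments[i].index;
  }
}

const cache = new TextBoundaryCache("Hello world", "grapheme", "en");
cache.containing(7);
// { segment: "o", index: 7, input: "Hello world" }

cache.nextBoundary(7);
// 8

cache.previousBoundary(7);
// 6
```

This segments the text once. Each lookup then runs a binary search over the cached segment start indices instead of scanning from the beginning, and `containing()` returns `undefined` outside the text, matching the built-in method.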

## Practical example: text truncation with ellipsis

Combine boundary finding with truncation to build a function that cuts text at the last complete word before a maximum length:

```javascript
function truncateAtWordBoundary(text, maxLength, locale) {
  if (text.length <= maxLength) {
    return text;
  }

  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const segments = Array.from(segmenter.segment(text));

  let lastWordEnd = 0;

  for (const segment of segments) {
    const segmentEnd = segment.index + segment.segment.length;

    if (segmentEnd > maxLength) {
      break;
    }

    if (segment.isWordLike) {
      lastWordEnd = segmentEnd;
    }
  }

  if (lastWordEnd === 0) {
    return "";
  }

  return text.slice(0, lastWordEnd).trim() + "…";
}

truncateAtWordBoundary("Hello world from JavaScript", 15, "en");
// "Hello world…"

truncateAtWordBoundary("你好世界欢迎使用", 5, "zh");
// "你好世界…" (with typical dictionary-based splits 你好 | 世界 | 欢迎 | 使用)
```

This function finds the last complete word before the maximum length and adds an ellipsis, producing clean truncated text that does not cut words.