Import from Legacy Vendors

Max Prilutskiy · Updated 1 day ago · 3 min read

When migrating from a legacy localization vendor or TMS, you likely have glossaries, term bases, or translation memory exports sitting in TMX, CSV, or TBX files. Your AI assistant can parse these and seed your localization engine's configuration directly.

The workflow

"Here's our glossary export (CSV). Import it into our engine's glossary for all locales."

What happens:

  1. The assistant reads the CSV structure — identifies source term, target localization, locale, and term type columns
  2. Maps each row to a glossary entry: source text, target text, locale pair, and whether it's a custom localization or non-translatable term
  3. Shows the import plan: "Found 147 terms across 6 locales. 12 are marked do-not-translate, 135 are enforced localizations."
  4. On approval, creates all glossary entries via the MCP
  5. Reports: "144 glossary entries created. 3 duplicates skipped."
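The import plan in step 3 can be sketched in a few lines of Python. This is an illustration under assumptions, not how the assistant or the MCP is implemented: the column names (`source_term`, `target_term`, `locale`, `type`), the `build_import_plan` helper, and the sample brand "Acme Cloud" are all hypothetical, and a duplicate is assumed to mean the same source term for the same locale.

```python
import csv
import io
from collections import Counter


def build_import_plan(csv_text):
    """Summarize a glossary CSV before any entries are created.

    Assumed (hypothetical) columns: source_term, target_term, locale, type.
    """
    seen = set()
    entries = []
    duplicates = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: same source term + same locale counts as a duplicate.
        key = (row["source_term"].strip().lower(), row["locale"].strip().lower())
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        entries.append(row)

    by_type = Counter(row["type"].strip() for row in entries)
    return {
        "terms": len(entries),
        "locales": sorted({row["locale"].strip() for row in entries}),
        "by_type": dict(by_type),
        "duplicates_skipped": duplicates,
    }


# Made-up sample data; the third row repeats the first.
sample = """source_term,target_term,locale,type
privacy policy,Datenschutzerklärung,de,localize
Acme Cloud,Acme Cloud,de,do-not-translate
privacy policy,Datenschutzerklärung,de,localize
privacy policy,politique de confidentialité,fr,localize
"""
plan = build_import_plan(sample)
print(plan)
```

Building the summary before creating anything mirrors the approval step: the counts can be reviewed, and corrected in the source file, before any entry exists.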

Supported formats

| Format | What it contains | How to provide it |
| --- | --- | --- |
| CSV / TSV | Term bases, glossaries, simple bilingual lists | Paste the content or describe the file structure |
| TMX | Translation memory: source/target segment pairs with metadata | Paste a representative sample or describe the structure |
| TBX | Terminology databases: structured term entries with definitions | Paste the content or describe the schema |
| Excel exports | Vendor-specific glossary or style guide exports | Describe the columns and paste representative rows |

Step-by-step: TMX import

TMX files from legacy vendors contain segment pairs that can seed both glossary entries and instructions.

"Here's a TMX export from our previous vendor. It has 500 translation units for en → de. Extract any recurring terminology as glossary entries."

What happens:

  1. The assistant parses the TMX structure — identifies source segments, target segments, locale pairs
  2. Groups recurring terms — words or phrases that appear 3+ times with consistent localizations
  3. Proposes glossary entries for terms with stable localizations: "'privacy policy' → 'Datenschutzerklärung' (appears 12 times, always localized this way)"
  4. Identifies patterns that should become instructions: "Compound nouns are always hyphenated in this corpus — add as instruction for de?"
  5. Shows the full plan for review
  6. Applies on approval
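For readers who want to pre-check a TMX file locally, steps 1 to 3 can be approximated with Python's standard library. This is a rough sketch, not the assistant's actual logic: the helper names are hypothetical, candidate term pairs are assumed to be supplied by a reviewer (or by the assistant) rather than discovered automatically, and matching is a plain case-insensitive substring test.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_segment_pairs(tmx_text, source_lang="en", target_lang="de"):
    """Return (source, target) segment pairs for one locale pair."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag ("en-US" -> "en").
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if source_lang in segs and target_lang in segs:
            pairs.append((segs[source_lang], segs[target_lang]))
    return pairs


def is_stable_term(pairs, source_term, target_term, min_count=3):
    """Check that a term appears min_count+ times, always with the same target.

    Returns (is_stable, occurrences).
    """
    hits = [tgt for src, tgt in pairs if source_term.lower() in src.lower()]
    consistent = sum(1 for tgt in hits if target_term.lower() in tgt.lower())
    return (len(hits) >= min_count and consistent == len(hits)), len(hits)
```

Substring matching misses inflected forms, which matters for German, so a result of "not stable" is a prompt for manual review rather than a verdict.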

Step-by-step: CSV glossary import

Most legacy localization platforms export glossaries as CSV with columns for source, target, locale, and notes.

"Import this CSV into our engine. Columns: source_term, target_term, locale, type (localize/do-not-translate), notes."

What happens:

  1. The assistant reads the column mapping
  2. Creates glossary entries: localize rows become custom localizations, do-not-translate rows become non-translatable entries
  3. Entries with notes that describe rules (not just definitions) are flagged as potential instructions: "The note for 'date format' says 'Always use DD.MM.YYYY in German' — add as instruction for de?"
  4. Shows the plan, applies on approval
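The row mapping and note flagging in steps 2 and 3 might look like the sketch below. The entry shape (`source`, `target`, `locale`, `kind`), the `map_row` helper, and the keyword heuristic for spotting rule-like notes are assumptions made for illustration; the real glossary schema and the assistant's judgment are not documented here.

```python
import re

# Heuristic only: notes phrased as directives are treated as possible rules.
RULE_HINTS = re.compile(r"\b(always|never|must|do not|don't|use)\b", re.IGNORECASE)


def map_row(row):
    """Map one CSV row to (glossary_entry, suggested_instruction_or_None)."""
    is_dnt = row["type"].strip().lower() == "do-not-translate"
    entry = {
        "source": row["source_term"].strip(),
        "target": row["target_term"].strip(),
        "locale": row["locale"].strip(),
        # Hypothetical labels for the two entry types described above.
        "kind": "non_translatable" if is_dnt else "custom_localization",
    }
    note = (row.get("notes") or "").strip()
    # A note that reads like a rule is only suggested, never applied directly.
    suggested_instruction = note if note and RULE_HINTS.search(note) else None
    return entry, suggested_instruction


entry, instruction = map_row({
    "source_term": "date format",
    "target_term": "Datumsformat",
    "locale": "de",
    "type": "localize",
    "notes": "Always use DD.MM.YYYY in German",
})
print(entry["kind"], "|", instruction)
```

Returning the suggestion separately from the entry keeps the two decisions independent: the term can be imported even if the reviewer declines the instruction.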

What to import vs. what to leave behind

| Import as glossary | Import as instruction | Skip |
| --- | --- | --- |
| Brand names (non-translatable) | Formatting rules (date, number, currency) | Fuzzy TM matches below 95% |
| Product terminology (enforced localizations) | Punctuation conventions | Context-dependent segment pairs |
| Legal terms (enforced localizations) | Register/formality rules | One-off localizations that aren't terminology |
| UI labels with mandated localizations | Capitalization rules | Segments longer than 2-3 sentences |
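Two of the Skip criteria are mechanical enough to pre-filter in code. The sketch below assumes each segment carries an optional match percentage (a hypothetical field; many exports do not include one) and uses three sentences as the length cut-off. Context dependence and one-off localizations still need human or assistant judgment.

```python
import re


def skip_reason(source_text, match_percent=None, min_match=95, max_sentences=3):
    """Return why a TM segment should be skipped, or None to keep it."""
    if match_percent is not None and match_percent < min_match:
        return "fuzzy match below threshold"
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", source_text.strip()) if s]
    if len(sentences) > max_sentences:
        return "too long for a glossary entry"
    return None


print(skip_reason("Privacy policy"))
print(skip_reason("Save changes", match_percent=90))
```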

After import

  1. Verify — run a localization test with content that uses the imported terms
  2. Review — spot-check a batch against the new glossary to confirm enforcement
  3. Tune — adjust entries that don't produce the right output in context
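A spot-check like the one in steps 1 and 2 can be partly automated once localized output is available. The sketch below assumes entries shaped as `{"source", "target", "kind"}` (a hypothetical shape, not the engine's schema) and takes source and output text as plain strings; how that output is obtained is outside its scope.

```python
def find_glossary_violations(source_text, output_text, entries):
    """List (source_term, expected_text) pairs missing from the output."""
    violations = []
    src = source_text.lower()
    out = output_text.lower()
    for entry in entries:
        if entry["source"].lower() not in src:
            continue  # Term not used in this piece of content.
        # Non-translatable terms must survive verbatim; others use the target.
        if entry["kind"] == "non_translatable":
            expected = entry["source"]
        else:
            expected = entry["target"]
        if expected.lower() not in out:
            violations.append((entry["source"], expected))
    return violations


entries = [
    {"source": "privacy policy", "target": "Datenschutzerklärung", "kind": "custom_localization"},
    {"source": "Acme Cloud", "target": "Acme Cloud", "kind": "non_translatable"},
]
print(find_glossary_violations(
    "Read the Acme Cloud privacy policy.",
    "Lesen Sie die Datenschutzrichtlinie von Acme Cloud.",
    entries,
))
```

Because the check is a substring match, an inflected or compounded form of the target term shows up as a violation. Treat the result as a list of places to look at, which feeds the Tune step.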

Tips for large imports

  • Start with high-frequency terms. A 5,000-entry TM export isn't a glossary — it's a corpus. Ask the assistant to extract only terms that appear 3+ times.
  • Import in batches by locale. Easier to review 50 German terms than 500 terms across 10 locales.
  • Use the notes column. If your export has translator notes, the assistant can convert patterns into instructions.
  • Don't import sentence-level TM as glossary. Glossary entries are terms and short phrases. Full sentences belong in reference material, not the glossary.
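Batching by locale is easy to prepare before handing terms to the assistant. A minimal sketch, assuming each entry is a dict with a `locale` key and using 50 as the batch size from the tip above; both the helper name and the entry shape are hypothetical.

```python
from collections import defaultdict


def batch_by_locale(entries, batch_size=50):
    """Group entries by locale, then split each group into review-sized batches."""
    by_locale = defaultdict(list)
    for entry in entries:
        by_locale[entry["locale"]].append(entry)

    batches = []
    for locale in sorted(by_locale):
        items = by_locale[locale]
        for start in range(0, len(items), batch_size):
            batches.append((locale, items[start:start + batch_size]))
    return batches


# Made-up data: 120 German terms and 10 French terms.
terms = [{"locale": "de", "source": f"term {i}"} for i in range(120)]
terms += [{"locale": "fr", "source": f"term {i}"} for i in range(10)]
for locale, batch in batch_by_locale(terms):
    print(locale, len(batch))
```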
