When migrating from a legacy localization vendor or TMS, you likely have glossaries, term bases, or translation memory exports sitting in TMX, CSV, or TBX files. Your AI assistant can parse these and seed your localization engine's configuration directly.
The workflow
"Here's our glossary export (CSV). Import it into our engine's glossary for all locales."
What happens:
- The assistant reads the CSV structure — identifies source term, target localization, locale, and term type columns
- Maps each row to a glossary entry: source text, target text, locale pair, and whether it's a custom localization or non-translatable term
- Shows the import plan: "Found 147 terms across 6 locales. 12 are marked do-not-translate, 135 are enforced localizations."
- On approval, creates all glossary entries via the MCP
- Reports: "147 glossary entries created. 3 duplicates skipped."
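The plan-and-report steps above can be sketched in a few lines of Python. This is only an illustration, not the assistant's actual implementation: the column names are borrowed from the CSV example later in this guide, the sample rows are invented, and treating a repeated (source term, locale) pair as a duplicate is an assumption. The engine's MCP calls are not shown.

```python
import csv
import io
from collections import Counter

# Invented sample rows; a real export would be read from a file.
SAMPLE_CSV = """source_term,target_term,locale,type
Acme Cloud,Acme Cloud,de,do-not-translate
privacy policy,Datenschutzerklärung,de,localize
privacy policy,Datenschutzerklärung,de,localize
privacy policy,politique de confidentialité,fr,localize
"""


def build_import_plan(csv_text):
    """Deduplicate rows and summarize what an import would create."""
    seen = set()
    entries = []
    duplicates = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the same source term for the same locale is a duplicate.
        key = (row["source_term"].strip().lower(), row["locale"].strip().lower())
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        entries.append(row)
    by_type = Counter(row["type"].strip() for row in entries)
    locales = sorted({row["locale"].strip() for row in entries})
    return {
        "entries": entries,
        "duplicates": duplicates,
        "by_type": dict(by_type),
        "locales": locales,
    }


plan = build_import_plan(SAMPLE_CSV)
print(
    f"Found {len(plan['entries'])} terms across {len(plan['locales'])} locales. "
    f"{plan['by_type'].get('do-not-translate', 0)} are marked do-not-translate, "
    f"{plan['by_type'].get('localize', 0)} are enforced localizations. "
    f"{plan['duplicates']} duplicates skipped."
)
```

Running a summary like this before approval makes it easy to check the counts against what you expect from the vendor export.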
Supported formats
| Format | What it contains | How to provide it |
|---|---|---|
| CSV / TSV | Term bases, glossaries, simple bilingual lists | Paste the content or describe the file structure |
| TMX | Translation memory — source/target segment pairs with metadata | Paste a representative sample or describe the structure |
| TBX | Terminology databases — structured term entries with definitions | Paste the content or describe the schema |
| Excel exports | Vendor-specific glossary or style guide exports | Describe the columns and paste representative rows |
Step-by-step: TMX import
TMX files from legacy vendors contain segment pairs that can seed both glossary entries and instructions.
"Here's a TMX export from our previous vendor. It has 500 translation units for en → de. Extract any recurring terminology as glossary entries."
What happens:
- The assistant parses the TMX structure — identifies source segments, target segments, locale pairs
- Groups recurring terms — words or phrases that appear 3+ times with consistent localizations
- Proposes glossary entries for terms with stable localizations: "'privacy policy' → 'Datenschutzerklärung' (appears 12 times, always localized this way)"
- Identifies patterns that should become instructions: "Compound nouns are always hyphenated in this corpus. Add as instruction for `de`?"
- Shows the full plan for review
- Applies on approval
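To make the grouping step concrete, here is a minimal sketch using Python's standard library. It is a simplification under stated assumptions: it only considers short translation units that are themselves a term (finding terms inside longer sentences would need alignment, which is not attempted here), the helper names are hypothetical, and the sample TMX is a trimmed, invented fragment with an empty header.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_tmx(tmx_text, src="en", tgt="de"):
    """Yield (source, target) segment pairs for one locale pair."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Compare on the base language only, e.g. "de-DE" -> "de".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            yield segs[src], segs[tgt]


def propose_glossary(pairs, min_count=3, max_words=4):
    """Keep short source segments seen min_count+ times with one stable target."""
    targets = defaultdict(Counter)
    for source, target in pairs:
        if len(source.split()) <= max_words:
            targets[source.lower()][target] += 1
    proposals = []
    for source, counts in targets.items():
        total = sum(counts.values())
        if total >= min_count and len(counts) == 1:
            (target,) = counts  # the single consistent target
            proposals.append((source, target, total))
    return proposals


def make_tu(en, de):
    """Build one invented translation unit for the sample below."""
    return (
        f'<tu><tuv xml:lang="en"><seg>{en}</seg></tuv>'
        f'<tuv xml:lang="de"><seg>{de}</seg></tuv></tu>'
    )


units = (
    [("privacy policy", "Datenschutzerklärung")] * 3
    + [("Save", "Speichern"), ("Save", "Sichern"), ("Save", "Speichern")]
    + [("Settings", "Einstellungen")] * 2
)
SAMPLE_TMX = (
    '<tmx version="1.4"><header/><body>'
    + "".join(make_tu(en, de) for en, de in units)
    + "</body></tmx>"
)

proposals = propose_glossary(parse_tmx(SAMPLE_TMX))
for source, target, count in proposals:
    print(f"'{source}' -> '{target}' (appears {count} times, always localized this way)")
```

In this sample only "privacy policy" qualifies: "Save" appears three times but with two different targets, and "Settings" appears only twice.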
Step-by-step: CSV glossary import
Most legacy localization platforms export glossaries as CSV with columns for source, target, locale, and notes.
"Import this CSV into our engine. Columns: source_term, target_term, locale, type (localize/do-not-translate), notes."
What happens:
- The assistant reads the column mapping
- Creates glossary entries: `localize` rows become custom localizations, `do-not-translate` rows become non-translatable entries
- Entries with notes that describe rules (not just definitions) are flagged as potential instructions: "The note for 'date format' says 'Always use DD.MM.YYYY in German'. Add as instruction for `de`?"
- Shows the plan, applies on approval
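The row mapping described above might look roughly like the following sketch. The `GlossaryEntry` class, the `kind` labels, and the keyword heuristic for spotting rule-like notes are all hypothetical; the assistant's real logic and the engine's entry schema may differ. Rows with an unrecognized `type` are set aside for review rather than guessed at.

```python
import csv
import io
import re
from dataclasses import dataclass

# Hypothetical heuristic: notes containing these words read like rules,
# not definitions.
RULE_HINT = re.compile(r"\b(always|never|must|do not|don't)\b", re.IGNORECASE)

# Hypothetical labels for the two entry kinds named in this guide.
KIND_BY_TYPE = {
    "localize": "custom_localization",
    "do-not-translate": "non_translatable",
}


@dataclass
class GlossaryEntry:
    source: str
    target: str
    locale: str
    kind: str


def map_rows(csv_text):
    """Split CSV rows into glossary entries, instruction candidates, and rejects."""
    entries, instruction_candidates, rejected = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = KIND_BY_TYPE.get(row["type"].strip().lower())
        if kind is None:
            rejected.append(row)  # unknown type: leave for manual review
            continue
        locale = row["locale"].strip()
        entries.append(
            GlossaryEntry(
                row["source_term"].strip(), row["target_term"].strip(), locale, kind
            )
        )
        note = (row.get("notes") or "").strip()
        if RULE_HINT.search(note):
            instruction_candidates.append((locale, note))
    return entries, instruction_candidates, rejected


# Invented sample rows using the column layout from the prompt above.
SAMPLE_ROWS = """source_term,target_term,locale,type,notes
Acme Cloud,Acme Cloud,de,do-not-translate,Brand name
date format,Datumsformat,de,localize,Always use DD.MM.YYYY in German
dashboard,Dashboard,de,translate,
"""

entries, instruction_candidates, rejected = map_rows(SAMPLE_ROWS)
for locale, note in instruction_candidates:
    print(f"Add as instruction for {locale}? {note!r}")
```

Keeping instruction candidates in a separate list mirrors the review step: nothing becomes an instruction until you approve it.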
What to import vs. what to leave behind
| Import as glossary | Import as instruction | Skip |
|---|---|---|
| Brand names (non-translatable) | Formatting rules (date, number, currency) | Fuzzy TM matches below 95% |
| Product terminology (enforced localizations) | Punctuation conventions | Context-dependent segment pairs |
| Legal terms (enforced localizations) | Register/formality rules | One-off localizations that aren't terminology |
| UI labels with mandated localizations | Capitalization rules | Segments longer than 2-3 sentences |
After import
- Verify — run a localization test with content that uses the imported terms
- Review — spot-check a batch against the new glossary to confirm enforcement
- Tune — adjust entries that don't produce the right output in context
Tips for large imports
- Start with high-frequency terms. A 5,000-entry TM export isn't a glossary — it's a corpus. Ask the assistant to extract only terms that appear 3+ times.
- Import in batches by locale. Easier to review 50 German terms than 500 terms across 10 locales.
- Use the notes column. If your export has translator notes, the assistant can convert patterns into instructions.
- Don't import sentence-level TM as glossary. Glossary entries are terms and short phrases. Full sentences belong in reference material, not the glossary.
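If you prepare the batches yourself before handing them to the assistant, splitting by locale is straightforward. The sketch below assumes entries are plain dictionaries with a `locale` key and uses a batch size of 50 to match the tip above; both choices are illustrative.

```python
from collections import defaultdict
from itertools import islice


def batches_by_locale(entries, batch_size=50):
    """Yield (locale, batch) pairs so each review covers a single locale."""
    by_locale = defaultdict(list)
    for entry in entries:
        by_locale[entry["locale"]].append(entry)
    for locale in sorted(by_locale):
        remaining = iter(by_locale[locale])
        # Take batch_size entries at a time until the locale is exhausted.
        while chunk := list(islice(remaining, batch_size)):
            yield locale, chunk


# Invented data: 120 German and 30 French entries.
sample_entries = [{"source": f"term {i}", "locale": "de"} for i in range(120)]
sample_entries += [{"source": f"term {i}", "locale": "fr"} for i in range(30)]

batch_sizes = [(locale, len(batch)) for locale, batch in batches_by_locale(sample_entries)]
print(batch_sizes)
```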
