Import from Legacy Vendors

When migrating from a legacy localization vendor or TMS, you likely have glossaries, term bases, or translation memory exports sitting in TMX, CSV, or TBX files. Your AI assistant can parse these and seed your localization engine's configuration directly.

The workflow

"Here's our glossary export (CSV). Import it into our engine's glossary for all locales."

What happens:

The assistant reads the CSV structure — identifies source term, target localization, locale, and term type columns
Maps each row to a glossary entry: source text, target text, locale pair, and whether it's a custom localization or non-translatable term
Shows the import plan: "Found 147 terms across 6 locales. 12 are marked do-not-translate, 135 are enforced localizations."
On approval, creates all glossary entries via the MCP server
Reports: "144 glossary entries created. 3 duplicates skipped."
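
The parsing and planning steps above can be sketched in Python. This is a minimal illustration, not the engine's actual import logic: the column names (`source_term`, `target_term`, `locale`, `type`) and the `build_import_plan` helper are assumptions, and the step that creates entries is left out because its interface is not described here.

```python
import csv
import io
from collections import Counter


def build_import_plan(csv_text):
    """Summarize a glossary CSV so the plan can be reviewed before import.

    Column names are assumed for illustration; adjust them to the export.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    seen = set()
    entries = []      # unique entries that would be created
    duplicates = 0    # rows repeating an earlier (source, locale) pair

    for row in rows:
        key = (row["source_term"].strip().lower(), row["locale"].strip())
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        entries.append({
            "source": row["source_term"].strip(),
            "target": row["target_term"].strip(),
            "locale": row["locale"].strip(),
            "type": row["type"].strip(),
        })

    by_type = Counter(entry["type"] for entry in entries)
    return {
        "entries": entries,
        "locales": sorted({entry["locale"] for entry in entries}),
        "do_not_translate": by_type["do-not-translate"],
        "localize": by_type["localize"],
        "duplicates": duplicates,
    }
```

Nothing is written at this stage; the returned dictionary only carries the numbers a reviewer needs to approve or reject the import.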

Supported formats

Format	What it contains	How to provide it
CSV / TSV	Term bases, glossaries, simple bilingual lists	Paste the content or describe the file structure
TMX	Translation memory — source/target segment pairs with metadata	Paste a representative sample or describe the structure
TBX	Terminology databases — structured term entries with definitions	Paste the content or describe the schema
Excel exports	Vendor-specific glossary or style guide exports	Describe the columns and paste representative rows
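
To illustrate what "structured term entries" looks like in practice, the sketch below reads a TBX file with Python's standard library. It assumes the older `martif`-style layout (`termEntry` / `langSet` / `term`); other TBX dialects use different element names, so treat the tag names and the `read_tbx_terms` helper as assumptions to check against your own export.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tbx_terms(tbx_text):
    """Return one {lang: term} dict per term entry (first term per language)."""
    root = ET.fromstring(tbx_text)
    concepts = []
    for entry in root.iter("termEntry"):
        terms = {}
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG, "")
            # The term may be nested inside a tig/ntig wrapper, so search deeply.
            term = lang_set.find(".//term")
            if lang and term is not None and term.text:
                terms[lang] = term.text.strip()
        if terms:
            concepts.append(terms)
    return concepts
```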

Step-by-step: TMX import

TMX files from legacy vendors contain segment pairs that can seed both glossary entries and instructions.

"Here's a TMX export from our previous vendor. It has 500 translation units for en → de. Extract any recurring terminology as glossary entries."

What happens:

The assistant parses the TMX structure — identifies source segments, target segments, locale pairs
Groups recurring terms — words or phrases that appear 3+ times with consistent localizations
Proposes glossary entries for terms with stable localizations: "'privacy policy' → 'Datenschutzerklärung' (appears 12 times, always localized this way)"
Identifies patterns that should become instructions: "Compound nouns are always hyphenated in this corpus — add as instruction for de?"
Shows the full plan for review
Applies on approval
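
A rough sketch of the parse-and-group steps, using only the Python standard library. It is deliberately simplified: it only considers whole segments that are short enough to be a term, whereas finding terms inside longer sentences needs real alignment. The helper names and the `max_words` cutoff are assumptions; the `min_count=3` default mirrors the "3+ times" rule above.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(tmx_text, src="en", tgt="de"):
    """Extract (source, target) segment pairs for one locale pair."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Older exports may use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Compare on the base language, so "en-US" matches "en".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def propose_terms(pairs, min_count=3, max_words=4):
    """Propose short segments that recur with exactly one target rendering."""
    targets = defaultdict(Counter)
    for source, target in pairs:
        if len(source.split()) <= max_words:
            targets[source.lower()][target] += 1

    proposals = []
    for source, counts in targets.items():
        total = sum(counts.values())
        if total >= min_count and len(counts) == 1:
            proposals.append((source, next(iter(counts)), total))
    return proposals
```

Terms with more than one rendering are left out of the proposals on purpose; those are the ones worth a human decision before they become enforced entries.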

Step-by-step: CSV glossary import

Most legacy localization platforms export glossaries as CSV with columns for source, target, locale, and notes.

"Import this CSV into our engine. Columns: source_term, target_term, locale, type (localize/do-not-translate), notes."

What happens:

The assistant reads the column mapping
Creates glossary entries: localize rows become custom localizations, do-not-translate rows become non-translatable entries
Flags entries whose notes describe rules (not just definitions) as potential instructions: "The note for 'date format' says 'Always use DD.MM.YYYY in German' — add as instruction for de?"
Shows the plan, applies on approval
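
The row classification and note flagging above could look roughly like this. The keyword heuristic for spotting rule-like notes, the internal type names (`custom_localization`, `non_translatable`), and the `classify_rows` helper are all assumptions for illustration; an assistant would judge notes by meaning rather than by keyword.

```python
import csv
import io
import re

# Hypothetical heuristic: notes phrased as directives look like rules.
RULE_HINT = re.compile(r"\b(always|never|must|do not|don't)\b", re.IGNORECASE)

# Maps the export's "type" column to assumed glossary entry kinds.
TYPE_MAP = {
    "localize": "custom_localization",
    "do-not-translate": "non_translatable",
}


def classify_rows(csv_text):
    """Split rows into glossary entries, instruction candidates, and unknowns."""
    glossary, instruction_candidates, unknown = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = TYPE_MAP.get(row["type"].strip().lower())
        if kind is None:
            unknown.append(row)  # surface for review instead of guessing
            continue
        glossary.append({
            "source": row["source_term"].strip(),
            "target": row["target_term"].strip(),
            "locale": row["locale"].strip(),
            "kind": kind,
        })
        note = (row.get("notes") or "").strip()
        if RULE_HINT.search(note):
            instruction_candidates.append({
                "locale": row["locale"].strip(),
                "term": row["source_term"].strip(),
                "note": note,
            })
    return glossary, instruction_candidates, unknown
```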

What to import vs. what to leave behind

Import as glossary	Import as instruction	Skip
Brand names (non-translatable)	Formatting rules (date, number, currency)	Fuzzy TM matches below 95%
Product terminology (enforced localizations)	Punctuation conventions	Context-dependent segment pairs
Legal terms (enforced localizations)	Register/formality rules	One-off localizations that aren't terminology
UI labels with mandated localizations	Capitalization rules	Segments longer than 2-3 sentences
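
Two of the "Skip" criteria in the table are mechanical enough to sketch as a filter: the fuzzy-match threshold and segment length. The other two (context-dependent pairs and one-off localizations) need human or assistant judgment. The `match_percent` and `source` field names are hypothetical, and `max_sentences=3` is one reading of "2-3 sentences".

```python
import re


def skip_reason(unit, min_match=95, max_sentences=3):
    """Return why a TM unit should be skipped, or None to keep it."""
    match = unit.get("match_percent")
    if match is not None and match < min_match:
        return "fuzzy match below threshold"

    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", unit["source"].strip()) if s]
    if len(sentences) > max_sentences:
        return "segment too long for a glossary entry"
    return None
```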

After import

Verify — run a localization test with content that uses the imported terms
Review — spot-check a batch against the new glossary to confirm enforcement
Tune — adjust entries that don't produce the right output in context
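
For the verify and review steps, a simple check like the following can flag output where an imported term appears in the source but its required rendering is missing from the result. It is a sketch under assumptions: entries are plain dictionaries with `source` and `target` keys, and matching is by substring, so inflected forms will be flagged even when they are correct. Treat each hit as a prompt for review, not a confirmed error.

```python
def find_violations(source_text, localized_text, entries):
    """List source terms whose required target is missing from the output."""
    src = source_text.casefold()
    out = localized_text.casefold()
    violations = []
    for entry in entries:
        if entry["source"].casefold() not in src:
            continue  # term not used in this content
        if entry["target"].casefold() not in out:
            violations.append(entry["source"])
    return violations
```

For non-translatable terms, the target is the same string as the source, so the same check confirms the term was left untouched.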

Tips for large imports

Start with high-frequency terms. A 5,000-entry TM export isn't a glossary — it's a corpus. Ask the assistant to extract only terms that appear 3+ times.
Import in batches by locale. It's easier to review 50 German terms than 500 terms across 10 locales.
Use the notes column. If your export has translator notes, the assistant can convert patterns into instructions.
Don't import sentence-level TM as glossary. Glossary entries are terms and short phrases. Full sentences belong in reference material, not the glossary.
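
The batching tip can be sketched as a small generator that groups entries by locale and yields review-sized chunks. The entry shape (a dictionary with a `locale` key) and the default batch size of 50 are assumptions taken from the example figure above.

```python
from collections import defaultdict


def batches_by_locale(entries, batch_size=50):
    """Yield (locale, batch) pairs, one locale at a time."""
    by_locale = defaultdict(list)
    for entry in entries:
        by_locale[entry["locale"]].append(entry)

    for locale in sorted(by_locale):
        group = by_locale[locale]
        for start in range(0, len(group), batch_size):
            yield locale, group[start:start + batch_size]
```

Each yielded batch can then be handed to the assistant as its own import request, so every plan stays small enough to review properly.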
