When a localization comes out wrong, the MCP server gives your AI assistant access to the full observability stack - request logs, scorer verdicts, glossary matching reports, and instruction review results. Debug quality without leaving the conversation.
Request logs#
Every localization request produces a log entry with the full execution context: which model handled it, input and output tokens, duration, whether a fallback was triggered, and the complete input/output data.
"Show me the last request log for the German engine"
The assistant retrieves the log and can answer follow-up questions: "Did it use the fallback model?" "How many tokens did it consume?" "What was the raw output?"
What each log contains#
| Field | What it tells you |
|---|---|
| Provider / model | Which LLM handled the request |
| Input / output data | Exact input sent and localization received |
| Input / output tokens | Token consumption |
| Duration | Processing time in milliseconds |
| Used fallback | Whether the primary model failed and fallback kicked in |
| Status | success, error, or in_progress |
| Error text | Error detail when status is error |
| Trigger type | Whether the request came from API, CLI, CI, playground, or integration |
AI Reviewer verdicts#
Each request log links to scorer run logs - the independent AI Reviewer evaluations that ran after the localization was produced.
"Did the last German localization pass all scorers?"
The assistant retrieves scorer run logs for a given request and reports each scorer's verdict: pass/fail (boolean scorers) or percentage score, along with the reasoning the reviewer produced.
Scorer run log fields#
| Field | What it tells you |
|---|---|
| Scorer name | Which AI Reviewer ran |
| Scorer type | boolean (pass/fail) or percentage (0-100) |
| Score result | The verdict and reasoning |
| Provider / model | Which model performed the review |
| Duration | How long the review took |
Glossary compliance#
"Were all glossary terms applied correctly in that localization?"
The assistant retrieves the glossary review log for a request, showing each matched glossary term, whether it was applied, and the reasoning if it wasn't.
The report includes:
- Each source term matched
- The expected target localization
- Whether the term is a custom localization or non-translatable
- Applied or not applied per term
- Reasoning when a term wasn't applied
- Overall compliance rate
Instruction adherence#
"Did the French localization follow the non-breaking space instruction?"
The assistant retrieves instruction review logs - one entry per instruction that was evaluated against the localization output. Each shows the instruction name, the rule text, and a pass/fail verdict with reasoning.
The debugging workflow#
A typical post-mortem conversation:
- "The German localization of 'checkout flow' looks wrong"
- "Show me the request log for that" - see what went in and came out
- "Did the glossary apply?" - check if 'checkout' was matched and preserved
- "What did the scorers say?" - see if any AI Reviewer flagged it
- "The glossary term wasn't matched - update it to also cover 'checkout flow'" - fix the root cause
The entire loop happens in one conversation, without opening the dashboard.
