Reports give you visibility into how your localization engines are performing - translation volume, token usage, locale coverage, glossary depth, codebase change rates, and AI-reviewer quality metrics. All reports are scoped to your organization and update automatically as requests flow through the engine.
## Available reports
| Report | What it measures |
|---|---|
| Word Generations | Words translated per day |
| Token Consumption | Input and output tokens used per day |
| Top Locales | Which locales consume the most resources |
| Glossary Depth | How many glossary terms exist per locale |
| Change Rate | Localization file changes in GitHub by locale |
| Average Scores | Daily average translation scores from AI reviewers |
| Terminology Coverage | How consistently glossary terms are applied in translations |
| Instruction Adherence | How consistently custom instructions are followed in translations |
## Word Generations
Tracks the total word count processed by the localization engine, aggregated by day. Use this to understand translation volume trends and plan capacity.
Filters: engine, period (month), source locale, target locale
The chart displays one bar per day for the selected month. Days with no translation activity show zero.
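If you want to reproduce the chart's shape from raw data, the aggregation is a per-day sum with zero-fill. A minimal sketch in TypeScript, assuming a hypothetical request record (the field names are illustrative, not the product's actual schema):

```typescript
// Hypothetical shape of a completed translation request.
interface TranslationRequest {
  completedAt: Date;
  wordCount: number;
}

// Sum word counts per calendar day for one month, zero-filling days
// with no activity so every day of the month gets a bar.
// (Grouped by UTC here for brevity; the real report respects the
// org timezone - see the timezone note under Change Rate.)
function wordGenerationsByDay(
  requests: TranslationRequest[],
  year: number,
  month: number, // 1-12
): Map<string, number> {
  const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
  const totals = new Map<string, number>();
  for (let day = 1; day <= daysInMonth; day++) {
    const mm = String(month).padStart(2, "0");
    const dd = String(day).padStart(2, "0");
    totals.set(`${year}-${mm}-${dd}`, 0);
  }
  for (const req of requests) {
    const key = req.completedAt.toISOString().slice(0, 10); // YYYY-MM-DD
    if (totals.has(key)) {
      totals.set(key, (totals.get(key) ?? 0) + req.wordCount);
    }
  }
  return totals;
}
```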
## Token Consumption
Monitors LLM token usage broken down into input tokens and output tokens, aggregated by day. Token consumption directly reflects cost - use this report to identify cost spikes and compare efficiency across engines or locale pairs.
Filters: engine, period (month), source locale, target locale
**Input vs. output tokens**
Input tokens include the system prompt, glossary, brand voice, instructions, and the source text. Output tokens are the translated result. A high input-to-output ratio may indicate that the engine's context (glossary, instructions) is large relative to the translated content.
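As a rough illustration, the ratio is just daily input tokens over output tokens. A sketch with hypothetical field names:

```typescript
// Hypothetical daily token figures for one engine/locale pair.
interface TokenDay {
  date: string; // YYYY-MM-DD
  inputTokens: number;
  outputTokens: number;
}

// Ratios well above 1 suggest the prompt context (glossary, brand
// voice, instructions) is large relative to the translated output.
function inputOutputRatio(day: TokenDay): number | null {
  return day.outputTokens === 0 ? null : day.inputTokens / day.outputTokens;
}
```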
## Top Locales
Ranks locales by resource consumption - helping you identify which languages drive the most translation volume and cost. You can view rankings by source locale or target locale, and measure by input tokens, output tokens, or word count.
Filters: engine, period (month), locale type (source or target), metric (input tokens, output tokens, or word count)
This report answers questions like: "Which target locale uses the most tokens?" or "Which source language generates the most words?"
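Conceptually the ranking is a sort over per-locale totals by the selected metric. A sketch, with a hypothetical usage record:

```typescript
type Metric = "inputTokens" | "outputTokens" | "wordCount";

// Hypothetical per-locale usage totals for the selected month.
interface LocaleUsage {
  locale: string; // e.g. "ja-JP"
  inputTokens: number;
  outputTokens: number;
  wordCount: number;
}

// Sort descending by the chosen metric and keep the top entries.
function topLocales(rows: LocaleUsage[], metric: Metric, limit = 10): LocaleUsage[] {
  return [...rows].sort((a, b) => b[metric] - a[metric]).slice(0, limit);
}
```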
## Glossary Depth
Shows how many glossary items exist per locale across your engines. Unlike other reports, this is a current snapshot - not time-series data - reflecting the present state of your glossary configuration.
Filters: engine, locale type (source or target)
Use this to identify gaps: if your engine translates into 12 locales but only 3 have glossary entries, the uncovered locales rely entirely on the model's judgment for terminology.
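Spotting those gaps programmatically is a one-line filter. A sketch, assuming you have the engine's target locale list and a per-locale term count (both inputs hypothetical):

```typescript
// List the target locales that have no glossary entries at all.
function uncoveredLocales(
  targetLocales: string[],
  termCounts: Map<string, number>,
): string[] {
  return targetLocales.filter((l) => (termCounts.get(l) ?? 0) === 0);
}

// e.g. 12 target locales, 3 with entries -> returns the other 9
```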
## Change Rate
Tracks the rate of localization file changes in your connected GitHub repositories, broken down by locale and day. This report requires an active GitHub integration - you'll be prompted to connect GitHub if it isn't set up.
Filters: period (month), repository, locale
The change rate report helps answer: "How actively is each locale being updated?" and "Which repositories generate the most localization changes?"
**Timezone support**
Date grouping respects your organization's configured timezone. A commit at 23:30 UTC appears on the correct local date, not shifted to the next day.
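If you are reproducing this grouping yourself, `Intl.DateTimeFormat` with an explicit `timeZone` gives the same behavior. A minimal sketch (the timezone value is an example, not a default):

```typescript
// Resolve a UTC instant to a calendar date in the org's timezone.
// "America/New_York" below is an example, not a default.
function localDateKey(utcInstant: Date, timeZone: string): string {
  // en-CA formats dates as YYYY-MM-DD, handy for grouping keys.
  return new Intl.DateTimeFormat("en-CA", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(utcInstant);
}

// A commit pushed at 23:30 local time on Jan 5 arrives as 04:30 UTC
// on Jan 6, but still groups under Jan 5 for a New York org:
localDateKey(new Date("2025-01-06T04:30:00Z"), "America/New_York");
// -> "2025-01-05"
```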
## Average Scores
Plots daily average translation scores from your AI reviewers, as a percentage. Use this to track quality trends over time and spot regressions after engine, model, or glossary changes.
Filters: engine, period (month), view (aggregated or breakdown)
When viewing a single engine, each line represents one scorer attached to that engine. When viewing across all engines, choose Aggregated for a single line averaging every scorer across every engine, or Breakdown to compare scorers side by side.
**Requires AI reviewers**
This report shows data only after at least one AI reviewer is configured and scoring translations.
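The averaging itself is straightforward. A sketch of per-scorer daily averages, assuming a hypothetical review record:

```typescript
// Hypothetical review record produced by an AI reviewer.
interface Review {
  scorer: string;
  date: string;  // YYYY-MM-DD, in the org's timezone
  score: number; // 0-100
}

// Average scores per scorer per day. The Breakdown view plots one
// line per scorer; Aggregated would average across all scorers.
function dailyAverages(reviews: Review[]): Map<string, Map<string, number>> {
  const acc = new Map<string, Map<string, { total: number; n: number }>>();
  for (const r of reviews) {
    let byDay = acc.get(r.scorer);
    if (!byDay) {
      byDay = new Map();
      acc.set(r.scorer, byDay);
    }
    const cell = byDay.get(r.date) ?? { total: 0, n: 0 };
    cell.total += r.score;
    cell.n += 1;
    byDay.set(r.date, cell);
  }
  const out = new Map<string, Map<string, number>>();
  for (const [scorer, byDay] of acc) {
    out.set(
      scorer,
      new Map([...byDay].map(([d, c]) => [d, c.total / c.n] as [string, number])),
    );
  }
  return out;
}
```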
## Terminology Coverage
Tracks how consistently glossary terms are applied correctly in translations each day. The line shows daily coverage percentage (correctly applied terms ÷ total relevant terms); the bars show the absolute number of terms applied. Hovering reveals the applied/total breakdown and the number of reviews behind each data point.
Filters: engine, period (month)
A high terms-applied count with a falling coverage rate signals that glossary terms are being missed or mistranslated more often as volume scales - a useful early warning that the glossary or engine instructions need attention.
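The line's value follows directly from that formula. A minimal sketch, with hypothetical field names mirroring the applied/total breakdown shown in tooltips:

```typescript
// Hypothetical daily terminology figures: terms applied correctly vs.
// all glossary terms relevant to that day's translations.
interface TermDay {
  applied: number;
  relevant: number;
}

// The report's line is this rate; `applied` alone drives the bars.
function coverageRate(day: TermDay): number | null {
  return day.relevant === 0 ? null : (day.applied / day.relevant) * 100;
}
```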
## Instruction Adherence
Tracks how consistently the engine's custom instructions are followed in translations each day. The percentage is calculated only across reviews where instructions were actually relevant - tooltips show followed / relevant so you can see both the rate and the sample size.
Filters: engine, period (month)
Use this to verify that newly added instructions actually change behavior, and to catch regressions where the engine starts ignoring rules after a model swap or prompt change.
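The key detail is that non-relevant reviews are excluded before the rate is taken. A sketch, with a hypothetical review shape:

```typescript
// Hypothetical per-review flags behind the adherence metric.
interface InstructionReview {
  relevant: boolean; // did any custom instruction apply to this text?
  followed: boolean; // was it actually followed?
}

// Rate over relevant reviews only, so instruction-free content
// doesn't dilute the percentage. Returns null when nothing applied.
function adherenceRate(reviews: InstructionReview[]): number | null {
  const relevant = reviews.filter((r) => r.relevant);
  if (relevant.length === 0) return null;
  const followed = relevant.filter((r) => r.followed).length;
  return (followed / relevant.length) * 100;
}
```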
## Filtering and periods
All time-based reports operate on monthly periods. The default is the current month. Filters are preserved in the URL, so filtered views are shareable and bookmarkable.
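For example, a filtered view can be reconstructed purely from query parameters. The parameter names below are illustrative, not the product's actual keys:

```typescript
// Filters live in the query string, so any filtered view is a URL.
// Parameter names here are illustrative, not the product's actual keys.
const params = new URLSearchParams({
  engine: "docs-site",
  period: "2025-06", // YYYY-MM
  target: "ja-JP",
});
const shareableUrl = `/reports/token-consumption?${params}`;
// -> "/reports/token-consumption?engine=docs-site&period=2025-06&target=ja-JP"
```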
Common filters across reports:
| Filter | Available in | Description |
|---|---|---|
| Engine | Word Generations, Token Consumption, Top Locales, Glossary Depth, Average Scores, Terminology Coverage, Instruction Adherence | Narrow to a specific engine or view all |
| Period | Word Generations, Token Consumption, Top Locales, Change Rate, Average Scores, Terminology Coverage, Instruction Adherence | Select month (YYYY-MM) |
| Source locale | Word Generations, Token Consumption | Filter by source language |
| Target locale | Word Generations, Token Consumption | Filter by target language |
| Repository | Change Rate | Filter by GitHub repo |
| View | Average Scores | Aggregated single line, or per-scorer breakdown |
## Quality vs. volume
Reports fall into two complementary groups: volume and cost (Word Generations, Token Consumption, Top Locales, Glossary Depth, Change Rate) and quality (Average Scores, Terminology Coverage, Instruction Adherence). Quality reports require at least one configured AI reviewer.
