Compare Engines

Test the same content through two engine configurations to evaluate a change before committing.

The workflow#

"Compare our production engine against the staging engine on these 5 strings for Japanese"

What happens:

The assistant localizes the content through both engines
Presents results in a side-by-side table
Highlights differences: "The staging engine applies the new glossary term for 'onboarding' (オンボーディング) while production still uses the descriptive localization (導入手続き)"

"Localize 'Welcome to your new workspace' to German through engine A and engine B"

Shows whether the glossary entry for "workspace" is being preserved in the updated engine.

"I switched the Japanese model from GPT-4.1 to Claude Sonnet. Compare outputs for these 10 UI strings."

Side-by-side reveals which model handles short UI strings vs. longer descriptions better for your specific domain.

"Compare the engine with our full 200-term glossary against a fresh engine with no glossary on these legal strings"

Quantifies how much the glossary contributes to output quality for a specific content type.

Was this page helpful?