Research on how input/output language combinations affect AI (Claude) reasoning quality, detail coverage, and thinking patterns.
Date: 2026-03-03 · Model: Claude Opus 4.6 · Test scale: 36+ subagents across 5 rounds
- Input language has negligible impact on reasoning quality
- Output language affects detail coverage, not correctness: English output scores ~8 points higher on detail coverage on average
- All 36+ test agents reached identical core conclusions regardless of language
- A "Think in English" instruction in CLAUDE.md consistently improves Cantonese output by 3-5 points (validated across 5 diverse topics)
- The improvement is strongest for reasoning/judgment tasks (+4-5 points) and smallest for terminology-heavy tasks (+1-2 points)
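
As an illustration only, a CLAUDE.md language section expressing the "Think in English" instruction might look like the fragment below. The heading and wording are hypothetical and are not the exact text that was tested; the optimized settings used in this research are in `claude-md-config.md`.

```markdown
## Language

<!-- Hypothetical wording for illustration; not the tested configuration -->
- Think and reason in English internally, whatever the input language.
- Write the final response in the user's language (e.g. Cantonese).
```

The split mirrors the findings above: reasoning happens in English, where detail coverage was highest, while the reply stays in the language the user wrote in.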

- `report.md`: Full research report with tables and findings
- `claude-md-config.md`: Optimized CLAUDE.md language settings
- `test-results/`: Raw test outputs from all rounds