For any app past a certain size that's gone bilingual, the question "how much hardcoded Japanese is still hiding in our repo?" never quite goes away. A naive grep for [ぁ-んァ-ヶ一-龯] returns thousands of hits, and the vast majority are inside translation tables, already-branched code, or comments. The real leaks are buried.

For one cleanup pass we attacked this with four parallel AI investigation agents plus AST-based false-positive filtering. The result: ~300 candidates detected → ~60 real leaks → cleaned up across five rounds. This post walks through the flow and the most interesting bug it uncovered — paying English users had been getting Japanese email from the Stripe webhook for months.

Why a plain grep isn't enough

A repository-wide grep returns thousands of hits, but the contents fall into four bins: translation tables / already branched by lang == 'en' / comments and docstrings / real leaks. The first three are harmless. Only the last shows Japanese to English users. The trouble is that grep can't separate them, and the volume is too high for a human to triage one by one.

Four parallel agents for "wide and shallow" detection