Three weeks ago I shipped a bilingual LMS. The architecture modeled one language as source and the rest as overlay: a source_locale column on every translatable entity, an MT pipeline reading source rows and writing overlay rows. It worked. Students used it. Certificates issued. Three published courses, fourteen real users, end-to-end.
Last week it broke in a way that did not fit a bug fix.
A teacher whose interface was set to English wrote a course entirely in Russian. The system saw teacher.preferred_locale = 'en', stamped source_locale = 'en' on every course field, and then served Russian students a course labelled "Russian translation of an English source" — when no English source existed, ever. The fallback path showed [translation missing] placeholders on a course that was, in fact, perfectly written in the language being requested.
The first fix attempt was a heuristic: detect the actual character set of the content and derive source_locale from that, not from the teacher's UI. Better. Still wrong, because per-entity source locale assumes the entity is monolingual — and there is no rule that says it has to be. A course with an English title and a Russian description is one entity with no single answer to "which locale is source?"







