There is no source language: a manifesto for symmetric multilingual content

Three weeks ago I shipped a bilingual LMS. The architecture modeled one language as source and the rest as overlay: a source_locale column on every translatable entity, an MT pipeline reading source rows and writing overlay rows. It worked. Students used it. Certificates issued. Three published courses, fourteen real users, end-to-end.

Last week it broke in a way that did not fit a bug fix.

A teacher whose interface was set to English wrote a course entirely in Russian. The system saw teacher.preferred_locale = 'en', stamped source_locale = 'en' on every course field, and then served Russian students a course labelled "Russian translation of an English source" — when no English source existed, ever. The fallback path showed [translation missing] placeholders on a course that was, in fact, perfectly written in the language being requested.
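To make the failure concrete, here is a minimal sketch of the stamping and fallback logic as described above. It is not the actual LMS code: the `Teacher` and `Course` classes, the `overlays` dict, and the `create_course` and `render_title` functions are hypothetical names, and the course title is made up. Only `preferred_locale`, `source_locale`, and the `[translation missing]` placeholder come from the story.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Teacher:
    # The teacher's UI language, which says nothing about the language they write in.
    preferred_locale: str


@dataclass
class Course:
    title: str
    source_locale: str
    # Hypothetical overlay store: locale -> machine-translated title.
    overlays: Dict[str, str] = field(default_factory=dict)


def create_course(teacher: Teacher, title: str) -> Course:
    # The flawed step: source_locale is copied from the UI setting,
    # not derived from the content itself.
    return Course(title=title, source_locale=teacher.preferred_locale)


def render_title(course: Course, requested_locale: str) -> str:
    # Serve the source row when the locales match, otherwise look for an overlay.
    if requested_locale == course.source_locale:
        return course.title
    return course.overlays.get(requested_locale, "[translation missing]")


teacher = Teacher(preferred_locale="en")
course = create_course(teacher, "Основы алгебры")  # written in Russian

# A Russian student asks for "ru". The Russian text exists, but it is filed
# under "en", and no "ru" overlay was ever produced.
print(render_title(course, "ru"))
```

Under these assumptions the Russian text is only reachable by asking for English, which is the inversion the teacher's students ran into.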

The first fix attempt was a heuristic: detect the script the content is actually written in and derive source_locale from that, not from the teacher's UI setting. Better. Still wrong, because a per-entity source locale assumes the entity is monolingual, and there is no rule that says it has to be. A course with an English title and a Russian description is one entity with no single answer to "which locale is source?"
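A rough sketch of that content-based heuristic shows where it runs out. It assumes only two candidate locales and maps Cyrillic letters to `ru` and Latin letters to `en`, which is itself a simplification, since a script is not a language. The `guess_locale` function and the sample course are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Optional


def guess_locale(text: str) -> Optional[str]:
    """Guess a locale from the dominant script of the letters in `text`."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        # Unicode character names start with the script, e.g. "CYRILLIC SMALL LETTER A".
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["ru"] += 1  # assumption: Cyrillic means Russian
        elif name.startswith("LATIN"):
            counts["en"] += 1  # assumption: Latin means English
    return counts.most_common(1)[0][0] if counts else None


# One entity, two fields, two languages.
course = {
    "title": "Introduction to Algebra",
    "description": "Этот курс охватывает основы алгебры.",
}

# Per field, the heuristic has a clear answer each time.
per_field = {name: guess_locale(value) for name, value in course.items()}

# Per entity, it has to pick one, and whichever script has more letters wins.
entity_guess = guess_locale(" ".join(course.values()))

print(per_field)
print(entity_guess)
```

Whatever single value the entity-level guess returns, one of the two fields ends up mislabelled as a translation of a source that does not exist.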
