TL;DR
HiDream-O1-Image is one of the strongest open-weight text-to-image models out right now (it debuted around #8 in the Artificial Analysis T2I Arena). But it shipped inference-only, and because its architecture is radically different from SDXL/Flux — no VAE, no separate text encoder, everything is one unified transformer — the usual LoRA trainers can't touch it.
This post is one of the first publicly documented LoRA training runs and general-purpose visual-enhancement LoRAs for HiDream-O1-Image. I'll show why the standard trainers (kohya, ai-toolkit, SimpleTuner) don't fit, how I reverse-engineered a working training loop from the inference code alone, and the ~150-line trainer that produces a clean aesthetic LoRA. Plus the gotchas that cost me a night.
What this LoRA is: a general-purpose anime / semi-real visual enhancement LoRA — it improves rendering quality, lighting, and stylization across diverse subjects with a trigger phrase. It's not a character LoRA, not a single-style LoRA, and not a model-distillation artifact.
The short version of the recipe:










