One of the First Public HiDream-O1-Image LoRAs — and How to Train Your Own

TL;DR

HiDream-O1-Image is one of the strongest open-weight text-to-image models out right now (it debuted around #8 in the Artificial Analysis T2I Arena). But it shipped inference-only, and because its architecture is radically different from SDXL/Flux — no VAE, no separate text encoder, everything is one unified transformer — the usual LoRA trainers can't touch it.

This post documents one of the first public LoRA training runs for HiDream-O1-Image, and one of the first general-purpose visual-enhancement LoRAs for it. I'll show why the standard trainers (kohya, ai-toolkit, SimpleTuner) don't fit, how I reverse-engineered a working training loop from the inference code alone, and the ~150-line trainer that produces a clean aesthetic LoRA. I'll also cover the gotchas that cost me a night.

What this LoRA is: a general-purpose anime / semi-real visual enhancement LoRA — it improves rendering quality, lighting, and stylization across diverse subjects with a trigger phrase. It's not a character LoRA, not a single-style LoRA, and not a model-distillation artifact.

The short version of the recipe:
