Storia in 1 fonti

Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators

Microsoft Research presents Lens, a text-to-image model with just 3.8 billion parameters that matches much larger rivals on benchmarks, at a fraction of the training cost. The secret sauce: 800 million detailed image captions generated by GPT-4.1 instead of vague web alt-text. Code and weights are openly available under an open-source license.

Raccontata da

the-decoder.com

Timeline cronologica

lunedì 8 giugno 2026·the-decoder.com
Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators
Microsoft Research presents Lens, a text-to-image model with just 3.8 billion parameters that matches much larger rivals on benchmarks, at a fraction of the training cost. The…