Scaling NVFP4 Inference for FLUX.2 on NVIDIA Blackwell Data Center GPUs | NVIDIA Technical Blog

In 2025, NVIDIA partnered with Black Forest Labs (BFL) to optimize the FLUX.1 text-to-image model series, unlocking FP4 image generation performance on NVIDIA Blackwell GeForce RTX 50 Series GPUs.

As a natural extension of the latent diffusion model, FLUX.1 Kontext [dev] proved that in-context learning is a feasible technique for visual-generation models, not just large language models (LLMs). To make this experience more widely accessible, NVIDIA collaborated with BFL to enable a near real-time editing experience using low-precision quantization.

FLUX.2 is a significant leap forward, bringing multi-image references and quality comparable to the best enterprise models to the public. However, because FLUX.2 [dev] requires substantial compute resources, BFL, Comfy, and NVIDIA collaborated to achieve a major breakthrough: reducing the FLUX.2 [dev] memory requirement by more than 40% and enabling local deployment through ComfyUI. This optimization, which uses FP8 precision, has made FLUX.2 [dev] one of the most popular models in the image-generation space.

With FLUX.2 [dev] established as the gold standard for open-weight models, NVIDIA, in collaboration with BFL, is now sharing the next leap in performance: 4-bit acceleration for FLUX.2 [dev] on the most powerful NVIDIA Blackwell data center GPUs, including NVIDIA DGX B200 and NVIDIA DGX B300.

Related reading

FLUX.1 Kontext enables in-context image generation for enterprise AI pipelines

FLUX.2: Multi-reference image generation now available on Together AI

NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a…

FLUX.1 Tools – Control and steerability for FLUX – Replicate blog

FLUX1.1 [pro] is here – Replicate blog

Run FLUX with an API – Replicate blog
