Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI | NVIDIA Technical Blog

AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and language in real time—turning fragmented information into actionable insights.

Step 3.7 Flash, the latest model from StepFun, brings these capabilities to enterprise production workloads on NVIDIA-accelerated infrastructure. It is a 198B-parameter Mixture-of-Experts (MoE) vision-language model that activates approximately 11B parameters per forward pass, and it is optimized for agentic workflows that combine perception, search, and multi-step reasoning.

With native image and video input, three configurable reasoning levels (low, medium, and high), and a 256K context window, it is designed for enterprise use cases such as financial analysis, concurrent coding agents, and other high-throughput multimodal workloads. Developers can use StepFun’s NVFP4-quantized checkpoint, available through Hugging Face, for faster inference: the 4-bit weights reduce memory bandwidth and storage requirements.
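To make the storage claim concrete, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It assumes every one of the 198B parameters is stored at the listed bit width, which is a simplification: real quantized checkpoints often keep some tensors at higher precision, and the KV cache and activations are not counted. The NVFP4 figure assumes 4-bit values plus one 8-bit scale per 16-value block; the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 198e9  # total parameter count quoted in the post

# Assumed effective bits per weight for each format.
# NVFP4: 4-bit values plus one 8-bit scale shared by each 16-value block.
BITS_PER_PARAM = {
    "bf16": 16.0,
    "fp8": 8.0,
    "nvfp4": 4.0 + 8.0 / 16.0,
}

estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, bits)
    for name, bits in BITS_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB of weights")
```

Under these assumptions the 4-bit format needs a bit more than a quarter of the BF16 footprint, which is where the bandwidth and storage savings come from. Treat the numbers as an estimate rather than a measured checkpoint size.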

| Model | Step 3.7 Flash |
| --- | --- |
| Total parameters | 198B |
| Visual encoder parameters | 1.8B |
| Active parameters | 11B |
| Context length | 256K |
| Experts | 288 (8 active) |

Table 1. Overview of the key Step 3.7 Flash specs, such as parameter counts, context length, and MoE configuration
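The sparsity implied by these specs can be worked out directly. The short sketch below uses only the figures from Table 1. Reading "8 active" as eight experts selected per token in each MoE layer is an assumption, and the variable names are illustrative.

```python
# Figures taken from Table 1.
TOTAL_PARAMS_B = 198.0   # total parameters, in billions
ACTIVE_PARAMS_B = 11.0   # parameters activated per forward pass, in billions
TOTAL_EXPERTS = 288
ACTIVE_EXPERTS = 8       # assumed to mean experts routed per token per MoE layer

# Share of the model's weights that take part in a single forward pass.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of experts selected by the router.
active_expert_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS

print(f"Active parameters: {active_param_fraction:.1%} of total")
print(f"Active experts:    {active_expert_fraction:.1%} of total")
```

Roughly 1 in 18 parameters and 1 in 36 experts are active at a time. The parameter share is likely higher than the expert share because components that are not routed, such as attention layers and embeddings, run for every token.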
