StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

StepFun today released Step 3.7 Flash, a multimodal Mixture-of-Experts model targeting agentic use cases. It adds native vision input and improved tool-use reliability over Step 3.5 Flash.

What is Step 3.7 Flash?

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder (ViT) for native image understanding.

The model activates approximately 11B parameters per token during inference. In MoE architectures, a router selects only a subset of “expert” sub-networks for each token, so the full network never runs in a single forward pass. This keeps per-token inference compute close to that of an 11B dense model while retaining a 198B total parameter budget.
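To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing in plain Python, followed by the active-parameter arithmetic from the figures above. The expert count, the value of k, and the router logits are hypothetical toy values chosen for illustration; StepFun's actual routing configuration is not described in this article.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical toy setup: 8 experts, 2 active per token.
# These are NOT Step 3.7 Flash's real settings.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
routes = top_k_route(logits, k=2)

# Parameter counts quoted in the article, in billions.
TOTAL_PARAMS_B = 198
ACTIVE_PARAMS_B = 11
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(routes)
print(f"{active_fraction:.1%} of parameters active per token")
```

Only the selected experts would run for a given token, which is why per-token compute tracks the active parameter count rather than the total. With the article's figures, roughly one parameter in eighteen participates in each forward pass.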

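Key specs:

- Total parameters: 198B (196B language backbone plus 1.8B vision encoder)
- Active parameters per token: approximately 11B
- Architecture: sparse Mixture-of-Experts (MoE) vision-language model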
