Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo | NVIDIA Technical Blog

An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return the corresponding tool results to the model context. Reasoning replay is model- and turn-dependent: some reasoning should be retained, while some should be dropped.
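To make that turn structure concrete, here is a minimal Python sketch of a conversation history and one possible replay rule. The `Turn`, `ToolCall`, and `ToolResult` types and both helper functions are hypothetical names for illustration, not part of Dynamo or any client API. The replay policy is also an assumption: it keeps reasoning only for assistant turns that follow the most recent fresh user prompt, and real models may expect something different.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToolCall:
    call_id: str
    name: str
    arguments: dict


@dataclass(frozen=True)
class ToolResult:
    call_id: str
    output: str


@dataclass(frozen=True)
class Turn:
    role: str  # "user" or "assistant"
    text: str = ""
    reasoning: Optional[str] = None
    tool_calls: Tuple[ToolCall, ...] = ()
    tool_results: Tuple[ToolResult, ...] = ()


def unmatched_call_ids(turns):
    """Return ids of tool calls that no later user turn has answered."""
    pending = set()
    for turn in turns:
        if turn.role == "assistant":
            pending.update(call.call_id for call in turn.tool_calls)
        else:
            pending.difference_update(r.call_id for r in turn.tool_results)
    return pending


def replay_context(turns):
    """Assumed policy: drop reasoning up to the last fresh user prompt.

    A "fresh prompt" is a user turn that carries no tool results. Reasoning
    from assistant turns after it (the in-progress tool loop) is retained.
    """
    last_prompt = max(
        (i for i, t in enumerate(turns) if t.role == "user" and not t.tool_results),
        default=-1,
    )
    return [
        t if i > last_prompt or t.reasoning is None else replace(t, reasoning=None)
        for i, t in enumerate(turns)
    ]


history = [
    Turn("user", text="Fix the failing test."),
    Turn("assistant", reasoning="Need to read the test file first.",
         tool_calls=(ToolCall("c1", "read_file", {"path": "test_app.py"}),)),
    Turn("user", tool_results=(ToolResult("c1", "def test_app(): ..."),)),
    Turn("assistant", reasoning="The assertion is inverted.", text="Fixed."),
    Turn("user", text="Now add docs."),
    Turn("assistant", reasoning="Look at the README.",
         tool_calls=(ToolCall("c2", "read_file", {"path": "README.md"}),)),
]
replayed = replay_context(history)
```

In this sketch the reasoning from the first two assistant turns is dropped on replay, the reasoning in the open tool loop is kept, and `unmatched_call_ids(history)` reports `c2` as the call still waiting for a result.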

The inference engine is responsible for supporting this more expressive interaction model and for producing correctly segmented API results. Tool-call parsing and reasoning parsing need to happen before the attached harness consumes the response. High-value agentic workflows such as coding also depend on a responsive harness experience: reasoning segments, tool-call events, and request metadata need to stream back as the turn unfolds instead of arriving only after a final text response.
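As a rough illustration of segmenting a stream before the harness sees it, the sketch below splits incoming text chunks into reasoning and content events as they arrive. It assumes the model wraps reasoning in `<think>` and `</think>` tags, which is a common convention but not universal, and the `ReasoningSplitter` class is a hypothetical name rather than Dynamo's parser. The key detail is holding back a trailing fragment that might be the start of a tag, since a tag can be split across chunk boundaries.

```python
from typing import Iterator, Tuple

# Assumed delimiters; real models may use different markers.
OPEN, CLOSE = "<think>", "</think>"


class ReasoningSplitter:
    """Incrementally split streamed text into (kind, text) events."""

    def __init__(self) -> None:
        self.buffer = ""
        self.in_reasoning = False

    def _kind(self) -> str:
        return "reasoning" if self.in_reasoning else "content"

    def feed(self, chunk: str) -> Iterator[Tuple[str, str]]:
        self.buffer += chunk
        while self.buffer:
            tag = CLOSE if self.in_reasoning else OPEN
            idx = self.buffer.find(tag)
            if idx >= 0:
                # Full tag found: emit what precedes it, then switch mode.
                if idx:
                    yield self._kind(), self.buffer[:idx]
                self.buffer = self.buffer[idx + len(tag):]
                self.in_reasoning = not self.in_reasoning
                continue
            # No full tag: hold back the longest suffix that could be
            # the beginning of the tag, and emit everything before it.
            hold = 0
            for n in range(min(len(tag) - 1, len(self.buffer)), 0, -1):
                if self.buffer.endswith(tag[:n]):
                    hold = n
                    break
            cut = len(self.buffer) - hold
            if cut:
                yield self._kind(), self.buffer[:cut]
            self.buffer = self.buffer[cut:]
            break

    def finish(self) -> Iterator[Tuple[str, str]]:
        # At end of stream, a held-back fragment was ordinary text after all.
        if self.buffer:
            yield self._kind(), self.buffer
            self.buffer = ""


splitter = ReasoningSplitter()
events = []
for piece in ["<thi", "nk>plan", " step</th", "ink>Hel", "lo"]:
    events.extend(splitter.feed(piece))
events.extend(splitter.finish())
```

Each event can be forwarded to the client as soon as it is produced, so the harness can render reasoning while the rest of the turn is still being generated. Tool-call segmentation could follow the same hold-back approach with model-specific markers.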

This post covers lessons from running real agentic clients against NVIDIA Dynamo: how we hardened parser and API coverage, improved streaming behavior, and extracted those parser layers into standalone reusable crates.

These changes build on the performance considerations outlined in our first post, which focused on the serving architecture underneath agentic inference: the frontend, router, and KV cache management. This follow-up focuses on correctness, user-experience equivalence, and performance.
