TL;DR

NVIDIA released SpatialClaw, a training-free spatial agent that reaches 59.9% average accuracy across 20 benchmarks by using code as its action interface, beating SpaceTools by 11.2 points. Composing code lets the agent reason about geometry step by step and revise its approach mid-inference, addressing a VLM bottleneck that matters for robotics and embodied AI.

NVIDIA Research has released SpatialClaw, a training-free framework for spatial reasoning. It targets a persistent weakness in vision-language models (VLMs). These models still struggle to judge where objects are, how they relate, and how they move in 3D.

SpatialClaw does not retrain the model. Instead, it changes the action interface the agent uses to call perception tools. The research team argues the interface is the bottleneck. Their solution is to treat code as the action interface. Across 20 benchmarks, SpatialClaw reaches 59.9% average accuracy. It outperforms the recent spatial agent SpaceTools by 11.2 points.

What is SpatialClaw

SpatialClaw is an agent loop wrapped around a stateful Python kernel. The kernel is pre-loaded with input frames and a set of primitives. Perception tools are plain Python callables. Their outputs, including masks, depth maps, camera geometry, and trajectories, are ordinary Python variables.
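The "stateful kernel" idea can be illustrated with a minimal sketch. This is not SpatialClaw's implementation: the class StatefulKernel, the tool fake_depth, and the variable names are all hypothetical, and a single shared namespace stands in for the real kernel. It only shows how a tool output stored in one code cell remains an ordinary variable in the next.

```python
import contextlib
import io


class StatefulKernel:
    """Toy stand-in for a stateful Python kernel: one namespace shared by all cells."""

    def __init__(self, preloaded):
        # Pre-load frames and tool callables, as the article describes.
        self.namespace = dict(preloaded)

    def run(self, code):
        """Execute one code cell and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


def fake_depth(frame):
    """Hypothetical perception tool: returns a constant 'depth' value per pixel."""
    return [[1.5 for _ in row] for row in frame]


# One 2x2 dummy frame and one tool, both invented for this example.
kernel = StatefulKernel({"frames": [[[0, 0], [0, 0]]], "fake_depth": fake_depth})

# Cell 1 stores a tool output as a plain variable.
kernel.run("depth = fake_depth(frames[0])")

# Cell 2 reuses that variable without recomputing it.
out = kernel.run("print(len(depth), depth[0][0])")
```

Because intermediate results persist between cells, an agent in this style can inspect a value, notice a problem, and revise its next step without starting over.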

The kernel exposes six public entry points. InputImages holds the sampled frames. Metadata carries frame rate, duration, and frame indices. tools exposes perception and geometry primitives. show() embeds an image into the agent’s next context. vlm dispatches queries to a separate VLM session. ReturnAnswer() submits the final answer.
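To make the six entry points concrete, here is a rough mock of such a namespace. Only the names InputImages, Metadata, tools, show, vlm, and ReturnAnswer come from the article; every signature, the Metadata keys, the count_nonzero tool, and the stubbed vlm reply are assumptions made for illustration. In particular, vlm is modeled here as a plain function, which may not match the real interface.

```python
from types import SimpleNamespace


def build_namespace(frames, fps):
    """Build a toy namespace exposing the six entry points named in the article."""
    state = {"shown": [], "answer": None}

    def show(image):
        # Assumed behavior: queue an image for the agent's next context.
        state["shown"].append(image)

    def vlm(question, images=()):
        # Placeholder for a query to a separate VLM session; returns a stub string.
        return f"stub reply to: {question}"

    def ReturnAnswer(answer):
        # Record the final answer.
        state["answer"] = answer

    # Hypothetical primitive: count non-zero pixels in a frame.
    tools = SimpleNamespace(
        count_nonzero=lambda frame: sum(1 for row in frame for px in row if px),
    )
    metadata = {
        "fps": fps,
        "duration": len(frames) / fps,
        "frame_indices": list(range(len(frames))),
    }
    namespace = {
        "InputImages": frames,
        "Metadata": metadata,
        "tools": tools,
        "show": show,
        "vlm": vlm,
        "ReturnAnswer": ReturnAnswer,
    }
    return namespace, state


# Two dummy 2x2 frames invented for this example.
frames = [[[0, 1], [1, 1]], [[0, 0], [0, 1]]]
namespace, state = build_namespace(frames, fps=2.0)

# Code an agent might emit: call a tool per frame, inspect one frame, answer.
agent_code = """
counts = [tools.count_nonzero(f) for f in InputImages]
show(InputImages[0])
ReturnAnswer(counts.index(max(counts)))
"""
exec(agent_code, namespace)
```

After this runs, state["answer"] holds the index of the frame with the most non-zero pixels, and state["shown"] holds the one frame the agent asked to view.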

marktechpost.com

NVIDIA AI Introduce SpatialClaw: A Training-Free Agent That Treats Code as the Action Interface for Spatial Reasoning

NVIDIA's SpatialClaw uses code as the action interface for spatial reasoning, reaching 59.9% average accuracy across 20 benchmarks.

Saturday, 20 June 2026
