Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding | NVIDIA Technical Blog

Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is producing an executable action that can read files, mutate a workspace, open network connections, and chain tools together. For the NVIDIA AI Red Team, this makes command generation a useful research target. If smaller language models can be guided into valid, policy-aware command structures, they become more reliable components for agentic workflows that can be deployed into a wider range of environments.

Constrained decoding is a technique that modifies the sampling step in autoregressive language model generation. At each generation step, the model produces logits as normal, but before a token is selected, a grammar is applied to reshape the distribution, typically by masking out tokens that would make the output invalid under that grammar.
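A minimal sketch of a single masked decoding step may make this concrete. It assumes a toy four-token vocabulary and a hypothetical grammar that permits only `grep` or `ls` as the first token; the names `START_TOKENS`, `apply_grammar_mask`, and `greedy_pick` are illustrative and are not taken from the pipeline described in this post.

```python
import math

# Hypothetical toy grammar: only these tokens may start a command.
START_TOKENS = {"grep", "ls"}

def apply_grammar_mask(logits, allowed):
    """Return a copy of logits with every disallowed token set to -inf."""
    if not any(tok in allowed for tok in logits):
        raise ValueError("grammar allows no token in the vocabulary")
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def softmax(logits):
    """Convert logits to probabilities; -inf logits get probability 0."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def greedy_pick(probs):
    """Select the highest-probability token (greedy decoding)."""
    return max(probs, key=probs.get)

# Made-up logits for one generation step (assumption, not measured data).
raw_logits = {"grep": 1.0, "rm": 3.0, "ls": 0.5, "hello": 2.0}

# Without the grammar, the highest-scoring token wins.
unconstrained = greedy_pick(softmax(raw_logits))

# With the grammar, blocked tokens get zero probability before selection.
constrained_probs = softmax(apply_grammar_mask(raw_logits, START_TOKENS))
constrained = greedy_pick(constrained_probs)

print(unconstrained, constrained)
```

In this toy setup the unconstrained step picks `rm`, while the masked step can only choose between `grep` and `ls`. A real implementation would recompute the allowed set from the grammar state after every emitted token rather than use a fixed set.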

PICARD, for example, used this technique to improve SQL generation. The AI Red Team applied the same concept to Bash to help small models complete command-line tasks.

This post describes an experimental pipeline for generating Bash command grammars and applying them during decoding. We ran 13 small language models against 299 tasks and improved the average pass rate from 62.5% to 75.2%. The strongest result was on Qwen3-0.6B, where the pass rate increased from 16.7% to 59.2%.
