Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is producing an executable action that can read files, mutate a workspace, open network connections, and chain tools together. For the NVIDIA AI Red Team, this makes command generation a useful research target. If smaller language models can be guided into valid, policy-aware command structures, they become more reliable components for agentic workflows that can be deployed into a wider range of environments.

Constrained decoding is a technique that modifies the sampling process in autoregressive language model generation. At each generation step, the model produces logits as normal, but before a token is selected, a grammar is applied to change the distribution, often by blocking the tokens that the grammar does not allow at that position.
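A single decoding step of this kind can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline described in this post: the function name `constrained_step`, the toy vocabulary, the logit values, and the allowed-token set are all hypothetical, and a real system would derive the allowed set from a grammar state and operate on tensors rather than dictionaries.

```python
import math


def constrained_step(logits, allowed):
    """Apply a grammar mask to one step of logits and pick a token greedily.

    logits:  dict mapping token string -> raw model score (hypothetical values)
    allowed: set of tokens the grammar permits at this position
    Returns (chosen_token, probabilities_after_masking).
    """
    # Block every token the grammar disallows by setting its score to -inf.
    masked = {
        tok: (score if tok in allowed else -math.inf)
        for tok, score in logits.items()
    }
    if all(score == -math.inf for score in masked.values()):
        raise ValueError("grammar allows no token present in the vocabulary")

    # Softmax over the masked scores; blocked tokens end up with probability 0.
    peak = max(masked.values())
    exps = {tok: math.exp(score - peak) for tok, score in masked.items()}
    total = sum(exps.values())
    probs = {tok: value / total for tok, value in exps.items()}

    # Greedy selection for simplicity; sampling from probs would also work.
    chosen = max(probs, key=probs.get)
    return chosen, probs


# Toy example: the unconstrained model prefers "rm", but the (assumed)
# grammar only permits read-only commands at the start of the line.
toy_logits = {"rm": 3.0, "ls": 2.0, "grep": 1.0, "-l": 0.5}
toy_allowed = {"ls", "grep"}
token, probs = constrained_step(toy_logits, toy_allowed)
```

In this toy case the highest-scoring raw token is `rm`, but the mask removes it, so `ls` is selected and the remaining probability mass is split between `ls` and `grep`. Masking with negative infinity before the softmax, rather than filtering after sampling, keeps the result a valid probability distribution over the permitted tokens only.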

PICARD, for example, used this technique to improve SQL generation. The AI Red Team applied the same concept to Bash to help small models complete command-line tasks successfully.

This post describes an experimental pipeline for generating Bash command grammars and applying them during decoding. We ran 13 small language models against 299 tasks and improved the average pass rate from 62.5% to 75.2%. The strongest result was on Qwen3-0.6B, where the pass rate increased from 16.7% to 59.2%.