AutoTTS reduces token usage by 69.5% in LLM reasoning strategies

Getting a large language model to think harder at inference time, a technique called test-time scaling, has become one of the more reliable ways to squeeze better answers out of AI systems. The problem is that designing those “think harder” strategies has traditionally been a manual, intuition-driven slog. Researchers tinker with heuristics, run expensive experiments, and hope they’ve found something close to optimal.

A new framework called AutoTTS, developed by researchers from Meta, Google, the University of Maryland, the University of Virginia, Washington University in St. Louis, and the University of North Carolina, takes humans largely out of that loop. The result: a roughly 69.5% reduction in token usage compared to strong handcrafted baselines, with essentially no loss in accuracy.

How AutoTTS works, and why the numbers matter

AutoTTS replaces that manual process with an agentic loop. The system uses Anthropic’s Claude Code as an explorer agent to autonomously develop, test, and refine inference strategies. Rather than calling the target LLM repeatedly during the discovery phase, AutoTTS works from pre-collected reasoning trajectories and probe signals, so candidate strategies can be scored without generating new text.
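To make the idea of working from pre-collected trajectories concrete, here is a minimal sketch of how a candidate strategy might be scored offline. It is an illustration under stated assumptions, not AutoTTS's actual code: the `Trajectory` record, the `replay` function, and both example strategies are hypothetical names, and the stopping rule is a simple agreement threshold rather than anything the researchers describe.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """One cached reasoning trace (hypothetical record layout)."""
    answer: str  # final answer extracted from the trace
    tokens: int  # tokens spent when the trace was originally generated


# A strategy looks at the answers gathered so far and decides whether to stop.
Strategy = Callable[[List[str]], bool]


def replay(strategy: Strategy, cached: List[Trajectory]) -> Tuple[str, int]:
    """Score a strategy against cached traces without calling the LLM again."""
    seen: List[str] = []
    spent = 0
    for traj in cached:
        seen.append(traj.answer)
        spent += traj.tokens
        if strategy(seen):
            break
    # Final answer is the majority vote over whatever was consumed.
    winner = Counter(seen).most_common(1)[0][0]
    return winner, spent


def use_all(_answers: List[str]) -> bool:
    """Baseline-style strategy: never stop early, consume every trace."""
    return False


def stop_when_confident(answers: List[str]) -> bool:
    """Assumed heuristic: stop once 75% of at least 3 answers agree."""
    if len(answers) < 3:
        return False
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers) >= 0.75


# Toy cached data, invented for illustration only.
cached = [
    Trajectory("42", 100),
    Trajectory("42", 120),
    Trajectory("42", 90),
    Trajectory("17", 110),
    Trajectory("42", 80),
]

full_answer, full_tokens = replay(use_all, cached)
early_answer, early_tokens = replay(stop_when_confident, cached)
print(full_answer, full_tokens)
print(early_answer, early_tokens)
```

Because every trace is already on disk, swapping in a different strategy costs only the time it takes to rerun the loop, which is what makes an automated search over many candidate strategies affordable.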

The benchmark comparison tells the story. Against SC@64, a well-known handcrafted baseline, AutoTTS achieved its 69.5% token reduction at a specific operating point (beta approximately 0.5) while matching the baseline’s mean held-out accuracy. The discovered strategies averaged 45.3 in held-out accuracy, versus 45.2 for the baseline.
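The arithmetic behind those figures is simple enough to sketch. The snippet below assumes the reduction is measured as one minus the ratio of strategy tokens to baseline tokens. The token counts are hypothetical, picked only so the ratio reproduces the reported 69.5%. The two accuracy scores are the ones quoted above.

```python
def token_reduction(baseline_tokens: float, strategy_tokens: float) -> float:
    """Fraction of baseline tokens saved: 1 - strategy / baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - strategy_tokens / baseline_tokens


# Hypothetical token budgets, chosen only to match the reported ratio.
baseline_tokens = 1_000_000
strategy_tokens = 305_000

reduction = token_reduction(baseline_tokens, strategy_tokens)

# Mean held-out accuracy scores as reported in the article.
autotts_accuracy = 45.3
baseline_accuracy = 45.2
accuracy_gap = autotts_accuracy - baseline_accuracy

print(f"token reduction: {reduction:.1%}")
print(f"accuracy gap: {accuracy_gap:+.1f} points")
```

Put another way, a 69.5% reduction means the discovered strategies spent about 30.5% of the baseline's tokens, while the accuracy difference is a tenth of a point.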
