Instead of writing rules for more efficient AI reasoning themselves, researchers let a coding agent hunt for better control algorithms in a simulated environment. The result beats established methods while burning far less compute.
Test-time scaling (TTS) is meant to make large language models perform better by letting them spend more compute on a response, for example by running several solution paths in parallel or extending chains of thought. Until now, human-written rules have almost always dictated when a model kicks off a new solution path, doubles down on a promising one, or kills one off.
A research team from UMD, UVA, WUSTL, UNC, Google, and Meta flips that with AutoTTS. Humans don't write the algorithm. Instead, they build the playground where an AI agent figures out algorithms on its own.
The paper argues that many known methods are really just special cases in a shared control space defined by width (how many solution paths run at once) and depth (how far each one goes). So why, the authors ask, do researchers keep plotting paths through this space by hand instead of letting a machine search it?
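To make the width-and-depth idea concrete, here is a minimal Python sketch of such a shared control space. It is not the paper's code: the `Path` class, the `run` loop, the action names, and the unit cost of one "step" per spawn or extension are all assumptions made for illustration. The point is only that a single chain of thought and a parallel-sampling scheme fall out of the same controller with different settings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Path:
    """One solution path (hypothetical model, not from the paper)."""
    depth: int = 1      # reasoning steps generated so far
    alive: bool = True  # False once the controller has killed it


Action = Tuple[str, int]                 # (kind, path index); index unused for spawn/stop
Policy = Callable[[List[Path]], Action]  # looks at all paths, picks the next action


def run(policy: Policy, budget: int) -> Tuple[List[Path], int]:
    """Apply a policy until it stops or the step budget is used up."""
    paths: List[Path] = []
    spent = 0
    for _ in range(2 * budget + 1):  # hard cap so a bad policy cannot loop forever
        kind, i = policy(paths)
        if kind == "stop":
            break
        if kind in ("spawn", "extend") and spent >= budget:
            break  # no compute left for more generation
        if kind == "spawn":              # more width
            paths.append(Path())
            spent += 1
        elif kind == "extend" and paths[i].alive:  # more depth
            paths[i].depth += 1
            spent += 1
        elif kind == "prune":            # kill a path, assumed free
            paths[i].alive = False
    return paths, spent


def fixed_grid(width: int, depth: int) -> Policy:
    """Hand-written rule: open `width` paths, push each to `depth`, then stop."""
    def policy(paths: List[Path]) -> Action:
        if len(paths) < width:
            return ("spawn", -1)
        for i, p in enumerate(paths):
            if p.alive and p.depth < depth:
                return ("extend", i)
        return ("stop", -1)
    return policy


# Two familiar strategies as points in the same space (settings are illustrative).
single_chain = fixed_grid(width=1, depth=8)
parallel_sampling = fixed_grid(width=4, depth=8)

if __name__ == "__main__":
    for name, pol in [("single chain", single_chain), ("parallel sampling", parallel_sampling)]:
        paths, spent = run(pol, budget=100)
        print(name, "paths:", len(paths), "steps spent:", spent)
```

In this framing, an automated search would replace the hand-written `fixed_grid` rule with policies that decide adaptively when to spawn, extend, or prune, which is the part the researchers hand over to the agent.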
Simulating the search keeps costs down
