The central question is no longer how a model produces a better answer, but how it reliably turns intent into finished work, the researchers say. The goal shifts from reactive Q&A to delegated task execution.

The paper traces the evolution of large language models through five stages, from basic chatbot to autonomous digital colleague. | Image: Tencent Youtu Lab

From fast answers to slow thinking

In the chatbot era, models mainly produced text quickly. They stored language patterns and facts in their parameters and wrote answers in a single pass, token by token, following the most likely continuation without checking intermediate steps or searching over alternative solutions.
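
As a rough illustration of that one-pass behavior, the sketch below runs greedy decoding over a tiny hand-written next-token table. The table `BIGRAMS`, its probabilities, and the function `greedy_decode` are hypothetical stand-ins invented for this example and are not taken from the paper; a real model scores the next token with a neural network conditioned on the full context.

```python
from typing import Dict, List

# Hypothetical toy "model": for each token, the probability of each next token.
# All values are made up for illustration only.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"answer": 0.6, "question": 0.4},
    "a": {"guess": 1.0},
    "answer": {"is": 1.0},
    "question": {"is": 1.0},
    "guess": {"<end>": 1.0},
    "is": {"42": 0.55, "unknown": 0.45},
    "42": {"<end>": 1.0},
    "unknown": {"<end>": 1.0},
}


def greedy_decode(table: Dict[str, Dict[str, float]], max_tokens: int = 10) -> List[str]:
    """Generate tokens in one pass, always taking the most likely next token."""
    tokens: List[str] = []
    current = "<start>"
    for _ in range(max_tokens):
        candidates = table[current]
        # Commit to the single most probable continuation; no backtracking,
        # no checking of the partial answer, no alternative paths explored.
        current = max(candidates, key=candidates.get)
        if current == "<end>":
            break
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(" ".join(greedy_decode(BIGRAMS)))
```

Each step commits to the highest-probability token and never revisits it, so a weak early choice carries through to the end of the answer.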

Thinking LLMs invest extra compute at inference time, exploring solution paths, verifying intermediate steps, and correcting errors before the final answer. | Image: Tencent Youtu Lab