Why LLMs should stop thinking out loud (and what comes after chain-of-thought) - TechTalks

Recent industry events point to a massive return-on-investment dilemma as AI token spend spirals out of control. Uber reportedly blew through its entire year’s AI budget within months, forcing a drastic re-evaluation of its agentic workflows. Meta has placed hard caps on its internal AI compute spend, and Amazon has shut down an internal AI leaderboard that encouraged engineers to burn cash on LLM tokens. If tech giants with famously deep pockets are sweating the compute bill and hitting the brakes, the broader tech ecosystem faces an even steeper climb.

Much of this compute budget is being burned on Chain-of-Thought (CoT) prompting and training. CoT is a technique in which large language models (LLMs) are instructed, or fine-tuned, to “think step by step” before delivering a final answer. In general, LLMs perform better on reasoning tasks when they are made to generate a sprawling sequence of intermediate “thinking” tokens first.
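To make the cost mechanics concrete, here is a minimal Python sketch comparing a direct prompt with a zero-shot CoT prompt. Everything in it is hypothetical: the question, the two model outputs, and the price of 10 USD per million output tokens are placeholders, and a whitespace word count stands in for a real tokenizer. The point is only that the CoT response carries a reasoning trace, and every token in that trace is billed.

```python
# Hypothetical illustration: the same question asked directly and with a
# chain-of-thought (CoT) instruction. Whitespace word counts stand in for a
# real tokenizer, and the price per token is an assumed placeholder.

QUESTION = "A train leaves at 3:15 pm and the trip takes 2 h 50 min. When does it arrive?"


def direct_prompt(question: str) -> str:
    # Ask for the final answer only.
    return f"{question}\nAnswer with the final time only."


def cot_prompt(question: str) -> str:
    # Zero-shot CoT: append a "think step by step" instruction.
    return f"{question}\nLet's think step by step, then give the final time."


def approx_tokens(text: str) -> int:
    # Crude proxy for token count; real tokenizers split text differently.
    return len(text.split())


def output_cost(num_output_tokens: int, usd_per_million: float) -> float:
    # Output tokens are billed per token, so longer "thinking" means a larger bill.
    return num_output_tokens * usd_per_million / 1_000_000


# Hypothetical model outputs for each prompt style.
direct_output = "6:05 pm"
cot_output = (
    "Start at 3:15 pm. Adding 2 hours gives 5:15 pm. "
    "Adding 50 more minutes gives 6:05 pm. Final answer: 6:05 pm"
)

PRICE = 10.0  # assumed USD per million output tokens, for illustration only

print(direct_prompt(QUESTION))
print(cot_prompt(QUESTION))
for name, out in [("direct", direct_output), ("cot", cot_output)]:
    n = approx_tokens(out)
    print(f"{name}: ~{n} tokens, ${output_cost(n, PRICE):.6f}")
```

With these placeholder outputs, the CoT response is roughly ten times longer than the direct one, so its output cost is roughly ten times higher; real ratios depend on the model, the task, and the tokenizer.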

CoT became the undisputed industry standard because it was a brilliant, pragmatic hack. It leveraged the existing text-generation interface of autoregressive transformers without requiring structural changes to the underlying architecture. It scaled predictably with added inference-time compute and gave human operators an easily readable, text-based trace of what the model was ostensibly doing.
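The “no architecture change” point can be illustrated with a toy decoding loop. This sketch assumes a scripted stand-in model rather than a real transformer, and uses a hypothetical `Answer:` marker to separate the reasoning from the final answer. The reasoning trace and the answer pass through the same next-token interface; each extra thinking token costs one more model call, which is why compute grows with the length of the trace, and the trace itself is plain text a person can read.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: given the tokens so far, return
# the next token. A real LLM would run a transformer forward pass here.
NextToken = Callable[[List[str]], str]

ANSWER_MARKER = "Answer:"  # assumed delimiter between reasoning trace and answer
EOS = "<eos>"


def generate(model: NextToken, prompt: List[str], max_new_tokens: int = 64) -> Tuple[List[str], int]:
    """Plain autoregressive decoding: one model call per emitted token."""
    tokens = list(prompt)
    calls = 0
    for _ in range(max_new_tokens):
        nxt = model(tokens)
        calls += 1
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens[len(prompt):], calls


def split_trace(output: List[str]) -> Tuple[str, str]:
    """Separate the human-readable 'thinking' trace from the final answer."""
    if ANSWER_MARKER in output:
        i = output.index(ANSWER_MARKER)
        return " ".join(output[:i]), " ".join(output[i + 1:])
    return "", " ".join(output)


def scripted_model(script: List[str]) -> NextToken:
    # Toy model that replays a fixed continuation and ignores the context.
    it = iter(script + [EOS])
    return lambda tokens: next(it, EOS)


prompt = "What is 17 + 25 ? Think step by step .".split()
script = "17 + 25 = 17 + 20 + 5 = 37 + 5 = 42 . Answer: 42".split()
out, calls = generate(scripted_model(script), prompt)
trace, answer = split_trace(out)
print("trace:", trace)
print("answer:", answer)
print("model calls:", calls)
```

Here the script yields 18 tokens, so `generate` makes 19 model calls (the last one returns the end-of-sequence token); a shorter trace would cut that count directly.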
