Most AI systems today work in turns. You type or speak, the model waits, processes your input, and then responds. That’s the entire interaction loop. Thinking Machines Lab, an AI research lab, argues that this model of interaction is a fundamental bottleneck. The team has introduced a research preview of a new class of system it calls interaction models to address it. The core idea behind the research is that interactivity should be native to the model itself, not bolted on as an afterthought.

What’s Wrong with Turn-Based AI

If you’ve built anything with a language model or voice API, you’ve worked around the limitations of turn-based interaction. The model has no awareness of what’s happening while you’re still typing or speaking. It can’t see you pause mid-sentence, notice your camera feed, or react to something visual in real time. While the model is generating, it’s equally blind — perception freezes until it finishes or gets interrupted.

This creates a narrow channel for human-AI collaboration, limiting how much of a person’s knowledge, intent, and judgment can reach the model, and how much of the model’s work can be understood in return.

To work around this, most real-time AI systems use a harness: a collection of separate components stitched together to simulate responsiveness. A common example is voice-activity detection (VAD), which predicts when a user has finished speaking so a turn-based model knows when to start generating. The harness is built from components that are meaningfully less intelligent than the model itself, and it precludes capabilities like proactive visual reactions, speaking while listening, or responding to cues that are never explicitly stated aloud.
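To make the harness idea concrete, here is a minimal, hypothetical sketch of the kind of component described above: an energy-based end-of-turn detector that guesses when a user has stopped speaking so a turn-based model knows when to respond. Production systems use trained VAD models rather than a raw energy threshold; the function names, threshold, and frame counts here are illustrative assumptions, not anything from Thinking Machines Lab.

```python
def rms_energy(frame):
    """Root-mean-square energy of one audio frame (a list of samples in [-1, 1])."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def detect_end_of_turn(frames, energy_threshold=0.02, silence_frames_needed=15):
    """Return the index of the frame where the user's turn is judged complete,
    or None if no end of turn is detected.

    A turn ends after `silence_frames_needed` consecutive low-energy frames
    that follow at least one high-energy (speech) frame. This is exactly the
    kind of crude heuristic that sits outside the model in a harness: it knows
    nothing about what was said, only that the audio went quiet.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms_energy(frame) >= energy_threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                return i  # harness signals the model to start generating here
    return None

# Toy usage: 10 loud "speech" frames followed by 20 near-silent frames.
speech = [[0.5, -0.5, 0.4, -0.4]] * 10
silence = [[0.001, -0.001, 0.001, -0.001]] * 20
print(detect_end_of_turn(speech + silence))  # → 24
```

The sketch also shows why the harness is limiting: a pause mid-sentence looks identical to the end of a turn, and nothing here can see a camera feed or react while the model is still generating.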