For this week’s essay, I would like to discuss Thinking Machines’ work on interactive models, which takes multimodality to a new level. I’ve been digging into their ideas as much as I can and wanted to share some thoughts. The work is early but impressive.

For the last few years, the default mental model for large language models has been embarrassingly simple: concatenate tokens, predict the next token, repeat. The human writes a message, the model replies, the human writes again. This works surprisingly well for many tasks because text is forgiving. Text can wait. It can be buffered, edited, compressed, and serialized into one neat causal stream.

But collaboration is not text. Collaboration is temporal.
The Sequence AI of the Week #863: The Model is the Interface: Inside Thinking Machines' Interactive Models
Thinking Machines’ interactive models turn real-time conversation, vision, audio, and tool use into one continuous learned system.











