Why Your AI Agent Needs a Guy Who Wrote Games on a ZX Spectrum

A few weeks ago I was mid-sentence, explaining to my own agent why one of its habits was wasteful, when the habit fired. We were discussing — in the conversation itself — how a skill called "Remember This" was burning four LLM round-trips to do the work of one database write. While I was typing that sentence, the trigger matched, and the thing did it again: a robo message, three tool calls, a summary, and finally — four turns later — the fact landed in storage. The runtime had every piece of data it needed before the model even knew there was a conversation to react to. It was like hiring a concert pianist to press the "on" button on a CD player, and watching him do it with feeling.

I laughed, and then I stopped laughing, because I recognized the shape of the problem. I'd seen it before, just never wearing this costume.
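For the concretely minded, here is a minimal sketch of what "the runtime had every piece of data it needed" could look like in practice. It is not my agent's actual code. The trigger phrase, the table layout, and the names open_store, handle_message, and call_model are all hypothetical, chosen only to show a trigger being handled with one database write and zero model round-trips.

```python
import re
import sqlite3

# Hypothetical trigger phrase. The real skill's matching rule is not shown
# in this article, so this pattern is an assumption for illustration.
TRIGGER = re.compile(
    r"^\s*remember this:\s*(?P<fact>.+)$",
    re.IGNORECASE | re.DOTALL,
)


def open_store(path=":memory:"):
    """Open a SQLite store with a single (assumed) facts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def handle_message(conn, message, call_model):
    """Route one user message.

    If the trigger matches, the runtime already holds the fact, so it
    writes it straight to storage and never calls the model. Anything
    else goes to the model as usual via call_model (a stand-in callable).
    """
    match = TRIGGER.match(message)
    if match:
        fact = match.group("fact").strip()
        conn.execute("INSERT INTO facts (text) VALUES (?)", (fact,))
        conn.commit()
        return f"Saved: {fact}"
    return call_model(message)
```

Called as handle_message(conn, "Remember this: the deploy key rotates monthly", call_model), the fact is stored in a single step and the model is never consulted. Everything that does not match the trigger still reaches the model untouched.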

What kind of machine is this, actually

The instinctive way to think about a large language model is as a slow, occasionally unreliable processor. Give it better prompts the way you'd give a CPU better-optimized code, give it more context the way you'd give it more RAM, wait for the next model generation the way you'd wait for a clock-speed bump, and the rough edges — the forgetfulness, the drifting attention, the tendency to lose the plot four tool calls into a procedure — will sand themselves down. This is, I think, the consensus view, and it's wrong in a way that matters, because it assumes the LLM is a machine with the kind of deficiency that more resources fix. It isn't. It has a deficiency that's definitional, not incidental.