Continual Harness: The Gemini Pokémon Agent That Rewrites Its Own Loop

Most of the work that makes an AI agent good never happens inside the model. It happens in the harness — the code that feeds the model its observations, defines its tools, trims its context, and decides what to do with each response. When an agent fails, the usual fix is a human editing that harness: rewording a tool description, adding a memory store, changing how a screenshot gets summarized. The Continual Harness work, from the teams behind Gemini Plays Pokémon and the PokeAgent benchmark, pushes on a sharper question — what if the model edited the harness itself, while the run was still going?

The harness is where agents actually live

Gemini Plays Pokémon was a public demonstration: a Gemini model worked through a Game Boy Pokémon title via a harness that turned the game into something a language model could reason about. The harness converted pixels into labeled screenshots, a map of the current area, and an inventory list, then exposed button presses and pathfinding helpers as tools. The model never touched raw emulator memory. It saw whatever the harness chose to show it, and it acted only through the tools the harness defined.

That structure is not specific to Pokémon. A coding agent doesn't see your repository — it sees the files a retrieval step pulled in. A browser agent doesn't see a webpage — it sees an accessibility tree some extraction code produced. The harness is the agent's entire sensory system, its motor system, and its memory. The model is one component inside it.
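To make that division of labor concrete, here is a minimal sketch of a harness step. It is not the Gemini Plays Pokémon code: the names `Observation`, `Harness`, `build_prompt`, `step`, and `stub_model`, the "<tool> <argument>" reply format, and the sample values are all assumptions made for illustration. It only shows the shape described above, where the model sees what the harness assembles and acts only through the tools the harness registers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    """What the harness chooses to show the model (hypothetical fields)."""
    screenshot_caption: str
    area_map: str
    inventory: List[str]


@dataclass
class Harness:
    # Tool name -> callable; the model can act only through these.
    tools: Dict[str, Callable[[str], str]]
    max_history: int = 3
    history: List[str] = field(default_factory=list)

    def build_prompt(self, obs: Observation) -> str:
        # Trim context: keep only the most recent history entries.
        recent = self.history[-self.max_history:]
        return "\n".join([
            "History: " + " | ".join(recent),
            "Screen: " + obs.screenshot_caption,
            "Map: " + obs.area_map,
            "Inventory: " + ", ".join(obs.inventory),
            "Tools: " + ", ".join(sorted(self.tools)),
        ])

    def step(self, obs: Observation, model: Callable[[str], str]) -> str:
        # Assumed reply format for this sketch: "<tool> <argument>".
        reply = model(self.build_prompt(obs))
        name, _, arg = reply.partition(" ")
        if name not in self.tools:
            result = f"error: unknown tool {name!r}"
        else:
            result = self.tools[name](arg)
        # The harness, not the model, decides what gets remembered.
        self.history.append(f"{reply} -> {result}")
        return result


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call; always presses the same button.
    return "press A"


harness = Harness(tools={"press": lambda button: f"pressed {button}"})
obs = Observation("Player faces a sign", "Pallet Town", ["Potion"])
print(harness.step(obs, stub_model))  # pressed A
```

Everything a human would normally tune sits in plain code here: the prompt layout, the history limit, the tool table. Those are the parts a self-editing harness would have to be allowed to change.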
