When you use an AI agent, the more contextual data the agent has about the job, the better it will perform.

But agents don’t have much memory, since the large language models (LLMs) they depend on are stateless and can only work with what fits in their context window. When that memory runs out, an agent glitches, hangs, or spews nonsense. Tactics like truncating or compacting agent memory can make up for this, but they’re not real solutions.

A better answer to the AI agent memory crunch is memory that lives and persists outside of the agent itself. The agent’s memory is still used for immediate work, but the longer-term, big-picture details get offloaded to another service and retrieved on demand.

The term for this is retrieval-augmented generation, or RAG. It has become as significant a technology as agents and LLMs themselves, because it expands their capabilities in place.
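
To make the idea concrete, here is a minimal sketch of memory that lives outside the agent and is consulted on demand. Everything in it is an assumption made for illustration: the `ExternalMemory` class, the `build_prompt` helper, and the sample notes are hypothetical, and the simple word-overlap scoring stands in for the embedding-based search that real RAG systems typically use.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ExternalMemory:
    """Hypothetical long-term store that lives outside the agent."""

    notes: list[str] = field(default_factory=list)

    def store(self, note: str) -> None:
        """Offload a detail so it no longer has to sit in the prompt."""
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return up to k notes that share the most words with the query."""
        query_words = tokenize(query)
        scored = [(len(query_words & tokenize(n)), n) for n in self.notes]
        # Drop notes with no overlap, then rank the rest by overlap count.
        relevant = [(score, n) for score, n in scored if score > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in relevant[:k]]


def build_prompt(memory: ExternalMemory, question: str) -> str:
    """Prepend only the retrieved notes to the question sent to the LLM."""
    context = "\n".join(f"- {n}" for n in memory.retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Illustrative data only.
memory = ExternalMemory()
memory.store("The staging database runs PostgreSQL 16.")
memory.store("Deploys happen every Tuesday.")
memory.store("The team prefers tabs over spaces.")

prompt = build_prompt(memory, "Which database does staging use?")
print(prompt)
```

Only the note about the staging database ends up in the prompt. The other details stay in the external store until a question calls for them, which is what keeps the agent's own memory free for immediate work.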

The basics of RAG