Most people talk about better prompts. Hardly anyone talks about what happens before every prompt: the instructions the assistant loads into the context before the actual work begins.

Depending on the system, you pay for that in different ways: input tokens, latency, reduced available context, or simply more noise in the assistant's active instructions. Even if the financial cost is partly reduced through prompt caching, the cognitive cost remains: the assistant still has to operate inside a larger instruction environment.

At some point, my setup had become one single, constantly growing instruction file. System structure, assistant personality, workflows, session rules, special cases: everything was in one file. And everything was loaded into the context on every interaction, no matter whether I was solving a complex task or just asking a quick question.

That is roughly like starting every phone call by reading the entire employee handbook before getting to the actual topic.

The Actual Problem