1. The Misconception That's Costing You Money

Ask a developer how to reduce their LLM bill and they'll say: "write shorter prompts." Remove adjectives. Trim examples. Cut the system prompt.

This isn't wrong — it's just the lowest-leverage version of the right idea. It optimizes the 4% of your context that is the actual user message while ignoring the 96% that is conversation history, system prompt, idle tool schemas, and over-retrieved documents.

Token optimization is a context engineering problem. The real questions are:

What is in your context that doesn't need to be there?