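Dropping your entire Markdown documentation folder into an LLM prompt sounds easy, until you see the API bill. Large contexts mean large costs: the full documentation is re-sent and billed as input tokens on every request, even when users ask repetitive or highly specific questions that touch only a small part of it.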

When building the documentation assistant for my project, LinkShift.app (a programmable redirect and link-mapping platform running on the edge), I knew the learning curve would be steep for users dealing with Regex, Liquid templates, and edge routing rules. Instead of taking the easy route and watching my API budget melt, I designed a multi-tier, ultra-low-cost AI agent architecture.

Here is how I solved token bloat and kept response times blazing fast.

The Tiered Architecture at a Glance

Instead of throwing a massive model at the full chat history and documentation for every single query, the system filters the request through three distinct phases: