How to Cheat LLM Context: A Lightweight AI Doc Assistant Architecture

Dropping your entire Markdown documentation folder into an LLM prompt sounds easy until you see the API bill. Input tokens are billed on every request, so a large context means a large cost on every query, and most of that context is wasted when users ask repetitive or highly specific questions.
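To put rough numbers on that, here is a back-of-the-envelope sketch. The token counts, query volume, and price below are made-up assumptions for illustration, not LinkShift.app figures or any provider's real pricing, and the 4-characters-per-token rule is only a common approximation for English text, not a real tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumed ~4 characters per token for English)."""
    return int(len(text) / chars_per_token)


def monthly_input_cost(
    context_tokens: int,
    queries_per_month: int,
    usd_per_million_tokens: float,
) -> float:
    """Input-token cost when the same context is sent with every query."""
    return context_tokens * queries_per_month * usd_per_million_tokens / 1_000_000


# Hypothetical figures, chosen only to show the scale of the difference.
FULL_DOCS_TOKENS = 200_000   # assumed size of the whole docs folder
SELECTED_TOKENS = 4_000      # assumed size of a few relevant sections
QUERIES_PER_MONTH = 10_000   # assumed traffic
USD_PER_MILLION = 1.0        # assumed input price, not a real quote

full_cost = monthly_input_cost(FULL_DOCS_TOKENS, QUERIES_PER_MONTH, USD_PER_MILLION)
trimmed_cost = monthly_input_cost(SELECTED_TOKENS, QUERIES_PER_MONTH, USD_PER_MILLION)

print(f"Full docs in every prompt: ${full_cost:,.2f} per month")
print(f"Relevant sections only:    ${trimmed_cost:,.2f} per month")
```

Under these assumptions the cost scales linearly with the context size, so sending 50 times fewer tokens per query cuts the input bill by the same factor.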

When building the documentation assistant for my project, LinkShift.app (a programmable redirect and link-mapping platform running on the edge), I knew the learning curve would be steep for users dealing with Regex, Liquid templates, and edge routing rules. Instead of taking the easy route and watching my API budget melt, I designed a multi-tier, ultra-low-cost AI agent architecture.

Here is how I solved token bloat and kept response times blazing fast.

The Tiered Architecture at a Glance

Instead of throwing a massive model at the full chat history and documentation for every single query, the system filters the request through three distinct phases:
