1. Introduction: The "Why"
Why this project exists
In modern agentic architectures, systems often rely on Large Language Models (LLMs) to make basic routing decisions (e.g., determining if a user is asking for a password reset, a refund, or general support). While effective, this approach introduces three significant bottlenecks:
High Latency: Calling an external API takes hundreds of milliseconds.
Token Costs: Paying per-token for simple classification is economically inefficient at scale.









