The latency budgets for AI systems have tightened dramatically in the last 18 months. Most retrieval layers are not built for what users now expect.

The Threshold Nobody Warned You About

Three seconds of end-to-end AI response time was workable in 2024. Teams shipped systems at that speed and users tolerated it. It was slow, but it was new and impressive enough that people gave it grace.

That grace period is over.

In 2026, three seconds is a dealbreaker. Users expect responses in under one second. Voice AI agents need total response times under 800 milliseconds. Conversational chat agents have a 200 millisecond budget before the experience starts to feel broken. The bar shifted quickly and it is not shifting back.
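To make those thresholds concrete, here is a minimal sketch of a budget check. The three limits come from the figures above; the stage names and the per-stage timings in the usage example are hypothetical placeholders, not measurements from any real system.

```python
# Latency budgets in milliseconds, taken from the figures quoted above.
# The dictionary keys are illustrative labels, not an established taxonomy.
BUDGETS_MS = {
    "general": 1000,  # "under one second"
    "voice": 800,     # voice AI agents, total response time
    "chat": 200,      # conversational chat agents
}


def check_budget(kind, stage_latencies_ms):
    """Sum per-stage latencies and compare the total against a budget.

    Returns (within_budget, remaining_ms). remaining_ms is negative
    when the pipeline overshoots the budget.
    """
    total = sum(stage_latencies_ms.values())
    remaining = BUDGETS_MS[kind] - total
    # The budgets are stated as "under" a limit, so the comparison is strict.
    return remaining > 0, remaining


# Hypothetical per-stage timings for one request (made-up numbers).
stages = {"retrieval": 300, "generation": 450, "network": 100}

print(check_budget("general", stages))
print(check_budget("voice", stages))
```

With these made-up timings the total is 850 milliseconds, which fits the one-second budget but overshoots the voice budget by 50 milliseconds. Treating the budget as a sum over stages is one simple way to see how much of it the retrieval layer alone is allowed to consume.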