Learn what drives API latency in LLM apps, how to measure time to first token (TTFT) and inter-token latency, and practical ways to reduce that latency with caching and vector search.
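Both metrics come from timestamps on a streamed response. Time to first token is the delay between sending the request and the first token arriving. Inter-token latency is the gap between each later token and the one before it. The sketch below assumes you can record a monotonic timestamp as each streamed chunk arrives, and it treats each chunk as one token. The helper names (`latency_stats`, `measure`) and the `make_stream` callable are hypothetical illustrations, not any particular SDK's API.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Sequence


@dataclass
class LatencyStats:
    ttft: float      # seconds from request start to the first token
    mean_itl: float  # mean gap between consecutive tokens, in seconds
    max_itl: float   # largest single gap (a mid-stream stall shows up here)


def latency_stats(request_start: float, token_times: Sequence[float]) -> LatencyStats:
    """Derive TTFT and inter-token latency from token arrival timestamps."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    # Gaps between consecutive arrivals. The wait for the first token is TTFT, not ITL.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    if not gaps:  # a single token has no inter-token gap
        return LatencyStats(ttft, 0.0, 0.0)
    return LatencyStats(ttft, mean(gaps), max(gaps))


def measure(make_stream: Callable[[], Iterable[str]]) -> LatencyStats:
    """Time a streaming call.

    make_stream is a hypothetical zero-argument callable that sends the request
    and yields chunks. Each chunk is treated as one token (an assumption:
    real APIs may pack several tokens into one chunk).
    """
    start = time.perf_counter()  # monotonic clock, read before the request is sent
    arrivals = [time.perf_counter() for _ in make_stream()]  # one timestamp per chunk
    return latency_stats(start, arrivals)
```

For example, `latency_stats(0.0, [0.5, 0.75, 1.0, 1.25])` gives a TTFT of 0.5 s and a mean inter-token latency of 0.25 s. Reporting the maximum gap alongside the mean is a deliberate choice: a single long stall can hide behind an otherwise fast average.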