Most "chat with your documents" demos work in an afternoon. Then you hit the last
20%: retrieval that misses the right passage, an LLM that confidently makes things
up, a reranker that wrecks your latency, chunking you re-tune ten times. And if
your documents are sensitive (legal, medical, internal), you can't just paste
them into a cloud API.