Most "chat with your documents" demos work in an afternoon. Then you hit the last
20%: retrieval that misses the right passage, an LLM that confidently makes things
up, a reranker that wrecks your latency, chunking you re-tune ten times. And if
your documents are sensitive — legal, medical, internal — you can't just paste
them into a cloud API.






