This article covers the third layer of the full-stack architecture: the Hybrid Retrieval Layer. Core engineering challenge: general-purpose embedding models drift on domain-specific terminology, and single-path vector retrieval cannot distinguish fine-grained semantic differences.
0. The Pain Point
Part 1 built the knowledge base. Part 2 handled chunking. The first version of the system used text-embedding-ada-002 for retrieval — OpenAI's most mainstream embedding model at the time.
The results:
Recall rate: 82% — 18% of relevant content simply wasn't found






