A founder we work with had been debugging a confusing failure for two weeks. The internal AI assistant could not find content about "GPT-4o pricing" even though the exact phrase appeared in three different chunks in the vector database. Asking for "OpenAI model costs" returned the right results. Asking for "GPT-4o pricing" returned a discussion of GPT-3 from eighteen months ago.
Every vector-only RAG system is prone to this failure mode. The team was about to swap embedding models. The actual fix was three lines of configuration.
What vector search is actually good at
Dense vector search is excellent at semantic similarity. "How do I cancel my subscription" matches a document titled "Account termination procedure" because the embedding model has learned that the two phrases are semantically close. Synonyms, paraphrases, and cross-language matches all work because the model has seen enough text to know they mean the same thing.
What dense vectors are not good at is exact-token matching for terms the embedding model has seen rarely, or has seen only in different contexts. "GPT-4o" is a recent product name that probably did not exist when the embedding model was trained. The model treats it as a sequence of subword tokens and has no strong concept of it as a unit. As a result, the cosine distance between the query "GPT-4o pricing" and a chunk about "GPT-3 cost" can easily be smaller than the distance between that query and a chunk that contains the exact phrase "GPT-4o pricing" but surrounds it with very different context.
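To make the dilution effect concrete, here is a toy sketch in Python. It is not how a real embedding model works: the three-dimensional token vectors, the pre-split tokens, and the mean-pooling `embed` function are all assumptions invented for illustration. The only point is that a chunk's vector reflects everything in the chunk, so an exact phrase can be outweighed by its surroundings.

```python
import math

# Hypothetical hand-made token vectors: [model family, price, other topics].
# A real model learns hundreds of dimensions; these values are made up.
TOKEN_VECTORS = {
    "gpt": (1.0, 0.0, 0.0),
    "4o": (0.2, 0.0, 0.2),      # rare subword piece, weak signal
    "3": (0.2, 0.0, 0.1),
    "pricing": (0.0, 1.0, 0.0),
    "cost": (0.0, 0.9, 0.1),    # near-synonym of "pricing"
    "kubernetes": (0.0, 0.0, 1.0),
    "migration": (0.0, 0.0, 1.0),
    "latency": (0.0, 0.0, 1.0),
    "deploy": (0.0, 0.0, 1.0),
}


def embed(tokens):
    """Mean-pool the token vectors (a stand-in for a real embedding model)."""
    dims = len(next(iter(TOKEN_VECTORS.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, value in enumerate(TOKEN_VECTORS[token]):
            total[i] += value
    return [x / len(tokens) for x in total]


def cosine_similarity(a, b):
    """Cosine similarity; cosine distance is 1 minus this value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = ["gpt", "4o", "pricing"]
# Short, old chunk that is only about GPT-3 cost.
old_chunk = ["gpt", "3", "cost"]
# Chunk with the exact phrase, buried in unrelated context.
exact_chunk = ["kubernetes", "migration", "gpt", "4o", "pricing", "latency", "deploy"]

sim_old = cosine_similarity(embed(query), embed(old_chunk))
sim_exact = cosine_similarity(embed(query), embed(exact_chunk))

print(f"query vs GPT-3 chunk:        {sim_old:.3f}")
print(f"query vs exact-phrase chunk: {sim_exact:.3f}")
```

With these made-up vectors the GPT-3 chunk scores higher than the chunk containing the exact phrase, which is the ranking the founder saw. A real model is far more nuanced, but the same pull from surrounding context applies.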













