The token economy is here. And as soon as you have an economy, people start budgeting. So it’s no surprise that most conversations about AI now come back to the token budget. Teams are being asked to show the value of their inference spend (inference being the action phase of AI) and to justify the cost. The answer traces back to the data that was retrieved and fed to the LLM. This step is intuitively referred to as “retrieval.”

What retrieval feeds has changed, though. In the RAG era, retrieval fed a model that generated an answer, and getting it wrong meant a hallucination the user would likely catch. In the agentic era, the system retrieves, reasons about what it finds, and decides to act. Wrong retrieval becomes a wrong step, a retry, and a fresh round of tokens, and that cost accumulates across the whole trajectory instead of ending at a single response.

So, inaccurate retrieval is expensive in two directions at once. It produces worse actions that erode the trust required to put an agent into production, and it burns tokens twice: on the padded context it sends up front, and again on the retries when the answer still misses. Both failures start with retrieval.

McKinsey’s 2026 AI Trust Maturity Survey* found that 74% of organizations identify inaccuracy as an AI risk, the most cited concern as AI adoption scales. The practitioners I talk to have already connected inaccuracy to the budget conversation, because better retrieval means fewer retry loops, lower token consumption, and less latency, and those things compound quickly at production scale.

Accurate retrieval is what makes AI trustworthy and efficient in production. The teams that get this right spend less time debugging wrong answers and less money on retry loops. It's the reason we brought Voyage AI into MongoDB last year. And today at MongoDB.local Bengaluru, we launched three new capabilities that put that accuracy closer to where the data lives: voyage-context-4, Hybrid Search, and Native Reranking in MongoDB Atlas.
Together, they help developers retrieve the right information the first time, so the agents built on them act on current, relevant data and spend fewer tokens getting there.

Retrieval that understands the full document context

Most AI applications that search over documents start by splitting them into smaller passages that an embedding model can process. The problem is that most embedding models treat each chunk in isolation, embedding it with no awareness of what came before or after it in the source document. For short, self-contained content that's usually fine, but enterprises don’t only work with short, self-contained content. Legal contracts, research reports, regulatory filings, and earnings transcripts are documents where meaning accumulates across pages, and pulling a passage without its surrounding context produces a retrieval result that is technically relevant but substantively incomplete.

voyage-context-4, generally available today, solves this at the model level by reading each chunk in the context of the full document before generating its embedding, so the vector representation carries the meaning of that passage as it exists within the whole. It slots into existing RAG pipelines without requiring teams to re-architect anything, which means the accuracy improvement arrives without the cost of a migration effort.

Search that combines meaning and precision, in one query

High-quality embeddings determine how well the system understands content. But the questions users actually ask rarely fit into a single retrieval mode. A single query may reference an exact product name, an error code, or a clause number alongside a conceptual question about what it means. Vector search alone can miss the exact term. Keyword search alone misses the intent.
Hybrid search runs both and fuses the results into a single ranked list, and it has quietly become the standard pattern for production retrieval.

Hybrid Search in MongoDB is now generally available. Two new aggregation stages, $rankFusion for a reliable default and $scoreFusion for finer control, combine full-text and vector search in a single query, composable with the rest of the aggregation pipeline developers already use. There is no separate search engine to deploy, no fusion logic to hand-write, and no second copy of the data to keep current.

That last point is the one that matters most for accuracy. When retrieval runs in a system that sits beside the database, every update to the data must cross a synchronization boundary before the search reflects it, and that gap is where agents pick up stale context. Hybrid search in MongoDB runs on live operational data, with embeddings generated and kept up to date automatically as documents change. The retrieval result reflects the data as it is now, which, for an agent acting on what it retrieves, is the difference between an answer grounded in the current state and one built confidently on an old copy.

Native Reranking, built into the pipeline

Reranking consistently improves retrieval quality, but most teams treat it as a nice-to-have rather than a default. Even with strong first-stage retrieval, results are often ranked by similarity rather than relevance, padding the context with passages that aren't useful. Native reranking re-scores results with a more sophisticated model, trimming the retrieval set to what actually matters before it reaches the LLM.

Native reranking lifts retrieval quality by up to 30% out of the box**, which lowers the barrier to achieving relevant search results and accelerates development cycles. What gets less attention is the cost it removes.
Every passage that reaches the model is one that it has to read and reason over on expensive GPU compute, and that cost scales with how much you feed it. Trimming the irrelevant passages before they reach your models cuts down on how much reasoning the model has to do, so you stop paying frontier-model rates to reason over context that was unlikely to matter.

Despite those gains, the integration overhead has made reranking hard to justify: a separate vendor, a separate API key, additional round-trips outside the database, and another pipeline to monitor. For teams moving fast, that's often enough friction to skip it.

Native Reranking in MongoDB Atlas, now in public preview, makes that trade-off unnecessary. A new $rerank aggregation stage powered by Voyage AI re-scores search results within the MongoDB aggregation pipeline, with no additional APIs or pipelines to manage. Teams get both the accuracy improvement and downstream compute savings as a native part of their existing query.

The accuracy gains from better embedding models show up fastest where the requirements are hardest. Tenali AI builds in-call intelligence for enterprise sales teams, searching unstructured data across Slack, CRMs, PDFs, and past meeting transcripts to surface the right answer in under one second, while the conversation is still happening. After consolidating onto MongoDB Atlas and implementing Voyage embedding models, Tenali cut retrieval latency by 67% and saw accuracy show up in user behavior: active use of Tenali's answers by sales reps on live calls rose by over 40%, the clearest signal that reps trust the answers enough to use them in front of a buyer. As Aniket Patel, Founder of Tenali AI, put it: "Voyage AI provides the accuracy and performance we need to deliver answers in under a second while the conversation is still happening."
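
To make the fusion step behind hybrid search concrete, here is a minimal Python sketch of reciprocal rank fusion (RRF), a widely used way to merge ranked lists by rank position rather than raw score. It illustrates the general technique only and is not MongoDB's implementation of $rankFusion; the function name, the k=60 constant, and the document IDs are all assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption
    here, not a documented MongoDB setting).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrieval modes for the same query.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic (vector) search
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # full-text (keyword) search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

In this toy input, doc_a and doc_c rise to the top because both modes returned them, while doc_b and doc_d, each found by only one mode, fall to the bottom. That is the behavior the post describes: exact-term matches and intent matches reinforce each other in a single ranked list.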

Putting accuracy where the data lives

Each of the capabilities announced today reflects the same architectural principle. Retrieval accuracy can degrade when data has to cross from one system to another. Handoffs between the database and a separate search system can create opportunities for the two to fall out of sync. Embeddings fall behind when data updates. Pipelines break quietly, and the failure shows up as wrong answers in production long before anyone traces it back to a stale embedding or a misconfigured reranker.

When these components live inside the database, they stay synchronized with the data by default, which means teams spend less time monitoring retrieval infrastructure and more time building. The system is designed to get more accurate as the underlying models improve without requiring teams to rewire anything. And because accuracy reduces the retries that drive up costs, improving the data layer improves the efficiency of the whole system, not just the quality of individual answers.

Retrieval accuracy is where production AI gets decided, and what we shipped today is MongoDB's contribution to making that easier to get right, and to keep right, wherever you're building.

Accurate retrieval only creates value if you can run it where your data lives. For regulated enterprises, that means on-premises or private cloud, and until today, that meant giving something up. My colleague Ben Cefalo covers what we shipped to close that gap: Run AI Wherever Your Compliance Framework Demands.

*Data from McKinsey’s 2026 AI Trust Maturity Survey

**Based on Voyage instruction-following rerankers on the MAIR benchmark; improvement measured over first-stage retrieval.