by The Databricks AI Research Team
Today we’re announcing a major update that makes Agent Bricks Knowledge Assistant both faster and higher quality. Answer generation is now 2x faster and search is more than 3x faster, bringing Time To First Token (TTFT) down to around two seconds.¹ Knowledge Assistant users will get noticeably faster answers across their use cases, with no reconfiguration required and no loss in quality.
These gains are powered by Instructed-Retriever-1, a retrieval-specialized model built for parallel test-time scaling. In standard agentic retrieval, an agent works sequentially, reasoning over each result before deciding its next step. Our approach instead fans this work out in parallel. Instructed-Retriever-1 is a single model trained for both retrieval stages: query generation, which increases recall, and reranking, which increases precision. Each stage runs in parallel to keep latency low. In this post, we describe how this approach achieves a Pareto-optimal tradeoff between quality and latency, how we train one model to support the full retrieval pipeline, and how we validate performance on realistic enterprise workloads.
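To make the two-stage, fan-out pattern concrete, here is a minimal sketch under stated assumptions. It is not Instructed-Retriever-1's implementation or API, which this post does not describe. The toy corpus and the functions `generate_queries`, `search`, and `rerank_score` are hypothetical stand-ins: where the real system uses one trained model for query generation and reranking, this sketch substitutes simple keyword heuristics (word splitting for query variants, term overlap for search, and Jaccard similarity for reranking). What it does show is the control flow: every query variant is searched concurrently, the candidates are merged, and then every candidate is scored concurrently, with no sequential reason-then-act loop.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for an enterprise document index.
CORPUS = {
    "d1": "How to reset a workspace password",
    "d2": "Quarterly revenue report for 2023",
    "d3": "Password policy and rotation rules",
    "d4": "Onboarding guide for new engineers",
}


def generate_queries(question: str) -> list:
    """Hypothetical query-generation stage (recall).

    Stand-in heuristic: the full question plus each longer word on its own.
    A trained model would instead write semantically diverse rewrites.
    """
    words = [w for w in question.lower().split() if len(w) > 3]
    return [question.lower()] + words


def search(query: str, k: int = 2) -> list:
    """Hypothetical search call: rank documents by shared-term count."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best overlap first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def rerank_score(question: str, doc_id: str) -> float:
    """Hypothetical reranking stage (precision): Jaccard similarity of word sets."""
    q = set(question.lower().split())
    d = set(CORPUS[doc_id].lower().split())
    return len(q & d) / len(q | d)


def retrieve(question: str, top_k: int = 2) -> list:
    queries = generate_queries(question)
    with ThreadPoolExecutor() as pool:
        # Stage 1: issue one search per query variant, all at once.
        result_lists = list(pool.map(search, queries))
        # Merge and de-duplicate candidates, keeping first-seen order.
        candidates = list(dict.fromkeys(d for results in result_lists for d in results))
        # Stage 2: score every candidate concurrently instead of one at a time.
        scores = list(pool.map(lambda d: rerank_score(question, d), candidates))
    ranked = sorted(zip(scores, candidates), key=lambda t: (-t[0], t[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


if __name__ == "__main__":
    print(retrieve("reset password"))  # ['d1', 'd3']
```

In a real deployment, the calls inside each `pool.map` would be model or index requests, which are typically I/O-bound. Under that assumption, wall-clock latency per stage is roughly that of the slowest single call rather than the sum of all calls, which is the intuition behind running each stage in parallel.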







