Better Models Won’t Save Your Agent | Pinecone

Anyone building agents in production today has run into the same wall. The model itself is rarely the limiting factor, since frontier models are capable of the reasoning most jobs require. What breaks is everything before the reasoning step. The agent gets a task, decides it needs information, searches, retrieves, evaluates results, decides it needs more, searches again, reads, pieces together a partial picture, and loops. By the time the model is in a position to produce an answer, most of the token and latency budget is already gone.

This is the gap that defines agent infrastructure right now. The discipline that's emerged around this problem is context engineering: shaping data into knowledge the model can use, instead of asking the agent to reassemble it from raw data at query time.

Operationalizing these context pipelines is where teams get stuck, especially across a real company where the context needed for every domain (sales, legal, finance, support, R&D, ops) is shaped differently. Hand-building one context layer per domain doesn't scale past the first one or two.

We spent the last year working on this. The rest of this post is about why it's hard, what we built, and what we think comes next.

A concrete example: the market-intelligence agent

Let’s consider a market-intelligence agent at an investment firm looking over S&P 500 10-K filings. The following question is one of dozens the agent needs to answer:

“Among NVIDIA, Microsoft, and Walmart, compare the fiscal 2022 share repurchase activity disclosed in each 10-K. For each company, state (a) the dollar amount of repurchases during the fiscal year and the share count repurchased, (b) the original program authorization size and approval date if disclosed, and (c) the remaining authorization as of the company's fiscal year-end.”

For this agent to ship to production, the context layer needs to clear four requirements:

Accuracy: Right answers, repeatable across runs.
A flaky agent that's correct 70% of the time is non-functional in practice.

Task latency: Query budgets in seconds, not tens of seconds or minutes.

Token cost: Per-call cost is bounded; agent bills don't compound across the workflow.

Governance: Field-level permission enforcement and grounded provenance so that answers can be traced back to their source.

Satisfying all four simultaneously is harder than it sounds. When a company sets out to build a context layer for an agent workload like this, it typically dedicates a team and months of iteration to one of two patterns:

Agentic RAG: Chunk the 10-K corpus, embed the chunks, and use hybrid retrieval. Let the agent loop: run the query, rerank, read the top chunks, and repeat until it’s satisfied with the answer.

Coding Agent in a sandbox: Give the agent access to file-list, page-read, grep, and full-doc-read tools and let it loop. It opens each 10-K, navigates to the capital returns section, parses the table, and extracts the answer.

While both approaches may eventually get the right answer, they are often far too slow and expensive to put into production. Both suffer from the same underlying challenge: they make the agent assemble knowledge for each task at query time. Agentic RAG hands the agent chunks and asks it to stitch the answer together. The sandbox approach hands the agent files and asks it to navigate to the answer through search, grep, and parsing. In both, the vast majority of the effort goes to retrieving raw data and assembling the right context rather than reasoning.

From Hand-Engineered Context to Compiled Knowledge

The solution for problems like this is well-known: don't make the consumer derive structure per query. Pre-shape the data into artifacts that already encode the structure consumers care about, and serve those.

This isn't new. Knowledge graphs, entity catalogs, and semantic layers have existed for decades.
Every generation of data infrastructure has shipped some version of the same instinct: do the orientation work once, store the result, let downstream consumers read it directly. Context engineering is the latest version of that instinct, now applied to agents instead of dashboards.

Where it breaks: operationalizing per domain

What's hard, though, isn't the concept. It's operationalizing it.

Building a good artifact layer for one domain takes a sophisticated team and months of iteration to settle on the curation strategy, retrieval design, evaluation harness, and governance hooks. The complication is that a real company doesn't have one domain. It has dozens (e.g., sales, customer support, legal, finance, R&D), each with its own data shapes, schemas, dialects, and access patterns.

Multiply months of iteration by every domain that wants an agent and you’ll quickly run out of resources building these pipelines. In practice, the layer gets built for the one or two highest-value domains at most, or it doesn't get built at all.

This is the core problem of the agentic era, in which every domain in a company will run agents and every agent needs engineered context before it can ship.

A new category of knowledge infrastructure

This problem points to the need for a new category of knowledge infrastructure: one where the context layer operates as infrastructure, automatic across domains, rather than hand-tuned and constructed per domain. The layer exists and you provision against it; you don't rebuild it from scratch every time you have a new use case.

We've spent the last year building one. It’s called Pinecone Nexus, a purpose-built Knowledge Engine for agents.

Inside a Knowledge Engine

A Knowledge Engine is built from four primitives, each composed from the one before it:

Artifact: A typed, governed piece of information constructed for a specific task or outcome. From the same 10-K data, a market-intelligence agent that wants financial metrics (e.g.
revenue, capital gains) will get a different artifact than a compliance agent that wants risk-factor disclosures. This task-specific shape is what keeps the underlying representation efficient and tuned to each agent's job.

Context: A curated set of artifacts designed for a specific role, team, or workflow. We bundle the analyst's financial-metric artifacts together with the narrative sections their agent needs (MD&A, segment reporting), and that bundle is the analyst's context. The compliance team has its own.

Knowledge: The collective body of every context across the company, representing how the business is run across analyst, compliance, M&A, portfolio monitoring, and so on. A query against Knowledge can span as many contexts as it needs; the engine handles the routing.

Knowledge Engine: The system that builds and serves all of the above. Its core is the Context Compiler, an autonomous coding agent that writes and tunes the curation and query code for each domain. Once the build loop completes, the engine constructs artifacts from raw data, composes them into contexts, and serves each agent's KnowQL queries.

Figure: An example company-level artifact compiled from the SEC 10-K filings for the market-intelligence agent.

The Context Compiler

The Context Compiler is the autonomous coding agent at the core of the Knowledge Engine. It uses an agentic harness pattern to construct task-optimized Contexts by pairing a coding agent with three things:

An eval set you define per domain (representative tasks with known right answers), with corresponding data sources.

A library of pre-vetted skills (e.g., document processing, entity extraction, chunking) the agent can compose into solutions.

A feedback loop that scores each iteration against the eval signal.

With every iteration, the coding agent modifies two functions, curate() for artifact construction and query() for knowledge retrieval, runs the eval set, uses the failure signal to refine the code, and repeats until the evals pass.
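
The loop described above can be pictured with a minimal Python sketch. This is not Pinecone's implementation: `EvalCase`, `Candidate`, `compile_context`, and `toy_propose` are hypothetical names, the `propose` callback merely stands in for the coding agent, the `curate`/`query` signatures are assumptions, and the ACME/GLOBEX documents are made-up toy data.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Artifacts = dict[str, str]  # assumed shape: artifact key -> value


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str  # the known right answer for this task


@dataclass(frozen=True)
class Candidate:
    """One iteration's pair of functions (hypothetical signatures)."""
    name: str
    curate: Callable[[list[str]], Artifacts]   # raw docs -> artifacts
    query: Callable[[Artifacts, str], str]     # artifacts + question -> answer


# Stand-in type for the coding agent: given the previous candidate and the
# eval cases it failed, return a new candidate.
Propose = Callable[[Optional[Candidate], list[EvalCase]], Candidate]


def failing_cases(candidate: Candidate, raw_docs: list[str],
                  evals: list[EvalCase]) -> list[EvalCase]:
    """Run the eval set and return the cases the candidate gets wrong."""
    artifacts = candidate.curate(raw_docs)
    return [case for case in evals
            if candidate.query(artifacts, case.question) != case.expected]


def compile_context(propose: Propose, raw_docs: list[str],
                    evals: list[EvalCase], max_iters: int = 5):
    """Iterate until every eval passes; return (candidate, iterations used)."""
    candidate = propose(None, [])
    for iteration in range(1, max_iters + 1):
        failures = failing_cases(candidate, raw_docs, evals)
        if not failures:
            return candidate, iteration
        candidate = propose(candidate, failures)  # refine using failure signal
    raise RuntimeError("evals still failing after max_iters iterations")


def toy_propose(previous: Optional[Candidate],
                failures: list[EvalCase]) -> Candidate:
    """Toy 'agent': first keeps raw docs, then switches to per-company artifacts."""
    if previous is None:
        return Candidate(
            name="raw-docs",
            curate=lambda docs: {str(i): d for i, d in enumerate(docs)},
            query=lambda arts, q: next(iter(arts.values()), ""),
        )

    def curate(docs: list[str]) -> Artifacts:
        out: Artifacts = {}
        for doc in docs:
            company, _verb, count, _unit = doc.split()
            out[company] = count
        return out

    return Candidate("per-company", curate, lambda arts, q: arts.get(q, ""))


RAW_DOCS = ["ACME repurchased 5 shares", "GLOBEX repurchased 7 shares"]
EVALS = [EvalCase("ACME", "5"), EvalCase("GLOBEX", "7")]

winner, iterations = compile_context(toy_propose, RAW_DOCS, EVALS)
print(winner.name, iterations)  # prints: per-company 2
```

In this toy run the first candidate fails both evals because it answers with a whole document; the second restructures the artifacts around the entity the questions ask about and passes. The real system presumably searches a far larger space of artifact shapes and skills, but the control flow (propose, evaluate, refine) is the part the sketch is meant to show.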
The output is a working, tuned Context for that domain.

With this approach, any domain expert (even without a retrieval background) can produce an agent-optimized Context, since they don’t need to specify schemas, retrieval logic, or artifact shapes upfront. The Context Compiler automatically discovers the right artifact structure, granularity, and construction strategy based on the evals. Most new domains are served by recombining existing skills in new ways; when something genuinely doesn't fit, we add a new skill to the library.

In our work with early design partners, the compiler delivered Contexts for new domains in days rather than months. While we're still measuring across more domains and edge cases, the early signal has been promising, and we believe this harness-based agentic approach is the foundation for the future of knowledge infrastructure.

KnowQL

Once the Context is created, the next step is ensuring the agent can use it effectively. If the agent has to issue a paragraph-level natural-language query and parse a blob of text back, the earlier failures come right back as the agent burns time and tokens re-orienting on every call. We wanted an interface where the agent declares what it needs and gets a precise, typed, cited response back. That's KnowQL (Knowledge Query Language).

The "declarative" part is the core design principle. In SQL, you describe what you want (e.g., joins, filters, projections) and the engine picks the execution plan. KnowQL is the same idea applied to agentic knowledge retrieval: the agent specifies what answer it needs, in what shape, with what constraints. The Knowledge Engine decides which Contexts to search, which artifacts to read, and how to compose them.

A KnowQL query is composed of four categories, which together ensure it meets the production requirements for agents:

Intent: The question, the response shape, and the Contexts in scope.
Note that the scope can span multiple Contexts.

Filter: Deterministic predicates and access-control policies enforced at the surface. The agent only sees what its caller is permitted to see.

Provenance: Field-level citations returned by construction, not reconstructed afterward. Every value carries its source.

Control: A budget envelope (depth and latency target). Cost is declared in outcomes, not tokens.

For the earlier S&P 500 10-K question, the KnowQL query the agent issues looks like this:

{
