Inside AskData: How We Slashed Token Consumption by Over 90%

Data underwrites how products evolve and how companies make decisions. But pulling answers out of a data warehouse quickly, correctly, and with the right business context baked in is still harder than it should be.

As Pinecone grew into a multi-product, multi-channel business, static dashboards stopped being enough. The questions that actually drive decisions, about pipeline health, retention risk, product adoption, and revenue mix, rarely fit neatly into a pre-built view. Analysts became the bottleneck. Ad-hoc questions went unasked. Decisions got made on stale numbers or gut feel.

To close that gap, we built AskData: an in-house AI data agent that explores and reasons over our warehouse, informed by the accumulated knowledge of how our business actually operates. That knowledge is scattered across Slack threads, call transcripts, CRM records, billing systems, and internal documents. Surfacing all of it alongside structured warehouse data is what separates a query runner from an agent people actually trust.

In May 2026, we rebuilt AskData on top of Pinecone Nexus. Token usage dropped by over 92%. Query turns dropped by 78%.

This is the story of how we got there.

The Last Mile is a Knowledge Problem

Pinecone's data stack is a pretty standard setup. Events land in BigQuery. dbt transforms them. Mart tables feed the dashboards on top.

The pipelines work. The dashboards work. The gap is the last mile: translating questions phrased in business vocabulary into the right table, the right column, the right filter, and the right caveat, and doing it at scale.

That is a knowledge problem, not a data problem.

The data lives in the warehouse. The meaning of the data lives somewhere else. Which view is canonical for ARR? How much lag does each metric have? Which accounts should be filtered out? Did the definition change last quarter? None of that is discoverable from schema inspection.

For analysts who work in the warehouse every day, navigating this is just part of the job.
For everyone else, the cost of self-service is high enough that most ad-hoc questions never get asked at all. Decisions get made on stale dashboards or gut feel, and analysts become the bottleneck for every cross-functional question.

That's the gap AskData had to close.

V0: Throw it into Claude/Cursor and see what happens

The obvious starting point was to wire a set of tools to BigQuery, dbt, and a few internal docs and hand them directly to local coding agents like Claude or Cursor. Many internal users tried this in late 2025.

The agent loop itself was not the problem. Given enough business context, fed in by hand each session, the coding agent could read SQL, reason about transformations, and replicate metric definitions well enough. The problem was everything else.

Same question, different answers. Two people asking the same question would walk away with different SQL, different filters, and different numbers. Business metrics are meant to drive decisions and align mental models; when the agent reporting them is inconsistent, decisions stop. We needed one canonical answer to "what does ARR mean" and a way for a correction caught by one person to reach the next person immediately. In short, we needed centralized knowledge management and an agent harness that could deliver consistency and reproducibility.

No shared learning. The questions users ask range from "what's last month's revenue" to "explain why this account is at risk and what to do about it." Those questions require very different levels of reasoning. Figuring out which model tier fits which kind of question is challenging work you want to do once, for everyone.

No feedback loop. Without centralized tooling, there was no eval to run regressions against, no production observability, and no signal about which questions were tripping up the agent.

Orientation tax on every session. Each fresh session starts cold. Schemas alone do not carry business meaning, so the agent has to traverse the unstructured context (dbt code, Slack threads, analyst notes, query history) blindly and from scratch on every question, burning tokens on orientation before it can answer.

AskData V1: Building the Knowledge Layer

The agent loop wasn't the hard part. The hard part was the layer above the SQL, the one that holds what the SQL formulas and numbers actually mean for the business.

A traditional "semantic layer" holds schema and metric definitions: typically manually maintained descriptions of structured data that slowly drift out of sync with the business. What we needed was a knowledge layer that holds the unstructured context (Slack, Gong, dbt comments, docs) explaining why a metric is defined the way it is, and when that definition last changed.

That knowledge layer had to bridge the vocabulary gap between how people ask questions and how SQL expresses logic. Raw dbt SQL didn't work, because SQL encodes transformations, not meaning. The question "how many monthly active orgs do we have" embeds as a vector that has almost nothing in common with a count(distinct …) expression over an is_active flag.

Across business questions like "how is our ARR trending" or "did our service have any outages this month," LLM-summarized markdown describing the relevant warehouse table scored at least 2X higher cosine similarity to the question than the raw SQL that defines or queries the same table. Same data, different vocabulary. Embedding models alone couldn't close the gap.

So we started writing knowledge articles. First came a few high-quality, hand-written markdown files. Then came scripts that used LLMs to generate more articles from dbt models and query logs. Finally came a Curator agent whose only job was to investigate gaps and propose edits. By the time V1 stabilized, the knowledge base (KB) was 234 markdown files (18,000 lines) served by Pinecone Assistant.
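The 2X finding above came from embedding similarity. The toy below only sketches the intuition behind it, under one loud assumption: it swaps a bag-of-words term-count vector in for a real embedding model, so that the example stays self-contained. Its scores are not comparable to the numbers measured for AskData. The question, SQL, and summary strings are hypothetical. Under those assumptions, a plain-language summary shares far more vocabulary with the question than the SQL does.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased term counts: a crude lexical stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare against
    return dot / (norm_a * norm_b)


# Hypothetical inputs: one business question, SQL that answers it, and a
# plain-language summary of the same table.
question = "How many monthly active orgs do we have?"
raw_sql = "select count(distinct org_id) from orgs where is_active and month = current_month"
kb_summary = "Monthly active orgs: how many orgs had at least one active session in a month."

if __name__ == "__main__":
    q = bag_of_words(question)
    print("summary vs question:", round(cosine(q, bag_of_words(kb_summary)), 3))
    print("raw SQL vs question:", round(cosine(q, bag_of_words(raw_sql)), 3))
```

Identifiers are part of the gap. Here `is_active` never matches `active`, and `count(distinct ...)` carries no words from the question at all.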
Five additional retrieval surfaces (Slack threads, Gong calls, historical SQL, dbt source) ran on Pinecone vector indexes with integrated inference. Hosted embedding and reranking meant there was no embedding pipeline to manage, and the retrieval substrate was Pinecone end-to-end. The five ETL pipelines feeding it, however, hadn't been tuned beyond "it runs daily."

V1 launched in #ask-data. Three months in:

Stat                                      Value
Questions answered                        3,690
Slack channels with active threads        40
Follow-up runs (chained conversations)    ~49%
Average SQL queries per run               2.2
Questions per day (May 11)                191

The surprise wasn't the volume. It was the 49% follow-up rate. People were having conversations with the data: adjusting scope, drilling into a result, comparing cohorts. The bar for asking dropped, and the long tail of small questions started showing up. BI tools have spent years trying to close this gap with drill-downs and other ad-hoc exploration primitives.

A 24/7 data analytics agent that gets the basics right reshapes how a company decides.

Where V1 hit its limits

The retrieval substrate was Pinecone end-to-end, but the agent's view of it was anything but unified.

By the time V1 stabilized, the system had grown to:

- 22 tools across two agents (DataAgent + Curator).
- 6 dedicated retrieval surfaces (Pinecone Assistant + three Pinecone indexes + dbt file reads + historical SQL search).
- 1,300 lines of Airflow code syncing Slack, Gong, and BigQuery logs into Pinecone every day.
- 2,200 lines of Curator code maintaining 18,000 lines of hand-curated markdown across 234 files.
- A system prompt that grew with the agent. It explained when to use search_kb versus search_slack versus search_query_logs versus grep_dbt, how to dedupe results across them, and how to handle dbt's ref() macros that don't match a literal grep.

Each backend brought its own client, schema, embedding strategy, retry logic, and ETL pipeline. The Curator existed because the KB couldn't maintain itself.
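The post doesn't spell out exactly how ref() broke literal grep, so the following is an assumption about one plausible failure mode. Grepping for a model name like `fct_pipeline` also hits `fct_pipeline_weighted`. Grepping for `ref('fct_pipeline')` misses `ref("fct_pipeline")` or a variant with extra whitespace. The minimal sketch below parses ref() calls with a regex instead. The model names and helper functions are hypothetical. It handles only the one- and two-argument forms of dbt's `ref`.

```python
import re
from typing import Dict, List, Set

# Matches {{ ref('model') }} and {{ ref('package', 'model') }} with either
# quote style and any whitespace. Versioned refs such as ref('m', v=2) and
# refs inside {% ... %} blocks are NOT handled by this sketch.
REF_PATTERN = re.compile(
    r"""\{\{\s*ref\(\s*
        (?:['"](?P<package>[^'"]+)['"]\s*,\s*)?  # optional package argument
        ['"](?P<model>[^'"]+)['"]\s*             # model name
        \)\s*\}\}""",
    re.VERBOSE,
)


def referenced_models(sql: str) -> Set[str]:
    """Return the model names a dbt SQL file pulls in via ref()."""
    return {m.group("model") for m in REF_PATTERN.finditer(sql)}


def downstream_of(model: str, project: Dict[str, str]) -> List[str]:
    """Models whose SQL refs `model` exactly; `project` maps model name -> SQL."""
    return sorted(name for name, sql in project.items() if model in referenced_models(sql))


# Hypothetical three-model project.
project = {
    "stg_opportunities": "select * from {{ source('crm', 'opportunities') }}",
    "fct_pipeline": "select * from {{ ref('stg_opportunities') }} where is_qualified",
    "fct_pipeline_weighted": 'select * from {{ref("fct_pipeline")}}',
}

if __name__ == "__main__":
    # A literal grep for "fct_pipeline" would also hit fct_pipeline_weighted's
    # name; parsing ref() returns only the exact dependency edge.
    print(downstream_of("stg_opportunities", project))
    print(downstream_of("fct_pipeline", project))
```

A regex over Jinja is a shortcut. dbt's `manifest.json` artifact already records each node's upstream dependencies under `depends_on`, which avoids parsing templates at all.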
There was no layer underneath that compiled the parts into something coherent; cross-source synthesis happened at query time, by the agent, on every question.

That cost showed up in the token traces. A multi-part question ("what was the total pipeline amount, opportunity count, and weighted pipeline for opportunities qualified in January") took 9 steps and around 240,000 tokens. The agent fanned out across KB searches, dumped a 292-column schema JSON into context, re-searched twice to find the right date column, ran a DISTINCT query just to learn the vocabulary, and finally got the SQL right on its fourth attempt. Seven of those 9 steps were spent orienting (which table, which column, which filter) before the actual analysis could begin.

A compiler doesn't re-parse its source on every run. Without a knowledge layer underneath, the agent infrastructure was doing exactly that.

That's what Nexus had to fix.

What Nexus had to be

Pinecone Nexus was being designed in parallel with V1, and AskData was the first workload it had to support. A few requirements that came directly out of V1's pain stuck:

- One curation pipeline, many sources. A single managed system that takes structured, semi-structured, and unstructured inputs from every source and produces task-specific views and artifacts the agent can use in one call. For AskData, that meant natural-language-to-SQL semantics: which table, which column, which pattern. Not five ETL pipelines, not five retrieval surfaces. One.
- Adaptive knowledge representation. The artifact's schema and representation shouldn't require human design upfront. It should evolve organically with the task at hand, driven by the eval signal and the source data, not by a fixed ontology or a hand-authored template.
- Human-in-the-loop knowledge updates. The Curator agent's actual job (investigate a gap, propose an edit, get it reviewed) had to survive into Nexus, not as a separate agent but as a first-class feedback mechanism.

Nexus shipped against those requirements.
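A toy can make the compiler analogy and the "one call" requirement concrete. This is not Nexus's Context Compiler or KnowQL. It is a hypothetical sketch that only concatenates notes, whereas a real build step would synthesize them. It contrasts two approaches. The first is V1-style query-time fan-out, which pays one (simulated) retrieval round trip per source on every question. The second is a build step that merges per-table notes once, so that query time is a single lookup. The source names, table names, and notes are all made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: each source yields (table_name, note) pairs.
Sources = Dict[str, List[Tuple[str, str]]]


@dataclass
class TableContext:
    """Pre-assembled context for one warehouse table."""
    table: str
    notes: List[str] = field(default_factory=list)


def fan_out(table: str, sources: Sources) -> Tuple[List[str], int]:
    """V1-style: search every source on every question and stitch the results."""
    notes: List[str] = []
    calls = 0
    for name in sorted(sources):
        calls += 1  # simulated retrieval round trip, paid per source per question
        notes.extend(f"[{name}] {note}" for t, note in sources[name] if t == table)
    return notes, calls


def compile_context(sources: Sources) -> Dict[str, TableContext]:
    """Build step: merge notes per table once, ahead of any question."""
    compiled: Dict[str, TableContext] = {}
    for name in sorted(sources):
        for table, note in sources[name]:
            compiled.setdefault(table, TableContext(table)).notes.append(f"[{name}] {note}")
    return compiled


def lookup(table: str, compiled: Dict[str, TableContext]) -> Tuple[List[str], int]:
    """Query time: one lookup against the compiled artifact."""
    ctx: Optional[TableContext] = compiled.get(table)
    return (list(ctx.notes) if ctx else []), 1


# Hypothetical sources and notes, for illustration only.
sources: Sources = {
    "dbt": [("fct_pipeline", "Grain: one row per opportunity.")],
    "gong": [],
    "slack": [
        ("fct_pipeline", "Filter on qualified_date, not created_date, for cohorts."),
        ("dim_accounts", "Exclude internal test accounts."),
    ],
}

if __name__ == "__main__":
    compiled = compile_context(sources)
    print(fan_out("fct_pipeline", sources))
    print(lookup("fct_pipeline", compiled))
```

Both paths return the same notes; only the per-question cost differs. The trade-off is freshness: compiled context is only as current as its last build, so the build has to rerun as sources change.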
The architecture and primitives (Context Compiler, KnowQL, and the rest) are their own story; for that, see the Nexus deep dive.

The migration

The migration started with defining the eval, because Nexus's build loop needed clear retrieval outcomes to optimize toward. V1 had no clean contract. The agent simply drove context expansion across six tools and stitched the results together itself.

We built the eval from V1 production traces. Each question had a full call log: which tools the agent fired, what each tool returned, and which chunks the agent ultimately used to write SQL. For each question, we extracted the minimum context payload that would have let the agent get the SQL right on the first attempt. Those payloads became the expected outputs in the eval, and the eval set became the target that Nexus's build loop optimized toward.

Sample Eval Question:
{
