Data Provenance: The Trust Layer For Agentic AI

Gaurav Aggarwal, Senior Vice President at Onix, Global Head Presales & Solutions Engineering.

For the last two years, most AI conversations in the enterprise have started with the same question: What can the model do? It is a natural question. Generative AI helped teams write faster, search faster, summarize faster and automate pieces of work that once consumed hours.

But as AI moves from generating responses to taking action, I believe the more important question is changing. It is no longer only "What can the model do?" It is now "What data made the system act?"

That question brings us to data provenance. Data provenance is the story behind the data. Where did it come from? Who changed it? Which system touched it? Was it allowed to be used for this purpose? Which decision did it influence?

In a reporting world, poor provenance may lead to a wrong dashboard. In an agentic AI world, poor provenance can lead to a wrong action. That is where the risk becomes far more serious.

The New Risk Behind Autonomous AI

Generative AI gives an answer. Agentic AI can take a step.

An AI agent may retrieve information, call an API, interact with another agent, update a system or complete a workflow. In many enterprise use cases, multiple agents may work together: One gathers data, one interprets it and one acts.

That chain is only as trustworthy as the data moving through it. If the first agent pulls stale, incomplete or unauthorized data, the issue does not remain at the first step. It travels into the reasoning layer. Then it travels into the action layer. By the time someone notices, the enterprise may not be dealing with one bad output. It may be dealing with a bad decision already executed.

This is why data provenance cannot remain a back-office governance topic. It is becoming one of the most important control points for agentic AI.

The scale of this shift matters.
Gartner expects agentic AI to be embedded in 33% of enterprise software applications by 2028, compared with less than 1% in 2024. It also expects at least 15% of daily work decisions to be made autonomously by agentic AI by 2028.

Even if adoption moves more slowly than expected, the direction is clear: More enterprise decisions will depend on whether the underlying data can be trusted.

The Provenance Gap Most Leaders Miss

Many organizations think they are AI-ready because they have dashboards, access controls and lineage tools. These are important, but they are not enough. Lineage shows where data moved. Provenance goes further: It asks whether the data was clean, approved for AI use, transformed correctly and masked properly, and which agent used it to take action.

That is where many AI programs become fragile. In one enterprise modernization discussion I was part of, leaders were excited about co-pilots, predictive insights and intelligent workflows. But once we moved from ambition to architecture, the real issue became clear. It was not the model. It was the data foundation. Different teams had different versions of the same customer and product data. Some datasets had no clear ownership, and some reports were trusted only because they had been used for years.

That experience stayed with me. Enterprise AI does not fail only because the model is weak. It fails when the organization cannot explain the data behind the decision. Before enterprises scale autonomy, they need to scale accountability.

The Evidence Behind The Provenance Wake-Up Call

The concern is not theoretical.

Gartner’s projection suggests that agentic AI is moving quickly into mainstream enterprise software. That means data issues that were once contained inside analytics or reporting may soon sit inside automated workflows and decision loops.

IBM’s 2025 breach research points to a similar pattern from a security angle: Many organizations are adopting AI faster than they are building the controls around it.
IBM reports that 63% of breached organizations studied did not yet have an AI governance policy.

For data provenance, that matters because ungoverned AI often means undocumented data paths. If teams use unofficial tools, informal data extracts or unapproved retrieval sources, the enterprise may lose the ability to answer a basic question later: What data led to this action?

Regulation is also moving toward greater transparency. In July 2025, the European Commission published a template for general-purpose AI model providers to summarize training content under the AI Act framework. The broader signal is clear: Leaders will increasingly be expected to explain not only what an AI system did but also what data shaped its behavior.

The Provenance Playbook For Leaders

1. Control how agents access data. Agents should not pull information through informal or unknown paths. They need governed access layers.

2. Keep a record of agent activity. Which agent accessed which dataset? When did it happen? What changed afterward? These records should be difficult to alter and easy to review.

3. Map data dependencies around AI. This includes training data, retrieval sources, third-party data, synthetic data and agent-to-agent exchanges.

4. Turn policy into execution. A PDF policy will not keep pace with autonomous workflows. Wherever possible, rules should be built into systems themselves.

5. Take ownership. Someone must be responsible for the quality of data an agent uses, the sources it can retrieve from and the handoffs it makes to other agents. Without clear ownership, provenance becomes everyone’s concern and no one’s responsibility.

The Leadership Takeaway

In the agentic AI era, the biggest risk may not be a bad model. It may be good-looking automation built on data no one can fully explain.

The organizations that lead will not simply deploy the most agents. They will be the ones that can answer (with confidence): Where did the data come from? Was it allowed to be used?
What changed along the way? Which decision did it influence?

That is the real value of data provenance. Not more paperwork. Not another governance layer. But the confidence to let intelligent systems act because the data behind them can be trusted.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
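The playbook is written for leaders, but several of its steps map onto fairly ordinary engineering patterns: a governed access layer (step 1), a tamper-evident activity record (step 2), rules expressed in code (step 4) and a check for an accountable owner (step 5). The Python sketch below is a rough illustration only. It does not describe any specific product or the author's own implementation.

It rests on several assumptions:

* All names (`ProvenanceRecord`, `check_policy`, `AuditLog`, `GovernedGateway`) are hypothetical.
* The four policy rules are example rules, not a standard.
* The log uses a SHA-256 hash chain. This is one common tamper-evidence technique among several.

Agents read data only through the gateway. Each request is checked against the rules and recorded in the log, and the log can later answer "What data led to this action?"

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-dataset metadata kept by the access layer."""
    dataset_id: str
    source: str
    owner: str | None        # step 5: an accountable owner, or None if nobody owns it
    approved_for_ai: bool    # was this dataset cleared for AI use at all?
    masked: bool             # were sensitive fields masked before exposure?
    allowed_purposes: frozenset = frozenset()


def check_policy(rec: ProvenanceRecord, purpose: str) -> str | None:
    """Step 4, policy as code (example rules): return a denial reason, or None to allow."""
    if rec.owner is None:
        return "no accountable owner"
    if not rec.approved_for_ai:
        return "not approved for AI use"
    if not rec.masked:
        return "sensitive fields not masked"
    if purpose not in rec.allowed_purposes:
        return f"purpose '{purpose}' not allowed"
    return None


class AuditLog:
    """Step 2: append-only log in which each entry stores the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash a canonical JSON form of every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"ts": datetime.now(timezone.utc).isoformat(), "prev": prev, **event}
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)

    def verify(self) -> bool:
        # Editing an earlier entry without re-hashing the chain makes this return False.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True


class GovernedGateway:
    """Step 1: the single path through which agents read data and record actions."""

    def __init__(self, catalog: dict[str, ProvenanceRecord], log: AuditLog) -> None:
        self.catalog = catalog
        self.log = log

    def read(self, agent: str, dataset_id: str, purpose: str) -> bool:
        rec = self.catalog.get(dataset_id)
        reason = "unknown dataset" if rec is None else check_policy(rec, purpose)
        # Both allowed and denied requests are logged.
        self.log.append({"type": "read", "agent": agent, "dataset": dataset_id,
                         "purpose": purpose, "allowed": reason is None, "reason": reason})
        return reason is None

    def act(self, agent: str, action: str, inputs: list[str]) -> None:
        # An agent may act only on datasets it was allowed to read through this gateway.
        granted = {e["dataset"] for e in self.log.entries
                   if e["type"] == "read" and e["agent"] == agent and e["allowed"]}
        missing = [d for d in inputs if d not in granted]
        if missing:
            raise PermissionError(f"{agent} acted on ungoverned data: {missing}")
        self.log.append({"type": "action", "agent": agent, "action": action, "inputs": inputs})

    def datasets_behind(self, action: str) -> list[str]:
        """Answer 'What data led to this action?' for the latest matching action."""
        for entry in reversed(self.log.entries):
            if entry["type"] == "action" and entry["action"] == action:
                return entry["inputs"]
        return []
```

Here is how it behaves in practice:

* A CRM dataset registered with an owner, AI approval, masking and an allowed purpose can be read by a "gatherer" agent for that purpose.
* An unowned spreadsheet extract is refused, and the refusal is itself logged.
* An agent that tries to act on data it never obtained through the gateway gets a `PermissionError`.

One limitation matters. Anyone who can rewrite the entire list can also recompute every hash. A real deployment would therefore also need write-once storage, or would need to publish the latest hash somewhere the agents cannot modify.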
