Lev Yatsemyrskyi, Quantitative Technology Director at Qube Research & Technologies. Views are his own.

The numbers from the past couple of years should give every technology leader pause. According to RAND Corporation research, more than 80% of AI projects fail to reach meaningful production deployment, twice the failure rate of traditional IT projects. MIT's Project NANDA found that 95% of organizations saw zero measurable return from generative AI in 2025. And S&P Global reported that 42% of companies abandoned most of their AI initiatives that year, up from just 17% the year before.

Read those numbers carefully. The most striking thing isn't the failure rate; it's that the failures are happening even as enterprise AI investment hits historic levels. We are simultaneously spending more on AI than ever before and getting less from it than we expected.

Having spent the past several years leading AI-enhanced infrastructure at a major U.S. financial markets operator, I'd argue that the conventional explanations (wrong model selection, insufficient training data, resistance to organizational change) miss the structural cause. In regulated industries especially, the binding constraint on production AI performance is not the model. It's the data infrastructure underneath the model.

When The Model Gets The Credit, But The Infrastructure Decides The Outcome

The popular narrative around AI in finance focuses on what's visible: large language models, sophisticated factor models and reinforcement learning for portfolio optimization. That narrative ignores what's structural: the data pipelines feeding those models, the validation layers ensuring inputs are well-formed and the observability infrastructure that catches silent failures before they propagate to outputs.

A sophisticated model running on stale, inconsistent or poorly normalized inputs will consistently underperform a simpler model running on clean, timely, deterministically delivered data. This isn't a hypothesis.
It's what you see when you sit in front of production telemetry across institutional deployments.

The implication is uncomfortable for executives who have invested heavily in model sophistication: Your AI system's ceiling is set by your data infrastructure quality. No model architecture improvement can break through that ceiling.

Three Failure Modes That Recur In Regulated Environments

Across my experience designing AI-enhanced systems in regulated financial markets, three infrastructure failure modes appear consistently:

Latency Misalignment: Models trained or validated in batch-oriented environments get deployed against real-time data streams, but the underlying ingestion infrastructure still operates on batch latency assumptions. The model produces outputs within nominal parameters, but those outputs reflect market conditions that are seconds or minutes stale. In regulated contexts where risk positions can change within milliseconds, this is a silent material deficiency.

Data Normalization Debt: Production AI systems typically ingest data from multiple vendors, internal systems and reference data providers, each with distinct identifier schemes, schemas and quality characteristics. Without rigorous normalization governance, inconsistencies in how the same underlying reality is represented across feeds produce systematic noise that no model can fully overcome.

Observability Gaps At The Data Layer: Most organizations monitor their AI systems at the output layer, tracking prediction accuracy, risk estimate distributions and portfolio-level outcomes. The data layer producing those outputs is monitored, if at all, through generic infrastructure metrics that don't capture data quality dimensions.
Silent degradation (a feed going stale, a normalization rule failing, a vendor changing schema) remains invisible until the problem surfaces downstream, often in regulatory reporting.

What 'Infrastructure-First' Looks Like In Practice

Closing the "last mile" between AI potential and production performance requires treating data infrastructure as a first-class engineering concern, with the same rigor and strategic investment that organizations apply to model development.

Three principles consistently distinguish organizations that succeed at production AI in regulated contexts:

First, deterministic validation at ingestion. Every data event entering the system is validated against a schema registry before it becomes available to downstream models. Schema drift at source systems is detected immediately, not after it propagates.

Second, audit-grade traceability for every inference. AI-enhanced decisions in regulated environments must be reproducible. That means logging not just the output but the exact inputs, parameter state and computation path, so that any historical decision can be reconstructed for regulatory examination.

Third, input-layer observability as a primary metric. Data freshness, completeness, consistency and schema conformance are tracked with dedicated alerting thresholds. Breaches trigger automated remediation before model outputs are affected.

None of these principles require expensive infrastructure additions. What they require is treating the data layer with the seriousness most organizations reserve for the model layer.

The Strategic Implication

For technology leaders evaluating AI investment priorities, the message is straightforward but counterintuitive: The highest return on AI investment in regulated industries typically comes not from model sophistication but from infrastructure quality.

A model that produces a two-point improvement in back-test performance is visible and attributable.
A data normalization improvement that prevents a half-point annual drag from stale reference data is invisible until the problem surfaces, which is precisely why it's underinvested in.

Firms that treat data infrastructure as a first-class engineering concern, investing in it with the same rigor applied to model development, can achieve compounding advantages in production AI performance, regulatory trust and iteration speed. The companies still focused primarily on model selection are competing on the wrong axis.

In regulated finance, infrastructure quality isn't a supporting function. It's the binding constraint.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
Why The 'Last Mile' Of AI In Finance Is An Infrastructure Problem








