Chiranjiv Roy is an enterprise AI leader and fellow advising multinationals on AI that drives decisions, not dashboards.

A few weeks ago, I worked through a deployment review at a Fortune 500 consumer goods company that had spent $14 million over 18 months to bring commercial planning AI live across six markets. The CIO's last board update had called the rollout a success and requested another $9 million to extend it.

Then their VP of Trade Marketing pulled out a spreadsheet. The most experienced category manager in the business had been overriding the system on 78% of the SKUs she touched in the previous quarter, with similar patterns observed across 47 category managers in six markets. The system was running. The board was being told it worked. Almost no one who made commercial decisions used its output to make them.

When the CFO asked her why, her answer was the one I hear at almost every deployment review I sit through: "It doesn't really understand our business."

That sentence has changed how I think about every deployment I touch.

The engineering explanation is that the infrastructure around the model has been underbuilt. Martin Fowler coined the term "harness engineering" in February to describe the work of fixing that, and every production system I have signed off on rests on it. But the harness only keeps the system running. It says nothing about whether the output is calibrated to your business.

Let me give a consumer packaged goods (CPG) example from my recent work. The system had read a 13-week Nielsen demand trend following Ramadan as a genuine post-holiday uplift for a carbonated soft drink SKU. Anyone who has worked the L13W, as the industry calls it, through two or three Ramadan cycles knows the lag structure makes that misread almost inevitable. The model had no way to know. It recommended a promotional elasticity that would have committed roughly $1.8 million in margin-eroding spend on a single SKU in a single market in a single quarter.
The category manager overrode it. The next month, on a different SKU, the model made the same error. The unrealized trade ROI on the top 200 SKUs alone was running at $40 million to $60 million annually.

The same shape showed up in our own work on AI-driven proactive support for enterprise infrastructure. The system was running across roughly 30,000 storage arrays and flagging drive failures from telemetry drift over thirty-day windows. The senior tier-three engineer, with eleven years on that platform, recognized the drift signature as a known cache-mirroring artifact on a specific firmware revision, not a drive issue. The truck roll the AI was recommending, $2,200 fully loaded, was wasted. The same misread was firing across every array in that firmware band, generating an estimated 1,400 unnecessary dispatches a year and $3 million in avoidable cost on that one signature. The senior engineer was suppressing alerts manually, which meant accuracy metrics looked excellent because suppressed alerts never aged into the false-positive bucket. Across the installed base, the margin leak was $40 million to $80 million annually.

In both cases, the AI was not broken. It was operating without the calibration layer that makes a senior practitioner's judgment reliable, and the EBIT impact promised to the board did not materialize.

What I am doing about this comes down to four things.

Override rate is now a board-level metric. If senior people are overriding the AI on more than 30% of high-stakes decisions twelve months in, that is a calibration problem, not an adoption problem, and the two need different fixes and different budgets.

The procurement question has changed. "What model?" and "What architecture?" filtered vendors correctly in 2024 and no longer do.
I now ask vendors to show me how their system will absorb the operational judgment of my ten best people, through what mechanism, and how that mechanism improves over the engagement.

Calibration is a capital investment, not a service line. Enterprise AI economics over the next five years will be decided by which firms encoded proprietary judgment into their stack early and which paid generic vendors to deploy generic systems on proprietary problems.

An AI Impact Model runs at every stage of the project, and it is the kill switch. Engineering dashboards report whether the system is functioning. The Impact Model reports whether it is creating measurable value at each stage gate, against the business case the board approved. If the deployment misses two consecutive Impact Model gates, it stops. Without this discipline, programs continue burning capital long after the evidence says they should not.

The harness has taught the industry how to keep AI running. Whether reliable is the same as useful is a different question, and one that shows up in the P&L.

For the CXO walking into next week's AI review:

• Ask for the override rate first. If no one can answer it, you have no basis for evaluating whether the deployment is creating value, regardless of what the engineering dashboards show.
• Add a contract clause (restructure the Scope of Work for outcome and value). The clause should require vendors to describe, in mechanism rather than marketing, how their system absorbs your operational judgment and improves over the life of the engagement.
• Move calibration onto your capital plan. The firms that make this shift will look, by 2028, structurally different from those that do not.
• Install an AI Impact Model with stage-gate authority. The single most effective protection against runaway AI spend is a measurement layer that can stop the program when the evidence says stop.

The category manager in the CPG example is still overriding the system. So is the storage engineer.
The question every CEO and board needs to ask is whether someone in their own organization is doing the same thing right now, and whether they have any way of knowing.
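
For readers who want to make the two rules in this article concrete, here is a minimal Python sketch of the override-rate check (more than 30% of high-stakes decisions overridden) and the stage-gate stop rule (two consecutive missed gates). The `Decision` record, the function names, and the representation of gate results as booleans are illustrative assumptions, not taken from any system described here; how a "high-stakes" decision or a "met" gate is defined would depend on the business case in question.

```python
from dataclasses import dataclass

# Threshold quoted in the article: more than 30% overrides on high-stakes
# decisions, twelve months in, points to a calibration problem.
OVERRIDE_THRESHOLD = 0.30


@dataclass
class Decision:
    """One AI recommendation and what the practitioner did (hypothetical schema)."""
    high_stakes: bool
    overridden: bool


def override_rate(decisions):
    """Share of high-stakes recommendations that were overridden.

    Returns None when there are no high-stakes decisions to measure.
    """
    high = [d for d in decisions if d.high_stakes]
    if not high:
        return None
    return sum(d.overridden for d in high) / len(high)


def needs_calibration_review(decisions, threshold=OVERRIDE_THRESHOLD):
    """True when the override rate is strictly above the threshold."""
    rate = override_rate(decisions)
    return rate is not None and rate > threshold


def should_stop(gate_results, max_consecutive_misses=2):
    """Stage-gate stop rule: stop after N consecutive missed gates.

    gate_results is a list of booleans in stage order, True meaning the gate
    met the approved business case (an assumed encoding).
    """
    misses = 0
    for met in gate_results:
        misses = 0 if met else misses + 1
        if misses >= max_consecutive_misses:
            return True
    return False


def annual_dispatch_cost(dispatches_per_year, cost_per_dispatch):
    """Back-of-envelope cost of unnecessary truck rolls."""
    return dispatches_per_year * cost_per_dispatch


if __name__ == "__main__":
    # 78 overrides out of 100 high-stakes decisions, as in the CPG example.
    sample = [Decision(True, True)] * 78 + [Decision(True, False)] * 22
    print(override_rate(sample), needs_calibration_review(sample))
    # Misses at gates 2 and 3 are consecutive, so the program stops.
    print(should_stop([True, False, False, True]))
    # 1,400 dispatches at $2,200 each, the storage example's figures.
    print(annual_dispatch_cost(1400, 2200))
```

With the storage example's figures, 1,400 dispatches at $2,200 each comes to $3,080,000, which matches the roughly $3 million quoted for that one signature. Note that the override-rate check only works if overrides and manual suppressions are logged as decisions in the first place; the storage case shows what happens to the metrics when they are not.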