Satyabrat Chowdhury, Field CTO at CoreStack | Mentor, Mentorship EDGE Program, University of Washington Bothell School of Business.

A CTO at a mid-market financial services firm showed me his cloud bill last fall. His engineering team had shipped a customer-facing AI feature three months earlier: a smart document summarization tool that was getting strong adoption. The surprise wasn’t that the feature was expensive. It was that nobody knew why the costs kept climbing. The observability dashboards showed green. Latency was fine. Error rates were low. His observability stack told him the system was healthy. His finance team told him he was $400K over budget.

That gap, between “operationally healthy” and “financially visible,” is where I spend most of my time now.

The Numbers Are Catching Up

For a long time, AI cloud costs were a rounding error. Training runs were expensive but infrequent. Inference was cheap. That equation has been inverted.

IDC projects that G1000 organizations will face up to a 30% rise in underestimated AI infrastructure costs by 2027. The reason is not overspending. These organizations are under-forecasting expenses that don’t fit their existing cost models. The FinOps Foundation’s most recent State of FinOps report makes the scale of the shift concrete: 98% of organizations now manage AI spend as a formal practice, up from 31% just two years ago. AI cost management isn’t an emerging discipline anymore. It’s table stakes, and most organizations are still scrambling to build the muscle.

Why Your Observability Stack Is Flying Blind

Traditional observability was designed to answer three questions: Is the system up? Is it slow? Is it throwing errors? For a decade, that was enough. APM platforms, distributed tracing and infrastructure monitors were all built around CPU cycles, request latency and error logs.

AI inference workloads don’t follow that logic.
An agentic workflow can complete successfully and still make six redundant API calls to get there. Each call costs a fraction of a cent, and those fractions compound to thousands of dollars at scale. The financial signal lives in a layer those tools simply weren’t instrumented to read: token consumption patterns, model routing decisions, cache hit rates and batch-versus-real-time inference splits. These aren’t operational metrics. They’re economic ones.

FinOps And Observability Have To Merge

FinOps teams have cost data. They can tell you GPU spend climbed 40% last month. They can’t tell you which model, which feature or which engineering team drove it. Observability teams have signal data. They know a model call completed in 280 milliseconds. They have no idea what it cost or why that cost mattered.

The organizations getting this right have stopped treating FinOps and observability as parallel tracks. They’re building what I’d call cost observability: an instrumentation layer that ties every model invocation to a cost center, a business function and a unit economic metric. This is not a dashboard that displays spend. It is a system that explains what caused it.

This isn’t just a rebranding of existing practices. It requires three things. You have to tag model invocations with the same rigor you apply to cloud resources. You have to build token-level cost attribution into your telemetry pipeline. And you have to treat cost anomalies with the same urgency as latency anomalies.

What This Looks Like In Practice

In conversations with engineering leaders across financial services, healthcare and retail, I’ve seen a consistent set of patterns emerge among the teams making real progress.

They’ve moved from dollar alerts to rate alerts. They don’t watch total monthly spend. They watch token consumption per user session, per workflow and per API endpoint. They treat cache hit rate as a first-class financial metric, not just an engineering efficiency signal.
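The Python below is a minimal sketch of token-level cost attribution and rate alerts. It assumes a telemetry pipeline that already records token counts for each model call. Everything in it is a hypothetical placeholder and does not come from any vendor’s API or pricing:

- the model names and per-1K-token prices
- the 50% cached-input discount
- the 3x alert threshold
- the names `Invocation`, `invocation_cost`, `cost_by_owner`, `cache_hit_rate` and `session_rate_alerts`

The sketch does four things. It tags each call with a cost center and a business function. It rolls spend up by owner. It reports cache hit rate as a financial metric. It flags sessions whose token consumption runs well above an expected baseline.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-1K-token prices in USD: placeholders, not any provider's real rates.
PRICES_PER_1K = {
    "large-model": {"input": 0.010, "output": 0.030},
    "small-model": {"input": 0.001, "output": 0.002},
}
CACHED_INPUT_DISCOUNT = 0.5  # assumption: cached input tokens billed at half price


@dataclass
class Invocation:
    """One model call, tagged the way a cloud resource would be (hypothetical schema)."""
    model: str
    session_id: str
    feature: str        # business function, e.g. "doc-summary"
    cost_center: str    # owning team or budget line
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0  # portion of input_tokens served from a prompt cache


def invocation_cost(inv: Invocation) -> float:
    """Cost of a single call under the placeholder price table."""
    price = PRICES_PER_1K[inv.model]
    uncached = inv.input_tokens - inv.cached_input_tokens
    billable_input = uncached + inv.cached_input_tokens * CACHED_INPUT_DISCOUNT
    return (billable_input * price["input"] + inv.output_tokens * price["output"]) / 1000


def cost_by_owner(invocations):
    """Roll spend up to (cost_center, feature) so a dollar traces back to an owner."""
    totals = defaultdict(float)
    for inv in invocations:
        totals[(inv.cost_center, inv.feature)] += invocation_cost(inv)
    return dict(totals)


def cache_hit_rate(invocations, feature):
    """Share of a feature's input tokens that were served from cache."""
    calls = [inv for inv in invocations if inv.feature == feature]
    total = sum(inv.input_tokens for inv in calls)
    cached = sum(inv.cached_input_tokens for inv in calls)
    return cached / total if total else 0.0


def session_rate_alerts(invocations, baseline_tokens_per_session, factor=3.0):
    """Rate alert: flag sessions whose tokens exceed factor x the expected baseline."""
    per_session = defaultdict(int)
    for inv in invocations:
        per_session[inv.session_id] += inv.input_tokens + inv.output_tokens
    limit = baseline_tokens_per_session * factor
    return sorted(s for s, tokens in per_session.items() if tokens > limit)


# Usage with made-up sample data: session "s3" re-sends the same call six times.
calls = [
    Invocation("large-model", "s1", "doc-summary", "team-docs", 4000, 500, cached_input_tokens=3000),
    Invocation("large-model", "s2", "doc-summary", "team-docs", 4000, 500),
    *[Invocation("large-model", "s3", "doc-summary", "team-docs", 4000, 500) for _ in range(6)],
]
print(cost_by_owner(calls))
print(cache_hit_rate(calls, "doc-summary"))  # 0.09375
print(session_rate_alerts(calls, baseline_tokens_per_session=4500))  # ['s3']
```

Every call in the sample succeeds, so an error-rate dashboard would stay green. The rate alert still isolates the one session that burned six times its expected tokens. That is the shift from watching total dollars to watching consumption rates.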
The same teams have also built cost attribution into their deployment pipelines, so every new model version ships with a spend profile alongside its performance profile.

Agentic workflows deserve particular attention. A single orchestrated agent task can involve a dozen model calls, tool executions and retrieval operations. At test scale this looks manageable. At production scale, with thousands of concurrent users, the cost multiplier relative to a single API call can reach 100x. The FinOps Foundation now flags this as one of the fastest-growing sources of AI budget overruns. If you don’t have token-level attribution across the full chain, you’re not managing cost. You’re discovering it after the fact.

The Ask For Executives

If you’re a CTO or vice president of engineering reading this, I’d push you to ask one question in your next architecture review: Can we trace a dollar of AI spend back to a specific product decision?

For most organizations, the answer is still no. That’s the gap to close. Buying another monitoring tool won’t close it. Changing how your engineering and finance teams instrument systems and share signals will. The goal isn’t a prettier cost dashboard. It’s accountability at the model-invocation level, so that when AI costs climb, you know exactly what drove them and can make a deliberate choice about whether that spend is justified.

The organizations that build this capability in the next 18 months will carry a structural cost advantage into every AI investment that follows. AI economics will only get more complex as agentic systems mature and token volumes scale. Build the observability foundation now, before the next surprise bill lands. It is the only way to stay ahead of it.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
AI’s Hidden Tax: Why Your Observability Stack Can’t See Your Biggest Cloud Cost