"Every AI lab is losing money serving your company right now. They know it. And they are doing it on purpose."That was the opening line of an article that landed in my inbox the same week 3 numbers crystallized to make it clear why open inference is no longer optional.A developer built a simple notes app over a weekend using an open source coding agent with a direct API key. One page, one feature. Cost: $50. The next day, a $20/month subscription provided 50x more tokens.One engineer on our inference team consumed 300 million tokens through open-weight models in 2 days, doing the same work that would have cost thousands of dollars through proprietary APIs.The first 2 numbers reveal the subsidy. The third reveals the way out. And the reason we need that way out is agents.The pricing model is in transitionThe frontier model providers have done something remarkable. They have made world-class AI accessible to millions of developers at price points that would have been unthinkable 2 years ago. That accessibility has driven an explosion of adoption. It has also created an economic structure that may not be sustainable at current rates.The Is AI profitable yet? website tracks the AI industry's general financial picture. AI companies currently spend roughly 195% of their revenue. One contributor to the discussion calculated that, in 2024 dollar equivalents, the total AI capital expenditure in 10 years has been roughly 3x the cost of the entire U.S. Interstate Highway System. These are the kinds of investments that will eventually be reflected in service pricing.Signs of that transition are already appearing. Reports circulated that some large enterprises are reconsidering AI coding tool licenses as token-based billing replaces flat-rate subscriptions, with per-engineer costs reaching $500 to $2,000 per month. Newer frontier models deliver modest benchmark improvements while consuming 10-25% more tokens per task. 
GPU supply is already locked up 3 to 4 years into the future, and several GPU cloud providers are already sold out of capacity.None of this is a criticism of the providers. They're building the future and pricing aggressively to accelerate adoption. Per-token unit costs will likely continue to fall, and Gartner projects a 90% reduction by 2030. But as the same analysis notes, agentic workloads consume so many more tokens per task that total enterprise inference spend is expected to rise despite cheaper units. Goldman Sachs forecasts a 24-fold increase in token consumption by 2030. Enterprises planning for the next 3 to 5 years should prepare for higher aggregate costs, not lower. Open source models running on self-managed infrastructure are the release valve that will help keep AI accessible as consumption scales.
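The arithmetic behind both claims can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a forecast. It assumes the Gartner and Goldman Sachs projections can simply be multiplied together, even though they come from separate analyses. It also assumes the $50 of API usage and the $20 subscription are compared purely on token volume. The function names are hypothetical and chosen only for this example.

```python
def relative_spend(unit_cost_reduction: float, consumption_multiple: float) -> float:
    """Aggregate spend relative to today.

    unit_cost_reduction: fractional drop in per-token cost (0.90 means 90% cheaper).
    consumption_multiple: how many times more tokens are consumed.
    """
    return (1.0 - unit_cost_reduction) * consumption_multiple


def per_token_price_ratio(api_cost: float, subscription_cost: float,
                          token_multiple: float) -> float:
    """How many times more the API path pays per token than the subscription.

    Assumes the subscription delivers token_multiple times the tokens
    that api_cost bought through the direct API key.
    """
    return (api_cost * token_multiple) / subscription_cost


if __name__ == "__main__":
    # Figures quoted in the article: 90% cheaper tokens, 24x more tokens by 2030.
    spend = relative_spend(unit_cost_reduction=0.90, consumption_multiple=24)
    print(f"Aggregate inference spend vs. today: {spend:.1f}x")

    # Figures quoted in the article: $50 via API key, $20/month for 50x the tokens.
    ratio = per_token_price_ratio(api_cost=50.0, subscription_cost=20.0,
                                  token_multiple=50.0)
    print(f"API price per token vs. subscription: {ratio:.0f}x")
```

Under these assumptions, a 90% drop in unit cost combined with 24 times the consumption still leaves total spend at roughly 2.4 times today's level, and the notes-app developer paid about 125 times more per token through the API key than through the subscription.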
Why agentic AI needs an open inference stack















