Originally published on lavkesh.com

Our AI feature was humming, then the cloud bill tripled overnight, and nobody had warned us about that part.

For three years the industry chased bigger training runs, bragging about model size, data volume, and benchmark scores. Training makes headlines, but it’s a one‑off expense.

In early 2026 inference spend finally overtook training spend. Fifty‑five percent of AI cloud infrastructure now powers inference, up from about thirty percent three years ago, and analysts expect seventy to eighty percent by year‑end.

During development the API calls are a few hundred a day, the cost looks like noise, and everything feels fine. Deploy to real users and those calls explode from hundreds to millions, and the per‑token rate that looked reasonable on a spreadsheet becomes a massive line item.