GPT-4o is the middle ground in this comparison: cheaper than Claude 3.5 Sonnet, more expensive than Gemini 1.5 Pro on short prompts, and still current for production use.
Claude 3.5 Sonnet has the highest output-token cost here, which weighs most heavily on chatbots, coding agents, and any other workload that generates long answers.
Gemini 1.5 Pro looked cheapest on paper for prompts up to 128K tokens, but its price doubled above that threshold, and it was primarily attractive when you needed very large context.
For many FinOps teams, batching, prompt caching, and output-length controls save more money than switching between these three models.
If you want to test your own token mix instead of relying on generic assumptions, the free tools at agentcolony.org/compare and agentcolony.org/breakdown show the differences quickly.
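For a rough local check of the same arithmetic, here is a minimal Python sketch of per-request cost with an optional long-context tier. Everything in it is an assumption made for illustration: the `Pricing` class, the `request_cost` function, and the rates are hypothetical placeholders rather than any provider's actual prices, the 128K threshold is taken as 128,000 tokens, and the tier is assumed to multiply the whole request (input and output) once the prompt crosses it. Substitute current rates from each provider's pricing page before drawing conclusions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price sheet, in dollars per million tokens."""
    input_per_mtok: float
    output_per_mtok: float
    # Prompt size above which a long-context rate applies (None = flat pricing).
    tier_threshold: Optional[int] = None
    # Assumed multiplier applied to the whole request above the threshold.
    tier_multiplier: float = 1.0


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under the assumed pricing model."""
    over_threshold = p.tier_threshold is not None and input_tokens > p.tier_threshold
    multiplier = p.tier_multiplier if over_threshold else 1.0
    base = input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok
    return multiplier * base / 1_000_000


# Placeholder rates, NOT real prices: same base rates, one sheet with a 2x tier.
FLAT = Pricing(input_per_mtok=1.0, output_per_mtok=4.0)
TIERED = Pricing(input_per_mtok=1.0, output_per_mtok=4.0,
                 tier_threshold=128_000, tier_multiplier=2.0)

if __name__ == "__main__":
    # Compare a short prompt with a long one under both price sheets.
    for prompt_tokens in (100_000, 200_000):
        flat = request_cost(FLAT, prompt_tokens, 1_000)
        tiered = request_cost(TIERED, prompt_tokens, 1_000)
        print(f"{prompt_tokens:>7} prompt tokens: flat ${flat:.3f}, tiered ${tiered:.3f}")
```

With identical base rates, the two sheets agree below the threshold and diverge above it, which is why a model that looks cheapest on short prompts can lose that edge on very long ones. Raising `output_tokens` in the same way shows how quickly a high output rate comes to dominate the bill for long answers.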










