I measure how fast 42 LLMs actually answer. Here's the honest method.

An honest method for benchmarking LLM speed: tokens per second vs time to first token, across 42 Ollama Cloud models, measured every 10 minutes.

Tuesday, 16 June 2026


I test software for a living. So when a vendor calls an AI model "fast," I don't trust the word. I measure it.

Most leaderboards rank how smart a model is. Almost none rank how fast it answers. You pick a model because it scored well, ship it, and then your users sit and wait.

Speed is two different numbers. People mix them up constantly.
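To see why mixing them up matters, consider the total time a user waits for a complete answer. It is roughly the wait for the first token plus the remaining tokens divided by the generation rate. The Python sketch below assumes a constant rate after the first token. The two model profiles are invented for illustration and are not measurements. `SpeedProfile` and `total_latency` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SpeedProfile:
    name: str
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # steady generation rate after the first token


def total_latency(profile: SpeedProfile, output_tokens: int) -> float:
    """Approximate seconds until the whole answer has arrived."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    # First token lands at TTFT; the remaining tokens stream at a constant rate.
    return profile.ttft_s + (output_tokens - 1) / profile.tokens_per_s


# Hypothetical profiles for illustration only, not measured models.
quick_start = SpeedProfile("quick-start", ttft_s=0.3, tokens_per_s=20.0)
fast_stream = SpeedProfile("fast-stream", ttft_s=2.0, tokens_per_s=100.0)

for n in (10, 500):
    for p in (quick_start, fast_stream):
        print(f"{p.name:<12} {n:>3} tokens: {total_latency(p, n):6.2f} s")
```

For a 10-token reply, the quick starter wins (0.75 s vs 2.09 s). For a 500-token reply, the fast streamer wins (6.99 s vs 25.25 s). With these made-up numbers, the two curves cross between 43 and 44 tokens. That is why a single "speed" score can rank the same two models either way.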

The two numbers

Time to first token (TTFT). The wait before the first word appears. You feel this every time a chatbot "thinks" before replying.
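Both numbers can be taken from a single streamed response by timestamping chunks as they arrive. This is a minimal Python sketch with three assumptions. First, each streamed chunk carries one token, although many APIs batch several tokens per chunk. Second, the stream is lazy, so iterating it is what starts the request. Third, the timer is a monotonic clock. `measure_stream` and `fake_stream` are hypothetical names. `fake_stream` stands in for a real client, such as a call to an Ollama Cloud model, and does not reproduce any real API.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Return TTFT, generation rate, and token count for one streamed reply.

    Assumes one chunk per token and that iterating `chunks` is what sends
    the request (e.g. a lazy generator); otherwise TTFT is under-counted.
    """
    start = clock()
    first = last = None
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now  # the moment the user first sees output
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    # Rate over the gaps after the first token, so the TTFT wait
    # does not drag the tokens-per-second figure down.
    span = last - first
    tps = (count - 1) / span if count > 1 and span > 0 else float("nan")
    return {"ttft_s": ttft, "tokens_per_s": tps, "tokens": count}


def fake_stream(n_tokens: int, ttft: float, gap: float) -> Iterator[str]:
    """Hypothetical stand-in for a real streaming client."""
    time.sleep(ttft)  # simulated wait before the first token
    for i in range(n_tokens):
        if i:
            time.sleep(gap)  # simulated delay between tokens
        yield f"tok{i}"


if __name__ == "__main__":
    result = measure_stream(fake_stream(n_tokens=20, ttft=0.05, gap=0.01))
    # Expect ttft_s a little over 0.05 and tokens_per_s a little under 100.
    print(result)
```

This sketch starts the throughput clock at the first token, which keeps the two numbers independent: a slow start is counted once, in `ttft_s`, and does not also lower `tokens_per_s`. Some benchmarks instead divide by the whole request time, which blends the two numbers back together.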
