Cerebras reports 981 tokens per second on Kimi K2.6 model, 6.7x faster than GPU cloud

Cerebras Systems is now serving Moonshot AI’s Kimi K2.6, a 1-trillion-parameter open-weight Mixture-of-Experts model, at 981 output tokens per second. That number, verified by independent testing from Artificial Analysis, represents 6.7 times the speed of the next-best GPU cloud provider.

For context, the median inference provider is roughly 23 times slower than Cerebras on the same model.
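As a back-of-envelope check, the two ratios above can be turned into implied throughput figures. The sketch below assumes both ratios are measured against the 981 tokens-per-second figure; the derived numbers are estimates, not values reported by Cerebras or Artificial Analysis, and the variable names are illustrative.

```python
# Implied throughput derived from the ratios quoted in the article.
# Assumption: both ratios are relative to the reported 981 tokens/s.
CEREBRAS_TOKENS_PER_SEC = 981.0  # reported output speed
NEXT_BEST_GPU_RATIO = 6.7        # Cerebras vs. next-best GPU cloud provider
MEDIAN_PROVIDER_RATIO = 23.0     # Cerebras vs. median provider (approximate)


def implied_throughput(reference_tps: float, ratio: float) -> float:
    """Return the throughput of a provider that is `ratio` times slower."""
    return reference_tps / ratio


next_best_gpu_tps = implied_throughput(CEREBRAS_TOKENS_PER_SEC, NEXT_BEST_GPU_RATIO)
median_tps = implied_throughput(CEREBRAS_TOKENS_PER_SEC, MEDIAN_PROVIDER_RATIO)

print(f"Next-best GPU cloud (estimate): {next_best_gpu_tps:.0f} tokens/s")
print(f"Median provider (estimate): {median_tps:.0f} tokens/s")
```

Under those assumptions, the next-best GPU cloud would sit at roughly 146 tokens per second and the median provider at roughly 43.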

What the numbers actually look like in practice

On a representative agentic coding workload, with 10,000 input tokens and 500 output tokens, the Cerebras-powered setup delivered a complete response in 5.6 seconds.

The same task on the official Kimi endpoint took 163.7 seconds. That’s a 29x improvement in end-to-end latency.
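The 29x figure follows directly from the two response times. A minimal sketch of that arithmetic is below; it also derives an effective rate (output tokens divided by total wall-clock time), which is an illustrative metric computed here and not one the article reports.

```python
# End-to-end latency comparison using the figures quoted in the article.
CEREBRAS_SECONDS = 5.6             # complete response time on Cerebras
OFFICIAL_ENDPOINT_SECONDS = 163.7  # same task on the official Kimi endpoint
OUTPUT_TOKENS = 500                # output size of the benchmark workload

# Ratio of the two wall-clock times.
speedup = OFFICIAL_ENDPOINT_SECONDS / CEREBRAS_SECONDS

# Effective rate: output tokens per second of total wall-clock time.
# This folds in prompt processing and any other overhead, so it is lower
# than a pure output-speed figure.
effective_cerebras = OUTPUT_TOKENS / CEREBRAS_SECONDS
effective_official = OUTPUT_TOKENS / OFFICIAL_ENDPOINT_SECONDS

print(f"End-to-end speedup: {speedup:.1f}x")
print(f"Effective rate on Cerebras: {effective_cerebras:.1f} tokens/s")
print(f"Effective rate on official endpoint: {effective_official:.2f} tokens/s")
```

The effective rate on Cerebras works out well below 981 tokens per second because the 5.6 seconds covers the whole request, including handling the 10,000 input tokens, not just output generation.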
