Kimi K2.5 runs on RTX 3060 with 768GB Intel Optane memory at 4 tokens per second

A trillion-parameter AI model just ran on a graphics card that most gamers would consider mid-range.

A Chinese AI enthusiast known as APFrisco demonstrated Moonshot AI’s Kimi K2.5 model, a Mixture-of-Experts (MoE) large language model with 1 trillion total parameters, running on a single Nvidia RTX 3060 GPU paired with 768 GB of Intel Optane Persistent Memory. The setup achieved roughly four tokens per second, which is slow by production standards but remarkable given the hardware involved.

How a mid-tier GPU handles a trillion parameters

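Kimi K2.5 doesn’t actually fire up all 1 trillion parameters at once. As a Mixture-of-Experts model, it sends each token through only a small subset of its expert sub-networks, so just 32 billion parameters, about 3% of the total, are activated per generated token. The rest sit idle, waiting their turn.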
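The routing idea behind that sparsity can be sketched in a few lines. This is a minimal, generic illustration of top-k expert routing, not Kimi K2.5’s actual implementation: a router scores every expert for the current token, only the k best-scoring experts are evaluated, and their outputs are blended using renormalised gate weights. The toy layer (8 scalar “experts”, k=2, made-up router scores) and the helper names `route_token` and `moe_forward` are hypothetical; the article does not describe the model’s real router, expert count, or gating function.

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights (generic top-k routing)."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the chosen experts' outputs; unchosen experts are never called."""
    return sum(gate * experts[i](x) for i, gate in route_token(router_logits, k))

# Hypothetical toy layer: 8 "experts" that just scale their input, each with a call counter.
calls = [0] * 8

def make_expert(i):
    def expert(x):
        calls[i] += 1
        return (i + 1) * x
    return expert

experts = [make_expert(i) for i in range(8)]
logits = [0.2, -1.0, 3.1, 0.0, 2.4, -0.5, 1.1, 0.3]  # made-up router scores for one token
y = moe_forward(1.0, experts, logits, k=2)
print(calls)  # → [0, 0, 1, 0, 1, 0, 0, 0]
```

The call counter mirrors the article’s point: all eight experts exist, but only two do any work for this token.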

That sparsity cuts the compute needed per token, but it does not shrink the model: every expert still has to be stored within quick reach, because any of them may be picked for the next token. The full Kimi K2.5 weighs in at approximately 630 GB. Quantized versions, which store the weights at lower numerical precision to reduce memory requirements, still come to around 381 GB. That’s why APFrisco needed 768 GB of Intel Optane Persistent Memory: a footprint of that size is far beyond what a standard consumer RAM setup can hold.
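The sizes quoted above lend themselves to a quick back-of-envelope check. The sketch below uses only the article’s figures; the assumptions (decimal gigabytes, active weights stored at the file’s average precision, every active weight read once per token with nothing cached on the GPU) are ours, not APFrisco’s, so treat the per-token numbers as rough upper bounds rather than measurements.

```python
# Figures quoted in the article
TOTAL_PARAMS = 1e12      # 1 trillion total parameters
ACTIVE_PARAMS = 32e9     # parameters activated per generated token
FULL_SIZE_GB = 630       # full checkpoint size, as reported
QUANT_SIZE_GB = 381      # quantized build size, as reported
OPTANE_GB = 768          # Optane Persistent Memory in the test rig
TOKENS_PER_SEC = 4       # reported decode speed

def bits_per_param(size_gb, params):
    """Average storage per parameter implied by a file size (assumes 1 GB = 1e9 bytes)."""
    return size_gb * 1e9 * 8 / params

def active_gb_per_token(size_gb):
    """Weights touched per token, ASSUMING active weights sit at the file's average precision."""
    return size_gb * ACTIVE_PARAMS / TOTAL_PARAMS

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
full_bits = bits_per_param(FULL_SIZE_GB, TOTAL_PARAMS)
quant_bits = bits_per_param(QUANT_SIZE_GB, TOTAL_PARAMS)
headroom_gb = OPTANE_GB - QUANT_SIZE_GB   # ignores KV cache, OS and runtime overhead
per_token_gb = active_gb_per_token(QUANT_SIZE_GB)
# Upper bound: only holds if none of the active weights were cached in GPU memory.
read_rate_gb_s = per_token_gb * TOKENS_PER_SEC

print(f"active fraction:         {active_fraction:.1%}")
print(f"full build:              {full_bits:.2f} bits/param")
print(f"quantized build:         {quant_bits:.2f} bits/param")
print(f"Optane headroom:         {headroom_gb} GB")
print(f"weights touched/token:   {per_token_gb:.1f} GB")
print(f"implied read rate (max): {read_rate_gb_s:.1f} GB/s")
```

With the article’s numbers this works out to about 5 bits per parameter for the 630 GB build, about 3 bits for the 381 GB build, and roughly 12 GB of weights touched per token, which is why even a sparse model leans so heavily on a large, fast memory pool.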

