A trillion-parameter AI model just ran on a graphics card that most gamers would consider mid-range.
A Chinese AI enthusiast known as APFrisco demonstrated Moonshot AI’s Kimi K2.5 model, a Mixture-of-Experts (MoE) large language model with 1 trillion total parameters, running on a single Nvidia RTX 3060 GPU paired with 768 GB of Intel Optane Persistent Memory. The setup achieved roughly four tokens per second, which is slow by production standards but remarkable given the hardware involved.
How a mid-tier GPU handles a trillion parameters
Kimi K2.5 doesn’t actually fire up all 1 trillion parameters at once. As a Mixture-of-Experts model, it activates only 32 billion parameters for each token it generates, roughly 3% of the total. The rest sit idle, waiting their turn.
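To make the sparse-activation idea concrete, here is a minimal Python sketch. The two parameter counts come from the figures above; the expert count, the scores, and the `route_top_k` helper are purely hypothetical illustrations of top-k routing in general, not Kimi K2.5’s actual router or configuration.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated per token


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters that participate in one token."""
    return active / total


def route_top_k(router_scores: list[float], k: int) -> list[int]:
    """Toy top-k routing: return the indices of the k highest-scoring experts.

    Hypothetical helper for illustration only; real MoE routers are learned
    layers, and the article does not describe Kimi K2.5's routing details.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i],
                    reverse=True)
    return sorted(ranked[:k])


# About 3.2% of the weights do work for any single token.
print(f"Active share per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")

# Made-up scores for 8 imaginary experts; only the top 2 would run.
toy_scores = [0.10, 0.90, 0.30, 0.70, 0.05, 0.20, 0.60, 0.40]
print("Experts selected for this token:", route_top_k(toy_scores, k=2))
```

The point of the toy router is only that the unselected experts cost no compute for that token, even though their weights still have to be stored somewhere.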
Even with that efficiency trick, the model is enormous, because every expert still has to be stored somewhere even if it is idle for a given token. The full Kimi K2.5 weighs in at approximately 630 GB. Quantized versions, which store the weights at lower numerical precision to reduce memory requirements, still clock in around 381 GB. That’s why APFrisco needed 768 GB of Intel Optane Persistent Memory: no standard consumer RAM setup comes close to handling that kind of footprint.
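A quick back-of-envelope check in Python shows what those sizes imply. The sketch assumes decimal gigabytes and treats the file sizes as if they were spread evenly across all 1 trillion parameters, which real checkpoints do not do exactly. The bandwidth figure is a rough estimate only: it assumes every active weight is read from memory once per token, with nothing cached on the GPU.

```python
GB = 1e9  # assumption: decimal gigabytes

# Figures quoted in the article.
TOTAL_PARAMS = 1e12       # 1 trillion parameters
ACTIVE_PARAMS = 32e9      # 32 billion active per token
FULL_SIZE_GB = 630        # full model
QUANT_SIZE_GB = 381       # quantized model
OPTANE_GB = 768           # Intel Optane Persistent Memory capacity
TOKENS_PER_SEC = 4        # reported generation speed


def bits_per_param(size_gb: float, n_params: float) -> float:
    """Average storage per parameter, in bits, for a model of the given size."""
    return size_gb * GB * 8 / n_params


def read_bandwidth_gb_s(size_gb: float, tokens_per_sec: float) -> float:
    """Rough memory traffic estimate in GB/s.

    Assumes all active weights are streamed from memory once per token at the
    model's average bits per parameter (a simplification, not a measurement).
    """
    bytes_per_token = ACTIVE_PARAMS * bits_per_param(size_gb, TOTAL_PARAMS) / 8
    return bytes_per_token * tokens_per_sec / GB


print(f"Full model:      {bits_per_param(FULL_SIZE_GB, TOTAL_PARAMS):.2f} bits/param")
print(f"Quantized model: {bits_per_param(QUANT_SIZE_GB, TOTAL_PARAMS):.2f} bits/param")
print(f"Headroom in Optane after loading the quantized model: "
      f"{OPTANE_GB - QUANT_SIZE_GB} GB")
print(f"Estimated read traffic at {TOKENS_PER_SEC} tokens/s: "
      f"{read_bandwidth_gb_s(QUANT_SIZE_GB, TOKENS_PER_SEC):.1f} GB/s")
```

Under these assumptions the quantized model averages about 3 bits per parameter and leaves a few hundred gigabytes of the 768 GB free, while the full 630 GB version would also fit with less room to spare.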