On-Device AI Just Got Real

Apple's newest on-device model carries about 20 billion parameters, and on any given request it fires maybe one to four billion of them. That gap — 20B stored, roughly 3B running — is the whole story of 2026. The model that now ships inside the latest iPhone is no longer a shrunken, lobotomized cousin of the cloud model. It's a different kind of object: large in flash, small in motion, and it never phones home.

For three years the on-device pitch was mostly aspirational. Demos ran, latency was rough, quality trailed the API by a generation, and every serious AI feature still resolved to a per-token bill in someone's datacenter. In mid-2026 that stopped being true. Two releases, Apple's third-generation Foundation Models at WWDC on June 8 and Google's Gemma 4 family on April 2, quietly moved the floor. Genuinely useful agents now run on hardware you already own, offline, with no per-token bill.

The economics nobody priced in

Forget benchmarks for a second; the load-bearing fact here is accounting. When the model lives in the cloud, every inference is a metered event — input tokens, output tokens, a line item that scales linearly with usage and explodes the moment you wrap the model in an agent loop. Agentic workloads are the worst case for the token meter: a single "go do this task" can fan out into dozens of model calls as the agent plans, calls tools, retries, and re-reads its own output. The bill grows with your ambition.
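
To put rough numbers on that fan-out, here is a minimal back-of-the-envelope sketch. Every figure in it is a hypothetical assumption chosen for illustration: the per-million-token prices, the token counts, the 30-call loop, and the idea that each call's prompt grows by a fixed amount as the agent re-reads its earlier output. None of it is taken from a real provider's price list.

```python
# Illustrative only: all prices and token counts below are hypothetical
# assumptions, not figures from any provider or from this article.

def cloud_task_cost(calls, base_input_tokens, output_tokens_per_call,
                    context_growth_per_call,
                    usd_per_million_input, usd_per_million_output):
    """Metered cost in USD of one agent task that makes `calls` model calls.

    Call i (0-indexed) sends base_input_tokens + i * context_growth_per_call
    input tokens, modelling an agent that re-reads its earlier output.
    """
    total_input = sum(base_input_tokens + i * context_growth_per_call
                      for i in range(calls))
    total_output = calls * output_tokens_per_call
    return (total_input * usd_per_million_input
            + total_output * usd_per_million_output) / 1_000_000


# Hypothetical prices: $3 per million input tokens, $15 per million output.
single_call = cloud_task_cost(1, 2000, 500, 0, 3.0, 15.0)
agent_loop = cloud_task_cost(30, 2000, 500, 500, 3.0, 15.0)

print(f"one call:      ${single_call:.4f}")
print(f"30-call agent: ${agent_loop:.4f}")
print(f"ratio:         {agent_loop / single_call:.0f}x")
```

Under these assumptions the 30-call task costs far more than 30 times the single call, because the prompt grows on every step. Run the same loop on-device and the metered line item disappears.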
