TL;DRApple rebuilt Siri on a custom 1.2T-parameter Gemini model running on Nvidia Blackwell GPUs in Google Cloud. Federighi says requests are never stored. The company unveiled five new AI models and a three-tier privacy architecture.

Apple’s most important AI announcement at WWDC 2026 was not a feature. It was an architecture.

The rebuilt Siri runs on a custom 1.2-trillion-parameter model built on Google’s Gemini technology, hosted on Google Cloud servers powered by Nvidia Blackwell B200 GPUs. For the company that made privacy its premium product, outsourcing AI inference to its largest competitor’s cloud requires an extraordinary amount of trust engineering.

The three-tier system

Apple now routes Siri queries through three layers. Simple tasks stay on-device using Apple’s own models. Moderately complex requests go to Apple’s Private Cloud Compute servers.