Inside Apple's Ground-Up Siri Rebuild at WWDC 2026 With Google's Gemini and Nvidia

Fifty people sat in the developer centre across the road from Apple Park, and the man on stage had come to bury a sentence: "Siri runs on Gemini." Craig Federighi spent the first ten minutes of an exclusive post-keynote session taking it apart, slide by slide. The room was small for the size of the claim. Some of the biggest names in technology writing were in it. So were Apple's marketing chief Greg Joswiak and John Ternus, the SVP of hardware and chief executive-elect. And in the seats, watching, sat Tim Cook, taking in his last WWDC as the company's boss.

This was not the comeback. The comeback had been the morning's keynote. This was the engine bay, opened up, with Federighi flanked by the three lieutenants who built the thing: Amar Subramanya, who leads artificial intelligence and machine learning; Mike Rockwell, who leads Siri engineering; and Sebastian Marino-Mess, who runs the Intelligence System Experience team. What they walked through was stranger than the headline allowed. How Apple tore Siri to the ground and rebuilt it. What the Google and Nvidia help actually was. And how a privacy promise survives being run on a competitor's servers.

Key Takeaways

- Apple uses none of Google's deployed assistant stack for the new Siri: not the Gemini app, not Google's customer-facing models, not its serving infrastructure, and not Google Search.
Federighi's phrase was blunt: "the amount of the Google assistant we use, which is none."
- The new Siri runs on Apple's own third-generation Apple Foundation Models (AFM), custom-built for Apple Silicon and, in Subramanya's words, "refined using outputs from Gemini frontier models." That is distillation, where Gemini teaches and Apple's smaller models learn, rather than Gemini doing the talking.
- For its single most demanding model, AFM Cloud Pro, Apple extended Private Cloud Compute onto Nvidia GPUs inside Google's cloud, using Nvidia confidential compute, Intel TDX and two separate hardware roots of trust to keep the privacy guarantee intact.
- A new on-device model, AFM Core Advanced, packs 20 billion parameters but activates only one to four billion per request through a sparse architecture, which is how features like the new expressive voices run entirely on the phone.
- Pre-keynote reporting had framed the heaviest model as a licensed Gemini that Apple's own servers could not run fast enough; the executives' on-record account is narrower and more flattering to Apple, and the gap between the two is the story worth watching.
- Siri AI ships as a gated beta later this year, English first, and is expected to reach general release with iOS 27 around September.

So is the new Siri just Gemini in a trench coat?

No, and Federighi built the whole session around proving it. He put up a diagram of how a typical AI assistant works ("this could be any of the things that you're using on your device today, whether from ChatGPT, Claude, and so forth"): a client app talking to large language models on someone's servers, with those models reaching out to a web search to ground their answers. Then he mapped Google's own version onto it: the Gemini app, talking to Google's Gemini models, grounded by Google Search. And then he started taking pieces away. "When it comes to our system, well, we use none of those things," he said. No Gemini app. None of the models Google deploys to its customers.
None of its serving infrastructure. No Google Search for grounding. "This is the amount of the Google assistant we use, which is none."

Think of it as a film with a famous co-writer. The posters all read "from the makers of Gemini," and the lazy assumption is that Google shot the picture. What Federighi described is a film Apple wrote, cast, shot and cut itself, on its own lot, having quietly studied a rival director's footage to sharpen its craft. The cast it hired is entirely its own. The assistant experience is woven into iOS, iPadOS and macOS, summoned from the side button or called by name. Sitting on top is the Siri app, where a conversation lives so a user can return to it. Beneath that runs the part that matters most, which Federighi called the system orchestrator. It draws on three sources Apple controls: an App Toolbox for actions inside apps, the Spotlight semantic index for personal content, and on-screen context for whatever a user is looking at. Under that sit Apple's on-device models. Above the device, when a request needs more horsepower, sits Apple's own Private Cloud Compute. Even the grounding in current events comes not from Google Search but from what Federighi called "Apple's world knowledge service," built in-house over years. Every box in the diagram, in other words, has Apple's name on it.

What does the system orchestrator actually do?

It is the part of the engine that decides where the work goes. The orchestrator is to Siri what an engine control unit is to a modern car: the component that reads the request, weighs the conditions and routes the power, all in the time it takes you to lift your foot. When a request lands, it builds the prompt, judges whether the job can be settled on the device or needs the cloud, and pulls in only the context the task requires. A message thread here. What is on screen there.
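The routing idea can be made concrete with a minimal Python sketch of an orchestrator that assembles only the context a request needs and then chooses between an on-device model and Private Cloud Compute. Apple has not published the orchestrator's logic, so everything here is hypothetical: the `Request`, `Plan` and `route` names, the context-source keys, and especially the numeric complexity score and the `ON_DEVICE_LIMIT` cut-off, which stand in for whatever judgement the real system makes.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical keys for the three context sources the article names.
CONTEXT_SOURCES = ("app_toolbox", "semantic_index", "on_screen")

# Assumed cut-off: requests scored above this go to the cloud.
ON_DEVICE_LIMIT = 5

@dataclass
class Request:
    text: str
    needs: set[str] = field(default_factory=set)  # context sources this task uses
    complexity: int = 1                           # assumed 1-10 difficulty estimate

@dataclass
class Plan:
    target: str          # "on_device" or "private_cloud_compute"
    prompt: str
    context_used: list[str]

def route(req: Request, available: dict[str, str]) -> Plan:
    """Build a prompt from only the context the task needs, then pick a target."""
    used = [src for src in CONTEXT_SOURCES if src in req.needs and src in available]
    context = "\n".join(available[src] for src in used)
    prompt = f"{context}\n\nUser: {req.text}" if context else f"User: {req.text}"
    target = "on_device" if req.complexity <= ON_DEVICE_LIMIT else "private_cloud_compute"
    return Plan(target=target, prompt=prompt, context_used=used)

# Everything the device could offer; the request below needs only one source.
device_state = {
    "semantic_index": "Messages: Ana is bringing watermelon skewers.",
    "on_screen": "Photo: low clouds over the bay.",
}
plan = route(Request("Who is bringing what?", needs={"semantic_index"}, complexity=7), device_state)
print(plan.target, plan.context_used)  # private_cloud_compute ['semantic_index']
```

Note that the on-screen photo description never enters the prompt: the sketch's version of "pulls in only the context the task requires" is simply filtering by what the request declares it needs.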
"It's what coordinates requests against things like the App Toolbox," Federighi explained, "the Spotlight semantic index to access personal content to help fulfill your request, and even things like on-screen context."

This sits at the centre of the privacy design rather than off to one side, and for a reason. The orchestrator is what keeps personal data on the device by default and sends only the minimum upward when it must. A petrol engine does not fire all eight cylinders to crawl through a car park; the clever ones shut half of them off. The orchestrator applies the same logic to your data. It keeps the heavy, personal material idling privately on the phone, and wakes the cloud only for the slice of a request that genuinely needs it. Hold that idea. It returns when the privacy architecture gets tested.

Why did Apple scrap a Siri that already worked?

Because a version that merely worked was not the version Apple was willing to ship. This is the detail that explains the two lost years, and Rockwell told it without flinching. A year ago, his team built a first take on the personalised Siri that Apple had promised at WWDC 2024, bolting new abilities onto the original assistant. "Last year, we had actually built a first version of this that was sort of incremental on top of the original Siri that added tool calling and had these capabilities, and we had it working," he said. It functioned. It demoed. And the team binned it. "We didn't feel it was really delivering on the vision and the experience that we wanted to do," Rockwell went on, "so we went back, and we rebuilt Siri from the ground up. Literally tore it to the ground, rebuilt it from the ground up."

It is the decision every honest director faces in the screening room. The first cut tests fine, the studio would ship it, and you know in your gut it is not the film. Scrapping a working version is the most expensive choice in the building. It cost Apple the delay, the pulled advertisement, the year of ridicule.
It is also the only choice that produces a foundation rather than a patch. "It's a Siri that has its own application, it's natively multimodal, it's privacy from the ground up, and it's available across all of your platforms," Rockwell said of what replaced it, the satisfaction of a man who got to do it properly the second time audible in the list.

So what did Google and Nvidia actually provide?

Two precise things: Gemini taught Apple's models, and Nvidia's hardware runs Apple's most demanding one. Subramanya walked through the third-generation Apple Foundation Models, and the through-line was that this is a family Apple built. "Every model is a significant leap both in quality and capability compared to our previous generation," he said, before laying the family out.

Model | Where it runs | What it is for
AFM Core | On device | The next-generation on-device model shipping today, dense architecture
AFM Core Advanced | On device | A sparse, natively multimodal model powering dictation and the new expressive voices
AFM Cloud | Private Cloud Compute | The server workhorse, tuned for latency and serving cost
AFM Cloud Image | Private Cloud Compute | Image generation and editing, including spatial reframing
AFM Cloud Pro | PCC, extended to Nvidia GPUs in Google's cloud | The most capable model, for agentic tool use and complex reasoning

The sentence that matters most came when Subramanya described how four of those models were made: "custom built for Apple Silicon, trained using proprietary data with RL, and refined using outputs from Gemini frontier models." That last clause is the real shape of the Google deal. This is distillation. A large, expensive teacher model generates high-quality answers, and a smaller student model learns to imitate them. Gemini is the seasoned cinematographer Apple invited onto the lot to watch and learn from; the footage in the finished film is Apple's.
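In its textbook form, distillation trains the student to match the teacher's softened output distribution. The dependency-free Python sketch below computes that soft-target loss for a single next-token prediction. It illustrates the general technique, not Apple's pipeline: Subramanya's phrase "refined using outputs" may equally mean training on Gemini-generated text rather than on Gemini's probabilities, and the logits and temperature here are made-up values.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q sits from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target loss: the student is penalised for diverging from the teacher."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

# Made-up scores over a four-word vocabulary for one next-token prediction.
teacher = [4.0, 2.5, 0.5, -1.0]
close_student = [3.8, 2.4, 0.7, -0.9]
far_student = [-1.0, 0.5, 2.5, 4.0]

print(round(distillation_loss(teacher, close_student), 4))
print(distillation_loss(teacher, far_student) > distillation_loss(teacher, close_student))  # True
```

Training would repeat this over many prompts and nudge the student's weights to shrink the loss; the teacher is only needed while training, which is why it never has to ship on the device.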
The teacher does not ride along in your pocket. The one place Google's cloud does the running is AFM Cloud Pro, Apple's most capable model, which Subramanya described as having "quality similar to Gemini frontier models." To serve it, he said, "we work with both Google and Nvidia to extend our Private Cloud Compute infrastructure to Nvidia GPUs in Google's cloud, while maintaining Apple's unmatched privacy guarantees." Here honesty demands a second voice in the room. Pre-keynote reporting, led by The Information, had told a less flattering version: that Apple had licensed a large Gemini model, found its own Private Cloud Compute too slow to run something of that scale, and was routing the heaviest queries to Google's data centres on Nvidia's Blackwell B200 chips out of necessity rather than design. The executives' on-record account reframes the same facts: an Apple model, distilled and refined, running on Apple's own deployment software, which has been extended onto borrowed hardware. Both descriptions point at the same Nvidia GPUs in the same Google data centre. The argument is over whose brain is doing the thinking when they spin up, and that argument is the one to keep watching as developers pull the system apart in the months ahead.

Why is the man who built Google's Gemini now running Apple's AI?

Because Apple, hollowed out by an exodus and stung by the delay, hired the director of its comeback from the very studio whose work it is now learning from. The reshuffle behind the rebuild began in December 2025. John Giannandrea, who had steered Apple's AI strategy for seven years and reported directly to Tim Cook, announced he would retire in the spring. His standalone AI organisation was broken up and folded into the software group under Federighi, with the remainder parcelled out to chief operating officer Sabih Khan and services chief Eddy Cue. The message was unmistakable.
The experiment of a separate AI-strategy silo was finished, and AI was now simply software, owned by the engineers who ship it.

Into that gap walked Amar Subramanya, the engineer whose own words carry much of this story. His route ran from Bangalore University to a doctorate at the University of Washington in graph-based semi-supervised learning, the art of teaching a model from oceans of unlabelled data with only a little that is labelled. For a company that by choice keeps your data off its servers, that skill sits close to a holy grail. Then came Google, where, as the engineering lead for Bard (later Gemini), he was the operational hand that scaled it from an English-only experiment to a global product in dozens of languages within months, and that tackled the hallucination problem by having the model write and run code to check its own answers. A brief stop at Microsoft followed. Then Apple, as a vice president under Federighi, leading Apple Foundation Models, machine-learning research and AI safety.

Read that arc again and the irony writes itself. Apple poached the man who built the rival's blockbuster to direct its own, and the models he now oversees are sharpened, by his own account on that stage, using the outputs of the very system he once scaled. The teacher's old engineer is now the student's headmaster. He arrived carrying a philosophy that could have been drafted in Cupertino: "it's not possible for us to be bold in the long run if we're not responsible from the get go," he had said during Bard's privacy-delayed European rollout, naming privacy as the non-negotiable pillar.

Apple looked outside because the inside had emptied. Through 2025 its Foundation Models team bled talent, most of it to Meta, including Ruoming Pang, the man who had led that team, reportedly lured by a package north of two hundred million dollars; others left for OpenAI.
And the partnership now sitting at Siri's core is only the newest strand in a rivalry that was always also a marriage. Apple is Google Cloud's largest customer, nicknamed "Bigfoot" inside Google for the sheer scale of its footprint. Google, in turn, pays Apple around twenty billion dollars a year to remain Safari's default search engine. Against that backdrop, refining your models on a competitor's outputs and renting its servers for the heaviest reasoning reads less like surrender than the latest step in a fifteen-year dance. Eddy Cue had set the stakes plainly in court a year earlier: "You may not need an iPhone 10 years from now." The hire, and the whole rebuild, is Apple's answer to that sentence.

How can a 20-billion-parameter model run on a phone?

Through a sparse architecture that switches on only the part of the model each job needs. Subramanya gave the figures plainly: the model shipping on devices today has around three billion parameters; the new AFM Core Advanced has twenty billion, "but depending on the request, it uses anywhere between 1 and 4 billion parameters, so you're getting the power, the generalization of this really large model, but you're paying the cost of a much smaller model."

The cleanest way to picture this is an engine with cylinder deactivation. A large V-engine has the muscle of all its cylinders when you floor it, yet shuts most of them down to sip fuel on a motorway cruise. A sparse model does the same with its parameters: vast capacity in reserve, a fraction of it firing on any given request. The catch, Subramanya explained, is that most sparse models route at the level of each individual word (strictly, each token), "so you're constantly having to swap parameters in and out," which means keeping the whole engine resident in memory. Fine on a server with power to burn. Ruinous on a phone, where memory and battery are the hard constraints. Apple's answer was to design the routing for the device from the start.
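A toy model makes the difference between the two routing styles visible. In the Python sketch below, a hypothetical 20-billion-parameter model is split into 20 equal one-billion-parameter "experts", and a hash-based score stands in for the learned router. Token-level routing picks experts afresh for every token, so across a whole request it can touch much of the model; request-level routing, as Subramanya describes Core Advanced, chooses once from the whole request and locks that choice in. Apple has not published how its router works or how its parameters are partitioned, so the expert count, sizes, `k` and scores are all assumptions.

```python
from __future__ import annotations
import hashlib

TOTAL_PARAMS_B = 20   # from the article: AFM Core Advanced has 20B parameters
NUM_EXPERTS = 20      # assumption: 20 equal blocks of 1B parameters each
EXPERT_SIZE_B = TOTAL_PARAMS_B / NUM_EXPERTS

def score(text: str, expert: int) -> int:
    """Stand-in for a learned router: a deterministic pseudo-random score."""
    return hashlib.sha256(f"{text}|{expert}".encode()).digest()[0]

def top_k(text: str, k: int) -> set[int]:
    """Pick the k experts with the highest router score for this input."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: score(text, e), reverse=True)
    return set(ranked[:k])

def token_level_routing(tokens: list[str], k: int = 2) -> set[int]:
    """Re-select experts for every token; the union must all stay in memory."""
    touched: set[int] = set()
    for tok in tokens:
        touched |= top_k(tok, k)
    return touched

def request_level_routing(tokens: list[str], k: int = 2) -> set[int]:
    """Select experts once from the whole request and reuse them for every token."""
    return top_k(" ".join(tokens), k)

request = "what did everyone say they are bringing to the potluck".split()
per_token = token_level_routing(request)
per_request = request_level_routing(request)

print(f"token-level:   {len(per_token) * EXPERT_SIZE_B:.0f}B parameters resident")
print(f"request-level: {len(per_request) * EXPERT_SIZE_B:.0f}B parameters resident")

# The article's range: 1B to 4B active out of 20B is 5% to 20% of the model.
for active_b in (1, 4):
    print(f"{active_b}B active = {active_b / TOTAL_PARAMS_B:.0%} of the model")
```

The trade the sketch exposes is flexibility for footprint: the request-level selection cannot change mid-request, but its working set is fixed and small, which is what a phone's memory and battery budget demand.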
"What Core Advanced does is it looks at the entire request, chooses the right set of parameters, and then locks them in for the entire request." In other words, it picks the cylinders once and runs the whole journey on them, rather than re-selecting at every revolution. That is the trick that lets a twenty-billion-parameter model behave, in cost, like a model a fraction of its size.

Rockwell tied it back to something a user actually hears. The new voices and dictation, he said, would have been impossible without it: the architecture lets Apple "very optimally use just the parameters for voice and speech synthesis out of a much larger model," as though it had carved a dedicated one-to-three-billion-parameter voice model out of the big one. The result, he claimed, is "the best, highest quality on-device voices that are in existence anywhere." And it runs locally at all only because of the thing Apple keeps coming back to: a model invented in-house, Apple Silicon, and end-to-end vertical integration, working as one. "This really lets us build on the strengths of Apple Silicon with a model that can dynamically scale," Federighi added. "We can drive more and more capability on device, so the road ahead is really exciting here."

How is this different from on-device models like Gemini Nano?

Apple put a far larger model on the phone than the small, single-purpose ones rivals run there, and made it the front line of one system rather than a helper off to the side. The obvious comparison is Gemini Nano, Google's on-device model. Nano is a small language model, distilled and compressed from the full Gemini, sitting in the low single-digit billions of parameters depending on the version and the handset. It handles the local jobs that suit a small engine: summarising a recording, drafting a reply, flagging a scam, all of it offline. The heavier thinking still travels to Google's cloud.
Nano is the frugal city runabout, ideal for short hops, with the motorway miles sent elsewhere.

Apple's on-device story is bigger, and it is joined up. AFM Core Advanced carries twenty billion parameters and fires only one to four billion on any request, through the sparse, request-level routing described earlier. That is more capacity living on the phone than a small dense model offers, at close to a small model's running cost: the large engine with cylinder deactivation, rather than a smaller engine altogether. And it is not a standalone offline mode with a cloud button bolted alongside it. The system orchestrator decides, request by request, whether the phone answers or Private Cloud Compute does, so the on-device model is one end of a single continuum rather than a separate tool. Both camps lean on the same core trick, since Nano is distilled from Gemini and Apple's models are refined using Gemini's outputs. They part ways on architecture, on how the parameters are routed, and on how tightly the local model is sewn into the system. By at least one reckoning after the keynote, Apple runs its intelligence on less device memory than Google's 2026 on-device bar demands. The practical result for a user is simple. More can happen privately on the device, and the moment it cannot, the handoff is a decision the system makes without being asked.

How is your data private when it runs on Google's servers?

By a security design that treats Google's data centre as hostile ground and works it the way a field agent would: under its own locks, its own clearance, and with a self-destruct. Marino-Mess built the case from the original Private Cloud Compute, the system Apple shipped in 2024 to carry the iPhone's privacy promise into the cloud.
Its properties, he said, were absolute and checkable: "your data only goes up with servers, with that operation, nothing is logged, Apple does not have access to any of your data, and all those claims are really by technical design and verifiable by researchers and have been verified."

The hard problem was extending that onto hardware Apple does not own. To run AFM Cloud Pro on Nvidia GPUs in Google's cloud, Apple assembled a stack of safeguards (Nvidia confidential compute, Intel TDX, Google Titan) and then added its own. The detail that should reassure a sceptic is the two-key vault. "We have two different hardware roots of trust from two different vendors, Google and Intel," Marino-Mess said, "so in the event one of them was ever compromised, the integrity of PCC is protected." It is the dual-key principle from a missile silo, where no single hand can turn the system on alone. Apple alone decides what software runs on those borrowed nodes ("we, and only we, can deploy software to these nodes that are running in Google's cloud"), and the phones enforce it from the other end, refusing to speak to anything that is not signed by Apple. "Even though that software is running in third-party cloud," he said, "Apple devices will only talk to authentic Apple code running in Private Cloud Compute." The agent operates on foreign soil. It answers only to its own service and carries only its own equipment.

Then there is the self-destruct, and it is where the room laughed. Federighi described minimising the data sent to the cloud as, in the end, a courtesy rather than the real protection, "because PCC itself by design from the ground up is going to vaporize any record of that data the moment after it answers your question." Rockwell pressed the point further: "even when it's up there, we can't even look in on what it's doing. Apple has no ability to go look in on it."
Federighi, enjoying himself: "We tried to open the case even just a little bit, the thing throws a breaker and everything... can't even debug them. Hopefully that won't turn into a problem." The honest counter, voiced by some analysts after the keynote, is that "we cannot see inside it either" is a promise of opacity as much as privacy, and that the safest data is the data that never leaves the phone at all. Apple's reply is the architecture itself: checkable by outside researchers, sealed by two independent vendors, signed end to end. And the wager underneath it is that a verifiable black box beats a logged-in server every time.

What can the new Siri do, and how does one request flow?

It can reach across everything on your device, act on what it finds, and carry a conversation forward. Rather than describe it, Rockwell showed it. "What did everyone say they're bringing to the potluck?" he asked, and Siri answered in a markedly more natural voice, naming the watermelon and feta skewers and the zesty summer pasta, having combed his messages, mail, notes and photos for the relevant scraps. He followed up: "What drinks would pair well with that?" It reasoned over the dishes and suggested a crisp sauvignon blanc, then credited its sources, one of which, a site called cookingwithmike.com, drew a grin from Rockwell, who clarified it was a different Mike's website.

The walkthrough underneath the demo is the part developers will study, and it is best read as a sequence.

Step | What happens | Where
1 | Speech is recognised and converted to text | On device
2 | The system orchestrator builds the prompt and decides the route | On device
3 | The model determines the task needs a personal search | Private Cloud Compute
4 | The search runs across all your content | On device
5 | Only the handful of relevant messages travel up to be reasoned over | Device to PCC
6 | The answer returns and the voice is synthesised | PCC, then on device

"You may have hundreds of thousands of messages on your device," Rockwell said, and the search through them happens locally, "and only those couple of messages or just a handful were sent to the server to reason over, and again, they're not stored there." On the follow-up about drinks, the orchestrator sent the open web only the question it needed (what pairs with these two dishes), carrying "no information about who sent it, when they sent it." This is the cylinder-deactivation logic again, now applied to your private life. The full library stays on the phone. Only the slice the answer requires ever moves.

Marino-Mess showed the other half, on-screen awareness, by pointing Siri at a photograph and asking, "Why are the clouds like this and where can I see them in the Bay?" Siri read the image, identified a marine layer inversion, and explained it. Then he simply said, "Create a note with a day plan for a trip to Mount Tam," and it did. On-screen context, he explained, takes in text, application interfaces ("labels, buttons, maybe it's got some graphs and charts") and images, so a user can ask "Am I free that day?" about a concert mentioned in a message without ever spelling out the date. "I don't need to provide the full context because Siri is getting it from what's on screen."
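The six-step potluck walkthrough can be sketched as a small pipeline that shows the data-minimisation pattern Rockwell described: the search over the full message store runs locally, only the few matching messages go into the payload bound for the server, and the follow-up web query carries the dishes but no sender or timestamp. This Python version is illustrative only; the word-overlap search, the `pcc_payload` and `web_query` payload shapes and the `Message` fields are all assumptions, not Apple's implementation.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    sent_at: str
    body: str

def local_search(store: list[Message], query: str, limit: int = 3) -> list[Message]:
    """Step 4: rank every message on the device by word overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(m.body.lower().split())), m) for m in store]
    hits = [(s, m) for s, m in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in hits[:limit]]

def pcc_payload(question: str, relevant: list[Message]) -> dict:
    """Step 5: only the handful of relevant message bodies travel to the server."""
    return {"question": question, "context": [m.body for m in relevant]}

def web_query(dishes: list[str]) -> dict:
    """Follow-up: the open web sees the dishes, not who sent them or when."""
    return {"q": "drinks that pair with " + " and ".join(dishes)}

# A hypothetical on-device store: two relevant messages among thousands of others.
store = [
    Message("Ana", "2026-06-06", "bringing watermelon and feta skewers to the potluck"),
    Message("Raj", "2026-06-07", "zesty summer pasta for the potluck"),
]
store += [Message("Bank", "2026-05-01", f"statement {i} ready for download") for i in range(5000)]

hits = local_search(store, "what is everyone bringing to the potluck")
payload = pcc_payload("What is everyone bringing?", hits)
print(len(store), "messages searched locally;", len(payload["context"]), "sent up")
print(web_query(["feta skewers", "summer pasta"]))
```

Run as written, it reports 5,002 messages searched locally and two sent up; a real system would use semantic rather than word-overlap matching, but the shape of what leaves the device is the point.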
And because it is the same Siri across iPhone, iPad, Mac, Apple Watch, Apple Vision Pro, CarPlay and AirPods, the behaviour travels with the user. "It's the same Siri across all of this," Rockwell said, "so you get a common experience."

Why build a separate Siri app at all?

Because the most natural way for a person to return to a conversation on Apple's platforms is an app on the home screen. One questioner caught Apple in an apparent reversal. A year ago Federighi and Ternus had suggested Apple saw no need for a bolt-on chatbot, and here was a dedicated Siri app. Federighi drew the distinction carefully. "We see Siri not as a separate chatbot, an unintegrated place you go and chit-chat," he said, "but rather as an integral, but conversational tool that you use in the moment." It lives inside the document you are proofreading, aware of what is on your screen, rather than walled off in its own room. The app exists for one job the system experience alone could not do well: getting back to a conversation you want to continue or reference. "The most natural affordance for any user to go find something like that is to have an app that they can manage on their home screen," he said, and so the Siri app "just re-embodies those capabilities of that core system experience." The chatbot Apple declined to build is the standalone destination. The app it did build is a doorway back into the assistant that is already everywhere.

What does it open up for developers?

The same plumbing Apple built for Siri is being handed to the people who build for Apple. Federighi was specific. The Spotlight semantic index that let Siri find Rockwell's messages is open to third-party developers, who can index their own app's content and have it surface in Siri even when their app is closed, pulling the user back in.
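What indexing an app's content buys a developer can be shown with a toy semantic index. In this Python sketch, several apps register items in one shared index, each item is turned into a vector, and a query is answered by cosine similarity across every app's items at once, so a result can come from an app that is not running. Real semantic indexes use learned embeddings; the word-count vectors here are a deliberately crude stand-in, and `SemanticIndex`, `add` and `search` are hypothetical names, not Apple's Core Spotlight API.

```python
from __future__ import annotations
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude stand-in for a learned embedding: a word-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticIndex:
    """One shared index; each entry remembers which app donated it."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, app: str, text: str) -> None:
        self.entries.append((app, text, embed(text)))

    def search(self, query: str, limit: int = 2) -> list[tuple[str, str]]:
        """Rank every app's items against the query; drop items with no similarity."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(app, text) for app, text, vec in ranked[:limit] if cosine(q, vec) > 0]

index = SemanticIndex()
index.add("Notes", "grocery list for the potluck")
index.add("Bear", "draft blog post about marine layer clouds")  # a third-party notes app
index.add("Mail", "flight confirmation for Tuesday")

print(index.search("marine clouds"))  # [('Bear', 'draft blog post about marine layer clouds')]
```

The design point the sketch carries over from the article is that the index, not the app, answers the query, which is why content can surface in Siri while the donating app is closed.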
The actions Siri took in Apple's Notes could just as easily run through a rival: "this could have been Bear Notes or some other notes app," he said, and the messaging search "could have been searching from a different messaging app." On the model layer, the Foundation Models framework, a Swift interface to Apple's on-device model, gained image input and what Apple calls skills this year, and the same interface now extends out to server models, "running locally on a Mac Studio, or a server model running in anyone's cloud." A new Core AI framework lets a developer bring their own or an open-source model onto the device with Neural Engine and GPU acceleration. And a new Xcode arrived with its agentic coding tools sharply upgraded.

For India, two of these matter more than the marquee features. On-device models mean Siri's lighter work runs without a network round-trip, which is the difference between an assistant that feels instant and one that stalls on a patchy connection. And the open Spotlight and App Intents hooks give India's large developer base a route to plug local apps (payments, travel, commerce) straight into the assistant, in a market where the apps people live in are often homegrown rather than imported.

Are we finally in the age of agents?

Not in the consumer sense, on Apple's own telling, though the foundation is now laid for it. The question came in capital letters from the audience ("We didn't hear much about AGENTS"), and Federighi handled it without overselling. Apple's platforms, he noted, are popular ground for people standing up their own agentic systems, with a Mac mini or a Mac Studio serving as the computational base. Then he turned to Rockwell, who offered the cleanest definition of the day: "An agent is something that is operating on a loop of information coming in, making decisions, and then taking action."
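Rockwell's definition maps onto a very small loop. The Python sketch below contrasts it with the request-based pattern Siri uses today: a request handler runs once and returns, while an agent keeps observing, deciding and acting until its goal is met or a step budget runs out. The thermostat-style environment, the `decide` policy and the `max_steps` budget are hypothetical, chosen only to make the loop runnable; the budget is one simple way to keep such a loop controllable, in the spirit of Federighi's "can control" and "safe".

```python
from __future__ import annotations

def handle_request(request: str) -> str:
    """Request-based: one input, one decision, one action, then done."""
    return f"answered: {request}"

def decide(observation: int, goal: int) -> int:
    """Hypothetical policy: step one unit toward the goal, or 0 when there."""
    if observation < goal:
        return 1
    if observation > goal:
        return -1
    return 0

def run_agent(start: int, goal: int, max_steps: int = 10) -> list[int]:
    """Agent: a loop of information in, decision, action, until done or out of budget."""
    state = start
    history = [state]
    for _ in range(max_steps):         # the step budget bounds how long it can act
        action = decide(state, goal)   # information coming in -> decision
        if action == 0:
            break                      # goal reached; stop acting
        state += action                # taking action changes the world
        history.append(state)
    return history

print(handle_request("set a timer"))              # answered: set a timer
print(run_agent(start=18, goal=21))               # [18, 19, 20, 21]
print(run_agent(start=0, goal=50, max_steps=3))   # [0, 1, 2, 3]
```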
Siri, Rockwell said, "is primarily request-based today, but the underpinning architecture for Siri is a completely modern architecture, and so our ability to extend in the future is very strong."

Federighi was franker still about how green the whole field is. People had taken agentic coding tools, genuinely useful to developers, "and tried to sort of put those in a consumer context and hope something useful came out of that," he said. "It's very early days in getting to those kinds of helpful, long-horizon agentic tasks, but we're all building on agentic architectures at this point." The unsolved part, he argued, is the experience: an agent "that the user can understand, can control, that they find helpful, and that's safe," which he flatly called a set of "to-dos for the industry." It was a rare thing to hear from an Apple stage during a launch: an admission that the most hyped frontier in the business is still a blueprint, not a building.

How does the new Siri differ from ChatGPT, Gemini, Claude and Copilot?

Nearly every rival runs its intelligence primarily in the cloud, and most of them live as a destination you visit. Apple's runs on the device first and lives inside whatever you are already doing.
That is the one-line difference, and the table fills it in.

Assistant | Where the intelligence runs | How you reach it | Business model | On-device model
Apple Siri | On the device first, with Private Cloud Compute for the rest | Woven across the system and your devices | Hardware sales, with no advertising | AFM Core Advanced, a 20-billion-parameter sparse model
Google Gemini | Cloud-first on Google's servers, with Nano on the device | The Gemini app and across Android | Advertising | Gemini Nano
OpenAI ChatGPT | Cloud only | Its app and its own browser | Subscriptions and API access | None of note
Anthropic Claude | Cloud only | Its app and an API | Subscriptions and API access | None of note
Microsoft Copilot | Cloud-first | Across Windows and Office | Software and enterprise licences | Limited

Read across the rows and the pattern is plain. ChatGPT, with its roughly one billion weekly users and its own agentic browser, is a place you go to. You open the app, open the tab, start a conversation in a world of its own. Gemini is woven into Android and carries the local slice on Nano, but its real intelligence runs on Google's servers, and the company beneath it sells advertising, a business that runs on knowing you. Claude reaches users and developers through an app and an API with a pronounced emphasis on safety, and like the others it is a destination rather than a layer of the operating system, with no consumer on-device model. Copilot comes closest to Apple's ambition, since Microsoft has wired it through Windows and Office, yet it too is cloud-first, built largely on OpenAI's models with Microsoft adding its own, and pointed hard at the enterprise.

Apple's differences gather into one stance. The assistant runs on the device first, and the cloud half is Private Cloud Compute (stateless, unreadable even to Apple) rather than an ordinary logged server. No advertising business pulls at your data. It is sewn across a fleet of devices rather than parked in a single app. And the models are Apple's own, distilled rather than borrowed whole.
The trade is real and worth stating plainly. A cloud-first rival can wield the largest, smartest model going, unbothered by a phone's memory or battery, and on raw model horsepower Apple is unlikely to lead. Apple's wager is that for the things people do all day (the quick question, the message, the glance at the screen) privacy, integration and on-device immediacy beat raw size. That wager is the whole strategy, and it is the one the next section weighs.

The no-ads promise, and what Apple is really selling

The deepest answer to why any of this was built the way it was came on a question about personalisation. If a more personal assistant means a more useful one, where does the personal data live, and who profits from it? Marino-Mess rooted the answer in the device: your content stays on the phone, you control it, and only the moment's necessary slice goes up. Rockwell put the control plainly: "the data is stored on your device, right? So you have control over that... we are not able to get access to it." And Federighi drew the line that separates Apple's model from the advertising houses it competes with. "You're not going to see an ad from us coming based on, 'Hey, we noticed you like Thai food, Mike, here's some Thai food,'" he said. Ask Siri where to eat, and it might weigh your tastes "in service on device of enriching the request you made, using our information on your behalf," but never to sell you to a third party. "You stay completely in control of your information at all times."

That is the moat, and it is worth seeing it for what it is. Apple arrived at the assistant war late, after OpenAI's ChatGPT, after Google folded its Assistant into Gemini, after Amazon rebuilt Alexa for the model era. It could not win on being first. It has chosen not to win on having the single cleverest model. It is betting instead on an architecture only a vertically integrated company could assemble. Models distilled to run on its own silicon.
A system orchestrator that keeps your life on your phone. A cloud that vaporises what it touches. A privacy promise it was willing to carry onto a rival's hardware rather than break. The pre-keynote story was that Apple, cornered, outsourced Siri's brain to Google and Nvidia. The story the executives told in that room is that Apple borrowed a teacher and some horsepower, and kept the brain, the boundary and the control for itself. Both can be partly true at once, and the months ahead, when developers and researchers test every claim against the shipping beta, will decide which one history records. What is settled already is the bet itself: that in an industry racing to make AI know everything about everyone, the company that built its fortune on the opposite instinct has decided privacy is the product, and wired an entire assistant to prove it.Frequently Asked QuestionsDoes the new Siri run on Google's Gemini?No, in the way most people mean it. Apple uses none of Google's deployed assistant — not the Gemini app, models, infrastructure or search. Apple's own third-generation Foundation Models were refined using outputs from Gemini frontier models, a process called distillation, where Gemini acts as a teacher and Apple's smaller models learn to match it.What is Private Cloud Compute, and is it really private?Private Cloud Compute is Apple's system for handling AI requests that are too heavy for the phone, designed so that data is never stored, never accessible to Apple, and processed only for the moment of the request. Apple says these properties are verifiable by outside researchers, and for its most demanding model it extended PCC onto Nvidia GPUs in Google's cloud using two independent hardware roots of trust and Apple-signed software.What is AFM Core Advanced and a sparse architecture?AFM Core Advanced is Apple's new on-device model, with 20 billion parameters but using only one to four billion on any request. 
A sparse architecture activates only a slice of the model for each request rather than the whole thing, like an engine that runs on fewer cylinders to save fuel. That keeps the work done per request within a phone's compute and battery budget, which is what lets a model this large run on the device.

What did Nvidia provide?
Nvidia's GPUs, located in Google's cloud, run Apple's most capable model, AFM Cloud Pro, when a request needs more reasoning power than the phone or Apple's own servers can supply. Nvidia's confidential compute technology, which protects data while it is being processed, is part of how Apple says the privacy guarantee holds on that borrowed hardware.

Can Apple or Google see my Siri data?
Apple says it cannot, by design: engineers cannot attach a debugger or look inside Private Cloud Compute while it runs, and the data is discarded immediately after a request is answered. Google's role is to host the hardware for the heaviest model. Apple controls the software on those nodes, and phones talk only to Apple-signed code, so Google should not be able to link requests to individual users.

When and where can I get the new Siri?
The new Siri AI is available to developers in beta now and is expected to reach the public as a gated beta later this year, in English first, timed to the general release of iOS 27 around September. India is in line to receive it on devices set to English, while it is held back from China entirely, and from iPhone and iPad in the European Union at launch.
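The FAQ's figures for AFM Core Advanced, 20 billion parameters with one to four billion active per request, work out to roughly 5 to 20 percent of the weights doing work at a time (1/20 and 4/20). The article says only that the model is "sparse". A common way to build such a model is a mixture-of-experts layer, where a small router scores every expert and only the few highest-scoring ones run for a given input. The toy below assumes that design, which Apple has not confirmed here; the class name, the sizes and the random weights are hypothetical, and real experts are small neural networks rather than single matrices.

```python
import math
import random


class ToySparseMoE:
    """Hypothetical mixture-of-experts layer: many experts exist, but only
    the top_k highest-scoring experts run for any given input."""

    def __init__(self, num_experts: int, dim: int, top_k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one row per expert, scoring how relevant that expert is to the input.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
        # Each expert is a dim x dim matrix here; all of them stay in memory.
        self.experts = [
            [[rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def route(self, x: list[float]) -> list[int]:
        """Return the indices of the top_k experts for this input."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return ranked[: self.top_k]

    def forward(self, x: list[float]) -> tuple[list[float], list[int]]:
        chosen = self.route(x)
        scores = [sum(w * xi for w, xi in zip(self.router[i], x)) for i in chosen]
        # Softmax over the chosen experts' scores only (max subtracted for stability).
        peak = max(scores)
        gates = [math.exp(s - peak) for s in scores]
        total = sum(gates)
        out = [0.0] * len(x)
        for i, g in zip(chosen, gates):
            # Only the chosen experts do any arithmetic; the rest are skipped entirely.
            for r, row in enumerate(self.experts[i]):
                out[r] += (g / total) * sum(a * b for a, b in zip(row, x))
        return out, chosen

    def active_fraction(self) -> float:
        """Share of expert weights that take part in one forward pass."""
        return self.top_k / len(self.experts)


# The article's figures: 1 to 4 billion active out of 20 billion total.
for active in (1e9, 4e9):
    print(f"{active / 20e9:.0%} of weights active")

layer = ToySparseMoE(num_experts=16, dim=8, top_k=2)
x = [0.5, -1.0, 0.25, 0.0, 1.5, -0.5, 0.75, 1.0]
y, used = layer.forward(x)
print(f"experts used: {used}, active fraction: {layer.active_fraction():.1%}")
```

Note that every expert's weights still sit in memory in this toy; what sparsity cuts is the arithmetic done per input, and with it the energy spent on each request.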
