FORWARD-LOOKING: Apple's push to make Siri more capable is starting to look less like a purely in-house effort and more like a concession to the realities of modern AI. To close the gap, the company is expected to split Siri's workload between on-device processing and cloud-hosted models, including Google's Gemini.

Apple has spent years emphasizing the privacy benefits of keeping computation on-device. Its custom silicon, including the Neural Engine, has been steadily tuned for machine learning workloads. But even with those gains, phones remain limited by memory and processing ceilings. The largest AI models now operate at a scale that simply doesn't fit within those constraints.

Smaller models designed for local use can help, but they come with trade-offs. On-device systems typically run with only a few billion parameters and are often compressed using techniques like quantization, which stores each weight at lower numerical precision to improve speed and efficiency. That makes them usable on a phone, but it can also cost accuracy and depth. In practice, they tend to feel less capable than their cloud-based counterparts, especially in open-ended conversations.
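To make the trade-off concrete, here is a minimal Python sketch of the arithmetic behind quantization. It is illustrative only: the 3-billion-parameter figure, the sample weights, and the function names are assumptions chosen for the example, not details of any Apple or Google model, and the simple symmetric 8-bit scheme shown is just one of several approaches in use.

```python
def model_size_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats onto integers in [-127, 127]."""
    # One shared scale for the whole list; fall back to 1.0 if all weights are zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in q_weights]


# Hypothetical 3-billion-parameter model at two precisions.
size_fp16 = model_size_gb(3e9, 16)  # 16-bit floating point weights
size_int4 = model_size_gb(3e9, 4)   # 4-bit quantized weights

# Hypothetical sample weights, used to show the rounding error.
weights = [0.82, -1.27, 0.003, 0.45]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"16-bit: {size_fp16:.1f} GB, 4-bit: {size_int4:.1f} GB")
print(f"quantized: {q}, worst-case rounding error: {max_error:.4f}")
```

The smallest weight in the sample rounds to zero, which is the kind of lost detail that shows up as reduced accuracy once it is repeated across billions of parameters.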

That gap is part of what Apple is now trying to bridge.

After striking a deal with Google, Apple reportedly began working on distilling Gemini's larger models into smaller versions that could run on the iPhone. Distillation allows a compact model to mimic the behavior of a much larger one, capturing useful patterns without the full computational load. It's a way to bring some level of advanced AI onto the device, even if it cannot match the original model's performance.
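For readers curious about how distillation works in principle, the sketch below shows one common textbook formulation: the smaller "student" model is trained to match the larger "teacher" model's softened output probabilities, with the mismatch measured by KL divergence. Everything here is an assumption for illustration, including the function names, the example scores, and the temperature value; nothing in the reporting describes how Apple or Google actually implement it.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next words.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # prefers a different word

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

Training nudges the student's parameters to shrink this loss across many examples, so the compact model learns the teacher's relative preferences among answers rather than only its top pick. Practical recipes typically combine this term with an ordinary training loss and other refinements omitted here.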