TL;DRPerplexity AI announced a platform at Computex that dynamically routes AI inference between PCs and cloud servers in real time, acting as an “air-traffic controller” for AI tasks. The chip-agnostic system targets the cost crisis of centralised inference as Perplexity’s revenue hits $500 million.

Perplexity AI has developed a platform that dynamically splits AI workloads between personal computers and cloud servers, deciding in real time which tasks can run locally on a PC’s processor and which need the power of data centre hardware. CEO Aravind Srinivas announced the system at Computex in Taipei on Tuesday, describing it as an “air-traffic controller for AI tasks” designed to reduce the cost of inference, the process of running trained AI models to generate responses.

“You don’t want all your compute centralised in servers and everything running through the largest models,” Srinivas said in a Bloomberg Television interview. “You’re already reading reports of how people are freaking out about their cost. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user.”

How it works

The system evaluates each AI task and routes it to the most efficient compute layer. Simple operations that modern PC processors can handle, such as summarisation, formatting, or lightweight classification, run locally without touching the cloud. More complex tasks that require large model inference, such as multi-step reasoning or retrieval-augmented generation across large datasets, get routed to cloud servers. The routing decision happens in real time, invisible to the user.