For ninety minutes on the morning of the first of June 2026, in a side room at the Humble House on Taipei's Songshou Road - a half-kilometre east of the Computex floor, a half-world from the data centres he is about to remake - Anil Nanduri made the most honest CPU-versus-GPU argument the industry has heard in a year. Nanduri is Vice President at Intel, head of the AI Acceleration Office, twenty-five-plus-year company veteran, lifelong cricket fan, the kind of executive who pulls out his own chair for a late-arriving correspondent rather than waving over a press handler. His thesis, delivered the same morning Intel filed its Xeon 6+ on Intel 18A announcement from the same hotel, is this: as inference becomes agentic, the binding constraint on a real-world data centre shifts from raw GPU throughput to CPU orchestration, memory access and software flexibility, and the silicon Intel just disclosed is the expression of that argument. The constraints driving customers toward hybrid CPU-led inference are, in Nanduri's telling, doing more for Intel's positioning than any marketing pitch could. The pitch is, in other words, writing itself.Key Takeaways • Anil Nanduri, Vice President at Intel leading the AI Acceleration Office and the AI Business Unit's product, GTM and customer-success function, used the Computex 2026 Humble House round table to argue that agentic AI shifts the binding constraint in the data centre from GPU throughput to CPU orchestration, memory access and software flexibility.• The Intel Xeon 6+ processor on Intel's own 18A process node - the first data-centre CPU on 18A, with up to 288 Efficient-cores, 12-channel DDR5 and 96 lanes of PCIe Gen 5 - is the silicon Nanduri positions as the host platform for the agentic workload pattern.• Intel's Arc Pro B70 workstation GPU, with 32 GB of VRAM and a four-card scaling configuration, can run a 120-billion-parameter model locally. Nanduri cited a real example where a customer's $3,000-per-month cloud-token bill paid off a roughly $5,000 workstation in two months.• Intel's 800 Series Ethernet portfolio scales from 10GbE to 200GbE in the E835 family, with 400GbE and 800GbE solutions positioned as the cost-and-latency premium tier against NVIDIA's NVLink and ConnectX networking stack - different price-performance points for different segments.• Nanduri's strategic thesis in one line: when supply and cost constraints force customers to extract more from what they already have, the demand for orchestration efficiency, hybrid CPU-led inference and small-language-model deployments rises naturally - without Intel having to argue for it.The Cricket Fan The first thing Nanduri said at the round table had nothing to do with silicon. He had not watched the IPL final the previous night. He had not needed to. Gujarat Titans had batted first; Nanduri had glanced at the first-innings total before turning in; and in the practised, half-bored way of a cricket man who has been reading scoreboards since the Doordarshan years, he had known. The Titans had not put enough on the board. The Royal Challengers would chase it. He went to bed irritated, because Anil Nanduri is not a Royal Challengers Bengaluru fan, and the next morning's coverage was about to confirm what he had calmly resented through the night. By half-past nine in Taipei, the trophy had been lifted, the highlights were already cut for the morning shows, and his mood was not improving. The favourite team in the room was Rajasthan Royals - eliminated by the same Gujarat Titans in the semi-finals, still raw. Nanduri filed the news with a small headshake. Then he leaned forward. “What's that guy, the kid? He's good,” he asked. Vaibhav Suryavanshi. The fourteen-year-old who had spent the IPL season turning bowlers into footnotes. Nanduri nodded the way Indian cricket fans nod when a teenager has done something the senior pros could not - half-pride, half-disbelief, the small unconscious tilt of the head that the BCCI selection committee would be wise to read. Then he caught himself, glanced at the recorders, and laughed. The room laughed with him. The recorders kept rolling. He has been at Intel since before the data centre was a separate business segment with its own profit and loss to defend. Twenty-five-plus years. The warmth was earned, not performed. The CPU's Quiet Return - Why Agentic AI Rewires the MathThe CPU comes back because agentic AI workloads stop looking like training jobs and start looking like an iterative software-engineering loop, and that loop runs on the CPU, not the GPU.Nanduri walked the room through the mechanics directly, using coding as the running example.“The AI model is a thinker, and it gives me - a simple example is coding, and you create an application,” he said. “But then the code application just doesn't get generated completely. It's an iterative process. You code, you get an output, and then you have to test it. And the testing happens on CPUs. You compile, test, verify, execute, and then you find out that the code did not work. You get a bunch of error logs. You take the error logs, feed them back to the model, and the model then says, 'Okay, oh, never mind, I made a mistake. Here's a fix, here's a change.' And then this updated code then goes back, again you test it, verify it. And so this iterative process goes on, and then you pass that code. And just - it's module after module, and then you have a UI, you have an application, you have a use case.”The thinker generates the artefact. The compiler runs the artefact. The verifier checks the artefact. The orchestration layer holds the state between iterations. Every stage of that loop that is not generation runs on x86 silicon - on a CPU. As agents get bigger and more complex, Nanduri said, the stress migrates to layers the GPU was never asked to handle.“Now, as these agents get bigger and more complex, and you are now bringing in security, you're bringing in capabilities for, you know, stressing the memory because your memory is now going to spill over,” he explained. “You need to maintain context or your history, so you need your KV cache stored. So you're going to have storage requirements. So the data center is now being stressed at different levels and layers as you're thinking about agent performance.”KV cache. Context history. Security primitives. Storage spillover. None of these are GPU-shaped problems. All of them sit on the CPU side of the rack. The ratio of CPU to GPU in a modern agentic deployment is therefore moving in a direction the Xeon scoreboard reads as a structural tailwind.The CPU-to-GPU ratio in an agentic data centre is getting more and more at parity - which means every GPU shipped pulls more Xeon silicon along with it.The point lands harder when Nanduri broke down what is happening inside the GPU itself. Inference splits into two phases - prefill and decode, and they stress the silicon differently.“Prefill is computationally - you need a lot of matrix flops. And then the decode needs a lot of memory bandwidth,” he said. “You still need compute flops, but you're spending a lot more time getting data back from memory. So you're stressing the GPU system very differently. And again, back to cost efficiency - you're paying for compute where half the GPU is not being used when you're doing decode, and the other half of the memory and bandwidth is not being used if you're doing prefill.”A GPU paid for at premium rates and used at half-capacity is, in TCO terms, a problem. The orchestration software stack - LLMD, SGLang, NVIDIA's Dynamo - is being built to route prefill and decode to the right silicon at the right cost. Specialised accelerators like Groq and SambaNova add a third class to the routing decision. The orchestration runs on the CPU. Every routing call, every workload assignment, every cost-aware scheduling decision - Xeon silicon, in Nanduri's framing, is doing the work the slide decks credit to the GPU. Xeon 6+ On Intel 18A - The Reunion Season Nanduri reads the Xeon 6+ on Intel 18A bundle as the reunion-season payoff for a company that has been waiting through three difficult production cycles to get its own foundry to leadership maturity, and the strategic value of owning both the silicon and the process is what makes the moment count.If Intel is the long-running American sitcom that has survived every cancellation rumour, Nanduri is the senior cast member who has been in the writers' room since the pilot episode. Pentium era. Itanium era. Atom years. Mobileye exit. Foundry losses. The Gelsinger pivot. The Tan recovery. He has seen the show ride out every season-finale crisis - and the Xeon 6+ launch is the cold-open of a season where the ensemble is back together and the cast is reading the next script properly for the first time in a while.“It's also super exciting because it gives, you know, kind of choice in the ecosystem and be able to, you know, have solutions that can get both from like the process all the way to the silicon to the open software that we can enable,” he said. “I think that's what Intel was all about, and I think good to see that.”“To see Xeon 6 Plus-like solution coming on Intel 18A process technology, you know, the maturity of our process getting back in the game, building leadership there. I think it's also super exciting,” he added.On the strategic value of running Xeon 6+ on Intel's own 18A line versus an external foundry, his framing was unsentimental - a competitive lever rather than a contingency.“We are able to get more capacity out. Generally, if you just look at the economics of that, it should be, you know, over time, most efficient - because you're able to leverage,” he explained. “But we definitely want to capture the value both of the foundry and of the product. So, ultimately, the market will decide on, you know, how that is accepted. But I think clearly there's clearly benefits to us in having to have our ability to own both sides of the supply chain and the product.”Owning both the foundry and the product is the single structural advantage Intel holds against a market where every competitor is sharing a wafer queue at the same TSMC fab.The 12-channel memory subsystem, a specific bandwidth lift that pulls Xeon 6+ ahead of the Xeon 6 baseline, got the same matter-of-fact treatment in conversation. Yes, twelve channels. Yes, it matters for memory-bandwidth-bound workloads. The detail Nanduri added is that the bandwidth question is workload-shaped rather than universally relevant.“Memory bandwidth, while, yes, as a general constraint - but the implication of it, you have to really profile the workload,” he said.The agentic workload is not one workload. A high-performance-computing simulation stresses memory bandwidth in one direction. A web-services API stresses it in another. A vector database stresses it in a third. The Xeon 6+'s bandwidth uplift, on the 12-channel design, is most consequential precisely where the agentic ecosystem is going to spend its 2026 and 2027 procurement budget - orchestration, vector lookup, KV cache management, mixed compute-and-memory loads.The Indian Reading - Pick The Eleven For The Pitch Indian buyers gain because the Xeon 6+ and the Arc Pro line make a local-language AI agent running on a CPU-plus-workstation-GPU stack viable without the premium of a top-tier data-centre GPU that may not even be available to procure, and the small-language-model trend gives that stack enough headroom to do real work.Press him on how Xeon 6+ on 12-channel memory might help an Indian company running a local-language AI agent directly on the CPU without buying an expensive and hard-to-get AI graphics card, and his answer arrives without a flinch.“There is going to be - this is why I said AI compute is not just one size fit all. It's going to be a gradient,” he said. “And depending on the models, as you can see, models are getting distilled, SLMs are coming, small language models. And so constraints are going to help people decide, especially when you can't afford or you can't get it. You're going to come back and say, 'Wait, I am not getting the best bandwidth, but I'm getting the best cost.' So you're going to start to make trade-offs.”The Indian-buyer reality - capex-conservative, multi-vendor, software-defined, lifecycle-led, is the exact procurement profile that rewards a workload-shaped choice rather than a top-of-stack default. An Indian banking customer running a Hindi-language customer-service agent does not need to put the model on an H200. It needs to put the model in the rack the bank already has, with the cooling envelope the bank already supports, on the software stack the bank's engineers already know. Xeon 6+ on 12-channel DDR5, plus an Arc Pro workstation tier for local inference, lands inside that envelope. An HBM-hungry top-tier GPU does not.“Vector DB runs really well on CPUs, so you don't need to shift that on a GPU. So, again, this is back to constraints,” Nanduri said. “You want to run the best compute for the right cost, the best application or workload for the right cost on the best compute.”Vector databases, the storage-and-retrieval layer underneath every retrieval-augmented-generation system the Indian IT services majors are deploying for their global clients, run well on CPUs. Recommendation engines run well on CPUs. A meaningful slice of the classical machine-learning estate that Indian enterprises still rely on for line inspection, fraud detection, demand forecasting and credit scoring runs well on CPUs.“And for machine learning, or even for some of recommendation engines, CPU is pretty strong,” he added. “But again, if generative AI is giving you a productivity benefit or a cost benefit, then it makes sense to migrate over. But a lot of real-world applications will still run the way they were running because why fix something that's not broken?”The Indian buyer reads the announcement the way a captain reads a turning wicket on day three of a Test match. The conditions favour patience. The right shot is the one the pitch is asking for, not the one the highlight reel rewards. Picking the eleven for the conditions - Xeon 6+ for orchestration, vector DB on CPU, SLM on Arc Pro, premium tokens to the cloud for the queries that earn them, is exactly the kind of innings selection that wins a series rather than a single over.The Hybrid AI Economics - Arc Pro B70 And The $3,000 Versus $5,000 Trade The Intel Arc Pro B70 workstation GPU, with 32 GB of VRAM and a four-card scaling configuration, lets a developer or a small enterprise run a 120-billion-parameter model locally and handle 80 per cent of their agentic workload without sending tokens to a frontier API, and the example Nanduri shared puts the payback at roughly two months for customers who had been spending $3,000 a month on cloud tokens.The Arc Pro B70 sits on the Xe architecture lineage Intel has been building since the PC client GPUs of 2020. Xe became Xe2. Xe2 became Xe3. Xe3 ships in the Panther Lake client products. Xe3P is the data-centre-capable variant being extended to the Crescent Island GPU that Intel disclosed at the same Computex round table. The workstation tier, Arc Pro B70, is the developer-and-prototyping anchor.“We've had our Xe architecture from our PCs. And we then built a workstation product called Arc Pro,” Nanduri said. “Arc Pro, which is our B70, has 32 GB of VRAM, and you can put four of them. And it's actually, you know, very well received. Where it starts to become is a workstation-like product where I can run this hybrid AI, like a Hermes-like or Open-CL-like solution, where I can run a 120-billion parameter model locally and be able to manage 80% of my agentic work locally, and then just send the premium tokens to the cloud or to a foundational model.”The economics Nanduri walked the room through were specific. A customer he knows had been running roughly $3,000 a month in cloud-token costs. A workstation with four Arc Pro B70 cards, each carrying 32 GB of VRAM, ran around $5,000 in capex. The payback period sits at roughly two months.“It pays itself off in a couple of months. And so you can now - it's no longer about how many tokens am I using, it's how is my productivity changing, how useful are those tokens,” he explained. “And I think this is where, for a lot of the basic RAG-like queries, the open-source models running locally, you may start to become good enough. And then you start to say, 'Okay, where do I need a really frontier model like API and intelligence?' You use that precious resources, and the cost aspect to it for those specific needs.”A workstation that pays off a developer's cloud-token bill in two months is not a workstation pitch - it is a procurement memo writing itself.Stack the Indian context on top: GST-inclusive landed costs on a $5,000 imported workstation arrive at roughly Rs 4.5 lakh to Rs 5 lakh. The equivalent monthly cloud-token spend at $3,000 a month, at the same dollar conversion, lands around Rs 2.5 lakh a month. The payback math is intuitively obvious to a Bengaluru fintech CFO or a Mumbai analytics-services partner watching margins compress as cloud bills compound.This is the part of the hybrid-AI story the press releases generally do not surface, because it sounds like a slow-grind enterprise pitch rather than a frontier-AI moment. It is also the part that builds Intel's installed-base advantage one developer workstation at a time.The Heterogeneous Compute Future - Ensemble Cast The agentic data centre of 2027 will look more like an ensemble cast than a star vehicle - CPU, GPU, specialised accelerator, networking, storage, all carrying weight, none able to carry the show alone - and Nanduri's reading of the next eighteen months is that the customer will increasingly assemble the cast for the workload rather than buy the whole stack from one vendor.The argument lands cleanly when laid out by compute class. Compute class Strongest at Latency profile Programming flexibility Cost profile Ecosystem maturity CPU (Xeon 6+) Orchestration, vector DB, classical ML, recommendation engines, tool calls, security context Variable, software-tunable Highest - full x86 software ecosystem Lowest per-throughput-class Most mature GPU (Crescent Island, MI400, Rubin) Prefill, training, mixed inference, model-loaded-once-queried-many Low-to-moderate, hardware-fixed High via CUDA, ROCm, OneAPI Premium Mature on the Nvidia side; building elsewhere Specialised accelerator (Groq, SambaNova) Decode at scale, deterministic-latency inference, throughput-led production deployments Lowest, deterministic Lowest — purpose-built Premium at the top SKU; cost-effective at high utilisation Emerging Networking (E835 200GbE, NVLink, ConnectX-9) Data movement between compute classes Sub-microsecond at the top tier Standards-based for Ethernet; proprietary for NVLink Tiered by performance Mature on Ethernet; proprietary lock-in on NVLink Compute classStrongest atLatency profileProgramming flexibilityCost profileEcosystem maturity CPU (Xeon 6+)Orchestration, vector DB, classical ML, recommendation engines, tool calls, security contextVariable, software-tunableHighest - full x86 software ecosystemLowest per-throughput-classMost mature GPU (Crescent Island, MI400, Rubin)Prefill, training, mixed inference, model-loaded-once-queried-manyLow-to-moderate, hardware-fixedHigh via CUDA, ROCm, OneAPIPremiumMature on the Nvidia side; building elsewhere Specialised accelerator (Groq, SambaNova)Decode at scale, deterministic-latency inference, throughput-led production deploymentsLowest, deterministicLowest - purpose-builtPremium at the top SKU; cost-effective at high utilisationEmerging Networking (E835 200GbE, NVLink, ConnectX-9)Data movement between compute classesSub-microsecond at the top tierStandards-based for Ethernet; proprietary for NVLinkTiered by performanceMature on Ethernet; proprietary lock-in on NVLinkThe reason the table matters is the argument it makes implicitly: no single column wins the rack. The orchestration software decides which workload lands in which column, and the customer pays for the right mix rather than the same column repeated.“You have to work in this heterogeneous computing environment. You're going to have these computational elements that are good for what they're good at. You need to have an ecosystem that can integrate that,” Nanduri said.“If you want super premium tokens and you want low latency, you need the highest-end GPU and you need an accelerator, whether it's a Groq or a SambaNova. You need to put them together to make that throughput at scale come in. If you want flexibility of programming, bursty workloads, then it's more like a GPU programming, general purpose flexibility. So I don't think it's one size fit all, and you'll see there's going to be mix and match,” he explained.The sitcom analogue is durable. The shows that survive - Frasier across two networks, Friends across a decade, The Big Bang Theory across twelve seasons, Scrubs across cancellations and revivals - are the ones where the ensemble carries the weight together. The episode where one character tries to handle every arc is the episode the writers regret. The compute analogue is the data centre where the GPU is asked to do prefill, decode, orchestration, security context and tool calls - the episode the operations team regrets when the per-token cost arrives.The agentic data centre that works in 2027 is the data centre that lets each compute class play its scene and leave when the scene is done.The networking layer Intel pitched at the same Humble House round table - the E835 at up to 200GbE, the broader 800 Series Ethernet portfolio running 400GbE and 800GbE at the cost-and-latency premium tier - is the staging that lets the ensemble work. Nanduri's read of the networking choice was as unsentimental as the rest of the conversation. NVLink is proprietary. ConnectX is the Ethernet alternative. The market for both will fragment along the latency-and-cost axis, because the customer base has fragmented along that axis already.Constraints Make You Creative - Nanduri's Strategic ThesisConstraints will do more for Intel's positioning than any marketing pitch could, because when customers cannot get the compute they want at the cost they want, they get creative - and the creative solution looks more like Intel's heterogeneous, hybrid, CPU-led architecture than the GPU-only narrative that dominated 2024 and 2025.The single most lift-able strategic line of the morning was Nanduri's matter-of-fact framing of how the AI procurement conversation is shifting.“Constraints do that to the best of us. When you are constrained, you get creative,” he said.The cricket analogue lands cleanly. Pressure-cricket - the last over, the chase tightening, the field set to deny conventional shots - is when the ramp, the scoop, the reverse-sweep, the inside-out drive get invented. The shots that look impossible in a net session become the shots the captain calls because the conditions stop allowing anything else. AB de Villiers did not invent 360-degree batting in a coaching manual. He invented it because the bowling, the field and the over-count left him nowhere conventional to score.The AI procurement conversation in 2026 is in exactly that over. The premium GPU is the conventional cover drive. The cost spike on cloud tokens, the supply queue on top-tier silicon, the cooling envelope the data centre cannot exceed, the budget the CFO has tightened - these are the field placements the captain is reading. The unconventional shot is the hybrid stack. CPU for orchestration. Specialised accelerator for decode-at-scale. Workstation-class GPU for local inference. Vector DB on Xeon silicon. Premium frontier tokens only for the queries that earn the premium.“Up until now, people were just moving fast. By the way, and it's no different from any new technology. When you're trying to get excited, move fast, you say, 'Okay, throw more compute, throw more memory, throw more storage.' Cost was not the criteria,” Nanduri said. “But now, suddenly, 'Oh wait, first, prices have gone up. Oh, but even if I could pay, I'm not getting it.' So, okay, so what do you do? And so this will then bring in like, okay, the hybrid approach, or can I run some models on the CPU and leverage what I have?”On NeoClouds - whether the cost-led GPU-rental specialists were a growing share of the AI compute market - his answer arrived in the same vein. Yes, growing. Yes, primarily a cost play. Yes, the TCO conversation is reshaping who sells compute to whom.“NeoClouds started to form to just make that simpler stack easily available. It's primarily a cost play. They're much more cost-effective than getting compute at a hyperscaler,” he added. “But I think this is going to evolve. NeoClouds are trying to figure out how do you go up the stack. Hyperscalers are looking at how do I cater to the broader ecosystem. And these hybrid models are coming in and saying, 'How do I manage my costs?' So I think this is just a - it'll be a TCO cost-performance game.”The shift from 'throw more compute at it' to 'extract more from what I have' is the macro-thesis Intel's product line has been waiting for the market to catch up to.The Roadmap Nanduri Held Back - And The Hot Chips Window Ahead Nanduri held back on the Xeon 7 Diamond Rapids confirmation that landed at the same Computex 2026 morning, gave no date on the Crescent Island data-centre GPU's customer ship window, and committed to no specific Hermes pairing with the Arc Pro line - because the Computex round table was scoped to Xeon 6+ on Intel 18A, and the deeper architectural detail across the broader 2027 roadmap is teed up for Hot Chips at Stanford in late summer 2026.His sidestep on the roadmap was direct and gracious.“At this point today, we are primarily focusing on the products that are coming out this year. And we are not speaking about any of our roadmap today because the highlight of today is about Xeon 6 Plus. So as we go forward, we'll share more about our roadmap and our capabilities,” he said.What he did surface, repeatedly, was the orchestration-software layer as the next site of competitive innovation.“That's the orchestration software innovation. I think that's where some of the things that are coming. Whether it's NVIDIA's Dynamo, LLMD, SGLang, I think that's where the innovation is happening. Next year, you'll see a lot more of this. This is where, in fact, software that says, 'I will run your infrastructure most efficiently' - I think that has a lot of opportunity in this market as we look at this constraint-driven problem,” he explained.The forward-look anchor for the next twelve months sits there. Orchestration software that routes prefill and decode and tool calls and vector lookups and recommendation queries to the right compute class at the right cost - that is the layer where the agentic data centre is going to be won and lost in 2027. Intel's positioning in that layer rides on x86 ecosystem breadth, Xeon 6+ telemetry and energy reporting, the hybrid Arc Pro local-inference tier, and the partnerships with SambaNova, NVIDIA DGX hosts and Google Cloud that the broader Intel Computex week put on the slide.But the close of the conversation pulled back from the silicon roadmap into a register the rest of the round table had been waiting for.“We all wanted to be software engineers, and then you're like - but I always loved hardware because you could touch it, you could feel it. It's a great time to be - to see the renaissance of the hardware innovation. So for personally, that's very fulfilling,” he said.Twenty-five-plus years at Intel showed in that line. Hardware as something you touch. Hardware as something you can hold the responsibility for. Hardware as the renaissance Intel was waiting to see again. Then the philosophical close.“It all breaks down to first principles of computing. There is - everything is math. CPUs have been like calculators, scalar, it's sequential. GPUs have been about parallel processing. And in fact, it started with rendering, so you could render full screens, and then it became matrix multiplication, which is AI. And then the next future will be spatial, which is when becomes more important when physical AI comes in,” Nanduri said.Four math paradigms in one breath. Scalar. Parallel. Matrix. Spatial. CPUs handle the first. GPUs handle the second and third. Neuromorphic and quantum, eventually, the fourth and whatever comes after. Each is a compute class with a specific math underneath it. The argument cuts against the GPU-or-nothing narrative the AI investor class has been telling itself since 2023 - because no single class of math has ever been able to do all the math.And then the line that the entire room paused on. The line Nanduri delivered with the calm of a man who has been thinking about it for two decades.“Our brain runs at 20 W of power. It's one of the most efficient computational engines. And here we're putting hundreds of megawatts of power and yet not reaching that level of efficiency. So how do you recreate that is still, I think, the innovation yet to be done. The forefront of biology and computing and how do you mimic that, you know, still has a long way. We as humans haven't figured it out yet, is how I see it,” he said.Twenty watts. The whole human brain. Hundreds of megawatts at hyperscale. Both running their own versions of intelligence. Only one of them runs the lights, the speech, the dream, the joke, the cricket fan's instinctive read of a fourteen-year-old kid's footwork against a quick. The frontier sitting beyond every Xeon-versus-GPU procurement decision in 2026 is the same frontier Nanduri walked into the room with - the gap between what the rack consumes and what the brain achieves, still measured in three orders of magnitude, still unsolved.Twenty watts.Frequently Asked Questions Who is Anil Nanduri and what is his role at Intel? Anil Nanduri is Vice President at Intel, leading the AI Acceleration Office and serving as part of the leadership of Intel's AI Business Unit with responsibility for AI product management, go-to-market and customer-success functions. He has been at Intel for more than two decades, with prior responsibilities spanning client computing, accelerator products and Intel's broader AI strategy. He is a CES 2026 listed speaker and a regular voice on Intel's open-AI and Gaudi ecosystem direction.What is Intel Xeon 6+ and why is it strategic for agentic AI? Intel Xeon 6+ is the next-generation Intel Xeon data-centre processor announced at Computex 2026, built on Intel's own 18A process node, with up to 288 Efficient-cores, a 12-channel DDR5 memory subsystem and 96 lanes of PCIe Gen 5 with CXL support. It is strategic for agentic AI because the workload pattern of agentic systems - iterative coding loops, tool calls, vector retrieval, orchestration, security context, KV-cache management - runs on the CPU rather than the GPU. The CPU-to-GPU ratio in agentic deployments is moving toward parity, which translates directly into Xeon volume.What is the Intel Arc Pro B70 and how does it lower AI cloud costs? The Intel Arc Pro B70 is a workstation-class GPU on Intel's Xe architecture lineage, carrying 32 GB of VRAM per card with a four-card-per-workstation scaling configuration. Used in a hybrid AI stack via Hermes-style or Open-CL-style orchestration, it can run a 120-billion-parameter model locally and handle roughly 80 per cent of an agentic workload on the workstation, sending only premium queries to frontier cloud APIs. Anil Nanduri cited a customer example where a $3,000-per-month cloud-token bill paid off a roughly $5,000 workstation in two months.Why is Intel arguing that the CPU is becoming more relevant to AI, not less? Intel argues the CPU is becoming more relevant because agentic AI shifts the binding constraint in the data centre from raw floating-point throughput to orchestration, concurrency, data movement, memory access, security context and software flexibility - all of which run on the CPU rather than the GPU. As agents become more complex, the CPU-to-GPU ratio in the rack moves toward parity, with the CPU executing the iterative software loop, holding the KV cache, routing prefill and decode to the right silicon, and managing the hybrid local-and-cloud inference economics. The Xeon 6+ on Intel 18A is the silicon Intel positions for that workload pattern.How does Intel's hybrid AI strategy compare to NVIDIA's GPU-first approach? Intel's hybrid AI strategy treats the data centre as a heterogeneous-compute ensemble - CPU for orchestration and classical workloads, GPU for prefill and training, specialised accelerators (Groq, SambaNova) for decode-at-scale, all coordinated by orchestration software like LLMD, SGLang and NVIDIA's Dynamo. NVIDIA's approach, particularly under the Rubin platform with Spectrum-6 networking and BlueField-4 DPUs, is to sell every layer of the rack as an integrated NVIDIA stack. The two strategies converge on the same point - that no single compute class wins the rack alone - but Intel's pitch foregrounds the cost flexibility of the heterogeneous architecture, while NVIDIA's foregrounds the integration economy of the unified stack.end of article
Intel: EXCLUSIVE: Anil Nanduri On Why the CPU Is Coming Back - Inside Intel's Xeon Revival at Computex
Intel's veteran AI lead opened a Computex 2026 sit-down with cricket and Vaibhav Suryavanshi, then spent the next ninety minutes making the most honest CPU-versus-GPU argument the industry has heard this year - that agentic AI is reshaping data-centre economics in a way GPU tonnage alone cannot solve, that the Xeon 6+ on Intel 18A bundle is the silicon expression of that argument, and that the constraints customers are already living with will do more for Intel's positioning than any marketing p...













