Reveals decent new homegrown accelerator and tiny production volumes

Chinese tech giant Alibaba has revealed a new accelerator and accompanying rack-scale server rig without offering much detail about their performance – and also admitted it’s only been able to make chips in trivial quantities.

The new chip is called the Zhenwu M890, and comes from Alibaba’s semiconductor design business T-Head. Neither company has said much about it other than stating it includes 144GB of on-chip memory, possesses “800 GB per second of inter-chip bandwidth” and natively supports precision formats from FP32 down to FP4.

The Chinese giant didn’t offer any info about performance other than to say it delivers “three times the performance of its predecessor, Zhenwu 810E.” Based on the specs of the old and new devices, we think the M890 might give Nvidia’s 2024-vintage H200 a run for its money.

That means the most interesting figure in Alibaba’s announcement is 560,000 – the number of Zhenwu chips Alibaba says T-Head has made to date.

By way of contrast, Nvidia says AWS alone will rack and stack one million of its GPUs this year. AWS’s spending on AI infrastructure is at similar levels to Microsoft, Meta, and Google, so it’s conceivable that Nvidia will make and sell three or four million GPUs to satisfy those four customers alone.

Alibaba’s announcement doesn’t offer any information about production volumes for the M890.

The company did talk up the machines the M890 will run inside – a new beast called the Panjiu AL128 Supernode Server, which Alibaba described as “a rack-scale system that packs 128 AI accelerators into a single unit and delivers petabyte-per second internal bandwidth … designed specifically for the concurrency patterns that agents generate: unpredictable, high-frequency bursts of inference requests that overwhelm conventional compute clusters.”

It seems Alibaba intends Panjiu AL128 racks packed with M890s to handle agentic workloads.

T-Head has also created a new networking chip called the “ICN Switch 1.0,” which we’re told “delivers up to 25.6 Tbps of aggregate bandwidth and enables congestion-free communication across clusters of 64 accelerators.” Those are specs that Broadcom and Nvidia reached years ago.
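For readers who want to check the arithmetic behind the volume comparison and the switch spec, here is a rough back-of-envelope sketch in Python. The input figures are the ones quoted above; the function names are ours, and the even split of switch bandwidth across 64 accelerators is purely an assumption, since Alibaba has not described the ICN Switch 1.0's port layout.

```python
# Back-of-envelope sums using figures quoted in the article.
# Function names are illustrative; the "even split" below is an assumption.

ZHENWU_CHIPS_TO_DATE = 560_000        # all Zhenwu chips T-Head says it has made
AWS_NVIDIA_GPUS_THIS_YEAR = 1_000_000 # Nvidia's figure for AWS alone
SWITCH_AGGREGATE_GBPS = 25_600        # 25.6 Tbps expressed in Gbps
ACCELERATORS_PER_CLUSTER = 64


def hyperscaler_gpu_range(per_customer, low_multiple=3, high_multiple=4):
    """Return the (low, high) GPU estimate for the four big buyers.

    The article hedges at "three or four million", so both ends are kept.
    """
    return per_customer * low_multiple, per_customer * high_multiple


def ratio_to_zhenwu(gpu_count, zhenwu_total=ZHENWU_CHIPS_TO_DATE):
    """How many times larger a GPU count is than T-Head's total output."""
    return gpu_count / zhenwu_total


def per_accelerator_gbps(aggregate_gbps, accelerators):
    """Bandwidth per accelerator IF the aggregate is split evenly (assumed)."""
    return aggregate_gbps / accelerators


if __name__ == "__main__":
    low, high = hyperscaler_gpu_range(AWS_NVIDIA_GPUS_THIS_YEAR)
    print(f"Four-customer estimate: {low:,} to {high:,} GPUs")
    print(f"Multiple of Zhenwu output: {ratio_to_zhenwu(low):.1f}x "
          f"to {ratio_to_zhenwu(high):.1f}x")
    print("Per-accelerator share of switch bandwidth: "
          f"{per_accelerator_gbps(SWITCH_AGGREGATE_GBPS, ACCELERATORS_PER_CLUSTER):.0f} Gbps")
```

On those numbers, one year of Nvidia shipments to four customers would be roughly five to seven times T-Head's entire Zhenwu output to date, a total that covers every Zhenwu model and not just the M890.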