Of the world’s most powerful supercomputers, nine of the top 10 are powered by GPUs, but that might not be the case for much longer. As chipmakers like Nvidia prioritize AI FLOPS over the ultra-precise floating point calculations used in scientific computing, US National Labs are turning to new chip architectures to get their FP64 fix.Among the candidates is NextSilicon’s Maverick-2, a dataflow processor designed explicitly with the 64-bit floating point mathematics that dominate the Department of Energy’s most important simulations.

Despite its name, the Department of Energy is concerned with far more than the US’ power grid. It operates some of the largest publicly known supercomputers in the world, which are responsible for everything from simulating the physics of nuclear weapons at the moment of criticality and bioweapons defense to public health and safety.

Since the Titan Supercomputer made its debut in 2012, a growing number of these supercomputers have been powered by GPUs from Nvidia, and more recently AMD.But that’s not the case for Sandia National Laboratory’s new Spectra supercomputer, which was built in collaboration with Penguin Solutions and NextSilicon.Compared to exascale systems like Frontier or El Capitan, Spectra is tiny. The machine counts 64 nodes and 128 of NextSilicon’s “runtime-configurable” accelerators.But scale isn’t the point. Spectra is a test bed for NextSilicon’s Maverick-2. This week, Sandia gave the chips the thumbs up, announcing that the big iron had met all of its system acceptance requirements, opening the door for the chips to be deployed in larger systems in the future.Not another GPUDespite some similarities to Nvidia’s B200, Maverick-2 is a very different beast. Instead of the standard von Neumann compute architecture that underpins most CPUs and GPUs today, NextSilicon’s chips employ a reconfigurable dataflow architecture.The processor’s two compute dies comprise a grid of arithmetic logic units interconnected in a graph. Each unit is configured at runtime to perform a specific operation, whether it be addition, multiplication, or some other logic operation. But the chip’s real trick is overlapping data flow and compute. As soon as data reaches the next unit in the pipeline, it’s computed immediately, no waiting for load-store operations to shuffle data around.