Qualcomm unveils HBC near-memory AI architecture, claims it has broken the memory wall.

(Image credit: Qualcomm)

The so-called memory wall is a major performance limiter for many AI workloads, and high-bandwidth memory (HBM) is not always a panacea, since compute capability is growing faster than memory bandwidth. On Wednesday, Qualcomm introduced a near-memory compute architecture called high-bandwidth compute (HBC), which is designed to break the memory wall and let the performance of certain AI workloads scale linearly.

Qualcomm's approach to near-memory compute is fairly straightforward: the company disaggregates the AI accelerator from the system-on-chip (SoC) and puts it under the LPDDR DRAM stack. The HBC accelerator connects to the LPDDR stack using through-silicon vias (TSVs) to provide maximum bandwidth and capacity without using expensive HBM and advanced packaging. Qualcomm does not disclose the actual bandwidth HBC provides, though the company claims that it offers 6X higher bandwidth-per-watt than HBM and over 200X the capacity of on-chip SRAM.
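To make the memory-wall argument concrete, here is a minimal roofline-style sketch. It is not based on any Qualcomm data: the function name, the peak compute figure, the bandwidth values, and the arithmetic intensity are all hypothetical numbers chosen for illustration. It only shows why a memory-bound workload speeds up in proportion to bandwidth until it hits the compute ceiling.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tbps: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by either the compute peak
    or by memory bandwidth (TB/s) times arithmetic intensity (FLOPs/byte)."""
    return min(peak_tflops, bandwidth_tbps * flops_per_byte)


# Hypothetical accelerator and workload; illustrative values only,
# not figures disclosed by Qualcomm or any other vendor.
PEAK_TFLOPS = 1000.0   # assumed compute ceiling
INTENSITY = 2.0        # assumed FLOPs per byte for a memory-bound workload

for bandwidth in (1.0, 2.0, 4.0, 8.0):  # assumed memory bandwidth in TB/s
    tflops = attainable_tflops(PEAK_TFLOPS, bandwidth, INTENSITY)
    print(f"{bandwidth:4.1f} TB/s -> {tflops:6.1f} TFLOPS attainable")
```

Under these assumptions, every doubling of bandwidth doubles attainable throughput, because the workload never gets close to the compute ceiling. A workload with much higher arithmetic intensity would hit that ceiling instead and gain nothing from extra bandwidth, which is presumably why the linear-scaling claim is limited to "certain" AI workloads.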

(Image credit: Qualcomm)

"We have separated the AI accelerator from the XPU and placed it directly beneath a DRAM stack," said Tony Pialis, Executive Vice President and General Manager of Data Center Business at Qualcomm. "This is very important because it gives us the performance advantages of SRAM with the density and capacity of stacked memory. In effect, the congestion associated with HBM is gone. The value to the industry is lower power consumption, less heat, and the elimination of the costly silicon interposer used by HBM solutions. We can also deploy multiple HBC stacks within a single compute device using standard packaging, which delivers a significant performance-per-cost advantage."

Putting DRAM on logic or next to logic is nothing new. All DRAM makers have experimented with near-memory compute architectures but have failed to make them popular. More recently, GUC, a fabless ASIC design service company, proposed its DRAM-on-Logic (DoL) technology, which places one to four DRAM layers on top of logic to deliver around 5 TB/s of memory bandwidth and higher performance than some HBM3E memory subsystems without using expensive advanced packaging or HBM3E stacks.

Since Qualcomm does not disclose actual performance numbers, it is hard to compare HBC to GUC's offering. However, the biggest caveat about HBC is that Qualcomm does not say what the HBC accelerator actually does. In theory, it could be anything: a transformer-specific near-memory engine, a more general array of tensor cores, or some kind of preprocessing logic for AI inference or training.