Exclusive: Mindbeam touts dramatic performance improvements in CPU-based AI inference

Two-year-old startup Mindbeam AI Inc. today released an open-source artificial intelligence inference framework designed to make large language models run more efficiently on standard consumer processors, a move the company says could reduce reliance on expensive graphics processing units for some AI workloads.

Litespark-Inference is a software library that enables ternary large language models to run on central processing units from Apple Inc., Intel Corp., Advanced Micro Devices Inc. and Arm Holdings plc with significantly improved performance compared with conventional CPU-based inference. The company published benchmarks showing that the framework delivers throughput improvements ranging from 17- to 96-fold over standard PyTorch implementations while reducing memory requirements by more than 80%.

Mindbeam, whose Litespark LLM pretraining frameworks accelerate training and inference workloads for generative AI applications, focuses on a class of neural networks known as ternary models. Those constrain weights to three values: -1, 0 and +1, thereby drastically reducing the overhead of large multiplication operations normally required during inference, although at the loss of some precision.