Exclusive: Mindbeam touts dramatic performance improvements in CPU-based AI inference

Two-year-old startup Mindbeam AI Inc. today released an open-source artificial intelligence inference framework designed to make large language models run more efficiently on standard consumer processors, a move the company says could reduce reliance on expensive graphics processing units for some AI workloads.

Litespark-Inference is a software library that enables ternary large language models to run on central processing units from Apple Inc., Intel Corp., Advanced Micro Devices Inc. and Arm Holdings plc with significantly improved performance compared with conventional CPU-based inference. The company published benchmarks showing that the framework delivers throughput improvements ranging from 17- to 96-fold over standard PyTorch implementations while reducing memory requirements by more than 80%.

Mindbeam, whose Litespark LLM pretraining frameworks accelerate training and inference workloads for generative AI applications, focuses on a class of neural networks known as ternary models. These models constrain each weight to one of three values: -1, 0 or +1. That drastically reduces the overhead of the large multiplication operations normally required during inference, although at the cost of some precision.
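To illustrate why ternary weights cut down on multiplication, the sketch below computes a matrix-vector product using only additions and subtractions. It is a minimal illustration of the general idea, not Litespark-Inference code: the function names `ternary_matvec` and `dense_matvec` and the sample values are hypothetical, and a production kernel would presumably rely on packed weights and vectorized CPU instructions rather than Python loops.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries -1, 0, +1) by a vector.

    Hypothetical helper for illustration only: each weight selects
    add, subtract, or skip, so no multiplications are performed.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1: add the input value
            elif w == -1:
                acc -= xi      # -1: subtract the input value
            # 0: the input contributes nothing and is skipped
        out.append(acc)
    return out


def dense_matvec(weights, x):
    """Reference version using ordinary multiply-accumulate."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Assumed sample data: a 2x3 ternary weight matrix and a 3-element input.
W = [[1, 0, -1],
     [-1, 1, 0]]
x = [0.5, 2.0, -1.5]

result = ternary_matvec(W, x)
print(result)
```

Both functions return the same values for ternary inputs; the difference is that the ternary path never multiplies, which is the property the article describes.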
