This article is part of our coverage of the latest in AI research.
Researchers at Nvidia have developed a new approach for training large language models (LLMs) in 4-bit format while preserving their stability and accuracy. The new technique, called NVFP4, makes it possible to train quantized models that match the performance of 8-bit models at half the memory and a fraction of the compute costs.
The success of NVFP4 shows a path toward cutting the costs of AI by running leaner models that match the performance of larger ones. It could also lower the cost of training LLMs to the point where training custom models becomes more accessible.
The challenge of quantization
At their core, neural networks are massive mathematical functions that process information by performing calculations on large matrices of numbers, or tensors. The precision of these numbers (the number of bits used to store each one) directly affects the model’s performance and resource requirements.
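To make the trade-off concrete, here is a minimal Python sketch of how bit width affects both storage and rounding error. It is an illustration only: it uses plain uniform rounding over a fixed range, not NVFP4 or any Nvidia method, and the function names, the 7-billion-parameter count, and the sample values are hypothetical choices made for the example.

```python
def weight_memory_bytes(num_params: int, bits: int) -> float:
    """Bytes needed to store num_params values at the given bit width."""
    return num_params * bits / 8


def quantize_uniform(values, bits):
    """Round each value to the nearest of 2**bits evenly spaced levels.

    Assumption: simple uniform quantization over [min, max] of the input,
    used here only to illustrate precision loss. This is not NVFP4.
    """
    lo, hi = min(values), max(values)
    intervals = 2 ** bits - 1
    if hi == lo or intervals == 0:
        return list(values)
    step = (hi - lo) / intervals
    return [lo + round((v - lo) / step) * step for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between original and quantized values."""
    quantized = quantize_uniform(values, bits)
    return max(abs(a - b) for a, b in zip(values, quantized))


# Hypothetical stand-in for a slice of model weights in [-0.5, 0.5].
sample_weights = [i / 1000 - 0.5 for i in range(1001)]

if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical model size
    for bits in (16, 8, 4):
        gib = weight_memory_bytes(num_params, bits) / 2 ** 30
        err = max_abs_error(sample_weights, bits)
        print(f"{bits:>2}-bit: {gib:6.2f} GiB for weights, max rounding error {err:.5f}")
```

Halving the bit width halves the storage for the same number of parameters, but it also leaves fewer representable values, so each number is stored less exactly. That loss of exactness is the cost that quantization techniques try to contain.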






