Nvidia researchers have unveiled NVFP4, a new 4-bit floating-point training format that could significantly cut the cost and time of building large language models. The team successfully trained a 12-billion-parameter model on 10 trillion tokens entirely at 4-bit precision using Nvidia Blackwell GPUs - a milestone for low-precision AI training at scale.
2-3x Faster Throughput with Half the Memory of FP8
The headline numbers are hard to ignore. Compared to FP8 - currently the standard for efficient LLM training - NVFP4 delivers two to three times the math throughput while cutting memory usage roughly in half.
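To make block-scaled 4-bit floating point concrete, here is a minimal, unofficial sketch assuming an E2M1 value grid (1 sign, 2 exponent, 1 mantissa bit) with one scale per block - the general shape of FP4 formats. Nvidia's actual NVFP4 layout, block size, and scale encoding are defined in its own published materials; `quantize_fp4_block` is an illustrative helper, not an Nvidia API.

```python
import numpy as np

# Positive magnitudes representable in a 4-bit E2M1 float.
# (Assumed grid for illustration; NVFP4 specifics may differ.)
E2M1_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x, block=16):
    """Round-to-nearest FP4 quantization with one scale per block.

    Each block's max magnitude is mapped onto the top FP4 level (6.0),
    mimicking how block-scaled low-precision formats preserve range.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        scale = np.max(np.abs(chunk)) / E2M1_LEVELS[-1]
        if scale == 0:
            out[start:start + block] = 0.0
            continue
        scaled = chunk / scale
        # Snap each magnitude to the nearest representable FP4 level.
        idx = np.argmin(np.abs(np.abs(scaled)[:, None] - E2M1_LEVELS), axis=1)
        out[start:start + block] = np.sign(scaled) * E2M1_LEVELS[idx] * scale
    return out
```

Because each element stores only 4 bits plus a shared per-block scale, a tensor in this layout occupies roughly half the memory of the same tensor in FP8.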
Accuracy held up well: NVFP4 scored 62.58% on MMLU-Pro versus 62.62% for FP8, a gap that is negligible in practice. The work arrives alongside Nvidia's Nemotron-3 Super, a 120-billion-parameter open model with a 1M-token context window, underscoring the company's broader push toward large-scale, efficient AI systems.
Stability Techniques Behind NVFP4's 10-Trillion-Token Training Run
Training at 4-bit precision is notoriously tricky. Nvidia addressed this with a stack of stabilization techniques: random Hadamard transforms, stochastic rounding, and two-dimensional scaling - all working together to prevent numerical blowups during forward and backward passes. The result is one of the largest publicly documented training runs at this precision level. If broadly adopted, NVFP4 could meaningfully reduce compute costs, energy use, and infrastructure spend for future frontier models. That aligns with growing sector investment, including Brookfield's $10 billion AI infrastructure fund launched with Nvidia's backing, which signals how seriously the industry is betting on AI infrastructure efficiency.
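To give a feel for two of these ingredients, here is a minimal, illustrative sketch (not Nvidia's implementation) in NumPy. Stochastic rounding keeps quantization unbiased in expectation, which matters when accumulating many small gradient updates; a Hadamard rotation spreads outlier values across a block before quantizing, so no single element exhausts the narrow 4-bit range.

```python
import numpy as np

def stochastic_round(x, step=0.5, rng=None):
    """Round x to multiples of `step`, rounding up with probability
    equal to the fractional distance, so E[result] == x."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(x, dtype=np.float64) / step
    floor = np.floor(scaled)
    frac = scaled - floor
    up = rng.random(size=scaled.shape) < frac
    return (floor + up) * step

def hadamard(n):
    """Orthonormal n x n Hadamard matrix via the Sylvester construction
    (n must be a power of 2). A *random* Hadamard transform additionally
    applies random +/-1 sign flips before this rotation."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# A single large outlier dominates this block...
x = np.zeros(16)
x[0] = 8.0
# ...but after the rotation its energy is spread evenly,
# so quantization error is shared across all 16 elements.
spread = hadamard(16) @ x
```

Running this, `np.max(np.abs(spread))` is far smaller than the original outlier of 8.0, and because the matrix is orthonormal, the transform is exactly invertible after dequantization.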
The economic implications extend beyond hardware. Cheaper training unlocks a wider pool of organizations that can build competitive models - a shift that could accelerate the pace of AI development across the board. It also reinforces labor-market trends: roles most exposed to language models have grown 93% since ChatGPT's launch, and more accessible AI training lowers barriers for teams building with these technologies.
Eseandre Mordi