Ronchaiwan
March 20, 2025 03:29
Nvidia introduces Blackwell Ultra, a platform designed for the age of AI inference, improving performance across training, post-training, and test-time scaling.
Nvidia has announced the launch of Blackwell Ultra, a new accelerated computing platform tailored to the evolving needs of AI inference. According to NVIDIA, the platform is designed to enhance the capabilities of AI systems by optimizing training, post-training, and test-time scaling.
Advances in AI scaling
Over the past five years, compute requirements for AI pretraining have skyrocketed 50 million times, driving significant advances. The focus is now shifting to improving models' reasoning capabilities. This includes post-training scaling, which uses domain-specific synthetic data to improve an AI model's conversational skills and its understanding of nuanced contexts.
A new scaling method has emerged, known as “test-time scaling” or “long thinking.” This approach dynamically increases computational resources during AI inference, allowing for deeper reasoning. Unlike traditional models that generate a response in a single pass, these advanced models can deliberate and refine their responses in real time, moving closer to autonomous intelligence.
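One common form of test-time scaling is repeated sampling: spending extra inference compute on multiple candidate answers and keeping the best one. The sketch below illustrates the idea in miniature; `generate` and `score` are hypothetical stand-ins, not any NVIDIA API.

```python
import random

# Toy illustration of test-time scaling ("long thinking"): instead of
# returning the first answer, spend extra inference compute generating
# several candidates and keep the best-scored one.
# `generate` and `score` are hypothetical placeholders for a real model
# and a real answer verifier.

def generate(prompt: str, seed: int) -> str:
    random.seed(seed)                      # deterministic toy "model"
    return f"answer-{random.randint(0, 9)}"

def score(answer: str) -> int:
    return int(answer.split("-")[1])       # pretend higher is better

def answer_with_long_thinking(prompt: str, n_samples: int) -> str:
    candidates = [generate(prompt, seed=i) for i in range(n_samples)]
    return max(candidates, key=score)      # more samples -> more compute

print(answer_with_long_thinking("2+2?", n_samples=1))
print(answer_with_long_thinking("2+2?", n_samples=16))
```

With 16 samples the selected answer can only match or beat the single-sample one, which is the core trade: better responses in exchange for more inference compute.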
Blackwell Ultra Platform
At the heart of the Blackwell Ultra platform is NVIDIA’s GB300 NVL72 system, a liquid-cooled rack-scale solution that connects 36 Nvidia Grace CPUs and 72 Blackwell Ultra GPUs. This setup forms a single large GPU domain with a total NVLink bandwidth of 130 TB/s, significantly improving AI inference performance.
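A quick sanity check on the 130 TB/s figure, assuming (not stated in the article) that each Blackwell GPU exposes 1.8 TB/s of fifth-generation NVLink bandwidth:

```python
# Back-of-the-envelope check: 72 GPUs in one NVLink domain, assuming
# 1.8 TB/s of NVLink bandwidth per GPU (an assumption, not a figure
# from this article), roughly matches the quoted 130 TB/s aggregate.

GPUS = 72
NVLINK_TB_S_PER_GPU = 1.8

aggregate = GPUS * NVLINK_TB_S_PER_GPU
print(f"Aggregate NVLink bandwidth: ~{aggregate:.0f} TB/s")  # ~130 TB/s
```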
With up to 288 GB of HBM3e memory per GPU, Blackwell Ultra supports larger AI models and more complex tasks, delivering improved performance and reduced latency. Its Tensor Cores provide 1.5 times the AI compute FLOPS of the previous generation, optimizing memory usage and enabling breakthroughs in AI research and real-time analytics.
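To put 288 GB per GPU in perspective, here is a rough, illustrative calculation (the precisions and the decision to ignore activations, KV cache, and overhead are simplifying assumptions) of how many model parameters fit in that memory:

```python
# Back-of-the-envelope: model parameters that fit in 288 GB of HBM3e
# at common weight precisions. Ignores activations, KV cache, and
# framework overhead, so real capacity is lower.

HBM_BYTES = 288e9

bytes_per_param = {"FP16": 2, "FP8": 1, "FP4": 0.5}

for precision, nbytes in bytes_per_param.items():
    params_billions = HBM_BYTES / nbytes / 1e9
    print(f"{precision}: ~{params_billions:.0f}B parameters per GPU")
```

Even at FP16 this is on the order of a 144-billion-parameter model per GPU, which is why the larger memory directly translates into support for bigger models.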
Enhanced inference and networking
Nvidia’s Blackwell Ultra also features PCIe Gen6 connectivity with the Nvidia ConnectX-8 SuperNIC, raising network bandwidth to 800 Gb/s per GPU. This added bandwidth boosts performance at scale when paired with Nvidia Dynamo, an open-source library that scales out AI services and efficiently manages workloads across GPU nodes.
Dynamo’s disaggregated serving separates the context (prefill) and generation (decode) phases of large language model (LLM) inference, optimizing performance, reducing costs, and improving scalability. With 800 Gb/s of data throughput per GPU, the GB300 NVL72 integrates seamlessly with NVIDIA’s Quantum-X800 and Spectrum-X platforms to meet the demands of modern AI factories.
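The idea behind disaggregated serving can be sketched as a scheduler that routes requests through separate prefill and decode pools so each phase can scale independently. The class and field names below are illustrative only and do not reflect Dynamo's actual API:

```python
from collections import deque
from dataclasses import dataclass, field

# Toy sketch of disaggregated LLM serving: prefill (processing the prompt
# context) and decode (generating tokens) run on separate GPU pools.
# Names here are hypothetical, not NVIDIA Dynamo's real interfaces.

@dataclass
class Request:
    rid: int
    prompt_tokens: int
    phase: str = "prefill"

@dataclass
class Scheduler:
    prefill_pool: deque = field(default_factory=deque)
    decode_pool: deque = field(default_factory=deque)

    def submit(self, req: Request) -> None:
        self.prefill_pool.append(req)       # new requests start in prefill

    def step(self) -> None:
        if self.prefill_pool:               # prefill done -> hand off
            req = self.prefill_pool.popleft()
            req.phase = "decode"
            self.decode_pool.append(req)    # decode pool generates tokens

sched = Scheduler()
sched.submit(Request(rid=1, prompt_tokens=4096))
sched.step()
print([r.phase for r in sched.decode_pool])
```

Because compute-heavy prefill and memory-bandwidth-heavy decode no longer contend for the same GPUs, each pool can be sized to its own bottleneck, which is the source of the cost and scalability gains described above.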
Impact on AI factories
The introduction of Blackwell Ultra is expected to significantly increase AI factory output. The NVIDIA GB300 NVL72 system promises a 10x increase in throughput per user and a 5x improvement in throughput per megawatt.
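Taken at face value, the two headline multipliers compound; the arithmetic below is a naive reading of the stated figures, not an NVIDIA benchmark:

```python
# Naive arithmetic on the article's stated figures: a 10x gain in
# per-user throughput and a 5x gain in throughput per megawatt,
# multiplied together for a factory holding power constant.
# This compounding is an illustrative assumption, not a measured result.

per_user_gain = 10
per_megawatt_gain = 5

combined = per_user_gain * per_megawatt_gain
print(f"Combined output multiplier: {combined}x")  # 50x
```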
This advance in AI inference will enable real-time insights, enhance predictive analytics, and improve AI agents across a variety of industries, including finance, healthcare, and e-commerce. Organizations will be able to handle larger models and workloads without compromising speed, making advanced AI capabilities more practical and accessible.
Nvidia Blackwell Ultra products are expected to be available from partners in the second half of 2025, with support from leading cloud service providers and server manufacturers.
Image source: ShutterStock