Jog Hiller
March 14th, 2025 02:22
NVIDIA’s latest NCCL 2.24 release introduces new features to enhance multi-GPU and multi-node communication, including a RAS subsystem, NIC Fusion, and FP8 support, to optimize deep learning training.
The NVIDIA Collective Communications Library (NCCL) has released version 2.24, bringing significant advances in the reliability and observability of networking for multi-GPU, multi-node (MGMN) communication. As reported on the NVIDIA Developer Blog, the library is optimized specifically for NVIDIA GPUs and networking, making it an essential component of multi-GPU deep learning training.
New features for NCCL 2.24
This update includes several new features aimed at improving performance and reliability:
Reliability, Availability, and Serviceability (RAS) subsystem
User Buffer (UB) registration for multi-node collectives
Optional receive completions
NIC Fusion
FP8 support
Strict enforcement of NCCL_ALGO and NCCL_PROTO
RAS Subsystem
The RAS subsystem is one of the standout additions in NCCL 2.24. It is designed to help users diagnose application issues such as crashes and hangs, especially in large-scale deployments. This low-overhead infrastructure provides a global view of application execution, allowing the detection of anomalies such as unresponsive nodes and delayed processes. It works by creating a network of threads across the NCCL processes that monitor each other’s health through regular keepalive messages.
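RAS requires no changes to application code: its monitoring threads are created inside the NCCL processes themselves once communicators are initialized. As a rough illustration, any program along the lines of the following single-process, multi-GPU sketch (standard ncclCommInitAll and CUDA calls; error handling trimmed) is already covered by RAS as soon as it brings up its communicators.

```c
// Minimal sketch: a single process driving all local GPUs.
// RAS needs no application changes; its monitoring threads start inside
// the NCCL process when communicators are initialized.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_NCCL(cmd) do {                               \
  ncclResult_t r = (cmd);                                  \
  if (r != ncclSuccess) {                                  \
    fprintf(stderr, "NCCL error: %s\n", ncclGetErrorString(r)); \
    exit(1);                                               \
  }                                                        \
} while (0)

int main(void) {
  int ndev = 0;
  cudaGetDeviceCount(&ndev);

  ncclComm_t* comms = malloc(ndev * sizeof(ncclComm_t));
  // One communicator per local GPU; RAS monitoring begins alongside them.
  CHECK_NCCL(ncclCommInitAll(comms, ndev, NULL));

  // ... launch collectives here; if a rank hangs or a node becomes
  // unresponsive, the RAS keepalives make the anomaly visible ...

  for (int i = 0; i < ndev; i++) CHECK_NCCL(ncclCommDestroy(comms[i]));
  free(comms);
  return 0;
}
```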
Enhancement of user buffer registration
NCCL 2.24 introduces user buffer (UB) registration for multi-node collectives, enabling more efficient data transfers and reduced GPU resource consumption. The library supports UB registration for collective networks with multiple ranks per node as well as standard peer-to-peer networks, which provides significant performance improvements, especially for operations such as AllGather and Broadcast.
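For context, the sketch below shows how user buffers are registered against a communicator using NCCL's existing ncclMemAlloc/ncclCommRegister API. The surrounding setup (the communicator, rank count, and CUDA stream) is assumed to exist elsewhere, and error handling is omitted for brevity.

```c
// Sketch of user buffer (UB) registration with an existing communicator.
// Assumes comm (ncclComm_t), nranks, and stream are set up elsewhere.
#include <cuda_runtime.h>
#include <nccl.h>

void allgather_with_registered_buffers(ncclComm_t comm, int nranks,
                                       cudaStream_t stream, size_t count) {
  void *sendbuf, *recvbuf, *sendhandle, *recvhandle;

  // Allocate buffers through NCCL so they meet UB registration requirements.
  ncclMemAlloc(&sendbuf, count * sizeof(float));
  ncclMemAlloc(&recvbuf, (size_t)nranks * count * sizeof(float));

  // Register the buffers with the communicator; eligible collectives such
  // as AllGather can then use zero-copy paths that reduce GPU resource use.
  ncclCommRegister(comm, sendbuf, count * sizeof(float), &sendhandle);
  ncclCommRegister(comm, recvbuf, (size_t)nranks * count * sizeof(float),
                   &recvhandle);

  ncclAllGather(sendbuf, recvbuf, count, ncclFloat, comm, stream);
  cudaStreamSynchronize(stream);

  // Deregister and free once the buffers are no longer needed.
  ncclCommDeregister(comm, sendhandle);
  ncclCommDeregister(comm, recvhandle);
  ncclMemFree(sendbuf);
  ncclMemFree(recvbuf);
}
```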
NIC Fusion
NCCL has adapted to optimize network communication as systems with many NICs become more common. The new NIC Fusion feature allows multiple NICs to be logically merged into a single entity, ensuring efficient use of network resources. It is particularly beneficial for systems with more than one NIC per GPU, addressing issues such as crashes and inefficient resource allocation.
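As a hedged sketch, NIC Fusion behavior is steered through environment variables set before NCCL initialization. The names NCCL_NET_MERGE_LEVEL and NCCL_NET_FORCE_MERGE below follow the 2.24 release notes, but both the names and the accepted values should be confirmed against the documentation of the installed NCCL version.

```c
// Sketch: configuring NIC Fusion via environment variables before NCCL init.
// NCCL_NET_MERGE_LEVEL / NCCL_NET_FORCE_MERGE are taken from the 2.24
// release notes; treat them as assumptions and verify against your version.
#include <stdlib.h>
#include <nccl.h>

void configure_nic_fusion(void) {
  // Merge NICs that sit close together in the PCIe topology. The value
  // "PORT" is illustrative; the set of accepted levels may differ by release.
  setenv("NCCL_NET_MERGE_LEVEL", "PORT", 1);

  // Alternatively, force a specific fusion layout by naming devices
  // explicitly (device names and syntax here are illustrative only):
  // setenv("NCCL_NET_FORCE_MERGE", "mlx5_0,mlx5_1;mlx5_2,mlx5_3", 1);

  // These must be set before the first communicator is created,
  // i.e. before ncclCommInitRank()/ncclCommInitAll().
}
```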
Additional features and fixes
This update also introduces optional receive completions for the LL and LL128 protocols, reducing overhead and congestion. NCCL 2.24 supports native FP8 reductions on NVIDIA Hopper and newer architectures, expanding low-precision capability. Additionally, stricter enforcement of NCCL_ALGO and NCCL_PROTO is implemented to give users more accurate tuning and error handling.
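The sketch below illustrates both behaviors: pinning NCCL_ALGO and NCCL_PROTO (long-standing environment variables whose values 2.24 now validates strictly instead of silently ignoring) and issuing an FP8 AllReduce. The ncclFloat8e4m3 type name follows the 2.24 additions and should be checked against the installed nccl.h; FP8 reductions require Hopper or newer GPUs.

```c
// Sketch: strict NCCL_ALGO/NCCL_PROTO tuning and an FP8 reduction.
// ncclFloat8e4m3 is assumed to match the 2.24 header; verify locally.
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

void pin_tuning(void) {
  // Must be set before NCCL initialization; with 2.24, invalid or
  // unsupported combinations fail loudly rather than being ignored.
  setenv("NCCL_ALGO", "Ring", 1);
  setenv("NCCL_PROTO", "Simple", 1);
}

void fp8_allreduce(void* buf, size_t count, ncclComm_t comm,
                   cudaStream_t stream) {
  // In-place sum reduction over FP8 (e4m3) data; Hopper or newer required.
  ncclAllReduce(buf, buf, count, ncclFloat8e4m3, ncclSum, comm, stream);
}
```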
The update also includes a variety of bug fixes and minor improvements, such as PAT tuning adjustments and expanded memory allocation functions, which increase the overall robustness and efficiency of the NCCL library.