NVIDIA GB300 NVL72

72 NVIDIA Blackwell Ultra GPUs and 36 Grace CPUs in a single liquid-cooled rack, operating as one massive GPU and delivering over 1,000 PFLOPS of dense FP4 compute. Up to 10X lower inference latency and up to 5X higher throughput per megawatt than Hopper. Deployed, managed, and supported by CUDO.



Perfect for a range of workloads

Up to 6,000 tokens per second per GPU on DeepSeek R1-671B, 45% faster than GB200 NVL72 and approximately 5X faster than Hopper. 20 TB of HBM3e across the rack keeps the largest models in memory without offloading for low-latency reasoning at scale.
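
To put the per-GPU figure in rack-level terms, here is a minimal sketch that scales the quoted 6,000 tokens/s/GPU across the 72-GPU NVLink domain. It assumes linear scaling within the rack, which is the premise of the single-domain design; real aggregate throughput depends on batch size, context length, and serving configuration.

```python
# Rack-level throughput implied by the per-GPU figure quoted above.
# Assumes linear scaling across the NVLink domain (idealized).
per_gpu_tokens_per_s = 6_000   # DeepSeek R1-671B figure cited above
gpus_per_rack = 72

rack_tokens_per_s = per_gpu_tokens_per_s * gpus_per_rack
print(f"{rack_tokens_per_s:,} tokens/s per rack")  # 432,000 tokens/s
```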

Up to 1.5X faster dense FP4 training per GPU than GB200 NVL72. 130 TB/s NVLink bandwidth across a 72-GPU domain, enabling all GPUs to communicate as a single system, eliminating intra-rack communication bottlenecks. Grace CPUs handle data loading, preprocessing, and host-side compute with 2X the energy efficiency of leading x86 server processors.
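
Why the 130 TB/s domain matters for training can be sketched with the standard ring all-reduce cost model. The per-GPU NVLink bandwidth of 1.8 TB/s used below is an assumption consistent with the aggregate figure (72 x 1.8 TB/s ≈ 130 TB/s); the model size is hypothetical, and the result ignores latency and software overhead.

```python
# Idealized ring all-reduce time for one gradient synchronization across
# the 72-GPU NVLink domain. 1.8 TB/s per-GPU bandwidth is an assumption
# consistent with the 130 TB/s aggregate figure; overheads are ignored.

def ring_allreduce_seconds(params, n_gpus=72, bw_bytes_per_s=1.8e12,
                           bytes_per_param=2):
    payload = params * bytes_per_param                  # BF16 gradients
    return 2 * (n_gpus - 1) / n_gpus * payload / bw_bytes_per_s

# Hypothetical 1-trillion-parameter model:
t = ring_allreduce_seconds(1e12)
print(f"{t:.2f} s per full gradient all-reduce (ideal)")  # ≈ 2.19 s
```

The 2(N-1)/N factor is the classic ring all-reduce communication volume per GPU; at N = 72 it is within 3% of 2, so the sync time is effectively set by payload size over per-GPU link bandwidth.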

Up to 20 TB of HBM3e per rack, plus 17 TB of Grace CPU memory as an extended capacity tier, enough to keep massive KV caches in memory for long-context inference with 128K+ token inputs. GB300 NVL72 delivers up to 1.5X higher throughput than GB200 NVL72 in latency-sensitive long-context scenarios.
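
A back-of-envelope calculation shows why tens of terabytes of memory matter for 128K-token serving. The model configuration below is illustrative (a hypothetical 70B-class model with grouped-query attention), not a GB300 specification:

```python
# Back-of-envelope KV-cache sizing for long-context inference.
# All model parameters here are illustrative assumptions.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2):
    """Total KV cache: K and V tensors per layer, BF16 by default."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class model with grouped-query attention,
# serving a batch of 32 requests at 128K-token context:
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                      seq_len=131_072, batch=32)
print(f"{size / 1e12:.2f} TB")  # ≈ 1.37 TB of KV cache alone
```

Even with grouped-query attention, a modest batch at 128K context consumes over a terabyte of KV cache before model weights are counted, which is why a large HBM pool plus a Grace memory tier avoids offloading to slower storage.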

Managed GB300 NVL72 systems, deployed and operated by CUDO

Deployed across select ISO-certified, liquid-cooled data centres

Scale to hundreds of GPUs across multi-rack deployments

Fully liquid-cooled with direct-to-chip cooling — up to 90% heat captured by liquid

NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet with ConnectX-8 SuperNICs — 800 Gb/s per GPU

Expert installation, liquid cooling commissioning, and benchmarking before handoff

Compatible with Slurm and Kubernetes

Available at cost-effective pricing

Launch your AI products faster with on-demand GPUs and a global network of data center partners

Bare metal

Complete control over a dedicated physical machine.

Powered by renewable energy

No noisy neighbors

NVIDIA Spectrum-X local networking

300 Gbps external connectivity

NVMe SSD storage

Enterprise

We offer a range of solutions for enterprise customers.

Powerful GPU clusters

Scalable data center colocation

Large quantities of GPUs and hardware

Optimize to your requirements

Expert installation

Scale as your demand grows

Specifications

NVIDIA GB300 NVL72 specifications

Starting from

Contact us for pricing

Architecture

NVIDIA Blackwell

GPU

72 NVIDIA Blackwell Ultra GPUs

GPU memory

Up to 20 TB HBM3e | Up to 576 TB/s aggregate bandwidth

FP4 tensor core performance

1,440 PFLOPS (sparse) | 1,080 PFLOPS (dense)

FP8 tensor core performance

720 PFLOPS

NVIDIA NVSwitch

9x L1 NVIDIA NVLink Switches (Fifth-generation)

NVIDIA NVLink bandwidth

130 TB/s

System power usage

~135 kW TDP per rack (up to 155 kW peak depending on workload and EDP behavior)

CPU

36x NVIDIA Grace CPUs (2,592 Arm Neoverse V2 cores total)

System memory

17 TB LPDDR5X, up to 18.4 TB/s aggregate bandwidth

Networking

72x OSFP single-port NVIDIA ConnectX-8 VPI (up to 800 Gb/s NVIDIA InfiniBand/Ethernet); 18x dual-port NVIDIA BlueField-3 VPI DPUs (up to 200 Gb/s NVIDIA InfiniBand/Ethernet)

Management network

Host baseboard management controller (BMC) with RJ45 per tray, 2x Top-of-Rack (TOR) out-of-band management switches

Storage

Per compute tray (18 trays total): 8x E1.S Gen5 NVMe drive bays for internal storage, 1x M.2 NVMe SSD for OS boot

Software

NVIDIA AI Enterprise (optimized AI software), NVIDIA Mission Control (AI data center operations/orchestration), NVIDIA DGX OS / Ubuntu

Rack units (RU)

48

Cooling

Requires dedicated Direct-to-Chip (D2C) liquid cooling (Hybrid architecture: CPUs, GPUs, and NVSwitches are liquid-cooled via a Coolant Distribution Unit, while networking modules and storage are air-cooled).

Where GB300 NVL72 systems deliver the biggest impact

Explore use cases for the NVIDIA GB300 NVL72, including AI reasoning and test-time scaling, large-scale inference, frontier model training, and sovereign and regulated AI.

AI reasoning and test-time scaling at rack scale

The 72-GPU NVLink domain operates as a single massive GPU for reasoning workloads. Blackwell Ultra's enhanced Transformer Engine doubles attention throughput versus GB200, and 20 TB of HBM3e is enough to keep full model states in memory. MLPerf Inference v5.1 results show 45% higher DeepSeek R1 throughput per GPU than the previous generation.

Large-scale inference at lower cost per token

Up to 10X lower inference latency per user and up to 5X higher throughput per megawatt than Hopper. The 72-GPU NVLink domain eliminates the need to shard models across separate nodes, reducing serving complexity and infrastructure cost.

Frontier model training and fine-tuning

Up to 1.5X faster dense FP4 training per GPU than GB200 NVL72. 130 TB/s NVLink bandwidth eliminates intra-rack communication bottlenecks, and up to 20 TB of HBM3e per rack provides the memory capacity for multi-trillion-parameter models.

Sovereign and regulated AI

Deploy GB300 NVL72 racks in ISO-certified data centres globally. Meet data residency and regulatory requirements with dedicated, liquid-cooled infrastructure under your control, managed by CUDO.


Browse alternative GPU solutions for your workloads

Access a wide range of performant NVIDIA and AMD GPUs to accelerate your AI, ML & HPC workloads

Discuss your infrastructure requirements
