NVIDIA GB300 NVL72
72 Blackwell Ultra GPUs and 36 Grace CPUs in a single liquid-cooled rack, delivering over 1,000 PFLOPS of dense FP4 compute and operating as one massive GPU. Up to 10X lower inference latency and up to 5X higher throughput per megawatt than Hopper. Deployed, managed, and supported by CUDO.
Infrastructure and technology partners
Perfect for a range of workloads
Up to 6,000 tokens per second per GPU on DeepSeek R1-671B, 45% faster than GB200 NVL72 and approximately 5X faster than Hopper. 20 TB of HBM3e across the rack keeps the largest models in memory without offloading for low-latency reasoning at scale.
Up to 1.5X faster dense FP4 training per GPU than GB200 NVL72. 130 TB/s NVLink bandwidth across a 72-GPU domain, enabling all GPUs to communicate as a single system, eliminating intra-rack communication bottlenecks. Grace CPUs handle data loading, preprocessing, and host-side compute with 2X the energy efficiency of leading x86 server processors.
Up to 20 TB of HBM3e per rack, plus 17 TB of Grace CPU memory as an extended capacity tier, enough to keep massive KV caches in memory for long-context inference with 128K+ token inputs. GB300 NVL72 delivers up to 1.5X higher throughput than GB200 NVL72 in latency-sensitive long-context scenarios.
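To make the memory headroom concrete, here is a rough KV-cache sizing sketch. The model dimensions (61 layers, 8 grouped-query KV heads of dimension 128, BF16 cache) are illustrative assumptions, not the specs of any particular model:

```python
# Rough KV-cache sizing sketch. All model dimensions below are
# illustrative assumptions, not the specs of any particular model.
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, batch, dtype_bytes=2):
    """Bytes for K and V tensors across all layers (factor of 2 = K and V)."""
    return 2 * layers * kv_heads * head_dim * context_len * batch * dtype_bytes

# Hypothetical GQA model: 61 layers, 8 KV heads of dim 128,
# BF16 cache (2 bytes per element), one 128K-token sequence.
per_seq = kv_cache_bytes(layers=61, kv_heads=8, head_dim=128,
                         context_len=131072, batch=1)
print(f"{per_seq / 1e9:.1f} GB of KV cache per 128K-token sequence")

# How many such sequences fit in the rack's 20 TB of HBM3e
# (ignoring weights and activations, so an upper bound only).
print(f"~{20e12 / per_seq:.0f} concurrent sequences, cache alone")
```

Under these assumptions one 128K-token sequence needs roughly 33 GB of cache, which is why rack-scale HBM capacity matters for long-context serving; real headroom is lower once model weights and activations are accounted for.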
Managed GB300 NVL72 systems, deployed and operated by CUDO
Deployed across select ISO-certified, liquid-cooled data centres
Scale to hundreds of GPUs across multi-rack deployments
Fully liquid-cooled with direct-to-chip cooling — up to 90% heat captured by liquid
NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet with ConnectX-8 SuperNICs — 800 Gb/s per GPU
Expert installation, liquid cooling commissioning, and benchmarking before handoff
Compatible with Slurm and Kubernetes
Available at cost-effective pricing
Launch your AI products faster with on-demand GPUs and a global network of data center partners
Bare metal
Powered by renewable energy
No noisy neighbors
Spectrum-X local networking
300 Gbps external connectivity
NVMe SSD storage
Enterprise
Powerful GPU clusters
Scalable data center colocation
Large quantities of GPUs and hardware
Optimize to your requirements
Expert installation
Scale as your demand grows
Specifications
NVIDIA GB300 NVL72 specifications
Starting from
Contact us for pricing
Architecture
NVIDIA Blackwell
GPU
72 NVIDIA Blackwell Ultra GPUs
GPU memory
20 TB HBM3e | Up to 576 TB/s bandwidth
FP4 tensor core performance
1,440 | 1,080 PFLOPS (sparse | dense)
FP8 tensor core performance
720 PFLOPS
NVIDIA NVSwitch
9x L1 NVIDIA NVLink Switches (Fifth-generation)
NVIDIA NVLink bandwidth
130 TB/s
System power usage
~135 kW TDP per rack (up to 155 kW peak depending on workload and EDP behavior)
CPU
36x NVIDIA Grace CPUs (2,592 Arm Neoverse V2 cores total)
System memory
17 TB LPDDR5X, up to 18.4 TB/s aggregate bandwidth
Networking
72x OSFP single-port NVIDIA ConnectX-8 VPI (up to 800 Gb/s NVIDIA InfiniBand/Ethernet)
18x dual-port NVIDIA BlueField-3 VPI DPUs (up to 200 Gb/s NVIDIA InfiniBand/Ethernet)
Management network
Host baseboard management controller (BMC) with RJ45 per tray, 2x Top-of-Rack (TOR) out-of-band management switches
Storage
Per compute tray (18 trays total): 8x E1.S Gen5 NVMe drive bays for internal storage, 1x M.2 NVMe SSD for OS boot
Software
NVIDIA AI Enterprise (optimized AI software), NVIDIA Mission Control (AI data center operations/orchestration), NVIDIA DGX OS / Ubuntu
Rack units (RU)
48
Cooling
Dedicated direct-to-chip (D2C) liquid cooling required. Hybrid architecture: CPUs, GPUs, and NVSwitches are liquid-cooled via a coolant distribution unit (CDU), while networking modules and storage are air-cooled.
Where GB300 NVL72 systems deliver the biggest impact
Explore use cases for the NVIDIA GB300 NVL72, including AI reasoning and test-time scaling, large-scale inference, frontier model training, and sovereign and regulated AI.
AI reasoning and test-time scaling at rack scale
The 72-GPU NVLink domain operates as a single massive GPU for reasoning workloads. Blackwell Ultra's enhanced Transformer Engine doubles attention throughput versus GB200, and 20 TB of HBM3e is enough to keep full model states in memory. MLPerf Inference v5.1 results show 45% higher DeepSeek R1 throughput per GPU than the previous generation.
Large-scale inference at lower cost per token
Up to 10X lower inference latency per user and up to 5X higher throughput per megawatt than Hopper. The 72-GPU NVLink domain eliminates the need to shard models across separate nodes, reducing serving complexity and infrastructure cost.
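A back-of-envelope sketch of what throughput per megawatt means for serving economics. The sustained per-GPU token rate, utilization, and electricity price below are assumptions for illustration; the rack power figure comes from the spec table above:

```python
# Back-of-envelope energy cost per token. The sustained serving rate
# and electricity price are assumptions, not measured figures.
tokens_per_sec_per_gpu = 1000   # assumed sustained rate (well below the 6,000 peak)
gpus = 72
rack_power_kw = 135             # ~TDP per rack, from the spec table

rack_tokens_per_sec = tokens_per_sec_per_gpu * gpus          # 72,000 tok/s
tokens_per_kwh = rack_tokens_per_sec * 3600 / rack_power_kw  # tokens per kWh

price_per_kwh = 0.10            # assumed electricity price in $/kWh
energy_cost_per_m_tokens = 1e6 / tokens_per_kwh * price_per_kwh
print(f"{tokens_per_kwh:,.0f} tokens/kWh, "
      f"~${energy_cost_per_m_tokens:.3f} energy cost per 1M tokens")
```

Under these assumptions the rack produces about 1.9M tokens per kWh, so a 5X throughput-per-megawatt gain over Hopper translates directly into a 5X lower energy cost per token at the same electricity price.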
Frontier model training and fine-tuning
Up to 1.5X faster dense FP4 training per GPU than GB200 NVL72. 130 TB/s NVLink bandwidth eliminates intra-rack communication bottlenecks, and up to 20 TB of HBM3e per rack provides the memory capacity for multi-trillion-parameter models.
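To put the rack's training throughput in context, a common rule of thumb estimates training compute as roughly 6 FLOPs per parameter per token. The parameter count, token budget, and model FLOPs utilization (MFU) below are illustrative assumptions:

```python
# Rough training-time sketch using the common estimate
#   compute ≈ 6 * params * tokens.
# Parameter count, token budget, and MFU are illustrative assumptions.
params = 1e12          # hypothetical 1T-parameter dense model
tokens = 10e12         # hypothetical 10T-token training budget
rack_flops = 1.08e18   # ~1,080 PFLOPS dense FP4 per rack, from the spec table
mfu = 0.4              # assumed model FLOPs utilization

seconds = 6 * params * tokens / (rack_flops * mfu)
days = seconds / 86400
print(f"~{days:,.0f} days on a single rack at {mfu:.0%} MFU")
```

Under these assumptions a single rack would need on the order of 1,600 days, which is why frontier training runs span many racks; the 1.5X per-GPU FP4 speedup shortens that wall-clock time proportionally at a fixed rack count.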
Sovereign and regulated AI
Deploy GB300 NVL72 racks in ISO-certified data centres globally. Meet data residency and regulatory requirements with dedicated, liquid-cooled infrastructure under your control, managed by CUDO.
Browse alternative GPU solutions for your workloads
Access a wide range of performant NVIDIA and AMD GPUs to accelerate your AI, ML & HPC workloads
NVIDIA H100 PCIe
Price on request
Scale with high performance H100 GPUs on our reserved cloud.