NVIDIA GB300 NVL72
72 Blackwell Ultra GPUs and 36 Grace CPUs in a single liquid-cooled rack, delivering over 1,000 PFLOPS of dense FP4 compute and operating as one massive GPU. Up to 10X lower inference latency and up to 5X higher throughput per megawatt than Hopper. Deployed, managed, and supported by CUDO.
Infrastructure and technology partners
Perfect for a range of workloads
Up to 6,000 tokens per second per GPU on DeepSeek R1-671B, 45% faster than GB200 NVL72 and approximately 5X faster than Hopper. 20 TB of HBM3e across the rack keeps the largest models in memory without offloading for low-latency reasoning at scale.
Up to 1.5X faster dense FP4 training per GPU than GB200 NVL72. 130 TB/s NVLink bandwidth across a 72-GPU domain, enabling all GPUs to communicate as a single system, eliminating intra-rack communication bottlenecks. Grace CPUs handle data loading, preprocessing, and host-side compute with 2X the energy efficiency of leading x86 server processors.
Up to 20 TB of HBM3e per rack, plus 17 TB of Grace CPU memory as an extended capacity tier, enough to keep massive KV caches in memory for long-context inference with 128K+ token inputs. GB300 NVL72 delivers up to 1.5X higher throughput than GB200 NVL72 in latency-sensitive long-context scenarios.
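To make the memory headroom concrete, here is a rough KV-cache sizing sketch. The model dimensions (61 layers, 8 grouped-query KV heads of dimension 128, BF16 cache) are illustrative assumptions, not the specs of any particular model:

```python
# Rough KV-cache sizing sketch. All model dimensions below are
# illustrative assumptions, not the specs of any particular model.
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, batch, dtype_bytes=2):
    """Bytes for K and V tensors across all layers (factor of 2 = K and V)."""
    return 2 * layers * kv_heads * head_dim * context_len * batch * dtype_bytes

# Hypothetical GQA model: 61 layers, 8 KV heads of dim 128,
# BF16 cache (2 bytes per element), one 128K-token sequence.
per_seq = kv_cache_bytes(layers=61, kv_heads=8, head_dim=128,
                         context_len=131072, batch=1)
print(f"{per_seq / 1e9:.1f} GB of KV cache per 128K-token sequence")

# How many such sequences fit in the rack's 20 TB of HBM3e
# (ignoring weights and activations, so an upper bound only).
print(f"~{20e12 / per_seq:.0f} concurrent sequences, cache alone")
```

Under these assumptions one 128K-token sequence needs roughly 33 GB of cache, which is why rack-scale HBM capacity matters for long-context serving; real headroom is lower once model weights and activations are accounted for.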
Managed GB300 NVL72 systems, deployed and operated by CUDO
Deployed across select ISO-certified, liquid-cooled data centres
Scale to hundreds of GPUs across multi-rack deployments
Fully liquid-cooled with direct-to-chip cooling — up to 90% heat captured by liquid
NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet with ConnectX-8 SuperNICs — 800 Gb/s per GPU
Expert installation, liquid cooling commissioning, and benchmarking before handoff
Compatible with Slurm and Kubernetes
Available at cost-effective pricing
Launch your AI products faster with on-demand GPUs and a global network of data center partners
Bare metal
Powered by renewable energy
No noisy neighbors
Spectrum-X local networking
300 Gbps external connectivity
NVMe SSD storage
Enterprise
Powerful GPU clusters
Scalable data center colocation
Large quantities of GPUs and hardware
Optimize to your requirements
Expert installation
Scale as your demand grows
Specifications
NVIDIA GB300 NVL72 specifications
Starting from
Contact us for pricing
Architecture
NVIDIA Blackwell
GPU
72 NVIDIA Blackwell Ultra GPUs
GPU memory
20 TB HBM3e | Up to 576 TB/s bandwidth
FP4 tensor core performance
1,440 | 1,080 PFLOPS (sparse | dense)
FP8 tensor core performance
720 PFLOPS
NVIDIA NVSwitch
9x L1 NVIDIA NVLink Switches (Fifth-generation)
NVIDIA NVLink bandwidth
130 TB/s
System power usage
~135 kW TDP per rack (up to 155 kW peak depending on workload and EDP behavior)
CPU
36x NVIDIA Grace CPUs (2,592 Arm Neoverse V2 cores total)
System memory
17 TB LPDDR5X, up to 18.4 TB/s aggregate bandwidth
Networking
72x OSFP single-port NVIDIA ConnectX-8 VPI (up to 800 Gb/s NVIDIA InfiniBand/Ethernet)
18x dual-port NVIDIA BlueField-3 VPI DPUs (up to 200 Gb/s NVIDIA InfiniBand/Ethernet)
Management network
Host baseboard management controller (BMC) with RJ45 per tray, 2x Top-of-Rack (TOR) out-of-band management switches
Storage
Per compute tray (18 trays total): 8x E1.S Gen5 NVMe drive bays for internal storage, 1x M.2 NVMe SSD for OS boot
Software
NVIDIA AI Enterprise (optimized AI software), NVIDIA Mission Control (AI data center operations/orchestration), NVIDIA DGX OS / Ubuntu
Rack units (RU)
48
Cooling
Dedicated direct-to-chip (D2C) liquid cooling required. Hybrid architecture: CPUs, GPUs, and NVSwitches are liquid-cooled via a coolant distribution unit (CDU), while networking modules and storage are air-cooled.
Where GB300 NVL72 systems deliver the biggest impact
Explore use cases for the NVIDIA GB300 NVL72, including AI reasoning and test-time scaling, large-scale inference, frontier model training, and sovereign and regulated AI.
AI reasoning and test-time scaling at rack scale
The 72-GPU NVLink domain operates as a single massive GPU for reasoning workloads. Blackwell Ultra's enhanced Transformer Engine doubles attention throughput versus GB200, and 20 TB of HBM3e is enough to keep full model states in memory. MLPerf Inference v5.1 results show 45% higher DeepSeek R1 throughput per GPU than the previous generation.
Large-scale inference at lower cost per token
Up to 10X lower inference latency per user and up to 5X higher throughput per megawatt than Hopper. The 72-GPU NVLink domain eliminates the need to shard models across separate nodes, reducing serving complexity and infrastructure cost.
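A back-of-envelope sketch of what throughput per megawatt means for serving economics. The sustained per-GPU token rate, utilization, and electricity price below are assumptions for illustration; the rack power figure comes from the spec table above:

```python
# Back-of-envelope energy cost per token. The sustained serving rate
# and electricity price are assumptions, not measured figures.
tokens_per_sec_per_gpu = 1000   # assumed sustained rate (well below the 6,000 peak)
gpus = 72
rack_power_kw = 135             # ~TDP per rack, from the spec table

rack_tokens_per_sec = tokens_per_sec_per_gpu * gpus          # 72,000 tok/s
tokens_per_kwh = rack_tokens_per_sec * 3600 / rack_power_kw  # tokens per kWh

price_per_kwh = 0.10            # assumed electricity price in $/kWh
energy_cost_per_m_tokens = 1e6 / tokens_per_kwh * price_per_kwh
print(f"{tokens_per_kwh:,.0f} tokens/kWh, "
      f"~${energy_cost_per_m_tokens:.3f} energy cost per 1M tokens")
```

Under these assumptions the rack produces about 1.9M tokens per kWh, so a 5X throughput-per-megawatt gain over Hopper translates directly into a 5X lower energy cost per token at the same electricity price.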
Frontier model training and fine-tuning
Up to 1.5X faster dense FP4 training per GPU than GB200 NVL72. 130 TB/s NVLink bandwidth eliminates intra-rack communication bottlenecks, and up to 20 TB of HBM3e per rack provides the memory capacity for multi-trillion-parameter models.
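To put the rack's training throughput in context, a common rule of thumb estimates training compute as roughly 6 FLOPs per parameter per token. The parameter count, token budget, and model FLOPs utilization (MFU) below are illustrative assumptions:

```python
# Rough training-time sketch using the common estimate
#   compute ≈ 6 * params * tokens.
# Parameter count, token budget, and MFU are illustrative assumptions.
params = 1e12          # hypothetical 1T-parameter dense model
tokens = 10e12         # hypothetical 10T-token training budget
rack_flops = 1.08e18   # ~1,080 PFLOPS dense FP4 per rack, from the spec table
mfu = 0.4              # assumed model FLOPs utilization

seconds = 6 * params * tokens / (rack_flops * mfu)
days = seconds / 86400
print(f"~{days:,.0f} days on a single rack at {mfu:.0%} MFU")
```

Under these assumptions a single rack would need on the order of 1,600 days, which is why frontier training runs span many racks; the 1.5X per-GPU FP4 speedup shortens that wall-clock time proportionally at a fixed rack count.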
Sovereign and regulated AI
Deploy GB300 NVL72 racks in ISO-certified data centres globally. Meet data residency and regulatory requirements with dedicated, liquid-cooled infrastructure under your control, managed by CUDO.
Browse alternative GPU solutions for your workloads
Access a wide range of performant NVIDIA and AMD GPUs to accelerate your AI, ML & HPC workloads
NVIDIA H100 PCIe
Price on request
Scale with high performance H100 GPUs on our reserved cloud.