GB200 NVL72

GB200 NVL72 connects 36 NVIDIA Grace CPUs and 72 Blackwell GPUs in a liquid-cooled, rack-scale design. Its 72-GPU NVLink domain acts as a single massive GPU, delivering 30X faster real-time trillion-parameter LLM inference than the same number of NVIDIA H100 GPUs.



Perfect for a range of workloads

Deploying AI-based workloads on CUDO Compute is easy and cost-effective. Follow our AI-related tutorials.


Deploying rendering-based workloads on CUDO Compute is easy and cost-effective.

From video editing to image generation, virtualization is ideal for your content creation needs.

Purpose-built GB200 clusters, designed and managed by CUDO

Deployed across 16 ISO-certified data centres

From 8 to 1,000+ GPUs in a single deployment

NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet networking

Expert rack-level design, installation, and benchmarking before handoff

24/7 monitoring, management, and engineering support

Compatible with Slurm, Kubernetes, and NVIDIA Base Command (see the sketch after this list)

Cost-effective pricing
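As a hedged illustration of the Slurm compatibility noted above: a typical multi-node job on a GB200 cluster launches one process per GPU and initializes PyTorch's NCCL backend from the environment variables Slurm exports. This is a minimal sketch, not CUDO's managed tooling; the script name and launcher flags are hypothetical placeholders.

```python
import os
import torch
import torch.distributed as dist

# Minimal sketch: one process per GPU, launched under Slurm,
# e.g. `srun --ntasks-per-node=4 python train.py` (flags illustrative).

def init_distributed():
    # Slurm exports these for every task it launches.
    rank = int(os.environ["SLURM_PROCID"])
    world_size = int(os.environ["SLURM_NTASKS"])
    local_rank = int(os.environ["SLURM_LOCALID"])

    # MASTER_ADDR / MASTER_PORT must be exported in the batch script,
    # e.g. from the first host in `scontrol show hostnames`.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return rank, world_size

if __name__ == "__main__":
    rank, world_size = init_distributed()
    print(f"rank {rank}/{world_size} on GPU {torch.cuda.current_device()}")
    dist.destroy_process_group()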

Launch your AI products faster with on-demand GPUs and a global network of data center partners

Bare metal

Complete control over a dedicated physical machine.

Powered by renewable energy

No noisy neighbors

NVIDIA Spectrum-X local networking

300 Gbps external connectivity

NVMe SSD storage

Enterprise

We offer a range of solutions for enterprise customers.

Powerful GPU clusters

Scalable data center colocation

Large quantities of GPUs and hardware

Optimized to your requirements

Expert installation

Scale as your demand grows

Specifications

Browse specifications for the NVIDIA GB200 NVL72

Starting from

Contact us for pricing

Architecture

NVIDIA Blackwell

GPU

72x NVIDIA Blackwell GPUs

GPU memory

13.5 TB total HBM3e, 576 TB/s aggregate bandwidth

FP4 tensor core performance

1,440 PFLOPS

FP8 tensor core performance

720 PFLOPS

NVIDIA NVSwitch

9x L1 NVIDIA NVLink Switches (Fifth-generation)

NVIDIA NVLink bandwidth

130 TB/s aggregate intra-rack bandwidth (1.8 TB/s per GPU, allowing all 72 GPUs to communicate as one)

System power usage

~120 kW rack TDP

CPU

36x NVIDIA Grace CPUs (2,592 Arm Neoverse V2 cores total)

System memory

Up to 17 TB LPDDR5X, up to 18.4 TB/s aggregate bandwidth

Networking

Up to 72x OSFP single-port NVIDIA ConnectX-7 or ConnectX-8 VPI (400 Gb/s to 800 Gb/s NVIDIA InfiniBand/Ethernet); up to 18x dual-port NVIDIA BlueField-3 VPI DPUs

Management network

Host baseboard management controller (BMC) with RJ45 per tray, 2x Top-of-Rack (TOR) out-of-band management switches

Storage

Per compute tray (18 compute trays total): 8x E1.S Gen5 NVMe drive bays for internal storage, 1x M.2 NVMe SSD for OS boot

Software

NVIDIA AI Enterprise (optimized AI software), NVIDIA Mission Control (AI data center operations/orchestration), NVIDIA Base Command, Ubuntu / RHEL

Rack units (RU)

48

Cooling

Requires dedicated Direct-to-Chip (D2C) liquid cooling (CPUs, GPUs, and NVSwitches are liquid-cooled via an in-rack or in-row Coolant Distribution Unit, while networking modules and storage are air-cooled).
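The NVLink row above notes that 1.8 TB/s per GPU lets all 72 GPUs in the rack communicate as one. A common way to sanity-check collective bandwidth on a freshly handed-off cluster is a timed NCCL all-reduce; the sketch below is indicative only, not a calibrated benchmark, and assumes a process group has already been initialized as in the earlier snippet.

```python
import time
import torch
import torch.distributed as dist

def allreduce_bandwidth(size_mb: int = 1024, iters: int = 20) -> float:
    """Time an NCCL all-reduce and return rough bus bandwidth in GB/s.

    Assumes dist.init_process_group("nccl") has already run and each
    rank has selected its own GPU. Buffer size and iteration count are
    illustrative defaults.
    """
    n = size_mb * 1024 * 1024 // 4  # number of float32 elements
    buf = torch.ones(n, dtype=torch.float32, device="cuda")

    for _ in range(5):  # warm-up so timings exclude setup costs
        dist.all_reduce(buf)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(buf)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    # A ring all-reduce moves ~2*(N-1)/N of the buffer per rank.
    world = dist.get_world_size()
    bytes_moved = 2 * (world - 1) / world * buf.numel() * 4 * iters
    return bytes_moved / elapsed / 1e9

# Usage (after init): print(f"~{allreduce_bandwidth():.0f} GB/s bus bandwidth")
```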

Ideal use cases for the NVIDIA GB200 NVL72

Explore use cases for the NVIDIA GB200 NVL72, including supercharging next-generation AI and accelerated computing, energy-efficient infrastructure, and massive-scale training.

Supercharging next-generation AI and accelerated computing

GB200 NVL72 introduces cutting-edge capabilities, including a second-generation Transformer Engine that enables FP4 AI. Coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time inference for trillion-parameter language models.

Energy-efficient infrastructure

Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy consumption. Liquid cooling increases compute density, reduces the amount of floor space used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink domain architectures.

Massive-scale training

GB200 NVL72 includes a faster second-generation Transformer Engine featuring FP8 precision, enabling 4X faster training for large language models at scale.
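As a hedged sketch of what FP8 training looks like in practice, NVIDIA's open-source Transformer Engine library wraps layers so their matmuls run in FP8 with delayed scaling while master weights stay in higher precision. The layer dimensions and recipe settings below are arbitrary placeholders, and the code requires a GPU with FP8 support (Hopper or Blackwell).

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Minimal sketch: one FP8 linear layer via NVIDIA Transformer Engine.
# Dimensions and recipe settings are illustrative placeholders.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(16, 4096, device="cuda")

# Matmuls inside this context run in FP8; weights are kept in
# higher precision and cast on the fly.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()  # backward runs outside the autocast, as usual
```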


Browse alternative GPU solutions for your workloads

Access a wide range of performant NVIDIA and AMD GPUs to accelerate your AI, ML & HPC workloads

Discuss your infrastructure requirements
