1. GPU Specs (A100 - B200)

| GPU | BF16 Tensor Core (Dense) | FP8 Tensor Core (Dense) | GPU Memory | GPU Memory Bandwidth | NVLink (Unidirectional) | Host Interface | Arithmetic Intensity (FLOPs/byte) | $$\frac{BW_{memory}}{BW_{comm}}$$ |
|---|---|---|---|---|---|---|---|---|
| A100 80GB SXM | 312 TFLOPS | - | 80 GB HBM2e | 2,039 GB/s | 300 GB/s | PCIe Gen4 | 153 | 6.8 |
| H100 SXM | 989 TFLOPS | 1,979 TFLOPS | 80 GB HBM3 | 3.35 TB/s | 450 GB/s | PCIe Gen5 | 295 | 7.4 |
| H200 SXM | 989 TFLOPS | 1,979 TFLOPS | 141 GB HBM3e | 4.8 TB/s | 450 GB/s | PCIe Gen5 | 206 | 10.7 |
| GB200 NVL72 | 2.5 PFLOPS | 5 PFLOPS | 196 GB HBM3e | 8 TB/s | 900 GB/s | NVLink-C2C | 312 | 8.9 |
| HGX B200 | 2.25 PFLOPS | 4.5 PFLOPS | 180 GB HBM3e | 7.7 TB/s | 900 GB/s | PCIe Gen6 | 292 | 8.6 |
| HGX B300 | 2.25 PFLOPS | 4.5 PFLOPS | 288 GB HBM3e | 8 TB/s | 900 GB/s | PCIe Gen6 | 281 | 8.9 |
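
The last two columns appear to be derived from the others: arithmetic intensity matches dense BF16 throughput divided by memory bandwidth, and the final column matches memory bandwidth divided by unidirectional NVLink bandwidth. The following is a minimal Python sketch of that calculation, assuming those definitions. The names `GPU_SPECS`, `arithmetic_intensity`, and `bandwidth_ratio` are hypothetical, and the numbers are hand-copied from the table with units normalized to TFLOPS and GB/s.

```python
# Per-GPU figures hand-copied from the table above.
# Tuple layout (an assumption of this sketch):
#   (dense BF16 TFLOPS, memory bandwidth GB/s, unidirectional NVLink GB/s)
GPU_SPECS = {
    "A100 80GB SXM": (312.0, 2039.0, 300.0),
    "H100 SXM": (989.0, 3350.0, 450.0),
    "H200 SXM": (989.0, 4800.0, 450.0),
    "GB200 NVL72": (2500.0, 8000.0, 900.0),
    "HGX B200": (2250.0, 7700.0, 900.0),
    "HGX B300": (2250.0, 8000.0, 900.0),
}


def arithmetic_intensity(bf16_tflops: float, mem_bw_gbs: float) -> float:
    """Dense BF16 FLOPs available per byte of HBM traffic."""
    return (bf16_tflops * 1e12) / (mem_bw_gbs * 1e9)


def bandwidth_ratio(mem_bw_gbs: float, nvlink_gbs: float) -> float:
    """Memory bandwidth divided by unidirectional NVLink bandwidth."""
    return mem_bw_gbs / nvlink_gbs


# Print the two derived columns for every GPU in the table.
for name, (tflops, mem_bw, nvlink) in GPU_SPECS.items():
    ai = arithmetic_intensity(tflops, mem_bw)
    ratio = bandwidth_ratio(mem_bw, nvlink)
    print(f"{name:<14} AI = {ai:6.1f} FLOPs/byte   BW_memory/BW_comm = {ratio:5.2f}")
```

A kernel whose FLOPs-per-byte falls below a GPU's arithmetic intensity is memory-bound on that GPU under this simple model; above it, the kernel is compute-bound.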

Notes:

  • The host interface (PCIe) caps GPU-to-NIC throughput. Have your simulation check whether Port Speed > PCIe BW to flag host-side bottlenecks (e.g., an A100 on NDR, where the 50 GB/s port exceeds the 31.5 GB/s of PCIe Gen4).
  • PCIe x16 bandwidth (unidirectional): Gen4 - 31.5 GB/s; Gen5 - 63 GB/s; Gen6 - 121 GB/s.
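
The Port Speed > PCIe BW check described in the notes could be sketched in Python as follows. This is an illustration only: it assumes one NIC port per GPU behind a dedicated PCIe link with no sharing, takes port speeds from the switch table in section 2, and uses hypothetical names (`PCIE_GBS`, `PORT_SPEED_GBS`, `host_side_check`).

```python
# Unidirectional PCIe bandwidth in GB/s, from the note above.
PCIE_GBS = {"Gen4": 31.5, "Gen5": 63.0, "Gen6": 121.0}

# InfiniBand port speed in GB/s, from the switch table in section 2.
PORT_SPEED_GBS = {"HDR": 25.0, "NDR": 50.0, "XDR": 100.0}


def host_side_check(pcie_gen: str, ib_gen: str) -> dict:
    """Compare NIC port speed against the PCIe link feeding it.

    Assumes a single port per GPU on a dedicated link (a simplification).
    """
    pcie_bw = PCIE_GBS[pcie_gen]
    port_bw = PORT_SPEED_GBS[ib_gen]
    return {
        "host_limited": port_bw > pcie_bw,       # True when PCIe is the bottleneck
        "effective_gbs": min(port_bw, pcie_bw),  # usable GPU-to-NIC throughput
    }


# The example from the note: an A100 (PCIe Gen4) attached to an NDR fabric.
print(host_side_check("Gen4", "NDR"))
# An H100 (PCIe Gen5) on the same fabric is not host-limited.
print(host_side_check("Gen5", "NDR"))
```

Returning the effective throughput alongside the flag lets a simulator use the capped value directly instead of the nominal port speed.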

2. Switch Specs (InfiniBand)

| Switch | Role | Generation | Port Speed | Port Count |
|---|---|---|---|---|
| Quantum QM8700 | Leaf / Spine | A100 (HDR) | 25 GB/s | 40 |
| Quantum CS8500 | Director | A100 (HDR) | 25 GB/s | Up to 800 |
| Quantum-2 QM9700 | Leaf / Spine | H100 (NDR) | 50 GB/s | 64 |
| Quantum-2 CS9500 | Director | H100 (NDR) | 50 GB/s | Up to 2,048 |
| Quantum-X800 Q3400 | Leaf / Spine | B200 (XDR) | 100 GB/s | 144 |
| Quantum-X800 Director | Director | B200 (XDR) | 100 GB/s | Scalable |