Roofline Analysis of LLMs on H200: Performance Modeling and Recomputation Strategies

This article presents a quantitative Roofline analysis of LLM workloads on the NVIDIA H200. We derive compute-bound thresholds, analyze the 1:10 communication bottleneck, and propose strategies for activation recomputation and operator fusion that maximize hardware efficiency.
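
As a rough illustration of what a compute-bound threshold means in the Roofline model, the sketch below computes the ridge point (peak FLOP/s divided by memory bandwidth) and the attainable throughput at a given arithmetic intensity. The hardware numbers are assumptions chosen to be roughly in line with published H200 SXM figures (about 989 TFLOP/s dense BF16 and 4.8 TB/s HBM bandwidth), and the function names are hypothetical, not taken from the analysis itself.

```python
# Minimal Roofline sketch. All constants below are assumed, illustrative values.
PEAK_FLOPS = 989e12   # assumed dense BF16 peak, FLOP/s
MEM_BW = 4.8e12       # assumed HBM bandwidth, bytes/s


def ridge_point(peak_flops: float, mem_bw: float) -> float:
    """Arithmetic intensity (FLOPs/byte) above which a kernel is compute-bound."""
    return peak_flops / mem_bw


def attainable_flops(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * mem_bw)


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """Idealized intensity of an (m x k) @ (k x n) matmul.

    Assumes each operand and the output cross the memory bus exactly once.
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    ridge = ridge_point(PEAK_FLOPS, MEM_BW)
    print(f"ridge point: {ridge:.1f} FLOPs/byte")
    # Batch-1 decode-style matmul vs. a large square matmul.
    for shape in [(1, 4096, 4096), (4096, 4096, 4096)]:
        ai = gemm_intensity(*shape)
        bound = "compute" if ai >= ridge else "memory"
        perf = attainable_flops(ai, PEAK_FLOPS, MEM_BW)
        print(f"{shape}: intensity {ai:.1f}, {bound}-bound, "
              f"attainable {perf / 1e12:.1f} TFLOP/s")
```

Under these assumed numbers, a batch-1 matrix-vector product sits far below the ridge point (memory-bound), while a large square matmul sits well above it (compute-bound).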

GPU & Network Constants

A quick reference for dense FLOPS and unidirectional bandwidth on the A100, H100, H200, and Blackwell GPUs.