Roofline Analysis of LLMs on H200: Performance Modeling and Recomputation Strategies
A quantitative Roofline analysis of LLMs on NVIDIA H200. We derive the arithmetic-intensity thresholds at which workloads become compute-bound, analyze the 1:10 communication bottleneck, and propose strategies for activation recomputation and operator fusion that maximize hardware efficiency.
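To make "compute-bound threshold" concrete, here is a minimal sketch of the Roofline ridge point: the arithmetic intensity (FLOP per byte of HBM traffic) above which a kernel is limited by compute rather than memory bandwidth. It assumes nominal H200 SXM datasheet figures (dense BF16 of roughly 989 TFLOP/s and roughly 4.8 TB/s of HBM3e bandwidth). It also assumes a GEMM whose three matrices each cross HBM exactly once, and it uses a hypothetical 4096x4096 weight shape. These are illustrative assumptions, not measured values from this analysis.

```python
# Assumed nominal H200 SXM figures (public datasheet, dense BF16, no sparsity).
PEAK_BF16_FLOPS = 989e12  # FLOP/s
HBM_BANDWIDTH = 4.8e12    # bytes/s


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the memory roof meets the compute roof."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline model: performance is capped by min(compute roof, bandwidth * intensity)."""
    return min(peak_flops, bandwidth * intensity)


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOP/byte for C[m,n] = A[m,k] @ B[k,n].

    Assumes each of A, B and C is moved through HBM exactly once (ideal tiling/reuse).
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    ridge = ridge_point(PEAK_BF16_FLOPS, HBM_BANDWIDTH)
    print(f"ridge point: {ridge:.1f} FLOP/byte")
    # Hypothetical projection: `tokens` rows multiplied by a 4096x4096 BF16 weight matrix.
    for tokens in (1, 64, 512, 4096):
        ai = gemm_intensity(tokens, 4096, 4096)
        bound = "compute" if ai >= ridge else "memory"
        print(f"tokens={tokens:5d}  intensity={ai:7.1f}  {bound}-bound")
```

Under these assumptions the ridge point is about 206 FLOP/byte. A single-token (decode-style) GEMM sits near 1 FLOP/byte and is deeply memory-bound, while batches of a few hundred tokens cross into the compute-bound region. Real kernels fall below this ideal because of imperfect reuse and achievable (rather than peak) bandwidth.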