Demystifying FlashAttention: Forward, Backward, and Triton Implementation
A breakdown of FlashAttention’s forward and backward passes, including Online Softmax, LogSumExp materialization, gradient recomputation, and core Triton implementations.
A step-by-step guide to optimizing GEMM in Triton, covering Tiling, Autotuning, L2 Cache Optimizations, and Hopper TMA.