Demystifying FlashAttention: Forward, Backward, and Triton Implementation
A breakdown of FlashAttention’s forward and backward passes, including Online Softmax, LogSumExp materialization, gradient recomputation, and core Triton implementations.