Loss Reduction in Distributed Training

An analysis of how Data Parallelism (DP) and Context Parallelism (CP) affect loss reduction, and how to maintain mathematical equivalence between distributed and single-device training for LLM and Video DiT models.
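As a quick illustration of the core issue, the sketch below (plain Python, hypothetical numbers) shows why reducing the loss across data-parallel ranks must be token-weighted: when ranks hold different numbers of tokens, averaging per-rank mean losses does not match the loss a single device would compute over all tokens, whereas dividing the all-reduced loss sum by the all-reduced token count does.

```python
# Minimal sketch with hypothetical values; real training would use collective ops
# (e.g. all-reduce) across DP ranks instead of Python lists.

# Per-rank sums of token losses and token counts (hypothetical numbers).
rank_loss_sums = [12.0, 30.0]   # sum of per-token losses on each DP rank
rank_token_counts = [4, 12]     # number of (non-padded) tokens on each rank

# Single-device reference: mean over all tokens together.
single_device_loss = sum(rank_loss_sums) / sum(rank_token_counts)   # 42 / 16 = 2.625

# Naive reduction: average the per-rank mean losses (what plain DP gradient
# averaging implicitly does when each rank computes its own mean).
naive_loss = sum(s / n for s, n in zip(rank_loss_sums, rank_token_counts)) / len(rank_loss_sums)
# (12/4 + 30/12) / 2 = (3.0 + 2.5) / 2 = 2.75  != 2.625

# Token-weighted reduction: sum loss and token counts across ranks, then divide.
# This recovers the single-device result exactly.
weighted_loss = sum(rank_loss_sums) / sum(rank_token_counts)

print(single_device_loss, naive_loss, weighted_loss)
```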