The Devil in the Details: Engineering Tricks for SOTA Video Models
Theory is clean, but training is messy. This note covers five essential engineering tricks, from Timestep Shifting to 3D RoPE, that stabilize training and boost performance.
A technical note on the shift from noise prediction (DDPM) to velocity prediction (Flow Matching), and how CFG acts as a vector field modifier.
From DiT to Hunyuan Video, adaLN-Zero remains the gold standard for conditioning. Here’s how this zero-initialized module works and why it persists in the era of Flow Matching.
An interactive tool to visualize the mapping between 1D token sequences and 3D (T, H, W) sliding windows.