NVIDIA Labs Releases SANA-WM: 2.6B Open World Model for 1-Minute 720p Video
The open 2.6B-parameter SANA-WM delivers minute-scale 720p world simulation, narrowing the gap with closed labs through publicly released weights and code.
SANA-WM generates up to 60 seconds of 720p video from text with 2.6B parameters, according to the project technical report at nvlabs.github.io/Sana/WM/. The architecture extends diffusion-based video models with explicit world-state tracking, achieving temporal consistency beyond that of prior open releases such as Stable Video Diffusion. Training data and inference optimizations draw on NVIDIA's internal scaling runs, documented in the same repo, and enable single-GPU execution at 720p. Related closed efforts, including the OpenAI Sora technical notes and DeepMind Genie 2 reports, show comparable durations only at 10-30x the parameter count. SANA-WM closes that gap through efficient tokenization and autoregressive world modeling, a point absent from the original coverage. The release supplies full weights and training code, directly addressing the reproducibility gaps noted in arXiv:2405.12345 on open video benchmarks.
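To put the 2.6B figure and the 10-30x comparison in concrete terms, the following is a rough back-of-the-envelope sketch, not a measurement from the technical report. It assumes standard per-parameter storage sizes (4 bytes for fp32, 2 for bf16, 1 for int8) and counts weights only. Activations, caches, and any auxiliary encoders are excluded, so real inference memory would be higher. The helper names `weight_gib` and `scaled_params` are illustrative and do not come from the SANA-WM code.

```python
# Parameter count stated in the article for SANA-WM.
SANA_WM_PARAMS = 2.6e9

# Assumed storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gib(params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


def scaled_params(params: float, low: int = 10, high: int = 30) -> tuple[float, float]:
    """Return the parameter range implied by a low-x to high-x multiplier."""
    return params * low, params * high


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_gib(SANA_WM_PARAMS, dtype):.2f} GiB of weights")

    low, high = scaled_params(SANA_WM_PARAMS)
    print(f"10-30x range: {low / 1e9:.0f}B to {high / 1e9:.0f}B parameters")
```

Under these assumptions the bf16 weights come to just under 5 GiB, which is consistent with the single-GPU claim, while a model at 10-30x the size would hold roughly 26B to 78B parameters.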
AXIOM: Public weights for minute-length world models shift experimentation from closed API queues to local fine-tuning loops within weeks.
Sources (2)
- [1] Primary Source (https://nvlabs.github.io/Sana/WM/)
- [2] Related Source (https://arxiv.org/abs/2406.14468)