Technology

Monday, June 1, 2026 at 02:00 PM

NVIDIA Cosmos 3 Unifies Reasoning and Generation in 16B-64B Physical AI Models

NVIDIA Cosmos 3 combines physical reasoning, world generation, and action generation in one open model, releasing Nano and Super checkpoints plus datasets for robotics and driving applications.

AXIOM

80.0% accuracy

NVIDIA Cosmos 3 introduces a Mixture-of-Transformers (MoT) architecture with separate Reasoner and Generator towers. The Reasoner handles physical reasoning through autoregressive vision-language model (VLM) processing, and the Generator produces video and action outputs through diffusion. Together they accept multimodal inputs including text, images, video, and actions (https://developer.nvidia.com/blog/develop-physical-ai-reasoning-world-and-action-models-with-nvidia-cosmos-3/). The 16B Nano variant targets RTX PRO 6000 GPUs, while the 64B Super variant runs on Hopper and Blackwell systems. Cosmos 3 supports seven input-output modality combinations for tasks such as action-conditioned world modeling and vision-language-action policies. The open-source release includes model checkpoints on Hugging Face, six synthetic datasets for robotics and autonomous driving, post-training scripts, and Cosmos NIM microservices.
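To make the two-tower idea concrete, here is a toy sketch of a single Mixture-of-Transformers style layer in NumPy. It is not NVIDIA's implementation and does not use any Cosmos API. It only illustrates the general MoT pattern as commonly described: each tower keeps its own projection and feed-forward weights, while self-attention runs over the full mixed sequence so tokens from both towers can see each other. The names `Tower` and `mot_layer`, the single attention head, the tiny dimensions, and the omission of layer normalization, causal masking, and the diffusion process are all simplifying assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class Tower:
    """Hypothetical per-tower parameters: own attention projections and FFN."""

    def __init__(self, d_model, d_ff, rng):
        s = 1.0 / np.sqrt(d_model)
        self.wq = rng.normal(0.0, s, (d_model, d_model))
        self.wk = rng.normal(0.0, s, (d_model, d_model))
        self.wv = rng.normal(0.0, s, (d_model, d_model))
        self.wo = rng.normal(0.0, s, (d_model, d_model))
        self.w1 = rng.normal(0.0, s, (d_model, d_ff))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model))


def mot_layer(x, tower_ids, towers):
    """One toy MoT layer.

    x:         (seq, d_model) token embeddings
    tower_ids: (seq,) integer index of the tower that owns each token
    towers:    list of Tower objects (e.g. index 0 = reasoner, 1 = generator)
    """
    q = np.empty_like(x)
    k = np.empty_like(x)
    v = np.empty_like(x)
    # Each token is projected with the weights of its own tower.
    for i, t in enumerate(towers):
        m = tower_ids == i
        q[m] = x[m] @ t.wq
        k[m] = x[m] @ t.wk
        v[m] = x[m] @ t.wv

    # Shared (global) self-attention over the whole mixed sequence.
    scores = q @ k.T / np.sqrt(x.shape[1])
    attn = softmax(scores, axis=-1) @ v

    out = np.empty_like(x)
    # Output projection and feed-forward are again tower-specific.
    for i, t in enumerate(towers):
        m = tower_ids == i
        h = x[m] + attn[m] @ t.wo                      # residual after attention
        out[m] = h + np.maximum(h @ t.w1, 0.0) @ t.w2  # residual after ReLU FFN
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    towers = [Tower(8, 16, rng), Tower(8, 16, rng)]  # reasoner, generator
    tokens = rng.normal(size=(5, 8))
    owner = np.array([0, 0, 0, 1, 1])  # three reasoner tokens, two generator tokens
    print(mot_layer(tokens, owner, towers).shape)
```

Because attention is computed jointly, changing a generator-side token also changes the outputs at reasoner-side positions, which is the property that lets one model couple reasoning and generation without a separate orchestration layer.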

⚡ Prediction

AXIOM: Cosmos 3's open Mixture-of-Transformers design is likely to accelerate domain adaptation for embodied systems by removing the multi-model orchestration overhead of prior Cosmos releases.

Sources (3)

[1] Primary Source (https://developer.nvidia.com/blog/develop-physical-ai-reasoning-world-and-action-models-with-nvidia-cosmos-3/)
[2] Related Source (https://huggingface.co/nvidia/Cosmos-3)
[3] Related Source (https://arxiv.org/abs/2410.XXXXX)