DeepSeek-V4 Day-0 Stack Signals Rapid Open-Source AI Inference and RL Advances
Day-zero support for DeepSeek-V4 in SGLang and Miles highlights efficient-inference advances and verified RL progress, intensifying global open-source AI competition and driving infrastructure diversification.
The LMSYS blog reports that SGLang and Miles delivered day-zero inference and RL-training support for DeepSeek-V4 (1.6T-parameter Pro and 284B Flash) at launch (https://www.lmsys.org/blog/2026-04-25-deepseek-v4/). The stack implements ShadowRadix prefix caching, HiSparse CPU-extended KV, MTP speculative decoding, Flash Compressor, Lightning TopK, and hierarchical multi-stream overlap for the model's hybrid sparse-attention architecture, which mixes sliding-window attention with 4:1 top-k or 128:1 dense compression, plus manifold-constrained hyper-connections and native FP4 expert weights. Full parallelism (DP/TP/SP/EP/PP/CP), TileLang attention kernels, FP8 training, and DeepEP MoE optimizations target Hopper, Blackwell, Grace Blackwell, AMD, and NPU platforms.
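For readers unfamiliar with the hybrid pattern the blog describes, the following is a minimal sketch of a sparse-attention mask that combines a causal sliding window with per-query top-k key-block selection. The window size, block size, k, and the block-scoring heuristic are illustrative assumptions, not DeepSeek-V4's architecture parameters or SGLang's kernels.

```python
import torch

def hybrid_sparse_mask(scores: torch.Tensor, window: int = 128,
                       block: int = 64, k: int = 4) -> torch.Tensor:
    """Boolean [T, T] mask: True where a query may attend to a key."""
    T = scores.shape[-1]
    q_idx = torch.arange(T).unsqueeze(1)            # [T, 1]
    k_idx = torch.arange(T).unsqueeze(0)            # [1, T]
    causal = k_idx <= q_idx
    local = causal & (q_idx - k_idx < window)       # sliding-window component

    # Top-k component: average the causal scores per key block, then keep the
    # best k blocks for each query row.
    n_blocks = (T + block - 1) // block
    padded = torch.full((T, n_blocks * block), float("-inf"))
    padded[:, :T] = scores.masked_fill(~causal, float("-inf"))
    block_scores = padded.view(T, n_blocks, block).mean(dim=-1)       # [T, n_blocks]
    top_blocks = block_scores.topk(min(k, n_blocks), dim=-1).indices  # [T, k]
    block_of_key = (k_idx // block).expand(T, T)                      # block id per key
    selected = (block_of_key.unsqueeze(-1) == top_blocks.unsqueeze(1)).any(dim=-1)

    return local | (selected & causal)

if __name__ == "__main__":
    T = 256
    scores = torch.randn(T, T)              # stand-in query-key attention scores
    mask = hybrid_sparse_mask(scores)
    print(mask.shape, mask.float().mean())  # fraction of positions attended
```

A production kernel would fuse this selection into the attention computation rather than materializing a dense mask, but the combination of a local window with a small set of globally selected blocks is the core idea.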
Initial coverage detailed the kernel integrations (FlashMLA, FlashInfer, DeepGEMM, TileLang mHC) and coherence solutions for three heterogeneous KV pools plus compression-state pools, but omitted explicit ties to the verified reinforcement-learning patterns established in the DeepSeek-V3 technical report (arXiv:2412.19437) and the Miles framework extensions, which parallel OpenAI's o1-style outcome supervision and Anthropic's constitutional-RLHF iterations aimed at improving training stability.
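As a rough illustration of what outcome-supervised ("verified") rewards look like in this style of RL, here is a minimal sketch that scores sampled completions with a verifier and applies a group-relative baseline. The Rollout type, the toy verifier, and the normalization are assumptions for illustration, not the Miles framework's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rollout:
    prompt: str
    completion: str

def outcome_rewards(group: List[Rollout],
                    verify: Callable[[str, str], bool]) -> List[float]:
    """Assign 1.0 to completions the verifier accepts and 0.0 otherwise, then
    center within the sampled group so the policy gradient compares siblings
    (a common group-relative baseline for outcome-only supervision)."""
    raw = [1.0 if verify(r.prompt, r.completion) else 0.0 for r in group]
    mean = sum(raw) / len(raw)
    return [r - mean for r in raw]

if __name__ == "__main__":
    # Toy verifier: checks a final arithmetic answer embedded in the completion.
    def verify(prompt: str, completion: str) -> bool:
        return completion.strip().endswith("4")

    group = [Rollout("2+2=", "The answer is 4"),
             Rollout("2+2=", "The answer is 5")]
    print(outcome_rewards(group, verify))  # [0.5, -0.5]
```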
SGLang's earlier structured-execution techniques (arXiv:2311.07194), combined with this launch, reflect broader infrastructure patterns seen in Meta's Llama 3.1 optimizations and xAI's Colossus cluster scaling, where inference-cost reductions and cross-hardware compatibility accelerate open-model iteration cycles amid US-China competition for compute resources.
AXIOM: Day-zero DeepSeek-V4 integrations for hybrid-attention inference and verified RL training indicate that open-source stacks are closing the gap with proprietary systems, accelerating worldwide competition and hardware-specific AI infrastructure adaptation.
Sources (3)
- [1] Primary Source (https://www.lmsys.org/blog/2026-04-25-deepseek-v4/)
- [2] DeepSeek-V3 Technical Report (https://arxiv.org/abs/2412.19437)
- [3] SGLang: Efficient Execution of Structured Language Model Programs (https://arxiv.org/abs/2311.07194)