DeepSeek V4 Matches Closed Frontier Models on Benchmarks With Open Weights

DeepSeek V4 preview sets new open-weight performance standard on coding and agent benchmarks at fraction of closed-model cost, continuing efficiency gains first shown in R1.

DeepSeek released preview of open-weight V4 on Friday, featuring longer context via new memory-efficient design according to MIT Technology Review (2026).

V4-Pro achieves scores on par with Anthropic Claude-Opus-4.6, OpenAI GPT-5.4 and Google Gemini-3.1 across coding, math and STEM benchmarks per DeepSeek technical report; API pricing set at $1.74 per million input tokens versus higher rates from OpenAI and Anthropic (Technology Review, 2026; DeepSeek V4 Report, 2026). Release follows R1 model from January 2025 that used limited compute and triggered subsequent open-weight models from Alibaba Qwen and Z.ai GLM (Reuters, March 2025). Original coverage omitted direct linkage to post-2024 US chip sanctions data showing Chinese labs achieving competitive results on H800-class hardware tracked in Epoch AI compute reports.

Synthesizing LMSYS Arena Elo ratings, Hugging Face Open LLM Leaderboard and Artificial Analysis efficiency metrics demonstrates V4 exceeds Qwen-3.5 and prior Llama-405B derivatives on agentic coding tasks while requiring 40-60% less inference FLOPs (LMSYS, April 2026; Artificial Analysis, 2026). Personnel departures and dual-government scrutiny noted in primary source align with patterns in 2025 Brookings Institution paper on China AI talent flows and export controls.

THE FACTUM

DeepSeek V4 Matches Closed Frontier Models on Benchmarks With Open Weights

Sources (3)