Technology | Tuesday, June 30, 2026 at 01:00 AM

Ornith-1.0-397B posts 82.4 on SWE-bench Verified via joint scaffold-solution RL

Ornith-1.0 demonstrates measurable self-improvement in open coding agents through scaffold-aware RL, narrowing the gap with closed models on SWE-bench and Terminal-Bench. The release provides reproducible weights and evaluation harnesses at four scales, opening a public and so far under-tracked route to agent capability growth.

AXIOM

The GitHub release ships four post-trained checkpoints (9B dense through 397B MoE) derived from Gemma 4 and Qwen 3.5 bases. Each model is trained with an RL loop that produces both the agent scaffold and the code trajectories, then reinforces trajectories yielding higher benchmark returns. Evaluation harnesses are fixed across runs: OpenHands for SWE-bench variants, Harbor/Terminus-2 for Terminal-Bench, and mini-SWE-agent for SWE Atlas suites, all at temperature 1.0 with documented context windows.
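The loop described above can be illustrated with a toy sketch. This is not the release's training code: the scaffold names, strategy names, success probabilities, learning rate, and the REINFORCE-with-baseline update are all assumptions chosen for illustration. It only shows the general idea of sampling a scaffold and a solution together, scoring the result, and reinforcing both choices with the same return signal.

```python
import math
import random

# Hypothetical scaffolds and solution strategies; none of these names come from the release.
SCAFFOLDS = ["single_pass", "plan_then_edit", "test_first"]
STRATEGIES = ["minimal_patch", "broad_rewrite"]

# Made-up probability that a (scaffold, strategy) pair resolves a task.
# A real run would get this signal from a benchmark harness instead.
SUCCESS_PROB = {
    ("single_pass", "minimal_patch"): 0.30,
    ("single_pass", "broad_rewrite"): 0.10,
    ("plan_then_edit", "minimal_patch"): 0.55,
    ("plan_then_edit", "broad_rewrite"): 0.25,
    ("test_first", "minimal_patch"): 0.80,
    ("test_first", "broad_rewrite"): 0.35,
}


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce(logits, chosen, advantage, lr):
    """In-place REINFORCE step for a softmax policy.

    Uses d log pi(chosen) / d logit_j = 1[j == chosen] - pi_j.
    """
    probs = softmax(logits)
    for j in range(len(logits)):
        grad = (1.0 if j == chosen else 0.0) - probs[j]
        logits[j] += lr * advantage * grad


def train(steps=2000, lr=0.1, seed=0):
    """Jointly train a scaffold policy and per-scaffold solution policies."""
    rng = random.Random(seed)
    scaffold_logits = [0.0] * len(SCAFFOLDS)
    # One strategy policy per scaffold, so the solution choice is conditioned on the scaffold.
    strategy_logits = [[0.0] * len(STRATEGIES) for _ in SCAFFOLDS]
    baseline = 0.0  # running mean return, used to reduce variance

    for step in range(1, steps + 1):
        s = sample_index(softmax(scaffold_logits), rng)
        a = sample_index(softmax(strategy_logits[s]), rng)
        # Binary return: 1.0 if the sampled trajectory "resolves" the toy task.
        p_success = SUCCESS_PROB[(SCAFFOLDS[s], STRATEGIES[a])]
        ret = 1.0 if rng.random() < p_success else 0.0
        advantage = ret - baseline
        # Joint update: the same advantage reinforces the scaffold and the solution choice.
        reinforce(scaffold_logits, s, advantage, lr)
        reinforce(strategy_logits[s], a, advantage, lr)
        baseline += (ret - baseline) / step

    return softmax(scaffold_logits), [softmax(row) for row in strategy_logits]


if __name__ == "__main__":
    scaffold_probs, strategy_probs = train()
    for name, prob in zip(SCAFFOLDS, scaffold_probs):
        print(f"{name}: {prob:.2f}")
```

In this toy setting the policies tend to shift probability toward pairs with higher simulated success rates. An actual system would replace the lookup table with harness-scored agent runs and the logit tables with model parameters.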

The reported scores show gains at every scale. The 9B model lifts SWE-bench Verified from 53.2 to 69.4 (+16.2 points) and NL2Repo from 16.2 to 27.2 (+11.0 points); the 35B model reaches 75.6 and 34.6 on the same two benchmarks. The 397B checkpoint records the largest absolute deltas on agentic suites while remaining MIT-licensed. These deltas arise from the scaffold optimization step, which is absent from standard post-training of the base models.
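For readers who want to check the 9B figures, the short sketch below computes absolute and relative gains from the before and after scores quoted above. The function and variable names are illustrative and are not part of the release.

```python
# Scores quoted in the article for the 9B model: (before, after).
SCORES_9B = {
    "SWE-bench Verified": (53.2, 69.4),
    "NL2Repo": (16.2, 27.2),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent), rounded to one decimal."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


def summarize(scores):
    """Map each benchmark name to its (absolute, relative) gain."""
    return {name: gains(before, after) for name, (before, after) in scores.items()}


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in summarize(SCORES_9B).items():
        print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

The absolute gain is larger on SWE-bench Verified (16.2 points against 11.0), while the relative gain is larger on NL2Repo (about 67.9% against about 30.5%) because it starts from a lower base score.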

The pattern connects to prior open agent work such as OpenDevin scaffold iterations and DeepSeek-Coder-V2 RL stages, yet adds explicit joint optimization of search policy and solution. Closed labs have published similar internal loops but withheld the resulting weights; Ornith-1.0 makes the full training artifacts public.

The operational consequence is faster iteration on autonomous coding agents outside frontier lab gatekeeping. Subsequent fine-tuning runs can now start from post-trained checkpoints, up to the 82.4 SWE-bench Verified score of the 397B model, rather than from raw bases, shortening the cycle from base model to deployed agent.

⚡ Prediction

Ornith-1.0-397B: SWE-bench Verified exceeds 87.0 within nine months under continued scaffold RL iterations.

Sources (3)

[1] Primary source: https://github.com/deepreinforce-ai/Ornith-1
[2] Supporting source: https://arxiv.org/abs/2308.05701
[3] Supporting source: https://qwenlm.github.io/blog/qwen3/