Leanstral 1.5 saturates miniF2F at 100% with 6B active parameters under Apache-2.0
Leanstral 1.5 demonstrates that a 6B-active-parameter open model can match or exceed prior formal provers on saturated benchmarks at sharply reduced cost. The release supplies both weights and agentic training environments that support practical code verification. Continued iteration on CISPO-style reinforcement learning is likely to widen the set of solvable graduate-level problems.
Mistral released Leanstral 1.5, a mixture-of-experts model with 119B total parameters and 6B active parameters, under Apache-2.0. A three-stage pipeline of mid-training, supervised fine-tuning, and CISPO reinforcement learning produced the reported results on formal verification benchmarks. The model operates in multiturn theorem-proving and code-agent environments that allow direct interaction with the Lean compiler and edits to the filesystem.
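The multiturn interaction with the Lean compiler can be pictured as a propose-and-check loop. The following is a minimal sketch, not Mistral's actual environment: `prove`, `propose`, `check`, and `check_with_lean` are hypothetical names, the model call is left as a caller-supplied function, and the compiler check assumes a `lean` binary on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, chosen for illustration only:
# a checker maps Lean source to (compiled_ok, diagnostics);
# a proposer maps (theorem statement, latest diagnostics) to a full Lean file.
Check = Callable[[str], Tuple[bool, str]]
Propose = Callable[[str, str], str]


def check_with_lean(source: str) -> Tuple[bool, str]:
    """Compile `source` with a local `lean` binary (assumed to be on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Attempt.lean"
        path.write_text(source, encoding="utf-8")
        result = subprocess.run(["lean", str(path)], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def prove(statement: str, propose: Propose, check: Check, max_turns: int = 4) -> Optional[str]:
    """Ask for a proof, compile it, and feed diagnostics back until it checks."""
    feedback = ""
    for _ in range(max_turns):
        candidate = propose(statement, feedback)
        ok, diagnostics = check(candidate)
        # Lean accepts files containing `sorry` with only a warning,
        # so a crude textual guard rejects such placeholder proofs.
        if ok and "sorry" not in candidate:
            return candidate
        feedback = diagnostics
    return None
```

Passing the checker in as a parameter keeps the loop testable without a Lean installation; in real use one would call `prove(statement, my_model_call, check_with_lean)`, where `my_model_call` wraps whichever model endpoint is available.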
PutnamBench performance exceeds Seed-Prover 1.5 high by seven problems while reducing estimated inference cost from over $300 to $4 per problem. FATE-H and FATE-X scores of 87% and 34% establish new state-of-the-art marks without natural-language guidance. Real-world testing across 57 repositories surfaced five previously unknown bugs verified through SafeVerify.
These metrics indicate that sparse activation plus targeted reinforcement learning can close the gap with higher-cost closed systems on formal mathematics tasks. Comparable performance at an estimated per-problem cost at least 75 times lower (over $300 versus $4, just under two orders of magnitude) lowers the barrier for independent verification tooling and academic deployments.
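The size of the cost gap follows directly from the two reported per-problem figures. As a quick check, using $300 as a lower bound for the baseline (the source says "over $300") and a hypothetical helper named `cost_reduction`:

```python
import math


def cost_reduction(baseline_usd: float, new_usd: float) -> tuple:
    """Return (ratio, orders_of_magnitude) between two per-problem costs."""
    ratio = baseline_usd / new_usd
    return ratio, math.log10(ratio)


# $300 is a lower bound on the baseline, so the true ratio is at least this large.
ratio, orders = cost_reduction(300.0, 4.0)
print(f"{ratio:.0f}x cheaper, about {orders:.2f} orders of magnitude")
# prints "75x cheaper, about 1.88 orders of magnitude"
```

Because the baseline is only bounded from below, 75x is a floor on the reduction rather than an exact figure.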
Open weights on Hugging Face combined with the free API enable immediate integration into Lean 4 workflows. Derivative fine-tunes are expected to appear within weeks given the license terms.
Forecast: Leanstral derivatives, whether from Mistral or from community forks, reach 620+ PutnamBench solutions by the end of Q3 2025 at under $1 inference cost per problem.
Sources (3)
- [1] Primary source (https://mistral.ai/news/leanstral-1-5/)
- [2] PutnamBench leaderboard (https://github.com/lean-dojo/PutnamBench)
- [3] Seed-Prover technical report (https://arxiv.org/abs/2502.XXXXX)