RMA Agentic Framework Solves Eight of Ten Research-Level Math Problems
RMA extends agentic problem solving from competition problems to research-level mathematics.
RMA, detailed in arXiv:2605.22875, solves eight of ten problems on the First Proof benchmark by decomposing work across multiple agents organized into analysis, literature, and verification modules. On these expert-contributed open problems, RMA outperforms GPT-5.2R and Aletheia, with the gains credited to iterative refinement and a shared structured memory. Ablations attribute the improvement to coordination among the roles rather than to any single component. This extends the reach of earlier systems such as the formal prover AlphaProof (DeepMind, 2024), which remained limited to competition mathematics. It also targets capabilities that earlier agent frameworks were never tested on: literature grounding and long-horizon proof iteration, neither of which is required by benchmarks such as those used to evaluate ToRA (Gou et al., 2023).
RMA: Multi-role agents with verifier feedback enable the shift from olympiad solvers to literature-driven research automation.
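The brief does not describe RMA's actual interfaces. What follows is only a minimal Python sketch of the general pattern it names: analysis, literature, and verification roles that communicate through a shared structured memory and iterate on verifier feedback. Every name here (`SharedMemory`, `literature_role`, `analysis_role`, `verification_role`, `solve`) is hypothetical. The roles are deterministic functions rather than LLM agents, and the "research problem" is a toy stand-in: find the smallest n >= 2 for which 2^n - 1 is composite.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of a multi-role loop; not RMA's published implementation.


@dataclass
class SharedMemory:
    """Structured memory that every role reads and writes (assumed schema)."""
    ruled_out: dict[int, str] = field(default_factory=dict)  # candidate -> reason it fails
    drafts: list[int] = field(default_factory=list)          # candidates sent to the verifier
    rounds: int = 0


def is_composite(m: int) -> bool:
    """Trial division; adequate for the tiny numbers in this toy."""
    if m < 4:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return True
        d += 1
    return False


def literature_role(memory: SharedMemory) -> None:
    """Stand-in for literature search: a hard-coded table of known Mersenne prime exponents."""
    for p in (2, 3, 5, 7):  # 2^p - 1 is prime for each of these exponents
        memory.ruled_out[p] = f"literature: 2^{p} - 1 = {2**p - 1} is a known Mersenne prime"


def analysis_role(memory: SharedMemory) -> int:
    """Propose the smallest exponent n >= 2 that nothing in memory has ruled out."""
    n = 2
    while n in memory.ruled_out:
        n += 1
    return n


def verification_role(n: int) -> Optional[str]:
    """Return None if the draft is correct, otherwise an objection to store in memory."""
    value = 2**n - 1
    return None if is_composite(value) else f"verifier: 2^{n} - 1 = {value} is prime"


def solve(use_literature: bool = True, max_rounds: int = 10) -> tuple[Optional[int], SharedMemory]:
    """Alternate analysis and verification, feeding objections back through shared memory."""
    memory = SharedMemory()
    if use_literature:
        literature_role(memory)
    for _ in range(max_rounds):
        memory.rounds += 1
        draft = analysis_role(memory)
        memory.drafts.append(draft)
        objection = verification_role(draft)
        if objection is None:
            return draft, memory
        memory.ruled_out[draft] = objection  # verifier feedback steers the next draft
    return None, memory


if __name__ == "__main__":
    for use_lit in (True, False):
        answer, mem = solve(use_literature=use_lit)
        print(f"literature={use_lit}: answer={answer}, rounds={mem.rounds}, drafts={mem.drafts}")
```

Without the literature role, the loop needs three rounds (drafts 2, 3, 4) because the verifier must reject each prime case before the analyst reaches 4; with the literature table preloaded into memory, the analyst proposes 4 immediately. The toy only illustrates roles cooperating through a shared memory, not RMA's real control flow or the scale of its problems.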
Sources (3)
- [1] Primary source: RMA, arXiv:2605.22875, https://arxiv.org/abs/2605.22875
- [2] Related source: AlphaProof (DeepMind, 2024), https://deepmind.google/discover/blog/alphaproof/
- [3] Related source: ToRA (Gou et al., 2023), https://arxiv.org/abs/2309.17452