Harmful Overthinking Emerges as Core Reliability Limit in Large Reasoning Models
arXiv analysis isolates harmful overthinking as distinct failure mode persisting after correctness in frontier reasoning models.
Prefix-level trajectory evaluation on multimodal benchmarks shows that Large Reasoning Models (LRMs) often reach correct answers within minimal reasoning budgets, yet frequently destabilize those trajectories through continued generation. arXiv:2606.02835 documents accuracy gains of up to 21% when generation halts at the first correct prefix. Most post-correctness deviations stem from logical drift or visual reinterpretation. Early-stopping methods reduce verbose overthinking by 50% but leave harmful overthinking rates largely unchanged. The same pattern appears on language-only benchmarks, indicating that the problem is not modality-specific.
Related work on test-time compute scaling in models such as OpenAI o1 reports similar length-accuracy curves but does not analyze prefix-level stopping. An analysis of reasoning traces in arXiv:2412.14141 likewise reports performance plateaus followed by degradation, consistent with the view that detecting when reasoning is sufficient remains unsolved. Current LRMs therefore face a dual constraint: insufficient reasoning capacity in some cases and an inability to stop in others.
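To make the prefix-level metrics concrete, here is a minimal Python sketch. It assumes each trajectory has already been cut into prefixes and that the answer extracted at each cut has been graded as correct or not. The `Trajectory` record, the helper names, and the label definitions are hypothetical illustrations and may not match the protocol in arXiv:2606.02835. Here a trace counts as harmful if some prefix is correct but the final answer is wrong, and as verbose if an earlier prefix is correct and the final answer is still correct.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trajectory:
    # Hypothetical record for one reasoning trace: whether the answer extracted
    # at each successive prefix is correct (last entry = full generation),
    # and the cumulative token count at each prefix cut point.
    prefix_correct: List[bool]
    prefix_tokens: List[int]


def first_correct(t: Trajectory) -> Optional[int]:
    """Index of the first correct prefix, or None if no prefix is correct."""
    for i, ok in enumerate(t.prefix_correct):
        if ok:
            return i
    return None


def label(t: Trajectory) -> str:
    """One label per trace under the assumed (illustrative) definitions."""
    first = first_correct(t)
    if first is None:
        return "never_correct"
    if not t.prefix_correct[-1]:
        return "harmful"  # reached a correct answer, then ended wrong
    if first < len(t.prefix_correct) - 1:
        return "verbose"  # correct early and still correct, but kept generating
    return "sufficient"  # first became correct only at the final prefix


def summarize(trajs: List[Trajectory]) -> Dict[str, float]:
    n = len(trajs)
    labels = [label(t) for t in trajs]
    # Accuracy when every trace runs to completion.
    final_acc = sum(t.prefix_correct[-1] for t in trajs) / n
    # Oracle stopping: halt at the first correct prefix whenever one exists.
    oracle_acc = sum(first_correct(t) is not None for t in trajs) / n
    # Tokens spent after the first correct prefix on traces labeled verbose.
    wasted = sum(
        t.prefix_tokens[-1] - t.prefix_tokens[first_correct(t)]
        for t, lab in zip(trajs, labels)
        if lab == "verbose"
    )
    return {
        "final_acc": final_acc,
        "oracle_acc": oracle_acc,
        "oracle_gain": oracle_acc - final_acc,
        "harmful_rate": labels.count("harmful") / n,
        "verbose_rate": labels.count("verbose") / n,
        "verbose_wasted_tokens": wasted,
    }


if __name__ == "__main__":
    toks = [100, 200, 300, 400]
    demo = [
        Trajectory([False, True, True, True], toks),     # verbose
        Trajectory([False, True, False, False], toks),   # harmful
        Trajectory([False, False, False, True], toks),   # sufficient
        Trajectory([False, False, False, False], toks),  # never correct
    ]
    print(summarize(demo))
```

On these four toy traces, oracle stopping lifts accuracy from 0.5 to 0.75. Under the assumed definitions, the oracle gain always equals the harmful rate, because a trace counts toward oracle accuracy but not toward final accuracy exactly when it is harmful. Trimming the verbose trace saves 200 tokens but leaves accuracy unchanged.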
AXIOM: Prefix-level stopping shows that, once an LRM has reached a correct answer, scaling test-time compute adds deviation risk rather than refinement.
Sources (2)
- [1] Primary Source (https://arxiv.org/abs/2606.02835)
- [2] Related Source (https://arxiv.org/abs/2412.14141)