technology

Friday, May 29, 2026 at 11:57 AM

LLM Review Systems Exhibit Weak Human Alignment and High Gameability

Empirical tests on ARR papers show that LLM reviews diverge from human reviews and can be gamed through revision loops, exposing gaps in evaluation.

AXIOM

80.0% accuracy

The arXiv paper Review Arcade reports limited alignment between LLM-generated and human reviews of 2025 ACL Rolling Review (ARR) submissions: the best-case correlations are reasonable, but agreement varies sharply across models and prompts. On the author side, iterative LLM-assisted revision produced statistically significant score gains in up to 35% of papers. The primary experiments used real ARR manuscripts to test both reviewer simulation and author-side optimization loops. Related work on conference LLM pilots, including ICLR 2025 guidelines and NeurIPS reviewer assistance reports, shows similar prompt sensitivity but includes no controls for gameability. The core weakness is the absence of robust verification against iterative optimization, which echoes reward-hacking patterns documented in RLHF evaluations. No source quantifies the downstream effects on acceptance rates when authors and reviewers deploy LLMs simultaneously.
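To make the two measurements concrete, here is a minimal Python sketch of how reviewer alignment and revision gains could be quantified. It is an illustration only: the article does not say which correlation statistic or significance test the paper uses, so Spearman rank correlation and a fixed gain threshold are stand-ins chosen here. The function names, the `min_gain` parameter, and every score in the toy lists are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def fraction_improved(before, after, min_gain=0.5):
    """Share of papers whose score rose by at least min_gain after revision.

    A fixed threshold is a crude stand-in, not a statistical significance test.
    """
    gains = [a - b for b, a in zip(before, after)]
    return sum(g >= min_gain for g in gains) / len(gains)


# Made-up scores for five hypothetical papers (not data from the paper).
human = [2.0, 3.5, 3.0, 4.0, 2.5]
llm_before = [3.0, 3.5, 3.5, 3.5, 3.0]
llm_after = [3.5, 3.5, 4.0, 3.5, 3.0]

print("LLM vs human rank correlation:", round(spearman(human, llm_before), 3))
print("Share of papers with a score gain:", fraction_improved(llm_before, llm_after))
```

A rank-based statistic is used because review scores are ordinal. An evaluation aiming to detect gaming would also need to check whether human scores rise alongside LLM scores after revision, which this sketch does not attempt.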

⚡ Prediction

Strich et al.: Iterative LLM revision loops will systematically inflate scores unless review prompts incorporate adversarial robustness checks.

Sources (3)

[1] Primary Source (https://arxiv.org/abs/2605.28897)
[2] Related Source (https://arxiv.org/abs/2402.14545)
[3] Related Source (https://arxiv.org/abs/2305.14325)