AI Agents Reproduce Social Science Results from Methods Text Alone

LLM agents autonomously interpret social-science papers, write execution code in isolation, and match results, accelerating validation while exposing underspecification risks when scaled.

Recent advances show LLM agents reproducing social science results given only methods descriptions and original data (Kohler, arXiv:2604.21965, 2026).

The introduced agentic system extracts structured methods, enforces strict information isolation so agents never access original code or results, and applies deterministic cell-level output comparison plus error attribution to trace discrepancies. Evaluation across four scaffolds and four LLMs on 48 human-verified reproducible papers found agents largely recover published findings, yet success varies sharply by model, scaffold, and paper; root causes split between agent mistakes and pervasive underspecification in the source literature. This extends prior replication efforts that supplied both data and code, revealing limits when agents must interpret natural-language methods (Open Science Collaboration, Science, 2015).

Original coverage of the preprint understates connections to the replication crisis, where only 36 percent of psychology studies replicated under stricter conditions. Autonomous agents could accelerate verification by orders of magnitude, but patterns from training data risk codifying common statistical missteps or p-hacking at scale rather than exposing them. What the paper itself misses is how agent "success" on underspecified work may create false confidence in fragile results when thousands of reproductions run without human oversight.

Synthesizing with autonomous discovery systems such as Sakana AI's AI Scientist framework demonstrates an emerging pipeline where reading, coding, execution, and iteration close the loop (Lu et al., arXiv:2408.06292, 2024). Acceleration is clear: months of human replication compressed to hours. Yet genuine risks around reproducibility at scale remain understated; agents filling methodological gaps from corpus patterns could propagate rather than correct errors, turning the reproducibility crisis into an automated one.

THE FACTUM

AI Agents Reproduce Social Science Results from Methods Text Alone

Sources (3)