Reasoning Models Create Solver-Sampler Mismatch in Multi-Agent LLM Simulations
Advanced reasoning LLMs over-optimize in multi-agent negotiations, reducing outcome diversity and compromise relative to bounded reflection; GPT-5.2 with native reasoning yields authority decisions in 45/45 runs (arXiv:2604.11840).
Advanced reasoning LLMs can degrade the fidelity of behavioral simulations by over-optimizing for strategic dominance rather than sampling boundedly rational actions (Andric, arXiv:2604.11840).
The paper tests three negotiation environments adapted from prior simulation literature: an ambiguous fragmented-authority trading-limits scenario, an ambiguous unified-opposition trading-limits scenario, and a grid-curtailment emergency electricity case. Each is run under three conditions (no reflection, bounded reflection, native reasoning) in two model families, plus direct OpenAI GPT-4.1 and GPT-5.2 runs. Native reasoning collapsed compromise-oriented terminal behavior: GPT-5.2 with native reasoning ended in authority decisions in 45 of 45 runs, while GPT-5.2 with bounded reflection recovered compromise outcomes in every environment (Andric, arXiv:2604.11840). For contrast, Park et al. (arXiv:2304.03442) produced believable social behaviors with smaller non-reasoning models, and Du et al. (arXiv:2305.14325) documented loss of interaction diversity under optimization pressure in multi-agent debate.
Mainstream LLM coverage has emphasized benchmark gains on solver tasks while omitting the question of whether the same models qualify as samplers for policy and economic simulations. The Andric work supplies the methodological warning that capability and simulation fidelity are distinct objectives, a distinction earlier agent surveys (arXiv:2309.07864) noted but did not quantify in negotiation settings.
AXIOM: Reasoning models excel as solvers but degrade as samplers; GPT-5.2 with native reasoning ended every negotiation run in an authority outcome unless bounded reflection was applied to preserve realistic compromise.
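The solver-sampler distinction can be made concrete with a toy sketch (an illustration, not code or data from the paper): given one agent's action distribution over terminal outcomes, a solver-style policy deterministically picks the strategically dominant action, collapsing every run to the same ending, while a sampler-style policy draws from the distribution and preserves the outcome mix. The outcome names and probabilities below are hypothetical.

```python
import random
from collections import Counter

# Hypothetical action distribution for one negotiating agent
# (illustrative only; not taken from the paper).
action_probs = {"authority_decision": 0.50, "compromise": 0.45, "walk_away": 0.05}

def solver_policy(probs):
    """Solver behavior: always choose the strategically dominant action."""
    return max(probs, key=probs.get)

def sampler_policy(probs, rng):
    """Sampler behavior: draw an action, preserving boundedly rational diversity."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

rng = random.Random(0)
solver_runs = Counter(solver_policy(action_probs) for _ in range(45))
sampler_runs = Counter(sampler_policy(action_probs, rng) for _ in range(45))

# The solver collapses all 45 runs onto a single outcome; the sampler
# yields a mixture that roughly tracks the underlying distribution.
print(solver_runs)
print(sampler_runs)
```

Under this framing, bounded reflection acts as a constraint that keeps a capable model behaving like the sampler branch instead of the solver branch.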
Sources (3)
- [1] When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation (https://arxiv.org/abs/2604.11840)
- [2] Generative Agents: Interactive Simulacra of Human Behavior (https://arxiv.org/abs/2304.03442)
- [3] A Survey on Large Language Model based Autonomous Agents (https://arxiv.org/abs/2309.07864)