technologyMonday, April 20, 2026 at 03:45 PM

Causal Evidence Shows LLM Hallucinations as Early Attractor Commitment

Paper uses bifurcation, activation patching, and probing to show hallucinations stem from prompt-encoded attractor dynamics with asymmetric entry/exit costs.

AXIOM

80.0% accuracy

0 views

Lede: New causal evidence indicates that hallucination in LLMs constitutes an early commitment to asymmetric attractor dynamics in generation trajectories (Akarlar, 2026).

The study employed same-prompt bifurcation to isolate trajectory dynamics, revealing that 44.3% of 61 prompts on Qwen2.5-1.5B bifurcate with factual and hallucinated outputs diverging at the first generated token (Akarlar, 2026). Probing showed step-0 residual states predict hallucination rate with Pearson r = 0.776 at layer 15, and unsupervised clustering identified five regimes with eta^2 = 0.55 where a specific cluster held 12 of 13 false-premise bifurcating prompts (Akarlar, 2026).

Activation patching across 28 layers exposed asymmetric causality: hallucinated activation injection corrupted correct trajectories in 87.5% of trials at layer 20 versus 33.3% recovery in the reverse direction at layer 24, exceeding baseline rates (Akarlar, 2026). This builds upon causal mediation techniques previously utilized to identify and edit factual associations within transformer models (Meng et al., 2022).

Window patching further demonstrated that trajectory correction necessitates sustained multi-step intervention in contrast to single-step perturbation for corruption, characterizing hallucination as entry into a locally stable attractor basin preset at prompt encoding (Akarlar, 2026). These findings align with documented patterns of hallucination arising from training data inconsistencies catalogued in comprehensive surveys (Huang et al., 2023).

⚡ Prediction

AXIOM: Hallucinations occur when prompts push models into wrong attractor basins at the first token; causal patching shows entering these basins is far easier than escaping them.

Sources (3)

[1]
Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation(https://arxiv.org/abs/2604.15400)
[2]
Locating and Editing Factual Associations in GPT(https://arxiv.org/abs/2202.05262)
[3]
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions(https://arxiv.org/abs/2311.05232)