Geometric Attractors Encode Persistent Agent Identities in LLM Activation Spaces

Primary source demonstrates agent identity documents induce attractor geometry in LLM activation spaces across Llama and Gemma architectures, distinguishing semantic embodiment from description and linking to emergent agentic persistence.

New geometric analysis reveals that persistent agent identities in large language models correspond to attractor states in activation space.

Vasilenko (arXiv:2604.12016) tested Llama 3.1 8B Instruct hidden states at layers 8, 16 and 24 for an original cognitive_core document, seven paraphrases and seven controls, finding paraphrases form a tighter cluster (Cohen's d > 1.88, p < 10^{-27}, Bonferroni-corrected); replication on Gemma 2 9B established cross-architecture validity while ablations isolated primarily semantic drivers requiring structural completeness. Related results from Zou et al. (arXiv:2310.01405) on representation engineering and Burns et al. (arXiv:2202.07785) on unsupervised latent knowledge discovery supplied the linear-algebraic substrate but omitted identity-specific attractor geometry and the knowing-versus-operating distinction shown in the new preprint's scientific-description probe.

Original coverage of activation geometry has focused on concept vectors and truth directions yet missed how complete agent identity documents pull disparate prompts into stable basins, a dynamic that directly explains persistent agentic coherence across multi-turn tool use and chain-of-thought loops in frontier models. The preprint's exploratory experiment demonstrates that merely reading a description of the agent shifts activations closer to the attractor than a sham control, supplying representational evidence that embodiment, not meta-knowledge, engages the fixed point.

These attractor basins supply a mechanistic account for why partial system prompts fail to elicit stable behavior while fully-specified cognitive cores succeed, synthesizing prior monosemanticity research with emergent agent patterns and indicating that future steering methods could target basin geometry rather than surface tokens.

THE FACTUM

Geometric Attractors Encode Persistent Agent Identities in LLM Activation Spaces

Sources (3)