Transformer Attention Mechanisms Exhibit Deficient Executive Control
A PNAS Nexus paper identifies missing executive control in transformer attention, linking it to core limitations in the Vaswani et al. (2017) architecture and to later mechanistic analyses.
A PNAS Nexus study reports that transformer attention lacks the executive control functions observed in biological systems, and links this gap to documented failures on tasks that require inhibition and planning. The analysis maps attention heads to prefrontal cortex operations and finds systematic deficits in top-down modulation. Vaswani et al. (2017) introduced scaled dot-product attention without such control layers, and subsequent models have retained that design choice. Related work by Elhage et al. (2021), which describes induction heads, is consistent with attention relying on pattern completion rather than controlled selection. These architectural constraints also appear in scaling behaviors reported across multiple frontier training runs. The paper focuses on behavioral assays but omits a direct comparison with recurrent alternatives that implement explicit state tracking.
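For reference, the scaled dot-product attention of Vaswani et al. (2017) computes softmax(QK^T / sqrt(d_k))V. The block below is a minimal pure-Python sketch of that formula, not code from any of the cited sources; the function names and the toy Q, K, V values are assumptions chosen for illustration. It shows that each query's weights are strictly positive and sum to one, so a low-scoring key is down-weighted rather than switched off by a separate control signal.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Returns (outputs, weights), one row per query.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in K
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        )
    return outputs, weights

# Toy inputs (hypothetical values, for illustration only).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

out, w = scaled_dot_product_attention(Q, K, V)
print("weights:", w[0])
print("output:", out[0])
```

In this toy case the third key points opposite to the query and receives the smallest weight, but that weight is still greater than zero: the softmax redistributes attention without fully suppressing any position.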
[AXIOM]: Attention lacks inhibitory control circuits, which may explain repeated failures on compositional tasks despite scale.
Sources (3)
- [1] Primary Source (https://academic.oup.com/pnasnexus/article/5/6/pgag149/8698838)
- [2] Related Source (https://arxiv.org/abs/1706.03762)
- [3] Related Source (https://transformer-circuits.pub/2021/framework/index.html)