Encoder Learns Collatz Structure Early but Decoder Bottleneck Delays Generalization
A study of encoder-decoder models for Collatz prediction shows that representations form early while generalization lags behind a decoder bottleneck; the numeral base controls learnability, explaining the grokking delay beyond prior accounts.
Transformers acquire correct parity and residue representations for one-step Collatz prediction within the first few thousand training steps while output accuracy remains near chance for tens of thousands of steps thereafter (arXiv:2604.13082). This delay stems from limited decoder access to already-formed encoder structure rather than failure to learn the structure.
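For context, one-step Collatz prediction asks the model to output the result of a single application of the Collatz map. A minimal sketch of the target function (the paper's exact input/output formatting may differ):

```python
def collatz_step(n: int) -> int:
    """One application of the Collatz map: n/2 if n is even, 3n+1 if odd."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

# The network's supervised target: predict collatz_step(n) from the digits of n.
print(collatz_step(6))   # even branch -> 3
print(collatz_step(7))   # odd branch  -> 22
```

Note that computing the correct branch requires only the parity of `n`, which is exactly the kind of residue information the encoder is reported to learn early.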
Causal interventions confirm the decoder bottleneck: transplanting a trained encoder into a fresh model accelerates grokking by a factor of 2.75, while transplanting a trained decoder harms performance; freezing a converged encoder and retraining only the decoder eliminates the plateau entirely, reaching 97.6% accuracy versus 86.1% for joint training (arXiv:2604.13082). The work synthesizes the original grokking observations (Power et al., arXiv:2201.02177) with mechanistic progress measures (Nanda et al., arXiv:2301.05217), revealing that prior accounts attributed the delay solely to late rule acquisition and missed the representation-behavior gap.
Numeral base acts as a strong inductive bias: bases whose factorization aligns with the Collatz map (e.g., base 24) enable 99.8% accuracy while binary causes representational collapse with zero recovery, a factor overlooked in earlier grokking studies. The results illuminate core gaps in how deep learning systems transition from early pattern-matching representations to robust arithmetic reasoning across algorithmic tasks.
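Why the base matters can be made concrete: since 24 = 2³·3 is divisible by both 2 and 3, the least-significant base-24 digit of a number alone determines its parity and its residue mod 3, so both branches of the map are locally readable from a single input token. A hedged sketch of this arithmetic fact (the digit convention here is illustrative, not the paper's exact encoding):

```python
def to_digits(n: int, base: int) -> list[int]:
    """Little-endian digit expansion of n in the given base."""
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits or [0]

n = 1234567
# Because 24 is even, parity is visible in the last base-24 digit alone.
assert (to_digits(n, 24)[0] % 2 == 0) == (n % 2 == 0)
# Because 3 divides 24, so is the residue mod 3 -- aligned with the 3n+1 branch.
assert (to_digits(n, 24)[0] % 3 == 0) == (n % 3 == 0)
print("base-24 digits expose parity and mod-3 residue locally")
```

In binary, by contrast, the residue mod 3 depends on every bit of the input, which is consistent with the reported representational collapse in base 2.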
AXIOM: Neural networks build correct internal representations of algorithmic structure early in training but cannot translate them into generalized behavior until decoder layers align, explaining the sudden phase shift in grokking as an access problem rather than a learning problem.
Sources (3)
- [1] The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior (https://arxiv.org/abs/2604.13082)
- [2] Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets (https://arxiv.org/abs/2201.02177)
- [3] Progress Measures for Grokking via Mechanistic Interpretability (https://arxiv.org/abs/2301.05217)