Middle Layers Localized as Key to SFT Instruction-Following Across 1B-32B LLMs
Layer-wise SFT analysis identifies stable middle layers as locus of instruction-following, enabling Mid-Block tuning that exceeds LoRA performance with reduced parameters.
A comprehensive layer-wise study of supervised fine-tuning in LLMs from 1B to 32B parameters reveals that the middle layers (roughly 20-80% of depth) remain stable while the final layers are highly sensitive (Zhao et al., https://arxiv.org/abs/2604.11838). The work goes beyond prior analyses that treated fine-tuning as uniform across depth, employing geometric and optimization metrics to pinpoint where mechanistic changes occur.
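One simple geometric metric of the kind such an analysis relies on is per-layer weight drift between the base and SFT checkpoints. The sketch below (the function name and the exact metric are assumptions, not the paper's definition) computes the relative Frobenius-norm change of each parameter tensor:

```python
import torch

# Hypothetical sketch of a layer-wise "geometric" metric: relative
# Frobenius-norm drift of each parameter between the base and the
# SFT checkpoint. Stable middle layers would show small drift; a
# sensitive final layer would show large drift.

def layer_drift(base_state: dict, sft_state: dict) -> dict:
    """Map each parameter name to ||W_sft - W_base||_F / ||W_base||_F."""
    drift = {}
    for name, w0 in base_state.items():
        w1 = sft_state[name]
        drift[name] = (torch.norm(w1 - w0) / (torch.norm(w0) + 1e-12)).item()
    return drift

# Toy demo with two fabricated "layers": the first barely moves
# (1% perturbation), the second moves a lot (50%).
base = {"layer.0": torch.ones(4, 4), "layer.1": torch.ones(4, 4)}
sft = {"layer.0": torch.ones(4, 4) * 1.01, "layer.1": torch.ones(4, 4) * 1.5}
print(layer_drift(base, sft))  # layer.0 drift ≈ 0.01, layer.1 drift ≈ 0.5
```

Sorting these ratios by layer index is enough to visualize the stable-middle / sensitive-tail pattern on real checkpoints.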
Prior coverage of parameter-efficient methods such as LoRA (Hu et al., https://arxiv.org/abs/2106.09685) overlooked this depth-dependent pattern, often assuming adaptations are distributed uniformly across layers. Nor had it been synthesized with the key-value memory view of transformer feed-forward layers (Geva et al., https://arxiv.org/abs/2012.14913), missing opportunities for targeted optimization.
By introducing Mid-Block Efficient Tuning, the authors demonstrate improved performance on reasoning tasks at lower overhead, showing that effective alignment is architecturally localized. This insight matters for future techniques as model scales grow, correcting the assumption that full-layer updates are necessary for alignment.
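The core mechanic of mid-block tuning can be sketched in a few lines: freeze every parameter, then re-enable gradients only for blocks in the middle depth fraction before running SFT. This is a minimal illustration on a toy model, not the authors' implementation; the depth bounds and helper name are assumptions:

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Tiny stand-in for a decoder LLM: embedding, N blocks, head."""
    def __init__(self, n_layers: int = 10, d_model: int = 16):
        super().__init__()
        self.embed = nn.Embedding(100, d_model)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=2, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d_model, 100)

    def forward(self, ids):
        x = self.embed(ids)
        for block in self.blocks:
            x = block(x)
        return self.head(x)

def freeze_outside_mid_block(model: ToyLM, lo: float = 0.2, hi: float = 0.8):
    """Leave only blocks in the [lo, hi) depth fraction trainable.

    Depth bounds are illustrative; the paper's exact range may differ.
    """
    n = len(model.blocks)
    start, end = int(n * lo), int(n * hi)
    for p in model.parameters():          # freeze everything first
        p.requires_grad = False
    for block in model.blocks[start:end]:  # unfreeze the middle span
        for p in block.parameters():
            p.requires_grad = True
    return start, end

model = ToyLM()
start, end = freeze_outside_mid_block(model)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(start, end, trainable / total)  # blocks 2-7 trainable in a 10-layer model
```

Only the unfrozen middle blocks then receive optimizer updates (e.g. pass `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer), which is where the parameter savings over full fine-tuning come from.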
AXIOM: SFT changes concentrate in intermediate LLM layers rather than spreading uniformly across the model. Targeting mid-blocks delivers better alignment and reasoning results at far lower parameter cost as model scale increases.
Sources (3)
- [1] A Layer-wise Analysis of Supervised Fine-Tuning (https://arxiv.org/abs/2604.11838)
- [2] LoRA: Low-Rank Adaptation of Large Language Models (https://arxiv.org/abs/2106.09685)
- [3] Transformer Feed-Forward Layers Are Key-Value Memories (https://arxiv.org/abs/2012.14913)