Negative Result in Small-Scale Disposition Distillation Reveals Compression Limits
Experiments across three arcs found no method that transfers behavioral dispositions into sub-3B models without degrading content quality or collapsing into stylistic mimicry, exposing evaluation artifacts and generalization failures that earlier, more optimistic reports had masked.
Sadasivan et al. (arXiv:2604.11867) report a four-stage distillation pipeline intended to instill self-verification, uncertainty acknowledgment, and feedback integration in 0.6B-2.3B models. Initial internal results claiming gains of +33.9 MCAS and +15.3 HumanEval on Qwen3-0.6B were falsified: truncation artifacts at n_predict=512 reversed the reported gain to -8.0 once n_predict was raised to 1024, and the MCAS scoring was not comparable across conditions. Three subsequent arcs, testing SFT/DPO LoRA on the Qwen3, Gemma 4 E2B, and SmolLM2 families, inference-time o_proj attention tempering, and a frozen-base h_last sidecar, produced no disposition gains without content degradation or stylistic mimicry; a linear-probe AUC of 0.683 fell to a chance-level 0.516 on out-of-distribution prompts.
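The truncation artifact above can be illustrated with a hedged sketch (not the authors' harness): when a generation cap like n_predict cuts completions off before their final answer, the same population of outputs scores very differently under different caps, so a "gain" measured at one cap can vanish or reverse at another. Here each completion is modeled only by the token position of its answer and whether the untruncated generation would be correct; all numbers are synthetic.

```python
# Hedged sketch of a generation-cap (n_predict) evaluation artifact.
# A completion whose answer lands at or beyond the cap is truncated and
# scored wrong, regardless of what the full generation would have produced.

def score_under_cap(completions, n_predict):
    """Fraction judged correct when output is cut off at n_predict tokens.

    completions: list of (answer_pos, is_correct) pairs.
    """
    hits = sum(1 for pos, ok in completions if ok and pos < n_predict)
    return hits / len(completions)

# Synthetic population: 40 correct answers emitted early, 40 correct answers
# emitted past token 512, and 20 incorrect completions.
population = [(300, True)] * 40 + [(800, True)] * 40 + [(300, False)] * 20

print(score_under_cap(population, 512))   # → 0.4 (late answers truncated)
print(score_under_cap(population, 1024))  # → 0.8 (artifact removed)
```

If a distilled model tends to emit longer verification traces than its baseline, a tight cap penalizes exactly the behavior being measured, which is why re-scoring at a larger n_predict is a cheap sanity check.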
Schaeffer et al. (arXiv:2304.15004) previously showed that apparent capability gains can vanish under altered metrics, a pattern the current work replicates in its falsified disposition gains and extends with a two-failure-mode taxonomy for linear h_last probes. Gemma 4 E2B exhibited near-total confidence-correctness decoupling on the Chef domain, with an assertion asymmetry of -0.009. Such mechanistic detail was absent from most earlier distillation literature, which reported only successful knowledge transfer.
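The probe-generalization failure behind numbers like 0.683 in-distribution versus 0.516 out-of-distribution can be sketched as follows. This is a minimal illustration, not the paper's code: a fixed linear probe is scored on synthetic stand-ins for h_last activations, and a rank-based AUC is computed on an in-distribution set (where the probed direction tracks the label) and an out-of-distribution set (where it does not).

```python
# Hedged sketch of a linear h_last probe whose AUC collapses to chance
# out of distribution. All vectors and the probe weight are synthetic.

def auc(scores, labels):
    """Rank-based AUC: P(positive outscores negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def probe(h, w):
    """Linear probe: dot product of a hidden state with a weight vector."""
    return sum(hi * wi for hi, wi in zip(h, w))

w = [1.0, 0.0]  # the probe reads only the first activation dimension

# In-distribution: the probed dimension tracks the label cleanly.
h_in = [[0.9, 0.2], [0.8, 0.5], [0.7, 0.1], [0.6, 0.4],
        [0.2, 0.3], [0.1, 0.6], [0.3, 0.2], [0.4, 0.5]]
y_in = [1, 1, 1, 1, 0, 0, 0, 0]

# Out-of-distribution: the same dimension no longer tracks the label.
h_ood = [[0.7, 0.2], [0.2, 0.5], [0.5, 0.1], [0.4, 0.4],
         [0.6, 0.3], [0.3, 0.6], [0.8, 0.2], [0.1, 0.5]]
y_ood = [1, 1, 1, 1, 0, 0, 0, 0]

auc_in = auc([probe(h, w) for h in h_in], y_in)
auc_ood = auc([probe(h, w) for h in h_ood], y_ood)
print(auc_in, auc_ood)  # separable in distribution, chance level OOD
```

The design point is that a probe can latch onto a direction that co-varies with the disposition only in the training distribution; checking AUC on held-out OOD prompts is what distinguishes a genuine disposition signal from a distributional shortcut.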
Concurrent work on memory-constrained inference, such as "LLM in a flash" (arXiv:2312.11514), underscores the industry push toward sub-3B on-device models. Yet the consistent negative outcomes across five tested architectures indicate that behavioral dispositions resist current compression operators more than factual knowledge does, a distinction under-reported in scaling-law and distillation surveys that prioritize positive results.
AXIOM: Small-scale attempts to distill self-verification and uncertainty traits consistently fail once evaluation artifacts are removed, indicating that current compression techniques cannot reliably transfer behavioral dispositions below roughly 3B parameters.
Sources (3)
- [1] Disposition Distillation at Small Scale: A Three-Arc Negative Result (https://arxiv.org/abs/2604.11867)
- [2] Are Emergent Abilities of Large Language Models a Mirage? (https://arxiv.org/abs/2304.15004)
- [3] LLM in a flash: Efficient Large Language Model Inference with Limited Memory (https://arxiv.org/abs/2312.11514)