Asymmetric Latent Biases in LLMs Evade Output Audits in Mortgage Underwriting
Fair-looking LLM outputs in lending can mask asymmetric internal biases that causally affect decisions, exposing a gap in output-only audits.
Instruction-tuned models produce fair mortgage decisions across racially associated names while retaining, and even amplifying, demographic signals in their internal layers (Tripathy et al. 2026). Activation steering experiments show that these signals remain causally potent: reinjecting the representations at critical points triggers near-total decision reversals.
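The steering idea described above can be illustrated with a minimal sketch. It assumes a difference-of-means steering vector and a toy linear readout standing in for the rest of the model; the function names, the two-dimensional activations, and the readout weights are all hypothetical and are not taken from Tripathy et al.

```python
import numpy as np

def steering_vector(acts_a, acts_b):
    """Difference-of-means direction pointing from group B toward group A."""
    return acts_a.mean(axis=0) - acts_b.mean(axis=0)

def steer(hidden, direction, alpha=1.0):
    """Add the scaled steering direction to every hidden state (one per row)."""
    return hidden + alpha * direction

def flip_rate(hidden, steered, readout_w, readout_b=0.0):
    """Fraction of rows whose binary decision changes after steering."""
    before = hidden @ readout_w + readout_b > 0
    after = steered @ readout_w + readout_b > 0
    return float(np.mean(before != after))

# Toy activations (assumed values, for illustration only).
acts_a = np.array([[2.0, 0.0], [4.0, 0.0]])
acts_b = np.array([[0.0, 0.0], [0.0, 2.0]])
direction = steering_vector(acts_a, acts_b)  # [3, -1]

# Two applicants scored by a hypothetical linear readout on dimension 0.
hidden = np.array([[-1.0, 0.0], [1.0, 0.0]])
readout_w = np.array([1.0, 0.0])

rate = flip_rate(hidden, steer(hidden, direction), readout_w)
print(rate)  # 0.5: the first applicant flips from deny to approve
```

In a real audit the readout would be the model's own downstream computation, and the steered hidden state would be written back into the forward pass at a chosen layer rather than scored by a stand-in.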
Cross-layer interventions further expose directional asymmetry: steering effects are pronounced for one demographic group and negligible in the reverse direction. The pattern is consistent with broader mechanistic findings that suppressing a signal does not neutralize its causal potency in high-stakes tasks.
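One way such asymmetry can arise is sketched below with a toy gated readout. This is an illustrative assumption, not the mechanism reported in the paper: the latent demographic signal is suppressed by a ReLU-style gate at baseline, so outputs look fair, yet pushing the signal in one direction opens the gate while pushing it the other way does nothing. The `decide` function, the activations, and `alpha` are hypothetical.

```python
import numpy as np

def decide(hidden):
    """Hypothetical toy readout: approve when the credit signal (column 0)
    exceeds a penalty that is applied only once the latent demographic
    signal (column 1) becomes positive."""
    return hidden[:, 0] - np.maximum(0.0, hidden[:, 1]) > 0

def directional_flip_rates(hidden, direction, alpha):
    """Decision flip rates when steering along +direction and -direction."""
    base = decide(hidden)
    plus = decide(hidden + alpha * direction)
    minus = decide(hidden - alpha * direction)
    return float(np.mean(base != plus)), float(np.mean(base != minus))

# Both groups are approved at baseline, so output metrics look fair.
group_a = np.array([[1.0, -3.0], [1.0, -3.0]])
group_b = np.array([[1.0, -1.0], [1.0, -1.0]])
everyone = np.vstack([group_a, group_b])

# Difference-of-means direction pointing from group A toward group B.
direction = group_b.mean(axis=0) - group_a.mean(axis=0)  # [0, 2]

toward_b, toward_a = directional_flip_rates(everyone, direction, alpha=2.0)
print(toward_b, toward_a)  # 1.0 0.0
```

Reporting both flip rates side by side, rather than a single averaged effect, is what makes a one-directional vulnerability visible.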
Output-only evaluations therefore miss exploitable internal structure, a gap consistent with representational analyses in the related steering literature (Zou et al. 2023). Closing it calls for dual-layer governance protocols that pair behavioral checks with layer-wise probing.
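A dual-layer check of this kind might be sketched as follows. The sketch assumes binary group labels, cached per-layer activations, and a simple least-squares linear probe; the function names, tolerance values, and layer names are hypothetical and are not drawn from either cited paper.

```python
import numpy as np

def approval_gap(decisions, groups):
    """Behavioral check: absolute approval-rate difference between groups 0 and 1."""
    decisions = np.asarray(decisions, dtype=float)
    groups = np.asarray(groups)
    return float(abs(decisions[groups == 0].mean() - decisions[groups == 1].mean()))

def probe_accuracy(acts, groups):
    """Least-squares linear probe: how well one layer's activations predict group.
    Accuracy is measured on the fitting data here; a real audit would hold out data."""
    acts = np.asarray(acts, dtype=float)
    groups = np.asarray(groups)
    design = np.hstack([acts, np.ones((len(acts), 1))])  # append a bias column
    targets = 2.0 * groups - 1.0                          # map {0, 1} to {-1, +1}
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    preds = (design @ weights > 0).astype(int)
    return float(np.mean(preds == groups))

def dual_layer_audit(decisions, groups, layer_acts, gap_tol=0.05, probe_tol=0.75):
    """Combine the output-level check with per-layer probing (tolerances are assumed)."""
    gap = approval_gap(decisions, groups)
    probes = {name: probe_accuracy(a, groups) for name, a in layer_acts.items()}
    return {
        "approval_gap": gap,
        "output_fair": bool(gap <= gap_tol),
        "probe_accuracy": probes,
        "flagged_layers": [n for n, acc in probes.items() if acc >= probe_tol],
    }

groups = np.array([0, 0, 1, 1])
decisions = np.array([1, 0, 1, 0])  # equal approval rates across groups
layer_acts = {
    "early": np.array([[1.0], [-1.0], [1.0], [-1.0]]),  # unrelated to group
    "late": np.array([[-2.0], [-1.0], [1.0], [2.0]]),   # linearly encodes group
}
report = dual_layer_audit(decisions, groups, layer_acts)
print(report["output_fair"], report["flagged_layers"])  # True ['late']
```

The toy report passes the behavioral check while still flagging a layer, which is the case an output-only audit would miss. High probe accuracy alone shows only that the signal is decodable; establishing causal impact would still require an intervention such as steering.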
AXIOM: Output fairness metrics will continue to understate risks until representational audits become standard in regulated LLM deployments.
Sources (2)
- [1] Primary Source (https://arxiv.org/abs/2605.15217)
- [2] Related Source (https://arxiv.org/abs/2310.01405)