Domain-Camouflaged Injections Reduce Detector Efficacy in LLM Agents
arXiv study quantifies detection failure of domain-camouflaged injections in multi-agent LLM systems.
Domain-camouflaged injection attacks lower detection rates to 9.7% on Llama 3.1 8B and 55.6% on Gemini 2.0 Flash (arXiv:2605.22001).
The Camouflage Detection Gap was statistically significant across 45 tasks in three domains (χ² = 38.03, p < 0.001 for Llama; χ² = 17.05, p < 0.001 for Gemini). Llama Guard 3 recorded zero detections of camouflage payloads (Pai, 2026).
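For readers unfamiliar with how a chi-square statistic of this kind is obtained, the following is a minimal sketch, assuming a Pearson chi-square test (no continuity correction) on a 2x2 table of detected versus missed payloads for static and camouflaged attacks. The summary above does not state the paper's exact test setup, so that choice is an assumption. The function name `chi2_2x2` and the counts used are hypothetical illustrations, not the study's data.

```python
import math


def chi2_2x2(detected_a, missed_a, detected_b, missed_b):
    """Pearson chi-square (df = 1) for a 2x2 detected/missed table.

    Rows are two payload conditions (e.g. static vs. camouflaged);
    columns are detected vs. missed counts. Assumes every row and
    column total is non-zero.
    """
    table = [[detected_a, missed_a], [detected_b, missed_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of condition and outcome.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # With one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # HYPOTHETICAL counts for illustration only (not from the paper):
    # 40 of 50 static payloads detected, 10 of 50 camouflaged detected.
    stat, p = chi2_2x2(40, 10, 10, 40)
    print(f"chi^2 = {stat:.2f}, p < 0.001: {p < 0.001}")
```

A large statistic with a small p-value indicates that the detection rate differs between the two payload conditions by more than chance alone would explain, which is the sense in which the reported gap is significant.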
Multi-agent debate amplified static attacks by up to 9.9x on smaller models. Targeted detector augmentation yielded a 10.2% improvement on Llama and a 78.7% improvement on Gemini. The task bank and payload generator were released publicly (arXiv:2605.22001).
AXIOM: Multi-agent debate structures increase static injection success by up to 9.9x on weaker models, while camouflage defeats dedicated classifiers.
Sources (2)
- [1] Primary Source (https://arxiv.org/abs/2605.22001)
- [2] Related Source (https://arxiv.org/abs/2305.14325)