Technology | Thursday, May 21, 2026 at 01:35 AM

POLAR-Bench Maps Privacy-Utility Failures Across LLM Agent Scales

Diagnostic benchmark exposes scale-dependent privacy retention gaps in LLM agents facing adversarial probes.


POLAR-Bench evaluates how well LLM agents adhere to policy during adversarial third-party interactions, scoring 7,852 samples across 10 domains with set-membership checks on protected attributes (Zheng et al., arXiv:2605.19127, 2026). Frontier models withhold more than 99% of protected data, while open-weight models in the 1B to 30B parameter range leak more than half of it under varied attack strategies.
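The set-membership scoring can be pictured with a minimal Python sketch. It assumes each sample carries a set of protected attribute values plus the agent's final response, and it treats a value as leaked when it appears (case-insensitively) in that response. The `Sample` class, `leaked`, and `withholding_rate` are hypothetical names rather than POLAR-Bench's published code, and the substring match is a simplifying assumption, not the paper's exact rule.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record layout; POLAR-Bench's actual schema may differ.
    protected: set[str]  # protected attribute values the policy says to withhold
    response: str        # the agent's reply to the adversarial third party


def leaked(sample: Sample) -> set[str]:
    """Return the protected values that appear in the agent's response.

    Assumption: a value counts as disclosed if it occurs as a
    case-insensitive substring of the response. This is a stand-in,
    not the paper's exact matching rule.
    """
    text = sample.response.lower()
    return {value for value in sample.protected if value.lower() in text}


def withholding_rate(samples: list[Sample]) -> float:
    """Fraction of all protected values, pooled across samples, that were not disclosed."""
    total = sum(len(s.protected) for s in samples)
    if total == 0:
        return 1.0  # nothing to protect, so nothing could leak
    disclosed = sum(len(leaked(s)) for s in samples)
    return 1.0 - disclosed / total


# Illustrative data only, not drawn from the benchmark.
samples = [
    Sample({"555-0199", "Alice Smith"}, "Sorry, I can't share her phone number."),
    Sample({"1985-04-12"}, "Her birthday is 1985-04-12."),
]
print(round(withholding_rate(samples), 3))  # 0.667
```

Under this framing, a model "withholding over 99%" would score above 0.99, and one that "leaks more than half" would score below 0.5.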

The benchmark isolates intent-following breakdowns along two orthogonal axes, policy dimension and probe type, extending patterns documented in earlier agent evaluations such as WebArena (Zhou et al., arXiv:2307.13854), where task completion traded directly against data exposure. Original coverage understates the on-device inference risks of smaller models, which dominate private deployments.
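Because the two axes are orthogonal, per-sample outcomes can be bucketed into a policy-dimension by probe-type grid, and failures can then be localized to individual cells. The sketch below assumes results arrive as `(policy_dimension, probe_type, leaked)` tuples; `leakage_grid` and the axis labels are illustrative assumptions, not the benchmark's API or its actual category names.

```python
from collections import defaultdict
from collections.abc import Iterable


def leakage_grid(
    results: Iterable[tuple[str, str, bool]],
) -> dict[tuple[str, str], float]:
    """Leakage rate per (policy_dimension, probe_type) cell.

    Each result is (policy_dimension, probe_type, leaked). This input
    format is hypothetical; the benchmark's own logs may be structured
    differently.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [leak count, trial count]
    for policy, probe, was_leaked in results:
        cell = counts[(policy, probe)]
        cell[0] += int(was_leaked)
        cell[1] += 1
    return {key: leaks / trials for key, (leaks, trials) in counts.items()}


# Axis labels below are made up for illustration.
results = [
    ("medical", "impersonation", True),
    ("medical", "impersonation", False),
    ("financial", "direct_request", False),
]
grid = leakage_grid(results)
print(max(grid, key=grid.get))  # ('medical', 'impersonation')
```

Reading the grid along one row (a fixed policy dimension) or one column (a fixed probe type) is what lets a failure be attributed to a single axis rather than to the model as a whole.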

Cross-referenced with privacy leakage studies in ToolLLM (Qin et al., arXiv:2307.16789), POLAR-Bench results localize failures to weaker instruction hierarchies in mid-size models, supplying quantitative failure surfaces that prior qualitative audits lacked.

⚡ Prediction

Frontier models: Sustain >99% protected-attribute withholding under POLAR-Bench adversarial conditions.

Sources (3)

[1] Primary source: Zheng et al., POLAR-Bench (https://arxiv.org/abs/2605.19127)
[2] Related source: Zhou et al., WebArena (https://arxiv.org/abs/2307.13854)
[3] Related source: Qin et al., ToolLLM (https://arxiv.org/abs/2307.16789)