AgentWall: Runtime Safety Layer for Local AI Agents
AgentWall adds runtime interception, policy checks, and audit logging for local AI agents, with a reported 92.9% policy enforcement accuracy.
AgentWall intercepts every action an agent proposes before it executes on the host. The paper (arXiv 2605.16265) describes a policy-enforcing MCP (Model Context Protocol) proxy and an OpenClaw plugin that evaluate each action against declarative policies, require human approval for sensitive operations, and log full execution trails. It integrates with Claude Desktop, Cursor, Windsurf, Claude Code, and OpenClaw via a single install.
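The intercept, evaluate, approve, and log flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not AgentWall's code or policy format: the rule schema, the `Action` type, the first-match-wins and default-deny behavior, and the `evaluate` and `intercept` function names are all assumptions made for the example.

```python
import fnmatch
import json
import time
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical declarative policy: the first matching rule wins.
# The schema and the rules themselves are assumptions for illustration.
POLICY = [
    {"tool": "shell", "pattern": "rm -rf *", "decision": "deny"},
    {"tool": "shell", "pattern": "git push*", "decision": "ask"},
    {"tool": "shell", "pattern": "ls*", "decision": "allow"},
    {"tool": "file_write", "pattern": "/etc/*", "decision": "deny"},
    {"tool": "file_write", "pattern": "*", "decision": "ask"},
]


@dataclass
class Action:
    tool: str      # e.g. "shell", "file_write"
    argument: str  # e.g. the command line or the target path


def evaluate(action: Action, policy: List[dict] = POLICY) -> str:
    """Return "allow", "deny", or "ask" for a proposed action."""
    for rule in policy:
        if rule["tool"] == action.tool and fnmatch.fnmatchcase(
            action.argument, rule["pattern"]
        ):
            return rule["decision"]
    return "deny"  # assumed default: anything unmatched is blocked


def intercept(
    action: Action,
    approve: Callable[[Action], bool],
    audit_log: List[str],
) -> bool:
    """Decide whether the action may run, and record the decision."""
    policy_decision = evaluate(action)
    if policy_decision == "ask":
        # Sensitive operation: defer to a human approver.
        final = "allow" if approve(action) else "deny"
    else:
        final = policy_decision
    audit_log.append(json.dumps({
        "ts": time.time(),
        "tool": action.tool,
        "argument": action.argument,
        "policy_decision": policy_decision,
        "final": final,
    }))
    return final == "allow"
```

In this sketch a caller would run the underlying tool only when `intercept` returns `True`, for example `intercept(Action("shell", "git push origin main"), approve=ask_user, audit_log=log)`, where `ask_user` is a hypothetical callback that prompts the operator. Recording both the policy decision and the final outcome keeps the log able to distinguish an automatic allow from a human-approved one.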
Evaluations across 14 benchmark tests report 92.9% policy enforcement accuracy and sub-millisecond overhead. If each test is scored pass/fail, that figure corresponds to 13 of 14 tests. The design targets local deployments where agents execute shell commands, modify files, call APIs, and browse the web.
The threat model focuses on unsafe or adversarially manipulated behavior at action time. That sets it apart from the alignment and input-filtering methods cited in related agent safety literature, such as arXiv 2309.07870 on ReAct patterns and arXiv 2402.01817 on tool-use risks.
AgentWall: Intercepts actions to enforce explicit policies before local execution occurs.
Sources (3)
- [1] Primary Source (https://arxiv.org/abs/2605.16265)
- [2] Related Source (https://arxiv.org/abs/2309.07870)
- [3] Related Source (https://arxiv.org/abs/2402.01817)