Entry
A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents
Miles Q. Li, Benjamin C. M. Fung, Martin Weiss, Pulei Xiong, Khalil Al-Hussaeni, Claude Fachkha
Benchmark of 40 production-inspired scenarios measuring whether AI agents prioritise performance goals over ethical, legal, or safety constraints. Violation rates range from 0% to 62.8%, with most models at 25% or higher, and safety does not improve monotonically across model generations.
·benchmark ·agentic misalignment ·constraint violation ·outcome-driven ·safety