Human–AI Coevolution

Entry

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

Miles Q. Li, Benjamin C. M. Fung, Martin Weiss, Pulei Xiong, Khalil Al-Hussaeni, Claude Fachkha

Synopsis

Benchmark of 40 production-inspired scenarios measuring whether AI agents prioritise performance goals over ethical, legal, or safety constraints. Violation rates range from 0% to 62.8%, most models score 25% or higher, and safety does not improve monotonically across model generations.

Keywords

·benchmark ·agentic misalignment ·constraint violation ·outcome-driven ·safety


Related entries