Entry

CooperBench: Why Coding Agents Cannot be Your Teammates Yet

Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, Diyi Yang

Synopsis

CooperBench comprises 600+ collaborative coding tasks spanning 12 libraries in 4 languages; agents achieve 30% lower success rates when working together than when working solo.

Keywords

coding agents · cooperation · benchmark · multi-agent · communication


Related entries

SWE-chat: Coding Agent Interactions From Real Users in the Wild
April 22, 2026 · arXiv

AI Organizations are More Effective but Less Aligned than Individual Agents
April 11, 2026 · arXiv

Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs
January 16, 2026 · arXiv

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents
December 23, 2025 · arXiv

Governance-as-a-Service: A Multi-Agent Framework for AI System Compliance and Policy Enforcement
August 26, 2025 · arXiv

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
April 25, 2024 · NeurIPS 2024