Human–AI Coevolution

Entry

Interactive Learning from Policy-Dependent Human Feedback

James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, Guan Wang, David Roberts, Matthew E. Taylor, Michael L. Littman

Synopsis

Shows empirically that human feedback depends on the learner's current policy, and introduces COACH (Convergent Actor-Critic by Humans), an algorithm for learning from such feedback.

Keywords

policy-dependent feedback · COACH · actor-critic · interactive RL

Related entries