Entry
Interactive Learning from Policy-Dependent Human Feedback
James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, Guan Wang, David Roberts, Matthew E. Taylor, Michael L. Littman
Shows empirically that human feedback is policy-dependent, i.e. it depends on the learner's current policy; introduces the COACH (Convergent Actor-Critic by Humans) algorithm.
·policy-dependent feedback ·COACH ·actor-critic ·interactive RL