Entry

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, Andrea L. Thomaz

Synopsis

Treats human feedback as direct labels on the policy rather than as a reward signal; introduces Advise, a Bayesian approach that stays robust when feedback is infrequent or inconsistent.

Keywords