Policy Shaping: Integrating Human Feedback with Reinforcement Learning
Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, Andrea L. Thomaz
Treats human feedback as direct labels on the policy rather than as a reward signal, and introduces Advise, a Bayesian approach that remains robust when feedback is infrequent or inconsistent.
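The Advise feedback model can be sketched as follows. In the paper, Δ is the number of positive labels minus negative labels a human has given for action a in state s, and C is the assumed probability that any single label is correct. The probability that a is optimal is then C^Δ / (C^Δ + (1 − C)^Δ). Policy shaping multiplies this by the RL agent's own action distribution and renormalizes. The function names `advise_prob_optimal` and `shape_policy`, and the logistic rewrite used for numerical stability, are illustrative choices for this sketch and are not the authors' code.

```python
import math


def advise_prob_optimal(delta: int, consistency: float) -> float:
    """Advise estimate that an action is optimal, given its feedback count.

    delta: (# positive labels) - (# negative labels) for this state-action pair.
    consistency: assumed probability C that a single human label is correct.
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must lie strictly between 0 and 1")
    # C^D / (C^D + (1-C)^D) equals the logistic function of D * log(C / (1-C)).
    logit = delta * math.log(consistency / (1.0 - consistency))
    # Branch on the sign so math.exp never overflows for large |delta|.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def shape_policy(rl_policy, deltas, consistency):
    """Combine an RL action distribution with Advise feedback probabilities.

    Multiplies the two per-action values and renormalizes to sum to 1.
    """
    if len(rl_policy) != len(deltas):
        raise ValueError("rl_policy and deltas must have the same length")
    scores = [p * advise_prob_optimal(d, consistency)
              for p, d in zip(rl_policy, deltas)]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("combined scores sum to zero; cannot normalize")
    return [s / total for s in scores]
```

For example, with a uniform RL policy over two actions, C = 0.8, and two net positive labels on the first action, `shape_policy([0.5, 0.5], [2, 0], 0.8)` shifts the probability of the first action from 0.5 to 32/49 (about 0.65). With no feedback (Δ = 0), the Advise term is 0.5 for every action, so the RL policy passes through unchanged. This is why sparse feedback leaves the learner's own estimates in charge.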