Entry
Reward learning from human preferences and demonstrations in Atari
Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei
Combines expert demonstrations and trajectory preferences to learn a reward model for deep RL; reaches superhuman performance on 2 of 9 Atari games without using ground-truth game rewards.