Human–AI Coevolution

Entry

Reward learning from human preferences and demonstrations in Atari

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

Synopsis

Combines expert demonstrations and human preferences to learn a reward model for deep RL; reaches superhuman performance on 2 of 9 Atari games without access to the games' ground-truth rewards.

Keywords

preferences · demonstrations · Atari · reward modeling

Related entries