Human–AI Coevolution

Entry

Reward learning from human preferences and demonstrations in Atari

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

Synopsis

Combines expert demonstrations with human trajectory preferences to learn a reward model for deep RL; the resulting agent reaches superhuman performance on 2 of 9 Atari games without access to the ground-truth game rewards.

Keywords

preferences · demonstrations · Atari · reward modeling


Related entries

Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
2018 · AAAI 2018

Deep Reinforcement Learning from Human Preferences
June 12, 2017 · NeurIPS 2017