H Human–AI Coevolution

Entry

Deep Reinforcement Learning from Human Preferences

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Synopsis

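Seminal RLHF paper introducing reward learning from human preferences between pairs of short agent trajectory segments. The agent learns Atari games and simulated robot locomotion tasks without access to the true reward function. It needs human feedback on less than 1% of its interactions with the environment. The paper establishes the canonical training-from-use loop.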
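The synopsis names the mechanism but not its form. As we understand the paper, a reward estimate is fitted so that a Bradley-Terry model over summed per-step rewards matches which segment the human preferred, and it is trained with a cross-entropy loss. Equal preferences are scored as a 0.5/0.5 label. The sketch below is minimal and rests on several assumptions. It uses a linear reward over made-up feature vectors, whereas the paper uses neural networks and further details that are omitted here. It also uses a synthetic rater in place of human judgments. All function names are hypothetical.

```python
import math
import random

def segment_return(w, segment):
    """Predicted return of a segment: sum over steps of a linear reward w . x (assumed model)."""
    return sum(sum(wi * xi for wi, xi in zip(w, x)) for x in segment)

def pref_prob(w, seg1, seg2):
    """Bradley-Terry probability that seg1 is preferred: sigmoid of the return difference."""
    diff = segment_return(w, seg1) - segment_return(w, seg2)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)  # numerically stable branch for negative differences
    return e / (1.0 + e)

def feature_sum(segment, dim):
    """Per-dimension sum of features over the segment's steps."""
    return [sum(x[i] for x in segment) for i in range(dim)]

def train_reward_model(comparisons, dim, lr=0.05, epochs=200):
    """Fit w by per-comparison gradient descent on the cross-entropy preference loss.

    comparisons: list of (seg1, seg2, mu), where mu is 1.0 if seg1 was preferred,
    0.0 if seg2 was preferred, and 0.5 for a tie.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for seg1, seg2, mu in comparisons:
            p = pref_prob(w, seg1, seg2)
            f1, f2 = feature_sum(seg1, dim), feature_sum(seg2, dim)
            # Gradient of -[mu*log(p) + (1-mu)*log(1-p)] w.r.t. w is (p - mu) * (f1 - f2).
            for i in range(dim):
                w[i] -= lr * (p - mu) * (f1[i] - f2[i])
    return w

def synthetic_comparisons(w_true, n, seg_len, rng):
    """Hypothetical stand-in for a human rater: prefers the segment with higher true return."""
    dim = len(w_true)
    data = []
    for _ in range(n):
        seg1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seg_len)]
        seg2 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seg_len)]
        r1, r2 = segment_return(w_true, seg1), segment_return(w_true, seg2)
        mu = 1.0 if r1 > r2 else (0.0 if r1 < r2 else 0.5)
        data.append((seg1, seg2, mu))
    return data

def accuracy(w, comparisons):
    """Fraction of non-tied comparisons where the learned model picks the preferred segment."""
    decided = [(s1, s2, mu) for s1, s2, mu in comparisons if mu != 0.5]
    hits = sum((pref_prob(w, s1, s2) > 0.5) == (mu == 1.0) for s1, s2, mu in decided)
    return hits / len(decided)

rng = random.Random(0)
w_true = [1.0, -0.5]  # hidden "true" reward, used only to generate labels
train = synthetic_comparisons(w_true, 100, 3, rng)
held_out = synthetic_comparisons(w_true, 200, 3, rng)
w_learned = train_reward_model(train, dim=2)
print(w_learned, accuracy(w_learned, held_out))
```

The reward is learned only from comparisons, never from the true reward values. Under these assumptions, a general reward-modeling step would swap the linear score for a network and the synthetic rater for human judgments. The RL agent would then be trained against the learned reward.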

Keywords

RLHF · preference learning · reward modeling · deep RL


Related entries