Human–AI Coevolution

Entry

Learning to summarize from human feedback

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

Synopsis

Trains summarization policies with reinforcement learning against a reward model fit to pairwise human comparisons of summaries. In human evaluations, the resulting summaries are preferred over the reference summaries and over supervised models 10x larger.
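The synopsis compresses two steps: fitting a reward model to human comparisons, then optimizing a policy against it. In the paper, the reward model is trained with the pairwise logistic loss -log σ(r(x, y_preferred) - r(x, y_rejected)), and the RL reward subtracts a KL penalty β·log[π_RL(y|x) / π_SFT(y|x)] that keeps the policy close to the supervised baseline. The sketch below illustrates both formulas under loudly stated assumptions: the reward model is a toy linear function over hand-made feature vectors rather than a transformer, and `fit_linear_reward`, the feature vectors, and the hyperparameters are all hypothetical. The policy optimization step itself (PPO in the paper) is not shown.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    # -log sigmoid(r_w - r_l), written as softplus(-(r_w - r_l)) for stability.
    return softplus(-(r_preferred - r_rejected))


def kl_penalized_reward(rm_score: float, logp_policy: float,
                        logp_sft: float, beta: float) -> float:
    # R(x, y) = r(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x))
    return rm_score - beta * (logp_policy - logp_sft)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_linear_reward(comparisons, dim, lr=0.1, epochs=200):
    """Hypothetical toy reward model r(y) = w . phi(y), fit by SGD.

    Each comparison is (phi_preferred, phi_rejected). The gradient of
    -log sigmoid(m) w.r.t. w is -(1 - sigmoid(m)) * (phi_w - phi_l),
    so a descent step adds lr * (1 - sigmoid(m)) * (phi_w - phi_l).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for phi_w, phi_l in comparisons:
            margin = dot(w, phi_w) - dot(w, phi_l)
            g = 1.0 - sigmoid(margin)
            for k in range(dim):
                w[k] += lr * g * (phi_w[k] - phi_l[k])
    return w


def total_loss(w, comparisons):
    return sum(pairwise_loss(dot(w, a), dot(w, b)) for a, b in comparisons)


# Hypothetical 2-feature summaries: (preferred, rejected) pairs.
toy_comparisons = [
    ([1.0, 0.2], [0.3, 0.2]),
    ([0.8, 0.5], [0.2, 0.6]),
    ([0.9, 0.1], [0.4, 0.0]),
]
w = fit_linear_reward(toy_comparisons, dim=2)
```

After fitting, every preferred summary scores higher than its rejected partner, and `kl_penalized_reward` shows how the learned score would be traded off against drift from the supervised policy during RL.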

Keywords

·RLHF ·summarization ·preference learning ·reward model


Related entries