H Human–AI Coevolution

Entry

Training Language Models to Follow Instructions with Human Feedback

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Synopsis

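Fine-tunes GPT-3 into InstructGPT in three stages: supervised fine-tuning on labeler-written demonstrations, training a reward model on labelers' rankings of model outputs, and optimizing the policy against that reward model with reinforcement learning (PPO). In human evaluations, outputs from the 1.3B-parameter InstructGPT are preferred to those of the 175B GPT-3, despite the smaller model having about 100x fewer parameters.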
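The reward-model stage can be illustrated with a short sketch. A labeler's ranking of K outputs for one prompt is expanded into all K-choose-2 (preferred, dispreferred) pairs. Each pair contributes the logistic loss -log σ(r(x, y_w) - r(x, y_l)), which is small when the reward model scores the preferred output higher. This is a minimal plain-Python illustration that assumes the rewards are already computed scalars. The function names and the dict-based inputs are hypothetical and are not the authors' code. The supervised and PPO stages are not shown.

```python
import itertools
import math


def pairwise_comparisons(ranking):
    """Expand a best-to-worst ranking of K responses into all C(K, 2) (winner, loser) pairs."""
    return list(itertools.combinations(ranking, 2))


def _softplus_neg(d):
    """Numerically stable -log(sigmoid(d)) = log(1 + exp(-d))."""
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def reward_model_loss(rewards, ranking):
    """Mean of -log sigmoid(r_winner - r_loser) over every pair implied by one ranking.

    rewards: dict mapping a response id to a scalar reward r(x, y); a hypothetical
             stand-in for a learned reward model's outputs on a single prompt x.
    ranking: response ids ordered from most to least preferred by the labeler.
    """
    pairs = pairwise_comparisons(ranking)
    total = sum(_softplus_neg(rewards[w] - rewards[l]) for w, l in pairs)
    return total / len(pairs)


# Usage: a ranking of 4 outputs yields 6 comparisons; rewards that agree with
# the labeler's order give a lower loss than rewards that contradict it.
ranking = ["a", "b", "c", "d"]
agree = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
disagree = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
print(len(pairwise_comparisons(ranking)))
print(reward_model_loss(agree, ranking) < reward_model_loss(disagree, ranking))
```

Expanding every ranking into all of its pairs lets a single labeling session supply many comparisons at once. The stable softplus form avoids overflow when the reward difference is large and negative.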

Keywords

·RLHF ·instruction tuning ·alignment ·InstructGPT


Related entries