H Human–AI Coevolution

Entry

Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment

Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin, Amy X. Zhang

Synopsis

A framework in which laypeople specify normative objectives ("public will") and domain experts translate those objectives into rules for model behaviour. A case study in mental health reports ~96% public agreement on the objectives and a correlation of r = 0.841 between scoring under the expert-written rules and assessments by human experts.

Keywords

·alignment ·public will ·rule-based reward ·pluralistic alignment ·expert rules

Related entries