Entry

Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment

Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin, Amy X. Zhang

Synopsis

A framework in which laypeople specify normative objectives ("public will") and domain experts translate them into rules for model behaviour. A mental-health case study reports roughly 96% public agreement on the objectives and a correlation of r = 0.841 between expert-rule scoring and human-expert assessments.

Keywords

alignment · public will · rule-based reward · pluralistic alignment · expert rules


Related entries

AI Organizations are More Effective but Less Aligned than Individual Agents
April 11, 2026 · arXiv

Collective Constitutional AI: Aligning a Language Model with Public Input
June 12, 2024 · FAccT 2024

Position: Towards Bidirectional Human-AI Alignment
June 13, 2024 · ICML 2024

Training Language Models to Follow Instructions with Human Feedback
March 4, 2022 · NeurIPS 2022

The Triadic Loop: A Framework for Negotiating Alignment in AI Co-hosted Livestreaming
April 20, 2026 · arXiv

Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs
January 16, 2026 · arXiv