Entry
Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin, Amy X. Zhang
A framework in which laypeople specify normative objectives ("public will") and domain experts translate them into model behaviour rules. A mental health case study shows ~96% public agreement on the objectives and a correlation of r = 0.841 between expert rule scoring and human-expert assessments.
·alignment ·public will ·rule-based reward ·pluralistic alignment ·expert rules
- AI Organizations are More Effective but Less Aligned than Individual Agents · April 11, 2026 · arXiv
- Collective Constitutional AI: Aligning a Language Model with Public Input · June 12, 2024 · FAccT 2024
- Position: Towards Bidirectional Human-AI Alignment · June 13, 2024 · ICML 2024
- Training Language Models to Follow Instructions with Human Feedback · March 4, 2022 · NeurIPS 2022
- The Triadic Loop: A Framework for Negotiating Alignment in AI Co-hosted Livestreaming · April 20, 2026 · arXiv
- Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs · January 16, 2026 · arXiv