Entry

Collective Constitutional AI: Aligning a Language Model with Public Input

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli

Synopsis

An end-to-end pipeline that elicits constitutional principles from ~1,000 Americans via Polis and trains a Collective Constitutional AI (CCAI) model on the resulting public-input constitution. The CCAI model shows lower bias across nine social dimensions than a baseline model while maintaining equivalent task performance.

Keywords

Constitutional AI · public input · Polis · RLAIF · alignment

Related entries

AI Organizations are More Effective but Less Aligned than Individual Agents
April 11, 2026 · arXiv

Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
October 10, 2024 · NeurIPS 2024 Workshop on Pluralistic Alignment

Position: Towards Bidirectional Human-AI Alignment
June 13, 2024 · ICML 2024

Constitutional AI: Harmlessness from AI Feedback
December 15, 2022 · arXiv

Training Language Models to Follow Instructions with Human Feedback
March 4, 2022 · NeurIPS 2022

The Triadic Loop: A Framework for Negotiating Alignment in AI Co-hosted Livestreaming
April 20, 2026 · arXiv