H Human–AI Coevolution

Entry

Collective Constitutional AI: Aligning a Language Model with Public Input

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli

Synopsis

An end-to-end pipeline that elicits constitutional principles from ~1,000 Americans via Polis and trains a Collective Constitutional AI (CCAI) model on the resulting public-input constitution. The CCAI model shows lower bias across nine social dimensions than a baseline model while maintaining equivalent task performance.

Keywords

Constitutional AI · public input · Polis · RLAIF · alignment


Related entries