Human–AI Coevolution

Entry

Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial

Ethan Goh, Robert Gallo, Jason Hom, Eric Strong, Yingjie Weng, Hannah Kerman, Joséphine A. Cool, Zahir Kanjee, Andrew S. Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew P. J. Olson, Adam Rodman, Jonathan H. Chen

Synopsis

Single-blind RCT with 50 physicians on 244 clinical vignettes. The arm using conventional resources plus an LLM did not significantly outperform the arm using conventional resources alone (76% vs 74%), while GPT-4 alone scored 92%, evidence that physicians did not effectively integrate the LLM's output.

Keywords

·clinical RCT ·GPT-4 ·diagnostic reasoning ·physician decision support ·integration failure

