Entry

Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial

Ethan Goh, Robert Gallo, Jason Hom, Eric Strong, Yingjie Weng, Hannah Kerman, Joséphine A. Cool, Zahir Kanjee, Andrew S. Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew P. J. Olson, Adam Rodman, Jonathan H. Chen

Synopsis

Single-blind RCT with 50 physicians on 244 clinical vignettes. Physicians using conventional resources plus an LLM did not significantly outperform those using conventional resources alone (76% vs 74%), while GPT-4 alone scored 92%, which suggests that physicians did not effectively integrate LLM output.

Keywords

·clinical RCT ·GPT-4 ·diagnostic reasoning ·physician decision support ·integration failure


Related entries

Homogenizing effect of large language models (LLMs) on creative diversity: An empirical comparison of human and ChatGPT writing
September 2024 · ScienceDirect 2025

Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality
September 15, 2023 · Organization Science 2026

Examining University Students' Engagement with ChatGPT in English Essay Writing: Interaction Patterns and Perceptions
2026 · The Asia-Pacific Education Researcher 2026

Enhancing School Students' Self-Regulated Learning through Generative AI Support: A Randomized Controlled Trial
2026 · Educational Psychology Review 2026

SWE-chat: Coding Agent Interactions From Real Users in the Wild
April 22, 2026 · arXiv

The Path to Conversational AI Tutors: Integrating Tutoring Best Practices and Targeted Technologies to Produce Scalable AI Agents
February 22, 2026 · arXiv