AI Overestimates Behavioral Health Risks Compared to Human Advice in Experimental Study
In two controlled experiments with 60 gender-balanced participants and 30 GPT-4 samples, this study examined how advice source influences judgments of behavioral health risk. Participants received risk assessments for behavioral health scenarios from either an AI algorithm or a human peer group, and judgments were compared across the two advice sources. The primary finding was that the AI systematically overestimated health risks relative to human advisors, though specific effect sizes and statistical measures were not reported. In lower-threat conditions, participants preferred human advice over AI advice; this preference disappeared in higher-threat scenarios. Notably, participants who accepted advice from an AI that disagreed with their initial judgment updated their beliefs more than those who followed human advice.
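The paper does not specify how belief updating was quantified. A common metric in advice-taking research is the Weight of Advice (WOA): the fraction of the gap between a judge's initial estimate and the advisor's estimate that the judge closes after seeing the advice. The sketch below illustrates that metric only; the study's actual measure, the 0-100 risk scale, and the example values are assumptions.

```python
def weight_of_advice(initial: float, advice: float, final: float) -> float:
    """Weight of Advice (WOA): the fraction of the gap between a judge's
    initial estimate and the advisor's estimate that the judge closes
    after seeing the advice. 0 means the advice was ignored; 1 means it
    was fully adopted."""
    if advice == initial:
        raise ValueError("WOA is undefined when advice equals the initial estimate")
    return (final - initial) / (advice - initial)

# Hypothetical example on a 0-100 risk scale: a participant initially
# rates a scenario at 40, the AI advises 70, and the participant revises
# to 61, closing 70% of the gap toward the AI's rating.
print(weight_of_advice(initial=40, advice=70, final=61))  # 0.7
```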
Safety and tolerability data were not reported in this experimental study. The research focused on judgment and preference measures rather than clinical outcomes or adverse events associated with AI use in healthcare settings.
Key limitations include the experimental nature of the study, which measured perceptions and judgments in controlled tasks rather than real-world health decisions. The AI component (GPT-4) was used to generate risk ratings in a research context, not as a clinical decision aid. The findings relate to judgment processes and source preference, not diagnostic accuracy or patient management outcomes.
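For readers unfamiliar with how such ratings are generated, the sketch below shows one plausible way to draw repeated risk ratings from GPT-4 through the OpenAI chat completions API. The prompt wording, rating scale, temperature, and response parsing are all assumptions for illustration; the study's actual elicitation procedure is not reported.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt: the study's actual wording and scale are not reported.
prompt = (
    "On a scale from 0 (no risk) to 100 (extreme risk), rate the "
    "behavioral health risk in the following scenario. Reply with a "
    "single integer only.\n\nScenario: <scenario text>"
)

# Draw 30 independent samples, mirroring the study's 30 GPT-4 samples.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    n=30,
    temperature=1.0,  # assumed; sampling settings are not reported
)

# Assumes each completion is a bare integer, per the prompt's instruction.
ratings = [int(choice.message.content.strip()) for choice in response.choices]
print(f"mean AI risk rating: {sum(ratings) / len(ratings):.1f}")
```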
For practice, this research offers preliminary insights for the design of AI-based health decision support systems, suggesting that users may weigh AI risk assessments differently from human advice. However, the study did not test AI in clinical practice or measure actual health outcomes, so applications to patient care remain speculative. Clinicians should treat these findings as early experimental evidence about judgment processes rather than guidance for clinical AI implementation.