
AI Overestimates Behavioral Health Risks Compared to Human Advice in Experimental Study

Key Takeaway
Note: An experimental study found that AI overestimated behavioral health risks compared with human advice; the findings concern judgment processes, not clinical outcomes.

This study conducted two controlled experiments with 60 gender-balanced participants and 30 GPT-4 samples to examine how advice source influences judgments of behavioral health risks. Participants received risk assessments for behavioral health scenarios from either AI algorithms or human peer groups. The primary finding was that AI systematically overestimated health risks compared to human advice, though specific effect sizes and statistical measures were not reported. In lower-threat conditions, participants preferred human advice over AI; this preference disappeared in higher-threat scenarios. Notably, participants who accepted AI advice that disagreed with their initial judgment showed greater belief updates than those who followed human advice.

Safety and tolerability data were not reported in this experimental study. The research focused on judgment and preference measures rather than clinical outcomes or adverse events associated with AI use in healthcare settings.

Key limitations include the experimental nature of the study, which measured perceptions and judgments in controlled tasks rather than real-world health decisions. The AI component (GPT-4) was used to generate risk ratings in a research context, not as a clinical decision aid. The findings relate to judgment processes and source preference, not diagnostic accuracy or patient management outcomes.

For practice, this research offers preliminary insights for the design of AI-based health decision support systems, suggesting that users may perceive AI risk assessments differently than human advice. However, the study does not test AI in clinical practice or measure actual health outcomes, so applications to patient care remain speculative. Clinicians should recognize these findings as early experimental evidence about judgment processes rather than guidance for clinical AI implementation.

Study Details

Study type: RCT
Sample size: n = 60
Evidence: Level 2
Published: Apr 2026
Original Abstract
Previous research has shown that advice sources influence individuals' risk perceptions and health decision-making. We conducted two experiments to examine differences in health risk assessment between AI algorithms and human peer groups, and how these assessments influence individuals' judgments of behavioral health risks. In Experiment 1, 60 participants (gender-balanced) and 30 GPT-4 samples (from independent runs with varying temperature settings) rated the perceived risk of 60 health behaviors. The results revealed that AI systematically overestimated health risks by inflating outcome severity rather than risk probability. In Experiment 2, 60 participants compared higher- or lower-threat health behaviors to judge which posed lower risk, then revised judgments after receiving advice from AI or human peer groups. The results indicated that participants preferred human advice over AI in the lower-threat condition. However, this preference disappeared in the higher-threat condition, and participants accepting AI-disagreeing advice showed greater belief updates than those following human advice. Collectively, these findings highlight how the threat context influences human-AI advice integration, offering insights for the design of effective AI-based health decision support systems.