
AI Overestimates Behavioral Health Risks Compared to Human Advice in Experimental Study

When AI gives health advice, does it make us more cautious than people do?

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Note: Experimental study found AI overestimated health risks vs human advice; findings relate to judgment, not clinical outcomes.

This study comprised two controlled experiments examining how advice source influences judgments of behavioral health risks. In Experiment 1, 60 gender-balanced participants and 30 GPT-4 samples (independent runs with varying temperature settings) rated the perceived risk of 60 health behaviors. In Experiment 2, 60 participants compared higher- or lower-threat health behaviors to judge which posed lower risk, then revised their judgments after receiving advice from AI or human peer groups. The primary finding was that AI systematically overestimated health risks relative to human peers, inflating outcome severity rather than risk probability, though specific effect sizes and statistical measures were not reported. In the lower-threat condition, participants preferred human advice over AI, but this preference disappeared in the higher-threat condition. Participants who accepted disagreeing advice from AI showed greater belief updates than those following human advice.

Safety and tolerability data were not reported in this experimental study. The research focused on judgment and preference measures rather than clinical outcomes or adverse events associated with AI use in healthcare settings.

Key limitations include the experimental nature of the study, which measured perceptions and judgments in controlled tasks rather than real-world health decisions. The AI component (GPT-4) was used to generate risk ratings in a research context, not as a clinical decision aid. The findings relate to judgment processes and source preference, not diagnostic accuracy or patient management outcomes.

For practice, this research offers preliminary insights for the design of AI-based health decision support systems, suggesting that users may perceive AI risk assessments differently than human advice. However, the study does not test AI in clinical practice or measure actual health outcomes, so applications to patient care remain speculative. Clinicians should recognize these findings as early experimental evidence about judgment processes rather than guidance for clinical AI implementation.

Imagine you're trying to decide if a behavior—like drinking or skipping sleep—is risky for your health. You might ask a friend or an AI chatbot. New research suggests who you ask could change your answer more than you'd expect.

In two experiments, each with 60 participants, researchers compared risk ratings and advice from AI (GPT-4) with those from human peer groups. They found that the AI systematically rated health behaviors as riskier than the human peers did, mainly by judging the possible outcomes as more severe rather than more likely. When the threat was lower, people preferred advice from other humans. But when the threat was higher, that preference disappeared.

Interestingly, people who accepted AI advice that disagreed with their initial judgment revised their beliefs more than those who followed human advice. This suggests that AI advice, once accepted, can shift beliefs substantially, even though it tends to be more cautious.

It's important to remember this was a controlled experiment about judgment, not a test of AI in real doctor's offices or clinics. The study measured how advice influences perceptions, not whether following AI or human advice leads to better health outcomes. The AI wasn't acting as a medical tool, but as another voice in the conversation.

What this means for you:
In this study, AI rated health behaviors as riskier than human peers did, and people who accepted its advice changed their beliefs more than those who followed human advice.

Study Details

Study type: RCT
Sample size: n = 60
Evidence: Level 2
Published: Apr 2026

Original Abstract
Previous research has shown that advice sources influence individuals' risk perceptions and health decision-making. We conducted two experiments to examine differences in health risk assessment between AI algorithms and human peer groups, and how these assessments influence individuals' judgments of behavioral health risks. In Experiment 1, 60 participants (gender-balanced) and 30 GPT-4 samples (from independent runs with varying temperature settings) rated the perceived risk of 60 health behaviors. The results revealed that AI systematically overestimated health risks by inflating outcome severity rather than risk probability. In Experiment 2, 60 participants compared higher- or lower-threat health behaviors to judge which posed lower risk, then revised judgments after receiving advice from AI or human peer groups. The results indicated that participants preferred human advice over AI in the lower-threat condition. However, this preference disappeared in the higher-threat condition, and participants accepting AI-disagreeing advice showed greater belief updates than those following human advice. Collectively, these findings highlight how the threat context influences human-AI advice integration, offering insights for the design of effective AI-based health decision support systems.