Skin cancer scares people because missing a diagnosis can be dangerous. This analysis looked at how well different tools find dangerous spots on skin. Doctors used standard dermoscopy, AI alone, or AI helping the doctor. These methods were tested in real clinics. The goal was to see which method caught the most cancers without flagging too many harmless spots. The numbers show that standard skin checks miss some cancers. AI alone also misses some cases. But in the one group where doctors used AI to help them, every cancer case was found. That group still flagged some harmless spots, and the result comes from a single group with no reported margin of error. The study included seventeen different groups of skin checks. It is important to note that the AI performance varied a lot depending on the settings used. More evidence is needed to confirm how well this works everywhere. The researchers suggest that AI should help doctors, not replace their skill.
Systematic review and meta-analysis of AI for melanoma risk assessment in pigmented skin lesions
AI plus doctors finds every skin cancer case in real-world settings
AI-generated summary of the cited source, checked by automated accuracy review.
This is a systematic review and meta-analysis of diagnostic accuracy for malignancy risk assessment in pigmented skin lesions, including melanoma. The scope covered 17 diagnostic arms: 10 dermoscopy arms, 6 AI-alone arms, and 1 AI-assisted clinician arm, evaluated in real-world clinical settings. The authors pooled sensitivity and specificity for each modality, with standard dermoscopy serving as the comparator.
For dermoscopy, pooled sensitivity was 0.773 (95% CI, 0.648-0.863) and specificity was 0.793 (95% CI, 0.673-0.877). For standalone AI, pooled sensitivity was 0.757 (95% CI, 0.428-0.928) and specificity was 0.859 (95% CI, 0.619-0.958). For the single AI-assisted clinician arm, sensitivity was 1.000 and specificity was 0.837, though confidence intervals were not reported.
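Because no confidence interval was reported for the AI-assisted arm, it may help to see how much uncertainty a sensitivity of 1.000 can carry. The sketch below is illustrative only: it uses the Wilson score interval, which is a standard method but not necessarily the one the authors would use, and the case counts are hypothetical because the source does not report the arm's sample size.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts (NOT from the source): every malignant lesion detected.
for n_malignant in (10, 30, 100):
    low, high = wilson_interval(n_malignant, n_malignant)
    print(f"{n_malignant}/{n_malignant} detected: 95% CI {low:.3f} to {high:.3f}")
```

With all cases detected, the lower bound reduces to n / (n + z^2), so the interval narrows only as the number of malignant lesions grows. A perfect point estimate from a small arm is therefore compatible with a true sensitivity well below 1.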
The authors noted that heterogeneity in AI performance was driven almost entirely by threshold effects rather than by differences in inherent model capacity. They also highlighted that more evidence is needed for AI-assisted clinicians. Limitations include the small number of AI-assisted clinician arms and the lack of reported follow-up duration.
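A threshold effect means that the same model can report different sensitivity and specificity depending only on the cutoff applied to its output score. The following minimal sketch illustrates the idea with made-up scores and labels; none of the numbers come from the review, and `sens_spec` is a hypothetical helper name.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called malignant."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up model outputs for 4 malignant (1) and 6 benign (0) lesions.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.55, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# One fixed model, three operating points.
for threshold in (0.35, 0.60, 0.75):
    sensitivity, specificity = sens_spec(scores, labels, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
```

Lowering the cutoff raises sensitivity at the cost of specificity, and raising it does the opposite, even though the underlying scores never change. Studies that pick different cutoffs can therefore report quite different accuracy for models of similar capacity.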
Practice relevance is limited; the authors suggest AI should be viewed as a complementary decision-support tool rather than a replacement for dermoscopic evaluation. The evidence base is early and incomplete, and clinicians should interpret the findings with caution.