Deep learning algorithms outperform ophthalmologists in detecting AMD and differentiating wet from dry cases

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
DL algorithms show higher sensitivity than senior ophthalmologists for AMD detection but require local calibration.

This systematic review and meta-analysis evaluates the diagnostic performance of deep learning (DL) algorithms for detecting age-related macular degeneration (AMD) and differentiating between wet (wAMD) and dry (dAMD) subtypes. The analysis synthesized data from 77,485 samples for AMD detection and 28,705 samples for subtype classification.

For AMD detection, DL algorithms demonstrated high performance with a sensitivity of 0.98 (95% CI 0.96-0.99) and an AUC of 1.00 (95% CI 0.99-1.00), outperforming senior ophthalmologists who achieved a sensitivity of 0.75. For differentiating wAMD from dAMD, DL algorithms showed a sensitivity of 0.95 (95% CI 0.91-0.97) and an AUC of 0.99 (95% CI 0.97-0.99).
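To make the sensitivity gap concrete, the sketch below converts the two reported point estimates into expected missed cases per 1,000 people who truly have AMD. It is illustrative arithmetic only: the function name `missed_per_1000` is hypothetical, the calculation ignores the confidence intervals, and it assumes the estimates apply to a single population, which the study-to-study variation reported by the authors makes uncertain.

```python
def missed_per_1000(sensitivity: float) -> int:
    """Expected false negatives per 1,000 true AMD cases.

    Sensitivity is TP / (TP + FN), so the miss rate is 1 - sensitivity.
    """
    return round((1 - sensitivity) * 1000)


# Point estimates taken from the summary above.
dl_missed = missed_per_1000(0.98)      # deep learning algorithms
senior_missed = missed_per_1000(0.75)  # senior ophthalmologists

print(f"DL algorithms: about {dl_missed} missed per 1,000 true cases")
print(f"Senior ophthalmologists: about {senior_missed} missed per 1,000 true cases")
```

Under these simplifying assumptions, the difference works out to roughly 20 versus 250 missed cases per 1,000 people with AMD.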

The authors noted several limitations, including high heterogeneity among studies, wide prediction intervals, and predominantly retrospective designs. There is also a risk of performance inflation from internal validation. While DL algorithms showed superior and more balanced diagnostic performance compared to clinicians in head-to-head evidence, the findings are preliminary. These tools should be viewed as triage adjuncts requiring local calibration rather than autonomous replacements for clinical diagnosis.
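One factor a local evaluation would likely need to account for is disease prevalence, because the share of positive results that are true positives (the positive predictive value, PPV) changes with how common AMD is in the screened population. The sketch below applies Bayes' rule to the pooled sensitivity and specificity for AMD detection (both 0.98 in the original abstract). The prevalence values and the function name `positive_predictive_value` are hypothetical and chosen only for illustration; they do not come from the study.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Pooled estimates from the abstract; prevalences are assumed, not reported.
SENSITIVITY = 0.98
SPECIFICITY = 0.98

for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}")
```

With these assumed prevalences, the PPV ranges from about 0.33 at 1% to about 0.92 at 20%, which illustrates how the same algorithm could generate many more false referrals in a low-prevalence screening setting than in a specialist clinic.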

How this fits prior evidence

This meta-analysis addresses a gap in evaluating automated diagnostic tools for age-related macular degeneration. While prior coverage noted that AI-based fluid volume quantification predicts visual acuity in neovascular AMD better than central subfield thickness (CST), this study specifically evaluates the diagnostic accuracy of deep learning algorithms compared to human ophthalmologists. It provides evidence on the technical performance of DL for both initial detection and subtype classification.

When vision fades due to age-related macular degeneration (AMD), early and accurate diagnosis is critical. A large review pooling more than 77,000 samples shows that deep learning algorithms, a type of advanced computer software, can identify the disease with 98% sensitivity. In the available head-to-head comparisons, this exceeded the 75% sensitivity of senior ophthalmologists.

The technology also proved highly effective at distinguishing between the two types of the condition, wet and dry AMD. The algorithms showed 95% accuracy in telling them apart. While the software performed very well, researchers note that the results come from a mix of different studies with varying designs.

Because the data come mostly from past records and vary between studies, this technology is not yet ready to replace doctors. Instead, it is seen as a tool to help triage patients more quickly. It acts like a high-tech assistant that can catch issues early while the medical team provides the final care.

What this means for you:
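Deep learning software showed higher sensitivity than senior ophthalmologists in detecting macular degeneration, and higher specificity and accuracy than junior ophthalmologists in classifying wet versus dry disease. These findings remain preliminary.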

Common questions

How accurate is the AI at finding eye disease?

The study found that deep learning algorithms had a 98% sensitivity rate for detecting age-related macular degeneration. This was significantly higher (P<.001) than the 75% sensitivity achieved by senior ophthalmologists in the same comparison.

Can the software tell the difference between wet and dry AMD?

Yes, the deep learning algorithms showed a 95% accuracy rate when distinguishing between wet and dry age-related macular degeneration. This helps doctors identify which specific type of the condition is affecting a patient's vision.

Will this technology replace my eye doctor?

No, the findings suggest that these tools are not meant to replace doctors. Instead, they are viewed as a triage adjunct, meaning they help doctors work more efficiently by providing highly accurate initial screenings and data.

Study Details

Study type: Meta-analysis
Evidence: Level 1
Published: Jun 2026
Original Abstract
BACKGROUND: Age-related macular degeneration (AMD) is a leading cause of irreversible blindness worldwide. Retinal imaging and deep learning (DL) may support scalable screening, but deployment requires evidence on pooled performance. This is important because missed neovascular disease may delay treatment, whereas excessive false positives may overload referral pathways.

OBJECTIVE: This study aimed to compare the diagnostic performance of DL algorithms with ophthalmologists for detecting AMD and differentiating wet AMD (wAMD) from dry AMD (dAMD) and to identify factors that influence DL performance.

METHODS: PubMed, Embase, Web of Science, and the Cochrane Library were searched through October 5, 2025, and updated on April 19, 2026. Eligible studies applied DL to classify AMD from normal retinas or wAMD from dAMD using retinal images. Two reviewers (MHT and XL) independently extracted data and assessed risk of bias using the Prediction model Risk Of Bias Assessment Tool for Artificial Intelligence (PROBAST+AI) tool. Pooled sensitivity, specificity, accuracy, and area under the curve were estimated using bivariate random-effects models. Clinician comparisons were stratified by experience (junior vs senior). Small-study effects were assessed via Deeks' funnel plot asymmetry test. Evidence certainty was appraised using the Grading of Recommendations, Assessment, Development, and Evaluation framework. The protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO; CRD420251243276).

RESULTS: Overall, 28 studies were included, comprising 77,485 samples for AMD detection and 28,705 samples for wAMD versus dAMD classification. For AMD detection, DL achieved a pooled sensitivity of 0.98 (95% CI 0.96-0.99; prediction interval [PI] 0.95-0.99), specificity of 0.98 (95% CI 0.95-0.99; PI 0.95-0.99), accuracy of 0.97 (95% CI 0.96-0.99), and area under the curve of 1.00 (95% CI 0.99-1.00). For wAMD versus dAMD, DL showed sensitivity of 0.95 (95% CI 0.91-0.97; PI 0.89-0.97), specificity of 0.95 (95% CI 0.93-0.97; PI 0.92-0.97), accuracy of 0.95 (95% CI 0.92-0.97), and area under the curve of 0.99 (95% CI 0.97-0.99). DL showed higher sensitivity than senior ophthalmologists for AMD (0.98 vs 0.75; P<.001) and higher specificity and accuracy than junior ophthalmologists for wAMD classification. Optical coherence tomography-based models performed more consistently than color fundus photography or multimodal models. Evidence certainty was moderate.

CONCLUSIONS: Compared with ophthalmologists, DL algorithms demonstrated superior and more balanced diagnostic performance in the available head-to-head evidence, potentially providing a consistent decision-support baseline that mitigates human threshold-dependent trade-offs. However, high heterogeneity, wide PIs, predominantly retrospective designs, and possible performance inflation from internal validation mean that these relative performance findings remain preliminary rather than deployment ready. DL should be viewed as a triage adjunct requiring local calibration, not an autonomous diagnostic replacement. Prospective, multicenter, patient-level external validation with prespecified human comparison arms is required.