Deep learning matches radiologists in breast cancer detection using tomosynthesis


Journal of Medical Internet Research
Published May 7, 2026 · Medically reviewed May 12, 2026
Study authors: Lyu Shewen, Wang Zepeng, Mu Yujing, Wang Luyao, Pei Xiaohua
By Dr. Julia Lee, PhD · Oncology, Genomics & Drug Development

Key Takeaway

Deep learning for breast tomosynthesis matches radiologist performance, but using it as a reading aid does not significantly improve radiologists' results.

A systematic review and meta-analysis evaluated the diagnostic performance of deep learning (DL) algorithms for digital breast tomosynthesis (DBT) in breast cancer detection. The analysis pooled data from 13 studies encompassing 38,565 patients, comparing stand-alone DL or DL-assisted diagnosis against radiologist interpretation alone, with radiologists of varying experience levels. The primary outcomes were diagnostic performance metrics, including sensitivity, specificity, and area under the curve (AUC).

The pooled sensitivity for stand-alone DL algorithms was 0.88 (95% CI 0.80-0.93), with a specificity of 0.74 (95% CI 0.59-0.85) and an AUC of 0.89 (95% CI 0.86-0.92). When compared to all radiologists, DL demonstrated an AUC of 0.89 versus 0.88 (P=.64), indicating no statistically significant difference. Similarly, compared to senior radiologists, DL AUC was 0.89 versus 0.90 (P=.48), again showing no significant difference.
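To put the pooled sensitivity and specificity in more clinical terms, the sketch below converts them into likelihood ratios and a post-test probability. The values 0.88 and 0.74 come from the meta-analysis; the 1% pre-test probability and the function names are illustrative assumptions and do not appear in the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Apply Bayes' rule on the odds scale."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled stand-alone DL estimates reported in the meta-analysis.
SENSITIVITY = 0.88
SPECIFICITY = 0.74

# Hypothetical pre-test probability (an assumption, not from the study).
ASSUMED_PREVALENCE = 0.01

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
ppv = post_test_probability(ASSUMED_PREVALENCE, lr_pos)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a positive result: {ppv:.3f}")
```

Under these assumptions, a positive DL result raises the odds of cancer roughly 3.4-fold and a negative result lowers them to about 0.16 of their starting value. The post-test probability depends heavily on the assumed prevalence, so it should be read as an illustration and not as a study finding.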

However, DL showed significantly higher sensitivity than junior radiologists (0.88 vs 0.76, P=.03). Importantly, DL assistance did not significantly improve diagnostic metrics for radiologists at any experience level, suggesting that current models act primarily as supplementary aids rather than definitive clinical tools. The authors accordingly frame DL's value as helping to reduce oversight in less experienced settings, not as raising overall human performance.

One limitation is heterogeneity: meta-regression identified validation methods as a significant source. The authors call for prospective multimodal studies to validate the findings. The analysis is observational, reporting associations rather than causation, and covers only studies found in searches up to November 8, 2025. Funding and conflicts were not reported.

In clinical practice, DL algorithms for DBT exhibit strong diagnostic proficiency and higher sensitivity than junior radiologists, suggesting utility as adjunctive tools. However, they do not replace radiologist expertise and should be viewed as supplementary aids to enhance detection, particularly in settings with less experienced readers.

The certainty of pooled effect sizes is moderated by heterogeneity across the 13 included studies. Clinicians should interpret these findings within the context of the meta-analysis design and the specific patient population studied, avoiding overstatement of DL capabilities.
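As a rough illustration of how between-study heterogeneity is quantified, the sketch below pools per-study sensitivities on the logit scale with a DerSimonian-Laird random-effects estimator and reports tau² and I². This is a simplified univariate stand-in, not the bivariate random-effects and generalized linear mixed models the authors used. The study counts and the function name are hypothetical and are not data from the meta-analysis.

```python
import math


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled proportion, tau^2, I^2 as a fraction).
    """
    # Per-study logit and its approximate variance (0.5 continuity correction).
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))
        v.append(1.0 / a + 1.0 / b)

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (truncated at zero) and I^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights and pooled estimate, back-transformed.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    pooled = 1.0 / (1.0 + math.exp(-y_re))
    return pooled, tau2, i2


# Hypothetical data: cancers detected / cancers present in four made-up studies.
detected = [45, 88, 130, 61]
cancers = [50, 110, 140, 80]

pooled, tau2, i2 = pool_logit_random_effects(detected, cancers)
print(f"pooled sensitivity = {pooled:.2f}, tau^2 = {tau2:.3f}, I^2 = {i2:.0%}")
```

A larger tau² or I² means the studies disagree more than sampling error alone would explain, which is why a pooled estimate from heterogeneous studies warrants cautious interpretation.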

Overall, the evidence supports using DL in DBT workflows as an aid to radiologists, especially junior readers, but not as a stand-alone solution. Continued research is needed to refine algorithms and validate performance in diverse clinical environments.

Study Details

Study type: Meta-analysis

Sample size: n = 38,565

Evidence: Level 1

Published: May 2026

PMID: 42090319

Original Abstract

BACKGROUND: Deep learning (DL) algorithms for digital breast tomosynthesis (DBT) have proliferated, demonstrating emerging potential in enhancing lesion detection and classification.

OBJECTIVE: This study aimed to compare the diagnostic performance of DL algorithms for DBT with that of radiologists of varying experience and assess the clinical impact of DL assistance.

METHODS: A systematic search of PubMed, Embase, Web of Science, and the Cochrane Library was conducted up to November 8, 2025. Included studies compared the performance of stand-alone DL algorithms for DBT, radiologist interpretation alone, and DL-assisted diagnosis. Study quality was assessed using the Prediction Model Risk of Bias Assessment Tool+Artificial Intelligence (PROBAST+AI). Performance metrics were pooled using bivariate random effects and generalized linear mixed models.

RESULTS: A total of 13 studies with 38,565 patients were included in the final analysis. Stand-alone DL algorithms achieved a pooled sensitivity of 0.88 (95% CI 0.80-0.93), specificity of 0.74 (95% CI 0.59-0.85), and area under the receiver operating characteristic curve (AUC) of 0.89 (95% CI 0.86-0.92). While DL performance showed no statistically significant difference compared to all radiologists (AUC=0.89 vs 0.88; P=.64) or senior radiologists (AUC=0.89 vs 0.90; P=.48), DL demonstrated significantly superior sensitivity compared to junior radiologists (0.88 vs 0.76; P=.03). Notably, DL assistance did not statistically improve diagnostic metrics for radiologists across any experience level. Meta-regression identified validation methods as a significant source of heterogeneity.

CONCLUSIONS: DL algorithms for DBT exhibited strong diagnostic proficiency and showed higher sensitivity than junior radiologists, suggesting their potential utility as adjunctive tools to help reduce oversight in less experienced settings. However, given that DL assistance did not significantly elevate overall human performance, current models act primarily as supplementary aids rather than definitive clinical tools. Future prospective multimodal studies are warranted to validate these findings and optimize clinical integration.