
AI Matches Radiologists at Spotting Breast Cancer on 3D Mammograms

Deep learning matches radiologists in breast cancer detection using tomosynthesis

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Deep learning for breast tomosynthesis matches radiologists' diagnostic performance, but using it as an assistive tool does not significantly improve radiologists' results.

A systematic review and meta-analysis evaluated the diagnostic performance of deep learning (DL) algorithms for digital breast tomosynthesis (DBT) in breast cancer detection. It pooled data from 13 studies with 38,565 patients that compared stand-alone DL, radiologist interpretation alone, and DL-assisted diagnosis, with radiologists of varying experience levels. The primary outcomes were diagnostic performance metrics: sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC).

The pooled sensitivity for stand-alone DL algorithms was 0.88 (95% CI 0.80-0.93), with a specificity of 0.74 (95% CI 0.59-0.85) and an AUC of 0.89 (95% CI 0.86-0.92). When compared to all radiologists, DL demonstrated an AUC of 0.89 versus 0.88 (P=.64), indicating no statistically significant difference. Similarly, compared to senior radiologists, DL AUC was 0.89 versus 0.90 (P=.48), again showing no significant difference.
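To make these three metrics concrete, here is a minimal Python sketch of how they are defined. The function names and all counts and scores are hypothetical, chosen only so the outputs line up with the pooled sensitivity and specificity reported above; they are not data from the meta-analysis.

```python
"""Illustrative only: how sensitivity, specificity, and AUC are defined.

All counts and scores are hypothetical, not data from the meta-analysis.
"""


def sensitivity(tp: int, fn: int) -> float:
    """Share of cancers the reader (human or DL) flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of cancer-free cases correctly left unflagged: TN / (TN + FP)."""
    return tn / (tn + fp)


def auc(cancer_scores: list[float], normal_scores: list[float]) -> float:
    """AUC as the probability that a randomly chosen cancer case gets a
    higher suspicion score than a randomly chosen cancer-free case.
    Ties count as half. This is the Mann-Whitney form of the ROC area."""
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(normal_scores))


if __name__ == "__main__":
    # Hypothetical cohort: 100 cancers and 1,000 cancer-free exams.
    print(sensitivity(tp=88, fn=12))    # 0.88
    print(specificity(tn=740, fp=260))  # 0.74
    # Hypothetical suspicion scores: 11 of the 12 cancer/normal pairs
    # are ranked correctly, so the AUC is 11/12.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))
```

The pairwise AUC loop is easy to read but compares every cancer case with every cancer-free case; statistical packages compute the same quantity with faster rank-based methods on large datasets.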

However, DL showed significantly higher sensitivity than junior radiologists (0.88 vs 0.76; P=.03). Importantly, DL assistance did not significantly improve diagnostic metrics for radiologists at any experience level. The authors therefore conclude that current models act primarily as supplementary aids rather than definitive clinical tools: they may help reduce oversight in less experienced settings, but they do not elevate overall human performance.

Meta-regression identified validation methods as a significant source of heterogeneity across studies, and the authors call for prospective multimodal studies to validate the findings. The analysis is observational and reports associations rather than causation, and the literature search covered studies published up to November 8, 2025. Funding and conflicts of interest were not reported.

For clinical practice, these results suggest that DL algorithms for DBT have strong diagnostic proficiency and higher sensitivity than junior radiologists, pointing to potential utility as adjunctive tools. They do not replace radiologist expertise, however, and are best viewed as supplementary aids to detection, particularly in settings with less experienced readers.

The certainty of the pooled estimates is limited by heterogeneity across the 13 included studies, with validation methods identified as a significant source. Clinicians should interpret these findings within the limits of the meta-analysis design and the populations of the included studies, and avoid overstating DL capabilities.
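To show what "heterogeneity" means numerically, here is a simplified Python sketch of random-effects pooling. Note the swap: the meta-analysis itself used bivariate random-effects and generalized linear mixed models, which model sensitivity and specificity jointly; this sketch instead applies the simpler univariate DerSimonian-Laird method to sensitivity alone, on the logit scale. The function name and the study counts in the example are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_sensitivity_dl(studies: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Univariate DerSimonian-Laird random-effects pooling of sensitivity.

    studies: (true_positives, false_negatives) for each study.
    Returns (pooled_sensitivity, tau_squared, i_squared_percent).
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies to estimate heterogeneity")

    y, v = [], []
    for tp, fn in studies:
        # A 0.5 continuity correction (applied to every study here, for
        # simplicity) avoids division by zero when a cell is empty.
        tp_c, fn_c = tp + 0.5, fn + 0.5
        y.append(logit(tp_c / (tp_c + fn_c)))
        v.append(1.0 / tp_c + 1.0 / fn_c)  # approximate variance of the logit

    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q: weighted spread of study results around the fixed-effect mean.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add the between-study variance to each study.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return inv_logit(y_re), tau2, i2


if __name__ == "__main__":
    # Hypothetical (TP, FN) counts for three studies.
    pooled, tau2, i2 = pool_sensitivity_dl([(45, 5), (80, 20), (30, 10)])
    print(f"pooled sensitivity {pooled:.3f}, tau^2 {tau2:.3f}, I^2 {i2:.0f}%")
```

Tau squared estimates how much the true sensitivity varies from study to study, and I squared estimates the share of the observed variation that reflects real differences rather than chance. A large I squared is the kind of signal that prompts a meta-regression, such as the one that flagged validation methods in this review.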

Overall, the evidence supports DL as a potential adjunct in DBT workflows, particularly where readers are less experienced, but not as a standalone solution. Further research is needed to refine the algorithms and validate their performance in diverse clinical environments.

Imagine sitting in a doctor's office waiting for your mammogram results. The radiologist has years of training. But what if a computer could help them spot something they might miss?

That question is becoming more important every day.

Breast cancer is the most common cancer in women worldwide. Mammograms save lives by catching tumors early. But reading these images is hard work. Radiologists look at hundreds of scans. Fatigue sets in. Small cancers can hide in dense breast tissue.

A newer type of mammogram called digital breast tomosynthesis (DBT) gives doctors a 3D view of the breast. Because it shows the breast in thin slices, it can reveal details that overlapping tissue hides on older 2D mammograms. But it also creates many more images to review.

Enter artificial intelligence.

What the AI Could Do

Researchers wanted to know if AI could read these 3D mammograms as well as human radiologists. They combined data from 13 studies with nearly 39,000 patients.

The results were surprising.

The AI caught 88% of cancers. Doctors call this sensitivity. It also correctly cleared 74% of women who did not have cancer, a measure called specificity. On an overall score of accuracy, it closely matched senior radiologists, and the small difference was not statistically significant.
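A short Python sketch can translate those percentages into people. It assumes a hypothetical group of 1,000 women in which 1% have cancer; that rate is an illustrative assumption, not a number from the study, and the function name is made up for this example.

```python
def expected_counts(n: int, prevalence: float, sens: float, spec: float) -> dict[str, float]:
    """Expected screening outcomes for n women, given an ASSUMED cancer
    prevalence and a reader's sensitivity and specificity."""
    cancers = n * prevalence
    no_cancer = n - cancers
    return {
        "cancers_caught": cancers * sens,        # true positives
        "cancers_missed": cancers * (1 - sens),  # false negatives
        "false_alarms": no_cancer * (1 - spec),  # false positives
        "correct_all_clears": no_cancer * spec,  # true negatives
    }


if __name__ == "__main__":
    # Hypothetical: 1,000 women, 1% with cancer (an assumed rate, not a
    # figure from the study), read with the pooled AI estimates.
    counts = expected_counts(1000, 0.01, sens=0.88, spec=0.74)
    for outcome, value in counts.items():
        print(f"{outcome}: {value:.1f}")
```

Under these assumptions the sketch prints 8.8 cancers caught, 1.2 missed, 257.4 false alarms, and 732.6 correct all-clears. Because most screened women do not have cancer, even a fairly high specificity can produce many more false alarms than cancers found.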

But here is the twist. The AI was better than junior radiologists at finding cancers. Junior doctors caught 76% of cancers; the AI caught 88%. That difference was statistically significant, meaning it is unlikely to be due to chance alone.

This does not mean AI is ready to replace your radiologist.

How AI Learns to Read Scans

Think of it like this. A radiologist learns by looking at thousands of mammograms during training. They learn what cancer looks like and what normal tissue looks like.

AI learns in a similar way, but it can be trained on far more images than one person could ever review. It can pick up patterns that human eyes might miss. It does not get tired or distracted.

The AI in these studies uses deep learning, a type of computer program loosely inspired by how networks of brain cells work. Many such systems analyze the 3D mammogram slice by slice and flag areas that look suspicious.
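Here is a Python sketch of one common way slice-level findings can be turned into a result for the whole exam. It is a hypothetical example, not the method of any study in the review: the per-slice scores would normally come from a trained network, which is left out, and the 0.5 threshold, the max-pooling rule, and the names `Flag` and `flag_suspicious_slices` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    slice_index: int
    score: float


def flag_suspicious_slices(
    slice_scores: list[float], threshold: float = 0.5
) -> tuple[float, list[Flag]]:
    """Turn per-slice suspicion scores into an exam-level result.

    slice_scores: one score in [0, 1] per reconstructed DBT slice. In a real
    system these would come from a trained network; here they are given.
    Returns (exam_score, flags): exam_score is the highest slice score
    (max pooling), and flags lists every slice at or above the threshold.
    """
    if not slice_scores:
        raise ValueError("an exam needs at least one slice")
    exam_score = max(slice_scores)
    flags = [Flag(i, s) for i, s in enumerate(slice_scores) if s >= threshold]
    return exam_score, flags


if __name__ == "__main__":
    # Hypothetical scores for a five-slice stack.
    score, flags = flag_suspicious_slices([0.05, 0.10, 0.62, 0.81, 0.40])
    print(score)                           # 0.81
    print([f.slice_index for f in flags])  # [2, 3]
```

Lowering the threshold flags more slices, which can catch more cancers (higher sensitivity) at the cost of more false alarms (lower specificity).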

The Catch Nobody Expected

Here is where things get interesting.

When radiologists used the AI as a helper tool, their performance did not improve significantly. Not for senior doctors. Not for junior doctors.

That seems odd. If AI is so good, why does it not make humans better?

The study was not designed to explain why, but there are plausible reasons. Radiologists already know what they are looking for, so the AI might flag things they already see. Or it might distract them with false alarms.

This matters because many people think AI will make doctors better. This study suggests the reality is more complex.

What This Means for Patients

For now, AI may work best as a safety net. It could double-check scans in busy hospitals and help in places where experienced radiologists are scarce.

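Think about rural hospitals, or clinics in developing countries. There, AI could act as a second reader alongside a junior radiologist. Keep in mind, though, that in this study AI assistance did not significantly improve junior radiologists' own results; the advantage came from the AI reading on its own.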

But you should not expect AI to replace your doctor anytime soon.

The Limits of This Research

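This study has important limits. The 13 studies used different AI systems, some older and some newer, and they tested those systems in different ways. The researchers found that these differences in how the AI was validated were a significant source of the variation between study results.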

Also, these were research settings. Real hospitals are messier. Patients move. Equipment varies. Results might look different in daily practice.

What Happens Next

The researchers say more studies are needed. Future studies should follow patients forward in time (prospective studies) and draw on more than one kind of data (multimodal studies). They should test AI in real clinics, not just in research settings.

New AI models are being developed right now. They are getting better at reading scans. They may eventually help radiologists work faster and more accurately.

For now, the message is clear. AI can match your doctor at finding breast cancer. But it works best as a helper, not a replacement. The human touch still matters most.

Study Details

Study type: Meta-analysis
Sample size: n = 38,565
Evidence: Level 1
Published: May 2026
Original Abstract
BACKGROUND: Deep learning (DL) algorithms for digital breast tomosynthesis (DBT) have proliferated, demonstrating emerging potential in enhancing lesion detection and classification.

OBJECTIVE: This study aimed to compare the diagnostic performance of DL algorithms for DBT with that of radiologists of varying experience and assess the clinical impact of DL assistance.

METHODS: A systematic search of PubMed, Embase, Web of Science, and the Cochrane Library was conducted up to November 8, 2025. Included studies compared the performance of stand-alone DL algorithms for DBT, radiologist interpretation alone, and DL-assisted diagnosis. Study quality was assessed using the Prediction Model Risk of Bias Assessment Tool+Artificial Intelligence (PROBAST+AI). Performance metrics were pooled using bivariate random effects and generalized linear mixed models.

RESULTS: A total of 13 studies with 38,565 patients were included in the final analysis. Stand-alone DL algorithms achieved a pooled sensitivity of 0.88 (95% CI 0.80-0.93), specificity of 0.74 (95% CI 0.59-0.85), and area under the receiver operating characteristic curve (AUC) of 0.89 (95% CI 0.86-0.92). While DL performance showed no statistically significant difference compared to all radiologists (AUC=0.89 vs 0.88; P=.64) or senior radiologists (AUC=0.89 vs 0.90; P=.48), DL demonstrated significantly superior sensitivity compared to junior radiologists (0.88 vs 0.76; P=.03). Notably, DL assistance did not statistically improve diagnostic metrics for radiologists across any experience level. Meta-regression identified validation methods as a significant source of heterogeneity.

CONCLUSIONS: DL algorithms for DBT exhibited strong diagnostic proficiency and showed higher sensitivity than junior radiologists, suggesting their potential utility as adjunctive tools to help reduce oversight in less experienced settings. However, given that DL assistance did not significantly elevate overall human performance, current models act primarily as supplementary aids rather than definitive clinical tools. Future prospective multimodal studies are warranted to validate these findings and optimize clinical integration.