Large language models provide automated feedback to improve diagnostic accuracy and report quality in radiology training

AI feedback helps radiology trainees learn faster, review finds

Frontiers in Medicine · Published June 10, 2026 · Editorial oversight: Dr. Amelia Tan, PhD · Internal Medicine & Chronic Disease

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway

LLMs may improve radiology trainee performance, but because the evidence is preliminary, expert oversight is still required.

This systematic review evaluates 7 studies regarding the integration of large language models (LLMs) as tools for automated feedback within medical imaging education. The scope includes assessing educational outcomes such as report quality, diagnostic accuracy, and efficiency in discrepancy detection among radiology residents and fellows.

The synthesis indicates that LLM-generated feedback is well-received by learners, who expressed satisfaction and a preference for hybrid human-AI feedback models. While fine-tuned models generally outperformed general-purpose LLMs, the level of agreement with expert-human consensus varied. The review notes positive trends in learner satisfaction and efficiency, though specific effect sizes were not reported.

Several limitations are noted, including the preliminary nature of the evidence and a lack of large multicenter studies or standardized methodologies. Because the current evidence is limited, LLMs should be viewed as supportive tools rather than replacements for human instruction. In practice, LLMs may assist in training, but expert oversight remains essential to ensure educational quality.

Imagine learning to read X-rays and CT scans. You write a report, and an AI gives you instant feedback on what you missed. That's the promise of large language models (LLMs) in medical imaging education, according to a new review of 7 studies.

The review found that studies of LLM-generated feedback for radiology residents (one study also included fellows) reported better report quality, improved diagnostic accuracy, and greater efficiency in spotting discrepancies. Trainees liked it too, and many said they preferred a hybrid model where AI and human experts work together.

But here's the catch: the evidence is still early and limited. The studies were small, used different methods, and didn't compare LLMs against standard teaching. Also, fine-tuned models did better than general-purpose ones, but their agreement with expert humans varied.

So while AI shows promise as a teaching tool, human oversight remains essential. More research is needed before we know if this approach truly boosts long-term learning.

What this means for you:

AI feedback can help radiology trainees, but human experts are still essential.

Common questions

Does this mean AI can replace teachers in radiology training?

No. The review says evidence is preliminary and limited. Trainees preferred a hybrid model with both AI and human experts. Human oversight is still essential.

How many studies were included in this review?

The review included 7 studies. The authors note that larger multicenter studies are needed for more definitive conclusions.

What did the AI feedback improve for radiology trainees?

The included studies reported enhanced report quality, improved diagnostic accuracy, and increased efficiency in detecting discrepancies. Trainees also reported satisfaction with the feedback.

Are there any risks or side effects of using AI for feedback?

The review did not report any adverse events or safety issues. However, the evidence is limited, and the authors caution that widespread adoption without human oversight is not recommended.

Study Details

Study type: Systematic review

Evidence: Level 1

Published: Jun 2026

Original Abstract

Introduction: Large language models (LLMs) are an emerging form of generative artificial intelligence (AI) with promising applications in medical education, and their ability to provide automated feedback may enhance medical imaging education for trainees. This review aims to systematically examine and synthesize the published literature on the use of LLMs in providing automated feedback in medical imaging education.

Methods: We conducted this systematic review in accordance with the PRISMA 2020 guidelines. A comprehensive search of the PubMed, Scopus, and Embase databases was conducted, covering studies published through January 2026. Our search strategy included keywords related to “feedback, generative artificial intelligence, large language models, radiology, and medical imaging.” Studies were eligible if they examined the use of LLMs to generate automated feedback for medical trainees within medical imaging education. Extracted data were synthesized using descriptive synthesis, with quality appraisal assessed using ROBINS-I and GRADE.

Results: Of 1,003 identified records, 7 met the inclusion criteria. All studies examined the applications of automated LLM feedback in the medical education of radiology residents, with one study also including fellows. Reported educational outcomes included enhanced report quality, improved diagnostic accuracy, and increased efficiency in discrepancy detection. LLM feedback was generally well-received among trainees, with learners expressing satisfaction with the LLM feedback and preferring a hybrid human-AI feedback model. Additionally, fine-tuned models generally showed stronger performance than general-purpose LLMs and demonstrated variable agreement with expert-human consensus.

Conclusion: LLMs show a potentially promising role as supportive tools for providing automated feedback in medical imaging education, alongside human feedback. This includes reported gains in accuracy, efficiency, and learner satisfaction. However, the current published evidence is preliminary and limited. Larger multicenter studies with standardized methods are necessary before widespread adoption can be justified. Our systematic review emphasizes that human expert oversight remains essential, as the current evidence supports preliminary technical feasibility, but not yet definitive educational effectiveness.

Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD420251081394, Identifier CRD420251081394
