
AI interventions in undergraduate health professions education show positive effects on satisfaction, confidence, and knowledge

AI Tutors Show Promise for Med Students, But Evidence Is Weak

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Consider AI tools for health professions education cautiously due to low-certainty evidence on surrogate outcomes.

This is a systematic review and meta-analysis of randomized controlled trials examining artificial intelligence (AI) interventions in undergraduate health professions education. The population consisted of undergraduate health professions students, with a total sample size of 4911 participants across the 66 included trials. The setting was undergraduate health professions education programs. The interventions covered various AI technologies, subcategorized by technology type and educational function into 13 subcategories, including LLM-based personalized learning aids, LLM content generators, NLP chatbots, and non-LLM adaptive learning platforms. The comparator was standard educational interventions. No specific dosing or protocol details for the AI interventions were reported.

The primary outcome was not reported in the meta-analysis. Key secondary outcomes included satisfaction, confidence, and theoretical knowledge. The figures that follow are for LLM-based personalized learning aids, the subcategory with the largest evidence base. For satisfaction, the analysis included 430 participants (7 studies) and reported a standardized mean difference (SMD) of 0.93 (95% confidence interval [CI] 0.40 to 1.46; I²=74%). For confidence, the analysis included 609 participants (7 studies) and reported an SMD of 0.91 (95% CI 0.54 to 1.29; I²=64%). For theoretical knowledge, the analysis included 955 participants (12 studies) and reported an SMD of 0.53 (95% CI 0.13 to 0.94; I²=86%). All three estimates favored the AI intervention, and all were rated as very low certainty.
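As a rough check on the reported figures, the standard error and z-statistic behind each pooled SMD can be back-calculated from its 95% CI. The sketch below assumes the intervals are symmetric normal-approximation (Wald) intervals, which the source does not state explicitly; the function name `se_and_z` and the dictionary `REPORTED` are illustrative, not part of the review.

```python
from statistics import NormalDist

# Pooled estimates as reported in the review: (SMD, CI lower, CI upper)
REPORTED = {
    "satisfaction": (0.93, 0.40, 1.46),
    "confidence": (0.91, 0.54, 1.29),
    "theoretical knowledge": (0.53, 0.13, 0.94),
}

# Normal quantile for a two-sided 95% interval (about 1.96)
Z_95 = NormalDist().inv_cdf(0.975)


def se_and_z(smd, lower, upper):
    """Back-calculate SE, z, and a two-sided p-value from an SMD and its 95% CI.

    ASSUMPTION: the CI is a symmetric Wald interval, SMD +/- Z_95 * SE.
    """
    se = (upper - lower) / (2 * Z_95)
    z = smd / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


if __name__ == "__main__":
    for name, (smd, lower, upper) in REPORTED.items():
        se, z, p = se_and_z(smd, lower, upper)
        print(f"{name}: SE ~ {se:.2f}, z ~ {z:.2f}, two-sided p ~ {p:.4f}")
```

Under that assumption the z-statistics come out at roughly 3.4, 4.8, and 2.6, so each estimate clears the conventional 0.05 threshold. Statistical significance says nothing about risk of bias or heterogeneity, which is why the certainty ratings matter more than the p-values here.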

Safety and tolerability findings were not reported. The review did not include data on adverse events, serious adverse events, discontinuations, or overall tolerability of the AI interventions.

These results should be compared cautiously to prior landmark studies in educational technology. The current meta-analysis synthesizes evidence from multiple trials, but the certainty of the evidence is low to very low, and the effects are inconsistent. Prior studies in this area have often been small or have had methodological limitations similar to those identified here.

Key methodological limitations include a high risk of bias in most studies, primarily due to poor allocation concealment and blinding. The certainty of evidence ranged from low to very low. There was substantial heterogeneity and wide confidence intervals, and prediction intervals frequently crossed the null. No studies assessed Kirkpatrick levels 3 or 4, which relate to behavior change and clinical outcomes. The effects are inconsistent and not reliably reproducible.
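To illustrate why a prediction interval can cross the null even when the confidence interval does not, the sketch below applies the commonly used formula, pooled estimate ± t(k−2) × sqrt(τ² + SE²), to the theoretical knowledge result (SMD 0.53, 95% CI 0.13 to 0.94, 12 studies). The review's abstract does not report the between-study standard deviation τ, so the τ values here are purely hypothetical, and the standard error is back-calculated assuming a symmetric normal-approximation CI.

```python
import math


def prediction_interval(mu, se, tau, t_crit):
    """95% prediction interval for the effect in a new setting.

    mu: pooled estimate; se: its standard error;
    tau: between-study standard deviation; t_crit: t quantile with k - 2 df.
    """
    half_width = t_crit * math.sqrt(tau**2 + se**2)
    return mu - half_width, mu + half_width


# From the review: theoretical knowledge, 12 studies
MU = 0.53
SE = (0.94 - 0.13) / (2 * 1.96)  # ASSUMPTION: symmetric Wald CI

# 97.5th percentile of Student's t with 12 - 2 = 10 degrees of freedom
T_CRIT_DF10 = 2.228

if __name__ == "__main__":
    # HYPOTHETICAL tau values; the source does not report tau
    for tau in (0.1, 0.3, 0.6):
        lower, upper = prediction_interval(MU, SE, tau, T_CRIT_DF10)
        crosses = "crosses" if lower < 0 < upper else "excludes"
        print(f"tau={tau}: PI {lower:.2f} to {upper:.2f} ({crosses} the null)")
```

The confidence interval describes uncertainty about the average effect, while the prediction interval also absorbs the spread between settings. With even moderate between-study variation, the interval for a single new classroom can include zero or negative effects.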

The practical implication, as the review's authors conclude, is that AI applications in undergraduate health professions education should be used cautiously and on a trial basis. The findings suggest potential benefits for student satisfaction, confidence, and theoretical knowledge, but these are surrogate outcomes. Clinicians and educators should not overstate the effects given the high risk of bias, and should not assume that gains on these surrogate measures translate into better clinical performance.

Key questions remain unanswered. It is not reported whether AI interventions affect real-world clinical performance or patient outcomes. The long-term durability of the observed effects is unknown. Future research should address the high risk of bias, improve study design, and assess higher-level Kirkpatrick outcomes.

The Promise of AI in Medical Training

Picture this: You are a first-year medical student staring at a textbook on heart failure. The material is dense. The diagrams blur together. You wish you had a tutor who could explain things in a different way, at your own pace, at 2 a.m.

That is exactly what artificial intelligence promises to deliver.

A major new review of 66 randomized trials, published in JMIR Medical Education, looked at how well AI tools work for teaching future doctors, nurses, and other health professionals. The results are encouraging in some ways but far from settled.

Medical education is under pressure. Classrooms are packed with more information than ever. Students need to learn faster. Clinical rotations are shorter. And the science keeps changing.

Traditional lectures and textbooks cannot always keep up. AI tools, especially those powered by large language models (the same technology behind ChatGPT), offer a way to personalize learning. They can answer questions, generate practice problems, and adapt to each student's weak spots.

But schools need proof that these tools actually work before spending money and time on them.

The review covered 66 trials with nearly 5,000 students. Most studies tested AI tools in subjects like anatomy, pharmacology, and clinical reasoning.

The clearest results came from AI tools that acted like personal tutors. These systems used large language models to give students customized feedback and practice.

Students who used these tools reported higher satisfaction. They also felt more confident in their skills. And they scored better on tests of theoretical knowledge.

But here is the twist. The effects were not consistent. Some studies showed big improvements. Others showed little to no benefit.

Think of a large language model as a supercharged autocomplete. It has been trained on enormous amounts of text, which can include medical textbooks, journal articles, and clinical guidelines. When a student asks a question, the AI predicts a likely answer based on the patterns it has learned.

A good AI tutor does not just spit out facts. It can explain a concept in simpler terms. It can give examples. It can quiz the student and point out mistakes.

The best part? It never gets tired. It never judges. And it is available 24/7.

The Numbers Behind the Hype

For AI-powered personalized learning aids, the results were statistically significant. That means the improvements were unlikely to be due to chance.

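The review reports these gains as standardized mean differences, which express the gap between groups in standard deviations rather than raw points. Satisfaction scores rose by nearly a full standard deviation (0.93). Confidence scores followed a similar pattern (0.91). Knowledge test scores improved by about half a standard deviation (0.53).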
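One way to make a standardized mean difference more concrete is to ask what share of the comparison group would score below the average student in the AI group. The sketch below does that conversion, assuming scores in both groups are roughly bell-shaped with equal spread, which the source does not confirm; the function name `share_below_average_ai_student` is illustrative.

```python
from statistics import NormalDist


def share_below_average_ai_student(smd):
    """Share of the comparison group scoring below the AI group's average.

    ASSUMPTION: both groups are normally distributed with equal
    standard deviations, so the share equals the normal CDF at the SMD.
    """
    return NormalDist().cdf(smd)


if __name__ == "__main__":
    # SMDs reported in the review for LLM-based personalized learning aids
    for outcome, smd in [
        ("satisfaction", 0.93),
        ("confidence", 0.91),
        ("theoretical knowledge", 0.53),
    ]:
        share = share_below_average_ai_student(smd)
        print(f"{outcome}: SMD {smd} -> about {share:.0%} of the comparison group")
```

Under those assumptions, the satisfaction result would place the average AI-group student above roughly 82% of the comparison group, and the knowledge result above roughly 70%. An effect of zero would give 50%.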

Those numbers sound good. But they come with a major warning label.

The quality of the evidence was rated as very low.

That does not mean the tools do not work. It means the studies were too small, too short, or too poorly designed to trust the results fully.

But There's a Catch

Most of the studies had a high risk of bias. That is a technical way of saying the results might be skewed.

For example, many studies did not hide which students got the AI tool and which got traditional teaching. Students who knew they were using a fancy new technology might have tried harder. That alone could explain some of the improvement.

Also, the studies measured only short-term effects. None looked at whether AI training actually made students better doctors in real clinics. That is a big gap.

What This Means for Students and Schools

If you are a medical student, this does not mean you should ignore AI tools. They can be helpful study aids. But do not assume they will guarantee better grades or clinical skills.

If you are an educator, the message is clear. AI tools are promising but not proven. Use them on a trial basis. Track the results. And do not replace human teachers yet.

The researchers recommend caution. AI applications should be used as supplements, not replacements, for traditional education.

The Limits of This Research

The review itself has limitations. Most studies lasted only a few weeks. Many had fewer than 50 participants. And the AI tools varied widely, from simple chatbots to complex adaptive learning platforms.

No studies looked at whether AI training leads to better patient care. That is the ultimate test, and it is still missing.

What Happens Next

Researchers need larger, longer, and better-designed trials. They need to follow students into clinical practice. They need to see if AI-trained doctors make fewer mistakes or communicate better with patients.

That work will take years. Medical education changes slowly for good reason. Lives depend on it.

For now, AI is a helpful study buddy. It is not a replacement for the hard work of becoming a doctor. And the evidence, while promising, is not strong enough to change how medical schools teach.

The technology will get better. The research will get stronger. But for today, the smart move is to use AI tools with open eyes and realistic expectations.

Study Details

Study type: Meta-analysis
Sample size: n = 4,911
Evidence: Level 1
Published: May 2026

Original Abstract
BACKGROUND: Health professions education faces increasing challenges from rising health care complexity, pedagogical shifts, constrained curricular space, and rapidly expanding knowledge and technological advances. While artificial intelligence (AI) shows promise for transforming health professions education, evidence of its effectiveness remains unclear.

OBJECTIVE: This study synthesized evidence from randomized controlled trials (RCTs) on the effectiveness of AI in undergraduate health professions education.

METHODS: We included RCTs, randomized crossover trials, and cluster RCTs comparing AI against standard educational interventions at the undergraduate level. We excluded quasi-experimental studies and those without clear AI components. We searched PubMed, Cochrane, Embase, Educational Resources Information Center, and Web of Science up to January 26, 2026. Outcomes were categorized according to Kirkpatrick levels; risk of bias was assessed using the Risk Of Bias Instrument for Use in Systematic Reviews for Randomised Controlled Trials tool; random-effects meta-analysis was conducted in RevMan (Cochrane); and certainty of evidence was rated using the Grading of Recommendations, Assessment, Development, and Evaluation approach. AI interventions were subcategorized by technology type and educational functions, yielding 13 subcategories.

RESULTS: Of 39,783 records identified, 66 RCTs (N=4911 participants; 2020-2026) were included. Subcategorized analyses across 7 outcome domains yielded 48 comparisons. Most studies had high risk of bias, mainly due to poor allocation concealment and blinding, and certainty of evidence ranged from low to very low. Large language model (LLM)-based personalized learning aids comprised the largest evidence base and showed positive effects for satisfaction (standardized mean difference [SMD] 0.93, 95% CI 0.40-1.46; 7 studies; 430 participants; I²=74%), confidence (SMD 0.91, 95% CI 0.54-1.29; 7 studies; 609 participants; I²=64%), and theoretical knowledge (SMD 0.53, 95% CI 0.13-0.94; 12 studies; 955 participants; I²=86%), all with very low certainty. Other AI subtypes, including LLM content generators, natural language processing (NLP) chatbots, and non-LLM adaptive learning platforms, showed generally favorable point estimates but substantial heterogeneity and wide CIs that often included no effect. Prediction intervals frequently crossed the null, indicating uncertainty across educational settings. No studies assessed Kirkpatrick levels 3 or 4.

CONCLUSIONS: This review synthesized RCT evidence on AI in undergraduate health professions education by technology type and function, incorporating evidence certainty. Despite the large number of included studies, evidence remains insufficient to inform educational practice. Some AI interventions may improve some learning outcomes, but effects are inconsistent and not reliably reproducible. High risk of bias, heterogeneity, imprecision, and absence of higher-level outcomes limit conclusions. AI applications should therefore be used cautiously and on a trial basis.