AI interventions in undergraduate health professions education show positive effects on satisfaction, confidence, and knowledge
This is a systematic review and meta-analysis of randomized controlled trials examining artificial intelligence (AI) interventions in undergraduate health professions education. The population comprised undergraduate health professions students, with a total of 4,911 participants across the included studies; the setting was undergraduate health professions education programs. The interventions involved various AI technologies, subcategorized by type and educational function: LLM-based personalized learning aids, LLM content generators, NLP chatbots, and non-LLM adaptive learning platforms. The comparator was standard educational interventions. No details on intervention intensity, duration, or protocol (the educational analogue of dosing) were reported.
No single primary outcome was specified in the meta-analysis. Key secondary outcomes included satisfaction, confidence, and theoretical knowledge. For satisfaction, the analysis included 430 participants and found a positive effect, with a standardized mean difference (SMD) of 0.93 (95% confidence interval [CI] 0.40 to 1.46). For confidence, the analysis included 609 participants and found a positive effect, with an SMD of 0.91 (95% CI 0.54 to 1.29). For theoretical knowledge, the analysis included 955 participants and found a positive effect, with an SMD of 0.53 (95% CI 0.13 to 0.94). All three outcomes favored the AI interventions.
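To make the SMD figures above concrete: an SMD is typically computed as the between-group mean difference divided by the pooled standard deviation, often with Hedges' small-sample correction. The sketch below is illustrative only; the group means, SDs, and sample sizes are hypothetical and are not taken from the included trials.

```python
import math

def hedges_g(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference (Hedges' g) with an approximate 95% CI.

    m1, sd1, n1: mean, SD, and size of the intervention group (hypothetical).
    m2, sd2, n2: mean, SD, and size of the control group (hypothetical).
    """
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                      # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)         # Hedges' small-sample correction
    g = j * d
    # Approximate standard error of g, then a normal-theory 95% CI
    se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)))
    return g, g - 1.96 * se, g + 1.96 * se

# Hypothetical example: 30 students per arm, scores 80 (SD 10) vs 75 (SD 10)
g, lo, hi = hedges_g(80, 75, 10, 10, 30, 30)
```

With these made-up inputs, g is roughly 0.49 and the CI spans the null, illustrating why small single trials in this literature are individually inconclusive even when pooled effects look large.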
Safety and tolerability findings were not reported. The review did not include data on adverse events, serious adverse events, discontinuations, or overall tolerability of the AI interventions.
These results should be compared with prior landmark studies in educational technology only with caution. The current meta-analysis synthesizes evidence from multiple trials, but the certainty of the evidence is low to very low and the effects are inconsistent. Prior studies in this area have often been small or have shared the methodological limitations identified here.
Key methodological limitations include a high risk of bias in most studies, primarily due to poor allocation concealment and blinding. The certainty of evidence ranged from low to very low. There was substantial heterogeneity and wide confidence intervals, and prediction intervals frequently crossed the null. No studies assessed Kirkpatrick levels 3 or 4, which relate to behavior change and clinical outcomes. The effects are inconsistent and not reliably reproducible.
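The point that "prediction intervals frequently crossed the null" can be illustrated with the usual random-effects arithmetic: a 95% prediction interval widens the pooled CI by the between-study variance (tau-squared), so a pooled SMD can be clearly positive while the predicted effect in a new setting still includes zero. The sketch below uses the reported satisfaction SMD of 0.93 (SE derived from its CI), but the tau-squared value is hypothetical, chosen only to show the mechanism; a normal quantile is used where common variants use a t-distribution.

```python
import math
from statistics import NormalDist

def prediction_interval(mu, se_mu, tau2):
    """Approximate 95% prediction interval for a random-effects meta-analysis.

    mu:    pooled effect estimate (SMD)
    se_mu: standard error of the pooled estimate
    tau2:  between-study variance (hypothetical here)
    """
    z = NormalDist().inv_cdf(0.975)             # ~1.96
    half = z * math.sqrt(tau2 + se_mu**2)       # widen by heterogeneity
    return mu - half, mu + half

# Pooled satisfaction SMD 0.93, SE ~0.27 (from the 0.40-1.46 CI);
# tau2 = 0.30 is a hypothetical illustration of substantial heterogeneity.
lo, hi = prediction_interval(0.93, 0.27, 0.30)
```

Even though the pooled CI excludes zero, the resulting prediction interval (roughly -0.27 to 2.13 under these assumptions) crosses the null, which is exactly the inconsistency the review flags.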
The practical implication is that AI applications in undergraduate health professions education should be adopted cautiously and on a trial basis. The findings suggest potential benefits for student satisfaction, confidence, and theoretical knowledge, but these are surrogate outcomes. Clinicians and educators should not infer causation, overstate effects given the high risk of bias, or assume improved clinical outcomes from these surrogate measures.
Key questions remain unanswered. It is not reported whether AI interventions affect real-world clinical performance or patient outcomes. The long-term durability of the observed effects is unknown. Future research should address the high risk of bias, improve study design, and assess higher-level Kirkpatrick outcomes.