This systematic review and meta-analysis evaluated studies of large language models (LLMs) in ophthalmic education, training, assessment, and clinical decision support relevant to primary care physicians. The evidence base is dominated by vignette-based benchmarks, comparative scoring studies, and evaluations in small-sample or controlled settings. Prospective real-world validation using learner transfer, clinical behavior change, or patient outcomes remains scarce.
LLM applications identified include serving as "cognitive apprenticeship" partners for clinical reasoning, with specific uses in triage and reasoning drills, virtual patient interviewing, and support for structured referrals, documentation, and patient education. Most studies benchmarked LLM outputs against expert consensus or guidelines, though scoring rubrics and reference standards varied widely. Some reports noted that adding clinical photographs to prompts could reduce model accuracy.
Key limitations include substantial heterogeneity across studies, rapid model iteration creating reproducibility challenges, multimodal instability, and safety risks such as hallucination, bias, and automation bias. The review concludes that with clearly defined task scopes and robust safeguards, LLMs may improve the accessibility and efficiency of primary care ophthalmic education, but should augment rather than replace expert judgment.
Background
Primary care physicians (PCPs) are a critical first contact for eye-health screening, risk stratification, and referral, yet ophthalmology training and point-of-care support in primary care remain insufficient. Recent advances in generative artificial intelligence (generative AI), particularly large language models (LLMs), may help address these gaps through conversational, scenario-based learning and structured feedback. However, the educational effectiveness, reproducibility, and safety boundaries of LLM-enabled tools in primary care ophthalmology remain unclear.

Methods
We conducted a systematic review of studies evaluating or applying LLMs in ophthalmic education, training, assessment, or primary care–relevant clinical support. PubMed, Web of Science Core Collection, and Scopus were searched from January 1, 2020 to December 31, 2025 using combined terms related to LLMs/generative AI, ophthalmology, and education or assessment. Citation chaining was also performed to reduce the risk of missed studies. Two reviewers independently screened records and extracted data.

Results
The evidence base is dominated by vignette-based benchmarks, comparative scoring studies, and evaluations conducted in small-sample or controlled settings; prospective real-world validation using learner transfer, clinical behavior change, workflow impact, or patient outcomes remains scarce. Across studies, LLMs can serve as "cognitive apprenticeship" partners by externalizing clinical reasoning and enabling repeated practice in key-feature extraction, differential diagnosis, risk stratification, and referral-threshold decisions. Applications include triage and reasoning drills, virtual patient interviewing, and support for structured referrals, documentation, and patient education, often strengthened by retrieval-augmented generation. Most studies benchmarked outputs against expert consensus or guidelines, but scoring rubrics and reference standards varied widely, limiting cross-study comparability. Some reports noted that adding clinical photographs could reduce accuracy, suggesting current multimodal models are better suited to history-based reasoning than to fine-grained image interpretation. Limitations include substantial between-study heterogeneity, rapid model iteration, reproducibility challenges, multimodal instability, and safety risks such as hallucination, bias, and automation bias. Commonly recommended safeguards include retrieval grounding, source attribution, red-flag checklists, and human-in-the-loop review.

Conclusion
With clearly defined task scopes and robust safeguards, LLMs may improve the accessibility and efficiency of primary care ophthalmic education, but should augment rather than replace expert judgment. Future work should prioritize pragmatic multicenter trials, mixed-method implementation studies, and standardized cross-lingual evaluations to define safe and effective implementation pathways.