
Preclinical study tests AI pipeline for glaucoma detection using Harvard dataset

Key Takeaway
A multi-agent AI system may improve glaucoma detection accuracy on the cases a classifier is least sure about, based on preliminary results from a single dataset.

This is a preclinical study describing a two-tier diagnostic pipeline for glaucoma detection applied to the Harvard Glaucoma Detection and Progression dataset. The first tier is a semi-supervised EfficientNetV2S classifier, and the second tier is a multi-agent system built on MedGemma 4B with three specialist agents deliberating over three rounds. The comparator was the classifier alone.
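The two-tier routing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the probability band used to flag a case (0.2 to 0.8), the function names, the agent interface, and the majority vote used to settle the final label are hypothetical, since the summary does not state how cases were flagged or how the three agents reached their final diagnosis.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a specialist agent: it sees the previous
# round's votes and returns 0 (no glaucoma) or 1 (glaucoma).
# This is NOT the authors' interface to MedGemma 4B.
Agent = Callable[[Sequence[int]], int]


def is_uncertain(prob: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Flag a case whose classifier probability falls in an assumed ambiguous band."""
    return low < prob < high


def deliberate(agents: List[Agent], rounds: int = 3) -> int:
    """Run the agents for a fixed number of rounds and return a final label."""
    votes: List[int] = []
    for _ in range(rounds):
        # Each agent sees the votes from the previous round (empty in round 1).
        votes = [agent(votes) for agent in agents]
    # Assumed aggregation rule: strict majority of the last round's votes.
    return int(sum(votes) * 2 > len(votes))


def route_case(prob: float, agents: List[Agent]) -> dict:
    """Tier 1 decides confident cases; uncertain cases go to the agents (tier 2)."""
    if is_uncertain(prob):
        return {"label": deliberate(agents), "source": "agents"}
    return {"label": int(prob >= 0.5), "source": "classifier"}
```

In this sketch a case with classifier probability 0.95 is labeled by the classifier directly, while a case at 0.5 is handed to the agents. The gating band is the main design choice: widening it sends more cases to the slower second tier.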

The authors report that the classifier alone had an AUC of 0.84 on 150 held-out test patients. On 124 flagged cases, the agent system achieved 100% sensitivity (55 glaucoma cases detected, zero missed) and 89.5% overall accuracy (111 correct out of 124), compared to the classifier's 73.4% accuracy (91 correct out of 124). An uncertainty analysis of the classifier's output probability showed 96.3% accuracy for confident predictions (n=27) and 74.0% for uncertain predictions (n=123). The agent system corrected 32 cases the classifier had misclassified and introduced 12 new errors, a net improvement of 20 cases.
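The reported figures can be cross-checked with simple arithmetic. The short script below uses only numbers quoted in this summary; the variable names are illustrative.

```python
# Counts reported for the 124 flagged cases.
flagged = 124
classifier_correct = 91
agent_correct = 111
fixed, new_errors = 32, 12

# Net gain: errors the agents fixed minus errors they introduced.
net_gain = fixed - new_errors
classifier_acc = classifier_correct / flagged
agent_acc = agent_correct / flagged

# Reported accuracy for confident vs. uncertain classifier predictions.
confident_acc, uncertain_acc = 0.963, 0.740
gap_pp = (confident_acc - uncertain_acc) * 100

print(f"net gain: {net_gain} cases")                  # net gain: 20 cases
print(f"classifier accuracy: {classifier_acc:.1%}")   # classifier accuracy: 73.4%
print(f"agent accuracy: {agent_acc:.1%}")             # agent accuracy: 89.5%
print(f"confidence gap: {gap_pp:.1f} points")         # confidence gap: 22.3 points
```

The counts are internally consistent: the classifier's 91 correct cases plus the net gain of 20 equals the agent system's 111 correct cases.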

The authors acknowledge key limitations: the results come from a single training run without variance estimates, the evidence is preliminary, and findings from one specific dataset may not generalize. No safety data were reported. For practice, the study suggests that uncertainty-gated routing to vision-language model agents may improve diagnostic accuracy on the cases where automated classifiers are least reliable. No causal claims are made, and the results should be interpreted as preliminary.

Study Details

Sample size: n = 27
Evidence: Level 5
Published: Apr 2026
Original Abstract
Automated glaucoma screening from optical coherence tomography (OCT) faces two persistent challenges: scarcity of expert-labeled data and unreliable model predictions on diagnostically ambiguous cases. We present a two-tier diagnostic pipeline that addresses both. In the first tier, an EfficientNetV2S classifier trained under a semi-supervised pseudo-supervisor framework achieves 0.84 AUC on 150 held-out test patients from the Harvard Glaucoma Detection and Progression dataset, using only 350 labeled training samples out of 700. In the second tier, 124 flagged cases are routed to a multi-agent system built on MedGemma 4B, where three specialist agents deliberate over three rounds before rendering a final diagnosis. On these flagged cases, the agent system achieves 100% sensitivity (detecting all 55 glaucoma cases with zero missed diagnoses) and 89.5% overall accuracy (111/124), compared to the classifier's 73.4% (91/124). Uncertainty analysis confirms that the classifier's output probability reliably separates confident predictions (96.3% accuracy, n = 27) from uncertain ones (74.0%, n = 123), producing a 22-percentage-point gap that serves as a triage signal. The agents fix 32 cases the classifier misclassifies while introducing 12 new errors, yielding a net improvement of 20 cases. These results are from a single training run without variance estimates and should be interpreted as preliminary evidence that uncertainty-gated routing to vision-language model agents can meaningfully improve diagnostic accuracy on the cases where automated classifiers are least reliable.