
LLM ensemble shows potential for PCI decision support in retrospective analysis of 93 patients

Key Takeaway
Treat LLM ensembles for PCI decision support as early-stage research; prospective validation is needed before clinical use.

A retrospective cohort study at Ruijin Hospital evaluated 15 large language model (LLM) versions for percutaneous coronary intervention (PCI) decision support using data from 93 patients with moderate-to-severe coronary stenosis. The study assessed LLM behavioral patterns and performance metrics without a direct clinical comparator. Distinct behavioral patterns emerged across LLM families, with Llama-3.3-70B-Instruct making more aggressive recommendations and Grok-3 being more conservative.

The main finding was that advanced ensemble strategies surpassed individual models. An adaptive grouped ensemble achieved an F1 score of 0.921, compared to 0.807 for the best single model and 0.794 for a standard ensemble. Performance was significantly modulated by patient age, with Holm-adjusted analysis identifying performance gaps at age cut-points of 73, 75, and 76 years, and a likelihood ratio test confirming a significant age-score interaction (p = 0.00089).
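For readers unfamiliar with the Holm adjustment mentioned above, the sketch below applies the standard Holm step-down procedure to a list of raw p-values. The function name `holm_adjust` and the example p-values are hypothetical and are not taken from the study, whose analysis code is not described in this summary.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the raw p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1);
        # rank is 0-based here, so the multiplier is (m - rank).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Keep adjusted values non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three comparisons (illustrative only).
raw = [0.010, 0.040, 0.030]
adjusted = holm_adjust(raw)
for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw={p_raw:.3f}  holm={p_adj:.3f}")
```

A comparison is then declared significant when its adjusted p-value falls below the chosen threshold (commonly 0.05), which controls the family-wise error rate across all comparisons tested together.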

Safety and tolerability data were not reported. Key limitations include the retrospective, single-center design with a small sample size of 93 patients. The study does not report clinical outcomes from using LLM recommendations or compare them to actual clinical decisions or patient outcomes. The performance metrics (F1 scores) represent model classification agreement, not clinical efficacy.
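To make concrete why an F1 score measures agreement with reference labels rather than patient benefit, the sketch below computes F1 for a binary "recommend PCI / do not recommend PCI" task. The labels are hypothetical, and the summary does not specify what the study used as its reference standard, so this is an illustration of the metric only.

```python
def f1_score(reference, predicted):
    """F1 for binary labels, where 1 = recommend PCI and 0 = do not."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # agreed positives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # over-recommendation
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed recommendation
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # no positives in either list; F1 is undefined, report 0
    # Algebraically equal to the harmonic mean of precision and recall.
    return 2 * tp / denominator


# Hypothetical reference labels and model outputs for eight patients.
reference = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(reference, predicted)
print(f"F1 = {score:.3f}")
```

Nothing in this calculation involves what happened to the patients afterwards, which is why a high F1 cannot by itself support a claim of clinical efficacy.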

For practice, this early evidence suggests tailored LLM ensembles are technically feasible for PCI decision support and may improve robustness. However, the authors note that multicenter prospective validation and multimodal integration are needed before any clinical deployment. Clinicians should interpret these findings as preliminary computational research rather than evidence supporting clinical implementation.

Study Details

Study type: Cohort
Evidence: Level 3
Published: Apr 2026
Original Abstract
Background: Clinical decision-making for percutaneous coronary intervention (PCI) in patients with moderate-to-severe coronary stenosis is complex and sensitive to data completeness and guideline interpretation. We aimed to evaluate large language models (LLMs) for PCI support and to develop an ensemble framework for this complex decision setting.

Methods: In this retrospective study, 15 LLM versions were evaluated using data of 93 patients from Ruijin Hospital. A hierarchical framework was employed to assess performance across varying data inputs. To optimize accuracy, advanced grouped ensemble strategies were developed and validated via nested repeated stratified 5-fold cross-validation. Probabilistic reliability and clinical utility were quantified through calibration plots and Decision Curve Analysis (DCA). Statistical robustness was ensured by bootstrap ROC-AUC comparisons with Holm-Bonferroni adjustment and restricted cubic spline modeling to analyze age-performance interactions.

Results: Distinct behavioral patterns emerged across LLM families: Llama-3.3-70B-Instruct made more aggressive recommendations, whereas Grok-3 was more conservative. Holm-adjusted analysis identified significant performance gaps at age cut-points of 73, 75, and 76. A significant age-score interaction (LRT p = 0.00089) confirmed that patient age modulates model performance. The advanced ensemble strategies surpassed individual models, with an adaptive grouped ensemble achieving an F1 score of 0.921, compared to 0.807 for the best single model and 0.794 for a standard ensemble.

Conclusion: Tailored LLM ensembles are feasible for PCI decision support and can improve robustness. Further multicenter prospective validation and multimodal integration are needed before clinical deployment.