Machine learning models show moderate to high accuracy for predicting variceal bleeding in cirrhosis patients
This systematic review and meta-analysis evaluated the performance of machine learning prediction models for esophageal variceal bleeding (EVB) and esophagogastric variceal bleeding (EGVB) in patients with liver cirrhosis. The analysis included 21 studies with a total of 7,011 patients, though the specific clinical settings and geographic locations were not reported. The population consisted exclusively of patients with liver cirrhosis, with 1,412 patients (20.14%) developing EVB and 733 patients (10.45%) developing EGVB during the study periods. Follow-up durations were not consistently reported across the included studies.
The intervention examined was machine learning prediction models, developed using various input variables including clinical features, radiomics, endoscopic features, or combinations of these. No specific comparator or standard prediction method was reported for comparison in this meta-analysis. The models were designed to predict the occurrence of EVB or EGVB in cirrhosis patients, and their performance was assessed in validation sets; treatment outcomes were not evaluated.
For the primary outcome of predictive performance for EVB, the pooled c-index across studies was 0.85 (95% CI 0.77-0.92), with a sensitivity of 0.93 (95% CI 0.87-0.96) and a specificity of 0.66 (95% CI 0.46-0.82). For EGVB prediction, discrimination was slightly higher, with a pooled c-index of 0.89 (95% CI 0.85-0.94); sensitivity was lower at 0.77 (95% CI 0.66-0.85), while specificity was higher at 0.81 (95% CI 0.67-0.90). These results indicate moderate to high discriminatory ability, although confidence intervals were wide for some estimates, particularly specificity for EVB prediction.
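The sensitivity/specificity trade-off between the two outcomes can be made more concrete with likelihood ratios, which express how much a positive or negative prediction shifts the odds of bleeding. The meta-analysis does not report likelihood ratios; the Python sketch below simply derives them from the pooled point estimates above. This is an approximation, assumed for illustration only, because dividing pooled point estimates ignores their joint uncertainty and correlation.

```python
# Illustrative sketch (assumption): likelihood ratios computed from the pooled
# point estimates reported in the meta-analysis. These are NOT pooled LRs;
# they ignore the confidence intervals and the sensitivity/specificity correlation.

def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary prediction with the given accuracy."""
    lr_pos = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_pos, lr_neg


# Pooled (sensitivity, specificity) point estimates from the summary above.
pooled = {
    "EVB": (0.93, 0.66),
    "EGVB": (0.77, 0.81),
}

for outcome, (sens, spec) in pooled.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{outcome}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

This prints an LR+ of about 2.74 and an LR- of about 0.11 for EVB, and about 4.05 and 0.28 for EGVB: the EVB models lower the odds of bleeding substantially after a negative prediction, whereas the EGVB models raise the odds more after a positive one.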
The key secondary analyses were subgroup analyses by the type of input variable used in the model. For EVB prediction, models using clinical features alone achieved a c-index of 0.84 (95% CI 0.80-0.88), radiomics alone 0.82 (95% CI 0.69-0.96), combined radiomics and clinical features 0.78 (95% CI 0.67-0.89), and endoscopic features 0.97 (95% CI 0.95-1.00). For EGVB prediction, clinical feature models showed a c-index of 0.91 (95% CI 0.86-0.96), while combined radiomics and clinical features achieved 0.85 (95% CI 0.75-0.96). The endoscopic feature models demonstrated particularly high discrimination for EVB prediction, although this estimate was based on only a small number of studies.
Safety and tolerability data were not reported in this meta-analysis, as it focused on prediction model performance rather than therapeutic interventions. The analysis did not include information on adverse events, serious adverse events, or discontinuations related to model implementation or the diagnostic procedures used to obtain input variables.
Compared with traditional risk stratification methods for variceal bleeding in cirrhosis, such as the Child-Pugh score, MELD score, or platelet count, the reported c-indices suggest that these machine learning models may offer better discrimination. However, this meta-analysis did not directly compare the models with established clinical prediction rules. The pooled performance metrics are comparable to or exceed those reported for some existing prediction tools, but this comparison is indirect, and head-to-head validation studies are needed.
Methodological limitations include the small number of original studies (21), which may affect the precision and generalizability of the pooled estimates. The analysis did not examine publication bias or between-study heterogeneity in detail. Most included studies were likely retrospective, which introduces potential biases in data collection and model development. The wide confidence intervals for some performance metrics, particularly specificity for EVB prediction (0.46-0.82), indicate substantial uncertainty in these estimates.
Clinically, these findings suggest that machine learning approaches show promise for risk stratification of variceal bleeding in cirrhosis, potentially identifying high-risk patients who might benefit from more intensive monitoring or prophylactic interventions. The high sensitivity for EVB prediction (0.93) means that few patients who go on to bleed are missed, which could make these models useful for ruling out EVB in patients classified as low risk; the excellent discrimination of endoscopic feature models (c-index 0.97) supports incorporating endoscopic information when it is available. However, these models require external validation in diverse clinical settings before implementation.
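Whether a negative prediction can safely rule out bleeding depends on prevalence as well as sensitivity, through the negative predictive value (NPV). The meta-analysis does not report predictive values; the Python sketch below applies Bayes' theorem to the pooled EVB sensitivity and specificity, assuming, purely for illustration, a prevalence equal to the 20.14% EVB proportion in the pooled cohort. The constant name is hypothetical, and real-world prevalence will vary by setting, etiology, and follow-up.

```python
# Illustrative sketch (assumption): positive and negative predictive values
# from sensitivity, specificity and an ASSUMED prevalence, via Bayes' theorem.

def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary prediction at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(bleed | predicted positive)
    npv = true_neg / (true_neg + false_neg)   # P(no bleed | predicted negative)
    return ppv, npv


# Hypothetical illustrative prevalence: borrows the 20.14% EVB proportion
# reported for the pooled cohort; it is not a validated clinical estimate.
ASSUMED_EVB_PREVALENCE = 0.2014

ppv, npv = predictive_values(0.93, 0.66, ASSUMED_EVB_PREVALENCE)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions the sketch prints PPV = 0.41 and NPV = 0.97, consistent with the rule-out interpretation: roughly 3 in 100 patients predicted negative would still bleed, while fewer than half of those predicted positive would. At a fixed sensitivity and specificity, a lower prevalence raises NPV and lowers PPV.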
Unanswered questions include how these models perform compared to existing clinical prediction rules in prospective studies, whether they improve clinical outcomes when used to guide management decisions, and what the optimal model variables and algorithms are for different clinical contexts. The cost-effectiveness, implementation challenges, and effect on healthcare utilization of ML-based prediction tools also require investigation. Additionally, the generalizability to different cirrhosis etiologies and stages remains uncertain.