
Machine learning model using routine clinical indicators predicts coronary heart disease risk

Key Takeaway
A stacked ensemble machine learning model built from routine clinical indicators showed high discrimination for CHD risk in both internal (AUC 0.977) and external (AUC 0.929) validation.

This retrospective model development and validation study evaluated a stacked ensemble machine learning model for predicting coronary heart disease (CHD) risk. The model was developed on the Framingham Heart Study cohort and externally validated on a retrospective hospital cohort (2024–2025). It was built from routine clinical indicators: ten variables were retained after feature selection, and systolic blood pressure, age, total cholesterol, and fasting glucose were identified as the leading contributors.

The Framingham cohort (n = 4,240) was split 7:3 for training and internal validation. Internal validation yielded an AUC of 0.977, accuracy of 0.942, and F1 score of 0.944. External validation in the hospital cohort (n = 200) yielded an AUC of 0.929 and accuracy of 0.885.
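For readers less familiar with these metrics, the following is a minimal sketch of how AUC, accuracy, and F1 are computed from true labels and predicted risk scores. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions; they are not taken from the study, which does not report its threshold or code.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1 from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy data (hypothetical, not study data): 1 = CHD, 0 = no CHD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
THRESHOLD = 0.5  # assumed cut-off for turning scores into predictions
y_pred = [1 if s >= THRESHOLD else 0 for s in y_score]

auc = roc_auc(y_true, y_score)
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"AUC={auc:.4f} accuracy={accuracy:.3f} F1={f1:.3f}")
```

On this toy data the sketch gives an AUC of 0.9375 and an accuracy and F1 of 0.75 each. AUC depends only on how cases are ranked, while accuracy and F1 depend on the chosen threshold, which is one reason the three numbers can differ for the same model.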

As a prediction-model study, it reported no safety or tolerability outcomes such as adverse events or discontinuations. The analysis was limited to the predictive performance of the model across the two cohorts.

The model showed strong discrimination for CHD risk and retained most of its performance in the external cohort, although that cohort was small (n = 200) and retrospective. The clinical utility of this machine learning approach for cardiovascular risk assessment warrants further investigation in prospective settings.

Study Details

Study type: Cohort
Evidence: Level 3
Published: Apr 2026
Original Abstract

Background: To develop and externally validate a coronary heart disease (CHD) risk model from routine clinical indicators and identify key predictors.

Methods: The Framingham Heart Study cohort (n = 4,240) was used. Missing values and outliers were handled, and class imbalance was corrected with SMOTEENN/SMOTETomek. Data were split 7:3 for training and internal validation. A two-tier feature selection (chi-square, mutual information, ANOVA F-test) retained ten variables. A stacked ensemble of gradient boosting, random forest, and XGBoost with a logistic-regression meta-learner was trained. Performance was measured by AUC, accuracy, precision, recall, and F1. External validation used a retrospective hospital cohort (n = 200; 2024–2025). Model explanations were derived with SHAP.

Results: Internal validation yielded AUC 0.977 and accuracy 0.942 (F1: 0.944). External validation achieved AUC 0.929 and accuracy 0.885. SHAP identified systolic blood pressure, age, total cholesterol, and fasting glucose as leading contributors, with plausible nonlinear effects and interactions.

Conclusion: A model built from routinely available measures demonstrates strong discrimination for CHD risk and generalizes to an external cohort, offering a clinically interpretable tool for cardiovascular risk assessment.
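To make the 7:3 split in the Methods concrete, the following is a minimal sketch of a stratified hold-out split in plain Python. It rests on several assumptions: the abstract does not say whether the split was stratified or whether resampling was applied before or after it, and the function name `stratified_split`, the seed, and the count of 640 positive cases are hypothetical values chosen only for illustration.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), holding out roughly
    `test_fraction` of each class so both parts keep the same class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label vector with the cohort size from the abstract (n = 4,240).
# The 640 positives are an assumed figure, not a number reported by the study.
labels = [1] * 640 + [0] * 3600
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=0)
print(len(train_idx), len(test_idx))
```

With 4,240 records, a 7:3 split leaves 2,968 for training and 1,272 for internal validation before any resampling changes the class counts.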