Machine learning algorithms predicted stroke risk with high accuracy in a large retrospective cohort

New AI models may help predict stroke risk better than current methods in large data reviews

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Consider machine learning models for stroke risk prediction, but await external validation before clinical implementation.

This retrospective multicenter study analyzed data from 35,859 participants drawn from a high-quality health database to assess stroke risk prediction. The investigators fitted seven machine learning algorithms, including random forest, and compared the result against established models. The primary performance measure was the area under the curve (AUC) for stroke risk prediction.
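As an illustration of what the AUC measures, it can be read as the probability that a randomly chosen participant who had a stroke received a higher predicted risk than a randomly chosen participant who did not. The sketch below computes it by direct pair counting. The function name `auc` and the six toy scores are hypothetical and are not study data; the source does not describe how the authors computed the metric.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for six people (illustration only).
toy_scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = stroke, 0 = no stroke
toy_auc = auc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

In this toy example, eight of the nine case/non-case pairs are ordered correctly. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.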

The random forest model demonstrated the best performance among the evaluated algorithms, achieving an AUC of 0.97. Of the 35,859 participants, 781 (2.2%) experienced a stroke. The dataset was initially incomplete and class imbalanced; therefore, extreme outliers and noise were eliminated, missing values were imputed, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to generate a balanced dataset for analysis.
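To make the oversampling step concrete, here is a minimal, simplified sketch of the idea behind the Synthetic Minority Over-sampling Technique: each synthetic case is placed at a random point on the line between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation, which the source does not describe. The function `smote`, its parameters, and the two toy features are hypothetical, and balancing the full cohort to a 1:1 ratio is an assumption made only for the arithmetic.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Counts from the abstract; a 1:1 target on the full cohort is an assumption.
n_total, n_stroke = 35_859, 781
n_no_stroke = n_total - n_stroke   # 35,078 participants without stroke
n_needed = n_no_stroke - n_stroke  # synthetic stroke cases for a 1:1 ratio

# Hypothetical minority samples as [age, systolic blood pressure].
toy_minority = [[62.0, 150.0], [70.0, 160.0], [66.0, 145.0], [75.0, 170.0]]
toy_synthetic = smote(toy_minority, n_new=6, k=2)
print(n_needed, len(toy_synthetic))
```

Under that 1:1 assumption, roughly 34,297 synthetic stroke cases would be needed alongside the 781 observed ones. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used instead of hand-written code.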

Safety and tolerability data were not reported in this study. Key limitations include the incomplete nature of the dataset, the use of data imputation and oversampling techniques, and the elimination of extreme outliers. The authors note that future studies should further validate and optimize the current model to assess its generalizability across different populations.

The study suggests these tools may facilitate the application of clinical practice guidelines and shared decision-making. However, given the observational nature of the data and the specific preprocessing steps taken, clinicians should interpret these results with caution until external validation confirms the model's performance in diverse settings.

This study looked at how well different computer algorithms could predict the risk of having a stroke. It used data from a high-quality health database involving 35,859 participants. The researchers compared new machine learning methods, including random forest, against models that are currently used in healthcare settings.

The main finding was that the random forest model showed the best performance for predicting stroke risk. In the group studied, 781 participants experienced a stroke, which represented 2.2% of the total population. The new model achieved an AUC score of 0.97, on a scale where 1.0 is perfect and 0.5 is no better than chance, and it outperformed established models.

The study had several limitations that affect how we should view these results. The dataset was incomplete and contained far fewer stroke cases than non-stroke cases, so researchers had to eliminate extreme outliers and use special techniques to balance the data. Missing values were also filled in using statistical methods. Because of these steps, the results may not apply perfectly to every patient or every hospital.

Readers should understand that this is a retrospective analysis, meaning it looked back at past data rather than testing new treatments on people. While the model shows promise, future studies must validate and optimize it to ensure it works well in real-world settings. This research could eventually help doctors apply clinical guidelines better, but it does not mean patients should change their care based on this single study.

What this means for you:
New AI models showed better stroke prediction in data, but need more testing before clinical use.

Study Details

Study type: Cohort
Evidence: Level 3
Published: Apr 2026
Original Abstract
Our study aims to develop a stroke risk prediction model by multiple machine learning algorithms and optimize the model as a stroke risk prediction tool. This retrospective multicenter study derived the original dataset from a high-quality health database. The dataset was incomplete and class imbalanced. Firstly, we eliminated extreme outliers and noises and imputed missing values by appropriate algorithms. We further used Synthetic Minority Over-sampling Technique to generate a balanced dataset. Secondly, we fitted seven algorithms to develop a machine learning-based prediction tool for clinical practice. Overall, 35,859 participants were included, of whom 781 (2.2%) experienced a stroke. The random forest model demonstrated the best performance with high predictive value and discrimination ability. For stroke risk prediction, the AUC of the best-performing model was 0.97. A new random forest algorithms-based stroke risk prediction model using easily obtainable data was developed and outperformed established models. Future studies should further validate and optimize the current model to assess its generalizability and promote the wide application. The utilization of proposed random forest algorithms as an individualized risk prediction model could facilitate the application of clinical practice guidelines and shared decision-making.