A systematic review and meta-analysis evaluated the performance of machine learning algorithms for predicting atrial fibrillation recurrence after catheter ablation. The analysis included 17 studies, though specific population characteristics, comparator approaches, and follow-up duration were not reported. The primary outcome was predictive accuracy measured by Area Under the receiver operating characteristic Curve (AUC).
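For readers unfamiliar with the primary outcome, the sketch below shows one common way to compute an AUC: as the probability that a randomly chosen patient with recurrence receives a higher risk score than a randomly chosen patient without recurrence, with ties counted as one half. The function name `auc_from_scores` and the labels and scores are hypothetical illustrations, not data or code from the reviewed studies.

```python
def auc_from_scores(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = AF recurrence, 0 = no recurrence.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
example_auc = auc_from_scores(labels, scores)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.81 should be read.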
The main finding was a mean overall AUC of 0.81 for machine learning algorithms predicting AF recurrence. Models incorporating complex data modalities reportedly achieved higher accuracy, though specific effect sizes, absolute numbers, and confidence intervals were not reported. No safety or tolerability data were available in the analysis.
Key limitations included substantial inter-study heterogeneity, lack of standardized datasets, and limited generalizability across studies. Because the included studies were observational evaluations of predictive models, the results reflect predictive association rather than causation. Funding sources and conflicts of interest were not reported.
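Inter-study heterogeneity of this kind is often quantified with Cochran's Q and the I² statistic alongside a random-effects pooled estimate. The sketch below uses the DerSimonian-Laird estimator purely as an illustration; the source does not state which pooling model the review used, and the function name `pool_random_effects`, the AUC values, and the standard errors are all hypothetical. Pooling AUCs on their raw scale is also a simplification, since analyses often transform them first (for example with a logit).

```python
def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, its standard error, tau^2, I^2 in percent).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    pooled_se = (1.0 / sum(w_re)) ** 0.5

    # I^2: share of total variability attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, tau2, i2


# Made-up per-study AUCs and standard errors, for illustration only.
aucs = [0.70, 0.80, 0.90]
ses = [0.02, 0.02, 0.02]
pooled, pooled_se, tau2, i2 = pool_random_effects(aucs, ses)
print(f"pooled AUC = {pooled:.3f}, tau^2 = {tau2:.4f}, I^2 = {i2:.1f}%")
```

A high I² means most of the spread in reported AUCs reflects genuine differences between studies rather than sampling error, which is why a single pooled value should be interpreted with caution.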
These findings suggest that machine learning achieves reasonable predictive accuracy for AF recurrence in research settings. However, the heterogeneity and methodological limitations prevent definitive conclusions about clinical applicability. Further validation with standardized approaches is needed before these models are integrated into clinical prediction tools.
Original Abstract
BACKGROUND AND OBJECTIVE: This systematic review evaluates the current state of Machine Learning (ML) methods for predicting Atrial Fibrillation (AF) recurrence following catheter ablation. With the growing use of ML, a systematic evaluation of performance and key influencing factors such as study design, data types, and reporting is needed. The main objectives are to provide an updated overview of current achievements of ML in this field, anticipate future challenges and opportunities, and derive methodological recommendations based on the findings.
METHODS: Seven databases were systematically searched, and studies proposing ML algorithms with well-documented implementation, testing, and reporting of performance metrics underwent a qualitative synthesis and risk-of-bias assessment. A meta-analysis of 17 studies was conducted using the Area Under the receiver operating characteristic Curve (AUC) as the most commonly reported performance metric.
RESULTS: The mean overall AUC was 0.81, indicating reasonable predictive accuracy, although there was substantial inter-study heterogeneity. Meta-regression identified sample size and input data type (clinical, imaging, or electrophysiological) as significant contributors to this heterogeneity. Subgroup analysis demonstrated that models incorporating complex data modalities achieved higher predictive accuracy and lower heterogeneity compared to those relying solely on simpler clinical variables.
CONCLUSION: This review quantifies the performance of ML algorithms in predicting AF recurrence and establishes a benchmark for future research. It also highlights key challenges, including the lack of standardized datasets and limited generalizability. Incorporating more complex data sources may improve model performance, reduce inconsistencies, and enhance the potential clinical applicability of ML models in guiding patient management.