This systematic review and meta-analysis examines the performance of machine learning and deep learning models designed to predict future falls among community-dwelling older adults. The scope includes 28 studies, of which 18 focused on general older adults, with a follow-up duration ranging from 3 months to 7 years. The primary outcome measured was discrimination, reported as the area under the curve (AUC).
Across the 10 models that could be meta-analyzed, the pooled AUC was 0.79 (95% CI 0.69-0.87). Fall incidence ranged from 1.6% to 46.6% across the included studies. Prediction horizons varied widely, from 3 months to 7 years. Heterogeneity was extreme, with an I² of 99.8% and a tau-squared of 0.64.
All 28 models were rated at high risk of bias. Limitations include a wide prediction interval (0.20 to 0.99), sparse calibration reporting, and limited external validation: only one model was externally validated. Some cohorts were recruited through clinically enriched pathways rather than general community sampling. The specific-population subgroup, which showed higher discrimination than general samples, included only 2 studies.
The authors note that the pooled AUC should not be interpreted as a robust estimate of transportable real-world performance, and that reported performance may be optimistic relative to real-world use. Findings support the use of these models for proactive fall prevention while emphasizing the need for validation and context-specific implementation.
BACKGROUND: Machine learning (ML) and deep learning (DL) show promise for fall risk prediction, but prior reviews focused mainly on real-time fall detection, in-hospital falls, or conventional statistical models. The performance of ML-DL-based models for predicting future falls in community-dwelling older adults remains unclear.
OBJECTIVE: This study aimed to review ML-DL studies for predicting future falls among community-dwelling older adults and meta-analyze discrimination where feasible.
METHODS: Six databases were searched from inception to September 23, 2024, with updates on August 31, 2025, and February 28, 2026. We included longitudinal studies developing or validating ML-DL models to predict future falls in community-dwelling adults aged ≥60 years and excluded real-time detection studies, studies with simulated falls or no fall outcome, and inpatient studies. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Areas under the curve (AUCs) were meta-analyzed using Hartung-Knapp-Sidik-Jonkman random-effects models with 95% CIs. Heterogeneity was assessed, 95% prediction intervals (PIs) were estimated, and sensitivity and subgroup analyses were conducted.
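As a rough illustration of the pooling step described above, the following Python sketch pools AUCs on the logit scale with a random-effects model and a Hartung-Knapp-type adjusted confidence interval. It is not the authors' code. The logit transformation, the DerSimonian-Laird estimator for τ², the function name `hksj_pool`, and the four example AUCs and standard errors are all assumptions made for illustration. The t critical value is passed in by the caller to keep the sketch dependency-free.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def hksj_pool(aucs, ses_logit, t_crit):
    """Pool AUCs on the logit scale (hypothetical helper, not from the review).

    aucs      : study-level AUCs, each strictly between 0 and 1
    ses_logit : standard errors of the logit-transformed AUCs
    t_crit    : two-sided t critical value with k - 1 degrees of freedom
    """
    k = len(aucs)
    y = [logit(a) for a in aucs]
    v = [s * s for s in ses_logit]

    # Fixed-effect weights and Cochran's Q, needed for the tau^2 estimate.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights and pooled estimate.
    wr = [1.0 / (vi + tau2) for vi in v]
    swr = sum(wr)
    mu = sum(wi * yi for wi, yi in zip(wr, y)) / swr

    # Hartung-Knapp variance: weighted residual variance scaled by k - 1.
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(wr, y)) / ((k - 1) * swr)
    half = t_crit * math.sqrt(var_hk)

    return {
        "auc": expit(mu),
        "ci": (expit(mu - half), expit(mu + half)),
        "tau2": tau2,
        "q": q,
    }


# Made-up inputs for four studies; 3.182 is the 97.5% t quantile for 3 df.
example = hksj_pool([0.70, 0.75, 0.82, 0.88], [0.10, 0.08, 0.12, 0.09], 3.182)
print(example)
```

Because the adjusted interval uses a t distribution and a residual-based variance, it tends to be wider than a standard Wald interval when few, heterogeneous studies are pooled, which is the situation reported in this review.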
RESULTS: After screening 10,253 records, 28 (0.3%) studies were included; 18 (64.3%) focused on general older adults. Prediction horizons ranged from 3 months to 7 years, and fall incidence ranged from 1.6% to 46.6%. Twenty-three (82.1%) studies applied ML, and 5 (17.9%) studies used DL. Input modalities included text (n=18, 64.3%), sensor (n=5, 17.9%), image (n=1, 3.6%), and multimodal data (n=4, 14.3%). Common predictors included age, sex, fall history, depression, and basic daily activities. Only one model underwent external validation. Calibration reporting was sparse. All models were rated at high risk of bias. Ten models were meta-analyzed, yielding a pooled AUC of 0.79 (95% CI 0.69-0.87) with extreme heterogeneity (τ²=0.64; τ=0.80; I²=99.8%; Q=4128.99). The confidence-distribution bootstrap PI was 0.20 to 0.99, indicating substantial uncertainty in expected performance across new populations. Subgroup analyses indicated moderation by sample size and population type, with higher discrimination in specific populations than in general samples; however, the specific population subgroup included only 2 studies. Although all participants were community dwelling, some cohorts were recruited through clinically enriched pathways rather than general community sampling.
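The reported heterogeneity statistics are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes the standard definition I² = (Q − df)/Q with df = k − 1 = 9 for the ten pooled models. The abstract does not state how I² was computed, so this is a consistency check rather than a reproduction of the authors' analysis, and the function name `i_squared` is illustrative.

```python
import math


def i_squared(q, k):
    """Higgins I^2 as a percentage, from Cochran's Q and k pooled estimates."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values as reported in the abstract.
q_reported = 4128.99
k_models = 10
tau2_reported = 0.64

i2 = i_squared(q_reported, k_models)
tau = math.sqrt(tau2_reported)

print(f"I^2 = {i2:.1f}%")   # I^2 = 99.8%
print(f"tau = {tau:.2f}")   # tau = 0.80
```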
CONCLUSIONS: ML-DL models show potential for identifying community-dwelling older adults at elevated future fall risk; however, wide PIs, limited external validation, and high risk of bias suggest that reported performance may be optimistic relative to real-world use. The pooled AUC should be interpreted as a summary of reported discrimination under study-specific conditions, predominantly from internally validated, high-risk-of-bias models, rather than as a robust estimate of transportable real-world performance. This review extends prior reviews by focusing on community-dwelling settings and by integrating PROBAST, Hartung-Knapp-Sidik-Jonkman meta-analysis, PIs, and modality-specific synthesis to evaluate both discrimination and uncertainty. Findings support the use of ML-DL models for proactive fall prevention while emphasizing the need for validation and context-specific implementation.