
Systematic review of machine learning models for fall prediction in community-dwelling older adults shows high bias

Machine learning models predict falls in older adults but need more testing before doctors use them widely

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Note high bias and wide prediction intervals in ML models for fall prediction in older adults.

This systematic review and meta-analysis examines the performance of machine learning and deep learning models designed to predict future falls in community-dwelling older adults. It includes 28 studies, of which 18 focused on general older adults, with prediction horizons ranging from 3 months to 7 years. The primary outcome was discrimination, reported as the area under the curve (AUC).
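To make the discrimination measure concrete, the sketch below computes an AUC directly from its pairwise definition: the share of (faller, non-faller) pairs in which the person who fell received the higher predicted risk. The function name `pairwise_auc` and all risk scores are hypothetical and invented for illustration; none of them come from the review.

```python
from itertools import product


def pairwise_auc(scores_fallers, scores_non_fallers):
    """Share of (faller, non-faller) pairs where the faller has the higher score.

    Ties count as half a win, which matches the usual AUC definition.
    """
    wins = 0.0
    for f, n in product(scores_fallers, scores_non_fallers):
        if f > n:
            wins += 1.0
        elif f == n:
            wins += 0.5
    return wins / (len(scores_fallers) * len(scores_non_fallers))


# Hypothetical predicted fall risks, invented for illustration only.
fallers = [0.9, 0.6, 0.4]
non_fallers = [0.7, 0.5, 0.3, 0.2]

auc = pairwise_auc(fallers, non_fallers)
print(round(auc, 2))  # 9 of the 12 pairs are ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The value says nothing about how well predicted probabilities match observed fall rates, which is why calibration has to be reported separately.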

Across the ten models that were meta-analyzed, the pooled AUC was 0.79 (95% CI 0.69-0.87). Fall incidence ranged from 1.6% to 46.6% across the included studies. Heterogeneity was extreme, with an I² of 99.8% and a tau-squared of 0.64.
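The heterogeneity figures can be checked against the other values in the original abstract, which reports Q = 4128.99 for ten meta-analyzed models and τ = 0.80. A minimal sketch, assuming the standard Higgins formula I² = (Q - df) / Q with df = k - 1; the function name `i_squared` is a hypothetical helper.

```python
import math


def i_squared(q, k):
    """Higgins I^2: share of total variability attributed to between-study heterogeneity."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Values taken from the original abstract: Q = 4128.99 across 10 models.
i2 = i_squared(4128.99, 10)
print(f"{i2:.1%}")  # 99.8%

# tau is the square root of tau-squared: sqrt(0.64) = 0.80, as reported.
tau = math.sqrt(0.64)
print(round(tau, 2))  # 0.8
```

Both reported values are reproduced, so the heterogeneity statistics in the abstract are internally consistent.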

All included models were rated at high risk of bias. Limitations include wide prediction intervals (0.20 to 0.99), sparse calibration reporting, and limited external validation, with only one model undergoing external validation. Some cohorts were recruited through clinically enriched pathways rather than general community sampling. Subgroup analyses suggested higher discrimination in specific populations than in general samples, but the specific-population subgroup included only 2 studies.

The authors note that the pooled AUC should not be interpreted as a robust estimate of transportable real-world performance. Reported performance may be optimistic compared with what the models would achieve in real-world use. The authors conclude that the findings support the use of these models for proactive fall prevention while emphasizing the need for validation and context-specific implementation.

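Researchers looked at many computer programs that try to predict if older people will fall. These programs use special math to look at health data and guess the future. The average score for these tools, called the AUC, was 0.79 on a scale where 0.5 is no better than a coin flip and 1.0 is perfect. This does not mean the tools are right seventy-nine percent of the time. It means that if you pick one person who later fell and one who did not, the tool gives the higher risk score to the person who fell about four out of five times.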

However, the results were very different between studies. Some programs guessed correctly often, while others were not very good at all. The time they looked into the future also changed, from three months up to seven years. Because of this big difference, it is hard to know exactly how well these tools will work in your specific area.

All of the studies had serious problems with their design. Many only tested the program on the same people who helped build it. This is like testing a new car only on the road where it was made. Only one program was tested on a completely different group of people. Also, the studies rarely reported how closely the predicted risks matched the number of falls that actually happened.

Even though these tools show promise, doctors should be careful. The tools might seem better than they really are in the real world. It is important to test them carefully in your own community before using them to help patients stay safe.

What this means for you:
Computer tools can help predict falls, but they need more testing to be trusted by doctors and patients.

Study Details

Study type: Systematic review and meta-analysis
Sample size: 28 studies (10 models meta-analyzed)
Evidence: Level 1
Follow-up: 3 to 84 mo
Published: May 2026
Original Abstract
BACKGROUND: Machine learning (ML) and deep learning (DL) show promise for fall risk prediction, but prior reviews focused mainly on real-time fall detection, in-hospital falls, or conventional statistical models. The performance of ML-DL-based models for predicting future falls in community-dwelling older adults remains unclear.

OBJECTIVE: This study aimed to review ML-DL studies for predicting future falls among community-dwelling older adults and meta-analyze discrimination where feasible.

METHODS: Six databases were searched from inception to September 23, 2024, with updates on August 31, 2025, and February 28, 2026. We included longitudinal studies developing or validating ML-DL models to predict future falls in community-dwelling adults aged ≥60 years and excluded real-time detection, simulated or no fall, and inpatient studies. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Areas under the curve (AUCs) were meta-analyzed using Hartung-Knapp-Sidik-Jonkman random-effects models with 95% CIs. Heterogeneity, 95% prediction intervals (PIs), sensitivity analyses, and subgroup analyses were conducted.

RESULTS: After screening 10,253 records, 28 (0.3%) studies were included; 18 (64.3%) focused on general older adults. Prediction horizons ranged from 3 months to 7 years, and fall incidence ranged from 1.6% to 46.6%. Twenty-three (82.1%) studies applied ML, and 5 (17.9%) studies used DL. Input modalities included text (n=18, 64.3%), sensor (n=5, 17.9%), image (n=1, 3.6%), and multimodal data (n=4, 14.3%). Common predictors included age, sex, fall history, depression, and basic daily activities. Only one model underwent external validation. Calibration reporting was sparse. All models were rated at high risk of bias. Ten models were meta-analyzed, yielding a pooled AUC of 0.79 (95% CI 0.69-0.87) with extreme heterogeneity (τ²=0.64; τ=0.80; I²=99.8%; Q=4128.99). The confidence-distribution bootstrap PI was 0.20 to 0.99, indicating substantial uncertainty in expected performance across new populations. Subgroup analyses indicated moderation by sample size and population type, with higher discrimination in specific populations than in general samples; however, the specific population subgroup included only 2 studies. Although all participants were community dwelling, some cohorts were recruited through clinically enriched pathways rather than general community sampling.

CONCLUSIONS: ML-DL models show potential for identifying community-dwelling older adults at elevated future fall risk; however, wide PIs, limited external validation, and high risk of bias suggest real-world performance may be optimistic. The pooled AUC should be interpreted as a summary of reported discrimination under study-specific conditions, predominantly from internally validated, high-risk-of-bias models, rather than as a robust estimate of transportable real-world performance. This review extends prior reviews by focusing on community-dwelling settings and by integrating PROBAST, Hartung-Knapp-Sidik-Jonkman meta-analysis, PIs, and modality-specific synthesis to evaluate both discrimination and uncertainty. Findings support the use of ML-DL models for proactive fall prevention while emphasizing the need for validation and context-specific implementation.
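The abstract names Hartung-Knapp-Sidik-Jonkman (HKSJ) random-effects pooling but does not state the between-study variance estimator, the scale on which AUCs were pooled, or the per-model inputs. The sketch below is therefore only an illustration under stated assumptions: it uses the DerSimonian-Laird estimator for τ², pools on the logit scale, and runs on invented AUCs and standard errors. The names `hksj_pool` and `T_975` are hypothetical helpers, not part of the review.

```python
import math

# Two-sided 95% critical values of Student's t by degrees of freedom (standard table).
T_975 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def hksj_pool(effects, variances):
    """Random-effects pooled estimate with an HKSJ-adjusted 95% CI.

    Assumption: tau^2 comes from the DerSimonian-Laird estimator.
    Returns (pooled, lower, upper, tau2) on the scale of `effects`.
    """
    k = len(effects)
    df = k - 1
    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights and pooled estimate.
    ws = [1.0 / (v + tau2) for v in variances]
    sws = sum(ws)
    mu = sum(wi * yi for wi, yi in zip(ws, effects)) / sws
    # HKSJ variance: weighted residual variance, paired with a t quantile.
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(ws, effects)) / (df * sws)
    half = T_975[df] * math.sqrt(var_hk)
    return mu, mu - half, mu + half, tau2


# Hypothetical inputs, invented for illustration only (not from the review).
aucs = [0.65, 0.72, 0.80, 0.88]
logit_ses = [0.20, 0.15, 0.25, 0.30]  # assumed standard errors on the logit scale

mu, lo, hi, tau2 = hksj_pool([logit(a) for a in aucs], [s ** 2 for s in logit_ses])
pooled_auc, ci_low, ci_high = (inv_logit(x) for x in (mu, lo, hi))
print(f"pooled AUC {pooled_auc:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), tau^2 {tau2:.2f}")
```

Compared with a normal-quantile interval, the HKSJ adjustment usually widens the CI when there are few, heterogeneous studies. The CI describes uncertainty about the average AUC only; the much wider prediction interval reported in the abstract is what describes expected performance in a new population.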