Systematic review and meta-analysis of AI/ML models for TB treatment failure prediction

Key Takeaway
Interpret AI/ML TB treatment failure models cautiously; pooled AUC 0.836 but high heterogeneity and limited validation preclude clinical use.

This systematic review and meta-analysis assessed artificial intelligence and machine learning (AI/ML) prediction models for tuberculosis (TB) treatment failure in patients receiving anti-TB treatment. Thirty-four studies were included in the review, of which 19 reported values suitable for meta-analysis (100,790 participants in total). The primary outcome was the area under the curve (AUC), a measure of model discrimination.

The pooled AUC was 0.836 (95% confidence interval 0.799-0.868), indicating promising overall discrimination. However, performance varied substantially across subgroups: the pooled AUC was 0.748 in studies that included HIV-positive participants and 0.924 in studies that excluded them. The authors noted substantial heterogeneity (I² = 97.9%), and Egger's test suggested publication bias (p = 0.024).
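To make the pooled AUC and I² figures more concrete, the following is a minimal sketch of one common way to pool AUCs under a random-effects model. It is not the authors' code: the review states only that a random-effects meta-analysis was performed, so the DerSimonian-Laird estimator, the logit transformation, the function name `pool_auc_random_effects`, and the four example studies are all assumptions made for illustration.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_auc_random_effects(aucs, ses):
    """Pool study AUCs with a DerSimonian-Laird random-effects model.

    Assumption: pooling is done on the logit scale, with each standard
    error converted from the AUC scale by the delta method.
    """
    y = [logit(a) for a in aucs]
    # Delta method: se(logit AUC) = se(AUC) / (AUC * (1 - AUC))
    v = [(s / (a * (1.0 - a))) ** 2 for a, s in zip(aucs, ses)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))

    lower, upper = y_re - 1.96 * se_re, y_re + 1.96 * se_re
    return {
        "auc": inv_logit(y_re),
        "ci": (inv_logit(lower), inv_logit(upper)),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical study-level inputs, NOT data from the review
example_aucs = [0.75, 0.92, 0.81, 0.88]
example_ses = [0.02, 0.01, 0.03, 0.015]

result = pool_auc_random_effects(example_aucs, example_ses)
print(f"Pooled AUC: {result['auc']:.3f}")
print(f"95% CI: {result['ci'][0]:.3f}-{result['ci'][1]:.3f}")
print(f"I^2: {result['i2']:.1f}%")
```

An I² close to 100%, as reported in the review, means that nearly all of the observed variation between study AUCs reflects real differences between studies rather than sampling error. A single pooled value therefore hides a wide spread of performance.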

Important limitations include underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease. Only 8 of the 34 included studies (23.5%) performed external validation, and only one (2.9%) was rated as low risk of bias overall, primarily because of methodological concerns in the analysis domain. Inadequate validation and poor calibration assessment were common.

The authors conclude that while AI/ML models show promising discrimination, they are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and further rigorous validation in diverse, real-world settings is needed before these models can be considered for clinical use.

Study Details

Study type: Meta-analysis
Evidence: Level 1
Published: Apr 2026
Original Abstract
Background: Tuberculosis (TB) remains a leading cause of infectious disease mortality worldwide, and treatment failure contributes to ongoing transmission, drug resistance, and poor clinical outcomes. Artificial intelligence and machine learning approaches have attracted growing interest for predicting tuberculosis treatment outcomes, but the literature is heterogeneous and lacks a comprehensive synthesis.

Methods: We conducted a systematic review and meta-analysis of studies that developed or validated machine learning models to predict TB treatment failure. We searched PubMed/MEDLINE and Embase from January 2000 to October 2025. Studies were eligible if they developed, validated, or implemented an artificial intelligence or machine learning model for the prediction of TB treatment failure or a closely related poor outcome in patients receiving anti-TB treatment. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool. Random-effects meta-analysis was performed to pool area under the curve values, with subgroup analyses and meta-regression to explore heterogeneity.

Results: Thirty-four studies were included in the systematic review, of which 19 reported area under the curve values suitable for meta-analysis (total participants, 100,790). Studies were published between 2014 and 2025, with 91% published from 2019 onward. Tree-based methods were the most common algorithm family (52.9%), and multimodal models integrating three or more data types were used in 41.2% of studies. The pooled area under the curve was 0.836 (95% confidence interval 0.799-0.868), with substantial heterogeneity (I² = 97.9%). In subgroup analyses, studies including HIV-positive participants showed lower discrimination (pooled area under the curve 0.748) compared to those excluding them (0.924). Only eight studies (23.5%) performed external validation, and only one study (2.9%) was rated as low risk of bias overall, primarily due to methodological concerns in the analysis domain. Egger's test suggested publication bias (p = 0.024). Major evidence gaps included underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease.

Conclusions: Machine learning models for predicting TB treatment failure show promising discrimination but are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and methodological limitations, including inadequate validation, poor calibration assessment, and high risk of bias, limit confidence in current estimates. Future research should prioritize rigorous external validation, calibration assessment, and development in underrepresented populations, particularly HIV-affected and high-burden settings.