Systematic review and meta-analysis of AI/ML models for TB treatment failure prediction

AI models show potential to predict tuberculosis treatment outcomes

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Interpret AI/ML models for TB treatment failure cautiously: the pooled AUC was 0.836, but high heterogeneity and limited external validation currently preclude clinical use.

This systematic review and meta-analysis assessed artificial intelligence and machine learning (AI/ML) prediction models for tuberculosis (TB) treatment failure. The review included 34 studies, of which 19 reported values suitable for pooling (100,790 participants in total, all receiving anti-TB treatment). The primary performance measure was the area under the curve (AUC), a measure of model discrimination.

The pooled AUC was 0.836 (95% confidence interval 0.799-0.868), indicating promising overall discrimination. However, performance varied substantially across subgroups: the pooled AUC was 0.748 in studies that included HIV-positive participants and 0.924 in studies that excluded them. The authors noted substantial heterogeneity (I² = 97.9%), and Egger's test suggested publication bias (p = 0.024).
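To make the pooled AUC and I² figures above more concrete, here is a minimal sketch of how a random-effects pooled estimate can be computed. The source says only that a random-effects meta-analysis was used; the DerSimonian-Laird estimator, the logit transformation, and the delta-method variance below are assumptions chosen for illustration, and the function name `pool_auc_random_effects` is hypothetical. The example inputs are made-up numbers, not data from the review.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_auc_random_effects(aucs, ses):
    """Pool study AUCs on the logit scale (DerSimonian-Laird, an assumption).

    aucs: per-study AUC values, each strictly between 0 and 1
    ses:  per-study standard errors on the AUC scale
    """
    # Transform to the logit scale; delta method for the variance.
    y = [logit(a) for a in aucs]
    v = [(s / (a * (1.0 - a))) ** 2 for a, s in zip(aucs, ses)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and I^2, both truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled_auc": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se_mu),
        "ci_high": inv_logit(mu + 1.96 * se_mu),
        "i2_percent": i2,
        "tau2": tau2,
    }


# Illustrative, made-up inputs (NOT values from the review).
example = pool_auc_random_effects([0.75, 0.92, 0.84], [0.02, 0.015, 0.03])
print(example)
```

Pooling on a transformed scale produces a confidence interval that is not symmetric around the point estimate on the AUC scale, which is the pattern seen in the reported interval, although the source does not confirm which transformation the authors used. An I² close to 100% means that almost all of the observed variation between studies reflects real differences rather than sampling error, so a single pooled number should be read with caution.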

Important limitations include underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease. Only 8 of the 34 included studies (23.5%) performed external validation, and only one (2.9%) was rated as low risk of bias overall, primarily because of methodological concerns in the analysis domain. Inadequate validation and poor calibration assessment were common.

The authors conclude that while AI/ML models show promising discrimination, they are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and further rigorous validation in diverse, real-world settings is needed before these models can be considered for clinical use.

This review looked at data from over 100,000 patients to see if computer programs could predict whose tuberculosis treatment might fail. The results suggest these tools can work, but they are not ready for doctors to use right now in regular care.

Researchers found that the computer models were fairly good at spotting risk, scoring about 0.84 out of 1.0 overall. However, scores were lower in studies that included people with HIV, and results varied widely between different studies. Only a small number of studies checked if the models worked in new groups of people.

There are important gaps in the data. Children, people with HIV, and people from countries with high tuberculosis rates were underrepresented. Only one study was rated as having a low risk of bias, meaning the results might not be fully reliable.

While the technology shows potential, patients should not rely on these models yet. More research is needed to make sure they work safely and fairly for everyone before they become part of standard treatment plans.

What this means for you:
AI models show promise for predicting tuberculosis treatment failure, but more proof is needed before routine use.

Study Details

Study type: Meta-analysis
Evidence: Level 1
Published: Apr 2026
Original Abstract
Background: Tuberculosis (TB) remains a leading cause of infectious disease mortality worldwide, and treatment failure contributes to ongoing transmission, drug resistance, and poor clinical outcomes. Artificial intelligence and machine learning approaches have attracted growing interest for predicting tuberculosis treatment outcomes, but the literature is heterogeneous and lacks a comprehensive synthesis.

Methods: We conducted a systematic review and meta-analysis of studies that developed or validated machine learning models to predict TB treatment failure. We searched PubMed/MEDLINE and Embase from January 2000 to October 2025. Studies were eligible if they developed, validated, or implemented an artificial intelligence or machine learning model for the prediction of TB treatment failure or a closely related poor outcome in patients receiving anti-TB treatment. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool. Random-effects meta-analysis was performed to pool area under the curve values, with subgroup analyses and meta-regression to explore heterogeneity.

Results: Thirty-four studies were included in the systematic review, of which 19 reported area under the curve values suitable for meta-analysis (total participants, 100,790). Studies were published between 2014 and 2025, with 91% published from 2019 onward. Tree-based methods were the most common algorithm family (52.9%), and multimodal models integrating three or more data types were used in 41.2% of studies. The pooled area under the curve was 0.836 (95% confidence interval 0.799-0.868), with substantial heterogeneity (I² = 97.9%). In subgroup analyses, studies including HIV-positive participants showed lower discrimination (pooled area under the curve 0.748) compared to those excluding them (0.924). Only eight studies (23.5%) performed external validation, and only one study (2.9%) was rated as low risk of bias overall, primarily due to methodological concerns in the analysis domain. Egger's test suggested publication bias (p = 0.024). Major evidence gaps included underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease.

Conclusions: Machine learning models for predicting TB treatment failure show promising discrimination but are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and methodological limitations, including inadequate validation, poor calibration assessment, and high risk of bias, limit confidence in current estimates. Future research should prioritize rigorous external validation, calibration assessment, and development in underrepresented populations, particularly HIV-affected and high-burden settings.