Machine learning framework predicts CD4 and CD8 counts in people living with HIV

Key Takeaway
Consider this predictive model for immune markers in HIV, but note it requires external validation before clinical implementation.

This retrospective cohort study developed an ensemble machine learning framework to predict longitudinal CD4+ count, CD8+ count, and CD4/CD8 ratio in people living with HIV. The model was trained and tested on a real-world dataset of 5,436 patients, with an independent test set of 1,088 patients.

The model under study was a heterogeneous stacking ensemble: four tree-based base learners (XGBoost, LightGBM, Random Forest, and Gradient Boosting) whose predictions were combined by a Ridge regression meta-learner. The comparator was a baseline Robust Transformer model. For CD4+ count prediction in the test set (n=1,088), the ensemble achieved an R² of 0.768 and a mean absolute error (MAE) of 74.8 cells/μL, a relative improvement in R² of 66.4% over the baseline.
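The stacking design described above can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the 80/20 split is an assumption (the source gives only the cohort and test-set sizes), and scikit-learn's RandomForestRegressor and GradientBoostingRegressor stand in for the four tree-based learners. The XGBRegressor and LGBMRegressor classes from the xgboost and lightgbm packages follow the same estimator interface and could be appended to the base-learner list if those packages are installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the study's patient features are not reproduced here.
X, y = make_regression(n_samples=500, n_features=10, noise=15.0, random_state=0)

# Assumption: a simple 80/20 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Tree-based base learners (stand-ins for the four named in the study).
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
]

# The Ridge meta-learner is fit on out-of-fold predictions from the base
# learners (cv=5), so it never sees predictions made on training rows.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
print("R2 :", r2_score(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
```

Hyperparameters such as `n_estimators=50`, `alpha=1.0`, and `cv=5` are placeholders chosen to keep the example fast; the source does not report the settings used in the study.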

For CD8+ count prediction (n=1,088), the model achieved an R² of 0.636 and an MAE of 300.5 cells/μL, a relative improvement in R² of 128.6% over the baseline. For CD4/CD8 ratio prediction (n=1,088), the model achieved an R² of 0.131 and an MAE of 0.137; the abstract reports no baseline comparison for the ratio.
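To make the reported metrics concrete, the sketch below computes R2 and MAE on a small made-up example, then back-calculates the baseline R2 implied by the reported relative improvements. The toy cell counts are invented for illustration, and the back-calculation assumes that "relative improvement" means (model R2 - baseline R2) / baseline R2; the source does not state the formula, so the implied baseline values are only approximate.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def implied_baseline_r2(model_r2, relative_improvement_pct):
    """Baseline R2 under the ASSUMED definition
    improvement = (model - baseline) / baseline."""
    return model_r2 / (1 + relative_improvement_pct / 100)


# Hypothetical CD4+ counts (cells/uL), not taken from the study.
observed = [350, 500, 620, 410]
predicted = [400, 480, 600, 450]

print(mean_absolute_error(observed, predicted))   # 32.5
print(round(r_squared(observed, predicted), 3))   # 0.882

# Reported figures from the summary above.
print(round(implied_baseline_r2(0.768, 66.4), 3))   # CD4+: 0.462
print(round(implied_baseline_r2(0.636, 128.6), 3))  # CD8+: 0.278
```

Under that assumed definition, the baseline would have had an R2 of roughly 0.46 for CD4+ and 0.28 for CD8+ counts.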

Safety and tolerability data were not reported. A key limitation is that the model was trained and tested using only demographic and clinical features, explicitly excluding baseline CD4+/CD8+ counts (a choice the authors made to prevent data leakage). For practice, the authors present the framework as a robust and clinically applicable tool for forecasting multi-dimensional immune reconstitution in HIV care, but causal claims regarding immune reconstitution are not supported.

Study Details

Study type: Cohort
Evidence: Level 3
Published: Apr 2026
Original Abstract
Accurate prediction of long-term CD4+ T-cell recovery trajectories in people living with HIV on antiretroviral therapy (ART) is a crucial unmet need for personalized monitoring and treatment optimization. Traditional statistical models have limited ability to capture the complex, non-linear relationships inherent in longitudinal clinical data. We developed a heterogeneous stacking ensemble framework to predict longitudinal CD4+ count, CD8+ count, and CD4/CD8 ratio. The model integrates four tree-based algorithms (XGBoost, LightGBM, Random Forest, and Gradient Boosting) with a Ridge regression meta-learner. It was trained and tested on a retrospective cohort of 5,436 patients who initiated ART between 2016 and 2025, using only demographic and clinical features while explicitly excluding baseline CD4+/CD8+ counts to prevent data leakage. On an independent test set (n=1,088), the ensemble achieved an R² of 0.768 (MAE: 74.8 cells/μL) for CD4+ count, 0.636 (MAE: 300.5 cells/μL) for CD8+ count, and 0.131 (MAE: 0.137) for the CD4/CD8 ratio. This represents a relative improvement in R² of 66.4% for CD4+ and 128.6% for CD8+ predictions compared to a baseline Robust Transformer model. The model accurately replicated the statistical distributions of observed outcomes and demonstrated stable learning dynamics without overfitting. Our ensemble learning framework provides a robust and clinically applicable tool for forecasting multi-dimensional immune reconstitution in HIV care. By synthesizing diverse algorithmic perspectives without relying on baseline immunology, it offers a foundation for data-driven clinical decision support to personalize long-term treatment monitoring.