Meta-analysis shows radiomics-based machine learning models predict recurrence risk in non-small cell lung cancer patients.
This systematic review and meta-analysis assessed the accuracy of radiomics-based machine learning models in predicting the risk of recurrence among patients with non-small cell lung cancer (NSCLC). The study pooled data from multiple sources, encompassing a total sample size of 7,964 patients. The setting of the included studies was not reported in the source data. The primary outcome measured was the concordance index (c-index), which quantifies the ability of a model to correctly rank patients by their risk of recurrence. No specific comparator group was defined in the analysis, as the focus was on the intrinsic performance of the models.
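To make the primary outcome concrete, the c-index can be illustrated with a minimal sketch of Harrell's concordance index on toy data. The cohort, event times, and risk scores below are hypothetical illustrations, not data from the meta-analysis:

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's c-index: the fraction of comparable patient pairs in
    which the patient with the higher predicted risk experiences the
    event (e.g. recurrence) earlier. 0.5 = chance, 1.0 = perfect."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient i has the earlier observed time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the earlier time is an observed
        # event (not censored) and the times are not tied.
        if not events[i] or times[i] == times[j]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable

# Hypothetical cohort: months to recurrence, event indicator
# (1 = recurred, 0 = censored), and a model's predicted risk score.
times = [5, 12, 20, 30]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.4, 0.2]
print(concordance_index(times, events, risks))  # perfectly ranked -> 1.0
```

A pooled c-index of 0.85, as reported below, therefore means the models correctly ordered roughly 85% of comparable patient pairs by recurrence risk.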
The analysis reported distinct performance metrics based on the model composition and patient treatment history. In the training set, the c-index for radiomics-based machine learning models alone was 0.850 (95% CI 0.834-0.866, 95% PI 0.623-1.004). When the models were applied specifically to patients receiving stereotactic body radiation therapy, the c-index was higher, at 0.876 (95% CI 0.853-0.900). For patients undergoing surgery combined with other adjuvant treatment regimens, the c-index was 0.825 (95% CI 0.804-0.848).
The predictive performance also varied depending on whether clinical features were integrated with the radiomics data. In the training set, models combining radiomics with clinical features yielded a c-index of 0.833 (95% CI 0.822-0.854, 95% PI 0.717-0.945). In the validation set, the standalone radiomics model achieved a c-index of 0.878 (95% CI 0.854-0.902, 95% PI 0.681-1.000), while the combined model with clinical features resulted in a c-index of 0.854 (95% CI 0.830-0.878, 95% PI 0.655-0.992).
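The prediction intervals (PIs) quoted above are consistently wider than the confidence intervals because a PI incorporates between-study heterogeneity: it describes where the true effect of a new study is expected to fall, not the precision of the pooled mean. A minimal sketch of the standard random-effects formulation (Higgins-style) shows why a PI upper bound can exceed the theoretical c-index maximum of 1.0; the numeric inputs below are hypothetical, not the parameters of this meta-analysis:

```python
import math

def prediction_interval(mu, se, tau2, t_crit):
    """Approximate 95% prediction interval for a random-effects
    meta-analysis: mu is the pooled estimate, se its standard error,
    tau2 the between-study variance, and t_crit the t critical value
    with k-2 degrees of freedom (k = number of studies). The interval
    is wider than the CI because tau2 is added to se**2."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return mu - half_width, mu + half_width

# Hypothetical inputs: pooled c-index 0.85, small SE, moderate tau^2.
low, high = prediction_interval(mu=0.85, se=0.008, tau2=0.006, t_crit=2.1)
print(round(low, 3), round(high, 3))  # upper bound can exceed 1.0
```

Because the interval is built on a normal approximation, it is not truncated at the metric's logical bounds, which is why the source reports PI upper limits such as 1.004.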
Safety and tolerability data were not reported in the included studies. Consequently, no adverse event rates, serious adverse events, discontinuation rates, or general tolerability profiles could be determined from this meta-analysis. The absence of these data limits the ability to assess the clinical safety of implementing these models in routine practice.
A critical limitation identified in this analysis was the methodological heterogeneity and lack of standardization across the included studies. The average Radiomics Quality Score (RQS) was 27.4%, indicating generally low methodological and reporting quality across the included studies. This low average score suggests that the evidence base is currently immature and prone to bias. Furthermore, the study phase was not reported, and funding sources or potential conflicts of interest were not disclosed.
These findings provide evidence-based support for the subsequent development or updating of radiomics-based machine learning models. However, the results should not be extrapolated to claims of immediate clinical utility or generalizability beyond the specific cohorts included in the meta-analysis. The observed associations reflect predictive accuracy but do not establish causation or guarantee improved patient outcomes in diverse clinical settings.
Several important questions remain unanswered. The lack of standardization in radiomics acquisition and analysis protocols raises concerns about reproducibility. Additionally, the absence of safety data and the low quality scores suggest that further high-quality prospective studies are needed before these models can be reliably integrated into clinical decision-making pathways for NSCLC management.