This retrospective cohort study evaluated the performance of normal tissue complication probability (NTCP) models for predicting radiation pneumonitis (RP) in lung cancer patients treated with thoracic IMRT and multimodal therapy. The development cohort included 580 patients treated between 2018 and 2023, and external validation was performed in 100 patients from an independent center.
The study compared the original QUANTEC and Appelt NTCP models with a simplified local model (Model D). The QUANTEC and Appelt models showed substantial calibration bias, with systematic underestimation of RP risk. In the development cohort, Model D demonstrated the best apparent overall performance, with an AUC of 0.708, a Brier score of 0.215, calibration-in-the-large (CITL) of 0, a calibration slope of 1, and a Hosmer–Lemeshow test P value of 0.599.
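As a reading aid for the two headline metrics, the following is a minimal sketch of how AUC (discrimination) and the Brier score (mean squared prediction error) can be computed from predicted risks and observed outcomes. The function names and the six-patient toy data are illustrative assumptions, not the study's code or data.

```python
def auc(y_true, p_hat):
    """Chance that a random event case gets a higher predicted risk
    than a random non-event case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def brier(y_true, p_hat):
    """Mean squared difference between predicted risk and outcome (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Hypothetical toy data: 1 = developed RP, 0 = did not.
y_toy = [0, 0, 1, 0, 1, 1]
p_toy = [0.10, 0.30, 0.35, 0.40, 0.60, 0.80]

toy_auc = auc(y_toy, p_toy)
toy_brier = brier(y_toy, p_toy)
```

AUC depends only on how patients are ranked, so a model can keep a similar AUC in a new center while its absolute risk estimates drift. The Brier score is sensitive to both ranking and calibration.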
In the external validation cohort, Model D showed similar discrimination and prediction error (AUC 0.718, 95% CI 0.576–0.831; Brier score 0.207), but absolute RP risk was overestimated (CITL = −1.043, calibration slope = 1.133; Hosmer–Lemeshow P < 0.001). Safety outcomes were not reported.
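To make the calibration figures concrete, here is a hedged sketch of one common way to estimate CITL and the calibration slope: CITL as the intercept of a logistic model with the logit of the predicted risk as an offset, and the slope as the coefficient on that logit. The data are simulated with predictions deliberately inflated by 1.0 on the logit scale. All names and data here are assumptions for illustration and are not taken from the study.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def citl(y, p, iters=25):
    """Intercept a in logit P(y=1) = a + logit(p), with the slope fixed at 1."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):  # Newton-Raphson on a single parameter
        mu = [expit(a + l) for l in lp]
        grad = sum(yi - mi for yi, mi in zip(y, mu))
        hess = sum(mi * (1.0 - mi) for mi in mu)
        a += grad / hess
    return a


def calibration_slope(y, p, iters=25):
    """Slope b in logit P(y=1) = a + b * logit(p), fitted by Newton-Raphson."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [expit(a + b * l) for l in lp]
        w = [m * (1.0 - m) for m in mu]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * l for yi, mi, l in zip(y, mu, lp))
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, lp))
        h11 = sum(wi * l * l for wi, l in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Simulated data (not study data): outcomes follow true_risk,
# but the "model" reports risks that are too high.
rng = random.Random(0)
true_risk = [rng.uniform(0.05, 0.60) for _ in range(5000)]
y_sim = [1 if rng.random() < r else 0 for r in true_risk]
p_inflated = [expit(logit(r) + 1.0) for r in true_risk]

citl_hat = citl(y_sim, p_inflated)                 # negative: risk overestimated
slope_hat = calibration_slope(y_sim, p_inflated)   # near 1: spread is about right
p_fixed = [expit(logit(pi) + citl_hat) for pi in p_inflated]
```

Under this convention a negative CITL means predicted risks are too high on average, which matches the overestimation reported for the external cohort. A slope near 1 alongside a large negative CITL points to a shift in baseline risk rather than to predictions that are too extreme.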
Key limitations include the retrospective design and the small external validation sample of 100 patients. The study suggests that simple recalibration may improve absolute risk estimation when applying NTCP models in new settings. Clinicians should interpret these models cautiously and consider local recalibration.
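As an illustration of what "simple recalibration" can look like, the sketch below applies an intercept-only update: each predicted risk is shifted on the logit scale by the CITL reported for the external cohort (−1.043). This is an assumed, textbook-style approach and not necessarily the procedure the authors would use; the input risks of 0.10, 0.30, and 0.50 are hypothetical.

```python
import math

# Reported calibration-in-the-large for Model D in the external cohort.
CITL_EXTERNAL = -1.043


def recalibrate_intercept(p, shift):
    """Shift a predicted risk by `shift` on the logit scale (slope left at 1)."""
    z = math.log(p / (1.0 - p)) + shift
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical predicted risks from the unadjusted model.
original = [0.10, 0.30, 0.50]
adjusted = [recalibrate_intercept(p, CITL_EXTERNAL) for p in original]

for p, q in zip(original, adjusted):
    print(f"original risk {p:.2f} -> recalibrated risk {q:.3f}")
```

Under this adjustment a predicted risk of 0.30 falls to roughly 0.13. Because the shift is monotonic, patient ranking and therefore AUC are unchanged. With only 100 external patients, the shift itself is estimated imprecisely, so any such update would need confirmation in a larger local sample.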
Original Abstract
Objective: To externally evaluate and update the QUANTEC and Appelt NTCP models for radiation pneumonitis (RP) in lung cancer patients treated with contemporary IMRT and multimodal therapy, and to preliminarily validate a simplified local model in an independent cohort.
Methods: We retrospectively analyzed 580 lung cancer patients treated with thoracic IMRT between 2018 and 2023 as the development cohort. The QUANTEC and Appelt models were evaluated and locally updated using a closed testing procedure to determine the least extensive revision required. Clinical and DVH variables were standardized, and smoking status and pulmonary comorbidity were recoded according to published definitions. A final simplified local model (Model D) was developed using BIC-guided multivariable logistic regression with regularization. Performance was assessed by AUC, Brier score, calibration-in-the-large (CITL), calibration slope, Hosmer–Lemeshow test, and decision curve analysis. External validation of Model D was performed in 100 patients from an independent center using fixed coefficients.
Results: Both the QUANTEC and Appelt models showed substantial calibration bias in the local cohort, with systematic underestimation of RP risk. Updating improved calibration as expected, with little change in discrimination. Model D, incorporating age, stage, smoking status, tumor location, pulmonary comorbidity, NLR, SII, V30, and MLD, showed the best apparent overall performance in the development cohort (AUC 0.708, Brier 0.215, CITL = 0, slope = 1, Hosmer–Lemeshow P = 0.599). In the external cohort, discrimination and prediction error were similar (AUC 0.718, 95% CI 0.576–0.831; Brier 0.207), although absolute RP risk was overestimated (CITL = −1.043, slope = 1.133, Hosmer–Lemeshow P < 0.001).
Conclusions: The original QUANTEC and Appelt models underestimated RP risk in this contemporary IMRT cohort. Updating improved calibration, whereas discrimination changed little. Model D showed better apparent overall performance and preserved ranking ability in an independent external cohort. Calibration drift across centers suggests that simple recalibration may improve absolute risk estimation in new settings.
Clinical trial registration: https://www.chictr.org.cn/hvshowproject.html?id=276191&v=1.1, identifier ChiCTR2500102055.