This cohort study assessed the CIPHER model for predicting immune checkpoint inhibitor-induced pneumonitis (ICI-P) in patients with non-small cell lung cancer (NSCLC). The analysis included a development cohort of 254 patients, an internal validation cohort of 93 patients, and an external validation cohort of 116 patients. Baseline (pre-treatment) CT scans were the model input, and classical radiomic models served as the comparator.
In the internal immunotherapy cohort, CIPHER distinguished patients at elevated risk of pneumonitis from those without the event, with AUCs ranging from 0.77 to 0.85. In head-to-head benchmarking, CIPHER achieved an AUC of 0.83, outperforming the radiomic models. In the external validation cohort, CIPHER maintained an AUC of 0.83 with a balanced accuracy of 81.7%, exceeding the radiomic models (DeLong p = 0.0318).
Confusion matrix analysis showed that CIPHER correctly identified 80 of 96 non-ICI-P cases and 16 of 20 ICI-P cases (together, the 116 patients of the external cohort). By contrast, the radiomic model showed high sensitivity (85.0%) but markedly lower specificity (45.8%). Safety data, adverse events, and discontinuations were not reported in this study.
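The reported balanced accuracy can be checked directly from the confusion matrix counts. The following is a minimal sketch, not the authors' code: the class name `ConfusionCounts` and its fields are hypothetical, and the radiomic figure is derived here from the two reported rates rather than stated in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion matrix counts."""
    tp: int  # ICI-P cases correctly flagged
    fn: int  # ICI-P cases missed
    tn: int  # non-ICI-P cases correctly cleared
    fp: int  # non-ICI-P cases incorrectly flagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity; robust to class imbalance.
        return (self.sensitivity + self.specificity) / 2


# Counts reported for CIPHER: 16 of 20 ICI-P and 80 of 96 non-ICI-P correct.
cipher = ConfusionCounts(tp=16, fn=20 - 16, tn=80, fp=96 - 80)

print(f"sensitivity:       {cipher.sensitivity:.1%}")        # 80.0%
print(f"specificity:       {cipher.specificity:.1%}")        # 83.3%
print(f"balanced accuracy: {cipher.balanced_accuracy:.1%}")  # 81.7%

# Radiomic model: only the rates are reported, so average them directly.
# This value is a derived illustration, not a figure from the study.
radiomic_balanced_accuracy = (0.850 + 0.458) / 2
print(f"radiomic balanced accuracy: {radiomic_balanced_accuracy:.1%}")  # 65.4%
```

The counts reproduce the reported 81.7% balanced accuracy and imply a CIPHER sensitivity of 80.0% and specificity of 83.3%, which makes the trade-off against the radiomic model (85.0% and 45.8%) explicit.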
A key limitation is that the model has not yet been prospectively validated. If prospective validation succeeds, CIPHER may be incorporated into routine patient management to improve outcomes, but this remains a future possibility rather than a current standard.
Original Abstract
Background: Immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy but can cause serious immune-related adverse events (irAEs), with pneumonitis (ICI-P) being among the most severe. Early identification of high-risk patients before ICI initiation is critical for closer monitoring, timely intervention, and improved outcomes.
Purpose: To develop and validate a deep learning foundation model to predict ICI-P from baseline CT scans in patients with lung cancer.
Methods: We designed the Checkpoint-Inhibitor Pneumonitis Hazard EstimatoR (CIPHER), a deep learning foundation model that combines contrastive learning with a transformer-based masked autoencoder to predict ICI-P from baseline CT scans in patients with lung cancer. Using self-supervised learning, CIPHER was pre-trained on 590,284 CT slices from 2,500 non-small cell lung cancer (NSCLC) patients to capture heterogeneous lung parenchymal patterns. After pre-training, the model was fine-tuned on an internal NSCLC cohort for ICI-P risk prediction, using images from 254 patients for model development and 93 patients for internal validation. We compared CIPHER with classical radiomic models and further evaluated it on an external NSCLC cohort of 116 patients.
Results: In the internal immunotherapy cohort, CIPHER consistently distinguished patients at elevated risk of ICI-P from those without the event, with AUCs ranging from 0.77 to 0.85. In head-to-head benchmarking, CIPHER achieved an AUC of 0.83, outperforming the radiomic models. In the external validation cohort, CIPHER maintained strong performance (AUC = 0.83; balanced accuracy = 81.7%), exceeding the radiomic models (DeLong p = 0.0318) and demonstrating higher specificity without sacrificing sensitivity. By contrast, the radiomic model showed high sensitivity (85.0%) but markedly lower specificity (45.8%). Confusion matrix analysis confirmed the robust classification performance of CIPHER, correctly identifying 80 of 96 non-ICI-P cases and 16 of 20 ICI-P cases.
Conclusions: We developed and externally validated CIPHER for predicting future risk of ICI-P from pre-treatment CT scans. With prospective validation, CIPHER may be incorporated into routine patient management to improve outcomes.