This was a retrospective cohort study using data from the U.S. SEER database (1,071 patients) and a Chinese cohort from Zibo Municipal Hospital (198 patients) to develop and externally validate machine learning models for predicting survival in patients with lymph-node-positive medullary thyroid carcinoma. The study compared five algorithms, LightGBM, XGBoost, random forest (RF), multilayer perceptron (MLP), and k-nearest neighbours (KNN), for predicting 3-year and 5-year overall survival (OS) and cancer-specific survival (CSS).
The primary outcome was survival prediction accuracy, measured by AUC. In the SEER dataset, LightGBM achieved the highest AUC for 3-year OS (0.833) and 5-year OS (0.892). In external validation, LightGBM also achieved the highest AUC for 5-year OS (0.869). Secondary outcomes included F1-score, MCC, sensitivity, specificity, calibration curve, and decision-curve analysis, though specific values for these were not reported.
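As a rough illustration of what these metrics measure, the following minimal sketch computes AUC, sensitivity, specificity, F1-score, and MCC in plain Python. It is not the study's code: the labels and scores are made-up toy values, label 1 is assumed to mean the event occurred within the time horizon, and the 0.5 cut-off is an arbitrary assumption.

```python
from math import sqrt


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Confusion-matrix metrics after classifying score >= threshold as positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    # Each ratio falls back to 0.0 when its denominator is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1, "mcc": mcc}


# Toy data (hypothetical, not from the study): 1 = event within the horizon.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(auc(toy_labels, toy_scores))
print(threshold_metrics(toy_labels, toy_scores))
```

AUC depends only on how cases are ranked, whereas the other four metrics depend on the chosen cut-off, which is one reason a summary that reports AUC alone leaves the secondary outcomes hard to judge.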
Safety and tolerability data were not reported for any model, as this was a computational study. The study did not report specific limitations of its own. Limitations evident from the design include the retrospective, observational nature of the cohorts, which cannot establish causality, and the lack of reported p-values or confidence intervals for the AUC results.
Practice relevance includes the development of an online calculator for predicting 3- and 5-year OS and CSS in this patient population. However, the evidence is early and observational, and the models require prospective validation before clinical implementation.
Original Abstract
Background: Medullary thyroid carcinoma (MTC) carries a disproportionately high mortality among thyroid malignancies, and the risk is even greater once metastasis occurs; nevertheless, a dependable prognostic tool for lymph-node-positive MTC patients remains elusive. We aimed to derive and externally validate a machine learning model for predicting 3-year and 5-year overall survival (OS) and cancer-specific survival (CSS) in this high-risk population.
Methods: Retrospective cohorts were assembled from the U.S. SEER database (n = 1,071) and Zibo Municipal Hospital (external validation, n = 198). After feature selection (Cox, Boruta, RFE), five algorithms (LightGBM, XGBoost, RF, MLP, KNN) were trained on 70% of the SEER data and tested on the remaining 30% and on the Chinese cohort. F1-score, MCC, sensitivity, specificity, AUC, calibration curve, and decision-curve analysis were evaluated; model explainability was assessed with SHAP.
Results: In OS prediction, LightGBM achieved the highest AUC at both time horizons (SEER 3-year 0.833, 5-year 0.892; external 5-year 0.869), with superior accuracy. Its calibration curves lay closest to the 45° diagonal, and decision-curve analysis demonstrated the greatest net benefit across clinically relevant risk thresholds. SHAP revealed the absence of surgery as the strongest adverse contributor for OS, followed by advanced age, larger tumour size, and higher LNR; radiotherapy and chemotherapy also demonstrated adverse effects. The same pattern emerged when predicting CSS. Based on these results, we developed an online calculator for predicting 3- and 5-year OS and CSS in patients with lymph-node-positive MTC.
Conclusion: The LightGBM model provides an accurate, well-calibrated, and clinically useful tool for estimating survival in lymph-node-positive MTC. In addition, the decision to undergo surgery is considered the most important factor in the survival of MTC patients.
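The decision-curve analysis mentioned in the abstract rests on the standard net-benefit quantity, which weighs true positives against false positives at a chosen risk threshold. The sketch below shows that calculation under stated assumptions: the function names, the toy labels and risks, and the convention that a predicted risk at or above the threshold counts as "act" are all illustrative and are not taken from the study.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical, not from the study): 1 = event within the horizon.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_risks = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# A decision curve plots these values over a range of thresholds;
# "treat none" has a net benefit of 0 at every threshold.
for pt in (0.2, 0.35, 0.5):
    model_nb = net_benefit(toy_labels, toy_risks, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    print(f"threshold={pt:.2f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies, "treat all" and "treat none".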