This cohort study evaluated the cross-cohort deployment of plasma biomarker-based machine learning models for Alzheimer's disease in the ADNI (n=885) and A4 (n=822) cohorts. The primary outcome was negative predictive value (NPV), with secondary outcomes including ROC AUC, accuracy, R2, RMSE, calibration, and net clinical benefit. Within cohorts, the models demonstrated high discrimination, with ROC AUCs of 0.913 in ADNI and 0.870 in A4, and moderate performance for centiloid prediction (R2 of 0.628 in ADNI and 0.535 in A4). However, cross-cohort deployment led to a decline in NPV from 0.831 to 0.644 and modest AUC attenuation of approximately 4-7%, indicating reduced generalizability.
Safety and tolerability data were not reported, limiting assessment of adverse events or discontinuations. Key limitations include the poorly characterized effect of cross-cohort deployment on clinically actionable metrics such as NPV, and the sensitivity of NPV to calibration instability and between-cohort prevalence differences. These factors highlight the need for further validation before clinical use.
In practice, this study underscores the importance of cross-cohort validation, calibration assessment, and assay harmonization before implementing such models in clinical settings. The findings are observational and based on surrogate outcomes, so they should not be overinterpreted as causal or directly applicable to clinical decision-making without additional evidence.
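The prevalence dependence of NPV noted above follows directly from Bayes' rule. A minimal sketch, using assumed sensitivity and specificity values (not figures reported by the study), shows how NPV erodes as amyloid prevalence rises across cohorts even when discrimination is unchanged:

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value, P(no disease | negative test),
    computed from an operating point via Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Illustrative (assumed) operating point: sensitivity 0.85, specificity 0.80.
for prev in (0.3, 0.5, 0.6):
    print(f"prevalence {prev:.1f}: NPV = {npv(0.85, 0.80, prev):.3f}")
```

Holding the operating point fixed, NPV drops substantially as prevalence increases, which is one mechanism by which a model transferred to a cohort with different amyloid prevalence can lose clinical utility despite a stable AUC.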
Background: Plasma biomarkers demonstrate strong within-cohort performance for identifying cerebral amyloid pathology, but their real-world clinical utility depends on generalization across populations and assay platforms. The impact of cross-cohort deployment on clinically actionable metrics such as negative predictive value (NPV) remains poorly characterized.
Objective: To evaluate the performance and portability of plasma biomarker-based machine learning models for amyloid PET prediction across independent cohorts, with emphasis on calibration and clinically relevant predictive values.
Methods: Data from ADNI (n=885) and A4 (n=822) were analyzed. Machine learning models were trained within each cohort to predict amyloid PET status and continuous amyloid burden (centiloids). Performance was assessed using ROC AUC, accuracy, R², and RMSE. Cross-cohort generalizability was evaluated using bidirectional transfer without retraining. Calibration, predictive values, and decision curve analysis were used to assess clinical utility.
Results: Within-cohort discrimination was high (AUC up to 0.913 in ADNI and 0.870 in A4), with moderate performance for centiloid prediction (R² up to 0.628 and 0.535, respectively). Cross-cohort deployment resulted in modest attenuation of AUC (~4-7%) but substantially greater degradation in clinically actionable performance. NPV declined from 0.831 to 0.644 under ADNI→A4 transfer (~19 percentage points) despite preserved discrimination. Calibration analyses demonstrated systematic probability misestimation, and decision curve analysis showed reduced net clinical benefit. Biomarker distribution differences across cohorts were consistent with dataset shift.
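Decision curve analysis quantifies the clinical value of acting on model predictions as "net benefit" at a chosen probability threshold, compared against the default strategies of treating everyone or no one. A minimal sketch of the standard net-benefit formula (Vickers and Elkin), using hypothetical counts that are not taken from the study:

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit of a model at probability threshold pt:
    true-positive rate minus false-positive rate weighted by the
    odds of the threshold, pt / (1 - pt)."""
    return tp / n - (fp / n) * (pt / (1 - pt))

def net_benefit_treat_all(prevalence, pt):
    """Reference strategy: classify everyone as positive."""
    return prevalence - (1 - prevalence) * (pt / (1 - pt))

# Hypothetical example: n=200, 80 true positives, 30 false positives,
# prevalence 0.5, decision threshold pt=0.2.
nb_model = net_benefit(80, 30, 200, 0.2)
nb_all = net_benefit_treat_all(0.5, 0.2)
```

In this illustrative example the model's net benefit (0.3625) falls below the treat-all reference (0.375), the kind of result that signals a model adds no clinical value at that threshold, consistent with the reduced net benefit the study reports under transfer.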
Conclusion: Plasma biomarker models retain discrimination across cohorts but exhibit clinically meaningful degradation in predictive value under deployment. Calibration instability and prevalence differences critically affect NPV, highlighting the need for cross-cohort validation, calibration assessment, and assay harmonization before clinical implementation.
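Calibration assessment of the kind recommended here checks whether predicted probabilities match observed event rates. A minimal sketch using a Brier score and a crude binned reliability check (all function names are illustrative, not from the study):

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and binary
    outcomes; lower values indicate better-calibrated predictions."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def reliability_bins(probs, labels, n_bins=5):
    """Binned reliability summary: (mean predicted, observed rate) per bin.
    Systematic gaps between the two indicate miscalibration, e.g. after
    deploying a model on a cohort with a shifted biomarker distribution."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins if b
    ]
```

A well-calibrated model yields bin pairs lying near the diagonal (mean predicted ≈ observed rate); systematic probability misestimation, as reported under cross-cohort transfer, shows up as a consistent offset.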