
Plasma biomarker models show high discrimination but NPV decline in cross-cohort Alzheimer's validation

Can a blood test predict Alzheimer's in new patients? The answer depends on where the test was developed

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Consider cross-cohort validation and calibration issues before using plasma biomarker models for Alzheimer's.

This cohort study evaluated the cross-cohort deployment of plasma biomarker-based machine learning models that predict amyloid PET status, a marker of Alzheimer's disease pathology, in the ADNI (n=885) and A4 (n=822) cohorts. The main outcome of interest was negative predictive value (NPV), with further outcomes including ROC AUC, accuracy, R², RMSE, calibration, and net clinical benefit. Within cohorts, the models demonstrated high discrimination, with ROC AUCs of up to 0.913 in ADNI and 0.870 in A4, and moderate performance for centiloid prediction (R² of up to 0.628 in ADNI and 0.535 in A4). However, applying the ADNI-trained model to A4 without retraining reduced NPV from 0.831 to 0.644 (about 19 percentage points), while AUC attenuated only modestly, by approximately 4-7%. Predictive value therefore generalized far less well than discrimination.

Safety and tolerability data were not reported, as expected for a diagnostic modelling study with no intervention. The study addresses a gap in the evidence: the impact of cross-cohort deployment on clinically actionable metrics like NPV has been poorly characterized. Its main finding is that calibration instability and prevalence differences critically affect NPV. These factors highlight the need for further validation before clinical use.
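The dependence of NPV on prevalence follows from Bayes' rule. A minimal Python sketch, assuming a fixed and purely hypothetical operating point (sensitivity 0.80, specificity 0.85, neither taken from the study), shows how NPV falls as the share of amyloid-positive people rises:

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)    # P(test negative and truly negative)
    false_neg = (1.0 - sensitivity) * prevalence   # P(test negative but truly positive)
    return true_neg / (true_neg + false_neg)


# Hypothetical operating point for illustration only; not reported by the study.
SENS, SPEC = 0.80, 0.85

for prev in (0.20, 0.35, 0.50, 0.65):
    print(f"prevalence={prev:.2f}  NPV={npv(SENS, SPEC, prev):.3f}")
```

With these assumed values, NPV drops from roughly 0.94 at 20% prevalence to roughly 0.81 at 50%, even though the test's sensitivity and specificity never change.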

In practice, this study underscores the importance of cross-cohort validation, calibration assessment, and assay harmonization before implementing such models in clinical settings. The findings are observational and based on surrogate outcomes, so they should not be overinterpreted as causal or directly applicable to clinical decision-making without additional evidence.
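Why discrimination can hold up while NPV degrades may be easier to see with a toy example. The sketch below uses invented scores and labels (not study data) and assumes that miscalibration takes the form of a uniform shift on the log-odds scale. Such a shift preserves the ranking of patients, so AUC is unchanged, but it moves patients across a fixed 0.5 decision threshold, so NPV changes:

```python
import math


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def npv_at_threshold(scores, labels, threshold=0.5):
    """Share of true negatives among cases called negative (score below threshold)."""
    called_negative = [y for s, y in zip(scores, labels) if s < threshold]
    return called_negative.count(0) / len(called_negative)


def shift_logit(p, delta):
    """Move a probability by delta on the log-odds scale (a monotone change)."""
    z = math.log(p / (1 - p)) + delta
    return 1 / (1 + math.exp(-z))


# Toy data invented for illustration; 1 = amyloid positive, 0 = amyloid negative.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90]

# Assumed form of miscalibration: every predicted probability is too low.
shifted = [shift_logit(p, -1.0) for p in scores]

print("AUC original:", auc(scores, labels), " shifted:", auc(shifted, labels))
print("NPV original:", npv_at_threshold(scores, labels),
      " shifted:", npv_at_threshold(shifted, labels))
```

In this toy data the AUC is 0.96 before and after the shift, while NPV at the 0.5 threshold falls from 0.80 to 0.625. Real dataset shift is rarely a clean uniform offset, so this is an illustration of the mechanism rather than a model of what happened between ADNI and A4.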

Imagine a doctor using a blood test to check for Alzheimer's disease. The goal is simple: if the test says you are in the clear, you should be in the clear. But a new study shows this isn't always true when moving between different groups of patients. Researchers looked at about 1,700 people from two large research groups, the ADNI and A4 cohorts. They used machine learning models to read blood test results and predict whether a brain scan would show amyloid buildup, a hallmark of the disease.

The models showed strong promise when tested on the people they were originally built for. Performance was scored on a standard scale, called the AUC, that measures how well a test separates people with and without the condition: 0.5 is no better than a coin flip and 1.0 is perfect. In the first group, the best model scored 0.913. In the second group, it still did very well at 0.870. These numbers sound great, but there is a catch when you try to use the same test on a completely different group of people.

When the model trained on the first group was applied to the second group, its ability to rule out the disease fell sharply. Among people the test cleared, the share who were truly clear dropped from about 83% to about 64%. In other words, missed cases became more common, so a negative result could be falsely reassuring. The study also found that the models were only moderately accurate at estimating how much amyloid had built up in the brain. The researchers did not report any safety issues or side effects because the test involves blood draws, not new drugs. However, the main problem is that the test does not work equally well everywhere.

This study highlights a critical gap in how we use these new tools. The ability of the test to give a reliable 'all clear' is not stable across different patient groups. Before doctors can trust these blood tests in real clinics, we need to fix how they are calibrated and ensure they work for everyone, not just the people they were first tested on.

What this means for you:
Blood tests for Alzheimer's may give false reassurance if used outside the specific group they were trained on.

Study Details

Study type: Cohort
Sample size: n = 885 (ADNI); n = 822 (A4)
Evidence: Level 3
Published: Apr 2026
Original Abstract

Background: Plasma biomarkers demonstrate strong within-cohort performance for identifying cerebral amyloid pathology, but their real-world clinical utility depends on generalization across populations and assay platforms. The impact of cross-cohort deployment on clinically actionable metrics such as negative predictive value (NPV) remains poorly characterized.

Objective: To evaluate the performance and portability of plasma biomarker-based machine learning models for amyloid PET prediction across independent cohorts, with emphasis on calibration and clinically relevant predictive values.

Methods: Data from ADNI (n=885) and A4 (n=822) were analyzed. Machine learning models were trained within each cohort to predict amyloid PET status and continuous amyloid burden (centiloids). Performance was assessed using ROC AUC, accuracy, R², and RMSE. Cross-cohort generalizability was evaluated using bidirectional transfer without retraining. Calibration, predictive values, and decision curve analysis were used to assess clinical utility.

Results: Within-cohort discrimination was high (AUC up to 0.913 in ADNI and 0.870 in A4), with moderate performance for centiloid prediction (R² up to 0.628 and 0.535, respectively). Cross-cohort deployment resulted in modest attenuation of AUC (~4-7%) but substantially greater degradation in clinically actionable performance. NPV declined from 0.831 to 0.644 under ADNI→A4 transfer (~19 percentage points) despite preserved discrimination. Calibration analyses demonstrated systematic probability misestimation, and decision curve analysis showed reduced net clinical benefit. Biomarker distribution differences across cohorts were consistent with dataset shift.

Conclusion: Plasma biomarker models retain discrimination across cohorts but exhibit clinically meaningful degradation in predictive value under deployment. Calibration instability and prevalence differences critically affect NPV, highlighting the need for cross-cohort validation, calibration assessment, and assay harmonization before clinical implementation.