This observational study evaluates a layered, fixed-policy screening framework for diabetic kidney disease (DKD) in individuals with type 2 diabetes (T2D). The framework integrates intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health (SDOH). Models were developed in the All of Us Research Program (N = 39,431) and externally validated, without retraining, in the BioMe Biobank (N = 9,818). The primary performance measure was the area under the receiver operating characteristic curve (AUROC), alongside sensitivity at a fixed specificity of 90%, both assessed at a one-year landmark following T2D diagnosis.
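The AUROC can be read as the probability that a randomly chosen DKD case receives a higher risk score than a randomly chosen non-case. The sketch below computes it from scores and binary labels with the rank-based (Mann-Whitney) formula, using only the standard library. The function name `auroc` and its inputs are illustrative and are not taken from the study's code.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic: the probability that a randomly
    chosen case (label 1) scores higher than a randomly chosen non-case
    (label 0), with tied scores counted as one half."""
    pairs: List[Tuple[float, int]] = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one case and one non-case")

    # Assign 1-based ranks, averaging ranks within groups of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    # U statistic for the cases, normalised by the number of case/non-case pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)
```

For example, `auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four case/non-case pairs are ordered correctly. Sorting makes this O(n log n), which matters at cohort sizes like the ones above.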
Discrimination improved consistently as each nested model (M2 to M5) added an information layer to the base model (M1): AUROC rose from 0.673 in M1 to 0.797 in M5. Under the fixed 90%-specificity policy, sensitivity nearly doubled, from 0.27 to 0.49, with a cumulative recovery of 30.4% of the cases missed by the base model. In the BioMe Biobank, discrimination was attenuated (M4 AUROC 0.659), although the relative ordering of the information layers was fully preserved.
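Under a fixed screening policy, each model's threshold is chosen so that specificity stays at the same target (90% here), and models are then compared on sensitivity. The sketch below shows one way to do this, together with the share of base-model misses that a later model flags. It assumes that "recovery" means the share of cases missed by M1 that the final model flags; the summary does not spell out the definition, and the function names and toy scores are hypothetical. As a rough check on that reading, M1 misses 1 − 0.27 = 0.73 of cases, and 0.73 × 0.304 ≈ 0.22, so 0.27 + 0.22 ≈ 0.49. This matches the reported M5 sensitivity provided that few of the cases flagged by M1 are lost along the way.

```python
import math
from typing import List, Sequence


def threshold_at_specificity(scores: Sequence[float], labels: Sequence[int],
                             target_spec: float = 0.90) -> float:
    """Return t such that flagging `score > t` keeps specificity >= target_spec
    on these data: t is the ceil(target_spec * n_neg)-th smallest non-case score."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not neg:
        raise ValueError("need at least one non-case")
    # The small tolerance stops float error (0.9 * n landing just above an
    # integer) from pushing the index one position too far.
    k = max(math.ceil(target_spec * len(neg) - 1e-9) - 1, 0)
    return neg[k]


def flag(scores: Sequence[float], threshold: float) -> List[bool]:
    """Screen-positive if the score exceeds the policy threshold."""
    return [s > threshold for s in scores]


def sensitivity(flags: Sequence[bool], labels: Sequence[int]) -> float:
    """Fraction of cases (label 1) that are flagged."""
    case_flags = [f for f, y in zip(flags, labels) if y == 1]
    if not case_flags:
        raise ValueError("need at least one case")
    return sum(case_flags) / len(case_flags)


def recovery_of_base_misses(base_flags: Sequence[bool],
                            final_flags: Sequence[bool],
                            labels: Sequence[int]) -> float:
    """Share of cases missed by the base model that the final model flags."""
    missed = [i for i, y in enumerate(labels) if y == 1 and not base_flags[i]]
    if not missed:
        return 0.0
    return sum(final_flags[i] for i in missed) / len(missed)


# Toy data, invented for illustration: 10 non-cases followed by 4 cases.
labels = [0] * 10 + [1] * 4
neg_scores = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
m1_scores = neg_scores + [0.95, 0.5, 0.3, 0.2]
m5_scores = neg_scores + [0.97, 0.92, 0.85, 0.4]

# Each model gets its own threshold; the specificity target is the same.
m1_flags = flag(m1_scores, threshold_at_specificity(m1_scores, labels))
m5_flags = flag(m5_scores, threshold_at_specificity(m5_scores, labels))

print(sensitivity(m1_flags, labels))  # 0.25
print(sensitivity(m5_flags, labels))  # 0.75
print(recovery_of_base_misses(m1_flags, m5_flags, labels))  # 2 of 3 misses recovered
```

In this sketch, the threshold is derived from the same labelled data it is applied to. The summary does not say how the study fixed its thresholds (for example, on a development split), so that step should be treated as an open detail.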
The authors emphasize that sparse clinical records signaled low observability rather than low risk; a thin record should not be taken as evidence that a patient is at low risk. External validation in BioMe showed attenuated discrimination and a systematic upward shift in predicted probabilities, so the models require recalibration before deployment in that setting. The authors therefore present the framework as a foundation for context-aware, EHR-based screening that accounts for the data available at the time of risk assessment, not as a universal tool that can be deployed without adjustment.
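The summary notes that predictions need recalibration before use in a new setting but does not name a method. One common, minimal option is recalibration-in-the-large: add a single constant on the logit scale so that the average predicted risk matches the event rate observed in a local labelled sample. This is an assumption made for illustration, not the authors' procedure, and the function names and toy inputs below are hypothetical.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def shift_probs(probs: Sequence[float], a: float, eps: float = 1e-12) -> List[float]:
    """Add a constant `a` to each prediction on the logit scale.
    Probabilities are clipped to [eps, 1 - eps] so the logit stays finite."""
    return [expit(logit(min(max(p, eps), 1.0 - eps)) + a) for p in probs]


def fit_intercept_shift(probs: Sequence[float], labels: Sequence[int],
                        lo: float = -20.0, hi: float = 20.0,
                        iters: int = 100) -> float:
    """Find the logit shift `a` that makes the mean shifted probability equal
    the observed event rate, by bisection. The mean is increasing in `a`, so
    bisection converges whenever the target lies within the bracketed range."""
    target = sum(labels) / len(labels)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean_p = sum(shift_probs(probs, mid)) / len(probs)
        if mean_p < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy example: predictions run high relative to an observed event rate of 0.5.
probs = [0.6, 0.7, 0.8, 0.9]
labels = [0, 1, 0, 1]
a = fit_intercept_shift(probs, labels)
print(a < 0)  # True: the shift pulls predictions down
```

A constant logit shift is monotone, so it leaves the ranking of patients, and therefore the AUROC, unchanged. It corrects the overall level of predicted risk, not the attenuated discrimination seen in BioMe. A fuller logistic recalibration would also fit a slope.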
Original Abstract
Background: Diabetic kidney disease (DKD) is a leading cause of kidney failure in individuals with type 2 diabetes (T2D), yet risk identification in routine clinical practice remains incomplete. A critical and often overlooked barrier is risk observability: how much of a patient's underlying risk is actually captured in their clinical record at the time of screening. Existing prediction models evaluate performance using model-specific thresholds, making it difficult to understand how additional data sources alter real-world screening behavior or which individuals benefit when models are expanded.

Methods: We developed a series of five nested machine learning models evaluated at a one-year landmark following T2D diagnosis using data from the All of Us Research Program (N = 39,431; cases = 16,193). Each successive model added a distinct information layer while retaining all prior features; the layers were intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health (SDOH). All models were evaluated under a fixed screening policy targeting 90% specificity, so that the false positive rate remained constant as the information available to the model grew. External validation was conducted in the BioMe Biobank (N = 9,818) without retraining.

Results: Discrimination improved consistently across layers, from AUROC 0.673 (M1) to 0.797 (M5). Under the fixed screening policy, sensitivity nearly doubled from 0.27 to 0.49, with a cumulative recovery of 30.4% of cases missed by the base model. Gains were driven by distinct subgroups at each transition: laboratory features identified biologically high-risk individuals; medication features captured those with high treatment intensity reflecting advanced cardiometabolic burden; longitudinal care trajectory features rescued cases with biological instability observable only through repeated measurements; and SDOH features recovered individuals with limited clinical observability, with rescue probability highest among those with the fewest recorded monitoring domains. Sparse data in the clinical record indicated low observability, not low risk. Social and genetic features each contributed most when downstream physiologic signal was limited, supporting a contextual rather than universal role for each. In BioMe, discrimination was attenuated (M4 AUROC 0.659), but the relative ordering of information layers was fully preserved, and a systematic upward shift in predicted probability distributions underscored the need for recalibration before deployment in a new setting.

Conclusions: DKD risk detection in T2D is substantially improved by integrating complementary information layers under a fixed clinical screening policy, with gains arising from distinct domains that identify at-risk individuals in different clinical contexts. The layered landmark framework introduced here reveals how risk observability, shaped by monitoring intensity, healthcare engagement, and access, determines what a screening model can detect, and provides a foundation for context-aware EHR-based screening that accounts for data availability at the time of risk assessment.
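The abstract reports that SDOH features rescued cases with limited clinical observability and that rescue probability was highest among patients with the fewest recorded monitoring domains. A stratified rescue rate of this kind can be sketched as follows. The sketch assumes that a "rescue" is a case missed by the previous model (for example M4) at its fixed-specificity threshold but flagged by the next model (M5). It also assumes that observability is counted as the number of monitoring domains with at least one record before the landmark. The domain names, record format, and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Mapping, Sequence

# Hypothetical monitoring domains; the summary does not list the ones used.
DOMAINS = ("hba1c", "egfr", "uacr", "blood_pressure", "lipids")


def observed_domain_count(measurement_counts: Mapping[str, int]) -> int:
    """Number of monitoring domains with at least one record before the landmark."""
    return sum(1 for d in DOMAINS if measurement_counts.get(d, 0) > 0)


def rescue_rate_by_domain_count(records: Sequence[Mapping[str, int]],
                                labels: Sequence[int],
                                prev_flags: Sequence[bool],
                                next_flags: Sequence[bool]) -> Dict[int, float]:
    """Among cases missed by the previous model, the fraction flagged by the
    next model, grouped by the number of observed monitoring domains."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for rec, y, prev, nxt in zip(records, labels, prev_flags, next_flags):
        if y != 1 or prev:
            continue  # only cases missed by the previous model can be rescued
        k = observed_domain_count(rec)
        totals[k] += 1
        hits[k] += int(nxt)
    return {k: hits[k] / totals[k] for k in sorted(totals)}


# Toy data, invented for illustration.
records = [
    {},                                   # no monitoring recorded
    {"hba1c": 2},
    {"hba1c": 3, "egfr": 1, "uacr": 1},
    {"hba1c": 1, "egfr": 2, "uacr": 1},
    {"hba1c": 1, "egfr": 0},              # a zero count does not count as observed
    {},
]
labels = [1, 1, 1, 1, 1, 0]
prev_flags = [False, False, False, False, True, False]
next_flags = [True, True, False, True, True, False]

print(rescue_rate_by_domain_count(records, labels, prev_flags, next_flags))
# {0: 1.0, 1: 1.0, 3: 0.5}
```

Conditioning on cases the previous model missed keeps the denominator fixed at each transition, so the rates describe what the added layer contributes rather than overall sensitivity.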