Observational review of layered data models improves DKD risk detection in type 2 diabetes cohorts

Better tools found to spot kidney disease risk in people with type 2 diabetes

medRxiv
Published April 23, 2026
Study authors: Khattab, A.; Wang, Z.; Srinivasasainagendra, V.; Tiwari, H. K.; Loos, R.; Limdi, N.; Irvin, M. R.
Editorial oversight: Dr. Amelia Tan, PhD · Internal Medicine & Chronic Disease

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway

Consider layered data models to improve DKD risk detection, noting the need for recalibration in new settings.

This observational analysis evaluates a layered, fixed-policy framework that integrates intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health. Five nested machine learning models were developed in the All of Us Research Program (N = 39,431; cases = 16,193) and externally validated, without retraining, in the BioMe Biobank (N = 9,818) to assess DKD risk detection in individuals with type 2 diabetes (T2D). Performance was measured by the area under the receiver operating characteristic curve (AUROC) and by sensitivity under a fixed screening policy targeting 90% specificity, at a one-year landmark following T2D diagnosis.
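The source abstract describes the fixed screening policy as targeting 90% specificity, so that the false positive rate stays constant while the models gain information. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the quantile-based cutoff, and the synthetic scores are all assumptions made for illustration, and none of the numbers are study data.

```python
import numpy as np

def threshold_at_specificity(scores, labels, target_specificity=0.90):
    """Pick the score cutoff from the non-cases only (hypothetical helper)."""
    negatives = scores[labels == 0]
    # Roughly target_specificity of non-cases fall at or below this value.
    return np.quantile(negatives, target_specificity)

def sensitivity_specificity(scores, labels, threshold):
    """Flag scores strictly above the cutoff and summarize both error rates."""
    flagged = scores > threshold
    sensitivity = flagged[labels == 1].mean()
    specificity = (~flagged)[labels == 0].mean()
    return sensitivity, specificity

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=5000)
weak = rng.normal(loc=0.6 * labels, scale=1.0)    # less informative model
strong = rng.normal(loc=1.2 * labels, scale=1.0)  # more informative model

results = {}
for name, scores in {"weak": weak, "strong": strong}.items():
    cutoff = threshold_at_specificity(scores, labels)
    results[name] = sensitivity_specificity(scores, labels, cutoff)
    print(name, results[name])
```

Because each model gets its own cutoff at the same specificity, any difference in sensitivity reflects what the extra information adds rather than a looser threshold.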

Each successive nested model (M2-M5) outperformed the base model (M1), with discrimination improving consistently across layers. AUROC rose from 0.673 for M1 to 0.797 for the full model (M5). Under the fixed screening policy, sensitivity nearly doubled, from 0.27 to 0.49, and the added layers cumulatively recovered 30.4% of the cases missed by the base model. In external validation in the BioMe Biobank, discrimination was attenuated (M4 AUROC 0.659), although the relative ordering of the information layers was preserved.
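The reported figures can be related with a few lines of arithmetic. This sketch assumes that "recovery" means the net gain in sensitivity divided by the share of cases the base model missed; the source does not spell out the definition, so the result is only an approximation. It comes out near 30%, slightly below the reported 30.4%, a gap that could plausibly come from rounding in the published sensitivities.

```python
# Values reported in the summary (rounded as published).
base_sensitivity = 0.27   # M1
full_sensitivity = 0.49   # M5
base_auroc, full_auroc = 0.673, 0.797

# Assumed definition: net sensitivity gain over the base model's miss rate.
missed_by_base = 1.0 - base_sensitivity
net_gain = full_sensitivity - base_sensitivity
approx_recovery = net_gain / missed_by_base

relative_increase = full_sensitivity / base_sensitivity
auroc_gain = full_auroc - base_auroc

print(f"approximate recovery of missed cases: {approx_recovery:.3f}")  # 0.301
print(f"sensitivity ratio M5 / M1: {relative_increase:.2f}")           # 1.81
print(f"absolute AUROC gain: {auroc_gain:.3f}")                        # 0.124
```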

The authors note that sparse data in the clinical record indicated low observability, not low risk. External validation in BioMe showed attenuated discrimination and a systematic upward shift in predicted probabilities, indicating that recalibration is needed before deployment in a new setting. The framework is therefore best viewed as a foundation for context-aware EHR-based screening that accounts for data availability at the time of risk assessment, not as a universal solution that can be applied without adjustment.
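The summary does not say how recalibration would be carried out. One simple and commonly used option is an intercept-only adjustment on the logit scale, which shifts every prediction until the mean predicted risk matches the event rate observed in the new setting. The sketch below illustrates that option on synthetic data; the helper names, the bisection search, and the simulated 0.8 logit shift are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against log(0)
    return np.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def intercept_shift(pred, observed_rate, lo=-10.0, hi=10.0, iters=60):
    """Bisection for the logit offset that matches the observed event rate."""
    z = logit(pred)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sigmoid(z + mid).mean() > observed_rate:
            hi = mid  # predictions still too high: shift further down
        else:
            lo = mid
    return (lo + hi) / 2

# Synthetic cohort for illustration only.
rng = np.random.default_rng(1)
true_prob = rng.uniform(0.05, 0.6, size=4000)
outcomes = rng.random(4000) < true_prob

# Simulate a model whose predictions run systematically high in the new cohort.
shifted = sigmoid(logit(true_prob) + 0.8)

delta = intercept_shift(shifted, outcomes.mean())
recalibrated = sigmoid(logit(shifted) + delta)
print(f"offset: {delta:.2f}")
print(f"mean risk before/after: {shifted.mean():.3f} / {recalibrated.mean():.3f}")
```

An intercept shift leaves the ranking of individuals unchanged, so it corrects the overall level of predicted risk without affecting AUROC.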

Researchers reviewed data from large groups of people with type 2 diabetes to see if combining different types of health information could better predict kidney disease risk. The study looked at data from the All of Us Research Program and the BioMe Biobank, involving tens of thousands of individuals. They compared a basic prediction model against versions that added more layers of information, such as lab results, medication history, and social factors.

The analysis found that adding these extra data layers consistently improved the ability to detect the disease. The model's discrimination score (AUROC) rose from 0.673 to 0.797, and at the same false-alarm rate it identified nearly twice as many cases as the basic model alone. About 30% of the cases that the basic model missed were found using the more detailed approach.

However, when the team tested the models in a different group from the BioMe Biobank, they performed less well. This suggests the new tools will need adjustment before being used broadly. The study notes that some medical records had limited data, which can affect how well these tools work in real-world settings. This work lays a foundation for using smarter, context-aware screening in electronic health records.

What this means for you:

Layered health data may improve kidney disease risk detection, but results need adjustment for different settings.

Study Details

Sample size: n = 39,431

Evidence: Level 5

Published: Apr 2026

Original Abstract

Background: Diabetic kidney disease (DKD) is a leading cause of kidney failure in individuals with type 2 diabetes (T2D), yet risk identification in routine clinical practice remains incomplete. A critical and often overlooked barrier is risk observability: how much of a patient's underlying risk is actually captured in their clinical record at the time of screening. Existing prediction models evaluate performance using model-specific thresholds, making it difficult to understand how additional data sources alter real-world screening behavior or which individuals benefit when models are expanded.

Methods: We developed a series of five nested machine learning models evaluated at a one-year landmark following T2D diagnosis using data from the All of Us Research Program (N = 39,431; cases = 16,193). Each successive model added a distinct information layer -- intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health (SDOH) -- while retaining all prior features. All models were evaluated under a fixed screening policy targeting 90% specificity, so that the false positive rate remained constant as the information available to the model grew. External validation was conducted in the BioMe Biobank (N = 9,818) without retraining.

Results: Discrimination improved consistently across layers, from AUROC 0.673 (M1) to 0.797 (M5). Under the fixed screening policy, sensitivity nearly doubled from 0.27 to 0.49, with a cumulative recovery of 30.4% of cases missed by the base model. Gains were driven by distinct subgroups at each transition: laboratory features identified biologically high-risk individuals; medication features captured those with high treatment intensity reflecting advanced cardiometabolic burden; longitudinal care trajectory features rescued cases with biological instability observable only through repeated measurements; and SDOH features recovered individuals with limited clinical observability, with rescue probability highest among those with the fewest recorded monitoring domains. Sparse data in the clinical record indicated low observability, not low risk. Social and genetic features each contributed most when downstream physiologic signal was limited, supporting a contextual rather than universal role for each. In BioMe, discrimination was attenuated (M4 AUROC 0.659), but the relative ordering of information layers was fully preserved, and a systematic upward shift in predicted probability distributions underscored the need for recalibration before deployment in a new setting.

Conclusions: DKD risk detection in T2D is substantially improved by integrating complementary information layers under a fixed clinical screening policy, with gains arising from distinct domains that identify at-risk individuals in different clinical contexts. The layered landmark framework introduced here reveals how risk observability -- shaped by monitoring intensity, healthcare engagement, and access -- determines what a screening model can detect, and provides a foundation for context-aware EHR-based screening that accounts for data availability at the time of risk assessment.
