Observational biobank study compares phenomic and social determinants for disease risk prediction


medRxiv · Published April 19, 2026 · Medically reviewed April 25, 2026

Study authors: Wang, Y.; Truong, B.; Lu, W.; Fadil, C.; He, Y.; Luo, W.; Koyama, S.; Tsuo, K.; Paruchuri, K.; Yu, Z…

By Dr. Julia Lee, PhD · Oncology, Genomics & Drug Development

Key Takeaway

Phenome-derived polygenic scores (PGS) and social determinants of health can refine disease risk prediction, but the findings are observational and the benefit is disease-specific.

This research article presents an observational analysis of participants in UK Biobank, Mass General Brigham Biobank, and the All of Us Research Program. It evaluates polygenic scores (PGS) built for 35 latent phenomic factors, and separately social determinants of health (SDoH), as predictors of asthma, coronary artery disease, and type 2 diabetes, and compares them with conventional disease-specific PGS.

For asthma, a respiratory factor PGS outperformed an internally derived disease-specific PGS and showed superior cross-ancestry portability. Specifically, the respiratory factor PGS retained 41.5% of its European-ancestry predictive accuracy in African-ancestry individuals, compared with 22.9% for an asthma PGS derived from the largest available multi-ancestry GWAS. For coronary artery disease and type 2 diabetes, disease-specific PGS remained superior. In All of Us, SDoH contributed substantial and largely independent predictive information beyond the best-performing genetic model for all three diseases.

The authors note that SDoH modified how genetic liability translated into observed disease prevalence: for asthma and coronary artery disease (CAD), genetic stratification attenuated with increasing social burden, whereas the attenuation was substantially weaker for type 2 diabetes (T2D). Two limitations stand out: predictive utility was strongly disease dependent, and the findings rest on observational biobank data. For practice, the results suggest that phenome-derived PGS may improve prediction when disease-specific GWAS incompletely capture underlying liability, and that social context independently modifies the performance, calibration, and interpretation of genetic risk.

Study Details

Evidence Level: 5

Published: Apr 2026

Original Abstract

Polygenic scores (PGS) are typically derived from single-trait genome-wide association studies (GWAS), yet many complex diseases arise from shared genetic liability distributed across correlated clinical dimensions. Accordingly, disease risk depends not only on how genetic liability is represented but also on the social context in which that liability is expressed. Whether phenome-derived latent factors improve prediction, and how social determinants of health (SDoH) modify the realized utility of PGS, remains unclear. Here we constructed PGS for 35 orthogonal latent phenomic factors derived from 2,772 phenotypes in 361,114 UK Biobank (UKB) participants and evaluated their phenomic specificity, cross-dataset portability and predictive performance relative to conventional disease-specific PGS across the UKB holdout, Mass General Brigham Biobank and the All of Us (AoU) Research Program. Factor-based PGS showed widespread, biologically coherent phenome-wide associations that were reproducible across biobanks and ancestries. Their predictive utility, however, was strongly disease dependent. For asthma, a respiratory factor PGS outperformed an internally derived disease-specific PGS and showed superior cross-ancestry portability, retaining 41.5% of European-ancestry predictive accuracy in African-ancestry individuals, compared with 22.9% for an asthma PGS derived from the largest available multi-ancestry GWAS. By contrast, disease-specific PGS remained superior for coronary artery disease (CAD) and type 2 diabetes (T2D). These findings suggest that phenome-derived aggregation is most beneficial when disease-specific GWAS incompletely capture underlying liability, including settings of biological heterogeneity or imprecise phenotyping. We then evaluated SDoH in AoU as a complementary axis shaping prevalent disease prediction beyond genetic susceptibility. 
Across all three diseases, SDoH contributed substantial and largely independent predictive information beyond the disease-optimal genetic model. SDoH also modified how genetic liability translated into observed disease prevalence: for asthma and CAD, genetic stratification attenuated with increasing social burden, whereas this attenuation was substantially weaker for T2D. As a result, the same genetic percentile corresponded to different standardized predicted prevalences across social strata, reflecting disease-specific shifts in baseline prevalence, genetic gradients and calibration. Together, these findings indicate that disease risk is shaped by both genetic liability and the social context in which that liability is realized. Phenome-derived PGS improve prediction under specific architectural conditions, whereas social context independently modifies the performance, calibration and interpretation of genetic risk across populations.
