Poster #1 - Lindsay Guare
- vitod24
- Oct 20
Beyond PRS: Geno2Vec Learns Rich SNP Embeddings to Predict Endometriosis Risk
Lindsay Guare, Rachit Kumar, Jagyashila Das, Anurag Verma, Shefali Setia-Verma
Polygenic risk scores (PRSs) remain limited in clinical utility for most complex traits due to modest predictiveness and poor transferability across ancestries. We focused on endometriosis (endo), a women's health condition characterized by painful endometrial lesions growing outside of the uterus. While genomic studies have captured ~11% of the phenotypic variance, PRSs have performed poorly. We hypothesize that conventional PRSs are limited by collapsing biological information into one dimension. To address this, we present Geno2Vec, a deep heterogeneous graph learning method that embeds SNPs as vectors by integrating multi-omic data. The model was trained for contrastive link prediction, keeping related nodes near each other in the latent space. We then leveraged the learned vectors to test risk models for endo in ~250k female participants from the All of Us Research Program (AOU). The cohort (6,262 endo cases) was split into 80% training and 20% validation sets. Traditional prune-and-threshold PRSs were computed for comparison. Logistic models were adjusted for age and ten principal components. We assessed overall multi-ancestry performance and single-ancestry performance for AFR, AMR, and EUR populations. In the training set, the Geno2Vec-PRS had the highest pseudo-R-squared (0.011) compared with the conventional PRS (0.0099) and covariate-only null (0.0076) models, indicating that it explained the greatest proportion of variance. The training AUROC was also highest for the Geno2Vec-PRS (0.5920). When applied to the validation subset, the conventional PRS model had the highest AUROC overall (0.5838) and for the AFR (0.5132) and AMR (0.6124) groups, while the Geno2Vec-PRS had the highest EUR validation AUROC (0.5667). In the future, we will incorporate more -omics types into the embeddings and apply this methodology to a wide range of complex diseases.
Geno2Vec is designed to extend polygenic risk modeling with learned SNP representations that capture multi-omic relationships beyond traditional linear methods. In this analysis its gains were modest: it led on the training metrics and on EUR validation AUROC, while the conventional PRS led on overall, AFR, and AMR validation AUROC.
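For readers who want to see how the comparison in the abstract could be set up, here is a minimal sketch of the evaluation step only: a covariate-only null model versus a model that adds one genetic score, both fit on an 80% training split, with pseudo-R-squared reported on training data and AUROC on the held-out 20%. Everything in it is an assumption made for illustration. The data are synthetic, the function names (`fit_logistic`, `mcfadden_r2`, `auroc`, `run_demo`) are hypothetical, and McFadden's definition of pseudo-R-squared is assumed because the abstract does not say which variant was used. It does not reproduce Geno2Vec, the prune-and-threshold PRS, or any All of Us data.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; X must already include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def mcfadden_r2(X, y, beta):
    """McFadden pseudo-R2: 1 - LL(model) / LL(intercept-only model)."""
    eta = X @ beta
    ll_model = np.sum(y * eta - np.logaddexp(0.0, eta))
    p0 = y.mean()
    ll_intercept = np.sum(y * np.log(p0) + (1.0 - y) * np.log(1.0 - p0))
    return float(1.0 - ll_model / ll_intercept)


def auroc(y, score):
    """AUROC from the Mann-Whitney rank statistic, using average ranks for ties."""
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    n = len(score)
    ranks = np.empty(n)
    ranks[np.argsort(score, kind="mergesort")] = np.arange(1, n + 1)
    _, inv, counts = np.unique(score, return_inverse=True, return_counts=True)
    ranks = (np.bincount(inv, weights=ranks) / counts)[inv]  # average tied ranks
    n_pos = y.sum()
    n_neg = n - n_pos
    return float((ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


def run_demo(seed=0, n=20000):
    """Compare a null and a score-augmented model on SYNTHETIC data (not AOU)."""
    rng = np.random.default_rng(seed)
    age = rng.normal(size=n)          # standardized age (synthetic)
    pcs = rng.normal(size=(n, 10))    # ten ancestry PCs (synthetic, no true effect)
    score = rng.normal(size=n)        # stand-in for any genetic risk score
    logit = -3.9 + 0.4 * age + 0.4 * score   # assumed effect sizes, illustration only
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

    X_null = np.hstack([np.ones((n, 1)), age[:, None], pcs])
    X_full = np.hstack([X_null, score[:, None]])

    idx = rng.permutation(n)
    cut = int(0.8 * n)                # 80% training, 20% validation
    tr, va = idx[:cut], idx[cut:]

    results = {}
    for name, X in (("null", X_null), ("score", X_full)):
        beta = fit_logistic(X[tr], y[tr])
        results[name] = {
            "pseudo_r2_train": mcfadden_r2(X[tr], y[tr], beta),
            "auroc_val": auroc(y[va], X[va] @ beta),
        }
    return results


if __name__ == "__main__":
    for model, metrics in run_demo().items():
        print(model, metrics)
```

Because the score-augmented model nests the null model, its training pseudo-R-squared cannot be lower than the null's. The validation AUROC carries no such guarantee, which is consistent with the abstract reporting different winners on the training and validation sets.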

