A Novel Genetic Correlation Disease-Disease Network for the Improved Identification of Associated Ph
Updated: Sep 29, 2022
Jakob Woerner(1,2), Vivek Sriram(1,2), Yonghyun Nam(2), Dokyoon Kim(2,3)
(1) Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
(2) Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
(3) Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
Many complex disorders share associations with multiple phenotypes. However, our understanding of these disease connections remains severely limited, and the extent to which genetics contributes to them is unclear. A disease-disease network (DDN), a graph in which nodes represent diseases and edges represent relationships between them, can help identify related disorders. By applying cross-trait linkage disequilibrium score regression (LDSC) to summary statistics from a biobank-scale phenome-wide association study (PheWAS), we can generate a corresponding genetic correlation DDN (gcDDN), in which edges represent genetic correlations between phenotypes. We hypothesize that a gcDDN can better identify novel disease co-occurrences and highlight potential genetic contributors to these connections than conventional PheWAS-derived DDNs, in which an edge links two phenotypes that share associated variants passing a specified significance threshold. Using UK Biobank (UKBB) PheWAS summary data, we constructed a gcDDN for 487 phenotypes with at least 1,000 cases each. To demonstrate the gcDDN's clinical relevance, we focused on myocardial infarction (MI), a highly heritable polygenic disease, as our index phenotype. We applied graph-based semi-supervised learning to generate scores of predicted co-occurrence between MI and its neighboring phenotypes in the gcDDN, and validated these predictions against ground-truth disease co-occurrences derived from UKBB in-patient data. We find that the gcDDN identifies disease connections with MI observed in the in-patient data more accurately than a conventional DDN, increasing the area under the receiver operating characteristic curve (AUC) by 22.8%, from 0.632 to 0.776. This result suggests that a gcDDN can effectively identify disease relationships involving complex, heritable phenotypes, supporting its use in further studies of personalized medicine and disease comorbidity.
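The workflow summarized above (build a gcDDN from pairwise LDSC genetic correlations, propagate scores outward from an index phenotype, and check the scores against observed co-occurrence with AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the semi-supervised formulation, edge cutoffs, or edge weights, so the following are hypothetical choices: the `(pheno_a, pheno_b, rg, p)` input format, keeping a gcDDN edge when p < 0.05 and weighting it by |rg|, the 5e-8 cutoff for the conventional shared-variant DDN, and the propagation rule F = alpha * S F + (1 - alpha) * Y with a symmetrically normalized weight matrix S. Phenotype IDs other than MI, and all numbers in the example, are toy values, not UKBB results.

```python
from collections import defaultdict
import math


def build_gcddn(ldsc_results, p_threshold=0.05):
    """Build a weighted, undirected gcDDN from cross-trait LDSC output.

    ldsc_results: iterable of (pheno_a, pheno_b, rg, p) tuples (hypothetical format).
    An edge is kept when the rg p-value is below p_threshold (hypothetical cutoff);
    its weight is |rg|, so strong positive and negative correlations both count.
    """
    graph = defaultdict(dict)
    for a, b, rg, p in ldsc_results:
        if a != b and rg != 0 and p < p_threshold:
            graph[a][b] = graph[b][a] = abs(rg)
    return graph


def build_conventional_ddn(assoc, p_threshold=5e-8):
    """Conventional PheWAS DDN for comparison: link two phenotypes that share at
    least one variant passing p_threshold in both; weight = number shared.

    assoc: dict phenotype -> {variant_id: p_value} (hypothetical format).
    """
    sig = {ph: {v for v, p in hits.items() if p < p_threshold}
           for ph, hits in assoc.items()}
    graph = defaultdict(dict)
    phenos = sorted(sig)
    for i, a in enumerate(phenos):
        for b in phenos[i + 1:]:
            shared = len(sig[a] & sig[b])
            if shared:
                graph[a][b] = graph[b][a] = float(shared)
    return graph


def propagate_scores(graph, seed, alpha=0.85, n_iter=100):
    """Score every node by label propagation from the seed phenotype.

    Iterates F <- alpha * S F + (1 - alpha) * Y, where S = D^-1/2 W D^-1/2 is the
    symmetrically normalized weight matrix and Y is 1 at the seed, 0 elsewhere.
    """
    nodes = list(graph)
    degree = {n: sum(graph[n].values()) for n in nodes}
    y = {n: 1.0 if n == seed else 0.0 for n in nodes}
    f = dict(y)
    for _ in range(n_iter):
        # Each update reads only the previous iteration's scores.
        f = {
            n: alpha * sum(w * f[m] / math.sqrt(degree[n] * degree[m])
                           for m, w in graph[n].items())
            + (1 - alpha) * y[n]
            for n in nodes
        }
    return f


def roc_auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up LDSC rows (pheno_a, pheno_b, rg, p); not real UKBB estimates.
    ldsc = [
        ("MI", "P1", 0.80, 1e-20),
        ("MI", "P2", 0.40, 1e-6),
        ("P1", "P2", 0.50, 1e-8),
        ("P2", "P3", 0.30, 1e-4),
        ("MI", "P4", 0.02, 0.6),   # not significant, so no edge
        ("P4", "P5", 0.60, 1e-9),
    ]
    scores = propagate_scores(build_gcddn(ldsc), seed="MI")
    candidates = ["P1", "P2", "P3", "P4", "P5"]
    cooccurs = {"P1", "P3"}  # toy "observed co-occurrence" labels
    auc = roc_auc([scores.get(c, 0.0) for c in candidates],
                  [c in cooccurs for c in candidates])
    print(f"AUC = {auc:.3f}")
```

Propagation lets MI's signal reach phenotypes that are not its direct neighbors, attenuated by the strength of the intervening genetic correlations; phenotypes in components disconnected from MI (P4 and P5 in the toy data) score zero. The rank-based `roc_auc` is equivalent to the area under the ROC curve and can score predictions from either network, which is the kind of head-to-head comparison the abstract reports. As a check on the reported figure, (0.776 - 0.632) / 0.632 ≈ 0.228, the stated 22.8% relative increase.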
Remaining work includes assessing the gcDDN for additional phenotypes and interpreting the specific genetic variants that contribute to connections in the network.