Network-based cross-phenotype risk scoring models for compositing multiple disease risks using bioba
Updated: Sep 29, 2022
Yonghyun Nam 1, Vivek Sriram 1, Sang-Hyuk Jung 1,2, Brenda Xiao 1, Manu Shivakumar 1, Anurag Verma 3*, Dokyoon Kim 1,4*
1 Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
2 Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul 06351, Republic of Korea
3 Division of Translational Medicine and Human Genetics, Department of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
4 Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
The polygenic risk score (PRS) can help identify individuals at high genetic risk by combining individual genetic profiles with single-nucleotide polymorphisms identified through genome-wide association studies. Phenome-wide association studies (PheWAS) can identify associations between multiple phenotypes and genetic variants, allowing us to develop risk scores that consider cross-phenotype relationships for index diseases of interest. We developed a network-based cross-phenotype risk score algorithm (netCRS) to predict individual disease risk by utilizing cross-phenotype associations identified from PheWAS. We designed a multi-layered phenotype-variant relational network using biobank-scale PheWAS summary statistics. The network consists of a cross-phenotype association network and a variant network. A label propagation algorithm was applied to predict individual risk scores by diffusing individual genetic profiles into the cross-phenotype associations. The final aggregation of the propagated results represented a measurement of the possible risk scores across phenotypes, focused on an index disease. UK Biobank PheWAS summary statistics were used to construct the multi-layered networks, and the individual genetic profiles for European populations were collected from the Penn Medicine BioBank. We obtained the netCRS for three dichotomous traits: type 2 diabetes (T2D; 4,400 cases vs. 24,219 controls), obesity (OBS; 4,185 cases vs. 22,971 controls), and coronary atherosclerosis (CAD; 6,064 cases vs. 21,900 controls). The number of cross-phenotypes considered for each trait was as follows: 192 phenotypes for T2D, 65 for OBS, and 184 for CAD. To investigate the utility of netCRS, we compared disease risk prediction between netCRS and PRS (LDpred). The combined model (netCRS + PRS + Sex + Age + PC1~5) achieved a higher AUC than the (PRS + Sex + Age + PC1~5) model, with improvements of 3.78% for T2D, 6.57% for OBS, and 3.54% for CAD.
We expect that using these risk prediction models will allow for the development of prevention strategies and reduction of disease mortality.
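The label propagation step described in the abstract can be illustrated with a generic sketch. The abstract does not specify the propagation rule, the edge weights, or the aggregation, so the version below makes several assumptions: the commonly used iterative update f <- alpha * S * f + (1 - alpha) * y on a symmetrically normalized adjacency matrix S, a five-node toy network, and a plain sum over phenotype nodes as the aggregate. All names (`normalize`, `propagate`, `aggregate_score`) and all values are hypothetical; this is not the netCRS implementation.

```python
import math


def normalize(W):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2); isolated nodes get zero rows."""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if W[i][j] != 0 and deg[i] > 0 and deg[j] > 0:
                S[i][j] = W[i][j] / math.sqrt(deg[i] * deg[j])
    return S


def propagate(W, y, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until the update is below tol."""
    n = len(W)
    S = normalize(W)
    f = [float(v) for v in y]
    for _ in range(max_iter):
        new_f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if delta < tol:
            break
    return f


# Toy network (assumed for illustration only):
# nodes 0-1 are variants, nodes 2-4 are phenotypes; node 4 is isolated.
W = [
    [0, 0, 1, 0, 0],  # variant 0 -- phenotype 2
    [0, 0, 1, 1, 0],  # variant 1 -- phenotypes 2 and 3
    [1, 1, 0, 1, 0],  # phenotype 2 -- variants 0, 1 and phenotype 3
    [0, 1, 1, 0, 0],  # phenotype 3 -- variant 1 and phenotype 2
    [0, 0, 0, 0, 0],  # phenotype 4 has no edges
]

# Hypothetical genetic profile: the individual carries the risk allele of variant 0 only.
y = [1, 0, 0, 0, 0]

scores = propagate(W, y)

# Assumed aggregation: sum the propagated scores over the phenotype nodes.
phenotype_nodes = [2, 3, 4]
aggregate_score = sum(scores[i] for i in phenotype_nodes)

print("propagated scores:", scores)
print("aggregate over phenotype nodes:", aggregate_score)
```

In this sketch, any alpha below 1 makes the update a contraction, so the iteration converges; larger values of alpha let the seed signal spread farther from the variant nodes into the phenotype layer, while the isolated phenotype receives no score.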