Poster #18 - Yunjun Kang
- vitod24
- Oct 20
A Patient Data-driven Graph Approach to Understand Genetic Drivers of Disease
Yunjun Kang, BS, University of Pennsylvania Bioengineering Graduate Group; Perelman School of Medicine, Division of Translational Medicine and Human Genetics
Theodore G. Drivas, University of Pennsylvania Perelman School of Medicine, Division of Translational Medicine and Human Genetics
Despite advances in genomic association studies, understanding disease mechanisms from genetic correlations remains challenging. Knowledge graphs (KGs) offer a powerful framework for modeling complex biomedical relationships. Using a bipartite KG-based approach, we analyzed phenome-wide association study (PheWAS) data from the Penn Medicine BioBank (PMBB) to identify genes with shared "phenotypic profiles" and thereby potentially uncover novel biological pathways. We constructed a bipartite KG from a dataset linking loss-of-function (LoF) variants in 12,054 genes to 1,487 phenotypes across 44,297 PMBB participants. To minimize noise, we preprocessed the data using similarity-based weight penalties and significance-threshold filtering. We then projected the bipartite graph into a gene-centric graph in which edges reflect the similarity of genes' phenotypic profiles. Community detection within this graph revealed clusters of genes associated with specific phenotypes. For instance, ciliopathy-related genes (e.g., PKD1, PKD2, IFT172) grouped into a distinct cluster, indicating that the approach recovers known biology. Interestingly, non-ciliary genes such as COL4A5 also clustered with these genes, suggesting potential novel genetic pathways. Separately, we developed a graph neural network (GNN) model to predict ciliary genes using the KG constructed from PMBB data. With this model, we identified two promising novel candidates, ORC4 and RAB42, each supported by literature indicating a role in cilium formation. We are currently designing a wet-lab validation assay to assess our GNN's predictive power. We are also applying our pipeline to the All of Us and UK Biobank datasets to replicate these findings, and we anticipate that this analysis will yield further insights into the genetic architecture of disease.
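The filtering, projection, and community-detection steps described in the abstract can be illustrated with a short, standard-library Python sketch. This is a minimal example under stated assumptions, not the authors' pipeline: the toy associations (including the hypothetical genes GENE_A, GENE_B, GENE_C and all phenotype names and p-values) are invented; the threshold `alpha`, the -log10(p) edge weights, and cosine similarity as the profile-similarity measure are assumptions; the similarity-based weight penalties are omitted because the abstract does not specify them; and weighted label propagation is swapped in as a simple community-detection method because the abstract does not name the algorithm used.

```python
import math
from collections import defaultdict

# Hypothetical toy LoF gene-phenotype associations: (gene, phenotype, p_value).
# Values are invented for illustration; they are NOT PMBB results.
ASSOCIATIONS = [
    ("PKD1", "renal_cysts", 1e-12),
    ("PKD1", "kidney_failure", 1e-6),
    ("PKD2", "renal_cysts", 1e-9),
    ("PKD2", "kidney_failure", 1e-5),
    ("IFT172", "renal_cysts", 1e-4),
    ("IFT172", "retinal_dystrophy", 1e-5),
    ("GENE_A", "phenotype_x", 1e-8),
    ("GENE_A", "phenotype_y", 1e-4),
    ("GENE_B", "phenotype_x", 1e-6),
    ("GENE_B", "phenotype_y", 1e-5),
    ("GENE_C", "renal_cysts", 0.2),  # not significant, so it is filtered out
]


def build_bipartite(associations, alpha=1e-3):
    """Map gene -> {phenotype: weight}, keeping only associations with p < alpha.

    Weight is -log10(p) (an assumption), so stronger associations get heavier edges.
    """
    graph = defaultdict(dict)
    for gene, phenotype, p in associations:
        if p < alpha:
            graph[gene][phenotype] = -math.log10(p)
    return dict(graph)


def cosine(u, v):
    """Cosine similarity between two sparse phenotype-weight vectors."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def project_to_genes(bipartite, min_sim=0.1):
    """Project the bipartite graph onto genes; edge weight = profile similarity."""
    genes = sorted(bipartite)
    edges = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            sim = cosine(bipartite[a], bipartite[b])
            if sim >= min_sim:
                edges[(a, b)] = sim
    return genes, edges


def label_propagation(genes, edges, max_iter=100):
    """Weighted label propagation with a fixed node order for reproducibility."""
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    labels = {g: g for g in genes}  # each gene starts in its own community
    for _ in range(max_iter):
        changed = False
        for g in genes:
            if not neighbors[g]:
                continue  # isolated genes stay as singletons
            score = defaultdict(float)
            for nb, w in neighbors[g].items():
                score[labels[nb]] += w
            best = max(score.values())
            top = sorted(lab for lab, s in score.items() if math.isclose(s, best))
            if labels[g] not in top:  # keep the current label on ties
                labels[g] = top[0]
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for g in genes:
        communities[labels[g]].append(g)
    return sorted(communities.values())


if __name__ == "__main__":
    genes, edges = project_to_genes(build_bipartite(ASSOCIATIONS))
    print(label_propagation(genes, edges))
    # -> [['GENE_A', 'GENE_B'], ['IFT172', 'PKD1', 'PKD2']]
```

On this toy input, the three ciliopathy genes form one community and the two hypothetical genes form another, while GENE_C never enters the graph because its only association fails the threshold. Cosine similarity is used here because it compares the shape of two genes' phenotype profiles independently of their overall association strength; another profile-similarity measure could be dropped into the same slot.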
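The abstract does not describe its GNN architecture, so the following is only a minimal sketch of how ciliary-gene prediction can be framed as node classification on a gene graph. It swaps in a simplified graph convolution in the spirit of SGC: node features are averaged over each gene and its neighbors for a fixed number of hops (a row-normalized variant of SGC's propagation), and a logistic-regression classifier is then trained on genes with known labels and used to score unlabeled genes. The edges, the three-dimensional features, the labels, and the unlabeled genes CAND1 and CAND2 are all hypothetical toy values.

```python
import math
from collections import defaultdict

# Hypothetical toy gene graph (edges = similar phenotype profiles).
EDGES = [
    ("PKD1", "PKD2"), ("PKD1", "IFT172"), ("PKD2", "IFT172"), ("IFT172", "CAND1"),
    ("GENE_A", "GENE_B"), ("GENE_B", "CAND2"),
]
# Hypothetical node features: toy association strengths for [renal, retinal, lipid] traits.
FEATURES = {
    "PKD1": [1.0, 0.0, 0.0], "PKD2": [0.9, 0.1, 0.0], "IFT172": [0.5, 0.8, 0.0],
    "CAND1": [0.4, 0.6, 0.1], "GENE_A": [0.0, 0.0, 1.0], "GENE_B": [0.1, 0.0, 0.9],
    "CAND2": [0.0, 0.1, 0.8],
}
# Known training labels: 1 = ciliary, 0 = non-ciliary. CAND1 and CAND2 are unlabeled.
LABELS = {"PKD1": 1, "PKD2": 1, "IFT172": 1, "GENE_A": 0, "GENE_B": 0}


def propagate(features, edges, hops=2):
    """Replace each node's features with the mean over itself and its neighbors, `hops` times."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    h = {g: list(x) for g, x in features.items()}
    for _ in range(hops):
        new_h = {}
        for g, x in h.items():
            group = [g, *neighbors[g]]  # self-loop plus neighbors
            new_h[g] = [sum(h[u][d] for u in group) / len(group) for d in range(len(x))]
        h = new_h
    return h


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that a node with propagated features x is ciliary."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(h, labels, lr=0.5, epochs=500):
    """Full-batch gradient descent on binary cross-entropy over labeled nodes only."""
    dim = len(next(iter(h.values())))
    w, b = [0.0] * dim, 0.0
    nodes = sorted(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for g in nodes:
            err = predict(w, b, h[g]) - labels[g]  # dLoss/dlogit for BCE
            for d in range(dim):
                grad_w[d] += err * h[g][d]
            grad_b += err
        w = [wi - lr * gw / len(nodes) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(nodes)
    return w, b


if __name__ == "__main__":
    h = propagate(FEATURES, EDGES)
    w, b = train_logistic(h, LABELS)
    for gene in sorted(set(h) - set(LABELS)):
        print(f"{gene}: P(ciliary) = {predict(w, b, h[gene]):.2f}")
```

Because CAND1 is connected to the ciliopathy genes, propagation pulls its features toward theirs, and the classifier scores it above 0.5, while CAND2 scores below 0.5. SGC-style propagation is used only because it keeps the example short and dependency-free; a GCN with learned per-layer weights is a common alternative.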

