
Poster #18 - Yunjun Kang

  • vitod24
  • Oct 20

A Patient Data-driven Graph Approach to Understand Genetic Drivers of Disease.


Yunjun Kang, BS, University of Pennsylvania Bioengineering Graduate Group, Perelman School of Medicine Division of Translational Medicine and Human Genetics

Theodore G. Drivas, University of Pennsylvania Perelman School of Medicine Division of Translational Medicine and Human Genetics


Despite advances in genomic association studies, understanding disease mechanisms from genetic correlations remains challenging. Knowledge graphs (KGs) offer a powerful framework for modeling complex biomedical relationships. Using a bipartite KG-based approach, we analyzed phenome-wide association study (PheWAS) data from the Penn Medicine BioBank (PMBB) to identify genes with shared "phenotypic profiles," with the aim of uncovering novel biological pathways. We constructed a bipartite KG from a dataset linking loss-of-function (LoF) variants in 12,054 genes to 1,487 phenotypes in 44,297 PMBB participants. The data were preprocessed using similarity-based weight penalties and significance-threshold filtering to minimize noise. We then projected the bipartite graph into a gene-centric graph in which edges reflect phenotypic profile similarity. Community detection within this graph revealed clusters of genes associated with specific phenotypes. For instance, ciliopathy-related genes (e.g., PKD1, PKD2, IFT172) grouped into a distinct cluster, supporting the validity of our approach. Interestingly, non-ciliary genes such as COL4A5 also clustered with these genes, suggesting potential novel genetic pathways. Separately, we developed a graph neural network (GNN) model to predict ciliary genes using the KG constructed from the PMBB data. This model identified two promising novel candidates, ORC4 and RAB42, supported by literature indicating roles in cilium formation. We are currently designing a wet lab validation assay to assess our GNN's predictive power. We are also applying our pipeline to the All of Us and UK Biobank datasets to replicate these findings, and we anticipate that this analysis will yield further insights into the genetic architecture of disease.
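The projection and clustering steps described above can be illustrated with a minimal sketch. The abstract does not say which similarity measure or community detection algorithm was used, so everything here is an assumption: cosine similarity stands in for "phenotypic profile similarity", connected components of the thresholded graph stand in for community detection, and the gene names, phenotypes, weights, and the `min_similarity` cutoff are hypothetical toy values, not PMBB data.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy bipartite graph: gene -> {phenotype: association weight}.
# These names and weights are illustrative only, not PMBB results.
ASSOCIATIONS = {
    "GENE_A": {"renal cyst": 1.0, "chronic kidney disease": 0.8},
    "GENE_B": {"renal cyst": 0.9, "chronic kidney disease": 0.7},
    "GENE_C": {"hearing loss": 1.0, "chronic kidney disease": 0.2},
    "GENE_D": {"hearing loss": 0.9},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse phenotype-weight profiles."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project_to_gene_graph(associations, min_similarity=0.5):
    """Project the bipartite graph onto genes, keeping only similar pairs.

    Assumption: cosine similarity with a fixed cutoff; the source does not
    specify the actual measure or threshold.
    """
    edges = {}
    for g1, g2 in combinations(sorted(associations), 2):
        sim = cosine_similarity(associations[g1], associations[g2])
        if sim >= min_similarity:
            edges[(g1, g2)] = sim
    return edges


def connected_clusters(genes, edges):
    """Group genes into connected components (a simple stand-in for
    community detection, not the method used in the poster)."""
    neighbors = {g: set() for g in genes}
    for g1, g2 in edges:
        neighbors[g1].add(g2)
        neighbors[g2].add(g1)
    seen, clusters = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        stack, cluster = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.append(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(sorted(cluster))
    return clusters


edges = project_to_gene_graph(ASSOCIATIONS)
clusters = connected_clusters(ASSOCIATIONS, edges)
print(clusters)
```

In this toy input, genes with overlapping phenotype profiles end up in the same cluster, which mirrors the idea that genes sharing a phenotypic profile may share a pathway. A real analysis would likely use a modularity-based or similar community detection method on a much larger, noisier graph.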

 
 
 
