Prioritizing de novo variants using phenotype selection and an annotated data knowledge graph
Updated: Sep 29, 2022
Erin Nesmith (1), Michelle Gong (1), Kat Horsley (1), Ben Stear (1), Taha Moheseni Ahooyi (1), Deanne Taylor (1,2). 1. DBHI, CHOP. 2. Dept. of Pediatrics, UPenn Perelman School of Medicine.
Birth defects are functional and structural abnormalities that affect 1 in 33 births in the United States. Many structural birth defects may arise from unrecognized rare syndromes or disorders that are shared across multiple children. However, although some birth defects have known genetic factors, most have no known genetic cause, which makes these shared syndromes difficult to identify. Further, not all subjects with similar phenotypes will follow a Mendelian model (one gene, one disease). A single disease is more likely to arise from mutations in multiple different genes that operate in a shared biological process and lead to a similar phenotype. To explore whether similar phenotypes may point to similar causes of birth defects, we first created a phenotypic fingerprint for each subject in the KF Data Resource. Using cosine distances between fingerprints together with clustering methods, we built a phenotypic similarity map of the subjects. We then clustered this similarity matrix to find groups of subjects with similar phenotypic profiles, which may correspond to clusters of related birth defects. After this final round of clustering, we selected specific groups of children and identified de novo genomic variants within each cohort. In parallel, we created Petagraph, a knowledge graph of annotated genomic features, medical terminologies, ontologies, and genomic data. We mapped the de novo variants from each phenotypic cluster into Petagraph to find the most highly connected gene nodes among them. We report these connected gene nodes and the resulting feature enrichment among the de novo genes in these individuals, compared with de novo variants in all other individuals. Using these data, we present candidate genes for these disorders that may help point to networks of interacting developmental genes.
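To make the fingerprint comparison concrete, the following is a minimal Python sketch, not the pipeline used in this work. It assumes each phenotypic fingerprint is a binary vector over a fixed list of phenotype terms. The subject names, the four toy terms, the 0.5 distance cutoff, and the helper names `cosine_distance` and `cluster_by_threshold` are all hypothetical. The abstract does not name the clustering method, so a simple single-linkage grouping at a fixed distance threshold stands in for it here.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumed convention: a subject with no recorded terms is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def cluster_by_threshold(fingerprints, max_distance):
    """Single-linkage grouping: subjects end up in the same cluster when a
    chain of pairwise cosine distances <= max_distance connects them."""
    subjects = sorted(fingerprints)
    parent = {s: s for s in subjects}

    def find(s):
        # Union-find root lookup with path compression.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(subjects, 2):
        if cosine_distance(fingerprints[a], fingerprints[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for s in subjects:
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())


# Hypothetical toy data: each column is one made-up phenotype term (1 = present).
fingerprints = {
    "subject_A": [1, 1, 0, 0],
    "subject_B": [1, 1, 1, 0],
    "subject_C": [0, 0, 0, 1],
    "subject_D": [0, 0, 1, 1],
}

clusters = cluster_by_threshold(fingerprints, max_distance=0.5)
print(clusters)
```

With this toy input, subjects A and B fall into one group and subjects C and D into another, because only those pairs have a cosine distance below the assumed cutoff. In practice the cutoff and the clustering method would need to be chosen and validated against the real phenotype data.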