Poster #2 - Haedong Kim

vitod24
Oct 20, 2025

Exome-Wide Copy Number Variation in 142,357 Individuals from Autism Spectrum Families in the Simons SPARK Cohort

Haedong Kim,1,2 Grace Tzun-Wen Shaw,1 Jeffrey K. Ng,3 Timothy L. Mosbruger,1 Ramakrishnan Rajagopalan,1,2 Tychele N. Turner*,3,4 Tristan J. Hayeck*,1,2 (*Co-Senior Authors. 1. Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA. 2. Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA. 3. Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, US. 4. Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, St Louis, Missouri.)

Rare and de novo copy number variants (CNVs) play a critical role in neurodevelopmental disorders such as autism spectrum disorder (ASD). While whole-exome sequencing (WES) has become more accessible in clinical settings, CNV detection from short-read WES data faces challenges including inherent biases, inconsistent results across callers, and substantial false positive and false negative rates, which necessitate labor-intensive manual curation. We developed a scalable CNV calling and scoring pipeline to detect CNVs and distinguish valid calls in large datasets. With the aim of providing both tools and resources, we applied our approach to the SFARI SPARK cohort, analyzing 142,357 individuals. First, we developed a novel CNV caller based on a fast kernel change point detection method. Then, a hybrid, partially Bayesian machine learning framework was employed to train scoring models. We curated training datasets and generated modeling features from high-quality cross-platform data and from CNVs manually labeled by experts. Features were derived from multiple complementary sources to provide a comprehensive perspective on CNVs and to account for systematic effects on CNV detection reliability, including primary statistics from various read-depth signals, genomic context properties, and individual-level demographic and ancestral factors. We also provide additional CNV call sets generated with other established methods (XHMM, CoNIFER, and cn.MOPS), with features generated and scored by our model, yielding a comprehensive CNV map resource for the SPARK cohort. Our results and models demonstrate strong performance (AUC ≈ 0.9995; F1 ≈ 0.9989) and improvements in CNV detection accuracy, providing a resource for ASD research and enabling seamless integration with existing clinical and research pipelines.
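To give a feel for what kernel change point detection on a read-depth signal can look like, here is a minimal Python sketch. It is not the caller described in the abstract, whose details are not given here. It assumes a one-dimensional normalized depth ratio per exon, a Gaussian (RBF) kernel, a known number of breakpoints, and plain dynamic programming rather than a fast variant. The function name `kernel_changepoints` and the parameters `gamma` and `min_size` are hypothetical.

```python
import numpy as np

def kernel_changepoints(x, n_bkps, gamma=1.0, min_size=2):
    """Return n_bkps breakpoint indices minimizing the total kernel segment cost.

    Illustrative only: RBF kernel, exhaustive dynamic programming, O(n^2) memory.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Gram matrix of the RBF kernel k(u, v) = exp(-gamma * (u - v)^2).
    diff = x[:, None] - x[None, :]
    gram = np.exp(-gamma * diff * diff)

    # 2D prefix sums so any block sum of the Gram matrix is O(1).
    prefix = np.zeros((n + 1, n + 1))
    prefix[1:, 1:] = gram.cumsum(axis=0).cumsum(axis=1)
    diag = np.concatenate(([0.0], np.cumsum(np.diag(gram))))

    def cost(a, b):
        # Within-segment scatter in feature space for samples a..b-1:
        # sum_i k(x_i, x_i) - (1 / (b - a)) * sum_{i,j} k(x_i, x_j)
        block = prefix[b, b] - prefix[a, b] - prefix[b, a] + prefix[a, a]
        return (diag[b] - diag[a]) - block / (b - a)

    n_seg = n_bkps + 1
    best = np.full((n_seg + 1, n + 1), np.inf)
    prev = np.zeros((n_seg + 1, n + 1), dtype=int)
    best[0, 0] = 0.0

    # best[k, b] = minimal cost of splitting x[:b] into k segments.
    for k in range(1, n_seg + 1):
        for b in range(k * min_size, n + 1):
            for a in range((k - 1) * min_size, b - min_size + 1):
                c = best[k - 1, a] + cost(a, b)
                if c < best[k, b]:
                    best[k, b] = c
                    prev[k, b] = a

    # Backtrack segment starts from the end; the first segment starts at 0.
    starts = []
    b = n
    for k in range(n_seg, 0, -1):
        b = int(prev[k, b])
        starts.append(b)
    return sorted(starts[:-1])  # drop the leading 0

# Synthetic depth ratio: 30 normal exons, a 20-exon heterozygous deletion, 30 normal exons.
rng = np.random.default_rng(0)
depth = np.concatenate([np.full(30, 1.0), np.full(20, 0.5), np.full(30, 1.0)])
depth = depth + rng.normal(0.0, 0.03, size=depth.size)
print(kernel_changepoints(depth, n_bkps=2))
```

In this toy setting the two returned indices bracket the simulated deletion. A production caller would also need to choose the number of breakpoints, normalize depth across samples and capture targets, and avoid the quadratic Gram matrix on long signals.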

MidAtlantic Bioinformatics Conference

Friday October 30, 2026
