KnowYourCG: automated discovery of biological and technical links from DNA methylation data

David Goldberg1,2, Daniel Atkins1, Wubin Ding1, Ethan Moyer1, Wanding Zhou1,2

1 Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, PA, 19104, USA
2 Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA

Poster # 95

DNA methylation stably encodes footprints of epigenetic changes during development and pathogenesis. Decoding DNA methylomes for biological insight is a key step in methylome-based research and discovery. We developed a bioinformatics framework, KnowYourCG, for set enrichment testing, continuous variable analysis, probe annotation, sample annotation, and unsupervised correlation network analysis. Unlike existing tools, KnowYourCG treats CpG dinucleotides as the basic unit of investigation instead of linking them to genes or aggregating them into genomic regions. It features automated supervised analysis of diverse biological and technical influences, including local base composition, transcription factor binding, epigenetic silencing, histone modification, replication timing, genomic imprinting, tissue-specific methylation, environmental exposure, transposable element activation, sequence polymorphism, and probe cross-hybridization. It also enables fast unsupervised analysis of methylation correlation through neighbor graph-based representation reduction. We demonstrate the utility of KnowYourCG in epigenome-wide association analysis, single-cell methylome data sparsity control, dimensionality reduction, and analysis of aging- and cancer-associated epigenome aberrations. Our correlation network analysis identifies disease and inter-individual co-methylated modules, and the correlation networks derived from module construction can also be used to impute missing data with high accuracy. Our tool streamlines the mining of biological and technical links in large-scale DNA methylation data analysis and dovetails with commercially available methylome data production platforms.
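
As an illustration of what CpG-level set enrichment testing can look like, the sketch below tests whether a query set of CpGs overlaps an annotated feature set more than expected by chance. It is not the KnowYourCG interface: the function name `cpg_set_enrichment`, the probe IDs, and the choice of a one-sided hypergeometric (Fisher's exact) test are assumptions made for this example, since the abstract does not specify the statistic used.

```python
from math import comb


def cpg_set_enrichment(query, feature, universe):
    """Return (overlap, p_value) for a query CpG set against a feature set.

    Hypothetical helper: a one-sided hypergeometric test with CpGs as the
    unit of analysis. The test statistic is an assumption of this sketch.
    """
    universe = set(universe)
    # Restrict both sets to the CpGs that were actually assayed.
    query = set(query) & universe
    feature = set(feature) & universe

    n_universe = len(universe)
    n_query = len(query)
    n_feature = len(feature)
    overlap = len(query & feature)

    # P(X >= overlap) when n_query CpGs are drawn without replacement
    # from a universe containing n_feature annotated CpGs.
    total = comb(n_universe, n_query)
    upper = min(n_query, n_feature)
    tail = sum(
        comb(n_feature, k) * comb(n_universe - n_feature, n_query - k)
        for k in range(overlap, upper + 1)
    )
    return overlap, tail / total


# Toy example with made-up probe IDs (illustrative only).
universe = [f"cg{i:02d}" for i in range(1, 11)]
feature = ["cg01", "cg02", "cg03", "cg04"]   # e.g. CpGs in one annotation
query = ["cg01", "cg02", "cg03"]             # e.g. CpGs from an association scan
overlap, p_value = cpg_set_enrichment(query, feature, universe)
print(overlap, p_value)
```

In practice such a test would be repeated across many annotation sets with multiple-testing correction, which this sketch leaves out.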

