Li, Jenny, Undergraduate Research Intern; Zhou, Wanding, PhD, Principal Investigator; Goldberg, David, PhD Candidate; Li, Marilyn, MD, MS; Xu, Weixuan, PhD; Xu, Feng, PhD; Kraya, Adam, PhD
Poster #91
DNA cytosine modifications at CpG dinucleotides are rich encoders of a cancer cell’s mitotic history and cell-of-origin information, establishing the DNA methylome as a powerful molecular analyte for cancer diagnosis. However, classifiers trained on data from one methylation assay platform may not transfer effectively to other platforms because of differences in probe selection and platform-specific technical artifacts such as signal background and amplification bias. To derive a pan-platform classifier, we explored feature transformation strategies guided by biological knowledge of chromatin states and other knowledgebase sets. Targeting brain cancers and 33 cancer types from The Cancer Genome Atlas (TCGA), we investigated diverse feature selection and transformation methods, including aggregating CpG methylation over tissue signature databases and nonparametric rank transformation. We evaluated multiple tissue signature databases, including transcription factor binding sites (TFBS), chromatin states (chromHMM), and histone modifications (HM), among others, by training a separate model on the features from each database and comparing their feature importances. By integrating the most informative feature sets, we identified key features, including cell signatures and repeat elements. Because aggregated features do not depend on any individual CpG, we anticipate that machine learning models built on them will perform effectively without relying on specific probes, allowing us to extend our models to other assay platforms, including HM450, EPICv1, EPICv2, and WGBS data.
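The two feature transformations named above, aggregating CpG methylation over knowledgebase sets and nonparametric rank transformation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: it assumes that aggregation means averaging beta values over whichever CpGs of a set a platform actually measures, and that the rank transformation is applied within each sample across the aggregated features. The function names, the `min_cpgs` threshold, the probe IDs, and the set names are all hypothetical.

```python
import numpy as np


def aggregate_by_feature_set(betas, cpg_ids, feature_sets, min_cpgs=3):
    """Average CpG beta values within each knowledgebase feature set (hypothetical helper).

    betas: (n_samples, n_cpgs) array of beta values in [0, 1], NaN if missing.
    cpg_ids: probe IDs matching the columns of `betas` (platform-specific).
    feature_sets: dict mapping a set name (e.g. a TFBS or chromHMM state)
        to its member CpG IDs; only CpGs present on this platform are used.
    """
    index = {cpg: j for j, cpg in enumerate(cpg_ids)}
    names = sorted(feature_sets)
    out = np.full((betas.shape[0], len(names)), np.nan)
    for col, name in enumerate(names):
        cols = [index[c] for c in feature_sets[name] if c in index]
        if len(cols) < min_cpgs:
            continue  # too few probes on this platform: leave the feature as NaN
        out[:, col] = np.nanmean(betas[:, cols], axis=1)
    return names, out


def rank_transform(features):
    """Replace each sample's feature values with within-sample ranks scaled to [0, 1].

    Ties receive their average rank and NaN features stay NaN. A strictly
    increasing transform applied to all of a sample's features leaves the ranks unchanged.
    """
    out = np.full(features.shape, np.nan)
    for i, row in enumerate(features):
        ok = ~np.isnan(row)
        vals = row[ok]
        n = len(vals)
        if n == 0:
            continue
        order = np.argsort(vals, kind="mergesort")
        sorted_vals = vals[order]
        ranks = np.empty(n)
        j = 0
        while j < n:
            k = j
            while k + 1 < n and sorted_vals[k + 1] == sorted_vals[j]:
                k += 1
            ranks[order[j:k + 1]] = (j + k) / 2.0  # average 0-based rank over ties
            j = k + 1
        out[i, ok] = ranks / (n - 1) if n > 1 else 0.5
    return out


# Toy example with hypothetical probe IDs and set names.
feature_sets = {
    "TFBS_A": ["cg1", "cg2", "cg3", "cg4"],
    "chromHMM_Enh": ["cg5", "cg6", "cg7", "cg8"],
    "Repeat_LINE1": ["cg9", "cg10", "cg11", "cg12"],
}
true_betas = {"cg1": 0.80, "cg2": 0.82, "cg3": 0.78, "cg4": 0.80,
              "cg5": 0.30, "cg6": 0.32, "cg7": 0.28, "cg8": 0.30,
              "cg9": 0.60, "cg10": 0.62, "cg11": 0.58, "cg12": 0.60}

# Hypothetical platform X measures all 12 CpGs.
ids_x = list(true_betas)
betas_x = np.array([[true_betas[c] for c in ids_x]])

# Hypothetical platform Y lacks three probes and compresses the signal (affine background shift).
ids_y = [c for c in ids_x if c not in {"cg4", "cg8", "cg12"}]
betas_y = np.array([[0.1 + 0.8 * true_betas[c] for c in ids_y]])

names, agg_x = aggregate_by_feature_set(betas_x, ids_x, feature_sets)
_, agg_y = aggregate_by_feature_set(betas_y, ids_y, feature_sets)
ranks_x, ranks_y = rank_transform(agg_x), rank_transform(agg_y)
print(names, ranks_x, ranks_y)
```

In this toy case the two platforms produce the same feature columns despite different probe content, and the within-sample ranks agree even though the raw averages differ. Rank invariance is only guaranteed for strictly increasing distortions of the aggregated values; nonlinear per-CpG artifacts could still reorder features.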