Austin Montgomery BS 1, Georgios Christos Tsiatsianis MS 1,2, Ioannis Mouratidis MS 1, Candace S.Y. Chan MS 3, Maria Athanasiou, PhD 2, Verena Kantere, PhD 2, Nelson S. Yee, MD, PhD, RPh 4, Ilias Georgakopoulos-Soares, PhD 1,*
1 Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
2 School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece.
3 Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA.
4 Next Generation Therapies Program, Penn State Cancer Institute; Division of Hematology-Oncology, Department of Medicine, Penn State Health Milton S. Hershey Medical Center, Hershey, PA, USA.
Poster # 54
Early diagnosis of cancer can significantly improve patient survival; however, most cancer types still lack the sensitive and highly specific non-invasive biomarkers needed for detection. Cell-free RNA (cfRNA) may improve upon current biomarkers because highly expressed tumor-associated genes are over-represented in cfRNA relative to its cell-free DNA (cfDNA) counterpart. Nullomers are short sequences absent from the human genome. Because nullomers may resurface as a result of somatic mutations, they could provide more sensitive and specific biomarkers for cancer detection. Here, we examine over 10,000 matched tumor-normal whole-exome sequencing samples to characterize nullomer resurfacing across exonic regions. We find that 29,774,302 distinct 16 bp nullomers appear in this cohort, with ~80% of somatic mutations causing a nullomer to resurface. We take the 100,000 most frequently resurfacing nullomers (for each length from 14 to 16 bp) as the feature space for classifying hepatocellular carcinoma (HCC) samples from cfRNA. We fit an L1-regularized logistic regression model, evaluated with 10-fold cross-validation repeated 100 times. We achieve AUROC scores of 0.998, 0.999, and 1.000 for the models built from 14 bp, 15 bp, and 16 bp nullomers, respectively. Each model also gives accurate probabilistic predictions, with Brier scores less than or equal to 0.02. We examine the nullomers that occur in over 90% of the repeated cross-validated models and annotate many of them to liver cancer-associated genes, including FTH1, EEF2, TMSB10, ACTB, and the long non-coding RNA MALAT. We then use a separate dataset to build lasso logistic regression models that classify liver (AUC = 0.922), stomach (AUC = 0.927), and lung (AUC = 0.877) cancer samples against healthy samples. We believe these results show the utility of nullomers within cfRNA as a method of detecting cancer non-invasively.
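As an illustration of what nullomer resurfacing means, the sketch below checks which k-mers spanning a single-base substitution are absent from a reference sequence. It is a toy example under stated assumptions, not the pipeline used in this work: the function names are hypothetical, the reference is a short made-up string, only one strand is considered, and k = 3 is used so that all k-mers can be enumerated (the abstract uses 14-16 bp against the human genome).

```python
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(reference, k):
    """Return every k-mer over ACGT that never occurs in reference.

    Assumption: single strand only; enumerating 4**k k-mers is only
    feasible for small k, so this is for illustration.
    """
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - kmers(reference, k)


def resurfacing_nullomers(reference, pos, alt, k):
    """Return k-mers created by substituting alt at 0-based pos that are
    absent from the unmutated reference, i.e. resurfaced nullomers."""
    mutated = reference[:pos] + alt + reference[pos + 1:]
    # Only k-mers overlapping the mutated base can change.
    start = max(0, pos - k + 1)
    end = min(len(mutated), pos + k)
    return kmers(mutated[start:end], k) - kmers(reference, k)


# Toy reference (hypothetical), A>G substitution at position 4.
reference = "ACGTACGTAC"
resurfaced = resurfacing_nullomers(reference, pos=4, alt="G", k=3)
print(sorted(resurfaced))
```

In a setting like the one described above, counts of such resurfaced nullomers per sample would form the feature matrix passed to the classifier.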