James J. Kelley (2), Srivardhan Mhetre (3), Syeda Aiman Nadeem (1), Andrey Grigoriev (1,2)*
1. Dept of Biology, 2. Center of Computational and Integrative Biology (CCIB), 3. Department of Computer Science, Rutgers University, Camden, NJ
* Correspondence Email: andrey.grigoriev@rutgers.edu
Poster # 51
Use of Machine Learning to Improve Accuracy of SV Calls
Category: Biology; genetics / machine learning
Next-generation sequencing (NGS) data is typically analyzed to find single nucleotide variants (SNVs), indels, and structural variants (SVs), and to determine the genotype (GT) of each variant: homozygous or heterozygous. NGS data can also be used for disease-normal comparisons, in which disease and normal samples are considered together and variants are called that are present in the disease sample but absent from the normal sample. SNV and indel analysis can be automated, but visual inspection is necessary to evaluate the accuracy of SV calls because of noise and variations in read depth. GROM is an algorithm developed in our lab that comprehensively detects all variant types in a single run with superior speed and accuracy compared to other variant callers [1]. GROMSOM extends the speed and accuracy of GROM to disease-normal analysis. Speed of analysis is important when rapid identification of the genetic causes of a disease, and of drugs to treat it, will improve the probability of patient survival. Machine learning (ML) is being used to improve the accuracy of SV calls and to reduce or eliminate the need for manual inspection, further increasing the speed of analysis. We employed Variant Navigator, software for visualization of read data, to examine GT and difference calls and to develop truth sets as input for ML. We tested several ML algorithms and found Gradient Boosted Trees and K-Nearest Neighbor to be the most reliable. We are using these ML algorithms to improve the accuracy of GROM/GROMSOM SV calls.
1. Smith SD, Kawash JK, Grigoriev A. (2017) Lightning-fast genome variant detection with GROM. GigaScience 6(10):1-7.
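As a rough illustration of how a manually curated truth set might feed a K-Nearest Neighbor classifier of SV calls, consider the minimal sketch below. It is not the GROM/GROMSOM implementation. The three features (read-depth ratio, split-read support, scaled mapping quality), the labels, and the function name `knn_classify` are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter
from math import dist

# Hypothetical truth set: each entry pairs a feature vector for one SV call
# with a label assigned by visual inspection. The features are assumptions:
# (read-depth ratio, split-read support fraction, scaled mapping quality).
TRUTH_SET = [
    ((0.50, 0.80, 0.95), "true_sv"),
    ((0.55, 0.70, 0.90), "true_sv"),
    ((0.45, 0.90, 1.00), "true_sv"),
    ((0.95, 0.10, 0.40), "false_sv"),
    ((1.05, 0.05, 0.50), "false_sv"),
    ((0.90, 0.20, 0.30), "false_sv"),
]


def knn_classify(features, truth_set, k=3):
    """Label a call by majority vote among its k nearest truth-set neighbors."""
    # Sort truth-set entries by Euclidean distance to the query call.
    neighbors = sorted(truth_set, key=lambda item: dist(item[0], features))[:k]
    # Count the labels of the k closest entries and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    candidate_call = (0.52, 0.75, 0.92)  # hypothetical unlabeled SV call
    print(knn_classify(candidate_call, TRUTH_SET))
```

In practice the feature set, the value of `k`, and the distance metric would need to be chosen and validated against held-out inspected calls.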