Ahmed M. Moustafa 1,2,3, Erin Theiller 1, Arnav Lal 4, Andries Feder 5, Apurva Narechania 6, Paul J. Planet 2,5,6
1- Division of Gastroenterology, Hepatology and Nutrition, Children's Hospital of Philadelphia, Philadelphia, PA, USA
2- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
3- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
4- School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, 19104
5- Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, PA, USA
6- Sackler Genomic Institute, American Museum of Natural History, New York, NY, USA 10024
Poster # 72
Genomic recombination plays a pivotal role in generating biological diversity within microbial populations, facilitating their adaptation to diverse environments, hosts, and niches. Traditional recombination detection methods are computationally intensive because they rely primarily on alignment of genomic sequences, phylogenetic analyses, and comparative techniques. These approaches become especially difficult to apply given the exponential increase in available whole-genome sequences. To address this challenge, we introduce Redcarpet, an alignment-free, database-driven tool for recombination detection. Redcarpet leverages the distribution of exact protein matches within a genomic database, building on the WhatsGNU algorithm, a tool based on exact proteomic compression. Redcarpet takes a single query genome as input and, for each encoded protein, identifies the genomes in the database that carry an exact match to that protein sequence. It then computes the Jaccard similarity coefficient between the resulting genome sets for every pair of proteins. This approach assumes that genes with identical sequences are more likely to be found in similar sets of genomes because they share evolutionary histories. Redcarpet's results are visualized as a 2-D heatmap that highlights recombined regions and enables the identification of recombination tracts. Probabilistic changepoint analysis can then pinpoint likely recombination breakpoints. When applied to known recombination events in Staphylococcus aureus and Klebsiella pneumoniae, Redcarpet proved efficient. Beyond recombination detection, Redcarpet can also infer the probable origins of genomic segments and define a genomic "core" for subsequent phylogenetic analyses. Overall, Redcarpet can rapidly identify recombination tracts in any species for which a large database of genomic sequences is available.
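The pairwise Jaccard step described above can be illustrated with a minimal sketch. It assumes the exact-match database search has already been run, so that for each query protein (in genomic order) we have the set of database genome identifiers carrying an identical protein. The function names, genome IDs, and data layout here are hypothetical and are not taken from Redcarpet's actual implementation.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union.

    Defined here as 0.0 when both sets are empty (an assumption for
    proteins with no exact matches in the database).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_matrix(hits: list[set[str]]) -> list[list[float]]:
    """Build a symmetric protein-by-protein Jaccard matrix.

    hits[i] is the (hypothetical) set of database genome IDs that contain
    an exact match to the i-th protein encoded by the query genome.
    """
    n = len(hits)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = jaccard(hits[i], hits[j])
            matrix[i][j] = matrix[j][i] = s  # fill both halves
    return matrix


# Toy input: proteins 0-1 are shared with one set of genomes,
# proteins 2-3 mostly with a different set (a made-up example).
hits = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g3"},
    {"g4", "g5"},
    {"g3", "g4", "g5"},
]
matrix = jaccard_matrix(hits)
for row in matrix:
    print([round(x, 2) for x in row])
```

Plotted as a heatmap, runs of consecutive proteins that share a common set of genomes appear as high-similarity blocks along the diagonal; in this toy input, proteins 0-1 and 2-3 form two such blocks, and the shift between them marks a candidate tract boundary of the kind a changepoint analysis would then try to localize.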