Mina Mahboubi (1,2), Lucy Shettel (1,2), Alessandra Pavan Lamarca (1), Sudhir Kumar (1), Vincenzo Carnevale (1,2) 1- Institute for Genomics and Evolutionary Medicine, and 2- Institute for Computational Molecular Science, Department of Biology, Temple University
Poster # 60
Epistasis is the ubiquitous genetic phenomenon whereby a mutation at a given position in a gene affects the likelihood of mutations throughout the rest of the sequence. For instance, amino acids in contact with one another in a protein's structure are expected to mutate in a concerted fashion to maintain the thermodynamic stability of the native state. As a result, mutational patterns are highly constrained and sequence variability is greatly reduced. While this impact on the evolution of protein sequences is well understood, less is known about the effect of epistasis on the structure of phylogenetic trees, especially regarding the expected tree shape in the absence of relatedness. Here we developed a theoretical framework to address these questions quantitatively. We first characterized the intrinsic dimension (ID) of a protein family multiple sequence alignment (MSA) using a recently developed estimator. We found that the dimensionality of the protein sequence space is, in general, very small. We then generated sets of synthetic sequences by progressively reducing the strength of epistasis while keeping the single-site frequencies constant. The ID of these sets increased as the strength of epistasis decreased, supporting the notion that epistasis reduces the number of effective degrees of freedom. We then explored the shape of phylogenetic trees using several widely used tree-shape statistics. Strikingly, we found that, even for unrelated sequences, the presence of epistasis generates a large deviation from the "star-like" tree. We confirmed this insight quantitatively by computing a likelihood ratio against a star-phylogeny null hypothesis. Overall, our results show an unexpected connection between epistasis and tree shape and pave the way for a quantitative framework for phylogenetic tree inference in the presence of epistasis.
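The abstract does not say how the synthetic sequences were generated. As a rough illustration of the idea of weakening correlations between sites while leaving single-site frequencies untouched, the sketch below uses a simple stand-in: within each alignment column, the residues of a randomly chosen fraction of rows are permuted among themselves. The function names `partial_column_shuffle` and `column_counts`, the `fraction` parameter, and the toy alignment are all hypothetical and are not taken from the poster; the authors' actual procedure may differ.

```python
import random
from collections import Counter


def partial_column_shuffle(msa, fraction, seed=0):
    """Weaken inter-column correlations in an alignment (illustrative only).

    For every column, a random subset containing `fraction` of the rows is
    chosen and the residues at those rows are permuted among themselves.
    Column composition is preserved exactly; fraction=0 leaves the alignment
    unchanged and fraction=1 shuffles every column independently.
    This is an assumed stand-in, not the method used in the poster.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n_rows = len(msa)
    n_cols = len(msa[0])
    if any(len(seq) != n_cols for seq in msa):
        raise ValueError("all sequences must have the same length")

    rng = random.Random(seed)
    rows = [list(seq) for seq in msa]
    k = round(fraction * n_rows)  # number of rows permuted per column

    for col in range(n_cols):
        chosen = rng.sample(range(n_rows), k)
        residues = [rows[r][col] for r in chosen]
        rng.shuffle(residues)
        for r, residue in zip(chosen, residues):
            rows[r][col] = residue

    return ["".join(row) for row in rows]


def column_counts(msa):
    """Residue counts for each alignment column."""
    return [Counter(column) for column in zip(*msa)]


if __name__ == "__main__":
    # Toy alignment (made up for illustration).
    toy_msa = ["ACDE", "ACDF", "GHKE", "GHKF", "ACKE"]
    for f in (0.0, 0.5, 1.0):
        shuffled = partial_column_shuffle(toy_msa, f, seed=1)
        # Single-site frequencies are identical at every shuffling level.
        assert column_counts(shuffled) == column_counts(toy_msa)
        print(f, shuffled)
```

Sweeping `fraction` from 0 to 1 gives a family of alignments with identical single-site frequencies and progressively weaker pairwise dependence, which is the kind of series on which an ID estimator or tree-shape statistic could then be compared.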