The blobulator: visualizing hydrophobic domains for protein sequences
Connor Pitman, Ezry Santiago-McRae, Ruchi Lohia, Kaitlin Bassi, Matthew E.B. Hansen, Thomas T. Joseph, and Grace Brannigan [1, 5]
1. Center for Computational and Integrative Biology, Rutgers--Camden, NJ, 08102, USA
2. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
3. Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
4. Department of Anesthesiology and Critical Care, Perelman School of Medicine, University of Pennsylvania, PA, 19104, USA
5. Department of Physics, Rutgers--Camden, NJ, 08102, USA
Poster # 53
Though secondary and tertiary structure link protein sequence to function for many proteins, some, such as intrinsically disordered proteins (IDPs), lack stable structure yet have clear functional roles. In such cases, it is particularly important to develop methods that detect innate modularity using criteria other than stable structural folds. Here, we detail the blobulation algorithm, which detects sequence modularity arising from residue hydrophobicity, and an associated webtool, the blobulator. Blobulation identifies clusters of hydrophobic residues, which can be important interaction subunits of proteins; examples include both the hydrophobic core of structured proteins and interacting residue groups in IDPs. The webtool allows anyone to explore the distribution of hydrophobic domains within a protein by varying parameters and to view the impact that mutations may have on those domains. We argue that the blobulation algorithm, as implemented in the blobulator webtool, can be a powerful method for identifying important clusters of residues without relying on secondary or tertiary structure prediction. We present recently published genomic data validating the functional significance of hydrophobic blobs, as well as use cases in which blobulation provides insight into research questions such as visually identifying differences between aligned sequences, classifying meaningful interaction units in IDPs, and investigating how protein sequence influences topology.
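To make the idea of "clusters of hydrophobic residues" concrete, the following is a minimal Python sketch of one way such clusters could be detected. It is not the blobulator's implementation: this abstract does not specify the hydropathy scale, smoothing, or cutoffs, so the use of the Kyte-Doolittle scale, the function names, and the default values of `threshold`, `min_length`, and `window` are all illustrative assumptions.

```python
# Illustrative sketch only; NOT the blobulator's implementation.
# Assumptions: Kyte-Doolittle hydropathy scale, a sliding-window mean,
# and hypothetical default values for threshold, min_length, and window.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def smoothed_hydropathy(sequence, window=3):
    """Return per-residue hydropathy rescaled to [0, 1] and window-averaged."""
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence]
    half = window // 2
    smoothed = []
    for i in range(len(scaled)):
        # Truncate the window at the sequence ends.
        lo = max(0, i - half)
        hi = min(len(scaled), i + half + 1)
        smoothed.append(sum(scaled[lo:hi]) / (hi - lo))
    return smoothed


def find_hydrophobic_clusters(sequence, threshold=0.4, min_length=4, window=3):
    """Return half-open (start, end) index pairs of hydrophobic runs.

    A run is a maximal stretch of residues whose smoothed hydropathy
    exceeds `threshold`; runs shorter than `min_length` are discarded.
    """
    smoothed = smoothed_hydropathy(sequence, window)
    clusters = []
    start = None
    # The trailing sentinel closes a run that reaches the end of the sequence.
    for i, value in enumerate(smoothed + [float("-inf")]):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                clusters.append((start, i))
            start = None
    return clusters


if __name__ == "__main__":
    # Six leucines flanked by aspartates form a single cluster.
    print(find_hydrophobic_clusters("DDDDDLLLLLLDDDDD"))  # [(5, 11)]
```

Varying `threshold` and `min_length` in this sketch mimics, in a simplified way, the kind of parameter exploration the webtool is described as offering; a point mutation can be examined by editing one character of the sequence and comparing the returned clusters.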