rMATS-turbo: An efficient and flexible computational tool for alternative splicing analysis of large
Updated: Sep 29, 2022
Yuanyuan Wang 1,2,†, Zhijie Xie 2,†, Eric Kutschera 2, Jenea I. Adams 2,3, Kathryn E. Kadash-Edmondson 2, Yi Xing 2,4,5,*
1. Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
2. Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
3. Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
4. Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
5. Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
*Corresponding author. Phone: (215) 590-0280
†These authors contributed equally to this work
Pre-mRNA alternative splicing is a prevalent mechanism for diversifying eukaryotic transcriptomes and proteomes. Regulated alternative splicing plays a role in many biological processes, and dysregulated alternative splicing is a feature of many human diseases. Short-read RNA sequencing (RNA-seq) is now the standard approach for transcriptome-wide analysis of alternative splicing. Since 2011, our lab has developed and maintained rMATS, a computational tool for discovering and quantifying alternative splicing events from RNA-seq data. The rMATS software has been widely used by the research community. Here we provide a tutorial for the contemporary version of rMATS, called rMATS-turbo, a fast and scalable re-implementation that maintains the statistical framework and user interface of the original rMATS software while incorporating a revamped computational workflow that substantially improves speed and data storage efficiency. The rMATS-turbo software scales up to massive RNA-seq datasets with tens of thousands of samples. To illustrate the utility of rMATS-turbo, we describe two representative application scenarios. First, we describe a broadly applicable two-group comparison to identify differential alternative splicing events between two sample groups, including both annotated and novel alternative splicing events. Second, we describe a quantitative analysis of alternative splicing in a large-scale RNA-seq dataset (~1,000 samples), including the discovery of alternative splicing events associated with distinct cell states. We detail the workflow and features of rMATS-turbo that enable efficient parallel processing and analysis of large-scale RNA-seq datasets on a compute cluster. We anticipate that this tutorial will help the broad user base of rMATS-turbo make the best use of this software for studying alternative splicing in diverse biological systems.