• mabc307

Medulloblastoma subtype single sample predictor built on multiple gene expression platforms

Updated: Sep 29

Steven M. Foltz, PhD (1,2), Casey S. Greene, PhD (1,3), Jaclyn N. Taroni, PhD (2)

(1) Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
(2) Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA
(3) Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA

Medulloblastoma (MB) is the most common form of pediatric brain cancer, with over 1000 new cases affecting children and families worldwide each year. The four main subtypes of MB (WNT, SHH, Group 3, and Group 4) show different prognoses, especially when stratified by age and sex, and may inform treatment options based on the risk of recurrence. While MB subtype prediction with gene expression data has been well characterized in large cohort studies, few single sample predictors have been developed. Single sample predictors make subtype predictions on an individual sample basis and do not require normalization with the model's training data. This approach facilitates analyzing new patient data that differ in platform (e.g., microarray and RNA-seq), sample handling process, and tumor purity level. Existing MB single sample predictors use activated gene pathways or gene expression ratios to classify tumors and were built using transcriptome-wide gene sets. To complement existing methods, our true single sample predictor approach incorporates a diverse set of publicly available data from a wide range of studies, including samples generated from microarray and RNA-seq platforms. The models are built using a k top-scoring pairs (kTSP) approach that identifies gene pairs whose relative expression levels (e.g., Gene A < Gene B) make informative rules for subtype classification. Because each rule compares two genes within the same sample, predictions do not depend on the absolute scale of the expression values. Gene sets may also be restricted to targeted panels, expanding potential clinical relevance. In our collection of 1297 samples (1188 microarray, 109 RNA-seq), subtype-level test performance showed median balanced accuracies of 0.96 (Group 3), 0.97 (Group 4), 0.99 (SHH), and 0.97 (WNT). Our software strives to be user-friendly and general enough to work with other cancer types, prediction problems, and data types.
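To make the kTSP idea concrete, here is a minimal sketch of a binary (one subtype versus the rest) top-scoring pairs classifier. It is an illustration under stated assumptions, not the authors' software: the function names `fit_ktsp` and `predict_ktsp`, the scoring rule (absolute difference in how often Gene A < Gene B holds in each class), the restriction to disjoint gene pairs, and the majority vote are all choices made for this example. Extending it to four subtypes would need an additional step, such as combining one-vs-rest models, which is not shown.

```python
from itertools import combinations

import numpy as np


def fit_ktsp(X, y, k=3):
    """Pick up to k disjoint gene pairs whose ordering best separates the classes.

    X: samples x genes expression array (hypothetical toy input).
    y: boolean array, True for the subtype of interest.
    Returns a list of (gene_a, gene_b, less_means_pos) rules.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    scored = []
    for a, b in combinations(range(X.shape[1]), 2):
        less = X[:, a] < X[:, b]
        p_pos = less[y].mean()    # how often A < B in the subtype
        p_neg = less[~y].mean()   # how often A < B in the other samples
        scored.append((abs(p_pos - p_neg), a, b, bool(p_pos > p_neg)))
    # Highest-scoring pairs first; ties keep their enumeration order.
    scored.sort(key=lambda t: t[0], reverse=True)

    rules, used = [], set()
    for _, a, b, less_means_pos in scored:
        if a in used or b in used:
            continue  # assumption of this sketch: each gene appears in one rule
        rules.append((a, b, less_means_pos))
        used.update((a, b))
        if len(rules) == k:
            break
    return rules


def predict_ktsp(rules, sample):
    """Majority vote over the rules, using only within-sample comparisons."""
    votes = sum(
        bool(sample[a] < sample[b]) == less_means_pos
        for a, b, less_means_pos in rules
    )
    return bool(votes * 2 > len(rules))


# Toy data: in the subtype, gene 0 is expressed below gene 1; elsewhere it is above.
X_train = [
    [1, 5, 3, 3.5],
    [2, 6, 4, 3.0],
    [1, 4, 2, 2.5],
    [5, 1, 3, 3.5],
    [6, 2, 4, 3.0],
    [4, 1, 2, 2.5],
]
y_train = [True, True, True, False, False, False]

rules = fit_ktsp(X_train, y_train, k=1)

# A new sample on a very different scale needs no normalization,
# because only the ordering of genes within the sample is used.
new_sample = [100, 500, 300, 350]
is_subtype = predict_ktsp(rules, new_sample)
```

The design point this illustrates is the one the abstract relies on: a rule of the form Gene A < Gene B is unchanged by any monotonic rescaling of a sample, which is why a model trained on one platform can be applied to a single sample from another.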

