Poster #49 - Busra Coskun
- vitod24
- Oct 20
Benchmarking Statistical Dimension Reduction Frameworks for Integrative Multi-Omics Analysis with Missing Modalities
Busra Coskun, University of Pennsylvania; Qi Long, PhD, University of Pennsylvania; Konstantinos Tsingas, MS, University of Pennsylvania
Multi-omics studies, which integrate data from distinct biological scales such as transcriptomics, proteomics, and metabolomics, have the potential to reveal the mechanisms that underlie disease. However, these studies often contain missing information: one or more molecular layers may not be collected for a given patient or sample, which limits integration and downstream inference. Standard complete-case analyses reduce sample size and statistical power, while naive imputation can introduce spurious associations that do not reflect true biology. We compare a broad class of statistical dimension reduction frameworks that estimate shared low-dimensional representations of patients across heterogeneous data types, including generalized factor analysis, joint matrix factorization, and canonical correlation analysis models. We evaluate three strategies for handling incomplete data: (1) complete-case analysis, which discards patients with any missing values; (2) ad hoc imputation before applying standard integrative models; and (3) methods designed to tolerate missingness, which implicitly leverage all available data. Using incomplete multi-omics data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we systematically benchmark these approaches. We first use a bootstrap analysis to assess whether the latent factors learned under each strategy align with clinical diagnosis and disease staging. To evaluate the robustness and interpretability of the latent structure, we then test whether inferred latent clusters are associated with missingness indicators, which reveals whether a model captures genuine biological or clinical patterns or is instead driven by the pattern of missing data. Overall, this benchmarking study clarifies the trade-offs of applying integrative methods to multi-omics studies with missingness, and demonstrates how the choice of strategy for handling incomplete data can influence patient stratification and disease prediction.
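One way to picture the bootstrap alignment check is a minimal sketch, assuming a single latent factor score per patient and a binary diagnosis label (for example, Alzheimer's disease coded 1 versus cognitively normal coded 0). It uses the Mann-Whitney AUC as the alignment metric and a percentile bootstrap over patients; the function names `auc` and `bootstrap_auc_ci`, the choice of AUC, and the default parameters are illustrative assumptions, not the authors' documented procedure.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2.

    Labels are assumed to be 0 (control) or 1 (case).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case (1) and one control (0)")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUC plus a percentile bootstrap CI, resampling patients with replacement."""
    point = auc(scores, labels)  # raises early if one class is absent
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that happen to contain only one diagnostic group.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi


if __name__ == "__main__":
    # Hypothetical factor scores for six patients and their 0/1 diagnoses.
    factor = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
    diagnosis = [0, 0, 1, 1, 1, 0]
    print(bootstrap_auc_ci(factor, diagnosis, n_boot=500))
```

Because the sign of a latent factor is arbitrary in most factor models, an AUC near 0 is as informative as one near 1; aligning the factor sign, or reporting max(AUC, 1 - AUC), avoids misreading that case. Ordinal staging would call for a different statistic, such as a rank correlation, inside the same resampling loop.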
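The check for whether latent clusters track missingness can be illustrated as a test of independence between cluster labels and a missingness indicator. The sketch below is a generic permutation test using the Pearson chi-square statistic, assuming `clusters` holds hypothetical labels obtained by clustering the latent factors (for example, with k-means) and `missing` is a hypothetical 0/1 flag marking patients whose proteomics layer, say, was not collected. It is an illustration of the idea, not necessarily the test the authors used.

```python
import random
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi_square_stat(clusters: Sequence[Hashable], missing: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic for the clusters x missingness contingency table."""
    n = len(clusters)
    joint = Counter(zip(clusters, missing))
    row_totals = Counter(clusters)
    col_totals = Counter(missing)
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n  # > 0 because both margins are nonzero
            stat += (joint[(r, c)] - expected) ** 2 / expected
    return stat


def missingness_permutation_test(
    clusters: Sequence[Hashable],
    missing: Sequence[Hashable],
    n_perm: int = 5000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Permutation p-value for H0: cluster membership is independent of missingness."""
    if len(clusters) != len(missing):
        raise ValueError("clusters and missing must have the same length")
    rng = random.Random(seed)
    observed = chi_square_stat(clusters, missing)
    shuffled = list(missing)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between clusters and missingness
        if chi_square_stat(clusters, shuffled) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical example: cluster 1 is enriched for patients missing a modality.
    clusters = [0] * 12 + [1] * 12
    missing = [0] * 10 + [1] * 2 + [1] * 9 + [0] * 3
    print(missingness_permutation_test(clusters, missing, n_perm=2000))
```

A small p-value suggests that cluster membership is partly explained by which modalities were observed rather than by biology alone. With several modalities, the same test can be repeated for each missingness indicator, with an appropriate multiple-testing correction.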

