
Poster #56 - Ali Oku

  • vitod24
  • Oct 20

Enhancing Bulk RNA-Seq Deconvolution Using Atlas-Level Deep Learning Embeddings


Ali Oku, Rui Fu, Heather Geiger, Will Liao, Nicolas Robine
New York Genome Center


Cell type deconvolution estimates the relative abundances of cell types in bulk RNA-seq data, allowing researchers to infer tissue composition when single-cell RNA sequencing (scRNA-seq) is not feasible. However, conventional tools are often sensitive to batch effects and reference biases. To overcome these limitations, we evaluated deep generative and transformer-based models for more accurate estimation of cell type proportions. We focused on scVI, a probabilistic variational autoencoder (VAE) that learns low-dimensional latent representations while correcting for batch effects, and on two large-scale transformer models, Geneformer and scGPT, which were trained on millions of single-cell profiles. Although transformer models are not specifically designed for batch correction, their embeddings can reduce batch effects by capturing biologically meaningful patterns, and the models can be fine-tuned for diverse tasks.
To test whether such latent embeddings improve bulk deconvolution, we used single-cell data from the Human Endometrial Cell Atlas (HECA), spanning seven datasets and 87 donors. Pseudobulk profiles were generated by aggregating single-cell counts, simulating bulk data while preserving the true cell type proportions. We extracted latent embeddings from scVI (both a model pre-trained on ~75 million human cells and a model trained de novo) and from Geneformer (the GF-12L-95M-i4096 model, with and without fine-tuning). These embeddings were applied in two strategies. First, we used them directly with non-negative least squares (NNLS) to estimate cell type proportions. Second, we trained random forest regressors on pseudobulk embeddings to predict cell type composition. We compared estimated and true proportions using mean squared error (MSE) and benchmarked against conventional deconvolution methods.
Latent embeddings consistently achieved competitive or superior accuracy relative to these methods, showing robustness to batch effects and technical noise.
These results demonstrate that latent embeddings from deep learning models are promising for cell type deconvolution. Notably, embeddings from pre-trained models performed well without additional fine-tuning, highlighting their potential for robust and efficient bulk RNA-seq deconvolution.
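
The abstract does not give implementation details, so the following Python sketch only illustrates the two strategies under stated assumptions. Synthetic Poisson counts stand in for the HECA data, and a hypothetical `embed` function (library-size normalization followed by a fixed random linear projection) stands in for the scVI and Geneformer encoders, which are nonlinear. The NNLS signature matrix is assumed to be the mean reference-cell embedding of each cell type. The sketch uses SciPy's `nnls`, scikit-learn's `RandomForestRegressor`, and mean squared error to compare estimated with true proportions.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N_TYPES, N_GENES, LATENT_DIM = 4, 200, 16

# Synthetic per-cell-type mean expression (NOT HECA data; purely illustrative).
type_means = rng.gamma(shape=2.0, scale=1.0, size=(N_TYPES, N_GENES))

# Hypothetical stand-in encoder: library-size normalization followed by a fixed
# random linear projection. Real scVI/Geneformer encoders are nonlinear.
projection = rng.normal(size=(N_GENES, LATENT_DIM))


def embed(counts):
    """Map count rows (n, genes) to latent rows (n, LATENT_DIM)."""
    normed = counts / counts.sum(axis=1, keepdims=True)
    return normed @ projection


def simulate_cells(n_per_type):
    """Draw Poisson counts for each cell type; return counts and integer labels."""
    counts = [rng.poisson(type_means[t], size=(n, N_GENES)) for t, n in enumerate(n_per_type)]
    labels = [np.full(n, t) for t, n in enumerate(n_per_type)]
    return np.vstack(counts), np.concatenate(labels)


def make_pseudobulk(n_cells=300):
    """Sum counts over a random cell mixture; return (profile, true proportions)."""
    n_per_type = rng.multinomial(n_cells, rng.dirichlet(np.ones(N_TYPES)))
    counts, _ = simulate_cells(n_per_type)
    return counts.sum(axis=0), n_per_type / n_cells


def make_set(n):
    """Build n pseudobulks; return their embeddings and true proportions."""
    pairs = [make_pseudobulk() for _ in range(n)]
    X = embed(np.stack([bulk for bulk, _ in pairs]))
    Y = np.stack([props for _, props in pairs])
    return X, Y


# Strategy 1: NNLS against a latent signature matrix, assumed here to be the
# mean embedding of single-cell reference cells of each type.
ref_counts, ref_labels = simulate_cells([300] * N_TYPES)
ref_latent = embed(ref_counts)
signature = np.stack([ref_latent[ref_labels == t].mean(axis=0) for t in range(N_TYPES)])


def nnls_deconvolve(bulk_latent):
    """Solve min ||signature.T @ w - bulk_latent||_2 s.t. w >= 0, then rescale to sum to 1."""
    w, _ = nnls(signature.T, bulk_latent)
    return w / w.sum() if w.sum() > 0 else w


# Strategy 2: random forest mapping pseudobulk embeddings to proportions.
X_train, Y_train = make_set(300)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, Y_train)

X_test, Y_test = make_set(50)
nnls_pred = np.stack([nnls_deconvolve(x) for x in X_test])
# Leaf values average training proportion vectors, so each prediction already sums to 1.
rf_pred = rf.predict(X_test)


def mse(pred, true):
    """Mean squared error between estimated and true proportions."""
    return float(np.mean((pred - true) ** 2))


print(f"NNLS MSE: {mse(nnls_pred, Y_test):.5f}")
print(f"RF   MSE: {mse(rf_pred, Y_test):.5f}")
```

Because the stand-in encoder is linear in the normalized profile, a pseudobulk embedding is close to a mixture of the per-type signatures (weighted by each type's share of total counts rather than its cell fraction), which is the assumption that NNLS in latent space relies on. With real nonlinear embeddings this holds only approximately, whereas the random forest learns the mapping from embedding to proportions directly from labeled pseudobulks.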