Poster #94 - Shasmeen Azhar

vitod24
Oct 20, 2025

Pangenome-Guided Recovery of Human DNA Missed by the Linear Reference in Cancer PDX Genomes

Brydon P. G. Wall, MS Student - Virginia Commonwealth University
My Nguyen, PhD Candidate - Virginia Commonwealth University
Stella Castro, BS/MS Student - Virginia Commonwealth University
Shasmeen Azhar, BS/MS Student - Virginia Commonwealth University
Nina Dashti-Gibson, MD/PhD Student - Virginia Commonwealth University
J. Chuck Harrell, PhD - Virginia Commonwealth University
Mikhail G. Dozmorov, PhD - Virginia Commonwealth University
Charles E. Middleton, MD - University of Florida
Grace R. Thompson, MD - University of Florida
Jose G. Trevino, MD - Virginia Commonwealth University
Katarzyna M. Tyc, PhD - Virginia Commonwealth University

Patient-derived xenograft (PDX) whole-genome sequencing (WGS) analysis begins with separating human and mouse reads before downstream variant calling, copy number analysis, and other molecular characterizations. Existing read classification tools typically rely on the GRCh38/hg38 human reference genome, which, while widely adopted, is incomplete and biased toward a narrow range of ancestries. As a result, large genomic segments, particularly those common in underrepresented populations, are absent from the reference, leading to systematic misclassification or loss of biologically relevant reads. In cancer PDX studies, this loss may obscure important tumor-specific or patient-specific genomic signals.

In this work, we assess whether comprehensive, multi-ancestry reference genomes from the Human Pangenome Reference Consortium (HPRC) can improve human read assignment in PDX WGS data and reveal previously inaccessible genomic content. We analyzed 63 PDX models representing a diverse set of cancer types, sequenced at ~30X coverage. After initial mouse-human read deconvolution with Xengsort, we extracted reads classified as "ambiguous" or "neither" and reanalyzed them against a panel of non-canonical human references from the HPRC, including assemblies with African, East Asian, and other global ancestries. With this expanded reference set, preliminary results indicate that up to 70% of previously unassigned reads can be confidently reassigned to human origin.

We further applied graph-based genome representations and mapping strategies to characterize reads that align poorly to GRCh38, comparing alignment quality between graph and linear references. Genes and genomic regions most affected by improved mapping were flagged for further interrogation. Reads that did not map to GRCh38 but did map with the graph approach were examined for genomic context, revealing an enrichment of repetitive elements and non-coding gene regions that could hold regulatory or functional significance in cancer biology.

This preliminary work establishes the computational framework needed to systematically mine the "dark" regions of the human genome in PDX studies. By integrating multi-ancestry graph references into standard PDX analysis pipelines, we can recover substantial amounts of genomic information that would otherwise be discarded, thereby reducing bias and increasing completeness in cancer genomic analyses. The approach lays the groundwork for future studies linking these recovered sequences to cancer-relevant biology, patient ancestry, and therapeutic response.
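
The reassignment step described in the abstract (take reads left as "ambiguous" or "neither" after the first-pass deconvolution, then count how many align confidently to at least one pangenome assembly) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the input dictionaries, the "host"/"graft" labels, and the mapping-quality cutoff of 30 are all assumptions made for illustration, and the toy data is invented.

```python
# Hypothetical first-pass labels that count as "unassigned" (assumption for this sketch).
UNASSIGNED_LABELS = {"ambiguous", "neither"}


def select_unassigned(classifications):
    """Return the IDs of reads whose first-pass label is ambiguous or neither."""
    return {
        read_id
        for read_id, label in classifications.items()
        if label in UNASSIGNED_LABELS
    }


def reassigned_fraction(classifications, best_mapq, min_mapq=30):
    """Fraction of unassigned reads rescued by the expanded reference panel.

    best_mapq maps a read ID to the highest mapping quality it reached
    against any assembly in the panel; reads absent from it did not align.
    min_mapq is an illustrative confidence cutoff, not a value from the study.
    """
    unassigned = select_unassigned(classifications)
    if not unassigned:
        return 0.0
    rescued = {r for r in unassigned if best_mapq.get(r, 0) >= min_mapq}
    return len(rescued) / len(unassigned)


# Toy data, invented for illustration only.
classifications = {
    "r1": "graft",
    "r2": "ambiguous",
    "r3": "neither",
    "r4": "host",
    "r5": "neither",
    "r6": "ambiguous",
}
best_mapq = {"r2": 60, "r3": 12, "r5": 42}

fraction = reassigned_fraction(classifications, best_mapq)
print(f"Reassigned fraction: {fraction:.2f}")
```

In this toy case four reads are unassigned and two of them clear the cutoff. Taking the best mapping quality across assemblies treats a read as recovered if any single ancestry-specific reference explains it, which is one reasonable reading of "confidently reassigned"; a real analysis would also need to rule out mouse origin.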

MidAtlantic Bioinformatics Conference

Friday November 7, 2025
