
Poster #17 - Laura Schultz

  • vitod24
  • Oct 20

A robust machine learning approach for genotyping the 17q21.31 inversion polymorphism across diverse ancestral populations.


Schultz, L.M. (PhD), Quinto-Cortés, C.D. (PhD), Montserrat, D.M. (PhD), Ioannidis, A. (PhD), Lanzagorta, N. (MS), Bustamante, C.D. (PhD), Jacquémont, S. (MD), Nicolini, H. (MD, PhD), Glahn, D.C. (PhD), and Almasy, L. (PhD)


Inversions are relatively understudied structural variants that are increasingly recognized for their contributions to human phenotypes. A 17q21.31 inversion (inv17_007) is linked to brain morphology and neuropsychiatric disorders. Previous inv17_007 association studies have focused on EUR populations because existing SNP-based methods for inferring inversion status at scale (e.g., scoreInvHap) yield inaccurate calls for other populations. Hence, we devised a method that can be used to robustly assign inversion genotypes across all ancestral populations. First, we identified a stable set of 240 biallelic SNPs within the inv17_007 region of samples genotyped for HapMap3, 1000G, HGDP, UKBB, and EPIMex and harmonized all SNPs to build hg19. Then, we split the samples into 5 continental ancestry groups and ran PCA for each group. We trained and cross-validated ancestry-specific 2-PC linear support vector machine (SVM) models using 592 EUR, 486 AFR, 197 SAS, 521 EAS, and 202 AMR reference samples with known inv17_007 genotypes. All 5 single-ancestry models classified H1/H1 (homozygous non-inverted), H1/H2 (heterozygous), and H2/H2 (homozygous inverted) genotypes with 100% cross-validated accuracy. We used these ancestry-specific models to infer inv17_007 genotypes for the remaining reference samples and the UKBB and EPIMex samples. Analysis of 650 trios from 1000G and EPIMex yielded no Mendelian errors, and the inv17_007 genotypes we inferred for the EUR-ancestry UKBB individuals agreed with the results we obtained using scoreInvHap. Given that the UKBB and EPIMex inv17_007 genotypes inferred by a subsequent multi-ancestry SVM model agreed perfectly with those inferred by the ancestry-specific models, our curated set of 240 SNPs and inferred inv17_007 genotypes for 3431 publicly available reference samples enables fast, accurate inv17_007 genotyping of biobank-scale cohorts via PCA and SVM with no need to classify individuals into discrete ancestry groups.
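The trio-based validation mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes only the three genotype labels used in the abstract (H1/H1, H1/H2, H2/H2), standard autosomal inheritance with no de novo events, and hypothetical function names and example trios invented for illustration.

```python
from itertools import product

# Genotype labels from the abstract, mapped to their two haplotype alleles.
GENOTYPE_ALLELES = {
    "H1/H1": ("H1", "H1"),
    "H1/H2": ("H1", "H2"),
    "H2/H2": ("H2", "H2"),
}


def possible_child_genotypes(father, mother):
    """Return the child genotypes compatible with Mendelian inheritance.

    The child receives one allele from each parent, so every pairing of a
    paternal allele with a maternal allele is enumerated.
    """
    children = set()
    for paternal, maternal in product(GENOTYPE_ALLELES[father],
                                      GENOTYPE_ALLELES[mother]):
        # Sort so that H2 + H1 and H1 + H2 both yield the label "H1/H2".
        children.add("/".join(sorted((paternal, maternal))))
    return children


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype could arise from the two parents."""
    return child in possible_child_genotypes(father, mother)


def count_mendelian_errors(trios):
    """Count (father, mother, child) trios that violate Mendelian inheritance."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)


# Hypothetical inferred genotypes, made up for illustration only.
example_trios = [
    ("H1/H1", "H1/H2", "H1/H2"),  # consistent
    ("H1/H2", "H1/H2", "H2/H2"),  # consistent
    ("H1/H1", "H1/H1", "H1/H2"),  # inconsistent: no parent carries H2
]

print(count_mendelian_errors(example_trios))  # prints 1
```

A count of zero across all trios, as reported for the 650 trios from 1000G and EPIMex, is consistent with accurate genotype calls, although it cannot by itself rule out errors that happen to remain Mendelian-compatible.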

 
 
 
