Bayesian Mixture Model for the Identification of Loci of Interest from GWAS Summary Statistics
Rachit Kumar, BS*; Rasika Venkatesh, BS*; Marylyn Ritchie, PhD^ *Genomics and Computational Biology, University of Pennsylvania ^Department of Genetics, University of Pennsylvania
Poster # 38
Genome-wide association studies (GWAS) are a widely used method for testing the association of genetic variants with disease or other phenotypes. However, because of linkage disequilibrium, a variant that is linked to a causal variant may be mistakenly identified as causing disease when in reality it merely tends to be inherited alongside the true causal variant in that region of the genome. Researchers have therefore developed downstream analysis methods that use GWAS results to identify specific causal variants or the functional underpinnings of their effect on disease. Many of these analyses, however, require users to define the bounds of the genomic regions they would like to assess, and the choice of bounds can substantially affect the results. There is therefore a need for a method that identifies these boundaries in a rigorous and reproducible way. We present a method that uses Bayesian mixture models to perform statistical inference on GWAS summary statistics, identifying whether individual genomic positions represent "breakpoints" between regions containing non-significant variants and regions containing significant variants. This allows one to identify probabilistically which regions are likely to contain variants in linkage and may therefore be valuable for downstream studies. We will show the results of this analysis on several publicly available GWAS summary statistics.
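To make the breakpoint idea concrete, the following is a minimal illustrative sketch and not the method presented on the poster. It assumes a two-component mixture on per-variant z-scores (a standard normal "null" component and a wider zero-mean normal "signal" component), with the mixing weight and signal standard deviation fixed by hand rather than inferred, and it treats variants independently. The function names, parameter values, positions, and z-scores are all hypothetical.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_signal(z_scores, prior_signal=0.1, signal_sd=5.0):
    """Posterior probability (Bayes' rule) that each z-score belongs to the
    'signal' component. prior_signal and signal_sd are assumed values; a full
    Bayesian treatment would place priors on them and infer them from data."""
    posteriors = []
    for z in z_scores:
        null_term = (1.0 - prior_signal) * normal_pdf(z, 1.0)
        signal_term = prior_signal * normal_pdf(z, signal_sd)
        posteriors.append(signal_term / (null_term + signal_term))
    return posteriors


def find_breakpoints(positions, posteriors, threshold=0.5):
    """Report positions where the most probable component changes between
    adjacent variants: 'start' opens a signal region, 'end' closes one."""
    labels = [p > threshold for p in posteriors]
    breaks = []
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            breaks.append((positions[i], "start" if labels[i] else "end"))
    return breaks


# Hypothetical toy data: base-pair positions and GWAS z-scores (beta / SE).
positions = [100, 200, 300, 400, 500, 600, 700, 800]
z_scores = [0.3, -0.8, 1.1, 6.2, 7.5, 5.9, 0.4, -0.2]

posteriors = posterior_signal(z_scores)
breakpoints = find_breakpoints(positions, posteriors)
print(breakpoints)
```

On this toy input the sketch reports a region opening at position 400 and closing at position 700. Thresholding independent per-variant posteriors is only a simplification chosen for brevity; it does not model the dependence between neighboring variants that linkage disequilibrium induces.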