Seamless integrative pipeline for QTL datasets enhances the discovery of putative causal variants for
Updated: Sep 29, 2022
Jeffrey Cifello(1), Pavel P Kuksa(1), Li-San Wang(1), Yuk Yee Leung(1)
(1) Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania
Colocalization of significant loci between genome-wide association studies (GWAS) and quantitative trait loci (QTLs) has demonstrated tissue-specific associations. Such signals have been detected in the context of Alzheimer's Disease (AD) and are useful for elucidating disease mechanisms. Recently, new QTL datasets generated by different groups have focused on different brain regions, but transforming these heterogeneous datasets into analysis-ready files is not straightforward because of inconsistencies in file formats, variant nomenclature, allele representation, and reported statistics across datasets. To facilitate the use of various types of QTL datasets in assessing the regulatory potential of non-coding variants, we developed a scalable, automated pipeline (https://bitbucket.org/wanglab-upenn/filer-xqtl-pipeline) for expedited harmonization, quality control, format standardization (BED-like), genomic indexing, and ingestion into the FILER repository (Kuksa et al, 2022), which contains >60,000 harmonized functional genomic and annotation datasets available for query. Using this pipeline, we have integrated over 25 billion QTL records representing 84 different tissues and cell types. To illustrate its use, we applied our scalable colocalization pipeline (based on coloc (Giambartolomei et al, 2014)) to 11,449,726 variants (241 linkage-disequilibrium blocks) from AD GWAS (Kunkle et al, 2019) and AMP-AD eQTL (Sieberts et al, 2020) datasets. This analysis of three brain regions (cerebellum (CER), dorsolateral prefrontal cortex (DLPFC), and temporal cortex (TCX)) yielded 173 significantly colocalized variants associated with 24 target genes and 116 GWAS loci. After excluding the HLA and APOE regions, we queried the four remaining loci against FILER. All four were previously reported as significant GTEx splicing QTLs, and three of them were positively correlated with PTK2B in whole blood.
These variants may provide us with functional insights into PTK2B's role in AD. This also demonstrates the potential of the QTL harmonization pipeline to uncover functional results by efficiently integrating novel datasets.
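
As an illustration of the harmonization step described above, the following minimal Python sketch shows how a single QTL record might be converted to a BED-like row. It is not taken from the filer-xqtl-pipeline code: the function names, the record fields, and the column layout are all assumptions made for this example. It normalizes the chromosome name, converts a 1-based position to 0-based half-open BED coordinates, and orients the effect size to the alternate allele, flipping its sign when the reported effect allele is the reference allele. Strand flips and indel normalization are deliberately left out.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    """Hypothetical BED-like QTL row (column layout is an assumption)."""
    chrom: str     # "chr"-prefixed chromosome name
    start: int     # 0-based start (BED convention)
    end: int       # exclusive end
    name: str      # variant ID built as chrom:pos:ref:alt
    ref: str
    alt: str
    beta: float    # effect size, oriented to the alt allele
    pvalue: float


def normalize_chrom(chrom: str) -> str:
    """Map '1', 'chr1', 'x', 'MT' and similar to one naming style."""
    c = chrom.strip()
    if c.lower().startswith("chr"):
        c = c[3:]
    c = c.upper()
    if c in ("M", "MT"):
        return "chrM"
    return "chr" + c


def harmonize_record(chrom, pos_1based, effect_allele, other_allele,
                     beta, pvalue, ref_allele):
    """Return a BedRecord with beta expressed relative to the alt allele."""
    effect = effect_allele.upper()
    other = other_allele.upper()
    ref = ref_allele.upper()

    if other == ref:
        # Effect allele is already the alt allele: keep the sign.
        alt = effect
    elif effect == ref:
        # Effect allele is the reference allele: swap and flip the sign.
        alt, beta = other, -beta
    else:
        # Neither allele matches the reference (e.g. strand flip).
        raise ValueError("alleles do not match the reference allele")

    chrom_norm = normalize_chrom(chrom)
    start = pos_1based - 1          # 1-based position to 0-based start
    end = start + len(ref)          # half-open interval spanning ref
    name = f"{chrom_norm}:{pos_1based}:{ref}:{alt}"
    return BedRecord(chrom_norm, start, end, name, ref, alt, beta, pvalue)


def to_bed_line(rec: BedRecord) -> str:
    """Serialize a record as one tab-separated line."""
    return "\t".join(str(field) for field in rec)
```

For example, `harmonize_record("1", 1000, "A", "G", 0.5, 1e-8, ref_allele="A")` would produce a record on `chr1` spanning 999 to 1000 with alternate allele `G` and an effect size of -0.5, because the reported effect allele was the reference allele. Records in this shape can then be sorted by chromosome and start position before genomic indexing.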