Linkreg: a Bayesian framework for linking candidate cis-regulatory elements to target genes
Updated: Sep 29
Qiuhai Zeng, BS, Pennsylvania State University; Guanjue Xiang, PhD, CAMP4 Therapeutics; Ross Hardison, PhD, Pennsylvania State University; Xiang Zhu, PhD, Pennsylvania State University; Qunhua Li, PhD, Pennsylvania State University
Cis-regulatory elements (CREs) are non-coding DNA segments that regulate transcription. Hundreds of thousands of candidate CREs (cCREs) have been identified in the human and mouse genomes, many of which have critical regulatory functions, but determining their target genes remains challenging. Experimental efforts that examine one locus at a time have established likely causal cCREs for several genes, but they cannot deliver genome-scale analyses. Genome-wide sequencing of epigenomes and transcriptomes can be used to infer cCRE-gene links, but existing methods remain correlative and focus on a limited set of histone marks (e.g., H3K27ac, H3K4me3), ignoring the profusion of other functional marks in the genome and their combinatorial effects on gene regulation. Here we develop Linkreg, a Bayesian model that infers cCRE-gene links by relating the expression level of a gene to the epigenomic annotations of nearby cCREs, which are derived from combinations of epigenomic features in multiple cell types. For each gene, Linkreg takes as input the expression level, the list of cCREs in all cell types, and their epigenomic annotations (e.g., from ChromHMM or IDEAS), and then outputs a posterior inclusion probability for each cCRE and a credible set, which together quantify how likely each cCRE is to regulate the gene. In simulations based on real epigenomes, Linkreg achieves significantly higher power than existing methods while rigorously controlling false discoveries. On the mouse VISION (Xiang et al., 2020) and human EpiMap (Boix et al., 2021) datasets, Linkreg identifies many cCRE-gene links with high confidence, which are further validated by external chromatin conformation and CRISPR screening experiments. In summary, Linkreg connects cCREs to their putative target genes through effective modeling of diverse epigenomic annotations across cell types, leading to a more accurate and interpretable characterization of gene regulatory networks.
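
To illustrate how a credible set can be read off from per-cCRE posterior probabilities, the following is a minimal sketch and not Linkreg's actual implementation. It assumes a simplified setting in which exactly one cCRE regulates the gene, so the probabilities over cCREs sum to one; the function name `credible_set` and the example values are hypothetical.

```python
def credible_set(probs, coverage=0.95):
    """Return indices of the smallest set of cCREs whose total
    posterior probability reaches `coverage`.

    Assumption (not from the abstract): `probs` are posterior
    probabilities under a single-causal-cCRE model, so they are
    rescaled here to sum to one.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    normalized = [p / total for p in probs]

    # Visit cCREs from most to least probable.
    order = sorted(range(len(normalized)),
                   key=lambda i: normalized[i], reverse=True)

    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += normalized[i]
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for five cCREs near one gene.
example_probs = [0.01, 0.60, 0.04, 0.33, 0.02]
example_set = credible_set(example_probs, coverage=0.95)
```

A small credible set indicates that the data single out a few cCREs for the gene, whereas a large one indicates that many cCREs remain plausible candidates.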