• mabc307

JOnTAD: joint hierarchical TAD caller for high resolution, single cell and bulk cell Hi-C data

Updated: Sep 29

Qiuhai Zeng, BS, Pennsylvania State University; Guanjue Xiang, PhD, CAMP4 Therapeutics; Yu Zhang, PhD, Two Sigma; Qunhua Li, PhD, Pennsylvania State University

The three-dimensional (3D) organization of the genome is critical to gene function and genome replication. The topologically associating domain (TAD), a basic unit of genome organization observed in Hi-C data, has been shown to be important for gene function and to encompass cell-type-specific substructures. However, it remains unclear how TAD-dominated organization contributes to gene regulation. Studying hierarchical TAD structures across different conditions, at the single-cell level, or at nucleosome resolution would greatly improve our understanding of the role of TAD organization in gene regulation. Many TAD-calling algorithms are available, but most were designed to call TADs from a single bulk-cell Hi-C dataset. Hence they may not be effective at capturing the specific characteristics of single-cell or high-resolution data, or at identifying biologically meaningful differences in TAD structures across multiple samples. In this work, we develop a unified framework, the Joint Optimized nested TAD (JOnTAD) model, a hierarchical TAD caller that can call TADs from single-cell Hi-C data and high-resolution chromatin conformation data and can perform multiple-sample comparisons. JOnTAD explicitly accounts for the special TAD-related structures in high-resolution data, provides data-driven FDR control, and integrates information across samples when multiple samples are available. Through a series of systematic evaluations on multiple contact maps from different phases of the cell cycle, single-cell interaction data (DIP-C), and high-resolution data (Micro-C), we show that JOnTAD is particularly effective at accentuating meaningful biological differences across samples and reducing spurious TAD boundaries. Compared with existing methods, it produces robust TAD hierarchies that better explain the variation in interaction frequency in the contact maps, are more highly enriched for epigenomic signals and boundary proteins, and agree with known biological relationships (e.g., lineage and cell phase) across samples. It is also computationally efficient, especially on high-resolution data.
