ChromFormer: A transformer-based model for 3D genome structure prediction
Abstract
Recent research has shown that the three-dimensional (3D) genome structure is strongly linked to cell function. Modeling the 3D genome structure can not only elucidate vital biological processes but also reveal structural disruptions linked to disease. In the absence of experimental techniques that can determine 3D chromatin structure, the task is approached computationally by exploiting chromatin interaction frequencies, as measured by high-throughput chromosome conformation capture (Hi-C) data. However, existing methods are unsupervised and limited by their underlying assumptions. In this work, we present a novel framework for 3D chromatin structure prediction from Hi-C data. The framework consists of two components: a novel synthetic data generation module that simulates realistic structures and their corresponding Hi-C matrices, and ChromFormer, a transformer-based model that predicts 3D chromatin structures from standalone Hi-C data while also providing local, structure-level confidence estimates. Our solution outperforms existing methods when tested on unseen synthetic data and achieves comparable results on experimental data for a full eukaryotic genome. The code, data, and models can be accessed at https://github.com/AI4SCR/ChromFormer.
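The abstract does not say how the synthetic structures are simulated or how their Hi-C matrices are derived. To make the idea of pairing a 3D structure with a Hi-C matrix concrete, the following is a minimal toy sketch under loudly stated assumptions: the chain is modeled as a random walk with a fixed bond length, and interaction frequency is assumed to decay as an inverse power of 3D distance. The function names `random_walk_structure` and `contact_matrix` and the `exponent` parameter are hypothetical illustrations, not ChromFormer's actual generation module.

```python
import math
import random


def random_walk_structure(n_beads, step=1.0, seed=0):
    """Toy 3D chain: a random walk with fixed bond length (hypothetical model)."""
    rng = random.Random(seed)
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Uniform random direction on the unit sphere (Archimedes' method).
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + step * r * math.cos(phi),
                       y0 + step * r * math.sin(phi),
                       z0 + step * z))
    return coords


def contact_matrix(coords, exponent=3.0, eps=1e-6):
    """Hi-C-like matrix: frequency ~ distance**(-exponent), scaled so the max is 1.

    The inverse power-law mapping and its exponent are assumptions for illustration.
    """
    n = len(coords)
    freq = [[0.0] * n for _ in range(n)]
    if n < 2:
        return freq  # no off-diagonal pairs to score
    for i in range(n):
        for j in range(i + 1, n):
            d = max(math.dist(coords[i], coords[j]), eps)  # avoid division by zero
            freq[i][j] = freq[j][i] = d ** (-exponent)
    peak = max(max(row) for row in freq)
    # Diagonal stays 0; off-diagonal entries are normalized into (0, 1].
    return [[v / peak for v in row] for row in freq]


if __name__ == "__main__":
    structure = random_walk_structure(50, seed=42)
    hic = contact_matrix(structure)
    print(len(hic), len(hic[0]))
```

In a supervised setup of the kind the abstract describes, many such (matrix, structure) pairs would serve as training inputs and targets. A realistic simulator would replace the plain random walk with a more biologically informed polymer model and add experimental noise to the matrices.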