Predicting 3D genome organization from the nucleotide sequence with DNA-DDA

Xenia Lainscsek*, Leila Taher

*Corresponding author for this work

Research output: Contribution to conference; Paper; peer-review

Abstract

The intricate regulation of multiple levels of gene expression enables the remarkable cellular diversity seen in eukaryotic organisms. Around $2\,\mathrm{m}$ of DNA must be packaged into every cell nucleus, which has a diameter of only $\approx 2\,\mu\mathrm{m}$, in a manner that allows for the precise and efficient expression of genes into proteins. Genome folding is therefore characterized by highly organized multi-scale structures, which can be probed by novel experimental sequencing techniques such as ``Hi-C'' \cite{Lieb09,Rao14}. So-called 2D ``contact maps'' or ``contact matrices'', which visualize the 3D proximity of genomic loci and portray the fractal-like nature of 3D genome architecture, are derived from Hi-C experiments. One important feature of genome folding discovered by this assay is its partitioning into A/B compartments, which are associated with transcriptionally active euchromatin and inactive heterochromatin, respectively \cite{Dekker13}. Although Hi-C and similar technologies have led to major breakthroughs in understanding the principles of genome folding, such as chromosomal compartmentalization, they are costly, tedious, and limited by technical constraints. This has fueled the development of computational models that can simulate the complex patterns of chromatin interactions, unravel their molecular determinants, and assess the impact of genomic variants \cite{Yang22}.

We are developing a nonlinear dynamics-based approach, ``\textit{DNA-DDA}'', for the prediction of contact maps by adapting the time series classification framework \textit{delay differential analysis} (DDA) to capture dynamical signatures inherent in genomic sequence data.

In a recent proof-of-concept publication \cite{XLain23}, we demonstrated that DNA-DDA could accurately predict chromosomal compartments from an intermediate step that inferred individual interactions at 100 kb resolution. The DNA sequence was represented as a ``1D DNA walk'', in which the walker starts at zero and continues along the nucleotide chain, taking a step up for strongly bonded pairs (C or G) and a step down for weakly bonded pairs (A or T). DNA-DDA exhibited exceptional performance and competed well with state-of-the-art methods \cite{Zhou22,Kirchhof21,Schweiss20,Fuden20}, indicating its potential as a robust alternative tool for the analysis of genomic sequence data. Importantly, while other methods require nearly all chromosomes for training (around $95\%$ of human autosomes), we obtain sparse models from a 20 Mb region on one chromosome ($0.7\%$ of human autosomes) and the corresponding true positives defined by experimental 3D interaction data. We are currently extending DNA-DDA to a multi-scale resolution model that 1) uses a 2D DNA walk sequence representation to include information on all nucleotides and 2) relies on individual contacts instead of A/B compartment labels for quantifying model performance. Such methods will be key to understanding the interplay between 3D genome architecture and proper cellular function.
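
The 1D DNA walk described above can be illustrated with a minimal Python sketch. This is not the DNA-DDA implementation; the function name `dna_walk_1d`, the unit step size, and the treatment of ambiguous bases such as N (a step of zero) are assumptions made only for illustration.

```python
from itertools import accumulate

# Assumed unit steps: +1 for strongly bonded bases (C, G),
# -1 for weakly bonded bases (A, T).
STEP = {"C": 1, "G": 1, "A": -1, "T": -1}


def dna_walk_1d(sequence, unknown_step=0):
    """Return the 1D DNA walk of `sequence` as a list of positions.

    The walker starts at zero, so the result has len(sequence) + 1 entries.
    Bases other than A, C, G, T (e.g. N) contribute `unknown_step`,
    which is an illustrative assumption, not part of the published method.
    """
    steps = (STEP.get(base, unknown_step) for base in sequence.upper())
    return list(accumulate(steps, initial=0))


walk = dna_walk_1d("GATTACA")
print(walk)  # [0, 1, 0, -1, -2, -3, -2, -3]
```

The resulting list is a sequence of walker positions indexed by nucleotide, which is the kind of series a time series framework such as DDA takes as input.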

Original language: English
Pages: 163-168
Number of pages: 5
Publication status: Published - Oct 2023
Event: From the Nonlinear Dynamical Systems Theory to Observational Chaos - 4 Rue San Subra, Toulouse, France
Duration: 9 Oct 2023 to 11 Oct 2023
https://www.cesbio.cnrs.fr/ottochaos-from-the-nonlinear-dynamical-systems-theory-to-observational-chaos/

Conference

Conference: From the Nonlinear Dynamical Systems Theory to Observational Chaos
Country/Territory: France
City: Toulouse
Period: 9/10/23 to 11/10/23