Abstract
The intricate regulation of multiple levels of gene expression enables the remarkable cellular diversity seen in eukaryotic organisms. Around $2\,\mathrm{m}$ of DNA must be packaged into every cell nucleus, which has a diameter of only $\approx 2\,\mu\mathrm{m}$, in a manner that still allows the precise and efficient expression of genes into proteins. Genome folding is therefore characterized by highly organized multi-scale structures, which can be probed by novel sequencing-based experimental techniques such as ``Hi-C'' \cite{Lieb09,Rao14}. Hi-C experiments yield so-called 2D ``contact maps'' or ``contact matrices'', which visualize the 3D proximity of genomic loci and portray the fractal-like nature of 3D genome architecture. One important feature of genome folding discovered with this assay is its partitioning into A/B compartments, which are associated with transcriptionally active euchromatin and inactive heterochromatin, respectively \cite{Dekker13}. Although Hi-C and similar technologies have led to major breakthroughs in understanding the principles of genome folding, such as chromosomal compartmentalization, they are costly, tedious, and limited by technical constraints. This has fueled the development of computational models that can simulate the complex patterns of chromatin interactions, unravel their molecular determinants, and assess the impact of genomic variants \cite{Yang22}.
We are developing a nonlinear dynamics-based approach ``\textit{DNA-DDA}'' for the prediction of contact maps by adapting the time series classification framework, \textit{delay differential analysis} (DDA), to capture dynamical signatures inherent in genomic sequence data.
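In the general DDA framework, a low-order delay differential equation with a few nonlinear terms is fitted to a time series, and the fitted coefficients together with the fitting error serve as features for classification. The Python sketch below illustrates such a fit on a generic signal. The three-term model $\dot{x} = a_1 x_{\tau_1} + a_2 x_{\tau_2} + a_3 x_{\tau_1}^4$ with $x_{\tau_j} = x(t-\tau_j)$, the delays, the central-difference derivative, and the name `dda_features` are assumptions for illustration, not the DNA-DDA implementation.

```python
import numpy as np


def dda_features(x, tau1: int, tau2: int, dt: float = 1.0):
    """Fit dx/dt = a1*x(t-tau1) + a2*x(t-tau2) + a3*x(t-tau1)**4 by least squares.

    The three-term model and the delays are illustrative assumptions.
    Returns (a1, a2, a3, rho), where rho is the root-mean-square fitting error.
    """
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        raise ValueError("constant series cannot be normalized")
    x = (x - x.mean()) / std  # z-normalize so features are comparable across windows

    # Sample indices where both delays and the central difference are defined.
    start = max(tau1, tau2, 1)
    t = np.arange(start, len(x) - 1)
    if t.size < 3:
        raise ValueError("series too short for the chosen delays")

    dx = (x[t + 1] - x[t - 1]) / (2 * dt)  # central-difference estimate of dx/dt
    x1, x2 = x[t - tau1], x[t - tau2]  # delayed copies of the signal
    design = np.column_stack([x1, x2, x1 ** 4])  # one column per model term

    # Linear least squares in the coefficients, even though the terms are nonlinear in x.
    coeffs, *_ = np.linalg.lstsq(design, dx, rcond=None)
    rho = np.sqrt(np.mean((design @ coeffs - dx) ** 2))
    return (*coeffs, rho)
```

Applied to sliding windows of a sequence-derived signal, the four returned numbers form a compact feature vector per window; with delays measured in samples (here, nucleotides), choosing $\tau_1$ and $\tau_2$ would be part of model selection.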
In a recent proof-of-concept publication \cite{XLain23}, we demonstrated that DNA-DDA could accurately predict chromosomal compartments from an intermediate step that inferred individual interactions at 100\,kb resolution. The DNA sequence was represented as a ``1D DNA walk'', in which the walker starts at zero and moves along the nucleotide chain, taking a step up for strongly bonded bases (C or G) and a step down for weakly bonded bases (A or T). DNA-DDA performed strongly and was competitive with state-of-the-art methods \cite{Zhou22,Kirchhof21,Schweiss20,Fuden20}, indicating its potential as a robust alternative tool for the analysis of genomic sequence data. Importantly, while other methods require nearly all chromosomes for training (around $95\%$ of human autosomes), we obtain sparse models from a single 20\,Mb region on one chromosome ($0.7\%$ of human autosomes), together with the corresponding true positives defined by experimental 3D interaction data. We are currently extending DNA-DDA to a multi-scale resolution model that 1) uses a 2D DNA walk sequence representation to include information from all four nucleotides and 2) relies on individual contacts instead of A/B compartment labels to quantify model performance. Such methods will be key to understanding the interplay between 3D genome architecture and proper cellular function.
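The 1D DNA walk described above is a cumulative sum of $\pm 1$ steps, which the Python sketch below computes together with one possible 2D extension. The 2D mapping adds a purine (A/G) versus pyrimidine (C/T) axis so that every nucleotide receives a distinct step; it is a hypothetical illustration and not necessarily the representation used in the extended DNA-DDA model. The function names and the treatment of ambiguous bases such as N are likewise assumptions.

```python
import numpy as np

# Step sizes for the 1D walk: +1 for strongly bonded C/G, -1 for weakly bonded A/T.
# Ambiguous symbols such as N contribute no step (an assumption, not from the source).
STRONG_WEAK = {"C": 1, "G": 1, "A": -1, "T": -1}

# Hypothetical second axis for a 2D walk: +1 for purines (A/G), -1 for pyrimidines (C/T).
PURINE_PYRIMIDINE = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk_1d(seq: str) -> np.ndarray:
    """Positions of the 1D DNA walk, starting at 0 before the first nucleotide."""
    steps = [STRONG_WEAK.get(base, 0) for base in seq.upper()]
    return np.concatenate(([0], np.cumsum(steps)))


def dna_walk_2d(seq: str) -> np.ndarray:
    """Positions (x, y) of a hypothetical 2D walk; the y column reproduces the 1D walk."""
    steps = np.array(
        [[PURINE_PYRIMIDINE.get(b, 0), STRONG_WEAK.get(b, 0)] for b in seq.upper()]
    ).reshape(-1, 2)
    return np.vstack(([0, 0], np.cumsum(steps, axis=0)))


print(dna_walk_1d("GATTACA").tolist())  # [0, 1, 0, -1, -2, -3, -2, -3]
```

In the hypothetical 2D walk the steps are A = (+1, -1), G = (+1, +1), C = (-1, +1), and T = (-1, -1), so bases that the 1D walk cannot tell apart (A versus T, C versus G) move in different directions.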
Original language | English |
---|---|
Pages | 163-168 |
Number of pages | 5 |
Publication status | Published - Oct 2023 |
Event | From the Nonlinear Dynamical Systems Theory to Observational Chaos, 4 Rue San Subra, Toulouse, France |
Duration | 9 Oct 2023 → 11 Oct 2023 |
Internet address | https://www.cesbio.cnrs.fr/ottochaos-from-the-nonlinear-dynamical-systems-theory-to-observational-chaos/ |