TY - JOUR
T1 - T3E
T2 - a tool for characterising the epigenetic profile of transposable elements using ChIP-seq data
AU - Almeida da Paz, Michelle
AU - Taher, Leila
N1 - Funding Information:
Open access funding provided by Graz University of Technology. This work was supported by grant P33437 from the Austrian Science Fund (FWF) awarded to L.T.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: Despite the advent of Chromatin Immunoprecipitation Sequencing (ChIP-seq) having revolutionised our understanding of the mammalian genome’s regulatory landscape, many challenges remain. In particular, because of their repetitive nature, the sequencing reads derived from transposable elements (TEs) pose a real bioinformatics challenge, to the point that standard analysis pipelines typically ignore reads whose genomic origin cannot be unambiguously ascertained. Results: We show that discarding ambiguously mapping reads may lead to a systematic underestimation of the number of reads associated with young TE families/subfamilies. We also provide evidence suggesting that the strategy of randomly permuting the location of the read mappings (or the TEs) that is often used to compute the background for enrichment calculations at TE families/subfamilies can result in both false positive and negative enrichments. To address these problems, we present the Transposable Element Enrichment Estimator (T3E), a tool that makes use of ChIP-seq data to characterise the epigenetic profile of associated TE families/subfamilies. T3E weights the number of read mappings assigned to the individual TE copies of a family/subfamily by the overall number of genomic loci to which the corresponding reads map, and this is done at the single nucleotide level. In addition, T3E computes ChIP-seq enrichment relative to a background estimated based on the distribution of the read mappings in the input control DNA. We demonstrated the capabilities of T3E on 23 different ChIP-seq libraries. T3E identified enrichments that were consistent with previous studies. Furthermore, T3E detected context-specific enrichments that are likely to pinpoint unexplored TE families/subfamilies with individual TE copies that have been frequently exapted as cis-regulatory elements during the evolution of mammalian regulatory networks. Conclusions: T3E is a novel open-source computational tool (available for use at: https://github.com/michelleapaz/T3E) that overcomes some of the pitfalls associated with the analysis of ChIP-seq data arising from the repetitive mammalian genome and provides a framework to shed light on the epigenetics of entire TE families/subfamilies.
AB - Background: Despite the advent of Chromatin Immunoprecipitation Sequencing (ChIP-seq) having revolutionised our understanding of the mammalian genome’s regulatory landscape, many challenges remain. In particular, because of their repetitive nature, the sequencing reads derived from transposable elements (TEs) pose a real bioinformatics challenge, to the point that standard analysis pipelines typically ignore reads whose genomic origin cannot be unambiguously ascertained. Results: We show that discarding ambiguously mapping reads may lead to a systematic underestimation of the number of reads associated with young TE families/subfamilies. We also provide evidence suggesting that the strategy of randomly permuting the location of the read mappings (or the TEs) that is often used to compute the background for enrichment calculations at TE families/subfamilies can result in both false positive and negative enrichments. To address these problems, we present the Transposable Element Enrichment Estimator (T3E), a tool that makes use of ChIP-seq data to characterise the epigenetic profile of associated TE families/subfamilies. T3E weights the number of read mappings assigned to the individual TE copies of a family/subfamily by the overall number of genomic loci to which the corresponding reads map, and this is done at the single nucleotide level. In addition, T3E computes ChIP-seq enrichment relative to a background estimated based on the distribution of the read mappings in the input control DNA. We demonstrated the capabilities of T3E on 23 different ChIP-seq libraries. T3E identified enrichments that were consistent with previous studies. Furthermore, T3E detected context-specific enrichments that are likely to pinpoint unexplored TE families/subfamilies with individual TE copies that have been frequently exapted as cis-regulatory elements during the evolution of mammalian regulatory networks. Conclusions: T3E is a novel open-source computational tool (available for use at: https://github.com/michelleapaz/T3E) that overcomes some of the pitfalls associated with the analysis of ChIP-seq data arising from the repetitive mammalian genome and provides a framework to shed light on the epigenetics of entire TE families/subfamilies.
KW - Background signal
KW - ChIP-seq
KW - Enrichment
KW - Epigenetics
KW - Input control
KW - Transposable element
UR - http://www.scopus.com/inward/record.url?scp=85142905708&partnerID=8YFLogxK
U2 - 10.1186/s13100-022-00285-z
DO - 10.1186/s13100-022-00285-z
M3 - Article
AN - SCOPUS:85142905708
SN - 1759-8753
VL - 13
JO - Mobile DNA
JF - Mobile DNA
IS - 1
M1 - 29
ER -