Standard high-throughput functional analysis pipelines omit specific repetitive genetic groups

Publication: Conference contribution, Poster, peer-reviewed

Abstract

The advent of next-generation sequencing (NGS) technologies such as ChIP-seq and RNA-seq has helped to reveal many functional properties of DNA and RNA, respectively. However, due to the repeat content of higher eukaryotic genomes, identifying the genomic origin of some sequencing reads is often a challenge. Standard analysis pipelines typically neglect ambiguously mapping reads. Thus, families of genetic elements with very similar members remain underexplored.
Here, we investigated the systematic bias that this practice introduces into downstream analysis. Specifically, we analysed six publicly available single-end ChIP-seq and paired-end RNA-seq ENCODE libraries from human and mouse, weighting the number of reads (or read pairs) mapping to each genomic feature of interest according to the ambiguity of the corresponding mappings. As expected, a substantial fraction of the ambiguously mapping reads (43–79%) mapped to transposons, which make up about half of the human and mouse genomes. Notably, those reads mapped predominantly to evolutionarily young transposons such as AluY and L1HS. Thus, discarding ambiguously mapping reads tends to result in the specific underrepresentation of recently active transposons. Moreover, this common strategy also leads to the underrepresentation of genes with particular functions, including cytochrome-c oxidase activity and MHC class I and II protein binding.
This study is a proof of principle intended to raise awareness of the potential systematic distortions caused by the common practice of discarding multimappers in NGS data analysis, and to encourage the development of strategies that explicitly account for repetitive genomic sequences.
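
To illustrate the kind of ambiguity-based weighting described above, here is a minimal sketch in Python. The abstract does not state the exact weighting scheme, so the rule used here (each of a read's N reported alignments contributes 1/N to the feature it overlaps) is an assumption, as are the function names and the toy input. It contrasts this fractional counting with the common unique-only strategy.

```python
from collections import defaultdict


def _group_by_read(alignments):
    """Collect the features hit by each read (one entry per reported alignment)."""
    hits_per_read = defaultdict(list)
    for read_id, feature in alignments:
        hits_per_read[read_id].append(feature)
    return hits_per_read


def fractional_counts(alignments):
    """Assumed scheme: a read with N alignments adds 1/N per alignment.

    `alignments` is an iterable of (read_id, feature) pairs; `feature` is
    None when the alignment overlaps no annotated feature.
    """
    counts = defaultdict(float)
    for features in _group_by_read(alignments).values():
        weight = 1.0 / len(features)
        for feature in features:
            if feature is not None:
                counts[feature] += weight
    return dict(counts)


def unique_only_counts(alignments):
    """Common practice: keep only reads with exactly one alignment."""
    counts = defaultdict(float)
    for features in _group_by_read(alignments).values():
        if len(features) == 1 and features[0] is not None:
            counts[features[0]] += 1.0
    return dict(counts)


# Hypothetical toy input: r1 maps uniquely, r2 maps to two AluY copies,
# r3 maps ambiguously to an AluY copy and to GENE_A.
example = [
    ("r1", "GENE_A"),
    ("r2", "AluY"), ("r2", "AluY"),
    ("r3", "AluY"), ("r3", "GENE_A"),
]
weighted = fractional_counts(example)
unique = unique_only_counts(example)
```

In this toy input, the unique-only strategy assigns no reads at all to AluY, while the fractional scheme credits it with 1.5 reads, which mirrors the underrepresentation of young transposons reported in the abstract.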
Original language: English
Publication status: Published - 17 Apr 2023
Event: 27th Annual International Conference on Research in Computational Molecular Biology: RECOMB 2023 - Istanbul Marriott Hotel Sisli, Istanbul, Turkey
Duration: 16 Apr 2023 to 19 Apr 2023
http://recomb2023.bilkent.edu.tr/index.html

Conference

Conference: 27th Annual International Conference on Research in Computational Molecular Biology
Short title: RECOMB 2023
Country/Territory: Turkey
City: Istanbul
Period: 16/04/23 to 19/04/23
