Standard high-throughput functional analysis pipelines omit specific repetitive genetic groups

Research output: Contribution to conference › Poster › peer-review

Abstract

The advent of next-generation sequencing (NGS) technologies such as ChIP-seq and RNA-seq has helped to reveal many functional properties of DNA and RNA, respectively. However, because of the repeat content of higher eukaryotic genomes, identifying the genomic origin of some sequencing reads is often a challenge. Standard analysis pipelines typically discard reads that map ambiguously (multimappers). As a result, families of genetic elements with very similar members remain underexplored.
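The poster does not state how ambiguous reads were identified. The following minimal sketch shows one way to separate uniquely mapping reads from multimappers. It assumes single-end reads in SAM text format, with the aligner configured to report every alignment of a multimapping read as its own record (primary plus secondary). The function names `alignments_per_read` and `split_unique_multi` are hypothetical helpers and are not part of any specific tool.

```python
from collections import Counter


def alignments_per_read(sam_lines):
    """Count mapped alignment records per read name in single-end SAM text.

    Assumes the aligner was asked to report every alignment of a
    multimapping read (primary plus secondary records).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped read: no genomic origin at all
        if flag & 0x800:
            continue  # supplementary (chimeric) piece, not an alternative locus
        counts[qname] += 1
    return counts


def split_unique_multi(counts):
    """Partition reads into uniquely mapping and multimapping sets."""
    unique = {q for q, n in counts.items() if n == 1}
    multi = {q for q, n in counts.items() if n > 1}
    return unique, multi
```

A standard pipeline keeps only the `unique` set. The reads in `multi` are the ones whose fate this study examines. Many aligners also record the number of reported alignments in the optional `NH` tag, which could replace the counting step.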
Here, we investigated the systematic bias that this practice introduces into downstream analyses. Specifically, we analysed six publicly available ENCODE libraries from human and mouse (single-end ChIP-seq and paired-end RNA-seq), weighting the number of reads (or read pairs) mapping to each genomic feature of interest by the ambiguity of the corresponding mappings. As expected, a substantial fraction of the ambiguously mapping reads (43–79%) mapped to transposons, which make up about half of the human and mouse genomes. Notably, these reads mapped predominantly to evolutionarily young transposons such as AluY and L1HS. Thus, discarding ambiguously mapping reads tends to cause a specific underrepresentation of recently active transposons. Moreover, this common strategy also leads to the underrepresentation of genes with particular functions, including cytochrome-c oxidase activity and MHC class I and II protein binding.
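The abstract does not give the exact weighting formula. One common scheme, assumed here for illustration, is uniform fractional counting. Under this scheme, a read (or read pair) with N alignments contributes 1/N to the feature hit by each alignment. The sketch compares this against the unique-only counting that the poster critiques and reports, per feature, the share of weighted signal that unique-only counting would discard. The input layout (read ID mapped to a list of hit features) and the label `GENE_A` are hypothetical. `AluY` and `L1HS` are used only as example family names.

```python
from collections import defaultdict


def weighted_feature_counts(read_hits):
    """Uniform fractional counting (an assumed scheme, not the authors' exact one).

    read_hits maps a read (or read pair) ID to the list of features hit by
    each of its alignments; a read with N alignments adds 1/N per alignment.
    """
    counts = defaultdict(float)
    for hits in read_hits.values():
        if not hits:
            continue  # read with no feature overlap contributes nothing
        weight = 1.0 / len(hits)
        for feature in hits:
            counts[feature] += weight
    return dict(counts)


def unique_only_counts(read_hits):
    """The common practice: count only reads with exactly one alignment."""
    counts = defaultdict(float)
    for hits in read_hits.values():
        if len(hits) == 1:
            counts[hits[0]] += 1.0
    return dict(counts)


def fraction_lost(read_hits):
    """Per feature, the share of weighted signal that unique-only counting drops."""
    weighted = weighted_feature_counts(read_hits)
    unique = unique_only_counts(read_hits)
    return {f: 1.0 - unique.get(f, 0.0) / w for f, w in weighted.items() if w > 0}


hits = {
    "r1": ["GENE_A"],                      # unique read in a hypothetical gene
    "r2": ["AluY", "AluY", "AluY", "AluY"],  # four copies of one young family
    "r3": ["AluY", "L1HS"],                # ambiguous between two families
    "r4": ["GENE_A"],
}
print(fraction_lost(hits))
```

Counting at the family level means that a read whose alignments all fall in AluY copies still adds a full count to AluY, even though its locus is ambiguous. Unique-only counting, in contrast, assigns nothing to either transposon family in this toy input.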
This study is a proof of principle intended to raise awareness of potential systematic distortions caused by the common practice of discarding multimappers in NGS data analysis, and it encourages the development of strategies that explicitly account for repetitive genomic sequences.
Original language: English
Publication status: Published, 17 Apr 2023
Event: 27th Annual International Conference on Research in Computational Molecular Biology: RECOMB 2023, Istanbul Marriott Hotel Sisli, Istanbul, Turkey
Duration: 16 Apr 2023 to 19 Apr 2023
http://recomb2023.bilkent.edu.tr/index.html

Conference

Conference: 27th Annual International Conference on Research in Computational Molecular Biology
Abbreviated title: RECOMB 2023
Country/Territory: Turkey
City: Istanbul
Period: 16/04/23 to 19/04/23

Keywords

  • repeats
  • multimappers
  • bias
  • Functional Analysis
