Disregarding multimappers leads to biases in the functional assessment of NGS data

Michelle Almeida da Paz, Sarah Warger, Leila Taher*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous ("multimappers"). This common practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters composed of highly similar members are left unexplored.

RESULTS: In particular, disregarding multimappers leads to the underrepresentation of recently active transposable elements, such as AluYa5, L1HS and SVAs, in epigenetic studies. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as those including major histocompatibility complex (MHC) class I and II genes, are under-quantified.

CONCLUSION: Revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines (currently restricted to specific contexts or communities) to ensure the reliability of genomic and transcriptomic studies.

Original language: English
Article number: 455
Journal: BMC Genomics
Volume: 25
Issue number: 1
Publication status: Published - Dec 2024

Keywords

  • Humans
  • High-Throughput Nucleotide Sequencing
  • DNA Transposable Elements/genetics
  • Computational Biology/methods
  • Gene Expression Profiling/methods
  • Genomics/methods
  • Sequence Analysis, RNA/methods
  • Multimappers
  • RNA-seq
  • ChIP-seq
  • Functional analysis
  • Next-generation sequencing (NGS)

ASJC Scopus subject areas

  • Genetics
  • Biotechnology
