Abstract
BACKGROUND: Standard ChIP-seq and RNA-seq processing pipelines typically discard sequencing reads whose origin is ambiguous ("multimappers"). This common practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters of highly similar members are left unexplored.
RESULTS: In particular, disregarding multimappers leads to the underrepresentation of recently active transposable elements, such as AluYa5, L1HS and SVAs, in epigenetic studies. This common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as those including major histocompatibility complex (MHC) class I and II genes, are under-quantified.
CONCLUSION: By revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines, which are currently restricted to specific contexts or communities, to ensure the reliability of genomic and transcriptomic studies.
Original language | English |
---|---|
Article number | 455 |
Journal | BMC Genomics |
Volume | 25 |
Issue number | 1 |
DOIs | |
Publication status | Published - Dec 2024 |
Keywords
- Humans
- High-Throughput Nucleotide Sequencing
- DNA Transposable Elements/genetics
- Computational Biology/methods
- Gene Expression Profiling/methods
- Genomics/methods
- Sequence Analysis, RNA/methods
- Multimappers
- RNA-seq
- ChIP-seq
- Functional analysis
- Next-generation sequencing (NGS)
ASJC Scopus subject areas
- Genetics
- Biotechnology