Discarding multimappers leads to biases in the functional analysis of NGS data

Activity: Talk or presentation; Talk at conference or symposium; Science to science

Description

Introduction
For more than a decade, researchers have recognized the challenge of handling sequencing reads that map to more than one genomic location (“multimappers”). Several strategies have been proposed to mitigate the problem, but discarding multimappers is still common practice in standard next-generation sequencing (NGS) pipelines, including ChIP-seq and RNA-seq pipelines.
Methods
Here, we investigated the biases that discarding multimappers introduces into downstream functional analyses of NGS data. In particular, we analysed six single-end ChIP-seq and paired-end RNA-seq ENCODE libraries from human and mouse. Our strategy is relatively simple: each read (or read pair) mapping to a genomic element of interest is weighted by the number of loci to which the read (or read pair) maps.
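The weighting idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "weighting by the number of loci" means each alignment contributes 1/N to the element it overlaps, where N is the number of loci the read maps to, and it assumes alignments have already been reduced to (read ID, overlapping element) pairs. The function name `weighted_counts` and the toy read IDs are hypothetical.

```python
from collections import defaultdict


def weighted_counts(alignments):
    """Count reads per genomic element, weighting multimappers.

    alignments: list of (read_id, element) tuples, one tuple per mapped
    locus; element is None when that locus overlaps no element of interest.
    Assumption (not stated in the abstract): each locus of a read that maps
    to N loci contributes a weight of 1/N.
    """
    # First pass: number of loci each read (or read pair) maps to.
    loci_per_read = defaultdict(int)
    for read_id, _ in alignments:
        loci_per_read[read_id] += 1

    # Second pass: add the fractional weight to each overlapped element.
    counts = defaultdict(float)
    for read_id, element in alignments:
        if element is not None:
            counts[element] += 1.0 / loci_per_read[read_id]
    return dict(counts)


# Toy input: r1 maps uniquely, r2 maps to four loci, r3 maps to two loci.
alignments = [
    ("r1", "GENE_A"),
    ("r2", "AluYa5"),
    ("r2", "AluYa5"),
    ("r2", None),
    ("r2", "L1HS"),
    ("r3", "GENE_A"),
    ("r3", "GENE_B"),
]
counts = weighted_counts(alignments)
print(counts)
```

In this toy input, a pipeline that discards multimappers would keep only r1, so AluYa5, L1HS and GENE_B would receive no signal at all, whereas the weighted counts retain a fractional contribution from r2 and r3.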
Results
A substantial fraction of multimappers (43–79%) mapped to transposons, which comprise about 50% of the human and mouse genomes. Specifically, those reads mapped predominantly to evolutionarily young transposons, such as AluYa5 and L1HS. Thus, discarding multimappers leads to an underrepresentation of recently active transposons. In addition, we found that this practice results in an underrepresentation of about 6% and 4% of expressed genes in human and mouse, respectively. Notably, in the datasets we considered, members of repetitive gene families, such as those including major histocompatibility complex (MHC) class I and II genes, were widely underrepresented, masking important functions related to MHC class I and II immune responses and peptide antigen binding.
Conclusion
With this study, we aim to raise awareness of the biases caused by the common practice of discarding multimappers from NGS data in epigenetic and transcriptomic studies, and we strongly recommend implementing multimapper-aware bioinformatic approaches as fundamental tools of standard NGS data analysis pipelines.
Period: 3 Nov 2023
Event title: ÖGBMT - Jahrestagung 2023
Event type: Conference
Degree of Recognition: International