Linking genomics and population genetics with R

Emmanuel Paradis; Thierry Gosselin; Jérôme Goudet; Thibaut Jombart; Klaus Schliep

doi:10.1111/1755-0998.12577

Research output: Contribution to journal › Article › peer-review

Abstract

Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.

Original language	English
Pages (from-to)	54-66
Number of pages	13
Journal	Molecular Ecology Resources
Volume	17
Issue number	1
DOIs	https://doi.org/10.1111/1755-0998.12577
Publication status	Published - Jan 2017
Externally published	Yes

Keywords

Biostatistics/methods
Computational Biology/methods
Genetics, Population/methods
Genomics/methods
Haplotypes
Linkage Disequilibrium
Polymorphism, Single Nucleotide
Software

Cite this

@article{12350047295d46809b9848e9a0bd910e,

title = "Linking genomics and population genetics with {R}",

abstract = "Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.",

keywords = "Biostatistics/methods, Computational Biology/methods, Genetics, Population/methods, Genomics/methods, Haplotypes, Linkage Disequilibrium, Polymorphism, Single Nucleotide, Software",

author = "Emmanuel Paradis and Thierry Gosselin and J{\'e}r{\^o}me Goudet and Thibaut Jombart and Klaus Schliep",

year = "2017",

month = jan,

doi = "10.1111/1755-0998.12577",

language = "English",

volume = "17",

pages = "54--66",

journal = "Molecular Ecology Resources",

issn = "1755-098X",

publisher = "Wiley-Blackwell",

number = "1",

}

TY - JOUR

T1 - Linking genomics and population genetics with R

AU - Paradis, Emmanuel

AU - Gosselin, Thierry

AU - Goudet, Jérôme

AU - Jombart, Thibaut

AU - Schliep, Klaus

PY - 2017/1

Y1 - 2017/1

N2 - Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.

KW - Biostatistics/methods

KW - Computational Biology/methods

KW - Genetics, Population/methods

KW - Genomics/methods

KW - Haplotypes

KW - Linkage Disequilibrium

KW - Polymorphism, Single Nucleotide

KW - Software

U2 - 10.1111/1755-0998.12577

DO - 10.1111/1755-0998.12577

M3 - Article

C2 - 27461508

SN - 1755-098X

VL - 17

SP - 54

EP - 66

JO - Molecular Ecology Resources

JF - Molecular Ecology Resources

IS - 1

ER -
