tidypopgen

The goal of tidypopgen is to provide a tidy grammar of population genetics, facilitating the manipulation and analysis of biallelic single nucleotide polymorphisms (SNPs). tidypopgen scales to very large genetic datasets by storing genotypes on disk, and performing operations on them in chunks, without ever loading all data in memory. The full functionalities of the package are described in Carter et al. (2025). Please cite this paper if you use tidypopgen in your research.

Installation

You can install the release version of tidypopgen from CRAN:

install.packages("tidypopgen")

You can install the latest development version directly from r-universe (recommended):

install.packages('tidypopgen', repos = c('https://evolecolgroup.r-universe.dev',
                 'https://cloud.r-project.org'))

Alternatively, you can install tidypopgenusing devtools (but you might need to set up your development environment, which can be a bit more complex):

install.packages("devtools")
devtools::install_github("EvolEcolGroup/tidypopgen")

Examples

There are several vignettes designed to teach you how to use tidypopgen. A short introduction to the package is available in the 'introduction' vignette. A more detailed and technical description of the grammar of population genetics, explaining how to manipulate individuals and loci, is available in the 'grammar' vignette.

The 'quality control' vignette illustrates the tidypopgen functions that help running a full QC of a dataset before analysis.

The 'population genetic analysis' vignette provides a fully annotated example of how to run various population genetics analyses with tidypopgen.

We also provide a 'PLINK cheatsheet' aimed at translating common tasks performed in PLINK into tidypopgen commands.

There is also an article showing how manage aDNA sample that have been coded as pseudohaploids, including how to project ancient DNA data onto a PCA fitted to modern data and prepare data for admixtools: 'aDNA pseudohaploids' article.

Finally, tidypopgen is fast and can handle large datasets easily. See a 'benchmark' article using the HGDP, a dataset of over 1000 individuals typed for 650k SNPs. We can load the data, clean it, run imputation, PCA and pairwise Fst among 51 populations in less than 20 seconds on a powerful desktop (and less than a minute on a laptop).

When something does not work

If something does not work, check the issues on GitHub to see whether the problem has already been reported. If not, feel free to create an new issue. Please make sure you have updated to the latest version of tidypopgen on r-universe/Github, as well as updating all other packages on your system, and provide a reproducible example for the developers to investigate the problem. Ideally, try to create a minimalistic dataset that reproduces the error, as it will be much easier (and thus faster!) for the developers to track down the problem.

Functions in tidypopgen (0.4.3)

tidypopgen

Installation

Examples

When something does not work

Copy Link

Version

Install

Monthly Downloads

Version

License

Issues

Pull Requests

Stars

Forks

Repository

Homepage

Maintainer

Last Published

Functions in tidypopgen (0.4.3)