tidypopgen

The goal of tidypopgen is to provide a tidy grammar of population genetics, facilitating the manipulation and analysis of biallelic single nucleotide polymorphisms (SNPs). tidypopgen scales to very large genetic datasets by storing genotypes on disk and performing operations on them in chunks, without ever loading all the data into memory. The full functionality of the package is described in Carter et al. (2025). Please cite this paper if you use tidypopgen in your research.

Installation

You can install the release version of tidypopgen from CRAN:

install.packages("tidypopgen")

You can install the latest development version directly from r-universe (recommended):

install.packages('tidypopgen', repos = c('https://evolecolgroup.r-universe.dev',
                 'https://cloud.r-project.org'))

Alternatively, you can install tidypopgen using devtools (but you might need to set up your development environment, which can be a bit more complex):

install.packages("devtools")
devtools::install_github("EvolEcolGroup/tidypopgen")

Examples

There are several vignettes designed to teach you how to use tidypopgen. A short introduction to the package is available in the 'introduction' vignette. A more detailed and technical description of the grammar of population genetics, explaining how to manipulate individuals and loci, is available in the 'grammar' vignette.

The 'quality control' vignette illustrates the tidypopgen functions that help you run a full QC of a dataset before analysis.

The 'population genetic analysis' vignette provides a fully annotated example of how to run various population genetics analyses with tidypopgen.

We also provide a 'PLINK cheatsheet' aimed at translating common tasks performed in PLINK into tidypopgen commands.

There is also an article showing how to manage aDNA samples that have been coded as pseudohaploids, including how to project ancient DNA data onto a PCA fitted to modern data and how to prepare data for admixtools: the 'aDNA pseudohaploids' article.

Finally, tidypopgen is fast and can handle large datasets easily. See the 'benchmark' article, which uses the HGDP, a dataset of over 1000 individuals typed for 650k SNPs: loading the data, cleaning it, and running imputation, PCA and pairwise Fst among 51 populations takes less than 20 seconds on a powerful desktop (and less than a minute on a laptop).
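
As a rough illustration of the kind of workflow described here (clean, impute, PCA, pairwise Fst), the sketch below strings together functions from the index further down this page. It is not the code of the benchmark article: the wrapper name run_basic_workflow, the 0.05 missingness threshold, method = "mode" and k = 10 are assumptions chosen for illustration, and the input is assumed to be a gen_tibble with a population column. Check the function help pages for the exact arguments before relying on it.

```r
# Sketch only: `gt` is assumed to be a gen_tibble with a `population` column.
# The threshold, imputation method and k are illustrative, not recommendations.
run_basic_workflow <- function(gt, max_missing = 0.05, k = 10) {
  # Clean: drop individuals, then loci, with too much missing data
  gt <- dplyr::filter(
    gt, tidypopgen::indiv_missingness(genotypes) < max_missing
  )
  gt <- tidypopgen::select_loci_if(
    gt, tidypopgen::loci_missingness(genotypes) < max_missing
  )
  # Impute the remaining missing genotypes so that a PCA can be fitted
  gt <- tidypopgen::gt_impute_simple(gt, method = "mode")
  pca <- tidypopgen::gt_pca_partialSVD(gt, k = k)
  # Pairwise Fst is computed among the groups of a grouped gen_tibble
  fst <- tidypopgen::pairwise_pop_fst(dplyr::group_by(gt, population))
  list(data = gt, pca = pca, fst = fst)
}
```

With a gen_tibble of your own (here a hypothetical object my_gt), the call would be res <- run_basic_workflow(my_gt). The function is only defined above, not run, so nothing is loaded until it is called.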

When something does not work

If something does not work, check the issues on GitHub to see whether the problem has already been reported. If not, feel free to create a new issue. Please make sure you have updated to the latest version of tidypopgen on r-universe/GitHub, as well as updating all other packages on your system, and provide a reproducible example for the developers to investigate the problem. Ideally, try to create a minimal dataset that reproduces the error, as it will be much easier (and thus faster!) for the developers to track down the problem.
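
When writing up an issue, it helps to state exactly which versions you are running. The following is a minimal sketch using only base R; the helper name report_versions is hypothetical and not part of tidypopgen.

```r
# Collect the version details that are useful in a bug report.
# `report_versions` is a hypothetical helper, not a tidypopgen function.
report_versions <- function(pkg = "tidypopgen") {
  installed <- requireNamespace(pkg, quietly = TRUE)
  list(
    package = pkg,
    # NA if the package is not installed on this system
    version = if (installed) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    },
    r_version = R.version.string
  )
}

# Demonstrated on a package that ships with every R installation
info <- report_versions("stats")
print(info)
```

Calling report_versions() with no argument reports on tidypopgen itself; pasting the output of sessionInfo() into the issue additionally lists the versions of all loaded packages.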

Version: 0.4.3
License: GPL (>= 3)
Maintainer: Andrea Manica
Last published: January 24th, 2026
Monthly downloads: 464

Functions in tidypopgen (0.4.3)

c.gt_admix

Combine method for gt_admix objects
gt_as_geno_lea

Convert a gen_tibble to a .geno file for sNMF from the LEA package
get_p_matrix

Return a single P matrix from a gt_admix object
gen_tibble

Constructor for a gen_tibble
gt_as_hierfstat

Convert a gen_tibble to a data.frame compatible with hierfstat
gt_impute_xgboost

Imputation based on XGBoost
gt_load

Load a gen_tibble
gt_as_genlight

Convert a gen_tibble to a genlight object from adegenet
gt_as_genind

Convert a gen_tibble to a genind object from adegenet
gt_admix_reorder_q

Reorder the q matrices based on the grouping variable
gt_has_imputed

Checks if a gen_tibble has been imputed
gt_impute_simple

Simple imputation based on allele frequencies
gt_admixture

Run ADMIXTURE from R
gt_pca

Principal Component Analysis for gen_tibble objects
gt_cluster_pca_best_k

Find the best number of clusters based on principal components
gt_order_loci

Order the loci table of a gen_tibble
gt_cluster_pca

Run K-clustering on principal components
gt_pca_randomSVD

PCA for gen_tibble objects by randomized partial SVD
get_q_matrix

Return a single Q matrix from a gt_admix object
gt_add_sf

Add a simple feature geometry to a gen_tibble
gt_as_vcf

Convert a gen_tibble to a VCF
gt_as_plink

Export a gen_tibble object to PLINK bed format
filter_high_relatedness

Filter individuals based on a relationship threshold
autoplot_gt_pca

Autoplots for gt_pca objects
gt_set_imputed

Sets a gen_tibble to use imputed data
autoplot_gt_pcadapt

Autoplots for gt_pcadapt objects
gt_snmf

Run SNMF from R in tidypopgen
gt_pcadapt

pcadapt analysis on a gen_tibble object
loci_missingness

Estimate missingness at each locus
gt_dapc

Discriminant Analysis of Principal Components for gen_tibble
find_duplicated_loci

Find duplicates in the loci table
loci_names

Get the names of loci in a gen_tibble
indiv_het_obs

Estimate individual observed heterozygosity
load_example_gt

Load example gen_tibble
gt_pca_partialSVD

PCA for gen_tibble objects by partial SVD
loci_hwe

Test Hardy-Weinberg equilibrium at each locus
is_loci_table_ordered

Test if the loci table is ordered
indiv_inbreeding

Individual inbreeding coefficient
gt_extract_f2

Compute and store blocked f2 statistics for ADMIXTOOLS 2
loci_pi

Estimate nucleotide diversity (pi) at each locus
gt_pca_autoSVD

PCA controlling for LD for gen_tibble objects
gt_pseudohaploid

Set the ploidy of a gen_tibble to include pseudohaploids
indiv_ploidy

Return individual ploidy
indiv_missingness

Estimate individual missingness
pairwise_allele_sharing

Compute the Pairwise Allele Sharing Matrix for a gen_tibble object
gt_save

Save a gen_tibble
mutate.grouped_gen_tbl

A mutate method for grouped gen_tibble objects
loci_transitions

Find transitions
nwise_pop_pbs

Compute the Population Branch Statistics for each combination of populations
gt_get_file_names

Get the names of files storing the genotypes of a gen_tibble
gt_uses_imputed

Checks if a gen_tibble uses imputed data
predict.gt_pca

Predict scores of a PCA
q_matrix

Convert a standard matrix to a q_matrix object
loci_alt_freq

Estimate allele frequencies at each locus
gt_update_backingfile

Update the backing matrix
gt_from_genlight

Convert a genlight object from adegenet to a gen_tibble
loci_chromosomes

Get the chromosomes of loci in a gen_tibble
pop_het_exp

Compute the population expected heterozygosity
pop_global_stats

Compute basic population global statistics
pairwise_grm

Compute the Genomic Relationship Matrix for a gen_tibble object
%>%

Pipe operator
pairwise_pop_fst

Compute pairwise population Fst
rbind.gen_tbl

Combine two gen_tibbles
select_loci_if

The select_if verb for loci
pop_tajimas_d

Estimate Tajima's D for the whole genome
pop_het_obs

Compute the population observed heterozygosity
show_genotypes

Show the genotypes of a gen_tibble
rbind_dry_run

Generate a report of what would happen to each SNP in a merge
scale_fill_distruct

Scale constructor using the distruct colours
summary.rbind_report

Print a summary of a merge report
tidy.q_matrix

Tidy a Q matrix
tidy.gt_dapc

Tidy a gt_dapc object
snp_ibs

Compute the Identity by State Matrix for a bigSNP object
snp_allele_sharing

Compute the Pairwise Allele Sharing Matrix for a bigSNP object
select_loci

The select verb for loci
loci_ld_clump

Clump loci based on a Linkage Disequilibrium threshold
mutate.gen_tbl

A mutate method for gen_tibble objects
qc_report_indiv

Create a Quality Control report for individuals
loci_transversions

Find transversions
summary.gt_admix

Summary method for gt_admix objects
snp_king

Compute the KING-robust Matrix for a bigSNP object
tidy.gt_pca

Tidy a gt_pca object
qc_report_loci

Create a Quality Control report for loci
pairwise_ibs

Compute the Identity by State Matrix for a gen_tibble object
pairwise_king

Compute the KING-robust Matrix for a gen_tibble object
pop_fis

Compute population specific FIS
windows_pairwise_pop_fst

Compute pairwise Fst for a sliding window
theme_distruct

A theme to match the output of distruct
windows_pop_tajimas_d

Compute Tajima's D for a sliding window
windows_indiv_roh

Detect runs of homozygosity using a sliding-window approach
pop_fst

Compute population specific Fst
tidypopgen-package

tidypopgen: Tidy Population Genetics
windows_nwise_pop_pbs

Compute the Population Branch Statistics over a sliding window
reexports

Objects exported from other packages
show_loci

Show the loci information of a gen_tibble
read_q_files

Read and structure .Q files or existing matrices as q_matrix or gt_admix objects.
show_ploidy

Show the ploidy information of a gen_tibble
windows_stats_generic

Estimate window statistics from per locus estimates
augment_q_matrix

Augment data with information from a q_matrix object
autoplot_gt_admix

Autoplots for gt_admix objects
autoplot.qc_report_loci

Autoplots for qc_report_loci objects
augment_loci

Augment the loci table with information from an analysis object
autoplot.qc_report_indiv

Autoplots for qc_report_indiv objects
arrange.grouped_gen_tbl

An arrange method for grouped gen_tibble objects
augment.gt_dapc

Augment data with information from a gt_dapc object
augment_loci_gt_pca

Augment the loci table with information from a gt_pca object
augment_gt_pca

Augment data with information from a gt_pca object
arrange.gen_tbl

An arrange method for gen_tibble objects
count_loci

Count the number of loci in a gen_tibble
cbind.gen_tbl

Combine a gen_tibble to a data.frame or tibble by column
$<-.gen_tbl

A $ method for gen_tibble objects
autoplot.gt_dapc

Autoplots for gt_dapc objects
autoplot.gt_cluster_pca

Autoplots for gt_cluster_pca objects
distruct_colours

Distruct colours
autoplot_q_matrix

Autoplots for q_matrix objects
filter.gen_tbl

Tidyverse methods for gt objects
filter.grouped_gen_tbl

A filter method for grouped gen_tibble objects