PERMANOVA is a form of linear modelling that partitions variation in a triangular matrix of inter-sample proximities obtained from variable-by-sample data.
Uses permutations to estimate the probability of observed group differences in SNP composition given a null hypothesis of no differences between groups (Anderson 2001).
Proximity between samples can be any type of distance, similarity or dissimilarity.
Original acronym NPMANOVA (Non-Parametric MANOVA) replaced with PERMANOVA (Anderson 2004, 2017).
Univariate ANOVA captures differences in mean and variance, referred to as location and dispersion in PERMANOVA's multivariate context (Anderson & Walsh 2013; Warton, Wright & Wang 2012).
To attribute group differences to location (position of sample groups) and/or dispersion (spread of sample groups), PERMANOVA must be combined with PERMDISP as implemented through smart_permdisp.
Function smart_permanova uses adonis to fit formula snp_eucli ~ sample_group, where snp_eucli is the sample-by-sample triangular matrix in Principal Coordinate Analysis (Gower 1966) space.
Current version restricted to one-way designs (one categorical predictor) though PERMANOVA can handle >1 crossed and/or nested factors (Anderson 2001) and continuous predictors (McArdle & Anderson 2001).
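To make the one-way test concrete, the following is a minimal Python sketch of the pseudo-F statistic and its permutation p-value computed directly from a distance matrix. It is an illustration under stated assumptions, not the code run by smart_permanova or adonis: the helper names pseudo_f and permanova, the toy coordinates and the default of 999 permutations are all hypothetical choices made for this example.

```python
import numpy as np

def pseudo_f(dist, groups):
    """One-way pseudo-F from a square sample-by-sample distance matrix."""
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = dist.shape[0]
    labels = np.unique(groups)
    a = len(labels)
    d2 = dist ** 2
    # Total sum of squares: squared distances over all sample pairs, divided by n.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with F >= observed F."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs = pseudo_f(dist, groups)
    hits = 0
    for _ in range(n_perm):
        if pseudo_f(dist, rng.permutation(groups)) >= f_obs:
            hits += 1
    # The observed labelling counts as one of the permutations.
    return f_obs, (hits + 1) / (n_perm + 1)

# Toy data (made-up values): ten samples in two well-separated groups of five.
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
                   [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]])
sample_group = np.array(["A"] * 5 + ["B"] * 5)
# Sample-by-sample Euclidean distance matrix.
snp_eucli = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
f_obs, p_value = permanova(snp_eucli, sample_group)
print(f"pseudo-F = {f_obs:.2f}, p = {p_value:.3f}")
```

For a single variable with Euclidean distances, this pseudo-F reduces to the classical univariate ANOVA F-ratio, which is a convenient sanity check for the sketch.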
If >2 sample groups tested, pairwise = TRUE allows pairwise testing and correction for multiple testing by holm (Holm) [default], hochberg (Hochberg), hommel (Hommel), bonferroni (Bonferroni), BY (Benjamini-Yekutieli), BH (Benjamini-Hochberg) or fdr (False Discovery Rate).
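As an illustration of the default correction, here is a minimal Python sketch of the Holm step-down adjustment applied to one p-value per pair of groups. The function name holm_adjust, the group labels and the raw p-values are made up for the example; it is not the routine the package calls.

```python
from itertools import combinations

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

groups = ["A", "B", "C"]
pairs = list(combinations(groups, 2))  # every unordered pair of groups
raw_p = [0.01, 0.04, 0.03]             # hypothetical raw p-values, one per pair
adjusted = holm_adjust(raw_p)
for pair, p_raw, p_adj in zip(pairs, raw_p, adjusted):
    print(pair, p_raw, round(p_adj, 4))
```

With k groups there are k(k-1)/2 pairwise tests, so the number of p-values to adjust grows quickly as groups are added.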
For big data, Dist builds sample-by-sample triangular matrix much faster than vegdist.
Dist computes proximities euclidean, manhattan, canberra1, canberra2, minimum, maximum, minkowski, bhattacharyya, hellinger, kullback_leibler and jensen_shannon.
vegdist computes manhattan, euclidean, canberra, clark, bray, kulczynski, jaccard, gower, altGower, morisita, horn, mountford, raup, binomial, chao, cao and mahalanobis.
Euclidean distance required for SMARTPCA scaling.
sample_remove should include both samples removed from PCA and ancient samples projected onto PCA space (if any).
Data read from working directory with SNPs as rows and samples as columns.
Two alternative formats: (1) text file of SNPs by samples (file extension and column separators recognized automatically) read using fread; or (2) duet of EIGENSTRAT files (see https://reich.hms.harvard.edu/software) using vroom_fwf, including a genotype file of SNPs by samples (*.geno), and a sample file (*.ind) containing three vectors assigning individual samples to unique user-predefined groups (populations), sexes (or other user-defined descriptor) and alphanumeric identifiers.
For EIGENSTRAT, vector sample_group assigns samples to groups retrievable from column 3 of file *.ind.
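The layout of the two EIGENSTRAT files can be sketched in Python as below. This is only an illustration of the format, not how the package reads the files (it uses vroom_fwf). It assumes the usual EIGENSTRAT conventions: one line per SNP and one character per sample in *.geno, genotype code 9 for missing data, and whitespace-separated columns of identifier, sex and group in *.ind. The function names parse_geno and parse_ind and the file contents are made up.

```python
import numpy as np

def parse_geno(text):
    """Turn *.geno text (one line per SNP, one digit per sample) into a matrix."""
    rows = [[int(ch) for ch in line.strip()]
            for line in text.splitlines() if line.strip()]
    geno = np.array(rows, dtype=float)
    geno[geno == 9] = np.nan  # assumption: 9 encodes a missing genotype
    return geno

def parse_ind(text):
    """Split *.ind text into identifiers, sexes and groups (columns 1 to 3)."""
    ids, sexes, groups = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ids.append(fields[0])
        sexes.append(fields[1])
        groups.append(fields[2])
    return ids, sexes, groups

# Made-up example: 3 SNPs (rows) by 4 samples (columns).
geno_text = "0121\n2290\n1102\n"
ind_text = "S1 M PopA\nS2 F PopA\nS3 F PopB\nS4 M PopB\n"

geno = parse_geno(geno_text)
ids, sexes, sample_group = parse_ind(ind_text)
print(geno.shape, sample_group)
```

The third column returned by parse_ind plays the role of the sample_group vector described above.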
SNPs with zero variance removed prior to SVD to optimize computation time and avoid undefined values if scaling = "sd" or "drift".
Users can select subsets of samples or SNPs by introducing a vector including column numbers for samples (sample_remove) and/or row numbers for SNPs (snp_remove) to be removed from computations.
Function stops if the final number of SNPs is 1 or 2.
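The filtering steps above can be sketched in Python as follows. This is a hedged illustration rather than the package's implementation: the function name filter_genotypes is hypothetical, indices are taken as 1-based to mirror row and column numbers in R, the genotype matrix is assumed to have no missing values, and the stop condition follows the wording above literally.

```python
import numpy as np

def filter_genotypes(geno, snp_remove=(), sample_remove=()):
    """Drop listed SNP rows and sample columns, then zero-variance SNPs."""
    geno = np.asarray(geno, dtype=float)
    keep_snps = np.ones(geno.shape[0], dtype=bool)
    keep_samples = np.ones(geno.shape[1], dtype=bool)
    # Convert 1-based row/column numbers to 0-based positions.
    keep_snps[np.asarray(snp_remove, dtype=int) - 1] = False
    keep_samples[np.asarray(sample_remove, dtype=int) - 1] = False
    geno = geno[keep_snps][:, keep_samples]
    # A SNP that is constant across the retained samples has zero variance;
    # dividing by its standard deviation would give undefined values.
    geno = geno[geno.var(axis=1) > 0]
    if geno.shape[0] in (1, 2):
        raise ValueError("too few SNPs left after filtering")
    return geno

# Made-up genotypes: 5 SNPs (rows) by 4 samples (columns); SNP 2 is constant.
geno = [[0, 1, 2, 1],
        [1, 1, 1, 1],
        [2, 0, 0, 1],
        [0, 0, 1, 2],
        [2, 2, 0, 0]]
filtered = filter_genotypes(geno, snp_remove=[1], sample_remove=[4])
print(filtered.shape)
```

Variance is computed after samples are removed, because a SNP can become constant once some samples are excluded.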
EIGENSOFT was conceived for the analysis of human genes, so its SMARTPCA suite accepts 22 (autosomal) chromosomes by default.
If >22 chromosomes are provided and the internal parameter numchrom is not set to the target number of chromosomes of interest, SMARTPCA automatically subsets chromosomes 1 to 22.
In contrast, smart_permanova accepts any number of autosomes with or without the sex chromosomes from an EIGENSTRAT file.