pcadapt performs principal component analysis and computes p-values to test for outliers. The test for outliers is based on the correlations between genetic variation and the first K principal components.
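The PCA step and the per-marker association with the first K PCs can be sketched in base R as below. This is a minimal illustration under stated assumptions, not pcadapt's internal code. Genotypes are simulated and coded 0/1/2, markers are in rows (a p x n layout), and the z-scores are taken to be the t statistics from regressing each scaled marker on the K PC scores. The names geno, G, scores and zscores are hypothetical.

```r
# Sketch only (not pcadapt's code): PCA on a simulated p x n genotype matrix,
# then per-marker z-scores from regressing each marker on the first K PCs.
set.seed(1)
p <- 200; n <- 50; K <- 2
# Hypothetical genotypes coded 0/1/2, markers in rows (p x n)
geno <- matrix(rbinom(p * n, size = 2, prob = 0.3), nrow = p, ncol = n)

# Drop monomorphic markers, then center and scale each marker (row)
keep <- apply(geno, 1, sd) > 0
G <- t(scale(t(geno[keep, ])))      # still markers x individuals

# PCA via SVD of the individuals x markers matrix
sv <- svd(t(G), nu = K, nv = K)
scores <- sv$u                      # n x K scores of individuals on the K PCs

# Assumed z-scores: t statistic of each PC when regressing a marker on all K PCs
zscores <- t(apply(G, 1, function(y) {
  fit <- summary(lm(y ~ scores))
  fit$coefficients[-1, "t value"]   # one value per PC (intercept dropped)
}))
dim(zscores)                        # markers x K
```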
pcadapt also handles Pool-seq data, for which the statistical analysis is performed on the allele frequencies of the genetic markers. It returns an object of class pcadapt.
pcadapt(input, K = 2, method = "mahalanobis", data.type = "genotype",
        min.maf = 0.05, ploidy = 2, output.filename = "pcadapt_output",
        clean.files = TRUE, transpose = FALSE)
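The method argument specifies the test statistic used to compute p-values; four statistics are available: "mahalanobis", "communality", "euclidean" and "componentwise".
The data.type argument specifies the type of input, either a genotype matrix (data.type="genotype") or a matrix of allele frequencies (data.type="pool").
min.maf is a value between 0 and 0.45 specifying the threshold of minor allele frequency above which p-values are computed.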
The input matrix is of size p x n, where p is the number of genetic markers and n is the number of individuals.
If the data contains m
The returned value x is an object of class pcadapt.
Depending on the specified method, different test statistics can be used:
mahalanobis (default): the Mahalanobis distance is computed for each genetic marker, using a robust estimate of both the mean and the covariance matrix between the K vectors of z-scores.
communality: the communality statistic measures the proportion of a marker's variance explained by the first K PCs.
euclidean: the Euclidean distance is computed between the K z-scores of each genetic marker and the mean of the K vectors of z-scores.
componentwise: a matrix of z-scores is returned.
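The Mahalanobis, Euclidean and componentwise statistics can be sketched from a p x K matrix of z-scores as follows. The z-scores are simulated, with five planted outlier markers. The robust estimate is a simple one-step trimmed mean and covariance: markers whose classical Mahalanobis distance exceeds the 97.5% chi-squared quantile are dropped, then the estimates are recomputed. This is a swapped-in stand-in, not the robust estimator pcadapt itself uses. The helper trimmed_center_cov() and all variable names are hypothetical. Both distances are computed squared, since the square is what is compared with a chi-squared reference.

```r
# Sketch only: statistics from a p x K matrix of z-scores (simulated here).
set.seed(42)
p <- 500; K <- 2
z <- matrix(rnorm(p * K), nrow = p, ncol = K)  # null z-scores
z[1:5, ] <- z[1:5, ] + 6                       # five planted outlier markers

# Hypothetical stand-in for a robust estimator: one-step trimming.
# No consistency correction is applied to the trimmed covariance.
trimmed_center_cov <- function(z, q = 0.975) {
  d2 <- mahalanobis(z, center = colMeans(z), cov = cov(z))
  inliers <- d2 <= qchisq(q, df = ncol(z))
  list(center = colMeans(z[inliers, , drop = FALSE]),
       cov = cov(z[inliers, , drop = FALSE]))
}

# mahalanobis: stats::mahalanobis() returns squared distances
rob <- trimmed_center_cov(z)
stat_mahalanobis <- mahalanobis(z, center = rob$center, cov = rob$cov)

# euclidean: squared distance between each marker's K z-scores and their mean
stat_euclidean <- rowSums(sweep(z, 2, colMeans(z))^2)

# componentwise: the z-scores themselves
stat_componentwise <- z

# indices of the five most extreme markers under each distance
top_mahalanobis <- order(stat_mahalanobis, decreasing = TRUE)[1:5]
top_euclidean <- order(stat_euclidean, decreasing = TRUE)[1:5]
```

The trimming step keeps the planted outliers from inflating the covariance estimate, which is the motivation for using a robust estimate in the Mahalanobis statistic.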
To compute p-values, the test statistics (stat) are divided by a genomic inflation factor (gif) when method="mahalanobis" or method="euclidean".
When method="communality", the test statistic is first multiplied by K and divided by the percentage of variance explained by the first K PCs before the genomic inflation factor is applied. With method="mahalanobis", "communality" or "euclidean", the scaled statistics (chi2_stat) should follow a chi-squared distribution with K degrees of freedom. With method="componentwise", the squared z-scores should follow a chi-squared distribution with 1 degree of freedom. For Pool-seq data, pcadapt provides p-values based on the Mahalanobis distance for each SNP.
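The p-value computation for the communality statistic can be sketched as below, under stated assumptions. Genotypes are simulated 0/1/2 counts with markers in rows. Communality is taken as the sum of squared correlations between a scaled marker and the K PC scores. The variance explained is expressed as a proportion (a fraction between 0 and 1) rather than a percentage. The genomic inflation factor is median-based, gif = median(stat) / qchisq(0.5, K). The helper chi2_pvalues() is hypothetical; it would apply unchanged to the Mahalanobis and Euclidean statistics, which skip the rescaling step.

```r
# Sketch only: communality rescaling, genomic inflation correction and
# chi-squared p-values, under the assumptions stated above.
set.seed(7)
p <- 300; n <- 60; K <- 2
geno <- matrix(rbinom(p * n, size = 2, prob = 0.4), nrow = p, ncol = n)
geno <- geno[apply(geno, 1, sd) > 0, ]   # drop monomorphic markers
G <- t(scale(t(geno)))                   # scaled markers x individuals

sv <- svd(t(G), nu = K, nv = 0)
scores <- sv$u                           # n x K PC scores

# communality: share of each marker's variance explained by the K PCs
communality <- rowSums(cor(t(G), scores)^2)

# proportion (not percent) of total variance explained by the first K PCs
prop_var <- sum(sv$d[1:K]^2) / sum(sv$d^2)
stat <- communality * K / prop_var

# hypothetical helper: median-based gif, then chi-squared(df) p-values
chi2_pvalues <- function(stat, df) {
  gif <- median(stat, na.rm = TRUE) / qchisq(0.5, df = df)
  chi2_stat <- stat / gif
  list(gif = gif, chi2_stat = chi2_stat,
       pvalues = pchisq(chi2_stat, df = df, lower.tail = FALSE))
}
res <- chi2_pvalues(stat, df = K)
```

With these choices the rescaled statistic averages exactly K over markers, the mean of a chi-squared distribution with K degrees of freedom, because the mean communality of scaled markers equals the proportion of variance explained by the K PCs. After division by gif, the median of chi2_stat equals the median of that chi-squared distribution.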