poppr.amova: Perform Analysis of Molecular Variance (AMOVA) on genind or genclone objects.

Description

This function simplifies the process necessary for performing AMOVA in R. It gives user the choice of utilizing either the ade4 or the pegas implementation of AMOVA. See amova (ade4) and amova (pegas) for details on the specific implementation.

Usage

poppr.amova(x, hier = NULL, clonecorrect = FALSE, within = TRUE,
  dist = NULL, squared = TRUE, correction = "quasieuclid", sep = "_",
  filter = FALSE, threshold = 0, algorithm = "farthest_neighbor",
  missing = "loci", cutoff = 0.05, quiet = FALSE, method = c("ade4",
  "pegas"), nperm = 0)

Arguments

a '>genind or '>genclone object

hier

a hierarchical formula that defines your population hierarchy. (e.g.: ~Population/Subpopulation). See Details below.

clonecorrect

logical if TRUE, the data set will be clone corrected with respect to the lowest level of the hierarchy. The default is set to FALSE. See clonecorrect for details.

within

logical. When this is set to TRUE (Default), variance within individuals are calculated as well. If this is set to FALSE, The lowest level of the hierarchy will be the sample level. See Details below.

dist

an optional distance matrix calculated on your data. If this is set to NULL (default), the raw pairwise distances will be calculated via diss.dist.

squared

if a distance matrix is supplied, this indicates whether or not it represents squared distances.

correction

a character defining the correction method for non-euclidean distances. Options are quasieuclid (Default), lingoes, and cailliez. See Details below.

sep

Deprecated. As of poppr version 2, this argument serves no purpose.

filter

logical When set to TRUE, mlg.filter will be run to determine genotypes from the distance matrix. It defaults to FALSE. You can set the parameters with algorithm and threshold arguments. Note that this will not be performed when within = TRUE. Note that the threshold should be the number of allowable substitutions if you don't supply a distance matrix.

threshold

a number indicating the minimum distance two MLGs must be separated by to be considered different. Defaults to 0, which will reflect the original (naive) MLG definition.

algorithm

determines the type of clustering to be done.

"farthest_neighbor": (default) merges clusters based on the maximum distance between points in either cluster. This is the strictest of the three.
"nearest_neighbor": merges clusters based on the minimum distance between points in either cluster. This is the loosest of the three.
"average_neighbor": merges clusters based on the average distance between every pair of points between clusters.

missing

specify method of correcting for missing data utilizing options given in the function missingno. Default is "loci".

cutoff

specify the level at which missing data should be removed/modified. See missingno for details.

quiet

logical If FALSE (Default), messages regarding any corrections will be printed to the screen. If TRUE, no messages will be printed.

method

Which method for calculating AMOVA should be used? Choices refer to package implementations: "ade4" (default) or "pegas". See details for differences.

nperm

the number of permutations passed to the pegas implementation of amova.

Value

a list of class amova from the ade4 package. See amova for details.

Details

The poppr implementation of AMOVA is a very detailed wrapper for the ade4 implementation. The output is an amova class list that contains the results in the first four elements. The inputs are contained in the last three elements. The inputs required for the ade4 implementation are:

a distance matrix on all unique genotypes (haplotypes)
a data frame defining the hierarchy of the distance matrix
a genotype (haplotype) frequency table.

All of this data can be constructed from a '>genind object, but can be daunting for a novice R user. This function automates the entire process. Since there are many variables regarding genetic data, some points need to be highlighted:

On Hierarchies:

The hierarchy is defined by different population strata that separate your data hierarchically. These strata are defined in the strata slot of '>genind and '>genclone

objects. They are useful for defining the population factor for your data. See the function strata for details on how to properly define these strata.

On Within Individual Variance:

Heterozygosities within diploid genotypes are sources of variation from within individuals and can be quantified in AMOVA. When within = TRUE, poppr will split diploid genotypes into haplotypes and use those to calculate within-individual variance. No estimation of phase is made. This acts much like the default settings for AMOVA in the Arlequin software package. Within individual variance will not be calculated for haploid individuals or dominant markers.

On Euclidean Distances:

AMOVA, as defined by Excoffier et al., utilizes an absolute genetic distance measured in the number of differences between two samples across all loci. With the ade4 implementation of AMOVA (utilized by poppr), distances must be Euclidean (due to the nature of the calculations). Unfortunately, many genetic distance measures are not always euclidean and must be corrected for before being analyzed. Poppr automates this with three methods implemented in ade4, quasieuclid, lingoes, and cailliez. The correction of these distances should not adversely affect the outcome of the analysis.

On Filtering:

Filtering multilocus genotypes is performed by mlg.filter. This can necessarily only be done AMOVA tests that do not account for within-individual variance. The distance matrix used to calculate the amova is derived from using mlg.filter with the option stats = "distance", which reports the distance between multilocus genotype clusters. One useful way to utilize this feature is to correct for genotypes that have equivalent distance due to missing data. (See example below.)

On Methods:

Both ade4 and pegas have implementations of AMOVA, both of which are appropriately called "amova". The ade4 version is faster, but there have been questions raised as to the validity of the code utilized. The pegas version is slower, but careful measures have been implemented as to the accuracy of the method. It must be noted that there appears to be a bug regarding permuting analyses where within individual variance is accounted for (within = TRUE) in the pegas implementation. If you want to perform permutation analyses on the pegas implementation, you must set within = FALSE. In addition, while clone correction is implemented for both methods, filtering is only implemented for the ade4 version.

References

Excoffier, L., Smouse, P.E. and Quattro, J.M. (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131, 479-491.

Examples

Run this code

# NOT RUN {
data(Aeut)
strata(Aeut) <- other(Aeut)$population_hierarchy[-1]
agc <- as.genclone(Aeut)
agc
amova.result <- poppr.amova(agc, ~Pop/Subpop)
amova.result
amova.test <- randtest(amova.result) # Test for significance
plot(amova.test)
amova.test

# }
# NOT RUN {
# You can get the same results with the pegas implementation
amova.pegas <- poppr.amova(agc, ~Pop/Subpop, method = "pegas")
amova.pegas
amova.pegas$varcomp/sum(amova.pegas$varcomp)

# Clone correction is possible
amova.cc.result <- poppr.amova(agc, ~Pop/Subpop, clonecorrect = TRUE)
amova.cc.result
amova.cc.test <- randtest(amova.cc.result)
plot(amova.cc.test)
amova.cc.test


# Example with filtering
data(monpop)
splitStrata(monpop) <- ~Tree/Year/Symptom
poppr.amova(monpop, ~Symptom/Year) # gets a warning of zero distances
poppr.amova(monpop, ~Symptom/Year, filter = TRUE, threshold = 0.1) # no warning

# Correcting incorrect alternate hypotheses with >2 heirarchical levels
# 
mon.amova <- poppr.amova(monpop, ~Symptom/Year/Tree)
mon.test  <- randtest(mon.amova)
mon.test # Note alter is less, greater, greater, less
alt <- c("less", "greater", "greater", "greater") # extend this to the number of levels
with(mon.test, as.krandtest(sim, obs, alter = alt, call = call, names = names))

# }

Run the code above in your browser using DataLab