ia: Index of Association

Description

Calculate the Index of Association and Standardized Index of Association. Obtain p-values from one-sided permutation tests.

Usage

ia(pop, sample = 0, method = 1, quiet = FALSE, missing = "ignore",
  hist = TRUE, valuereturn = FALSE)

Arguments

pop

a genind object or any fstat, structure, genetix, genpop, or genalex formatted file.

sample

an integer indicating the number of permutations desired (e.g., 999). The default, 0, performs no sampling.

method

an integer from 1 to 4 indicating the sampling method desired. See shufflepop for details.

quiet

Should the function print anything to the screen while it is performing calculations?

TRUE prints nothing.

FALSE (default) will print the population name and progress bar.

missing

a character string. See missingno for details.

hist

logical. If TRUE, a histogram will be printed for each population if there is sampling.

valuereturn

logical. If TRUE, the index values from the reshuffled data are returned. If FALSE (default), the index is returned with associated p-values in a 4-element numeric vector.

Value

If no sampling has occurred:

A named numeric vector of length 2 giving the Index of Association, "Ia", and the Standardized Index of Association, "rbarD".

If there is sampling:

A named numeric vector of length 4 with the following values:

Ia

numeric. The index of association.

p.Ia

A number indicating the p-value resulting from a one-sided permutation test based on the number of samples indicated in the original call.

rbarD

numeric. The standardized index of association.

p.rD

A number indicating the p-value resulting from a one-sided permutation test based on the number of samples indicated in the original call.

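If there is sampling and valuereturn = TRUE:

A list with the following elements:

index

The named numeric vector of length 4 described above.

samples

The index values from the reshuffled data, with columns Ia and rbarD.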
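A one-sided permutation p-value of the kind reported in p.Ia and p.rD can be sketched in base R. The permuted values below are invented for illustration, and counting the observed value as one of the permutations is a common convention; that ia uses exactly this rule internally is an assumption.

```r
# Hypothetical observed index and nine made-up values from reshuffled data
observed <- 0.25
permuted <- c(-0.10, 0.02, 0.30, 0.05, -0.04, 0.12, 0.00, 0.27, -0.08)

# One-sided test: how often is a permuted value at least as large as the
# observed one? The "+ 1" terms count the observed value itself
# (assumed convention, not confirmed to be what ia does).
p_value <- (sum(permuted >= observed) + 1) / (length(permuted) + 1)

print(p_value)
```

With only a handful of permutations the smallest attainable p-value is coarse, which is why a value such as sample = 999 is typical.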

Details

The index of association was originally developed by A.H.D. Brown while analyzing the population structure of wild barley (Brown, 1980). It has been widely used as a tool to detect clonal reproduction within populations. Populations whose members are undergoing sexual reproduction, whether by selfing or out-crossing, produce gametes via meiosis and thus have a chance to shuffle alleles in the next generation. Populations whose members are undergoing clonal reproduction, however, generally do so via mitosis, so the most likely mechanism for a change in genotype is mutation. The rate of mutation varies from species to species, but it is rarely high enough to approximate a random shuffling of alleles.

The index of association is based on the ratio of the variance of the raw number of differences between individuals to the sum of the variances at each locus. You can also think of it as the observed variance over the expected variance. If they are the same, then the index is zero after subtracting one (from Smith et al., 1993):

$$I_A = \frac{V_O}{V_E}-1$$

Since the distance is more or less a binary distance, any sort of marker can be used for this analysis. In the calculation, phase is not considered, and any difference increases the distance between two individuals. In the data table of a genind object, each column represents a different allele and each entry represents the fraction of the genotype made up by that allele at that locus, so the entries in each row sum to one within a locus. Poppr uses this to calculate distances by simply taking the sum of the absolute values of the differences between rows.

The distance between two individuals at a single locus with $a$ allelic states and a ploidy of $k$ is calculated as follows (except for presence/absence data):

$$d = \displaystyle \frac{k}{2}\sum_{i=1}^{a} \mid A_{i} - B_{i}\mid$$

To find the total number of differences between two individuals over all loci, sum d over all m loci, a value we'll call D:

$$D = \displaystyle \sum_{i=1}^{m} d_i$$
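The two formulas above can be sketched in base R for a single pair of individuals. The individuals, locus names, and allele names are made up for illustration, and this is a minimal sketch of the arithmetic, not poppr's internal code.

```r
# Ploidy (k) of the hypothetical organism
k <- 2

# Allele-fraction rows for two made-up diploid individuals.
# Names are "locus.allele"; within each locus the entries sum to one.
A <- c(L1.a1 = 0.5, L1.a2 = 0.5, L1.a3 = 0, L2.b1 = 1, L2.b2 = 0)
B <- c(L1.a1 = 1,   L1.a2 = 0,   L1.a3 = 0, L2.b1 = 0, L2.b2 = 1)

# Locus that each column belongs to (text before the first dot)
locus <- sub("\\..*$", "", names(A))

# d at each locus: (k / 2) * sum of absolute differences in allele fractions
d <- (k / 2) * tapply(abs(A - B), locus, sum)

# D: total number of differences over all loci
D <- sum(d)

print(d)
print(D)
```

At the first locus a heterozygote is compared with a homozygote that shares one allele, giving one difference; at the second locus the two homozygotes share nothing, giving two.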

These values are calculated over all ${n \choose 2}$ possible pairs of individuals in the data set, giving ${n \choose 2}\cdot{}m$ values of d and ${n \choose 2}$ values of D. Calculating the observed variance is fairly straightforward (modified from Agapow and Burt, 2001):

$$V_O = \frac{\displaystyle \sum_{i=1}^{n \choose 2} D_{i}^2 - \frac{(\displaystyle\sum_{i=1}^{n \choose 2} D_{i})^2}{{n \choose 2}}}{{n \choose 2}}$$

The expected variance is built from the variances at the individual loci. The variance at a single locus, j, is calculated as in the previous equation, using the values of d at that locus in place of D:

$$var_j = \frac{\displaystyle \sum_{i=1}^{n \choose 2} d_{i}^2 - \frac{(\displaystyle\sum_{i=1}^{n \choose 2} d_i)^2}{{n \choose 2}}}{{n \choose 2}}$$

The expected variance is then the sum of all the variances over all m loci:

$$V_E = \displaystyle \sum_{j=1}^{m} var_j$$
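The variance calculations above can be checked by hand on a small example. The sketch below uses base R and a made-up matrix of per-locus distances (one row per pair of individuals, one column per locus); the helper name pop_var is hypothetical and the code is not taken from poppr.

```r
# Made-up per-locus distances d: n = 4 individuals give choose(4, 2) = 6 pairs
# (rows), scored at m = 3 loci (columns).
d <- matrix(c(0, 1, 2,
              1, 1, 2,
              2, 2, 2,
              0, 0, 1,
              1, 2, 1,
              2, 2, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("L1", "L2", "L3")))

# Variance with the number of pairs (not pairs - 1) in the denominator,
# matching the formulas for V_O and var_j
pop_var <- function(x) (sum(x^2) - sum(x)^2 / length(x)) / length(x)

D     <- rowSums(d)            # total differences for each pair
V_O   <- pop_var(D)            # observed variance
var_j <- apply(d, 2, pop_var)  # variance at each locus
V_E   <- sum(var_j)            # expected variance

I_A <- V_O / V_E - 1
print(I_A)
```

For these values V_O is 20/9 and V_E is 16/9, so the index is 0.25.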

Agapow and Burt (2001) showed that $I_A$ increases steadily with the number of loci, so they proposed a standardized alternative that is now widely used, $\bar r_d$. For the derivation, see the manual for multilocus.

$$\bar r_d = \frac{V_O - V_E} {2\displaystyle \sum_{j=1}^{m-1}\displaystyle \sum_{k=j+1}^{m}\sqrt{var_j\cdot{}var_k}}$$
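As a rough illustration, $\bar r_d$ can be computed in base R from per-locus distances. The sketch counts each pair of loci once in the double sum (j < k), following Agapow and Burt (2001); the distance matrix is made up, the helper name pop_var is hypothetical, and the code is not poppr's implementation.

```r
# Made-up per-locus distances: 6 pairs of individuals (rows) by 3 loci (columns)
d <- matrix(c(0, 1, 2,
              1, 1, 2,
              2, 2, 2,
              0, 0, 1,
              1, 2, 1,
              2, 2, 0),
            ncol = 3, byrow = TRUE)

# Variance with the number of pairs in the denominator
pop_var <- function(x) (sum(x^2) - sum(x)^2 / length(x)) / length(x)

V_O   <- pop_var(rowSums(d))   # observed variance of D
var_j <- apply(d, 2, pop_var)  # variance at each locus
V_E   <- sum(var_j)            # expected variance

# Denominator: twice the sum of sqrt(var_j * var_k) over each pair of loci,
# with every pair (j < k) counted once
denom <- 2 * sum(combn(sqrt(var_j), 2, prod))

rbarD <- (V_O - V_E) / denom
print(rbarD)
```

Because the denominator grows with the number of locus pairs in the same way the numerator does, the result stays on a comparable scale as loci are added, unlike $I_A$.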

References

P.-M. Agapow and A. Burt. Indices of multilocus linkage disequilibrium. Molecular Ecology Notes, 1(1-2):101-102, 2001.

A.H.D. Brown, M.W. Feldman, and E. Nevo. Multilocus structure of natural populations of Hordeum spontaneum. Genetics, 96(2):523-536, 1980.

J.M. Smith, N.H. Smith, M. O'Rourke, and B.G. Spratt. How clonal are bacteria? Proceedings of the National Academy of Sciences, 90(10):4384-4388, 1993.

Examples


library(poppr)
data(nancycats)
ia(nancycats)

# Get the indices back and plot them using base R graphics:
nansamp <- ia(nancycats, sample = 999, valuereturn = TRUE)
layout(matrix(c(1, 1, 2, 2), 2, 2, byrow = TRUE))
hist(nansamp$samples$Ia); abline(v = nansamp$index[1])
hist(nansamp$samples$rbarD); abline(v = nansamp$index[3])

# Get the index for each population.
lapply(seppop(nancycats), ia)
# With sampling
lapply(seppop(nancycats), ia, sample=999)
