bruvo.dist: Bruvo's distance for microsatellites

Description

Calculate the average Bruvo's distance over all loci in a population.

Usage

bruvo.dist(pop, replen = 1, add = TRUE, loss = TRUE)

Arguments

pop

a genind object

replen

a vector of integers indicating the length of the nucleotide repeats for each microsatellite locus. For example, a locus with a (CAT) repeat would have a replen value of 3.

add

if TRUE, genotypes with zero values will be treated under the genome addition model presented in Bruvo et al. 2004. See the Note section for options.

loss

if TRUE, genotypes with zero values will be treated under the genome loss model presented in Bruvo et al. 2004. See the Note section for options.

Value

an object of class dist

Details

Bruvo's distance can be calculated for any ploidy level. However, because it compares all alleles at a locus, the two genotypes being compared need to have the same ploidy level. Unfortunately for polyploids, it is often difficult to fully separate distinct alleles at each locus, so genotypes can appear to have a lower ploidy level than the organism.
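The allele-by-allele comparison described above can be sketched in plain R. This is a minimal illustration, assuming the definition in Bruvo et al. (2004): the distance between two alleles that differ by x repeat units is 1 - 2^(-x), and the genotype distance is the smallest sum of allele distances over all pairings, divided by the ploidy. The helper name bruvo_locus is hypothetical and is not part of poppr; it handles a single locus with no missing alleles and is not how bruvo.dist is implemented internally.

```r
# Hypothetical helper (not part of poppr): Bruvo's distance between two
# genotypes of equal ploidy at one locus, with no missing alleles.
bruvo_locus <- function(g1, g2, replen = 1) {
  stopifnot(length(g1) == length(g2))
  n <- length(g1)

  # Distance between every allele of g1 (rows) and every allele of g2 (columns),
  # measured in repeat units.
  d <- 1 - 2^(-abs(outer(g1, g2, "-")) / replen)

  # Enumerate all permutations of a vector (fine for small ploidy levels).
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
    }
    out
  }

  # Sum of allele distances for each way of pairing the alleles of g1 with g2.
  sums <- vapply(perms(seq_len(n)),
                 function(p) sum(d[cbind(seq_len(n), p)]),
                 numeric(1))

  # Keep the best pairing and scale by ploidy.
  min(sums) / n
}

bruvo_locus(c(20, 23), c(20, 24))
```

Enumerating permutations grows factorially with ploidy, which is acceptable for an illustration but is one reason the replacement models described below become expensive.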

To deal with genotypes that appear to have a lower ploidy level than expected, Bruvo has suggested the following methods:

Infinite Model- The simplest way to deal with it is to count all missing alleles as infinitely large so that the distance between a missing allele and anything else is 1. Although this is computationally simple, it tends to inflate distances between individuals.
Genome Addition Model- If it is suspected that the organism has gone through a recent genome expansion, the missing alleles will be replaced with all possible combinations of the observed alleles in the shorter genotype. For example, if there is a genotype of [69, 70, 0, 0] where 0 is a missing allele, the possible combinations are: [69, 70, 69, 69], [69, 70, 69, 70], and [69, 70, 70, 70]. The resulting distances are then averaged over the number of comparisons.
Genome Loss Model- This is similar to the genome addition model, except that it assumes that there was a recent genome reduction event and uses the observed values in the full genotype to fill the missing values in the short genotype. As with the Genome Addition Model, the resulting distances are averaged over the number of comparisons.
Combination Model- Combine and average the genome addition and loss models.
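The replacement step shared by the genome addition and genome loss models can be sketched as follows. This is an illustration only: fill_missing is a hypothetical helper, not a poppr function, and it assumes that 0 encodes a missing allele as in the example above. By default the donor alleles come from the short genotype itself (genome addition); passing the alleles of the full genotype as donors mimics genome loss.

```r
# Hypothetical helper (not part of poppr): enumerate every way of filling the
# missing alleles (coded as 0) of a genotype from a pool of donor alleles.
fill_missing <- function(genotype, donors = genotype[genotype != 0]) {
  observed <- genotype[genotype != 0]
  k <- sum(genotype == 0)            # number of alleles to replace
  if (k == 0) return(matrix(genotype, nrow = 1))

  donors <- sort(unique(donors))

  # All ordered draws of k donor alleles ...
  grid <- as.matrix(expand.grid(rep(list(donors), k)))
  # ... reduced to unordered draws by keeping only non-decreasing rows.
  keep <- apply(grid, 1, function(r) !is.unsorted(r))
  fills <- grid[keep, , drop = FALSE]

  # One row per completed genotype: observed alleles followed by the fill.
  out <- cbind(matrix(observed, nrow = nrow(fills),
                      ncol = length(observed), byrow = TRUE),
               fills)
  dimnames(out) <- NULL
  out
}

# Genome addition: donors are the observed alleles of the short genotype.
fill_missing(c(69, 70, 0, 0))

# Genome loss: donors are the alleles of a (made-up) full genotype.
fill_missing(c(69, 70, 0, 0), donors = c(59, 60, 61, 62))
```

Under either model, each completed genotype would then be compared with the other genotype and the resulting distances averaged.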

As mentioned above, the infinite model is biased, but it is not nearly as computationally intensive as either of the other models. The reason for this is that both the addition and loss models require replacement of alleles and recalculation of Bruvo's distance. The number of replacements required is equal to the multiset coefficient: $\left({n \choose k}\right) = {n+k-1 \choose k}$ where n is the number of potential replacements and k is the number of alleles to be replaced. So, for the example given above, the genome addition model would require $\left({2 \choose 2}\right) = {3 \choose 2} = 3$ calculations of Bruvo's distance, whereas the genome loss model, assuming the full genotype carries four distinct alleles, would require $\left({4 \choose 2}\right) = {5 \choose 2} = 10$ calculations.
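The two counts can be checked with base R's choose(). The function name multichoose is a hypothetical convenience wrapper, not part of poppr.

```r
# Multiset coefficient: number of ways to pick k items from n types
# when repeats are allowed and order does not matter.
multichoose <- function(n, k) choose(n + k - 1, k)

multichoose(2, 2)  # genome addition example: 2 donor alleles, 2 missing
multichoose(4, 2)  # genome loss example: 4 donor alleles, 2 missing
```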

To reduce both the number of calculations and the number of assumptions required, Bruvo's distance is calculated using the largest observed ploidy in each pairwise comparison. This means that when comparing [69,70,71,0] and [59,60,0,0], both genotypes are treated as triploids.
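A minimal sketch of this trimming step, assuming that 0 encodes a missing allele. The helper trim_pair is hypothetical and only illustrates the idea; it is not taken from the poppr source.

```r
# Hypothetical helper (not part of poppr): reduce two genotypes to the largest
# ploidy actually observed in the pair, padding the shorter one with 0.
trim_pair <- function(g1, g2) {
  ploidy <- max(sum(g1 != 0), sum(g2 != 0))
  pad <- function(g) {
    obs <- g[g != 0]
    c(obs, rep(0, ploidy - length(obs)))
  }
  list(pad(g1), pad(g2))
}

# Both genotypes are treated as triploids; one allele remains to be replaced.
trim_pair(c(69, 70, 71, 0), c(59, 60, 0, 0))
```

After trimming, only one missing allele is left in the second genotype, so far fewer replacements are needed than if both genotypes were treated as tetraploids.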

References

Ruzica Bruvo, Nicolaas K. Michiels, Thomas G. D'Souza, and Hinrich Schulenburg. A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology, 13(7):2101-2106, 2004.

Examples


# Please note: this example assumes that every locus in the nancycats dataset
# is a dinucleotide repeat. This is most likely not an accurate
# representation of the data.

# Load poppr, then the nancycats dataset, and construct the repeat vector
# (one repeat length per locus; nancycats has 9 loci).
library(poppr)
data(nancycats)
ssr <- rep(2, 9)

# Analyze the 1st population in nancycats

bruvo.dist(popsub(nancycats, 1), replen = ssr)

# View each population as a heatmap.
sapply(nancycats$pop.names, function(x)
  heatmap(as.matrix(bruvo.dist(popsub(nancycats, x), replen = ssr)),
          symm = TRUE))
