recode_polyploids: Recode polyploid microsatellite data for use in frequency based statistics.

Description

As the genind object requires ploidy to be consistent across loci, a workaround for importing polyploid data was to code missing alleles as "0" (for microsatellite data sets). The advantage of this is that users are able to calculate Bruvo's distance, the index of association, and genotypic diversity statistics. The tradeoff is that this breaks all other analyses, because they rely on allele frequencies and the missing alleles are treated as extra alleles. This function removes those alleles and returns a genclone or genind object where allele frequencies are coded based on the number of alleles observed at a single locus per individual. See the examples for more details.

Usage

recode_polyploids(poly, newploidy = poly@ploidy)

Arguments

poly

a genclone or genind object that has a ploidy of >2

newploidy

an integer. This gives the user the option to reset the ploidy of the data set. Its default is the ploidy of the incoming data set.

Value

a genclone or genind object.

Details

The genind object has two caveats that make it difficult to work with polyploid data sets:

ploidy must be constant throughout the data set
missing data is treated as "all-or-none"

In an ideal world, polyploid genotypes would be just as unambiguous as diploid or haploid genotypes. Unfortunately, the world we live in is far from ideal and a genotype of AB in a tetraploid organism could be AAAB, AABB, or ABBB. In order to get polyploid data into adegenet or poppr, we must code all loci to have the same number of allelic states as the ploidy or largest observed heterozygote (if ploidy is unknown). The way to do this is to insert zeroes to pad the alleles. So, to import two genotypes of:

    NA  20  23  24
    20  24  26  43

they should be coded as:

     0  20  23  24
    20  24  26  43

This zero is treated as an extra allele and is represented in the genind object as so:

       0    20    23    24    26    43
    0.25  0.25  0.25  0.25  0.00  0.00
    0.00  0.25  0.00  0.25  0.25  0.25
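The arithmetic behind the zero-padded frequency table can be reproduced with a minimal base R sketch. This is not how adegenet or poppr build a genind object internally; the names `genotypes`, `alleles`, and `freqs` are illustrative assumptions, and the data are the two tetraploid genotypes shown above.

```r
# Two tetraploid genotypes at one locus, zero-padded to four allelic states.
# (Illustrative objects only; not part of the poppr or adegenet API.)
genotypes <- list(
  ind1 = c("0", "20", "23", "24"),
  ind2 = c("20", "24", "26", "43")
)
alleles <- c("0", "20", "23", "24", "26", "43")

# Count each allele per individual and divide by the number of allelic
# states (the ploidy, 4), so the padding zero is counted like any allele.
freqs <- t(vapply(
  genotypes,
  function(g) as.numeric(table(factor(g, levels = alleles))) / length(g),
  numeric(length(alleles))
))
colnames(freqs) <- alleles

print(freqs)
```

Every row sums to 1, but in the first row a quarter of that total belongs to the "0" column, which is why frequency-based statistics are distorted.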

A homozygote would have the 0 column at a value of 0.75. This function remedies this problem by removing the zero column and rescaling the allele frequencies to those observed. The above table would become:

      20     23     24    26    43
    0.333  0.333  0.333  0.00  0.00
    0.25   0.00   0.25   0.25  0.25
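The rescaling step can be sketched in base R as follows. This illustrates the arithmetic described here under the assumption that frequencies are held in a plain matrix; it is not the internal code of `recode_polyploids`, and `freqs`, `observed`, and `rescaled` are hypothetical names.

```r
# Frequency table with the padding allele "0" as its first column.
# (Illustrative matrix; a real genind object stores more than this.)
freqs <- matrix(
  c(0.25, 0.25, 0.25, 0.25, 0.00, 0.00,
    0.00, 0.25, 0.00, 0.25, 0.25, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"), c("0", "20", "23", "24", "26", "43"))
)

# Drop the zero column, keeping only alleles that were actually observed.
observed <- freqs[, colnames(freqs) != "0", drop = FALSE]

# Rescale each row so the remaining frequencies sum to 1 again.
rescaled <- observed / rowSums(observed)

print(round(rescaled, 3))
```

Under the same arithmetic, a homozygote whose "0" column held 0.75 would end up with a frequency of 1 for its single observed allele.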

With this, the user is able to calculate frequency based statistics on the data set.

Examples


library("poppr")
data(Pinf)
iPinf <- recode_polyploids(Pinf)

# Obtaining basic summaries. Note the heterozygosity measures.
summary(Pinf)
summary(iPinf)

library("ape")

# Removing missing data.
Pinf <- missingno(Pinf, "geno", cutoff = 0)
iPinf <- recode_polyploids(Pinf)

# Calculating Rogers' distance.
rog <- rogers.dist(Pinf)
irog <- rogers.dist(iPinf)

# We will now plot neighbor joining trees. Note the decreased distance in the
# original data.
plot(nj(rog), type = "unrooted", show.tip.label = FALSE)
add.scale.bar(lcol = "red")
plot(nj(irog), type = "unrooted", show.tip.label = FALSE)
add.scale.bar(lcol = "red")
