psex: Probability of encountering a genotype more than once by chance

Description

Probability of encountering a genotype more than once by chance

Usage

psex(gid, pop = NULL, by_pop = TRUE, freq = NULL, G = NULL,
  method = c("single", "multiple"), ...)

Arguments

gid

a genind or genclone object.

pop

either a formula to set the population factor from the strata slot or a vector specifying the population factor for each sample. Defaults to NULL.

by_pop

When this is TRUE (default), the calculation will be done by population.

freq

a vector or matrix of allele frequencies. This defaults to NULL, indicating that the frequencies will be determined via round-robin approach in rraf. If this matrix or vector is not prov

an integer specifying the number of observed genets. If NULL, this will be the number of original multilocus genotypes.

method

which method of calculating psex should be used? Using method = "single" (default) indicates that the calculation for psex should reflect the probability of encountering a second genotype. Using method = "multiple" gives the p

...

options passed to rare_allele_correction. The default is to correct allele frequencies to 1/n

Value

a vector of Psex for each sample.

Details

Psex is the probability of encountering a given genotype more than once by chance. The basic equation is $$p_{sex} = 1 - (1 - p_{gen})^{G})$$ where G is the number of multilocus genotypes. See pgen for its calculation. For a given value of alpha (e.g. alpha = 0.05), genotypes with psex < alpha can be thought of as a single genet whereas genotypes with psex > alpha do not have strong evidence that members belong to the same genet (Parks and Werth, 1993). When method = "multiple", the method from Arnaud-Haond et al. (1997) is used where the sum of the binomial density is taken: $$p_{sex} = \sum_{i = 1}^N {N \choose i} \left(p_{gen}\right)^i\left(1 - p_{gen}\right)^{N - i}$$ where N is the number of samples with the same genotype, i is the ith sample, and pgen is the value of pgen for that genotype. The function will automatically calculate the round-robin allele frequencies with rraf and G with nmll.

References

Arnaud-Haond, S., Duarte, C. M., Alberto, F., & Serrão, E. A. 2007. Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 5115-5139.

Parks, J. C., & Werth, C. R. 1993. A study of spatial features of clones in a population of bracken fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 537-544.

Examples

Run this code

data(Pram)
Pram_psex <- psex(Pram, by_pop = FALSE)
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
# With multiple encounters
Pram_psex <- psex(Pram, by_pop = FALSE, method = "multiple")
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)

# This can be also done assuming populations structure
Pram_psex <- psex(Pram, by_pop = TRUE)
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)

# The above, but correcting zero-value alleles by 1/(2*rrmlg) with no 
# population structure assumed
# See the documentation for rare_allele_correction for details.
Pram_psex2 <- psex(Pram, by_pop = FALSE, d = "rrmlg", mul = 1/2)
plot(Pram_psex2, log = "y", col = ifelse(Pram_psex2 > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)

## An example of supplying previously calculated frequencies and G
# From Parks and Werth, 1993, using the first three genotypes.

# The row names indicate the number of samples found with that genotype
x <- "
 Hk Lap Mdh2 Pgm1 Pgm2 X6Pgd2
54 12 12 12 23 22 11
36 22 22 11 22 33 11
10 23 22 11 33 13 13"

# Since we aren't representing the whole data set here, we are defining the
# allele frequencies before the analysis.
afreq <- c(Hk.1 = 0.167, Hk.2 = 0.795, Hk.3 = 0.038, 
           Lap.1 = 0.190, Lap.2 = 0.798, Lap.3 = 0.012,
           Mdh2.0 = 0.011, Mdh2.1 = 0.967, Mdh2.2 = 0.022,
           Pgm1.2 = 0.279, Pgm1.3 = 0.529, Pgm1.4 = 0.162, Pgm1.5 = 0.029,
           Pgm2.1 = 0.128, Pgm2.2 = 0.385, Pgm2.3 = 0.487,
           X6Pgd2.1 = 0.526, X6Pgd2.2 = 0.051, X6Pgd2.3 = 0.423)

xtab <- read.table(text = x, header = TRUE, row.names = 1)

# Here we are expanding the number of samples to their observed values.
# Since we have already defined the allele frequencies, this step is actually
# not necessary. 
all_samples <- rep(rownames(xtab), as.integer(rownames(xtab)))
xgid        <- df2genind(xtab[all_samples, ], ncode = 1)

freqs <- afreq[colnames(tab(xgid))] # only used alleles in the sample
pSex  <- psex(xgid, by_pop = FALSE, freq = freqs, G = 45)

# Note, pgen returns log values for each locus, here we take the sum across
# all loci and take the exponent to give us the value of pgen for each sample
pGen <- exp(rowSums(pgen(xgid, by_pop = FALSE, freq = freqs)))

res  <- matrix(c(unique(pGen), unique(pSex)), ncol = 2)
colnames(res) <- c("Pgen", "Psex")
res <- cbind(xtab, nRamet = rownames(xtab), round(res, 5))
rownames(res) <- 1:3
res # Compare to the first three rows of Table 2 in Parks & Werth, 1993

Run the code above in your browser using DataLab