pgen: Genotype Probability

Description

Genotype Probability

Usage

pgen(gid, pop = NULL, by_pop = TRUE, log = TRUE, freq = NULL)

Arguments

gid

a genind or genclone object.

pop

either a formula to set the population factor from the strata slot or a vector specifying the population factor for each sample. Defaults to NULL.

by_pop

When this is TRUE (default), the calculation will be done by population.

log

a logical. If log = TRUE (default), the values returned will be log(Pgen). If log = FALSE, the values returned will be Pgen.

freq

a vector or matrix of allele frequencies. This defaults to NULL, indicating that the frequencies will be determined via round-robin approach in rraf. If by_pop = TRUE, and t

Value

A matrix of Pgen values, with one row for each genotype in the object and one column per locus. The values are log(Pgen) when log = TRUE (default).

Details

Pgen is the probability of a given genotype occurring in a population assuming Hardy-Weinberg equilibrium (HWE). Thus, the value for diploids is $$P_{gen} = \left(\prod_{i=1}^m p_i\right)2^h$$ where $p_i$ are the allele frequencies and $h$ is the number of heterozygous loci in the sample (Arnaud-Haond et al. 2007; Parks and Werth 1993). By default, the allele frequencies are calculated using a round-robin approach, in which the frequencies at a particular locus are calculated on the genotypes clone-censored without that locus. To avoid numerical precision problems with very small numbers, this function calculates Pgen per locus by adding up the log-transformed allele frequencies. These can easily be transformed back to the true value (see examples).
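
As a worked illustration of the log-scale calculation, here is a sketch with made-up allele frequencies. It assumes that $m$ counts every allele copy in the genotype (two per locus for a diploid). Consider a two-locus diploid genotype that is heterozygous at the first locus (allele frequencies 0.5 and 0.2) and homozygous at the second (allele frequency 0.3), so that $h = 1$:

$$\begin{aligned}
P_{gen} &= (0.5 \times 0.2 \times 0.3 \times 0.3) \times 2^{1} = 0.018 \\
\log P_{gen} &= \sum_{i=1}^{m} \log p_i + h \log 2 \\
&= \underbrace{\log(2 \times 0.5 \times 0.2)}_{\text{locus 1}} + \underbrace{\log(0.3^2)}_{\text{locus 2}} \\
&\approx -1.609 + (-2.408) = -4.017
\end{aligned}$$

Exponentiating the summed per-locus logs gives $e^{-4.017} \approx 0.018$, the same value as the direct product.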

References

Arnaud-Haond, S., Duarte, C. M., Alberto, F., & Serrão, E. A. 2007. Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 5115-5139.

Parks, J. C., & Werth, C. R. 1993. A study of spatial features of clones in a population of bracken fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 537-544.

Examples


data(Pram)
head(pgen(Pram, log = FALSE))

# You can get the Pgen values over all loci by summing over the logged results:
exp(rowSums(pgen(Pram, log = TRUE), na.rm = TRUE))

# You can also take the product of the non-logged results:
apply(pgen(Pram, log = FALSE), 1, prod, na.rm = TRUE)

## Dealing with zero-frequency allele correction
# By default, allele frequencies are calculated with rraf with 
# correction = TRUE. This is normally benign when analyzing large populations,
# but it can have a great effect on small populations. Here's a way to supply
# your own correction.

# First, calculate round robin allele frequencies by population with no 
# correction. There are many zero values.
(my_rraf <- rraf(Pram, by_pop = TRUE, correction = FALSE))

# When you run pgen with these zero value allele frequencies, the
# probabilities of some genotypes crash to zero.
head(pgen(Pram, log = FALSE, freq = my_rraf))

# One solution: set the allele frequencies to 1/[samples in data]:
my_rraf[my_rraf == 0] <- 1/nInd(Pram)

# Now we don't have genotype probabilities of zero.
head(pgen(Pram, log = FALSE, freq = my_rraf))
