AddGenotypeLikelihood: Estimate Genotype Likelihoods in a RADdata object

Description

For each possible allele copy number across each possible ploidy in each taxon, AddGenotypeLikelihood estimates the probability of observing the distribution of read counts that are recorded for that taxon and locus. AddDepthSamplingPermutations is called by AddGenotypeLikelihood the first time it is run, so that part of the likelihood calcluation is stored in the RADdata object and doesn't need to be re-run on each iteration of the pipeline functions.

Usage

AddGenotypeLikelihood(object, ...)
# S3 method for RADdata
AddGenotypeLikelihood(object, overdispersion = 9, …)
AddDepthSamplingPermutations(object, ...)

Arguments

object

A "RADdata" object.

overdispersion

An overdispersion parameter. Higher values will cause the expected read depth distribution to more resemble the binomial distribution. Lower values indicate more overdispersion, i.e. sample-to-sample variance in the probability of observing reads from a given allele.

…

Other arguments; none are currently used.

Value

A "RADdata" object identical to that passed to the function, but with genotype likelihoods stored in object$genotypeLikelihood. This item is a list, with one item for each possible ploidy, ignoring differences between autopolyploids and allopolyploids. For each ploidy there is a three-dimensional array with number of allele copies in the first dimension, taxa in the second dimension, and alleles in the third dimension.

Details

If allele frequencies are not already recorded in object, they will be added using AddAlleleFreqHWE. Allele frequencies are then used for estimating the probability of sampling an allele from a genotype due to sample contamination. Given a known genotype with $x$ copies of allele $i$, ploidy $k$, allele frequency $p_i$ in the population used for making sequencing libraries, and contamination rate $c$, the probabiity of sampling a read $r_i$ from that locus corresponding to that allele is

$$P(r_i | x) = \frac{x}{k} * (1 - c) + p_i * c$$

To estimate the genotype likelihood, where $nr_i$ is the number of reads corresponding to allele $i$ for a given taxon and locus and $nr_j$ is the number of reads corresponding to all other alleles for that taxon and locus:

$$P(nr_i, nr_j | x) = {{nr_i + nr_j}\choose{nr_i}} * \frac{B[P(r_i | x) * d + nr_i, [1 - P(r_i | x)] * d + nr_j]]}{B[P(r_i | x) * d, [1 - P(r_i | x)] * d]}$$

where

$${{nr_i + nr_j}\choose{nr_i}} = \frac{(nr_i + nr_j)!}{nr_i! * nr_j!}$$

B is the beta function, and $d$ is the overdispersion parameter set by overdispersion. ${{nr_i + nr_j}\choose{nr_i}}$ is calculated by AddDepthSamplingPermutations.

Examples

Run this code

# NOT RUN {
# load example dataset and add allele frequency
data(exampleRAD)
exampleRAD <- AddAlleleFreqHWE(exampleRAD)

# estimate genotype likelihoods
exampleRAD <- AddGenotypeLikelihood(exampleRAD)

# inspect the results
# the first ten individuals and first two alleles, assuming diploidy
exampleRAD$alleleDepth[1:10,1:2]
exampleRAD$genotypeLikelihood[[1]][,1:10,1:2]
# assuming tetraploidy
exampleRAD$genotypeLikelihood[[2]][,1:10,1:2]
# }