AddAlleleLinkages: Identify and Utilize Linked Alleles for Estimating Genotype Priors

Description

AddAlleleLinkages finds alleles, if any, in linkage disequilibrium with each allele in a RADdata object, and computes a correlation coefficient representing the strength of the linkage. AddGenotypePriorProb_LD adds a second set of prior genotype probabilities to a RADdata object based on the genotype posterior probabilities at linked alleles.

Usage

AddAlleleLinkages(object, ...)
# S3 method for RADdata
AddAlleleLinkages(object, type, linkageDist, minCorr,
                  excludeTaxa = character(0), ...)

AddGenotypePriorProb_LD(object, ...)
# S3 method for RADdata
AddGenotypePriorProb_LD(object, type, ...)

Arguments

object

A RADdata object with genomic alignment data stored in object$locTable$Chr and object$locTable$pos.

type

A character string, either "mapping", "hwe", or "popstruct", to indicate the type of population being analyzed.

linkageDist

A number, indicating the distance in basepairs from a locus within which to search for linked alleles.

minCorr

A number ranging from zero to one indicating the minimum correlation needed for an allele to be used for genotype prediction at another allele.

excludeTaxa

A character vector listing taxa to be excluded from correlation estimates.

...

Additional arguments (none implemented).

Value

A RADdata object is returned. For AddAlleleLinkages, it has a new slot called $alleleLinkages that is a list, with one item in the list for each allele in the dataset. Each item is a data frame, with indices for linked alleles in the first column, and correlation coefficients in the second column.

For AddGenotypePriorProb_LD, the object has a new slot called $priorProbLD. This is a list much like $posteriorProb, with one list item per inheritance mode, and each item being an array with allele copy number in the first dimension, taxa in the second dimension, and alleles in the third dimension. Values indicate genotype prior probabilities based on linked alleles alone.

Details

These functions are primarily designed to be used internally by the pipeline functions.

AddAlleleLinkages obtains genotypic values using GetWeightedMeanGenotypes, then regresses those values for a given allele against those values for nearby alleles to obtain correlation coefficients. For the population structure model, the genotypic values for an allele are first regressed on the PC axes from object$PCA, then the residuals are regressed on the genotypic values at nearby alleles to obtain correlation coefficients.
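
The two approaches described above can be sketched in base R. This is an illustration on simulated data, not polyRAD's internal code: the variables `pcs`, `nearby`, and `focal` are made-up stand-ins for the PC axes and for genotypic values at two alleles, and the sketch assumes that the correlation coefficient from a single-predictor regression is equivalent to the Pearson correlation.

```r
set.seed(1)
n_taxa <- 100

# Simulated stand-ins (assumptions, not polyRAD output): two PC axes and
# genotypic values on a 0-1 scale at a focal allele and a nearby allele.
pcs <- matrix(rnorm(n_taxa * 2), ncol = 2)
nearby <- runif(n_taxa)
focal <- 0.6 * nearby + 0.1 * pcs[, 1] + rnorm(n_taxa, sd = 0.1)
focal <- pmin(pmax(focal, 0), 1)

# Without population structure: regress focal on nearby and convert the
# fit to a signed correlation coefficient.
fit <- lm(focal ~ nearby)
r_direct <- unname(sign(coef(fit)[2]) * sqrt(summary(fit)$r.squared))

# With population structure: first remove what the PC axes explain,
# then correlate the residuals with the nearby allele.
resid_focal <- residuals(lm(focal ~ pcs))
r_struct <- cor(resid_focal, nearby)
```

In this sketch `r_direct` matches `cor(focal, nearby)`, which is why a plain correlation can stand in for the regression when there is a single predictor.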

AddGenotypePriorProb_LD makes a second set of priors in addition to object$priorProb. This second set of priors has one value per inheritance mode per taxon per allele per possible allele copy number. In the formula below, $K$ is the ploidy; $c$ is the allele copy number, ranging from 0 to $K$; $i$ is an allele; $j$ is one of the $J$ alleles at other loci that are linked to $i$; $r_{ij}$ is the correlation coefficient between alleles $i$ and $j$; $t$ is a taxon; $post_{cjt}$ is the posterior probability of copy number $c$ for allele $j$ in taxon $t$; and $prior_{cit}$ is the prior probability of copy number $c$ for allele $i$ in taxon $t$ based on linkage alone:

$$prior_{cit} = \frac{\prod_{j = 1}^J{\left[post_{cjt} * r_{ij} + (1 - r_{ij})/(K + 1)\right]}}{\sum_{c' = 0}^K{\prod_{j = 1}^J{\left[post_{c'jt} * r_{ij} + (1 - r_{ij})/(K + 1)\right]}}}$$
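
As a worked illustration of the formula above, the following base R sketch computes the linkage-based prior for a single allele in a single taxon. It is not polyRAD's internal code: the function name `ld_prior` and the input values are hypothetical and chosen only for the example.

```r
# Hypothetical helper (not part of polyRAD): linkage-based prior for one
# allele i in one taxon t, following the formula above.
# post: (K + 1) x J matrix; column j holds the posterior probabilities of
#       copy numbers 0..K at linked allele j in this taxon.
# r:    length-J vector of correlation coefficients r_ij.
ld_prior <- function(post, r) {
  K <- nrow(post) - 1
  # Weight each linked allele's posterior by r_ij and fill the remainder
  # with a uniform distribution over the K + 1 copy numbers.
  uniform_part <- matrix((1 - r) / (K + 1), nrow = K + 1,
                         ncol = length(r), byrow = TRUE)
  mixed <- sweep(post, 2, r, `*`) + uniform_part
  # Multiply across linked alleles, then normalize over copy numbers.
  unnorm <- apply(mixed, 1, prod)
  unnorm / sum(unnorm)
}

# Made-up diploid example (K = 2) with J = 2 linked alleles
post <- cbind(c(0.80, 0.15, 0.05),
              c(0.60, 0.30, 0.10))
r <- c(0.9, 0.5)
prior <- ld_prior(post, r)
```

Two limiting cases help check the behavior: with every $r_{ij}$ equal to zero the result is uniform over copy numbers, and with a single linked allele at $r_{ij} = 1$ the result equals that allele's posterior.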

For mapping populations, AddGenotypePriorProb_LD uses the above formula when each allele only has two possible genotypes (i.e. test-cross segregation). When more genotypes are possible, AddGenotypePriorProb_LD instead estimates prior probabilities as fitted values when the posterior probabilities for a given allele are regressed on the posterior probabilities for a linked allele. This allows loci with different segregation patterns to be informative for predicting genotypes, and accommodates cases where two alleles are in phase for some but not all parental copies.
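
The regression-based alternative can be sketched as follows. This is a loose illustration under several assumptions rather than polyRAD's implementation: the posterior matrices are simulated, the regression is fitted across taxa with `lm`, and the final clipping and renormalization step is an addition made here so that the fitted values form valid probabilities.

```r
set.seed(2)
n_taxa <- 50
K <- 2  # ploidy, so copy numbers run from 0 to K

# Made-up posterior probabilities (taxa x copy number) at a linked allele j
raw_j <- matrix(rexp(n_taxa * (K + 1)), nrow = n_taxa)
post_j <- raw_j / rowSums(raw_j)

# Made-up posteriors at allele i that partly track allele j
noise <- matrix(rexp(n_taxa * (K + 1)), nrow = n_taxa)
noise <- noise / rowSums(noise)
post_i <- 0.7 * post_j + 0.3 * noise

# Regress every copy-number column of allele i on the posteriors at allele j.
# The last column of post_j is dropped because the columns sum to one and
# would otherwise be collinear with the intercept.
fit <- lm(post_i ~ post_j[, -(K + 1)])
prior_ld <- fitted(fit)

# Assumption of this sketch: clip negative fitted values and renormalize
# so that each taxon's values sum to one.
prior_ld <- pmax(prior_ld, 0)
prior_ld <- prior_ld / rowSums(prior_ld)
```

Because the regression does not require allele $i$ and allele $j$ to share the same set of possible genotypes, a fit of this kind can still carry information between loci with different segregation patterns.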

Examples

# NOT RUN {
# load example dataset
data(Msi01genes)

# Run non-LD pop structure pipeline
Msi01genes <- IteratePopStruct(Msi01genes, tol = 0.01, nPcsInit = 10)

# Add linkages
Msi01genes <- AddAlleleLinkages(Msi01genes, "popstruct", 1e4, 0.05)
# Get new prior probabilities based on those linkages
Msi01genes <- AddGenotypePriorProb_LD(Msi01genes, "popstruct")

# Preview results
Msi01genes$priorProbLD[[1]][,1:10,1:10]
# }