A.mat: Additive relationship matrix

Description

Calculates the realized additive relationship matrix.

Usage

A.mat(X,min.MAF=NULL,max.missing=NULL,impute.method="mean",tol=0.02,
      n.core=1,shrink=NULL,return.imputed=FALSE)

Arguments

X

Matrix ($n \times m$) of unphased genotypes for $n$ lines and $m$ biallelic markers, coded as {-1,0,1}. Fractional (imputed) and missing values (NA) are allowed.

min.MAF

Minimum minor allele frequency. The A matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed.

max.missing

Maximum proportion of missing data; default removes completely missing markers.

impute.method

There are two options. The default is "mean", which imputes with the mean for each marker. The "EM" option imputes with an EM algorithm (see details).

tol

Specifies the convergence criterion for the EM algorithm (see details).

n.core

Specifies the number of cores to use for parallel execution of the EM algorithm.

shrink

Default behavior (shrink=NULL) is to use shrinkage estimation (see details) when the number of markers is less than 5 times the number of lines (m < 5n). To use shrinkage when m > 5n, set shrink=TRUE. To turn off shrinkage when m < 5n, set shrink=FALSE.

return.imputed

When TRUE, the imputed marker matrix is returned.
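The default rule for the shrink argument described above can be sketched in a few lines of base R. The helper name use_shrinkage is hypothetical and is not part of rrBLUP; it only restates the documented rule (shrink when m < 5n unless the user overrides it) and is not taken from the package source.

```r
# Hypothetical helper (not an rrBLUP function): restates the documented
# default for `shrink`, where n = number of lines and m = number of markers.
use_shrinkage <- function(n, m, shrink = NULL) {
  if (is.null(shrink)) {
    m < 5 * n          # default: shrink only at low marker density
  } else {
    isTRUE(shrink)     # explicit TRUE/FALSE overrides the default
  }
}

low_density  <- use_shrinkage(n = 200, m = 500)    # 500 < 1000
high_density <- use_shrinkage(n = 200, m = 1000)   # 1000 is not < 1000
forced_off   <- use_shrinkage(n = 200, m = 500, shrink = FALSE)
```

Under this rule, the 200-line, 1000-marker population in the Examples section sits exactly at m = 5n, so shrinkage would not be applied by default.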

Value

If return.imputed = FALSE, the $n \times n$ additive relationship matrix is returned. If return.imputed = TRUE, the function returns a list containing the A matrix ($A) and the imputed marker matrix ($imputed).
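As a hedged usage sketch, the call below requests the imputed markers along with the relationship matrix and then inspects the structure of what comes back. The simulated data, the seed, and the 1% missing rate are illustrative assumptions; the block is skipped if rrBLUP is not installed.

```r
# Illustrative only: simulated genotypes with about 1% missing values.
if (requireNamespace("rrBLUP", quietly = TRUE)) {
  set.seed(42)
  X <- matrix(sample(c(-1, 1), 50 * 400, replace = TRUE), 50, 400)
  X[sample(length(X), 200)] <- NA    # 200 of 20000 entries set to missing

  # Default mean imputation; ask for the imputed markers as well
  result <- rrBLUP::A.mat(X, return.imputed = TRUE)

  # Inspect the top-level components of the returned list
  str(result, max.level = 1)
}
```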

Details

At high marker density, the relationship matrix is estimated as $A=W W'/c$, where $W_{ik} = X_{ik} + 1 - 2 p_k$ and $p_k$ is the frequency of the 1 allele at marker k. By using a normalization constant of $c = 2 \sum_k {p_k (1-p_k)}$, the mean of the diagonal elements is $1 + f$ (Endelman and Jannink 2012).

The EM imputation algorithm is based on the multivariate normal distribution and was designed for use with GBS (genotyping-by-sequencing) markers, which tend to be high density but with lots of missing data. Details are given in Poland et al. (2012). The EM algorithm stops at iteration $t$ when the RMS error = $n^{-1} \|A_{t} - A_{t-1}\|_2$ < tol.

At low marker density, shrinkage estimation can improve the estimate of the relationship matrix and the accuracy of GEBV for lines with low accuracy phenotypes (Endelman and Jannink 2012). The shrinkage intensity ranges from 0 (no shrinkage, same estimator as high density formula) to 1 (completely shrunk to $(1+f)I$). The shrinkage intensity is chosen to minimize the expected mean-squared error and printed to the screen as output.

The shrinkage and EM options are designed for opposite scenarios (low vs. high density) and cannot be used simultaneously.

The multicore functionality only works for Mac, Linux, and UNIX users who have installed R package multicore. Do not use this option from the R GUI; you must execute from the command line.

When the EM algorithm is used, the imputed alleles can lie outside the interval [-1,1]. Polymorphic markers that do not meet the min.MAF and max.missing criteria are not imputed.
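The high-density formula above can be written out directly in base R for a complete marker matrix (no missing values, no monomorphic markers). This is a minimal sketch of the stated equations, not the package's implementation; realized_A is a hypothetical helper name, and agreement with A.mat output on any particular data set is not verified here.

```r
# Hypothetical helper (not an rrBLUP function): A = W W' / c for a complete
# marker matrix coded {-1, 0, 1}, following the formula in Details.
realized_A <- function(X) {
  # Frequency of the 1 allele per marker: codes -1/0/1 carry 0/1/2 copies
  p <- (colMeans(X) + 1) / 2
  # Center each marker column: W_ik = X_ik + 1 - 2 * p_k
  W <- sweep(X + 1, 2, 2 * p, "-")
  # Normalization constant c = 2 * sum_k p_k * (1 - p_k)
  c_norm <- 2 * sum(p * (1 - p))
  tcrossprod(W) / c_norm
}

set.seed(1)
X <- matrix(sample(c(-1, 0, 1), 20 * 500, replace = TRUE), 20, 500)
A_sketch <- realized_A(X)
```

Because each column of W sums to zero by construction, every row of the resulting matrix also sums to zero (up to floating-point error), which is a quick sanity check on the centering step.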

References

Endelman, J.B., and J.-L. Jannink. 2012. Shrinkage estimation of the realized relationship matrix. G3:Genes, Genomes, Genetics.

Poland, J., J. Endelman et al. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome. doi: 10.3835/plantgenome2012.06.0006

Examples


#random population of 200 lines with 1000 markers
X <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  X[i,] <- ifelse(runif(1000)<0.5,-1,1)
}

A <- A.mat(X)
