A.mat(X,min.MAF=NULL,max.missing=NULL,impute.method="mean",tol=0.02,
n.core=1,shrink=FALSE,return.imputed=FALSE, ploidy=2)
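With the default impute.method = "mean", each missing marker score is replaced by a per-marker mean. The sketch below is a minimal illustration of that idea, assuming the mean is taken over the observed genotypes in each marker column. `impute_mean` is a hypothetical helper, not a sommer function.

```r
# Hypothetical helper (not sommer's code): replace each missing genotype
# with the mean of the observed genotypes at that marker (column).
# Assumes X is an n x m matrix coded {-1, 0, 1} with NA for missing calls.
impute_mean <- function(X) {
  col_means <- colMeans(X, na.rm = TRUE)      # per-marker mean of observed calls
  idx <- which(is.na(X), arr.ind = TRUE)      # (row, col) positions of NAs
  X[idx] <- col_means[idx[, "col"]]           # fill each NA with its column mean
  X
}

# Toy marker matrix with two missing calls
X <- matrix(c(-1,  1, NA,
               1,  1,  0,
               1, NA, -1), nrow = 3, byrow = TRUE)
X_imp <- impute_mean(X)
X_imp[3, 2]  # 1    (mean of 1 and 1)
X_imp[1, 3]  # -0.5 (mean of 0 and -1)
```

A marker with no observed calls would receive NaN in this sketch, so such columns would need to be dropped first.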
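If return.imputed = FALSE, the $n \times n$ additive relationship matrix is returned. If return.imputed = TRUE, the function returns a list containing $A (the A matrix) and $imputed (the imputed marker matrix).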
The EM imputation algorithm (impute.method = "EM") is based on the multivariate normal distribution and was designed for use with GBS (genotyping-by-sequencing) markers, which tend to be high density but have a large proportion of missing data. Details are given in Poland et al. (2012). The EM algorithm stops at iteration $t$ when the RMS error $= n^{-1} \|A_{t} - A_{t-1}\|_2 <$ tol.
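The stopping rule can be sketched as follows. This shows only the convergence check, not the EM update itself, and it assumes $\|\cdot\|_2$ denotes the entrywise (Frobenius) norm of the difference between successive relationship matrices. `rms_error` and `em_converged` are hypothetical names.

```r
# Sketch of the EM stopping rule only (not the EM update).
# Assumption: ||.||_2 is the entrywise (Frobenius) norm, so the
# RMS error is sqrt(sum of squared differences) / n.
rms_error <- function(A_new, A_old) {
  n <- nrow(A_new)
  sqrt(sum((A_new - A_old)^2)) / n
}

# Hypothetical check: stop iterating once the error falls below tol
em_converged <- function(A_new, A_old, tol = 0.02) {
  rms_error(A_new, A_old) < tol
}

A_old <- diag(4)
A_new <- diag(4) + 0.01          # every entry shifted by 0.01
rms_error(A_new, A_old)          # sqrt(16 * 0.01^2) / 4 = 0.01
em_converged(A_new, A_old)       # TRUE, since 0.01 < 0.02
```

Base R's `norm(A_new - A_old, type = "F") / nrow(A_new)` computes the same quantity.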
At low marker density (fewer markers than lines, m < n), shrinkage estimation can improve the estimate of the relationship matrix and the accuracy of GEBVs for lines with low accuracy phenotypes (Endelman and Jannink 2012). The shrinkage intensity ranges from 0 (no shrinkage, same estimator as the high-density formula) to 1 (completely shrunk to $(1+f)I$). The shrinkage intensity is chosen to minimize the expected mean-squared error and is printed to the console.
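To make the two endpoints of the shrinkage intensity concrete, the sketch below computes a high-density relationship matrix and interpolates it toward $(1+f)I$. Several assumptions are made: the high-density estimator is taken to be $A = WW'/c$ with $W_{ik} = X_{ik} + 1 - 2p_k$, $p_k$ the frequency of the 1 allele and $c = 2\sum_k p_k(1-p_k)$ (the form used in rrBLUP's A.mat); $1+f$ is estimated by the mean diagonal of $A$; and the intensity `delta` is supplied by hand rather than chosen by the MSE-minimizing rule of Endelman and Jannink (2012), which is not reproduced here. `high_density_A` and `shrink_toward_target` are hypothetical helpers, not sommer's implementation.

```r
# Hypothetical helper: high-density estimator A = W W' / c.
# Assumes X is an n x m matrix coded {-1, 0, 1} with no missing values.
high_density_A <- function(X) {
  stopifnot(!anyNA(X))
  p <- (colMeans(X) + 1) / 2                 # frequency of the "1" allele per marker
  W <- sweep(X, 2, 1 - 2 * p, FUN = "+")     # W_ik = X_ik + 1 - 2 p_k
  tcrossprod(W) / (2 * sum(p * (1 - p)))     # A = W W' / c
}

# Hypothetical helper: linear interpolation between A (delta = 0)
# and the target (1 + f) I (delta = 1).
shrink_toward_target <- function(A, delta) {
  stopifnot(delta >= 0, delta <= 1)
  target <- mean(diag(A)) * diag(nrow(A))    # assumption: 1 + f = mean diagonal
  (1 - delta) * A + delta * target
}

set.seed(1)
X <- matrix(sample(c(-1, 1), 20 * 5, replace = TRUE), nrow = 20)  # m = 5 < n = 20
A <- high_density_A(X)
A_half <- shrink_toward_target(A, delta = 0.5)
```

Off-diagonal relationships shrink toward 0 as `delta` grows, while the mean diagonal is unchanged.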
The shrinkage and EM options are designed for opposite scenarios (low vs. high density) and cannot be used simultaneously.
When the EM algorithm is used, the imputed alleles can lie outside the interval [-1,1]. Polymorphic markers that do not meet the min.MAF and max.missing criteria are not imputed.
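The min.MAF and max.missing criteria can be sketched as a column filter on the marker matrix. In this sketch a NULL threshold simply skips that filter, and the comparisons are inclusive (MAF >= min.MAF, missing fraction <= max.missing); both choices are assumptions, and sommer's own defaults and boundary handling may differ. `markers_to_keep` is a hypothetical helper.

```r
# Hypothetical helper (not sommer's code). X: n x m, coded {-1, 0, 1}, NA = missing.
# Returns the indices of markers passing both filters.
markers_to_keep <- function(X, min.MAF = NULL, max.missing = NULL) {
  frac_missing <- colMeans(is.na(X))          # proportion of missing calls per marker
  p <- (colMeans(X, na.rm = TRUE) + 1) / 2    # frequency of the "1" allele
  maf <- pmin(p, 1 - p)                       # minor allele frequency
  keep <- rep(TRUE, ncol(X))
  if (!is.null(min.MAF))     keep <- keep & maf >= min.MAF
  if (!is.null(max.missing)) keep <- keep & frac_missing <= max.missing
  which(keep)
}

X <- rbind(c(-1,  1,  1, NA),
           c(-1,  1, -1, NA),
           c(-1, -1,  1,  1),
           c(-1,  1, NA, -1))
# Marker 1 is monomorphic (MAF 0); marker 4 is 50% missing.
markers_to_keep(X, min.MAF = 0.05, max.missing = 0.3)  # 2 3
```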
Covarrubias-Pazaran, G. 2016. Genomic prediction using the R package sommer. Genetics X:yyy-yyy.
Poland, J., J. Endelman et al. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5:103-113. doi: 10.3835/plantgenome2012.06.0006
####=========================================####
#### random population of 200 lines with 1000 markers
####=========================================####
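library(sommer)
set.seed(1)  # make the simulated markers reproducible
# Each of the 200 x 1000 genotypes is -1 or 1 with probability 0.5
X <- matrix(ifelse(runif(200*1000) < 0.5, -1, 1), nrow = 200, ncol = 1000)
A <- A.mat(X)  # 200 x 200 additive relationship matrix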