kin.blup(data,geno,pheno,GAUSS=FALSE,K=NULL,fixed=NULL,covariate=NULL,
PEV=FALSE,n.core=1,theta.seq=NULL,reduce=TRUE)
mixed.solve
and thus solves mixed models of the form:
$$y = X \beta + [Z \: 0] g + \varepsilon$$
where $\beta$ is a vector of fixed effects, $g$ is a vector of random genotypic values with covariance
$G = Var[g]$, and the residuals are i.i.d. ($Var[\varepsilon] = I \: V_e$). The design matrix for the genetic values has been partitioned to illustrate that not all lines need phenotypes (i.e., for genomic selection). Unlike mixed.solve
, this function does not return estimates of the fixed effects, only the BLUP solution for the genotypic values. It was designed to replace kinship.BLUP
and to relieve the user of having to explicitly construct design matrices. Variance components are estimated by REML and BLUP values are returned for every entry in K, regardless of whether it has been phenotyped. The rownames of K must match the genotype labels in the data frame for phenotyped lines; missing phenotypes (NA) are simply omitted.
Unlike its predecessor, this function does not handle marker data directly. For breeding value prediction, the user must supply a relationship matrix, which can be calculated from markers with A.mat
. For Gaussian kernel predictions, pass the Euclidean distance matrix for K, which can be calculated with dist
.
In the terminology of mixed models, both the "fixed" and "covariate" variables are fixed effects ($\beta$ in the above equation): the former are treated as factors with distinct levels while the latter are continuous with one coefficient per variable. If no fixed effects of any kind are specified, an overall mean is included in the model.
The prediction error variance (PEV) is the square of the SE of the BLUPs (see mixed.solve
) and can be used to estimate the accuracy of BLUP predictions.
When the number of observations exceeds the number of lines, as happens with replicated trials or measurements, genomic predictions can be done by explicitly modeling the environmental variables in a one-step approach. For unbalanced trials, this accounts for differing degrees of replication but is more computationally intensive than a two-step model. When reduce=TRUE, the original mixed model is multiplied by $L = (Z'Z)^{-1/2}Z'$ to generate a new mixed model with dimension equal to the number of lines, i.i.d. residuals, and the same variance parameters as in the original model. (In general, only linear combinations of the original fixed effects parameters $\beta$ are estimable in the reduced model--this is handled by the software.) The reduced problem is similar to a two-step prediction with rotation of the adjusted means (Schulz-Streeck et al. 2012), but it is one-step in the sense that variance components and BLUPs are only estimated once.#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
M[i,] <- ifelse(runif(1000)<0.5,-1,1)
}
rownames(M) <- 1:200
A <- A.mat(M)
#random phenotypes
u <- rnorm(1000)
g <- as.vector(crossprod(t(M),u))
h2 <- 0.5 #heritability
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))
data <- data.frame(y=y,gid=1:200)
#predict breeding values
ans <- kin.blup(data=data,geno="gid",pheno="y",K=A)
accuracy <- cor(g,ans$g)
Run the code above in your browser using DataLab