Learn R Programming

SFSI (version 1.2.0)

fitBLUP: Function fitBLUP

Description

Solves the Linear Mixed Model and calculates the Best Linear Unbiased Predictor (BLUP)

Usage

fitBLUP(y, X = NULL, Z = NULL, K = NULL, U = NULL, 
        d = NULL, theta = NULL, BLUP = TRUE, 
        method = c("REML","ML"), return.Hinv = FALSE, 
        tol = 1E-5, maxiter = 1000, interval = c(1E-9,1E9),
        warn = TRUE)

Value

Returns a list object that contains the elements:

  • b: (vector) fixed effects solutions (including the intercept).

  • u: (vector) random effects solutions.

  • varU: random effect variance.

  • varE: residual variance.

  • h2: heritability.

  • convergence: (logical) whether Brent's method converged.

  • method: either 'REML' or 'ML' method used.

Arguments

y

(numeric vector) Response variable

X

(numeric matrix) Design matrix for fixed effects. When X=NULL a vector of ones is constructed only for the intercept (default)

Z

(numeric matrix) Design matrix for random effects. When Z=NULL an identity matrix is considered (default) thus G = K; otherwise G = Z K Z' is used

K

(numeric matrix) Kinship relationships. This can be of the "float32" type as per the 'float' R-package, or a (character) name of a binary file where the matrix is stored

U

(numeric matrix) Eigenvectors from spectral value decomposition of G = U D U'

d

(numeric vector) Eigenvalues from spectral value decomposition of G = U D U'

theta

(numeric) Residual/genetic variances ratio. When it is not NULL, the optimization of the likelihood function (REML or ML) is not performed

BLUP

TRUE or FALSE to whether return the random effects estimates

method

(character) Either 'REML' (Restricted Maximum Likelihood) or 'ML' (Maximum Likelihood)

return.Hinv

TRUE or FALSE to whether return the inverse of the matrix H

tol

(numeric) Maximum error between two consecutive solutions (convergence tolerance) when finding the root of the log-likelihood's first derivative

maxiter

(integer) Maximum number of iterations to run before convergence is reached

interval

(numeric vector) Range of values in which the root is searched

warn

TRUE or FALSE to whether show warnings

Author

Paulino Perez-Rodriguez, Marco Lopez-Cruz (maraloc@gmail.com) and Gustavo de los Campos

Details

The basic linear mixed model that relates phenotypes with genetic values is of the form

y = X b + Z u + e

where y is a vector with the response, b is the vector of fixed effects, u is the vector of the (random) genetic values of the genotypes, e is the vector of environmental residuals (random error), and X and Z are design matrices conecting the fixed and genetic effects with replicates. Genetic values are assumed to follow a Normal distribution as u ~ N(02uK), and the error terms are assumed e ~ N(02eD), with D=I being an identity matrix.

The vector of genetic values g = Z u will therefore follow g ~ N(02uG) where G = Z K Z'. In the un-replicated case, Z = I is an identity matrix, and hence g = u and G = K.

The predicted values utrn = (ui), i = 1,2,...,ntrn, corresponding to observed data (training set) are derived as

utrn = H (ytrn - Xtrnb)

where H is a matrix of weights given by

H = Gtrn (Gtrn + θD)-1

where Gtrn is the sub-matrix corresponding to the training set, and θ = σ2e2u is the residual/genetic variances ratio representing a shrinkage parameter. This parameter is expressed in terms of the heritability, h2 = σ2u/(σ2u + σ2e), as θ = (1 - h2)/h2.

The predictions of utst corresponding to un-observed data (testing set) can be obtained by using

H = Gtst,trn (Gtrn + θD)-1

where Gtst,trn is the sub-matrix of G corresponding to the testing set (in rows) and training set (in columns).

Solutions are found using the GEMMA (Genome-wide Efficient Mixed Model Analysis) approach (Zhou & Stephens, 2012). First, the Brent's method is implemented to solve for the genetic/residual variances ratio (i.e., 1/θ) from the first derivative of the log-likelihood (either REML or ML). Then, variances σ2u and σ2e are calculated. Finally, b is obtained using Generalized Least Squares.

References

VanRaden PM (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423.

Zhou X, Stephens M (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821-824

Examples

Run this code
  require(SFSI)
  data(wheatHTP)
  
  index = which(Y$trial %in% 1:6)     # Use only a subset of data
  Y = Y[index,]
  M = scale(M[index,])/sqrt(ncol(M))  # Subset and scale markers
  G = tcrossprod(M)                   # Genomic relationship matrix
  y = as.vector(scale(Y[,"E1"]))      # Scale response variable
  
  # Training and testing sets
  tst = which(Y$trial == 2)
  trn = which(Y$trial != 2)

  yNA <- y
  yNA[tst] <- NA
  fm1 = fitBLUP(yNA, K=G)
  plot(y[tst],fm1$u[tst])      # Predicted vs observed values in testing set
  cor(y[tst],fm1$u[tst])       # Prediction accuracy in testing set
  cor(y[trn],fm1$u[trn])       # Prediction accuracy in training set
  fm1$theta                    # Residual/Genetic variances ratio
  fm1$h2                       # Heritability
  # \donttest{
  if(requireNamespace("float")){
   # Using a 'float' type variable
   G2 = float::fl(G)
   fm2 = fitBLUP(yNA, K=G2)
   max(abs(fm1$u-fm2$u))  # Check for discrepances
  }
  # }

Run the code above in your browser using DataLab