mmer2: Mixed Model Equations in R 2

Description

This function solves the mixed model equations proposed by Henderson (1975). It has been implemented to be more user-friendly than mmer(which works directly with incidence matrices and variance covariance matrices for each random effect). This version creates the incidence matrices for the user and is more similar to the 'lm' type of functions, with asreml type of specification. Please refer to mmer for further details about methods. In general, the methods implemented by this function currently are; "EMMA" efficient mixed model association (Kang et al. 2008), "AI" average information (Gilmour et al. 1995; Lee et al. 2015), and "EM" expectation maximization (Searle 1993; Bernardo 2010). The EMMA method can be implemented when only one variance component other than the error variance component (Var(e)) is estimated, is based on optimizing the REML function (see details). On the other hand when more than one variance component needs to be estimated the "AI" and "EM"" methods should be used. Please keep in mind that THIS VERSION NAMED 'mmer2' IS LIMITED compared with mmer since the matrices for random effects are built based on a dataframe provided in the 'data' argument, and genetic models sometimes require an incidence matrix with markers (several columns) to count as single random effect, for fitting more general models please use the mmer function which works directly with the incidence matrices and variance covariance matrices for each random effect. The package provides kernels to estimate additive (A.mat), dominant (D.mat), and epistatic (E.mat) relationship matrices that have been shown to increase prediction accuracy. The package provides flexibility to fit other genetic models such as full and half diallel models as well, see hdm function information to see how to fit those models using sommer.

Usage

mmer2(fixed=NULL, random=NULL, G=NULL, R=NULL, method="AI", REML=TRUE, 
      iters=50, draw=FALSE, init=NULL, data=NULL, family=gaussian, 
      silent=FALSE, constraint=TRUE, sherman=FALSE, MTG2=FALSE)

Arguments

fixed

a formula specifying the response variable and the fixed effects, i.e. Yield ~ Location

random

a formula specifying the name of the random effects related to environmental effects, i.e. random= ~ genotype + year.

a list containing the variance-covariance matrices for the random effects, i.e. G=list(genotype=M1, year=M2), where M1 and M2 are the variance-covariance matrices. if not passed is assumed an identity matrix.

a matrix for variance-covariance structures for the residuals, i.e. for longitudinal data. if not passed is assumed an identity matrix.

method

this refers to the method or algorithm to be used for estimating variance components. The package currently is supported by 3 algorithms; "EMMA" efficient mixed model association (Kang et al. 2008), "AI" average information (Gilmour et al. 1995; Lee et al

REML

a TRUE/FALSE value indicating if restricted maximum likelihood should be used instead of ML. The default is TRUE.

iters

a scalar value indicating how many iterations have to be performed if the EM or AI algorithms are selected. There is no rule of tumb for the number of iterations. The default value is 50 iterations or EM steps, but could take less or much longer than that

draw

a TRUE/FALSE value indicating if a plot of updated values for the variance components and the likelihood should be drawn or not. The default is FALSE. COMPUTATION TIME IS SHORTER IF YOU DON'T PLOT SETTING draw=FALSE

init

an vector of initial values for the EM algorithm if this is the method selected. If not provided the program uses a starting values the variance(y)/#random.eff which is usually a good starting value.

data

a data frame containing the variables specified in the formulas for response, fixed, and random effects.

family

a family object to specify the distribution of the response variable. The program will only use the link function to transform the response. For details see family help page. The argument would look somethin

silent

a TRUE/FALSE value indicating if the function should draw the progress bar and poems (see poe function) while working or should not be displayed. The default is FALSE, which means is not silent and will display

constraint

a TRUE/FALSE value indicating if the program should use the boundary constraint when one or more variance component is close to the zero boundary. The default is TRUE but needs to be used carefully. It works ideally when few variance components are close

sherman

a TRUE/FALSE value indicating if Sherman-Morrison-Woodbury formula (Seber, 2003, p. 467) should be used when estimating variance components in order to perform faster when a mixed model with no covariance structure using the average information algorithm

MTG2

a TRUE/FALSE value indicating if an eigen decomposition for the additive relationship matrix should be performed or not. This is based on Lee (2015). The limitations of this methos are: 1) can only be applied to one relationship matrix 2) The

Value

If all parameters are correctly indicated the program will return a list with the following information: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Details

The package has been developed to provide R users with code to understand how most common algorithms in mixed model analysis work related to genetics field, but also allowing to perform their real analysis. This package allows the user to calculate the variance components for a mixed model with the advantage of specifying the variance-covariance structure of the random effects. This program focuses in the mixed model of the form:

y = Xb + Zu + e

b ~ N[b.hat, 0] ............zero variance because is a fixed term

u ~ N[0, K*sigma(u)] .......where: K*sigma(u) = G

e ~ N[0, I*sigma(e)] .......where: I*sigma(e) = R

y ~ N[Xb, var(Zu+e)] ......where;

var(y) = var(Zu+e) = ZGZ+R = V which is the phenotypic variance

The function allows the user to specify the incidence matrices with their respective variance-covariance matrix in a 2 level list structure. For example imagine a mixed model with the following design:

fixed = only intercept.....................b ~ N[b.hat, 0]

random = GCA1 + GCA2 + SCA.................u ~ N[0, G]

where G is:

|K*sigma2(gca1).....................0..........................0.........| |.............0.............S*sigma2(gca2).....................0.........| = G

|.............0....................0......................W*sigma2(sca)..|

PLEASE KEEP IN MIND THAT THIS VERSION NAMED 'mmer2' IS LIMITED compared with mmer since the matrices for random effects are built based on a dataframe provided, and genetic models sometimes require a matrix with markers (several columns) to count as single random effect. For fitting more general models please use the mmer function which works directly with the incidence matrices and variance covariance matrices for each random effect.

FOR DETAILS ON HOW THE "AI", EM" AND "EMMA" ALGORITHMS WORK PLEASE REFER TO mmer, and methods AI , EM AND EMMA

In addition, the package contains a very nice function to plot genetic maps with numeric variable or traits next to the LGs, see the map.plot2 function to see how easy can be done. Other functions contained in the package are:

transp function transform a vector of colors in transparent colors.

fdr calculates the false discovery rate for a vector of p-values.

A.mat is a wrapper of the A.mat function from the rrBLUP package.

score.calc is a function that can be used to calculate a -log10 p-value for a vector of BLUEs for marker effects.

Other functions such as summary, fitted, randef (notice sommer uses randef not ranef), anova, residuals, coef and plot applicable to typical linear models can also be applied to models fitted using this function which is a wrapper of mmer, the core of the sommer package.

References

Bernardo Rex. 2010. Breeding for quantitative traits in plants. Second edition. Stemma Press. 390 pp.

Gilmour et al. 1995. Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440-1450.

Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.

Lee et al. 2015. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Cold Spring Harbor. doi: http://dx.doi.org/10.1101/027201.

Searle. 1993. Applying the EM algorithm to calculating ML and REML estimates of variance components. Paper invited for the 1993 American Statistical Association Meeting, San Francisco.

Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Genetics 38:203-208.

Examples

Run this code

####=========================================####
#### For CRAN time limitations most lines in the 
#### examples are silenced with one '#' mark, 
#### remove them and run the examples
####=========================================####

####=========================================####
####=========================================####
#### EXAMPLE 1
#### breeding values with 3 variance components
#### for hybrid prediction
####=========================================####
####=========================================####
data(cornHybrid)
hybrid2 <- cornHybrid$hybrid # extract cross data
A <- cornHybrid$K
y <- hybrid2$Yield

####=========================================####
#### Realized IBS relationships for set of parents 1
####=========================================####
K1 <- A[levels(hybrid2$GCA1), levels(hybrid2$GCA1)]; dim(K1)     
####=========================================####
#### Realized IBS relationships for set of parents 2
####=========================================####
K2 <- A[levels(hybrid2$GCA2), levels(hybrid2$GCA2)]; dim(K2)     
####=========================================####
#### SCA relationship matrix (kronecker)
####=========================================####
S <- kronecker(K1, K2) ; dim(S)   
rownames(S) <- colnames(S) <- levels(hybrid2$SCA)

#head(hybrid2)
#ans <- mmer2(Yield ~ Location, random = ~ GCA1 + GCA2 + SCA, 
#       G=list(GCA1=K1, GCA2=K2, SCA=S),data=hybrid2)
#ans$var.comp
#summary(ans)

####=========================================####
#### please remember THIS FUNCTION IS LIMITED since the matrices for random 
#### effects are built based on a dataframe provided, for more general models 
#### including the genomic analysis please use the 'mmer' function which 
#### works directly with matrices and is more flexible
####=========================================####

########################################################
########################################################
########################################################
########################################################
########################################################
########################################################

####=========================================####
#### EXAMPLE 2
#### comparison with lmer, install 'lme4' 
#### and run the code below
####=========================================####

#### lmer cannot use var-cov matrices so we will not 
#### use them in this comparison example

#library(lme4)
#library(sommer)
#data(cornHybrid)
#hybrid2 <- cornHybrid$hybrid
#fm1 <- lmer(Yield ~ Location + (1|GCA1) + (1|GCA2) + (1|SCA),
#        data=hybrid2 )
#out <- mmer2(Yield ~ Location, random = ~ GCA1 + GCA2 + SCA,
#        data=hybrid2)
#summary(fm1)
#summary(out)
#### same BLUPs for GCA1, GCA2, SCA than lme4
#plot(out$cond.residuals, residuals(fm1))
#plot(out$u.hat[[1]], ranef(fm1)$GCA1[,1])
#plot(out$u.hat[[2]], ranef(fm1)$GCA2[,1])
#vv=which(abs(out$u.hat[[3]]) > 0)
#plot(out$u.hat[[3]][vv,], ranef(fm1)$SCA[,1])

########################################################
########################################################
########################################################
########################################################
########################################################
########################################################

####=========================================####
#### EXAMPLE 3
#### comparison with lmer, install 'lme4' 
#### and run the code below
####=========================================####

#library(lme4)
#data(sleepstudy)
#fm1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
#summary(fm1)

#out <- mmer2(Reaction ~ Days, random = ~ Subject, data=sleepstudy)
#summary(out)

# plot(out$cond.residuals, residuals(fm1))
# plot(out$u.hat[[1]], ranef(fm1)[[1]][,1])

Run the code above in your browser using DataLab