lmebreed: Fit breeding-related mixed-effects models

Description

Fits linear or generalized linear mixed models incorporating known relationships (e.g., genetic relationship matrices) and customized random effects (e.g., overlay models). Big-data genomic models can be fitted using the eigen decomposition proposed by Lee and Van der Werf (2006).

Usage

lmebreed(formula, data, REML = TRUE, control = list(), start = NULL, 
         verbose = 1L, subset, weights, na.action, offset, contrasts = NULL, 
         calc.derivs = FALSE, nIters=100,
         # lmebreed special params
         family = NULL, relmat = list(), addmat = list(), trace=1L,
         dateWarning=TRUE, rotation=FALSE,  rotationK=NULL, coefOutRotation=8, 
         returnFormula=FALSE, suppressOpt=FALSE, ...)

Value

a lmebreed object.

Arguments

formula

as in lmer

data

as in lmer

REML

as in lmer

control

as in lmer

start

as in lmer

verbose

as in lmer

subset

as in lmer

weights

as in lmer

na.action

as in lmer

offset

as in lmer

contrasts

as in lmer

calc.derivs

as in lmerControl. The intention of having this argument at the level of main argument is to set it as FALSE as default to speed up computation. This means that in lme4breeding if you pass calc.derivs inside the lmerControl object it won't have any effect. You need to set it using the argument in the lmebreed function.

nIters

the number of iterations used by the optimizeGlmer and optimizeLmer functions. Internally, it replaces the values of the maxfun and maxeval arguments in the lmerControl optCtrl passed.This means that in lme4breeding if you pass calc.derivs inside the lmerControl object it won't have any effect. You need to set it using the argument in the lmebreed function.

family

as in glmer

relmat

an optional named list of relationship matrices between levels of a given random effect (not the inverse). Internally the Cholesky decomposition of those matrices is computed to adjust the incidence matrices. The names of the elements in the list must correspond to the names of slope factors for random-effects terms in the formula argument.

addmat

an optional named list of customized incidence matrices. The names of the elements must correspond to the names of grouping factors for random-effects terms in the formula argument. Depending on the use-case the element in the list may be a single matrix or a list of matrices. Please see examples and vignettes to learn how to use it.

trace

integer scalar. If > 0 verbose output is generated during the progress of the model fit.

dateWarning

a logical value indicating if you want to be warned when a new version of lme4breeding is available on CRAN. Default is TRUE.

rotation

a logical value indicating if you want to compute the eigen decomposition of the relationship matrix to rotate the response vector y and the fixed effect matrix X in order to accelerate the computation. This argument requires the dataset to be balanced and without missing data for the slope variable, intercept variables, and the response involved in the model. See details to understand more about this argument and vignettes for examples.

rotationK

an integer value indicating the number of eigen vectors to compute when the rotation argument is set to TRUE. By default all eigen vectors are computed.

coefOutRotation

a numeric value denoting the inter-quantile outlier coefficient to be used in the rotation of the response when using the eigen decomposition to avoid overshooting.

returnFormula

a logical value indicating if you want to only get the results from the lFormula after the relmat and addmat arguments have been applied. This is normally just used for debugging.

suppressOpt

a logical value indicating if you want to force the model to use your customized variance components without estimating them. It skips the optimize(g)Lmer step to force the customized variance components. The variance components should be provided in the start argument. If you want to provide the variance components from a previous model you can get the initial values by running:

getME(mix1, 'sigma')

which returns a vector with theta values.

...

as in lmer

Author

Giovanny Covarrubias-Pazaran

Details

All arguments to this function are the same as those to the function lmer except relmat and addmat which must be named lists. Each name must correspond to the name of a grouping factor in a random-effects term in the formula. The observed levels of that factor must be contained in the rownames and columnames of the relmat. Each relmat is the relationship matrix restricted to the observed levels and applied to the model matrix for that term. The incidence matrices in the addmat argument must match the dimensions of the final fit (pay attention to missing data in responses).

When you use the relmat argument the square root of the relationship matrix will be computed internally. Therefore, to recover the correct BLUPs for those effects you need to use the ranef function which internally multiple the obtained BLUPs by the square root of the relationship matrix one more time to recover the correct BLUPs.

The argument rotation applies the eigen decomposition proposed by Lee and Van der Werf in 2016 and makes the genetic evaluation totally sparse leading to incredible gains in speed compared to the classical approach. Internally, the eigen decomposition UDU' is carried in the relationship matrix. The U matrix is then taken to the n x n level (n being the number of records), and post-multiplied by a matrix of records presence (n x n) using the element-wise multiplication of two matrices (Schur product). By default is not activated since this may not provide the exact same variance components than other software due to numerical reasons. If you would like to obtain the exact same variance components than other software you will have to keep rotation=FALSE. This will slow down considerably the speed. Normally when the rotation is activated and variance components differ slightly with other software they will still provide extremely similar estimates at the speed of hundreds or thousands of times faster. Please consider this.

Additional useful functions are; tps for spatial kernels, rrm for reduced rank matrices, atcg1234 for conversion of genetic markers, overlay for overlay matrices, reshape for moving wide format multi-trait datasets into long format.

Normally, using the optimizer argument inside the lmerControl keep in mind that the number of iterations is called differently depending on the optimizer. For Nelder_Mead, bobyqa and nlminbwrap is called "maxfun" whereas for nloptwrap is called "maxeval". This should be passed inside a list in the optCtrl argument. For example:

lmebreed(... , control = lmerControl( optimizer="Nelder_Mead", optCtrl=list(maxfun=100) ), ... )

But here, we have created the nIters argument to control the number of iterations in the optimizer. To predict values for unobserved levels you will need to impute the data and update your model with the new dataset and the initial starting values:

newModel <- update(oldModel, suppressOpt = TRUE, start=getME(oldModel, 'sigma'), data=imputedData)

It is also worth mentioning that when the user does not provide the (g)lmerControl we take the freedom to specify it to avoid some common warning or error messages such as calculating the second derivatives for big models or not allowing to have a single record per treatment level:

control <- (g)lmerControl( calc.derivs = FALSE, restart_edge = FALSE, check.nobs.vs.nlev = "ignore", check.nobs.vs.rankZ = "ignore", check.nobs.vs.nRE="ignore" )

This is important to notice because if the user decides to specify its own controls then the user should also consider some of these arguments that are the defaults in lmebreed.

Methods

Some important methods to keep in mind are:

fixef: allows you to extract fixed coefficients (BLUEs)

vcov: allows you to extract the standard errors of fixed effects

ranef: allows you to extract random coefficients (BLUPs) and their standard errors

getCi: allows you to extract the predicted error variance (PEV) matrix from a model

mkMmeIndex: alllows you to extract the indices of the different fixed and random coefficients

predict: allows you to extract fixed coefficients (BLUEs)

Example Datasets

The package has been equiped with several datasets to learn how to use the lme4breeding package:

* DT_big simulated dataset containing 1K individuals in 50 environments to show how to fit big models.

* DT_btdata dataset contains an animal (birds) model.

* DT_cornhybrids dataset to perform genomic prediction in hybrid single crosses

* DT_cpdata dataset to fit genomic prediction models within a biparental population coming from 2 highly heterozygous parents including additive, dominance and epistatic effects.

* DT_fulldiallel dataset with examples to fit full diallel designs.

* DT_gryphon data contains an example of an animal model including pedigree information.

* DT_halfdiallel dataset with examples to fit half diallel designs.

* DT_h2 to calculate heritability

* DT_ige dataset to show how to fit indirect genetic effect models.

* DT_legendre simulated dataset for random regression model.

* DT_mohring datasets with examples to fit half diallel designs.

* DT_polyploid to fit genomic prediction and GWAS analysis in polyploids.

* DT_sleepstudy dataset to know how to translate lme4 models to lme4breeding models.

* DT_technow dataset to perform genomic prediction in hybrid single crosses

* DT_wheat dataset to do genomic prediction in single crosses in species displaying only additive effects.

References

Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted.

Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.

Lee & Van der Werf (2016). MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics, 32(9), 1420-1422.

Examples

Run this code


data(DT_example)
DT <- DT_example
A <- A_example

ansMain <- lmebreed(Yield ~ Env + (1|Name),
                        relmat = list(Name = A ),
                        data=DT)
vc <- VarCorr(ansMain); print(vc,comp=c("Variance"))
sigma(ansMain)^2 # error variance

BLUP <- ranef(ansMain, condVar=TRUE)
PEV <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs

Run the code above in your browser using DataLab