Fits linear or generalized linear mixed models incorporating known relationships (e.g., genetic relationship matrices) and customized random effects (e.g., overlay models). Big-data genomic models can be fitted using the eigen decomposition proposed by Lee and Van der Werf (2006).
lmebreed(formula, data, REML = TRUE, control = list(), start = NULL,
verbose = 1L, subset, weights, na.action, offset, contrasts = NULL,
calc.derivs = FALSE, nIters=100,
# lmebreed special params
family = NULL, relmat = list(), addmat = list(), trace=1L,
dateWarning=TRUE, rotation=FALSE, rotationK=NULL, coefOutRotation=8,
returnFormula=FALSE, suppressOpt=FALSE, ...)
a lmebreed
object.
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmer
as in lmerControl
. The intention of having
this argument at the level of main argument is to set it as FALSE as default
to speed up computation. This means that in lme4breeding if you pass calc.derivs
inside the lmerControl object it won't have any effect. You need to set it using the
argument in the lmebreed function.
the number of iterations used by the optimizeGlmer
and optimizeLmer
functions. Internally, it replaces the values of
the maxfun
and maxeval
arguments in the lmerControl
optCtrl passed.This means that in lme4breeding if you pass calc.derivs
inside the lmerControl object it won't have any effect. You need to set it using the
argument in the lmebreed function.
as in glmer
an optional named list of relationship matrices between levels of a
given random effect (not the inverse).
Internally the Cholesky decomposition of those matrices is computed to adjust the
incidence matrices. The names of the elements in the list must correspond to
the names of slope factors for random-effects terms in the formula
argument.
an optional named list of customized incidence matrices.
The names of the elements must correspond to the names of grouping factors for
random-effects terms in the formula
argument. Depending on the use-case
the element in the list may be a single matrix or a list of matrices. Please see
examples and vignettes to learn how to use it.
integer scalar. If > 0 verbose output is generated during the progress of the model fit.
a logical value indicating if you want to be warned when a new version of lme4breeding is available on CRAN. Default is TRUE.
a logical value indicating if you want to compute the eigen decomposition of the relationship matrix to rotate the response vector y and the fixed effect matrix X in order to accelerate the computation. This argument requires the dataset to be balanced and without missing data for the slope variable, intercept variables, and the response involved in the model. See details to understand more about this argument and vignettes for examples.
an integer value indicating the number of eigen vectors to compute when the rotation argument is set to TRUE. By default all eigen vectors are computed.
a numeric value denoting the inter-quantile outlier coefficient to be used in the rotation of the response when using the eigen decomposition to avoid overshooting.
a logical value indicating if you want to only get the results from
the lFormula
after the relmat and addmat arguments have been applied.
This is normally just used for debugging.
a logical value indicating if you want to force the model to use
your customized variance components without estimating them. It skips the
optimize(g)Lmer step to force the customized variance components. The variance components
should be provided in the start
argument. If you want to provide the variance
components from a previous model you can get the initial values by running:
getME(mix1, 'sigma')
which returns a vector with theta values.
as in lmer
Giovanny Covarrubias-Pazaran
All arguments to this function are the same as those to the function
lmer
except relmat
and addmat
which must be
named lists. Each name must correspond to the name of a grouping factor in a
random-effects term in the formula
. The observed levels
of that factor must be contained in the rownames and columnames of the relmat.
Each relmat is the relationship matrix restricted
to the observed levels and applied to the model matrix for that term. The incidence
matrices in the addmat argument must match the dimensions of the final fit (pay
attention to missing data in responses).
When you use the relmat
argument the square root of the relationship matrix
will be computed internally. Therefore, to recover the correct BLUPs for those effects
you need to use the ranef
function which internally multiple the
obtained BLUPs by the square root of the relationship matrix one more time to recover the correct BLUPs.
The argument rotation
applies the eigen decomposition proposed by Lee and Van der Werf in 2016
and makes the genetic evaluation totally sparse leading to incredible gains in speed compared
to the classical approach. Internally, the eigen decomposition UDU' is carried in the relationship
matrix. The U matrix is then taken to the n x n level (n being the number of records), and post-multiplied
by a matrix of records presence (n x n) using the element-wise multiplication of two matrices (Schur product).
By default is not activated since this may not provide the exact same variance components than other software due to
numerical reasons. If you would like to obtain the exact same variance components than other software you will
have to keep rotation=FALSE
. This will slow down considerably the speed. Normally when the rotation is
activated and variance components differ slightly with other software they will still provide extremely similar
estimates at the speed of hundreds or thousands of times faster. Please consider this.
Additional useful functions are; tps
for spatial kernels, rrm
for reduced rank matrices, atcg1234
for conversion of genetic markers,
overlay
for overlay matrices, reshape
for moving wide
format multi-trait datasets into long format.
Normally, using the optimizer argument inside the lmerControl
keep
in mind that the number of iterations is called differently depending on the optimizer.
For Nelder_Mead
, bobyqa and nlminbwrap
is
called "maxfun" whereas for nloptwrap
is called "maxeval".
This should be passed inside a list in the optCtrl
argument. For example:
lmebreed(... ,
control = lmerControl(
optimizer="Nelder_Mead",
optCtrl=list(maxfun=100)
), ...
)
But here, we have created the nIters
argument to control the number of iterations
in the optimizer. To predict values for unobserved levels you will need to impute the data and update
your model with the new dataset and the initial starting values:
newModel <- update(oldModel, suppressOpt = TRUE,
start=getME(oldModel, 'sigma'),
data=imputedData)
It is also worth mentioning that when the user does not provide the (g)lmerControl we take the freedom to specify it to avoid some common warning or error messages such as calculating the second derivatives for big models or not allowing to have a single record per treatment level:
control <- (g)lmerControl(
calc.derivs = FALSE,
restart_edge = FALSE,
check.nobs.vs.nlev = "ignore",
check.nobs.vs.rankZ = "ignore",
check.nobs.vs.nRE="ignore"
)
This is important to notice because if the user decides to specify its own controls then the user should also consider some of these arguments that are the defaults in lmebreed.
Methods
Some important methods to keep in mind are:
fixef
: allows you to extract fixed coefficients (BLUEs)
vcov
: allows you to extract the standard errors of fixed effects
ranef
: allows you to extract random coefficients (BLUPs) and their standard errors
getCi
: allows you to extract the predicted error variance (PEV) matrix from a model
mkMmeIndex
: alllows you to extract the indices of the different fixed and random coefficients
predict
: allows you to extract fixed coefficients (BLUEs)
Example Datasets
The package has been equiped with several datasets to learn how to use the lme4breeding package:
* DT_big
simulated dataset containing 1K individuals in 50 environments to show how to fit big models.
* DT_btdata
dataset contains an animal (birds) model.
* DT_cornhybrids
dataset to perform genomic prediction in hybrid single crosses
* DT_cpdata
dataset to fit genomic prediction models within a biparental population coming from 2 highly heterozygous parents including additive, dominance and epistatic effects.
* DT_fulldiallel
dataset with examples to fit full diallel designs.
* DT_gryphon
data contains an example of an animal model including pedigree information.
* DT_halfdiallel
dataset with examples to fit half diallel designs.
* DT_h2
to calculate heritability
* DT_ige
dataset to show how to fit indirect genetic effect models.
* DT_legendre
simulated dataset for random regression model.
* DT_mohring
datasets with examples to fit half diallel designs.
* DT_polyploid
to fit genomic prediction and GWAS analysis in polyploids.
* DT_sleepstudy
dataset to know how to translate lme4 models to lme4breeding models.
* DT_technow
dataset to perform genomic prediction in hybrid single crosses
* DT_wheat
dataset to do genomic prediction in single crosses in species displaying only additive effects.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Lee & Van der Werf (2016). MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics, 32(9), 1420-1422.
data(DT_example)
DT <- DT_example
A <- A_example
ansMain <- lmebreed(Yield ~ Env + (1|Name),
relmat = list(Name = A ),
data=DT)
vc <- VarCorr(ansMain); print(vc,comp=c("Variance"))
sigma(ansMain)^2 # error variance
BLUP <- ranef(ansMain, condVar=TRUE)
PEV <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
Run the code above in your browser using DataLab