HLfit: Fit mixed models with given correlation matrix

Description

This function fits GL(M)Ms as well as some hierarchical generalized linear models (HGLM; Lee and Nelder 2001). It may be called on its own, but it is now better seen as a backend for the main fitting function fitme (or fitmv for multivariate-response models). This documentation complements the documentation of the latter functions with respect to some arguments they pass to HLfit and with respect to the structure of the objects they return.

On its own, HLfit fits both fixed-effect parameters and dispersion parameters, i.e. the variance of the random effects (full covariance for random-coefficient models) and the variance of the residual error. The linear predictor is of the standard form \(\textrm{offset} + X\beta + Zb\), where \(X\) is the design matrix of fixed effects and \(Z\) is a design matrix of random effects (typically an incidence matrix with 0s and 1s, but not necessarily). Models are fitted by an iterative algorithm alternating estimation of fixed effects and of dispersion parameters. The residual dispersion may follow a “structured-dispersion model” modeling heteroscedasticity. Dispersion parameters are estimated by a form of fit of debiased residuals, which is what allows fitting a structured-dispersion model (Smyth et al. 2001). However, evaluating the debiased residuals can be slow, in particular for large datasets. For models without structured dispersion, it is then worth using the fitme function. This function (as well as corrHLfit) can optimize the likelihood of HLfit fits over different given values of the dispersion parameters (“outer optimization”), thereby avoiding the need to estimate debiased residuals.
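As a rough illustration of the two routes described above, the following sketch fits the same Gamma(log) model to the wafers data once with HLfit (inner estimation of the dispersion parameters) and once with fitme. It assumes the spaMM package is installed. fitme decides internally whether outer optimization is actually used for a given model, so the how extractor is called to inspect what was done rather than assuming it.

```r
library(spaMM)
data("wafers")

## HLfit: dispersion parameters (lambda, phi) estimated by the inner
## iterative algorithm, through fits of debiased residuals
fit_HLfit <- HLfit(y ~ X1 + X2 + (1|batch), family = Gamma(log),
                   data = wafers, method = "REML")

## fitme: same model (no structured dispersion), so the dispersion
## parameters may instead be estimated by outer optimization
fit_fitme <- fitme(y ~ X1 + X2 + (1|batch), family = Gamma(log),
                   data = wafers, method = "REML")

## Restricted likelihoods reached by both routes
logLik(fit_HLfit)
logLik(fit_fitme)

## Information about how the fitme fit was performed
how(fit_fitme)
```

Note that method = "REML" is given explicitly to fitme so that both fits target the same likelihood.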

Usage

HLfit(formula, data, family = gaussian(), rand.family = gaussian(), 
      resid.model = ~1, REMLformula = NULL, verbose = c(inner = FALSE), 
      HLmethod = "HL(1,1)", method="REML", control.HLfit = list(), 
      control.glm = list(), init.HLfit = list(), fixed=list(), ranFix, 
      etaFix = list(), prior.weights = NULL, weights.form = NULL, X2X=NULL, 
      processed = NULL)
## see 'rand.family' argument for inverse.Gamma

Value

An object of class HLfit, which is a list with many elements, not all of which are documented.

Various extractor functions are available (see extractors, vcov, get_fittedPars, get_matrix, and so on). They should be used whenever possible, as they are intended to remain backward-compatible from version 2.0.0 onwards, whereas the structure of the returned object may still evolve. The following information may be useful for extracting further elements of the object.

Elements include descriptors of the fit:

eta: Fitted values on the linear scale (including the predicted random effects). predict(.,type="link") can be used as a formal extractor;
fv: Fitted values \(\mu\) = inverse-link(\(\eta\)) of the response variable. fitted(.) or predict(.) can be used as formal extractors;
fixef: The fixed effects coefficients, \(\beta\) (returned by the fixef function);
v_h: The random effects on the linear scale, \(v\), with the random effects \(u\) as an attribute (returned by ranef(., type="uncorrelated"));
phi: The residual variance \(\phi\). See residVar for one extractor;
phi.object: A possibly more complex object describing \(\phi\) (see residVar again);
lambda: The random-effect (\(u\)) variance(s) \(\lambda\) in compact form;
lambda.object: A possibly more complex object describing \(\lambda\) (see the get_ranPars(., which="lambda") and VarCorr extractors);
ranef_info: environment where information about the structure of random effects is stored (see Corr);
corrPars: Agglomerates information on correlation parameters, either fixed or estimated (see get_ranPars(., which="corrPars"));
APHLs: A list whose elements are various likelihood components, including conditional likelihood, h-likelihood, and the Laplace approximations: the (approximate) marginal likelihood p_v and the (approximate) restricted likelihood p_bv (the latter two available through the logLik function). See the extractor function get_any_IC for information criteria (“AIC”) and effective degrees of freedom;

The covariance matrix of \(\beta\) estimates is not included as such, but can be extracted by vcov.
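The extractors named in the list above can be sketched on a fitted object as follows. This assumes the spaMM package and its wafers data; the formula is a simplified variant of the one used in Examples, chosen only to keep the fit small. The square roots of the diagonal of vcov are conditional standard errors in the sense of Details III.

```r
library(spaMM)
data("wafers")

fit <- HLfit(y ~ X1 + X2 + (1|batch), family = Gamma(log),
             resid.model = ~ X3, data = wafers)

beta    <- fixef(fit)                          # element 'fixef'
eta     <- predict(fit, type = "link")         # element 'eta'
mu      <- fitted(fit)                         # element 'fv'
u       <- ranef(fit, type = "uncorrelated")   # random effects u
lambda  <- get_ranPars(fit, which = "lambda")  # random-effect variance(s)
phi_var <- residVar(fit)                       # residual variance information
ll      <- logLik(fit)                         # p_bv, since the default method is REML
ICs     <- get_any_IC(fit)                     # information criteria
V_beta  <- vcov(fit)                           # covariance matrix of beta estimates
sqrt(diag(V_beta))                             # conditional SEs of beta
```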

Information about the input is contained in output elements named as arguments of the fitting function calls (data,family,resid.family,ranFix,prior.weights), with the following notable exceptions or modifications:

predictor: The formula, possibly reformatted (returned by the formula extractor);
resid.predictor: Analogous to predictor, for the residual variance (see residVar(., which="formula"));
rand.families: corresponding to the rand.family input;

Further miscellaneous diagnostics and descriptors of model structure:

X.pv: The design matrix for fixed effects (returned by the model.matrix extractor);
ZAlist,strucList: Two lists of matrices, respectively the design matrices “Z”, and the “L” matrices, for the different random-effect terms. The extractor get_ZALMatrix can be used to reconstruct a single “ZL” matrix for all terms.
BinomialDen: (binomial data only) the binomial denominators;
y: the response vector; for binomial data, the frequency response.
models: Additional information on model structure for \(\eta\), \(\lambda\) and \(\phi\);
HL: A set of indices that characterize the approximations used for likelihood;
lev_phi, lev_lambda: Leverages (see the hatvalues extractor);
dfs: list (possibly structured): some information about degrees of freedom for different components of the model. Its details may be difficult to interpret, so the DoF extractor should be used instead;
how: A list containing the information properly extracted by the how function;
warnings: A list of warnings for events that may have occurred during the fit.

Finally, the object includes programming tools: call, spaMM.version, fit_time, and an environment envir that may contain whatever is needed in some post-fit operations.
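A minimal sketch of how the input descriptors, structural elements and programming tools listed above can be reached through extractors rather than by direct list access. It assumes the spaMM package and its wafers data; direct access with $ is shown only for the programming-tool elements named above.

```r
library(spaMM)
data("wafers")

fit <- HLfit(y ~ X1 + X2 + (1|batch), family = Gamma(log),
             resid.model = ~ X3, data = wafers)

formula(fit)                      # element 'predictor'
residVar(fit, which = "formula")  # element 'resid.predictor'
X   <- model.matrix(fit)          # element 'X.pv'
ZL  <- get_ZALMatrix(fit)         # single "ZL" matrix for all random-effect terms
lev <- hatvalues(fit)             # leverages
DoF(fit)                          # degrees of freedom, rather than element 'dfs'
how(fit)                          # information returned by the how function
fit$spaMM.version                 # programming tools stored in the object
fit$fit_time
```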

Arguments

formula: A formula; or a predictor, i.e. a formula with attributes created by Predictor, if design matrices for random effects have to be provided. See Details in spaMM for allowed terms in the formula (except spatial ones).
data: A data frame containing the variables named in the model formula.
family: A family object describing the distribution of the response variable. See Details in spaMM for handled families.
rand.family: A family object describing the distribution of the random effect, or a list of family objects for different random effects (see Examples). Possible options are gaussian(), Gamma(log), Gamma(identity) (see Details), Beta(logit), inverse.Gamma(-1/mu), and inverse.Gamma(log). For a discussion of these alternatives, see Lee and Nelder (2001) or Lee et al. (2006, p. 178 ff.). Here the family gives the distribution of a random effect \(u\) and the link gives \(v\) as a function of \(u\) (see Details). If there are several random effects and only one family is given, this family holds for all random effects.
resid.model: Used to specify a model for the dispersion parameter of the mean-response family. See the resid.model documentation, and the more specific phi-resid.model one for the \(\phi\) parameter of gaussian and Gamma response families.
REMLformula: A model formula that controls the estimation of dispersion parameters and the computation of restricted likelihood (p_bv), where the conditioning inherent in REML is defined by a model different from the predictor formula. A simple example (useless in practice) of its effect is to replicate an ML fit by specifying method="REML" and an REMLformula with no fixed effect. The latter implies that no conditioning is performed and that p_bv equals the marginal likelihood (or its approximation), p_v. One of the examples in update.HLfit shows how REMLformula can be useful, but otherwise this argument may never be needed for standard REML or ML fits. For non-standard likelihood ratio tests using REMLformula, see fixedLRT.
verbose: A vector of booleans or integers. The inner element controls various diagnostic messages (possibly messy) about the iterations. This should be distinguished from the TRACE element, meaningful in fitme or corrHLfit calls, and much more useful. The phifit element controls messages about the progress of phi-resid.model fits (see the latter documentation).
method: Character: the fitting method. Allowed values include "REML", "ML", "EQL-" and "EQL+" for all models, and "PQL" (="REPQL") and "PQL/L" for GLMMs only. method=c(<"ML" or "REML">,"exp") can be distinctly useful for slow fits of models with Gamma(log) response family. See method for details, and for further possible values for those curious to experiment. The default is "REML" (standard REML for LMMs, an extended definition for other models). REML can be viewed as a form of conditional inference, and non-standard conditionings can be called by using a non-standard REMLformula.
HLmethod: Same as method. It is useless to specify HLmethod when method is specified. The default value "HL(1,1)" means the same as method="REML", but more accurately relates to definitions of approximations of likelihood in the \(h\)-likelihood literature.
control.HLfit: A list of parameters controlling the fitting algorithms, which should mostly be ignored in routine use. See control.HLfit for possible controls.
control.glm: List of parameters controlling calls to glm-“like” fits, passed to glm.control; e.g. control.glm=list(maxit=100). See glm.control for further details. glm-“like” fits may be performed as part of mixed-effect model fitting procedures, in particular to provide initial values (possibly using llm.fit for non-GLM families), and for “inner” estimation of dispersion parameters.
init.HLfit: A list of initial values for the iterative algorithm. Possible elements of the list are fixef for fixed-effect estimates (beta), v_h for the random-effects vector v in the linear predictor, lambda for the parameter determining the variance of random effects \(u\) as drawn from the rand.family distribution, and phi for the residual variance. This argument can be ignored in routine use.

fixed, ranFix: A list of fixed values of random effect parameters. ranFix is the old argument, maintained for back compatibility; fixed is the new argument, uniform across spaMM fitting functions. See ranFix for further information.
etaFix: A list of given values of the coefficients of the linear predictor. See etaFix for further information.
prior.weights: An optional vector of prior weights as in glm. This fits the data to a probability model with residual variance parameter given as phi/prior.weights instead of the canonical parameter phi of the response family, and all further outputs are defined to be consistent with this (see section IV in Details).
weights.form: Specification of prior weights by a one-sided formula: use weights.form = ~ pw instead of prior.weights = pw. The effect will be the same except that such an argument, known to evaluate to an object of class "formula", is suitable to enforce safe programming practices (see good-practice).
X2X: For development purposes, not documented.
processed: A list of preprocessed arguments, for programming purposes only.
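The control-type arguments above can be sketched as follows on the wafers data (spaMM assumed installed). The numeric values (lambda = 0.05) are arbitrary illustrations, not recommendations. For etaFix, this sketch assumes that a full named vector of coefficients, as returned by fixef, is an acceptable value for the beta element; see etaFix for the exact accepted forms.

```r
library(spaMM)
data("wafers")
form <- y ~ X1 + X2 + (1|batch)

## Reference fit
fit0 <- HLfit(form, family = Gamma(log), data = wafers)

## 'fixed': random-effect variance lambda held at a given value
fit_lam <- HLfit(form, family = Gamma(log), data = wafers,
                 fixed = list(lambda = 0.05))

## 'etaFix': fixed-effect coefficients held at the reference estimates
fit_eta <- HLfit(form, family = Gamma(log), data = wafers,
                 etaFix = list(beta = fixef(fit0)))

## 'init.HLfit' and 'control.glm': a starting value for lambda, and a
## larger iteration limit for the internal glm-like fits
fit_ctl <- HLfit(form, family = Gamma(log), data = wafers,
                 init.HLfit = list(lambda = 0.05),
                 control.glm = list(maxit = 100))
```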

Details

I. Approximations of likelihood: see method.
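A sketch comparing some of the likelihood approximations selectable through method, HLmethod and REMLformula, on the wafers data (spaMM assumed installed). The REMLformula call follows the description of the REMLformula argument (an REML fit with no fixed effect in REMLformula); writing it as a full formula with `0 +` is an assumption of this sketch.

```r
library(spaMM)
data("wafers")
form <- y ~ X1 + X2 + (1|batch)

## Same approximation, two spellings
fit_REML <- HLfit(form, family = Gamma(log), data = wafers, method = "REML")
fit_HL11 <- HLfit(form, family = Gamma(log), data = wafers, HLmethod = "HL(1,1)")

## Marginal-likelihood fit, and a PQL/L fit (GLMMs only)
fit_ML  <- HLfit(form, family = Gamma(log), data = wafers, method = "ML")
fit_PQL <- HLfit(form, family = Gamma(log), data = wafers, method = "PQL/L")

## REML with no fixed effect in REMLformula: no conditioning is performed,
## so p_bv is expected to equal the marginal likelihood p_v of the ML fit
fit_noCond <- HLfit(form, family = Gamma(log), data = wafers, method = "REML",
                    REMLformula = y ~ 0 + (1|batch))

logLik(fit_REML); logLik(fit_HL11)
logLik(fit_ML); logLik(fit_noCond)
```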

II. Possible structure of Random effects: see random-effects, but note that HLfit does not fit models with autocorrelated random effects.
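Independent (not autocorrelated) random effects can each be given their own distribution through a list passed to rand.family. The following sketch simulates Poisson counts with a gaussian random effect on the linear scale for one grouping factor and a Gamma-distributed multiplicative effect (mean 1) for another; the data, factor names and parameter values are all made up for illustration.

```r
library(spaMM)
set.seed(123)

## Simulated design: 20 x 10 crossed grouping factors
dat <- expand.grid(g1 = factor(1:20), g2 = factor(1:10))
dat$x <- rnorm(nrow(dat))
b1 <- rnorm(20, sd = 0.5)              # gaussian effects v for g1
u2 <- rgamma(10, shape = 5, rate = 5)  # Gamma effects u (mean 1) for g2
dat$y <- rpois(nrow(dat), exp(1 + 0.3 * dat$x + b1[dat$g1]) * u2[dat$g2])

## One family per random-effect term, in the order of the terms
fit <- HLfit(y ~ x + (1|g1) + (1|g2), family = poisson(), data = dat,
             rand.family = list(gaussian(), Gamma(log)))
fit
```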

III. The standard errors reported may sometimes be misleading. For each set of parameters among \(\beta\), \(\lambda\), and \(\phi\) parameters these are computed assuming that the other parameters are known without error. This is why they are labelled Cond. SE (conditional standard error). This is most uninformative in the unusual case where \(\lambda\) and \(\phi\) are not separately estimable parameters. Further, the SEs for \(\lambda\) and \(\phi\) are rough approximations as discussed in particular by Smyth et al. (2001; \(V_1\) method).

IV. Prior weights. These control the likelihood analysis of heteroscedastic models. In particular, changing the weights by a constant factor f should, and will, yield a fit with unchanged likelihood and (Intercept) estimates of phi also increased by f (except if a non-trivial resid.formula with log link is used). This is consistent with what glm does, but other packages may not follow this logic (whatever their documentation may say: check by yourself by changing the weights by a constant factor). Further, post-fit functions (in particular those extracting various forms of residuals) may be inconsistent in their handling of prior weights.
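The check suggested above can be sketched as follows (spaMM assumed installed). The weights are arbitrary values generated for illustration, and a gaussian response family is used only for simplicity. The sketch also contrasts weights.form with prior.weights, which are documented to have the same effect.

```r
library(spaMM)
data("wafers")
dat <- wafers
set.seed(1)
dat$pw  <- runif(nrow(dat), 0.5, 2)  # arbitrary prior weights
dat$pw2 <- 2 * dat$pw                # same weights scaled by f = 2

form <- y ~ X1 + X2 + (1|batch)
fit_w  <- HLfit(form, family = gaussian(), data = dat, weights.form = ~ pw)
fit_pw <- HLfit(form, family = gaussian(), data = dat, prior.weights = dat$pw)
fit_w2 <- HLfit(form, family = gaussian(), data = dat, weights.form = ~ pw2)

logLik(fit_w); logLik(fit_pw)  # same model, two ways of giving weights
logLik(fit_w2)                 # expected unchanged by the constant factor
summary(fit_w2)                # phi estimate expected about f times that of fit_w
```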

References

Lee, Y. and Nelder, J. A. (2001). Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88, 987-1006.

Lee, Y., Nelder, J. A. and Pawitan, Y. (2006). Generalized linear models with random effects: unified analysis via h-likelihood. Chapman & Hall: London.

Smyth, G. K., Huele, A. F. and Verbyla, A. P. (2001). Exact and approximate REML for heteroscedastic regression. Statistical Modelling 1, 161-175.

Examples


library(spaMM)
data("wafers")

## Gamma GLMM with log link
HLfit(y ~ X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch), family=Gamma(log),
      resid.model = ~ X3+I(X3^2), data=wafers)

## Gamma - inverseGamma HGLM with log link
HLfit(y ~ X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch), family=Gamma(log),
      rand.family=inverse.Gamma(log),
      resid.model = ~ X3+I(X3^2), data=wafers)
