HLfit: Fit mixed models with given correlation matrix

Description

This fonction fits GLMMs as well as some hierarchical generalized linear models (HGLM; Lee and Nelder 2001). HLfit fits both fixed effects parameters, and dispersion parameters i.e. the variance of the random effects and the variance of the residual error. The linear predictor is of the standard form offset+ X beta + Z v, where X is the design matrix of fixed effects and Z is a design matrix of random effects. The function also handles a linear predictor (with only fixed effects) for the residual variance.

Usage

HLfit(formula, data, family = gaussian(), rand.family = gaussian(), 
      resid.formula = ~1, REMLformula = NULL, 
      verbose=c(warn=TRUE,trace=FALSE,summary=FALSE),
      HLmethod = "HL(1,1)", control.HLfit = list(), init.HLfit = list(), 
      ranFix = list(), etaFix = list(), prior.weights = rep(1, nobs), processed = NULL)
## see 'rand.family' argument for inverse.Gamma
## see 'control.HLfit' argument for AIC

Arguments

formula

A formula; or a predictor, i.e. a formula with attributes created by Predictor, if design matrices for random effects have to be provided. See

data

A data frame containing the variables named in the model formula.

family

A family object describing the distribution of the response variable. Possible values include the gaussian, poisson, binomial and Gamma families. Possible links are identity, log, inverse, logit, probit, and cloglog. Possible combina

rand.family

A family object describing the distribution of the random effect, or a list of family objects for different random effects (see Examples). Possible options are gaussian(), Gamma(log), Beta(l

resid.formula

A formula (without left-hand side) for the variance phi of the residual error. Currently can only contain fixed effects, including an offset. A log link is assumed by default, but an identity link can be used (see control.HLfit).

REMLformula

A model formula that allows the estimation of dispersion parameters, and computation of restricted likelihood (p_bv) under a model different from the predictor formula. For example, if only random effects ar

verbose

A vector of booleans. trace controls various diagnostic (possibly messy) messages about the iterations. summary controls whether a summary of the fit is called by HLfit. warn is for programming

HLmethod

Allowed values are "REML", "ML", "EQL-" and "EQL+" for all models; "PQL" (="REPQL") and "PQL/L" for GLMMs only; and (only for those curious to experiment) e

control.HLfit

A list of parameters controlling (1) AIC computation; and (2) the fitting algorithms. AIC=TRUE provides several information criteria and a related effective degrees of freedom (see Details). The default is FALS

init.HLfit

A list of initial values for the iterative algorithm, with possible elements of the list are fixef for fixed effect estimates (beta), v_h for random effects vector v in the linear predictor, lambda

ranFix

A list of fixed values of random effect parameters, with possible elements lambda, and also phi for gaussian and Gamma HGLMs. Inhibits the estimation of these parameters.

etaFix

A list of fixed values of the coefficients of the linear predictor, with currently documented element beta. etaFix$beta should be a vector with names matching (a subset of) coefficient names of a fit without fixed values. It prov

prior.weights

An optional vector of prior weights as in glm. This fits the data to a model with residual variance phi/prior.weights, so that increasing the weights by a constant factor f will yield (

processed

A list of preprocessed arguments, for programming purposes only (as in corrHLfit code).

Value

An object of class HLfit, actually a list with many elements, several of which represent input arguments. Some elements may be undocumented. A few extractor functions are available (see extractors), and should be used as far as possible as they should be backward-compatible from version 1.4 onwards, while the structure of the return object may still evolve (to further deal with this, the return object includes a version tag as element spaMM.version). The following information will be useful for extracting further elements of the object. Elements describing the fit include:
fixefThe fixed effects coefficients, $\beta$ (returned by the fixef function)
ranefThe random effects $u$ (returned by the ranef function)
fvFitted values ($\mu=$($\eta$)) of the response variable (returned by the fitted function)
APHLsA list with usually four elements, the conditional likelihood, the h-likelihood, and the two adjusted profile h-likelihoods: the (approximate) marginal likelihood p_v and the (approximate) restricted likelihood p_bv (the latter two available through the logLik function).
beta_covCovariance matrix of $\beta$ estimates
v_hThe random effects on the linear scale, $v$
phiThe residual variance $\phi$
phi.objectA possibly more complex object describing $\phi$
lambdaThe random effects ($u$) variance $\lambda$
lambda.objectA possibly more complex object describing $\lambda$
etaFitted values on the linear scale (including the predicted random effects)
It may also be worth checking the following elements
HLA set of indices that characterize the approximations used for likelihood.
warningsA list of warnings for events that may have occurred during the fit.
Additional information about the fit is contained in
ZALmatrixThe design matrix for random effects (see Details).
and in (possibly reformatted) input arguments
predictorThe response predictor.
dataThe input data.
familya family object corresponding to the family input
rand.familya family object corresponding to the rand.family input
ythe response vector; for binomial data, the frequency response
formulaDispModel formula for dispersion response
resid.familyFamily for estimation of dispersion from residuals
XThe design matrix for fixed effects
ranFixA ranFix input
corrParsAdditional information on correlation parameters, not necessarily used by HLfit itself but in upper calling functions such as HLCor or corrHLfit
modelsAdditional information on model structure for $\eta$, $\lambda$ and $\phi$
weights(binomial data only) the binomial denominators

Details

Fitting methods: Many approximations for likelihood have been defined to fit mixed models (e.g. Noh and Lee (2007) for some overview), and this function only considers a subset of them, but it adds a new complication in terms of REML methods. For example, PQL as originally defined by Breslow and Clayton uses REML to estimate dispersion parameters, but this function allows one to use ML instead. Moreover, it allows some non-standard specification of the model formula that determines the conditional distribution used in REML. In the more general syntax for HLmethod, used as e.g. HLmethod="RE(1,1)" the first '1' means that a first order Laplace approximation to the likelihood is used to estimate fixed effects (a '0' would instead mean that the h likelihood is used as the objective function). The second '1' means that a first order Laplace approximation to the likelihood or restricted likelihood is used to estimate dispersion parameters, including the dv/d tau term specifically discussed by Lee & Nelder 2001, p. 997 (a '0' would instead mean that these terms are ignored). It is possible to enforce the EQL approximation for estimation of dispersion parameter by adding a third index with value 0. "HL(0,1,0)" is Lee & Nelder's (2001) method, i.e. "EQL+". For a Gamma GLM with log link, ML and EQL results will differ in their phi estimates, and the EQL estimate will match that from the glm function. Random effects are constructed in several steps. first, a vector u of independent and identically distributed (iid) random effects is drawn from some distribution; second, a transformation v=f(u) is applied to each element (this defines v which elements are still iid); third, correlated random effects are obtained as Lv where L is the square root of a correlation matrix (this may be meaningful only for Gaussian random effects). Finally, a matrix Z (or sometimes ZA, see Predictor) allows to specify how the correlated random effects affect the response values. In particular, Z is the identity matrix if there is a single observation (response) for each location, but otherwise its elements $z_{ji}$ are 1 for the $j$th observation in the $i$th location. The design matrix for v is then of the form ZL. The specification of the random effects u and v handles the following cases: [object Object],[object Object],[object Object],[object Object] The standard errors reported may sometimes be misleading. For each set of parameters among $\beta$, $\lambda$, and $\phi$ parameters these are computed assuming that the other parameters are known without error. This is why they are labelled Cond. SE (conditional standard error). This is most uninformative in the unusual case where $\lambda$ and $\phi$ are not separately estimable parameters. Further, the SEs for $\lambda$ and $\phi$ are rough approximations as discussed in particular by Smyth et al. (2001; $V_1$ method). Effective degrees of freedom for the random effects: this number (approximately) characterizes the expectation of a goodness of fit statistic discussed by Lee and Nelder (2001), which gave a general formula for it in HGLMs. It is computed if AIC=TRUE and returned as the object's $APHLs$GoFdf. AIC: The literature considers several information criteria for mixed models. The conditional AIC (Vaida and Blanchard 2005) is notable in involving the conditional likelihood and the effective degrees of freedom. Lee et al. (2006) and Ha et al (2007) similarly define a corrected AIC [i.e., AIC(D*) in their eq. 7]. The conditional AIC returned by HLfit includes both this effective df, df for estimated fixed effects, and df for estimated parameters of the variance of random effects. HLfit will also return the classical AIC (marginal AIC), and a focussed AIC for dispersion parameters, also discussed by Ha et al (2007; eq.10).

References

Breslow, NE, Clayton, DG. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88, 9-25. Cox, D. R. and Donnelly C. A. (2011) Principles of Applied Statistics. Cambridge Univ. Press. Ha, I. D., Lee, Y. and MacKenzie, G. (2007) Model selection for multi-component frailty models. Statistics in Medicine 26: 4790-4807. Lee, Y., Nelder, J. A. (2001) Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88, 987-1006. Lee, Y., Nelder, J. A. and Pawitan, Y. (2006). Generalised linear models with random effects: unified analysis via h-likelihood. Chapman & Hall: London. Noh, M., and Lee, Y. (2007). REML estimation for binary data in GLMMs, J. Multivariate Anal. 98, 896-915. Smyth GK, Huele AF, Verbyla AP (2001). Exact and approximate REML for heteroscedastic regression. Statistical Modelling 1, 161-175. Vaida, F., and Blanchard, S. 2005. Conditional Akaike information for mixed-effects models. Biometrika 92, 351-370.

Examples

Run this code

data(wafers)
## Gamma GLMM with log link
HLfit(y ~X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch),family=Gamma(log),
          resid.formula = ~ X3+I(X3^2) ,data=wafers)## Gamma - inverseGamma HGLM with log link
HLfit(y ~X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch),family=Gamma(log),
          HLmethod="HL(1,1)",rand.family=inverse.Gamma(log),
          resid.formula = ~ X3+I(X3^2) ,data=wafers)

Run the code above in your browser using DataLab