ref.grid: Create a reference grid from a fitted model

Description

Using a fitted model object, determine a reference grid for which least-squares means are defined. The resulting ref.grid object encapsulates all the information needed to calculate LS means and make inferences on them.

Usage

ref.grid(object, at, cov.reduce = mean, mult.name, mult.levs, 
    options = get.lsm.option("ref.grid"), data, df, type, 
    transform = c("none", "response", "mu", "unlink", "log"), 
    nesting, ...)
    
.Last.ref.grid

Arguments

object

An object produced by a supported model-fitting function, such as lm. Many models are supported. See models.

at

Optional named list of levels for the corresponding variables.

cov.reduce

A function, logical value, or formula; or a named list of these. Each covariate not specified in at is reduced according to these specifications.

If a single function, it is applied to each covariate.

If logical and TRUE, mean is used. If logical and FALSE, it is equivalent to specifying function(x) sort(unique(x)), and these values are considered part of the reference grid; thus, it is a handy alternative to specifying these same values in at.

If a formula (which must be two-sided), then a model is fitted to that formula using lm; then in the reference grid, its response variable is set to the results of predict for that model, with the reference grid as newdata. (This is done after the reference grid is determined.) A formula is appropriate here when you think experimental conditions affect the covariate as well as the response.

If cov.reduce is a named list, then the above criteria are used to determine what to do with covariates named in the list. (However, formula elements do not need to be named, as those names are determined from the formulas' left-hand sides.) Any unresolved covariates are reduced using "mean".

Any cov.reduce specification for a covariate also named in at is ignored.

mult.name

Character, the name to give to the “factor” whose levels delineate the elements of a multivariate response. If this is provided, it overrides the default name, e.g., "rep.meas" for an mlm object or "cut" for a polr object.

mult.levs

A named list of levels for the dimensions of a multivariate response. If there is more than one element, the combinations of levels are used, in expand.grid order. The (total) number of levels must match the number of dimensions. If mult.name is specified, this argument is ignored.

options

If non-NULL, a named list of arguments to pass to update, just after the object is constructed.

data

A data.frame to use to obtain information about the predictors (e.g. factor levels). If missing, then recover.data is used to attempt to reconstruct the data.

df

This is a courtesy shortcut, equivalent to specifying options(df = df). See update.

type

If provided, this is saved as the "predict.type" setting. See update.

transform

If other than "none", the reference grid is reconstructed via regrid with the given transform argument. See Details.

nesting

If the model has nested fixed effects, this may be specified here via a character vector or named list specifying the nesting structure. Specifying nesting overrides any nesting structure that is automatically detected. See Details.

...

Optional arguments passed to lsm.basis, such as vcov. (see Details below) or options for certain models (see models).

Value

An S4 object of class "ref.grid" (see ref.grid-class). These objects encapsulate everything needed to do calculations and inferences for least-squares means, and contain nothing that depends on the model-fitting procedure. As a side effect, the result is also saved as .Last.ref.grid (in the global environment, unless this variable is found in another position).

Details

The reference grid consists of combinations of independent variables over which predictions are made. Least-squares means are defined as these predictions, or marginal averages thereof. The grid is determined by first reconstructing the data used in fitting the model (see recover.data), or by using the data.frame provided in data. The default reference grid is determined by the observed levels of any factors, the ordered unique values of character-valued predictors, and the results of cov.reduce for numeric predictors. These may be overridden using at.
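The construction described above can be imitated in base R. The following is only a sketch of the idea, not the package's internal code. The toy data frame dat and all object names are made up for illustration; only the variable names machine and diameter echo the fiber example.

```r
# Made-up data with one factor and one numeric covariate
dat <- data.frame(
  machine  = factor(c("A", "A", "B", "B", "C", "C")),
  diameter = c(20, 24, 22, 26, 21, 25)
)

# Default behaviour: factors contribute their observed levels, and a
# numeric covariate is reduced by cov.reduce, which defaults to mean.
grid_mean <- expand.grid(
  machine  = levels(dat$machine),
  diameter = mean(dat$diameter)
)

# cov.reduce = FALSE corresponds to function(x) sort(unique(x)),
# so every distinct covariate value becomes part of the grid.
grid_all <- expand.grid(
  machine  = levels(dat$machine),
  diameter = sort(unique(dat$diameter))
)

# A formula such as diameter ~ machine replaces the covariate with
# predictions from lm(), using the grid as newdata.
cov_fit <- lm(diameter ~ machine, data = dat)
grid_formula <- data.frame(
  machine = factor(levels(dat$machine), levels = levels(dat$machine))
)
grid_formula$diameter <- unname(predict(cov_fit, newdata = grid_formula))
```

Comparing grid_mean, grid_all, and grid_formula shows how the choice of cov.reduce changes the number of grid rows and the covariate values at which predictions would be made.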

Ability to support a particular class of object depends on the existence of recover.data and lsm.basis methods -- see extending-lsmeans for details. The call methods("recover.data") will help identify these.

In certain models (e.g., results of glmer.nb), it is not possible to identify the original dataset. In such cases, we can work around this by setting data equal to the dataset used in fitting the model, or a suitable subset. Only the complete cases in data are used, so it may be necessary to exclude some unused variables. Using data can also help save computing, especially when the dataset is large. In any case, data must represent all factor levels used in fitting the model. It cannot be used as an alternative to at. (Note: If there is a pattern of NAs that caused one or more factor levels to be excluded when fitting the model, then data should also exclude those levels.)
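The effect of the complete-cases rule can be seen with base R alone. This is a minimal sketch on made-up data, assuming the filtering behaves like complete.cases; the column notes is a hypothetical variable that the model never uses.

```r
# Made-up data: 'notes' plays the role of a variable the model never uses
dat <- data.frame(
  y     = c(1.2, 2.3, 3.1, 4.8),
  trt   = factor(c("a", "b", "a", "b")),
  notes = c(NA, "ok", NA, "ok")
)

# Keeping every column: rows with a missing 'notes' are not complete cases
n_all <- sum(complete.cases(dat))

# Factor levels of trt that survive when all columns are kept
levels_all <- levels(droplevels(dat[complete.cases(dat), ])$trt)

# Keeping only the model variables retains all rows and all levels
model_vars   <- dat[, c("y", "trt")]
n_model      <- sum(complete.cases(model_vars))
levels_model <- levels(droplevels(model_vars[complete.cases(model_vars), ])$trt)
```

Here the unused column would silently remove level "a" of trt, which is why passing a subset such as model_vars as data is the safer choice.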

By default, the variance-covariance matrix for the fixed effects is obtained from object, usually via its vcov method. However, the user may override this via a vcov. argument, specifying a matrix or a function. If a matrix, it must be square and of the same dimension and parameter order as the fixed effects. If a function, it must return a suitable matrix when called with object as its only argument.
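Before supplying a replacement through vcov., it can help to check that it has the required shape. The sketch below uses only base R and the built-in mtcars data; inflated_vcov is a hypothetical stand-in for a robust estimator, not a function from any package.

```r
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Option 1: a matrix, square and ordered like the fixed effects
V_default <- vcov(fit)

# Option 2: a function that takes the model object as its only argument.
# 'inflated_vcov' is hypothetical and simply rescales the default matrix.
inflated_vcov <- function(object) 1.5 * vcov(object)
V_custom <- inflated_vcov(fit)

# Checks a replacement should pass before being supplied as vcov.
ok_dim   <- identical(dim(V_custom), dim(V_default))
ok_order <- identical(rownames(V_custom), names(coef(fit)))
```

If both checks are TRUE, either V_custom or inflated_vcov itself would be a plausible value for vcov. in a call such as ref.grid(fit, vcov. = inflated_vcov).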

Nested factors: Having a nesting structure affects marginal averaging in lsmeans in that it is done separately for each level (or combination thereof) of the grouping factors. ref.grid tries to discern which factors are nested in other factors, but it is not always obvious, and if it misses some, the user must specify this structure via nesting, or later using update. nesting may be a character vector or a named list. If a list, each name should be the name of a single factor in the grid, and its entry a character vector of the name(s) of its grouping factor(s). nesting may also be a character value of the form "factor1 %in% (factor2*factor3)". If there is more than one such specification, they may be appended separated by commas, or given as a character vector. For example, these specifications are equivalent: nesting = list(state = "country", city = c("state", "country")), nesting = "state %in% country, city %in% (state*country)", and nesting = c("state %in% country", "city %in% (state*country)").

There is a subtle difference between specifying type = "response" and transform = "response". While the summary statistics for the grid itself are the same, subsequent use in lsmeans will yield different results if there is a response transformation. With type = "response", LS means are computed by averaging together predictions on the linear-predictor scale and then back-transforming to the response scale; while with transform = "response", the predictions are already on the response scale so that the LS means will be the arithmetic means of those response-scale predictions. To add further to the possibilities, geometric means of the response-scale predictions are obtainable via transform = "log", type = "response".
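The three kinds of averaging can be compared numerically in base R. This sketch assumes a square-root response transformation and three hypothetical predictions on the transformed scale; it mimics the arithmetic only and does not call ref.grid or lsmeans.

```r
# Hypothetical predictions on the sqrt(response) scale at three grid points
eta <- c(1, 2, 4)
back_transform <- function(x) x^2   # inverse of the sqrt transformation

# type = "response": average on the transformed scale, then back-transform
lsm_type <- back_transform(mean(eta))

# transform = "response": back-transform first, then take the arithmetic mean
lsm_transform <- mean(back_transform(eta))

# transform = "log", type = "response": average the logs of the
# response-scale predictions, then exponentiate (a geometric mean)
lsm_geometric <- exp(mean(log(back_transform(eta))))
```

The three results differ because back-transformation and averaging do not commute for a nonlinear transformation; the geometric mean is never larger than the arithmetic mean of the same response-scale predictions.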

The most recent result of ref.grid, whether called directly or indirectly via lsmeans, lstrends, or some other function that calls one of these, is saved in the user's environment as .Last.ref.grid. This facilitates checking what reference grid was used, or reusing the same reference grid for further calculations. This automatic saving is enabled by default, but may be disabled via lsm.options(save.ref.grid = FALSE), and re-enabled by specifying TRUE.

Examples

# NOT RUN {
require(lsmeans)

fiber.lm <- lm(strength ~ machine*diameter, data = fiber)
ref.grid(fiber.lm)
summary(ref.grid(fiber.lm))

ref.grid(fiber.lm, at = list(diameter = c(15, 25)))

# }
# NOT RUN {
# We could substitute the sandwich estimator vcovHAC(fiber.lm)
# as follows:
require(sandwich)
summary(ref.grid(fiber.lm, vcov. = vcovHAC))
# }
# NOT RUN {
# If we thought that the machines affect the diameters
# (admittedly not plausible in this example), then we should use:
ref.grid(fiber.lm, cov.reduce = diameter~machine)

# Multivariate example
MOats.lm <- lm(yield ~ Block + Variety, data = MOats)
ref.grid(MOats.lm, mult.name = "nitro")
# silly illustration of how to use 'mult.levs'
ref.grid(MOats.lm, mult.levs = list(T=LETTERS[1:2], U=letters[1:2]))
# }
