
lsmeans (version 1.10-3)

lsmeans: Least-squares means

Description

Compute least-squares means for specified factors or factor combinations in a linear model, and optionally comparisons or contrasts among them.

Usage

lsmeans(object, specs, adjust = c("auto","tukey","sidak","scheffe", p.adjust.methods),
  conf = .95, at, trend, contr = list(), 
  cov.reduce = function(x, name) mean(x), 
  fac.reduce = function(coefs, lev) apply(coefs, 2, mean), 
  glhargs = NULL, lf = FALSE, mlf = rep(1,nresp)/nresp, ...)
  
## S3 method for class 'lsm':
print(x, omit = NULL, ...)
## S3 method for class 'data.frame.lsm':
print(x, ...)

lsm(...)
## S3 method for class 'lsmlf':
glht(model, linfct, ...)

Arguments

object
A lm, aov (with no Error component), glm, lme, gls, lmer, or glmer object having at least one fixed factor among the predictors.
specs
A formula, or a list of formulas, specifying the desired families of least-squares means. The right-hand side of each formula specifies the predictors for which least-squares means are wanted. The optional left-hand side specifies what kind of comparisons or contrasts are desired among them (see Details).
adjust
Adjustment method for the p values of tests of contrasts. "auto" uses the method returned in the "adjust" attribute of the contrast function; "tukey" computes p values using the Studentized range distribution with the number of means in the family as the number of groups; "sidak" and "scheffe" apply the corresponding corrections; the remaining choices are those in p.adjust.methods.
conf
Desired confidence level for intervals. For robustness, you may specify either a fraction or a percentage; i.e., .975 and 97.5 yield the same results.
at
An optional named list or named vector of predictor values for the reference grid. If no value is found in at for a particular predictor, then if it is a factor, all its levels are used, and if a covariate, the result of cov.reduce is used.
trend
If non-missing and trend is the name of a numeric predictor or a term in object's model, then the linear trends for this predictor are estimated and compared instead of the least-squares means. This is useful in models where the predictor interacts with factors, so that its fitted slope differs across factor combinations.
contr
An optional named list. Each entry is itself a list or a data.frame specifying contrast coefficients. If the left-hand side of a formula in specs matches a name in contr, then those contrast coefficients are used to form contrasts among the least-squares means.
cov.reduce
A logical value, or a function with arguments x and name that returns the value to use in prediction for the covariate named name with values x. By default, the mean is used. The name argument allows different reductions for different covariates. Specifying cov.reduce = FALSE causes each covariate to be represented by each of its unique values in the data (see the examples).
fac.reduce
A function of coefs and lev, where lev is the level of a factor or factor combination at which a least-squares mean is calculated. The argument coefs is a matrix whose rows correspond to the combinations of the unspecified factors; the function should reduce it to a single vector of coefficients (by default, the column means).
glhargs
If this is a list, the object and specified contrasts are passed to the function glht in the multcomp package, with the contents of glhargs as additional arguments.
lf
If TRUE, the linear functions of the regression coefficients are returned unevaluated.
model
model argument for glht
linfct
linfct argument for glht
mlf
Numeric vector defining a linear function of a multivariate response (i.e., y %*% mlf, where y is the response variable of dimension nresp). The default is the mean.
...
Additional argument(s) passed to the contrast function, or to glht; see Details.
x
Object of class "lsm"
omit
Indexes of elements of x that you do not want printed.
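
A minimal base-R sketch of what a custom fac.reduce might look like. The weights w, the function names, and the toy coefs matrix are all hypothetical, chosen only to contrast an unequal-weight reduction with the default equal-weight column means:

```r
# Hypothetical fac.reduce that weights the rows of 'coefs' unequally
# (proportional weights w), instead of the default column means.
w <- c(0.5, 0.3, 0.2)                        # assumed weights, summing to 1
wt.reduce <- function(coefs, lev) colSums(coefs * w)

# The default behavior, for comparison:
eq.reduce <- function(coefs, lev) apply(coefs, 2, mean)

coefs <- rbind(c(1, 0), c(0, 1), c(1, 1))    # toy coefficient matrix
wt.reduce(coefs, "A")   # 0.7 0.5
eq.reduce(coefs, "A")   # 0.6667 0.6667
```

A function like wt.reduce could be passed as the fac.reduce argument when a population-weighted margin, rather than an equally weighted one, is wanted.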

Value

If lf == FALSE, lsmeans returns an object of class "lsm", which inherits from "list". Each element of the list is either of class "data.frame.lsm" or "summary.glht" (see the documentation for glht); the latter occur only if glhargs is non-NULL. Each element summarizes a family of least-squares means or contrasts among them. A "data.frame.lsm" element extends data.frame and contains the lsmeans or contrast estimates and associated quantities. If lf == TRUE, lsmeans instead returns a list of matrices containing the linear functions generated by specs.

The "lsm" and "data.frame.lsm" classes each have only a print method, which exists to display results in the desired format. The lsm function returns an object of class "lsmlf" and may be used in a manner similar to mcp in a call to glht; this is implemented via the provided S3 method for glht for "lsmlf" objects. See the examples.

Details

Least-squares means, popularized by SAS, are predictions from a linear model at combinations of specified factors. SAS's documentation describes them as "predicted population margins; that is, they estimate the marginal means over a balanced population" (SAS Institute 2012). In generalized linear models, least-squares means are marginal linear predictions that can be transformed back to the response scale via the inverse-link function.

Unspecified factors and covariates are handled by averaging the predictions over those factors and variables. For example, if the fitted model has formula response ~ x1 + x2 + treat, where x1 and x2 are numeric and treat is a factor, the least-squares means are the predicted responses for each treatment at specified values of x1 and x2. By default, the means of the two covariates are used, yielding what ANOVA textbooks often call the adjusted means. The at argument may be used instead to make predictions at other values of x1 and x2.

Now consider the model response ~ A + B + A:B, where A and B are both factors. If we ask for least-squares means for A, then at each level of A we face a different prediction for each level of B. Blind (and default) use of least-squares means averages these predictions together with equal weight, and this may be inappropriate, especially when the interaction effect is strong. (A warning is generated in such potentially inappropriate cases.) Like most statistical calculations, least-squares means can be used inappropriately; the fac.reduce argument at least expands one's options for producing meaningful results in multi-factor situations.

If you give multiple values for a covariate in at or as a result of cov.reduce, the covariate is treated just like a factor, in that predictions are made at each combination of its values with other factors and multi-valued covariates, and averaged as appropriate using fac.reduce.
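
The equal-weight averaging described above can be sketched in base R alone, without lsmeans, using the built-in (and balanced) warpbreaks data. Because the design is balanced and the two-factor model is saturated, the hand-computed least-squares mean coincides with the raw mean:

```r
# Sketch: the least-squares mean for wool A is the equal-weight average
# of the predictions for wool A at each tension level.
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
grid <- expand.grid(wool = "A", tension = c("L", "M", "H"))
lsmean.A <- mean(predict(fit, newdata = grid))

# In this balanced design, it equals the raw mean of the wool-A observations:
all.equal(lsmean.A, mean(warpbreaks$breaks[warpbreaks$wool == "A"]))  # TRUE
```

In an unbalanced design the two quantities would differ, which is exactly the situation least-squares means are designed for.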
If you want to see separate predictions for each value, include the covariate in specs. (This differs from what SAS does with non-unique at values.)

One other note concerning covariates: be careful with covariates that depend on one another. For example, if a model contains covariates x and xsq, where xsq = x^2, the default behavior makes predictions at x = mean(x) and xsq = mean(xsq), which is probably not a valid combination (we need x = mean(x) and xsq = mean(x)^2). The inconsistency is avoided if the model specifies poly(x,2) (or even x + I(x^2)) instead of x + xsq, because then only x is averaged, and everything remains consistent.

The built-in contrast methods that can be used in specs formulas are pairwise, revpairwise, poly, trt.vs.ctrl, trt.vs.ctrl1, and trt.vs.ctrlk. They are implemented as functions having the same names with .lsmc appended (e.g., pairwise.lsmc). Users may write additional .lsmc functions that generate custom families of contrasts; see the documentation for pairwise.lsmc for an example.

Experimental feature: If a contrast family named cld is specified, it acts the same as pairwise, with the side effect that a column named .group is added to the table of least-squares means. This column is a compact letter display (CLD): means associated with one or more of the same letters have p values greater than 1 - conf in the selected pairwise-comparison tests. For more details on CLDs, see the documentation for cld in the multcomp package.

When trend is specified and names a predictor variable in the model, the trend is estimated using a difference quotient over 1/1000 of the range of the predictor (centered at the predictor's at value or its cov.reduce result). If trend instead names a model term, the trend is determined symbolically. It can be informative to run with lf = TRUE; see the examples.
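
The dependent-covariate pitfall above is easy to see numerically in base R. With separately averaged x and xsq covariates, the reference values would be mutually inconsistent:

```r
# Sketch of the covariate-dependency pitfall:
x <- c(1, 2, 3, 4, 5)
mean(x)^2    #  9 -- the consistent value for x^2 at x = mean(x)
mean(x^2)    # 11 -- what averaging a separate 'xsq' covariate would use
# With poly(x, 2) or x + I(x^2) in the model formula, only 'x' itself
# is averaged, so the predictions stay internally consistent.
```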
Degrees of freedom are currently not provided for lme or glme objects, or for mer objects arising from generalized linear models; in those cases, asymptotic results are printed, and this fact is emphasized by displaying NA for the degrees of freedom. For linear mer objects, degrees of freedom are computed using the Kenward and Roger (1997) method, provided the pbkrtest package is installed (the package is loaded if needed). Moreover, in that case, the adjusted covariance matrix from the vcovAdj() function in the pbkrtest package is used to calculate standard errors. See Halekoh and Højsgaard (2012) and the documentation for KRmodcomp for more details.

Degrees of freedom are not known to glht except in the case of lm objects. Note that glht requires the degrees of freedom to be an integer; accordingly, if df is included in glhargs, it is coerced to the integer value max(1, as.integer(df + .2)). That is, it is forced to be at least 1, and it is taken up to the next higher integer if its fractional part is at least .8.

For users of the multcomp package, linear functions corresponding to specs may be passed to glht in two different ways. One is to call glht from within lsmeans by specifying glhargs. The other is to call lsmeans from within glht by supplying a call to lsm in its linfct argument. The first way provides summary.glht objects for all contrast results (but not for the lsmeans). The second way passes just one set of linear functions to glht; in particular, only the first element of specs is used, and the last set of linear functions from that specification is given to glht. Thus, glht(model, lsm(~ treat)) will estimate the lsmeans of treatments, while glht(model, lsm(pairwise ~ treat)) will estimate the pairwise comparisons.

If the model contains a matrix among its predictors, each column is averaged using the function specified in cov.reduce. There is no provision for matrices in the at argument.
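
The df coercion described above can be sketched directly (the helper name coerce.df is ours, not part of the package):

```r
# How a fractional df passed in glhargs is coerced to an integer:
coerce.df <- function(df) max(1, as.integer(df + .2))
coerce.df(8.75)  # 8  (fractional part below .8 is dropped)
coerce.df(8.85)  # 9  (fractional part of .8 or more rounds up)
coerce.df(0.3)   # 1  (never less than 1)
```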

References

Halekoh, U. and Højsgaard, S. (2012) A Kenward-Roger approximation and parametric bootstrap methods for tests in linear mixed models -- the R package pbkrtest. Submitted.

Kenward, M. G. and Roger, J. H. (1997) Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53, 983--997.

SAS Institute Inc. (2012) Online documentation; Shared concepts; LSMEANS statement. http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_introcom_a0000003362.htm, accessed August 15, 2012.

See Also

pairwise.lsmc, glht

Examples

require(lsmeans)

### Covariance example (from Montgomery Design (8th ed.), p.656)
# Uses supplied dataset 'fiber'
fiber.lm <- lm(strength ~ diameter + machine, data = fiber)

# adjusted means and comparisons, treating machine C as control
lsmeans(fiber.lm, trt.vs.ctrlk ~ machine)


### Factorial experiment
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
#-- We only need to see the wool*tension means listed once ...
print(lsmeans(warp.lm, list(pairwise ~ wool | tension, poly ~ tension | wool)),
    omit=3)

# Compact letter display
lsmeans(warp.lm, cld ~ tension | wool)


### Unbalanced split-plot example ###
#-- The imbalance is imposed deliberately to illustrate that
#-- the variance estimates become biased
require(nlme)
Oats.lme <- lme(yield ~ factor(nitro) + Variety, random = ~1 | Block/Variety, 
    subset = -c(1,2,3,5,8,13,21,34,55), data=Oats)
lsmeans(Oats.lme, list(poly ~ nitro, pairwise ~ Variety))

# Model with a quadratic trend for 'nitro'
Oatsq.lme <- update(Oats.lme, . ~ nitro + I(nitro^2) + Variety)

# Predictions at each unique 'nitro' value in the dataset
lsmeans(Oatsq.lme, ~ nitro, cov.reduce = FALSE)

# lsmeans for 'Variety' at the average 'nitro' value.
lsmeans(Oatsq.lme, ~ Variety)
# These differ quite a bit from those obtained from 'Oats.lme', because each
# is a single prediction rather than the average of 4 predictions

# lsmeans for 'Variety', using the four unique 'nitro' values
lsmeans(Oatsq.lme, ~ Variety, cov.reduce = FALSE)
# These are pretty close to those from 'Oats.lme' because both are based
# on averaging predictions for the 12 combinations of nitro and Variety


# Compare with lmer result 
if (require(lme4)) {
    
    Oats.lmer <- lmer(yield ~ factor(nitro) + Variety + (1 | Block/Variety), 
                      subset = -c(1,2,3,5,8,13,21,34,55), data=Oats)
    # If pbkrtest installed, provides df, bias-adjusted SEs
    if (require(pbkrtest)) {
        lsmeans(Oats.lmer, list(poly ~ nitro, pairwise ~ Variety))
    }
    
    # Using in conjunction with 'glht' (note -- this does not use adjusted vcov)
    # calling 'glht' from 'lsmeans' ...
    lsmeans(Oats.lmer, pairwise ~ Variety, glhargs=list(df=9))

    # calling 'lsmeans' from 'glht' to get simultaneous CIs
    confint(glht(Oats.lmer, linfct = lsm(~ Variety), df=9))

    # Custom contrasts
    lsmeans(Oats.lmer, my.own ~ Variety, 
            contr = list(my.own = list(G.vs.M = c(1,-1,0), GM.vs.V = c(.5,.5,-1))))
}
    

#### Examples with trends...
# Model with interaction
fiber.lm2 = lm(strength ~ diameter * machine, data = fiber)

# Compare the linear trends for diameter
lsmeans(fiber.lm2, pairwise ~ machine, trend = "diameter")

# Model with log(diameter) as the covariate
fiber.lm3 = lm(strength ~ log(diameter) * machine, data = fiber)
# Compare the fitted linear trends for log(diameter)
lsmeans(fiber.lm3, pairwise ~ machine, trend = "log(diameter)")

# Compare the fitted linear trends for diameter itself 
# - this is done via a diff quotient - compare with fiber.lm2 results
lsmeans(fiber.lm3, pairwise ~ machine, trend = "diameter")

# Examine the linear functions generated for these examples
lsmeans(fiber.lm2, ~ machine, trend = "diameter", lf = TRUE)
lsmeans(fiber.lm3, ~ machine, trend = "log(diameter)", lf = TRUE)
lsmeans(fiber.lm3, ~ machine, trend = "diameter", lf = TRUE)
