predictInterval: Predict from merMod objects with a prediction interval

Description

This function provides a way to capture model uncertainty in predictions from multi-level models fit with lme4. By drawing a sampling distribution for the random and the fixed effects and then estimating the fitted value across that distribution, it is possible to generate a prediction interval for fitted values that includes all variation in the model except for variation in the covariance paramters, theta. This is a much faster alternative than bootstrapping for models fit to medium to large datasets.

Usage

predictInterval(merMod, newdata, level = 0.8, n.sims = 1000, stat = c("median", "mean"), type = c("linear.prediction", "probability"), include.resid.var = TRUE, returnSims = FALSE, seed = NULL, .parallel = FALSE, .paropts = NULL)

Arguments

merMod

a merMod object from lme4

newdata

a data.frame of new data to predict

level

the width of the prediction interval

n.sims

number of simulation samples to construct

stat

take the median or mean of simulated intervals

type

type of prediction to develop

include.resid.var

logical, include or exclude the residual variance for linear models

returnSims

logical, should all n.sims simulations be returned?

seed

numeric, optional argument to set seed for simulations

.parallel,

logical should parallel computation be used, default is FALSE

.paropts,

a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.

Value

a data.frame with three columns:

fit: The center of the distribution of predicted values as defined by the stat parameter.
lwr: The lower confidence interval bound corresponding to the quantile cut defined in level.
upr: The upper confidence interval bound corresponding to the quantile cut defined in level.

If returnSims = TRUE, then the individual simulations are attached to this data.frame in the attribute sim.results and are stored as a matrix.

Details

To generate a prediction inteval, the function first computes a simulated distribution of all of the parameters in the model. For the random, or grouping, effects, this is done by sampling from a multivariate normal distribution which is defined by the BLUP estimate provided by ranef and the associated variance-covariance matrix for each observed level of each grouping terms. For each grouping term, an array is build that has as many rows as there are levels of the grouping factor, as many columns as there are predictors at that level (e.g. an intercept and slope), and is stacked as high as there are number of simulations. These arrays are then multiplied by the new data provided to the function to produce a matrix of yhat values. The result is a matrix of the simulated values of the linear predictor for each observation for each simulation. Each grouping term has such a matrix for each observation. These values can be added to get the estimate of the fitted value for the random effect terms, and this can then be added to a matrix of simulated values for the fixed effect level to come up with n.sims number of possible yhat values for each observation.

The distribution of simulated values is cut according to the interval requested by the function. The median or mean value as well as the upper and lower bounds are then returned. These can be presented either on the linear predictor scale or on the response scale using the link function in the merMod.

Examples

Run this code

m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
regFit <- predict(m1, newdata = sleepstudy[11, ]) # a single value is returned
intFit <- predictInterval(m1, newdata = sleepstudy[11, ]) # bounded values
# Can do glmer
d1 <- cbpp
d1$y <- d1$incidence / d1$size
 gm2 <- glmer(y ~ period + (1 | herd), family = binomial, data = d1,
               nAGQ = 9, weights = d1$size)
 regFit <- predict(gm2, newdata = d1[1:10, ])
 # get probabilities
 regFit <- predict(gm2, newdata = d1[1:10, ], type = "response")
 intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "probability")
 intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "linear.prediction")