predict: Prediction from a model fit.

Description

Prediction of the response variable by its expected value obtained as (the inverse link transformation of) the linear predictor ($\eta$) and more generally for terms of the form X_n$\beta$+Z_nLv, for new design matrices X_n and Z_n. Various components of prediction variances and predictions intervals can also be computed using predict. The get_... functions are convenient extractors for such components.

Usage

"predict"(object, newdata = newX, newX = NULL, re.form = NULL, variances=list(), binding = FALSE, intervals = NULL, level = 0.95, ...)
get_fixefVar(...)
get_predVar(...)
get_residVar(...)
get_respVar(...)

Arguments

object

The return object of fitting functions HLfit,corrHLfit,HLCor... returning an object inheriting from HLfit class.

newdata

Either NULL, a matrix or data frame, or a numeric vector. If NULL, the original data are reused. Otherwise, all variables required to evaluate model formulas must be included. Which variables are required may depend on other arguments: see “prediction with given phi's” example, also illustrating the syntax when formulas include an offset. or a numeric vector, which names (if any) are ignored. This makes it easier to use predict as an objective function for an optimization procedure such as optim, which calls the objective function on unnamed vectors. However, one must make sure that the order of elements in the vector is the order of first occurrence of the variables in the model formula. This order can be checked in the error message returned when calling predict on a newX vector of clearly wrong size, e.g. predict(.

newX

equivalent to newdata, available for back-compatibility

re.form

formula for random effects to include. By default, it is NULL, in which case all random effects are included. If it is NA, no random effect is included. If it is a formula, only the random effects it contains are retained. The other variance components are removed from both point prediction and variances calculations. If you want to retain only the spatial effects in the point prediction, but all variances, either use re.form and add missing variances (on linear predictor scale) manually, or ignore this argument and see Details and Examples for different ways of controlling variances.

variances

A list which elements control the computation of different estimated variances. In particular, list(linPred=TRUE,disp=TRUE) is suitable for uncertainty in point prediction.

predict can return four components of prediction variance: fixefVar, predVar, residVar and respVar, detailed below. They are all returned as attributes of the point predictions. By default, each component is a vector of variances. However, if variances$cov=TRUE, a covariance matrix is returned when applicable (i.e. not for "residVar"). fixefVar is the (co)variance of fixed effects (X$\beta$) due to uncertainty in $\beta$. It is called by variances$fixefVar=TRUE. predVar is the (co)variance of the linear predictor $\eta$. It is the most common measure of uncertainty in point prediction. It accounts for uncertainty in fixed effects (X$\beta$) and random effects (ZLv) for given dispersion parameters (see Details), but it can also accounts for uncertainty in dispersion parameters ($\lambda$ and $phi$) estimates if variances$disp=TRUE, for models in which the effect of uncertainty in dispersion parameters can be computed. Currently, this effect can be computed for a scalar residual variance ($\phi$) and a single random effect with a scalar variance ($\lambda$). variances$predVar=TRUE will return the sum of the two components, if available; otherwise it returns only the (co)variance for given $\lambda$ and $phi$. The latter component can be requested by variances$linPred=TRUE. residVar provides the residual variances (for Gaussian or Gamma responses). It is called by variances$residVar=TRUE. respVar is the variance of the response, i.e. the sum of predVar and residVar. It is called by variances$respVar=TRUE. Calling for one (co)variance implies that some of its components may be also returned.

intervals

(vector of) character string(s). Provides prediction intervals with (intended) level level, deduced from the given prediction variance term, e.g. intervals="predVar". Currently only intervals from fixefVar and predVar (and for LMMs respVar including the residual variance) may have a probabilistic meaning. Intervals returned in other cases are (currently) meaningless.

level

Coverage of the intervals.

binding

If binding is a character string, the predicted values are bound with the newdata and the result is returned as a data frame. The predicted values column name is the given binding, or a name based on it, if the newdata already include a variable with this name). If binding is FALSE, The predicted values are returned as a matrix and the data frame used for prediction is returned as an attribute (unless it was NULL).

...

further arguments passed to or from other methods. For the get_... functions, they are passed to predict.

Value

For predict, a matrix or data frame (according to the binding argument), with optional attributes frame, intervals, predVar, fixefVar, residVar, and/or respVar, the last four holding one or more variance vector or covariance matrices. The further attribute fittedName contains the binding name, if any.The get_... extractor functions call predict and extract from its result the attribute implied by the name of the extractor.

Details

If newdata is NULL, predict returns the fitted responses, including random effects, from the object. Otherwise it computes new predictions including random effects as far as possible. For spatial random effects it constructs a correlation matrix C between new locations and locations in the original fit. Then it infers the random effects in the new locations as C (L'$)^{-1}$ v (see spaMM for notation). For non-spatial random effects, it checks whether any group (i.e., level of a random effect) in the new data was represented in the original data, and it adds the inferred random effect for this group to the prediction for individuals in this group.

fixefVar is the (co)variance of X$\beta$ (or X_n$\beta$), deduced from the asymptotic covariance matrix of $\beta$ estimates.

predVar is the prediction (co)variance of $\eta$=X$\beta$+Zv (see HLfit Details for notation), or more generally of X_n$\beta$+Z_nLv, by default computed for given dispersion parameters. For levels of the random effects present in the original data, predVar computation assumes that the covariance matrix of $\beta$ and v estimates is the inverse of the expected Hessian matrix (for given dispersion parameters) of the augmented linear model for $\beta$ and v. It thus takes into account the joint uncertainty in estimation of $\beta$ and prediction of v. For new levels of the random effects, predVar computation additionally takes into account uncertainty in prediction of v for these new levels. For prediction covariance with a new Z_n, it matters whether a single or multiple new levels are used: see Examples.

If variances$disp is TRUE, prediction variance may also include a term accounting for uncertainty in $\phi$ and $\lambda$, computed following Booth and Hobert (1998, eq. 19). This computation is currently implemented for models with a single random effect, and ignore uncertainties in spatial correlation parameters.

For models with non-Gaussian response, the prediction covariance of the response is approximated by the prediction covariance of the linear predictor, pre- and post-multiplied by $d\mu/d\eta$. These variance calculations are approximate except for LMMs, and cannot be garanteed to give accurate results.

In the point prediction of the linear predictor, the unconditional expected value of $u$ is assigned to the realizations of $u$ for unobserved levels of non-spatial random effects (it is zero in GLMMs but not for non-gaussian random effects), and the inferred value of $u$ is assigned in all other cases. Corresponding values of $v$ are then deduced. This computation yields the classical “BLUP” or empirical Bayes predictor in LMMs, but otherwise it may yield less well characterized predictors, where “unconditional” $v$ may not be its expected value when the rand.family link is not identity.

References

Booth, J.G., Hobert, J.P. (1998) Standard errors of prediction in generalized linear mixed models. J. Am. Stat. Assoc. 93: 262-272.

Examples

Run this code

data(blackcap)
fitobject <- corrHLfit(migStatus ~ 1 + Matern(1|latitude+longitude),data=blackcap,
                       ranFix=list(nu=4,rho=0.4,phi=0.05))
predict(fitobject)
getDistMat(fitobject)

#### multiple controls of prediction variances
## (1) fit with an additional random effect
grouped <- cbind(blackcap,grp=c(rep(1,7),rep(2,7))) 
fitobject <- corrHLfit(migStatus ~ 1 +  (1|grp) +Matern(1|latitude+longitude),
                       data=grouped,  ranFix=list(nu=4,rho=0.4,phi=0.05))

## (2) re.form usage to remove a random effect from point prediction and variances: 
predict(fitobject,re.form= ~ 1 +  Matern(1|latitude+longitude))

## (3) comparison of covariance matrices for two types of new data
moregroups <- grouped[1:5,]
rownames(moregroups) <- paste("newloc",1:5,sep="")
moregroups$grp <- rep(3,5) ## all new data belong to an unobserved third group 
cov1 <- get_predVar(fitobject,newdata=moregroups,
                     variances=list(linPred=TRUE,cov=TRUE))
moregroups$grp <- 3:7 ## all new data belong to distinct unobserved groups
cov2 <- get_predVar(fitobject,newdata=moregroups,
                     variances=list(linPred=TRUE,cov=TRUE))
cov1-cov2 ## the expected off-diagonal covariance due to the common group in the first fit.

## Not run: 
# ## prediction with distinct given phi's in different locations:
# varphi <- cbind(blackcap,logphi=runif(14))
# vphifit <- corrHLfit(migStatus ~ 1 + Matern(1|latitude+longitude), 
#                      resid.model = list(formula=~0+offset(logphi)),
#                      data=varphi,  ranFix=list(nu=4,rho=0.4))
# # for respVar computation, one needs the resid.model formula to specify phi:
# get_respVar(vphifit,newdata=data.frame(latitude=1,longitude=1,logphi=1))
# # for predVar computation, phi is not needed 
# #     (and could have been specified through ranFix):  
# get_predVar(vphifit,newdata=data.frame(latitude=1,longitude=1))                     
# 
# ## Effects of numerically singular correlation matrix C:
# fitobject <- corrHLfit(migStatus ~ 1 + Matern(1|latitude+longitude),data=blackcap,
#                        ranFix=list(nu=10,rho=0.001)) ## numerically singular C
# predict(fitobject) ## predicted mu computed as X beta + L v 
# predict(fitobject,newdata=blackcap) ## predicted mu computed as X beta + C 
# 
# ## point predictions and variances with new X and Z
# if(require("rsae", quietly = TRUE)) {
#   data(landsat)
#   fitobject <- HLfit(HACorn ~ PixelsCorn + PixelsSoybeans + (1|CountyName),
#                      data=landsat[-33,],HLmethod="ML")
#   newXandZ <- unique(data.frame(PixelsCorn=landsat$MeanPixelsCorn,
#                                 PixelsSoybeans=landsat$MeanPixelsSoybeans,
#                                 CountyName=landsat$CountyName))
#   predict(fitobject,newdata=newXandZ,variances = list(predVar=TRUE))
#   get_predVar(fitobject,newdata=newXandZ,variances = list(predVar=TRUE))
# }
# 
# ## End(Not run)

Run the code above in your browser using DataLab