Regression Deletion Diagnostics
This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.
influence.measures(model)

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)
- model: an R object, typically returned by lm or glm.
- infl: influence structure as returned by lm.influence or influence (the latter only for the glm methods).
- res: (possibly weighted) residuals, with proper default.
- sd: standard deviation to use, see default.
- dispersion: dispersion (for glm objects) to use, see default.
- hat: hat values $H[i,i]$, see default.
- type: type of residuals for the glm method of rstandard.
- x: the $X$ or design matrix.
- intercept: should an intercept column be prepended to x?
- ...: further arguments passed to or from other methods.
The primary high-level function is influence.measures, which produces an object of class "infl": a tabular display showing the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases which are influential with respect to any of these measures are marked with an asterisk.
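As a minimal sketch of the high-level interface, the following fits a linear model to the built-in cars data (chosen here purely for illustration) and inspects the resulting "infl" object. The component names infmat and is.inf are those of the object returned by influence.measures in current versions of R.

```r
# Fit a simple linear model on the built-in 'cars' data set.
fit <- lm(dist ~ speed, data = cars)

# Compute all deletion diagnostics at once.
im <- influence.measures(fit)

# One row per case; the columns hold the DFBETAS (one per coefficient),
# followed by DFFITS, covariance ratio, Cook's distance and hat value.
head(im$infmat)

# Logical matrix of the same shape, flagging cases judged influential
# on each measure.
flagged <- which(apply(im$is.inf, 1, any))

# Print only the flagged cases (the ones marked with an asterisk).
summary(im)
```

Printing im itself shows the full table, while summary(im) restricts the display to the potentially influential cases.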
The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. The functions rstandard and rstudent give the standardized and Studentized residuals respectively. (These re-normalize the residuals to have unit variance, using an overall and a leave-one-out measure of the error variance respectively.)
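The relationship between these quantities can be sketched by rebuilding them by hand for an unweighted linear model. The formulas below are the standard textbook definitions and are an assumption of this example rather than a statement of how the functions are implemented; the cars data and all variable names are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

e <- residuals(fit)        # raw residuals
h <- hatvalues(fit)        # leverages (diagonal of the hat matrix)
p <- length(coef(fit))     # number of estimated coefficients

# Overall estimate of the error standard deviation.
s <- sqrt(deviance(fit) / df.residual(fit))
# Leave-one-out estimates of the error standard deviation, one per case.
s_i <- lm.influence(fit, do.coef = FALSE)$sigma

r_std  <- e / (s   * sqrt(1 - h))   # standardized residuals
r_stud <- e / (s_i * sqrt(1 - h))   # Studentized residuals

# Two of the direct-access measures expressed through those residuals.
dff  <- r_stud * sqrt(h / (1 - h))      # DFFITS
cook <- r_std^2 * h / (p * (1 - h))     # Cook's distance

# Compare with the built-in functions.
all.equal(r_std,  rstandard(fit))
all.equal(r_stud, rstudent(fit))
all.equal(dff,    dffits(fit))
all.equal(cook,   cooks.distance(fit))
```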
Values for generalized linear models are approximations, as described in Williams (1987) (except that Cook's distances are scaled as $F$ rather than as chi-square values). The approximations can be poor when some cases have large influence.
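For a generalized linear model, the two residual types and the F-like scaling of Cook's distance can be sketched as follows. The example uses the built-in warpbreaks data with a Poisson family (an illustrative choice) and assumes that the glm methods standardize residuals by sqrt(dispersion * (1 - h)) and divide Cook's distance by the number of coefficients, which is how current versions of R appear to behave.

```r
gfit <- glm(breaks ~ wool + tension, family = poisson(), data = warpbreaks)

h   <- hatvalues(gfit)
phi <- summary(gfit)$dispersion   # fixed at 1 for the Poisson family
p   <- gfit$rank                  # number of estimated coefficients

# Standardized residuals on the deviance and Pearson scales.
r_dev  <- rstandard(gfit, type = "deviance")
r_pear <- rstandard(gfit, type = "pearson")

# The same quantities built by hand from the ordinary residuals.
r_dev_manual  <- residuals(gfit, type = "deviance") / sqrt(phi * (1 - h))
r_pear_manual <- residuals(gfit, type = "pearson")  / sqrt(phi * (1 - h))

# Cook's distance based on Pearson residuals, divided by p (F-like scale).
cook_manual <- (residuals(gfit, type = "pearson") / (1 - h))^2 * h / (phi * p)

all.equal(r_dev_manual,  r_dev)
all.equal(r_pear_manual, r_pear)
all.equal(cook_manual,   cooks.distance(gfit))
```

These values approximate the effect of deleting a case without refitting the model, so they should be read with the caveat above in mind when a case has large influence.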
The optional infl, res and sd arguments are there to encourage the use of these direct access functions in situations where, e.g., the underlying basic influence measures (from lm.influence or the generic influence) are already available.
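A minimal sketch of that pattern: compute the basic influence quantities once and hand them to each direct-access function through infl. The cars data and the variable names are illustrative only.

```r
fit <- lm(dist ~ speed, data = cars)

# Compute the basic influence quantities a single time ...
infl <- lm.influence(fit, do.coef = FALSE)

# ... and reuse them in each direct-access function.
rs  <- rstandard(fit, infl = infl)
rt  <- rstudent(fit, infl = infl)
cd  <- cooks.distance(fit, infl = infl)
dff <- dffits(fit, infl = infl)
cr  <- covratio(fit, infl = infl)

# The results match the calls that recompute 'infl' internally.
all.equal(rs, rstandard(fit))
all.equal(cd, cooks.distance(fit))
```

Note that dfbeta and dfbetas need the coefficient changes, so they expect an influence structure computed with do.coef = TRUE.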
Note that cases with
weights == 0 are dropped from all
these functions, but that if a linear model has been fitted with
na.action = na.exclude, suitable values are filled in for the
cases excluded during fitting.
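The effect on the length of the results can be sketched as follows, using a copy of the built-in cars data with one response value set to missing and, separately, one zero weight. Both modifications are made up for this illustration.

```r
# Copy of 'cars' with one response value set to missing.
d <- cars
d$dist[3] <- NA

# na.exclude: the excluded case keeps its position in the results.
fit_ex <- lm(dist ~ speed, data = d, na.action = na.exclude)
length(rstandard(fit_ex))   # equals nrow(d)
rstandard(fit_ex)[3]        # NA for the excluded case

# na.omit: the excluded case is simply absent from the results.
fit_om <- lm(dist ~ speed, data = d, na.action = na.omit)
length(rstandard(fit_om))   # equals nrow(d) - 1

# Zero weight: the case is dropped from the diagnostics.
w <- rep(1, nrow(cars))
w[3] <- 0
fit_w <- lm(dist ~ speed, data = cars, weights = w)
length(rstandard(fit_w))    # equals nrow(cars) - 1
```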
The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.
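The difference between the two interfaces can be sketched as follows: hatvalues() takes the fitted model, whereas hat() takes a design matrix and by default prepends an intercept column. The cars data and the variable names are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Leverages taken directly from the fitted model.
h_model <- hatvalues(fit)

# hat() works on a design matrix. model.matrix(fit) already contains the
# intercept column, so it must not be prepended a second time.
h_design <- hat(model.matrix(fit), intercept = FALSE)

# Equivalent call on the raw predictor, letting hat() add the intercept.
h_raw <- hat(cars$speed, intercept = TRUE)

# hat() returns an unnamed vector, so names are dropped for the comparison.
all.equal(unname(h_model), h_design)
all.equal(h_design, h_raw)
```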
For hatvalues, dfbeta, and dfbetas, the method for linear models also works for generalized linear models.
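As a brief illustration, the same calls can be applied unchanged to a glm fit. The Poisson model on the built-in warpbreaks data is an arbitrary choice for this sketch.

```r
gfit <- glm(breaks ~ wool + tension, family = poisson(), data = warpbreaks)

hv  <- hatvalues(gfit)   # leverages, one per case
db  <- dfbeta(gfit)      # change in each coefficient when a case is deleted
dbs <- dfbetas(gfit)     # the same changes, scaled by a standard error

dim(db)    # one row per case, one column per coefficient
sum(hv)    # the leverages sum to the number of coefficients
```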
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980) Regression Diagnostics. New York: Wiley.
Cook, R. D. and Weisberg, S. (1982) Residuals and Influence in Regression. London: Chapman and Hall.
Williams, D. A. (1987) Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics 36, 181-191.
Fox, J. (1997) Applied Regression, Linear Models, and Related Methods. Sage.
Fox, J. (2002) An R and S-Plus Companion to Applied Regression. Sage Publ.; http://www.socsci.mcmaster.ca/jfox/Books/Companion/.
See also plotmath for the use of hat in plot annotation.