# influence.measures

##### Regression Deletion Diagnostics

This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.

Keywords: regression

##### Usage

influence.measures(model)

rstandard(model, …)
# S3 method for lm
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
sd = sqrt(deviance(model)/df.residual(model)),
type = c("sd.1", "predictive"), …)
# S3 method for glm
rstandard(model, infl = influence(model, do.coef = FALSE),
type = c("deviance", "pearson"), …)

rstudent(model, …)
# S3 method for lm
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
res = infl$wt.res, …)
# S3 method for glm
rstudent(model, infl = influence(model, do.coef = FALSE), …)

dffits(model, infl = , res = )

dfbeta(model, …)
# S3 method for lm
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), …)

dfbetas(model, …)
# S3 method for lm
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), …)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
res = weighted.residuals(model))

cooks.distance(model, …)
# S3 method for lm
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
res = weighted.residuals(model),
sd = sqrt(deviance(model)/df.residual(model)),
hat = infl$hat, …)
# S3 method for glm
cooks.distance(model, infl = influence(model, do.coef = FALSE),
res = infl$pear.res,
dispersion = summary(model)$dispersion,
hat = infl$hat, …)

hatvalues(model, …)
# S3 method for lm
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), …)

hat(x, intercept = TRUE)

##### Arguments

- model
an R object, typically returned by `lm` or `glm`.

- infl
influence structure as returned by `lm.influence` or `influence` (the latter only for the `glm` method of `rstudent` and `cooks.distance`).

- res
(possibly weighted) residuals, with proper default.

- sd
standard deviation to use, see default.

- dispersion
dispersion (for `glm` objects) to use, see default.

- hat
hat values \(H_{ii}\), see default.

- type
type of residuals for `rstandard`, with different options and meanings for `lm` and `glm`. Can be abbreviated.

- x
the \(X\) or design matrix.

- intercept
should an intercept column be prepended to `x`?

- …
further arguments passed to or from other methods.

##### Details

The primary high-level function is `influence.measures`, which produces an object of class `"infl"`. Its tabular display shows the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases which are influential with respect to any of these measures are marked with an asterisk.
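As a minimal sketch of what the returned object holds, the following fits a small model and inspects its components. The built-in `cars` data set is an assumption chosen for brevity; it is not used elsewhere on this page.

```r
## Illustrative only: 'cars' is an assumed stand-in data set.
fit <- lm(dist ~ speed, data = cars)
im  <- influence.measures(fit)

class(im)              # the "infl" class described above
colnames(im$infmat)    # DFBETAS columns, then dffit, cov.r, cook.d, hat
dim(im$is.inf)         # logical matrix with the same shape as im$infmat

## Rows flagged as influential on at least one measure
flagged <- which(apply(im$is.inf, 1, any))
im$infmat[flagged, , drop = FALSE]
```

The `is.inf` component is what drives the asterisks in the printed table, so it can be used directly when the flagged cases are needed programmatically.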

The functions `dfbetas`, `dffits`, `covratio` and `cooks.distance` provide direct access to the corresponding diagnostic quantities. Functions `rstandard` and `rstudent` give the standardized and Studentized residuals respectively. (These re-normalize the residuals to have unit variance, using an overall and leave-one-out measure of the error variance respectively.)
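For a linear model, the two re-normalizations can be reproduced by hand as \(e_i / (s \sqrt{1 - H_{ii}})\) and \(e_i / (s_{(i)} \sqrt{1 - H_{ii}})\), where \(s\) is the overall and \(s_{(i)}\) the leave-one-out estimate of the error standard deviation. The sketch below checks this on an unweighted fit; the `cars` data set and the `*_manual` names are illustrative assumptions.

```r
fit  <- lm(dist ~ speed, data = cars)   # 'cars' is an assumed example data set
infl <- lm.influence(fit, do.coef = FALSE)
e    <- residuals(fit)                  # raw residuals (no weights in this fit)
h    <- infl$hat                        # diagonal of the hat matrix

## Standardized: overall estimate of the error standard deviation
s         <- sqrt(deviance(fit) / df.residual(fit))
rs_manual <- e / (s * sqrt(1 - h))

## Studentized: leave-one-out estimates, taken from infl$sigma
rt_manual <- e / (infl$sigma * sqrt(1 - h))

all.equal(rs_manual, rstandard(fit))
all.equal(rt_manual, rstudent(fit))
```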

Values for generalized linear models are approximations, as described in Williams (1987) (except that Cook's distances are scaled as \(F\) rather than as chi-square values). The approximations can be poor when some cases have large influence.
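The \(F\)-type scaling can be illustrated with Irwin's data from the Examples. The sketch assumes the `glm` method computes Cook's distance from the Pearson residuals, hat values and dispersion named in its defaults, divided by the model rank \(p\); the hand-written formula is a reconstruction for illustration, not a quotation of the source code.

```r
## Irwin's data, as used in the Examples of this page
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)   # number of mice responding to dose xi
mi <- rep(40, 5)            # number of mice exposed
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

infl <- influence(glmI, do.coef = FALSE)
h    <- infl$hat
p    <- glmI$rank                     # number of estimated coefficients
phi  <- summary(glmI)$dispersion      # fixed at 1 for the binomial family

## Assumed form: squared deleted Pearson residual times leverage, over phi * p
cook_manual <- (infl$pear.res / (1 - h))^2 * h / (phi * p)
all.equal(cook_manual, cooks.distance(glmI))
```

Dividing by \(p\) is what makes the values comparable to an \(F\) scale rather than a chi-square scale.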

The optional `infl`, `res` and `sd` arguments are there to encourage the use of these direct access functions, in situations where, e.g., the underlying basic influence measures (from `lm.influence` or the generic `influence`) are already available.
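A short sketch of that pattern: the influence structure is computed once and then handed to several of the direct access functions. The `cars` model and the result names are illustrative assumptions.

```r
fit  <- lm(dist ~ speed, data = cars)   # assumed example model
infl <- lm.influence(fit)               # computed once, coefficients included

## Each call reuses 'infl' instead of recomputing it
d_fits <- dffits(fit, infl = infl)
c_rat  <- covratio(fit, infl = infl)
c_dist <- cooks.distance(fit, infl = infl)
d_beta <- dfbetas(fit, infl = infl)

## Same results as the calls that rely on the defaults
all.equal(d_fits, dffits(fit))
all.equal(c_dist, cooks.distance(fit))
```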

Note that cases with `weights == 0` are *dropped* from all these functions, but that if a linear model has been fitted with `na.action = na.exclude`, suitable values are filled in for the cases excluded during fitting.
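The difference between the two situations shows up in the length of the result. In this sketch the data set `cars`, the zero weight on case 3 and the missing response in case 5 are all assumptions made purely for illustration.

```r
d <- cars                           # assumed example data
w <- rep(1, nrow(d))
w[3] <- 0                           # give case 3 zero weight

fit_w <- lm(dist ~ speed, data = d, weights = w)
length(rstandard(fit_w))            # one fewer than nrow(d): case 3 is dropped

d$dist[5] <- NA                     # make case 5 unusable for fitting
fit_na <- lm(dist ~ speed, data = d, na.action = na.exclude)
length(rstandard(fit_na))           # nrow(d): the excluded case keeps its slot
rstandard(fit_na)[5]                # filled in as NA
```

Keeping the excluded cases in place means the diagnostics line up row by row with the original data frame.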

For linear models, `rstandard(*, type = "predictive")` provides leave-one-out cross validation residuals, and the “PRESS” statistic (**PRE**dictive **S**um of **S**quares, the same as the CV score) of model `model` is

`PRESS <- sum(rstandard(model, type="pred")^2)`

The function `hat()` exists mainly for S (version 2) compatibility; we recommend using `hatvalues()` instead.
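The two interfaces differ in what they take as input: `hat()` works on a design matrix, `hatvalues()` on a fitted model. A small sketch of their agreement, assuming the `cars` data set as an example:

```r
fit <- lm(dist ~ speed, data = cars)   # assumed example model

## hat() prepends an intercept column unless told otherwise
h_old <- hat(cars$speed)
## model.matrix(fit) already contains the intercept column
h_mm  <- hat(model.matrix(fit), intercept = FALSE)
h_new <- hatvalues(fit)

all.equal(unname(h_old), unname(h_new))
all.equal(unname(h_mm),  unname(h_new))
```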

##### Note

For `hatvalues`, `dfbeta`, and `dfbetas`, the method for linear models also works for generalized linear models.
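A brief sketch of this, reusing Irwin's binomial data from the Examples; the result names `h`, `db` and `dbs` are illustrative.

```r
## Irwin's data, as in the Examples of this page
xi <- 1:5
yi <- c(0, 2, 14, 19, 30)
mi <- rep(40, 5)
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

h   <- hatvalues(glmI)   # one leverage per case
db  <- dfbeta(glmI)      # one row per case, one column per coefficient
dbs <- dfbetas(glmI)     # the same changes on a standardized scale

sum(h)                   # leverages sum to the number of coefficients
dim(db)
```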

##### References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980).
*Regression Diagnostics*.
New York: Wiley.

Cook, R. D. and Weisberg, S. (1982).
*Residuals and Influence in Regression*.
London: Chapman and Hall.

Williams, D. A. (1987).
Generalized linear model diagnostics using the deviance and single
case deletions.
*Applied Statistics*, **36**, 181–191.
doi: 10.2307/2347550.

Fox, J. (1997).
*Applied Regression, Linear Models, and Related Methods*.
Sage.

Fox, J. (2002).
*An R and S-Plus Companion to Applied Regression*.
Sage Publ.

Fox, J. and Weisberg, S. (2011).
*An R Companion to Applied Regression*, second edition.
Sage Publ.
http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/.

##### See Also

`influence` (containing `lm.influence`).

‘plotmath’ for the use of `hat` in plot annotation.

##### Examples

`library(stats)`

```
require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
identical(rs, rstandard(lm.SR, infl = iflSR))
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals" == L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
                y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coef,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))

## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
summary(lmH)
im <- influence.measures(lmH)
im
plot(xh, yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[im$is.inf], yh[im$is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30) # number of mice responding to dose xi
mi <- rep(40, 5)      # number of mice exposed
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)
summary(glmI)
signif(cooks.distance(glmI), 3) # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)
imI
stopifnot(all.equal(imI$infmat[,"cook.d"],
                    cooks.distance(glmI)))
```

*Documentation reproduced from package stats, version 3.5.1, License: Part of R 3.5.1*