This function calculates the empirical influence values for a statistic applied to a data set. It allows four types of calculation, namely the infinitesimal jackknife (using numerical differentiation), the usual jackknife estimates, the ‘positive’ jackknife estimates and a method which estimates the empirical influence values using regression of bootstrap replicates of the statistic. All methods can be used with one or more samples.
empinf(boot.out = NULL, data = NULL, statistic = NULL,
       type = NULL, stype = NULL, index = 1, t = NULL,
       strata = rep(1, n), eps = 0.001, ...)
A bootstrap object created by the function boot. If type is "reg" then this argument is required. For any of the other types it is optional. If it is supplied when optional, the values of data, statistic, stype, and strata are taken from the components of boot.out and any values passed to empinf directly are ignored.
A vector, matrix or data frame containing the data for which empirical influence values are required. It is a required argument if boot.out is not supplied. If boot.out is supplied then data is set to boot.out$data and any value supplied is ignored.
The statistic for which empirical influence values are required. It must be a function of at least two arguments, the data set and a vector of weights, frequencies or indices. The nature of the second argument is given by the value of stype. Any other arguments that it takes must be supplied to empinf and will be passed to statistic unchanged. This is a required argument if boot.out is not supplied; otherwise its value is taken from boot.out and any value supplied here is ignored.
The calculation type to be used for the empirical influence values. Possible values of type are "inf" (infinitesimal jackknife), "jack" (usual jackknife), "pos" (positive jackknife), and "reg" (regression estimation). The default value depends on the other arguments. If t is supplied then the default value of type is "reg" and boot.out should be present so that its frequency array can be found. If t is not supplied then, if stype is "w", the default value of type is "inf"; otherwise, if boot.out is present, the default is "reg". If none of these conditions applies then the default is "jack". Note that it is an error for type to be "reg" if boot.out is missing, or to be "inf" if stype is not "w".
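The default rule described above can be restated as a small decision function. This is only a sketch: `default_type` and its arguments are hypothetical names introduced for illustration, they are not part of the boot package, and the real function may apply further checks.

```r
# Hypothetical helper that restates the documented default for 'type'.
# has_t / has_boot_out say whether 't' / 'boot.out' were supplied;
# stype is the resolved value of 'stype' ("w", "f" or "i").
default_type <- function(has_t, has_boot_out, stype) {
  if (has_t) return("reg")          # t supplied: regression estimation
  if (stype == "w") return("inf")   # weighted statistic: infinitesimal jackknife
  if (has_boot_out) return("reg")   # bootstrap object available
  "jack"                            # otherwise: usual jackknife
}

default_type(has_t = FALSE, has_boot_out = FALSE, stype = "w")
default_type(has_t = FALSE, has_boot_out = TRUE,  stype = "i")
default_type(has_t = FALSE, has_boot_out = FALSE, stype = "f")
```

The three calls cover the infinitesimal jackknife, regression and usual jackknife defaults in turn.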
A character variable giving the nature of the second argument to statistic. It can take one of three values: "w" (weights), "f" (frequencies), or "i" (indices). If boot.out is supplied the value of stype is set to boot.out$stype and any value supplied here is ignored. Otherwise it is an optional argument which defaults to "w". If type is "inf" then stype must be "w".
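To make the three forms concrete, the sketch below writes the sample mean once for each value of stype. The function names, the data vector and the example resample are all made up for illustration; only the meaning of the second argument follows the description above.

```r
x <- c(2, 4, 4, 7, 9)        # illustrative data
n <- length(x)

mean_w <- function(d, w) sum(d * w) / sum(w)   # stype = "w": weights
mean_f <- function(d, f) sum(d * f) / sum(f)   # stype = "f": frequencies
mean_i <- function(d, i) mean(d[i])            # stype = "i": indices

# One resample described in the three equivalent ways.
idx <- c(1, 1, 3, 5, 5)              # indices of the resampled observations
f <- tabulate(idx, nbins = n)        # how often each observation appears
w <- f / n                           # the same information as weights

results <- c(mean_w(x, w), mean_f(x, f), mean_i(x, idx))
results
```

All three calls describe the same resample, so they should return the same value.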
An integer giving the position of the variable of interest in the output of statistic.
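As an illustration of index, the following sketch uses a statistic that returns two values and asks for the influence values of each in turn. The statistic `mean_var` and the data vector are hypothetical; the calls rely on the documented defaults (stype "w", and hence type "inf").

```r
library(boot)

x <- c(2, 4, 4, 7, 9)   # illustrative data

# Hypothetical weighted statistic returning c(mean, variance).
mean_var <- function(d, w) {
  w <- w / sum(w)
  m <- sum(d * w)
  c(m, sum(w * (d - m)^2))
}

l_mean <- empinf(data = x, statistic = mean_var, index = 1)  # first component
l_var  <- empinf(data = x, statistic = mean_var, index = 2)  # second component
```

For the mean the values should be close to `x - mean(x)`; for the variance they should be close to the centred squared deviations, up to the error of the numerical differentiation.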
A vector of length boot.out$R which gives the bootstrap replicates of the statistic of interest. t is used only when type is "reg" and it defaults to boot.out$t[,index].
An integer vector or a factor specifying the strata for multi-sample problems. If boot.out is supplied the value of strata is set to boot.out$strata. Otherwise it is an optional argument whose default corresponds to the single-sample situation.
This argument is used only if type is "inf". In that case the value of epsilon to be used for numerical differentiation will be eps divided by the number of observations in data.
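The role of eps can be sketched with a simplified single-sample version of the numerical differentiation. The helper `inf_jack_sketch` below is hypothetical and is not the package's implementation: it assumes one stratum, a statistic returning a single number, and a forward difference in which a small amount of weight is moved onto one observation at a time.

```r
# Hypothetical single-sample sketch of the infinitesimal jackknife.
inf_jack_sketch <- function(data, statistic, eps = 0.001) {
  n <- NROW(data)
  h <- eps / n                      # step size: eps divided by n, as described above
  w0 <- rep(1 / n, n)               # equal weights give the observed statistic
  t0 <- statistic(data, w0)
  vapply(seq_len(n), function(j) {
    w <- (1 - h) * w0               # shrink all weights slightly ...
    w[j] <- w[j] + h                # ... and move the removed mass onto observation j
    (statistic(data, w) - t0) / h   # forward-difference derivative
  }, numeric(1))
}

wmean <- function(d, w) sum(d * w) / sum(w)
x <- c(2, 4, 4, 7, 9)               # illustrative data
l_inf <- inf_jack_sketch(x, wmean)
```

For the weighted mean the derivative is exact, so `l_inf` should agree with `x - mean(x)` up to rounding error.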
Any other arguments that statistic takes. They will be passed unchanged to statistic every time that it is called.
A vector of the empirical influence values of statistic applied to data. The values will be in the same order as the observations in data.
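To give a feel for what the returned vector contains, here is a minimal sketch of the two jackknife calculations for a single sample. The helpers `jack_values` and `pos_jack_values` are hypothetical, and the scalings (n - 1) and (n + 1) are the standard textbook forms, assumed here rather than taken from the package source.

```r
# Hypothetical helpers; both assume a data vector and a statistic(data, f)
# whose second argument is a vector of frequencies.
jack_values <- function(data, statistic) {
  n <- length(data)
  f0 <- rep(1, n)
  t0 <- statistic(data, f0)
  vapply(seq_len(n), function(j) {
    f <- f0
    f[j] <- 0                               # leave observation j out
    (n - 1) * (t0 - statistic(data, f))
  }, numeric(1))
}

pos_jack_values <- function(data, statistic) {
  n <- length(data)
  f0 <- rep(1, n)
  t0 <- statistic(data, f0)
  vapply(seq_len(n), function(j) {
    f <- f0
    f[j] <- 2                               # include observation j twice
    (n + 1) * (statistic(data, f) - t0)
  }, numeric(1))
}

mean_f <- function(d, f) sum(d * f) / sum(f)
x <- c(2, 4, 4, 7, 9)                       # illustrative data
l_jack <- jack_values(x, mean_f)
l_pos  <- pos_jack_values(x, mean_f)
```

For the sample mean both vectors reduce to `x - mean(x)`: one value per observation, in the order of the data, summing to zero.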
All arguments to empinf must be passed using the name = value convention. If this is not followed then unpredictable errors can occur.
If type is "inf" then numerical differentiation is used to approximate the empirical influence values. This makes sense only for statistics which are written in weighted form (i.e. stype is "w"). If type is "jack" then the usual leave-one-out jackknife estimates of the empirical influence are returned. If type is "pos" then the positive (include-one-twice) jackknife values are used. If type is "reg" then a bootstrap object must be supplied. The regression method then works by regressing the bootstrap replicates of statistic on the frequency array from which they were derived. The bootstrap frequency array is obtained through a call to boot.array. Further details of the methods are given in Section 2.7 of Davison and Hinkley (1997).
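The regression idea can be sketched in base R without a bootstrap object. Everything below is an illustrative assumption rather than the package's code: the resamples are drawn by hand, the frequency array is built with `tabulate` instead of boot.array, one column is dropped because the frequencies in each row sum to n, and the fitted coefficients are centred to give the influence values.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- c(2, 4, 4, 7, 9)                # illustrative data
n <- length(x)
R <- 200                             # number of bootstrap resamples

# Draw R resamples and record how often each observation appears in each one.
idx  <- matrix(sample.int(n, n * R, replace = TRUE), nrow = R)
freq <- t(apply(idx, 1, tabulate, nbins = n))    # R x n frequency array
t_star <- as.vector(freq %*% x) / n              # bootstrap replicates of the mean

# Regress the replicates on the resampling proportions, dropping the first
# column to avoid collinearity with the intercept.
X <- freq[, -1] / n
beta <- unname(coef(lm(t_star ~ X))[-1])

l_reg <- c(0, beta)                  # the dropped observation gets coefficient 0
l_reg <- l_reg - mean(l_reg)         # centre so the values sum to zero
```

Because the mean is exactly linear in the resampling proportions, the fit is perfect here and `l_reg` should match `x - mean(x)`; for nonlinear statistics the regression only approximates the influence values.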
Empirical influence values are used frequently in nonparametric bootstrap applications. For this reason many other functions call empinf when they are required. Examples of their use include nonparametric delta estimates of variance, BCa intervals, and finding linear approximations to statistics for use as control variates. They are also used for antithetic bootstrap resampling.
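As a small worked example of the first of these uses, the single-sample nonparametric delta method variance estimate is the sum of squared influence values divided by n squared. The sketch below applies this to the sample mean, whose influence values are the centred observations; the data vector is made up, and var.linear (listed under See Also) is the package function intended for this calculation, although it is not called here.

```r
x <- c(2, 4, 4, 7, 9)      # illustrative data
n <- length(x)

l <- x - mean(x)           # empirical influence values of the sample mean
v_L <- sum(l^2) / n^2      # nonparametric delta method variance estimate

# For the mean this equals the usual variance of the mean with divisor n
# instead of n - 1.
v_check <- (n - 1) / n * var(x) / n
c(v_L, v_check)
```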
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics, 38, SIAM.
Fernholz, L.T. (1983) von Mises Calculus for Statistical Functionals. Lecture Notes in Statistics, 19, Springer-Verlag.
boot, boot.array, boot.ci, control, jack.after.boot, linear.approx, var.linear
library(boot)

# The empirical influence values for the ratio of means in
# the city data.
ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
empinf(data = city, statistic = ratio)
city.boot <- boot(city, ratio, 499, stype = "w")
empinf(boot.out = city.boot, type = "reg")

# A statistic that may be of interest in the difference of means
# problem is the t-statistic for testing equality of means.  In
# the bootstrap we get replicates of the difference of means and
# the variance of that statistic and then want to use this output
# to get the empirical influence values of the t-statistic.
grav1 <- gravity[as.numeric(gravity[, 2]) >= 7, ]
grav.fun <- function(dat, w) {
    strata <- tapply(dat[, 2], as.numeric(dat[, 2]))
    d <- dat[, 1]
    ns <- tabulate(strata)
    w <- w/tapply(w, strata, sum)[strata]
    mns <- as.vector(tapply(d * w, strata, sum)) # drop names
    mn2 <- tapply(d * d * w, strata, sum)
    s2hat <- sum((mn2 - mns^2)/ns)
    c(mns[2] - mns[1], s2hat)
}

grav.boot <- boot(grav1, grav.fun, R = 499, stype = "w",
                  strata = grav1[, 2])

# Since the statistic of interest is a function of the bootstrap
# statistics, we must calculate the bootstrap replicates and pass
# them to empinf using the t argument.
grav.z <- (grav.boot$t[, 1] - grav.boot$t0[1])/sqrt(grav.boot$t[, 2])
empinf(boot.out = grav.boot, t = grav.z)