empinf: Empirical Influence Values

Description

This function calculates the empirical influence values for a statistic applied to a data set. It allows four types of calculation, namely the infinitesimal jackknife (using numerical differentiation), the usual jackknife estimates, the "positive" jackknife estimates and a method which estimates the empirical influence values using regression of bootstrap replicates of the statistic. All methods can be used with one or more samples.

Usage

empinf(boot.out=NULL, data=NULL, statistic=NULL, 
       type=<>, stype="w", index=1, t=NULL,
       strata=rep(1, n), eps=0.001, ...)

Arguments

boot.out=

A bootstrap object created by the function boot. If type is "reg" then this argument is required. For any of the other types it is an optional argument. If it is included when optional then the values of

data=

A vector, matrix or data frame containing the data for which empirical influence values are required. It is a required argument if boot.out is not supplied. If boot.out is supplied then data is set to boot.ou

statistic=

The statistic for which empirical influence values are required. It must be a function of at least two arguments, the data set and a vector of weights, frequencies or indices. The nature of the second argument is given by the value of stype

type=

The calculation type to be used for the empirical influence values. Possible values of type are "inf" (infinitesimal jackknife), "jack" (usual jackknife), "pos" (positive jackknife), and "reg"

stype=

A character variable giving the nature of the second argument to statistic. It can take on three values: "w" (weights), "f" (frequencies), or "i" (indices). If boot.out is supplied the valu

index=

An integer giving the position of the variable of interest in the output of statistic.

A vector of length boot.out$R which gives the bootstrap replicates of the statistic of interest. t is used only when type is reg and it defaults to boot.out$t[,index].

strata=

An integer vector or a factor specifying the strata for multi-sample problems. If boot.out is supplied the value of strata is set to boot.out$strata. Otherwise it is an optional argument which has default correspo

eps=

This argument is used only if type is "inf". In that case the value of epsilon to be used for numerical differentiation will be eps divided by the number of observations in data.

...

Any other arguments that statistic takes. They will be passed unchanged to statistic every time that it is called.

Value

A vector of the empirical influence values of statistic applied to data. The values will be in the same order as the observations in data.

Warning

All arguments to empinf must be passed using the name=value convention. If this is not followed then unpredictable errors can occur.

Details

If type is "inf" then numerical differentiation is used to approximate the empirical influence values. This makes sense only for statistics which are written in weighted form (i.e. stype is "w"). If type is "jack" then the usual leave-one-out jackknife estimates of the empirical influence are returned. If type is "pos" then the positive (include-one-twice) jackknife values are used. If type is "reg" then a bootstrap object must be supplied. The regression method then works by regressing the bootstrap replicates of statistic on the frequency array from which they were derived. The bootstrap frequency array is obtained through a call to boot.array. Further details of the methods are given in Section 2.7 of Davison and Hinkley (1997).

Empirical influence values are often used frequently in nonparametric bootstrap applications. For this reason many other functions call empinf when they are required. Some examples of their use are for nonparametric delta estimates of variance, BCa intervals and finding linear approximations to statistics for use as control variates. They are also used for antithetic bootstrap resampling.

References

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.

Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics, 38, SIAM.

Fernholtz, L.T. (1983) von Mises Calculus for Statistical Functionals. Lecture Notes in Statistics, 19, Springer-Verlag.

Examples

Run this code

# The empirical influence values for the ratio of means in 
# the city data.
data(city)
ratio <- function(d, w) sum(d$x *w)/sum(d$u*w)
empinf(data=city,statistic=ratio)
city.boot <- boot(city,ratio,499,stype="w")
empinf(boot.out=city.boot,type="reg")


# A statistic that may be of interest in the difference of means
# problem is the t-statistic for testing equality of means.  In 
# the bootstrap we get replicates of the difference of means and 
# the variance of that statistic and then want to use this output
# to get the empirical influence values of the t-statistic.
data(gravity)
grav1 <- gravity[as.numeric(gravity[,2])>=7,]
grav.fun <- function(dat, w)
{    strata <- tapply(dat[, 2], as.numeric(dat[, 2]))
     d <- dat[, 1]
     ns <- tabulate(strata)
     w <- w/tapply(w, strata, sum)[strata]
     mns <- tapply(d * w, strata, sum)
     mn2 <- tapply(d * d * w, strata, sum)
     s2hat <- sum((mn2 - mns^2)/ns)
     c(mns[2]-mns[1],s2hat)
}


grav.boot <- boot(grav1, grav.fun, R=499, stype="w", strata=grav1[,2])


# Since the statistic of interest is a function of the bootstrap
# statistics, we must calculate the bootstrap replicates and pass
# them to empinf using the t argument.
grav.z <- (grav.boot$t[,1]-grav.boot$t0[1])/sqrt(grav.boot$t[,2])
empinf(boot.out=grav.boot,t=grav.z)

Run the code above in your browser using DataLab