saddle.distn
Saddlepoint Distribution Approximations for Bootstrap Statistics
Approximate an entire distribution using saddlepoint methods. This
function can calculate simple and conditional saddlepoint distribution
approximations for a univariate quantity of interest. For the simple
saddlepoint the quantity of interest is a linear combination of
W where W is a vector of random variables. For the
conditional saddlepoint we require the distribution of one linear
combination given the values of any number of other linear
combinations. The distribution of W must be one of multinomial,
Poisson or binary. The primary use of this function is to calculate
quantiles of bootstrap distributions using saddlepoint approximations.
Such quantiles are required by the function control to
approximate the distribution of the linear approximation to a
statistic.
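Here is a minimal base-R sketch of the simple saddlepoint in the multinomial case. It approximates P(T <= t) for T = sum(a * W), with W ~ Multinomial(n, p) and n = length(a). It solves the saddlepoint equation K'(xi) = t with uniroot and then applies the Lugannani-Rice formula. The function saddle_cdf_mult is hypothetical and is not the code used by saddle.distn. It also skips the range-finding and spline steps described under Details.

```r
# Hypothetical helper (not part of boot): simple saddlepoint approximation
# to P(T <= t) for T = sum(a * W), W ~ Multinomial(n, p), n = length(a).
# The default p gives equal weights, as in the ordinary bootstrap.
saddle_cdf_mult <- function(t, a, p = rep(1 / length(a), length(a))) {
  n <- length(a)
  # Cumulant generating function of T and its first two derivatives,
  # computed with a log-sum-exp shift to avoid overflow in exp().
  cgf <- function(xi) {
    s <- xi * a
    m <- max(s)
    w <- p * exp(s - m)
    sw <- sum(w)
    mean_a <- sum(w * a) / sw
    list(K  = n * (m + log(sw)),
         K1 = n * mean_a,
         K2 = n * (sum(w * a^2) / sw - mean_a^2))
  }
  # Solve the saddlepoint equation K'(xi) = t; K' is increasing in xi,
  # so uniroot may extend the starting interval in the right direction.
  xi <- uniroot(function(x) cgf(x)$K1 - t, interval = c(-1, 1),
                extendInt = "upX", tol = 1e-10)$root
  k <- cgf(xi)
  w <- sign(xi) * sqrt(2 * (xi * t - k$K))
  v <- xi * sqrt(k$K2)
  # Lugannani-Rice approximation; it breaks down as t approaches the mean
  # of T, where xi, w and v all tend to zero.
  pnorm(w) + dnorm(w) * (1 / w - 1 / v)
}

y <- c(1, 2, 3, 4, 10)
a <- y / length(y)   # T is then the mean of a bootstrap resample of y
sapply(c(2, 3, 5, 7), saddle_cdf_mult, a = a)
```

The points 2, 3, 5 and 7 were chosen to avoid the observed mean 4, where the approximation is unreliable.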
- Keywords
- smooth, dplot, nonparametric
Usage
saddle.distn(A, u = NULL, alpha = NULL, wdist = "m",
             type = "simp", npts = 20, t = NULL, t0 = NULL,
             init = rep(0.1, d), mu = rep(0.5, n), LR = FALSE,
             strata = NULL, ...)
Arguments
- A
This is a matrix of known coefficients or a function which returns such a matrix. If it is a function, then its first argument must be the point t at which a saddlepoint is required. The most common reason for A being a function would be if the statistic is not itself a linear combination of the W but is the solution to a linear estimating equation.
- u
If A is a function, then u must also be a function returning a vector with length equal to the number of columns of the matrix returned by A. Usually all components other than the first will be constants, as the other components are the values of the conditioning variables. If A is a matrix with more than one column (such as when type = "cond"), then u should be a vector with length one less than ncol(A). In this case u specifies the values of the conditioning variables. If A is a matrix with one column or a vector, then u is not used.
- alpha
The alpha levels for the quantiles of the distribution which should be returned. By default the 0.1, 0.5, 1, 2.5, 5, 10, 20, 50, 80, 90, 95, 97.5, 99, 99.5 and 99.9 percentiles are calculated.
- wdist
The distribution of W. Possible values are "m" (multinomial), "p" (Poisson), or "b" (binary).
- type
The type of saddlepoint to be used. Possible values are "simp" (simple saddlepoint) and "cond" (conditional). If wdist is "m", type is set to "simp".
- npts
The number of points at which the saddlepoint approximation should be calculated and then used to fit the spline.
- t
A vector of points at which the saddlepoint approximations are calculated. These points should extend beyond the extreme quantiles required but still be in the possible range of the bootstrap distribution. The observed value of the statistic should not be included in t, as the distribution function approximation breaks down at that point. The points should, however, cover the entire effective range of the distribution, including close to the centre. If t is supplied, then npts is set to length(t). When t is not supplied, the function attempts to find the effective range of the distribution and then selects points to cover this range.
- t0
If t is not supplied, then a vector of length 2 should be passed as t0. The first component of t0 should be the centre of the distribution and the second should be an estimate of spread (such as a standard error). These two are then used to find the effective range of the distribution. The range-finding mechanism relies on an accurate estimate of location in t0[1].
- init
When wdist is "m", this vector should contain the initial values to be passed to nlmin when it is called to solve the saddlepoint equations.
- mu
The vector of parameter values for the distribution. The default is that the components of W are identically distributed.
- LR
A logical flag. When LR is TRUE, the Lugannani-Rice cdf approximations are calculated and used to fit the spline. Otherwise the cdf approximations used are based on Barndorff-Nielsen's r*.
- strata
A vector giving the strata when the rows of A relate to stratified data. This is used only when wdist is "m".
- ...
When A and u are functions, any additional arguments are passed unchanged each time one of them is called.
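Both cdf approximations controlled by LR depend on two quantities. The first is the signed root w = sign(xi) * sqrt(2 * (xi * t - K(xi))). The second is v = xi * sqrt(K''(xi)). Here K is the cumulant generating function of the linear combination and xi solves K'(xi) = t. The helper saddle_tail below is a hypothetical illustration of the two standard formulas, not an excerpt from saddle.distn:

```r
# Hypothetical helper: saddlepoint cdf approximations from w and v
# (w and v share the sign of the saddlepoint xi, so v / w > 0).
# LR = TRUE  -> Lugannani-Rice:       Phi(w) + phi(w) * (1/w - 1/v)
# LR = FALSE -> Barndorff-Nielsen r*: Phi(r*), r* = w + log(v/w) / w
saddle_tail <- function(w, v, LR = FALSE) {
  if (LR) {
    pnorm(w) + dnorm(w) * (1 / w - 1 / v)
  } else {
    pnorm(w + log(v / w) / w)
  }
}

# Both reduce to Phi(w) when v equals w.
saddle_tail(-1.5, -1.5, LR = TRUE)
saddle_tail(-1.5, -1.5, LR = FALSE)
# Otherwise they differ slightly.
saddle_tail(-1.5, -1.2, LR = TRUE)
saddle_tail(-1.5, -1.2, LR = FALSE)
```

The r* form applies Phi to a corrected root, so it always lies in [0, 1]. The Lugannani-Rice form can stray slightly outside that interval far in the tails.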
Details
The range over which the saddlepoint approximation is used is chosen
so that the cdf approximation at the endpoints is more extreme than
required by the extreme values of alpha. The lower endpoint is found
by evaluating the saddlepoint approximation at the points
t0[1]-2*t0[2], t0[1]-4*t0[2], t0[1]-8*t0[2], etc. until a point is
found with a cdf approximation less than min(alpha)/10. A bisection
method is then used to find an endpoint whose cdf approximation lies
in the range (min(alpha)/1000, min(alpha)/10). Next, equally spaced
points are chosen between the lower endpoint and t0[1] until a total
of npts/2 approximations have been made. The remaining npts/2 points
are chosen to the right of t0[1] in a similar manner. Any points which
are very close to the centre of the distribution are then omitted, as
the cdf approximations are not reliable at the centre. A smoothing
spline is then fitted to the probit of the saddlepoint distribution
function approximations at the remaining points, and the required
quantiles are predicted from the spline.
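The range-finding and spline steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the boot source:

- find_lower_end and spline_quantiles are hypothetical names.
- cdf stands for any increasing cdf approximation.
- The default alpha vector is written out from the percentiles listed under Arguments.
- The spline is assumed to regress t on the probit of the cdf, so that quantiles are direct spline predictions.

```r
# Default alpha levels: the percentiles listed under Arguments, as probabilities.
default_alpha <- c(0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.2, 0.5,
                   0.8, 0.9, 0.95, 0.975, 0.99, 0.995, 0.999)

# Hypothetical sketch of the lower-endpoint search for an increasing cdf(t).
find_lower_end <- function(cdf, t0, alpha = default_alpha, max_steps = 20) {
  target <- min(alpha) / 10
  hi <- t0[1]                      # assumed to satisfy cdf(hi) >= target
  step <- 2
  for (i in seq_len(max_steps)) {  # try t0[1] - 2, 4, 8, ... times t0[2]
    lo <- t0[1] - step * t0[2]
    if (cdf(lo) < target) break
    hi <- lo
    step <- 2 * step
  }
  if (cdf(lo) >= target) stop("Unable to find range")
  # Bisect between lo and hi until cdf(lo) lies in
  # (min(alpha)/1000, min(alpha)/10); cdf(lo) < target always holds.
  while (cdf(lo) <= min(alpha) / 1000) {
    mid <- (lo + hi) / 2
    if (cdf(mid) < target) lo <- mid else hi <- mid
  }
  lo
}

# Hypothetical sketch of the spline step: fit t against probit(cdf) and
# predict the quantiles at qnorm(alpha).
spline_quantiles <- function(t, p, alpha = default_alpha) {
  fit <- smooth.spline(qnorm(p), t)
  predict(fit, qnorm(alpha))$y
}

# Usage, with an exact normal cdf standing in for the saddlepoint values.
# For simplicity the points are placed symmetrically about t0[1] rather
# than searching for the upper endpoint separately.
cdf <- function(t) pnorm(t, mean = 10, sd = 2)
lower <- find_lower_end(cdf, t0 = c(10, 2))
pts <- seq(lower, 2 * 10 - lower, length.out = 20)
pts <- pts[abs(pts - 10) > 0.1 * 2]    # drop points very close to the centre
spline_quantiles(pts, cdf(pts))
```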
Sometimes the function will terminate with the message
"Unable to find range". There are two main reasons why this may
occur. One is that the distribution is too discrete and/or the
required quantiles too extreme. The function is then unable to find a
point within the allowable range which is beyond the extreme
quantiles. Another possibility is that the value of t0[2] is too
small, so too many steps are required to find the range. The first
problem cannot be solved except by asking for less extreme quantiles,
although for very discrete distributions the approximations may not
be very good. In the second case, using a larger value of t0[2] will
usually solve the problem.
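When the cause is a too-small t0[2], one pragmatic workaround is to retry with a larger spread estimate. The wrapper retry_wider_spread below is a hypothetical sketch. It retries on any error, not only on the range-finding failure. It cannot help when the distribution is too discrete; in that case the remedy is to ask for less extreme alpha.

```r
# Hypothetical wrapper: call fit_fun(t0); on error, multiply the spread
# estimate t0[2] by `factor` and try again, up to `max_tries` attempts.
retry_wider_spread <- function(fit_fun, t0, factor = 2, max_tries = 4) {
  for (i in seq_len(max_tries)) {
    res <- tryCatch(fit_fun(t0), error = function(e) e)
    if (!inherits(res, "error")) return(res)
    t0[2] <- factor * t0[2]
  }
  stop("still failing after ", max_tries, " attempts: ",
       conditionMessage(res))
}

# Possible use with the conditional Poisson example below (needs the boot
# package and air.t0 from the Examples section):
# library(boot)
# fit <- retry_wider_spread(
#   function(t0) saddle.distn(A = cbind(aircondit$hours/12, 1), u = 12,
#                             wdist = "p", type = "cond", t0 = t0),
#   t0 = air.t0)
```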
Value
The returned value is an object of class "saddle.distn". See the help
file for saddle.distn.object for a description of such an object.
References
Booth, J.G. and Butler, R.W. (1990) Randomization distributions and saddlepoint approximations in generalized linear models. Biometrika, 77, 787–796.
Canty, A.J. and Davison, A.C. (1997) Implementation of saddlepoint approximations to resampling distributions. Computing Science and Statistics; Proceedings of the 28th Symposium on the Interface, 248–253.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and their Application. Cambridge University Press.
Jensen, J.L. (1995) Saddlepoint Approximations. Oxford University Press.
See Also
lines.saddle.distn, saddle, saddle.distn.object, smooth.spline
Examples
# The bootstrap distribution of the mean of the air-conditioning
# failure data: fails to find value on R (and probably on S too)
air.t0 <- c(mean(aircondit$hours), sqrt(var(aircondit$hours)/12))

## Not run:
saddle.distn(A = aircondit$hours/12, t0 = air.t0)
## End(Not run)

# Alternatively, using the conditional Poisson
saddle.distn(A = cbind(aircondit$hours/12, 1), u = 12, wdist = "p",
type = "cond", t0 = air.t0)
# Distribution of the ratio of a sample of size 10 from the bigcity
# data, taken from Example 9.16 of Davison and Hinkley (1997).
ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
city.v <- var.linear(empinf(data = city, statistic = ratio))
bigcity.t0 <- c(mean(bigcity$x)/mean(bigcity$u), sqrt(city.v))
Afn <- function(t, data) cbind(data$x - t*data$u, 1)
ufn <- function(t, data) c(0,10)
saddle.distn(A = Afn, u = ufn, wdist = "b", type = "cond",
t0 = bigcity.t0, data = bigcity)
# From Example 9.16 of Davison and Hinkley (1997) again, we find the
# conditional distribution of the ratio given the sum of city$u.
Afn <- function(t, data) cbind(data$x-t*data$u, data$u, 1)
ufn <- function(t, data) c(0, sum(data$u), 10)
city.t0 <- c(mean(city$x)/mean(city$u), sqrt(city.v))
saddle.distn(A = Afn, u = ufn, wdist = "p", type = "cond", t0 = city.t0,
data = city)