sparr (version 2.1-12)

tolerance: Tolerance by p-value surfaces

Description

Calculates a p-value surface, based on asymptotic theory or Monte-Carlo (MC) permutations, describing the extremity of risk given a fixed or adaptive kernel-smoothed density-ratio, allowing tolerance contours to be drawn.

Usage

tolerance(rs, method = c("ASY", "MC"), ref.density = NULL, beta = 0.025,
  ITER = 100, parallelise = NULL, verbose = TRUE, ...)

Arguments

rs

An object of class rrs giving the estimated relative risk function for which to calculate the p-value surface.

method

A character string specifying the method of calculation. "ASY" (default) instructs the function to compute the p-values using asymptotic theory; "MC" computes them by random permutations of the data. See Details.

ref.density

Required if rs is based on fixed-bandwidth estimates of the case and control densities and method = "ASY". Either a pixel image or an object of class bivden giving the reference density to use in asymptotic formulae. May be unnormalised. Ignored if rs is based on adaptive kernel estimates or if method = "MC".
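
For instance, a pooled reference density can be supplied either as a bivden object or as a pixel image; a minimal sketch, assuming the pbc data used in the Examples below:

# Sketch (assumption: the pbc data from the Examples below).
data(pbc)
refden <- bivariate.density(unmark(pbc), h0 = OS(pbc))   # 'bivden' object
refim  <- density(pbc, OS(pbc))                          # pixel image alternative
# fixtol <- tolerance(riskfix, ref.density = refden)     # riskfix as in the Examples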

beta

A numeric value 0 < beta < 1 giving the fineness of the adaptive bandwidth partitioning to use for calculation of the required quantities for asymptotic adaptive p-value surfaces. Smaller values provide more accurate bandwidth bins at the cost of additional computing time (see Davies and Baddeley, 2017); the default is sensible in most cases. Ignored if rs is based on fixed-bandwidth kernel estimates.

ITER

Number of iterations for the Monte-Carlo permutations. Ignored if method = "ASY".

parallelise

Numeric argument to invoke parallel processing, giving the number of CPU cores to use when method = "MC". Experimental. Test your system first using parallel::detectCores() to identify the number of cores available to you.
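
For instance, you might check the core count first and then request one fewer than the total; a minimal sketch (rs here stands in for any rrs object):

ncores <- parallel::detectCores()   # cores available on this machine
# leaving one core free for the rest of the system is an illustrative choice
# mctol <- tolerance(rs, method = "MC", ITER = 200, parallelise = ncores - 1)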

verbose

Logical value indicating whether to print function progress during execution.

...

Additional arguments to be passed to risk when method = "MC". While most information needed for the MC repetitions is implicitly gleaned from the object passed to rs, this ellipsis is typically used to set the appropriate epsilon and pilot.symmetry values for the internal calls to risk.
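
For example, if the adaptive surface passed as rs was fitted with pilot.symmetry = "pooled", the same value would be repeated through the ellipsis so that the internal MC calls to risk match; a minimal sketch, assuming the pbc data used in the Examples below:

# Sketch (assumption: pbc data as in the Examples below). Settings used to
# create the rrs object are mirrored in the MC call; a non-default epsilon
# would be repeated here in exactly the same way.
rada <- risk(pbc, h0 = OS(pbc), adapt = TRUE, hp = NS(pbc), pilot.symmetry = "pooled")
ptol <- tolerance(rada, method = "MC", ITER = 50, pilot.symmetry = "pooled")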

Value

A pixel image of the estimated p-value surface.

Details

This function implements developments in Hazelton and Davies (2009) (fixed) and Davies and Hazelton (2010) (adaptive) to compute pointwise p-value surfaces based on asymptotic theory of kernel-smoothed relative risk surfaces. Alternatively, the user may elect to calculate the p-value surfaces using Monte-Carlo methods (see Kelsall and Diggle, 1995). Superimposing contours of these p-values at given significance levels (i.e. "tolerance contours") upon a plot of the risk surface can be an informative way of exploring the statistical significance of the extremity of risk across the defined study region.

The Monte-Carlo method, while not dependent on asymptotic theory, is computationally expensive, and it has been suggested that it may have some undesirable practical consequences in certain settings (Hazelton and Davies, 2009). When performing the MC simulations, the same global (and, if necessary, pilot) bandwidths and edge-correction regimen are employed as were used in the initial density estimates of the observed data. With regard to arguments passed to the internal calls to risk, the user should take care to use ... to set the epsilon value to match the one used in creating the object passed to rs (if this was set to a non-default value). Furthermore, when performing MC simulations for the adaptive relative risk function, the function borrows the value of the beta argument to speed up computation via partitioning, and the user should additionally use ... to set the same pilot.symmetry value as was used in creating the object passed to rs, just as for any non-default epsilon. This ensures the simulations are all performed under the same conditions as the original risk function.

The asymptotic approach to the p-value calculation can be advantageous over a Monte-Carlo method, which can lead to excessive computation time for adaptive risk surfaces and large datasets. See the aforementioned references for further comments.

Choosing different options for the argument test simply manipulates the "direction" of the p-values. That is, plotting tolerance contours at a significance level of 0.05 for a p-value surface calculated with test = "double" is equivalent to plotting tolerance contours at significance levels of 0.025 and 0.975 for test = "upper".

Implementation of the Monte-Carlo contours for the fixed-bandwidth estimator simply involves randomly reallocating the case/control marks and re-estimating the risk surface ITER times, against which the original estimate is compared. The bandwidth(s) for the case and control densities in the permuted estimates are controlled by hpsim. If your risk surface is adaptive, hpsim controls the pilot bandwidths and h0sim the global bandwidths. In particular, for the adaptive symmetric estimator (Davies et al. 201), it is assumed that the original estimate was itself calculated as a symmetric estimate via the pdef argument. The symchoice argument governs which permuted data set to use for the pilot estimate; if hpsim is NULL, the corresponding pilot bandwidth is taken from the pdef object stored in the original estimate. An error will occur if you attempt to set symchoice when the object passed to rs does not contain density estimates with pdef objects of class "bivden" present. See the help file for bivariate.density for details on using the pdef argument.

In addition to the usage noted above, you may define hpsim and/or h0sim as functions that recalculate the case and control pilot (or fixed) bandwidth(s) and the global bandwidth(s) at each iteration, based on the data set of permuted case/control marks. If so, these must strictly be functions that take the case data as the first argument and the control data as the second argument, each as a two-column matrix of x-y coordinates, and must return a numeric vector of length 1 or 2; these entries are assigned to the relevant density estimates as per the usage notes on supplying a numeric vector for hpsim. Take care: warnings will be issued if, for example, you specify an hpsim function that returns two numeric values but your rs object is a symmetric-adaptive estimate (in which case it only makes sense to yield one pilot bandwidth). A sketch of such a function is given below.
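
As an illustration, a bandwidth-recalculating function of the required form might look like the following; the crude pooled normal-scale rule here is purely an assumed example, and any selector returning one or two values would do.

# Sketch of a user-supplied bandwidth function of the form described above
# (assumption: a crude pooled normal-scale rule, purely illustrative).
# It takes the case coordinates and then the control coordinates, each as a
# two-column matrix, and returns a numeric vector of length 2.
hp.fun <- function(cases, controls){
  pooled <- rbind(cases, controls)
  sig <- sqrt(mean(c(var(pooled[,1]), var(pooled[,2]))))
  rep(sig*nrow(pooled)^(-1/6), 2)   # one value for cases, one for controls
}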

References

Davies, T.M. and Baddeley A. (2017), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, [to appear].

Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.

Hazelton, M.L. and Davies, T.M. (2009), Inference based on kernel estimates of the relative risk function in geographical epidemiology, Biometrical Journal, 51(1), 98-109.

Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.

Examples

data(pbc)                                   # primary biliary cirrhosis case/control data
h0 <- LSCV.risk(pbc,method="hazelton");h0   # fixed bandwidth for the risk surface via LSCV (Hazelton's method)
pbccas <- split(pbc)[[1]]                   # cases
pbccon <- split(pbc)[[2]]                   # controls

# ASY
riskfix <- risk(pbc,h0=h0)                                       # fixed-bandwidth risk surface
fixtol1 <- tolerance(riskfix,ref.density=density(pbc,OS(pbc)))   # asymptotic p-value surface

riskada <- risk(pbc,h0=h0,adapt=TRUE,hp=NS(pbc),pilot.symmetry="pooled",davies.baddeley=0.025)
adatol1 <- tolerance(riskada)   # asymptotic p-value surface (adaptive)

par(mfrow=c(1,2))
plot(riskfix)
tol.contour(fixtol1,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
plot(riskada)
tol.contour(adatol1,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)


# MC
fixtol2 <- tolerance(riskfix,method="MC",ITER=200) 
adatol2 <- tolerance(riskada,method="MC",ITER=200,parallelise=4,pilot.symmetry="pooled") # match original pilot; ~1 minute with parallelisation
par(mfrow=c(1,2))
plot(riskfix)
tol.contour(fixtol2,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
plot(riskada)
tol.contour(adatol2,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
