spaMM_boot: Parametric bootstrap

Description

This function simulates samples from a fit object inheriting from class "HLfit", as produced by spaMM's fitting functions, and applies a given function to each simulated sample. Parallelization is supported (see Details). A typical use of the parametric bootstrap is to fit one model to samples simulated under another model (see Examples).

Usage

spaMM_boot(object, simuland, nsim, nb_cores = NULL, resp_testfn=NULL, 
           control.foreach=list(), debug. = FALSE, type, ...)

Arguments

object

The fit object to simulate from.

simuland

The function to apply to each simulated sample. See Details for requirements of this function.

nsim

Number of samples to simulate and analyze.

nb_cores

Number of cores to use for parallel computation. The default is spaMM.getOption("nb_cores"), and 1 if the latter is NULL. nb_cores=1 prevents the use of parallelisation procedures.
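
As an illustration of how a value for nb_cores might be chosen, the following minimal sketch uses parallel::detectCores() and leaves one core free. The variable names and the "one core free" rule are assumptions made for the example, not spaMM recommendations.

```r
# Hypothetical helper logic: pick a core count for 'nb_cores'.
# detectCores() may return NA on some platforms, hence the fallback to 1.
detected <- parallel::detectCores()
n_cores <- if (is.na(detected)) 1L else max(1L, detected - 1L)
n_cores
```

The resulting value could then be supplied as nb_cores = n_cores in the spaMM_boot call; a value of 1 keeps the computation serial.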

resp_testfn

Passed to simulate.HLfit; NULL, or a function that tests a condition which simulated samples should satisfy. This function takes a response vector as argument and returns a boolean (TRUE indicating that the sample satisfies the condition).
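
A minimal sketch of such a test function is shown here. The name has_variation and the condition it checks (at least two distinct response values) are hypothetical choices made for illustration.

```r
# Hypothetical test function: accept a simulated response vector only if
# it contains at least two distinct values (i.e. it is not constant).
has_variation <- function(y) {
  length(unique(y)) > 1L
}

has_variation(c(0, 0, 0))  # constant sample: condition not satisfied
has_variation(c(0, 1, 0))  # varying sample: condition satisfied
```

Such a function could be supplied as resp_testfn = has_variation.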

control.foreach

list of control arguments for foreach. These include in particular .combine (with default value "rbind"), and .errorhandling (with default value "remove", but "pass" is quite useful for debugging).
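
For instance, a control list that switches error handling to "pass" for a debugging session might look like the sketch below. The name boot_control is hypothetical; .errorhandling is the foreach argument mentioned above.

```r
# Hypothetical control list for debugging parallel runs:
# failures are passed on in the results instead of being silently removed.
boot_control <- list(.errorhandling = "pass")
```

It could then be supplied as control.foreach = boot_control.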

debug.

Boolean; only for debugging purposes, particularly in parallel runs, where debug.=TRUE allows useful debugging info to be returned. If debug.=FALSE, the returned bootreps will contain NA for fits that fail, and if debug.=2 an error will be thrown for fits that fail (only useful in serial computations).

type

For development purposes, not further documented.

…

Further arguments passed (in a non-standard way) to the simuland function.

Value

A list, with two elements (unless debug. is TRUE):

bootreps, the nsim return values, in the format returned either by apply or parallel::parApply or by foreach::`%dopar%`, as controlled by control.foreach$.combine. If simuland returns a vector, spaMM_boot should effectively rbind the results by default, returning an nsim-row matrix in all cases. From spaMM 2.5.6, if simuland returns a 1-row data frame, spaMM_boot rbinds the results into an nsim-row data frame in all cases. The results may not be consistent among parallel backends in other cases, and may change in later versions, so users should stick to one of these two cases as much as possible.
RNGstate, the state of .Random.seed at the beginning of the simulation.
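
The two recommended return formats can be illustrated in base R. The sketch below only mimics the row-binding step with made-up values standing in for the results of simuland; it does not call spaMM.

```r
# Made-up per-replicate results standing in for simuland's return values.

# Case 1: each replicate returns a (named) vector -> rbind gives a matrix
vec_results <- list(c(a = 1, b = 2), c(a = 3, b = 4), c(a = 5, b = 6))
mat <- do.call(rbind, vec_results)
is.matrix(mat)       # one row per replicate
dim(mat)

# Case 2: each replicate returns a 1-row data frame -> rbind gives a data frame
df_results <- list(data.frame(a = 1, b = 2),
                   data.frame(a = 3, b = 4),
                   data.frame(a = 5, b = 6))
df <- do.call(rbind, df_results)
is.data.frame(df)    # one row per replicate
nrow(df)
```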

Details

spaMM_boot handles parallel backends with different features. pbapply::pbapply has a very simple interface (essentially equivalent to apply) and provides progress bars, but (currently: version 1.4.0) does not have efficient load-balancing. doSNOW also provides a progress bar and allows more efficient load-balancing, but it requires foreach, whose handling of '…' arguments is tortuous. foreach will be used if doSNOW is loaded; then, some of the '…' arguments may need to be quoted (see Examples). foreach also handles errors differently from pbapply (which will simply stop if fitting a model to a bootstrap replicate fails): see the foreach documentation.

spaMM_boot calls simulate.HLfit on the fit object and applies simuland to each column of the matrix returned by this call. The simuland function must take as first argument a vector of response values, and may use … to pass additional arguments. The default simuland function is .eval_replicate(), and an alternative function .eval_replicate2() is also provided. The latter function is slower, as it refits the compared models using different initial values for random-effect parameters, which is useful in some difficult cases where initial values matter.
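
The simulate-then-apply logic described above can be sketched in base R. In this illustration, lm and the cars data set are stand-ins for a spaMM fit and its data, and refit_slope is a hypothetical simuland-like function; this is not spaMM's internal code.

```r
set.seed(123)

# Stand-in for the fit object to simulate from
fit <- lm(dist ~ speed, data = cars)

# One column per simulated response vector
sims <- as.matrix(simulate(fit, nsim = 5))

# Hypothetical simuland-like function: its first argument is a response vector
refit_slope <- function(y, ...) {
  data <- cars
  data$dist <- y                         # replace the original response
  coef(lm(dist ~ speed, data = data))    # refit and return the coefficients
}

# Apply the function to each column, then arrange one row per replicate
bootreps <- t(apply(sims, 2, refit_slope))
dim(bootreps)
```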

Advanced users can define their own simuland function. An example is provided in the file tests/testthat/test-LRT-boot.R, using … to pass additional arguments beyond response values. Alternatively, .eval_replicate() can be used as a template. It has no … argument, as essential arguments are passed through its environment. spaMM_boot redefines the environment of any simuland for that purpose. This implies that users should not assume that they can control their own simuland function's environment (except its isdebugged() status).
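
The general R mechanism of passing values through a function's environment, rather than through its arguments, can be sketched as follows. This is a generic illustration with hypothetical names (template_fn, scale_by, boot_env), not a reproduction of what spaMM_boot does internally.

```r
# 'scale_by' is not an argument of the function:
# it is looked up in the function's environment when the function runs.
template_fn <- function(y) {
  mean(y) * scale_by
}

# Create an environment holding the value, and make it the function's environment
boot_env <- new.env()
assign("scale_by", 10, envir = boot_env)
environment(template_fn) <- boot_env

template_fn(c(1, 2, 3))  # uses scale_by = 10 from boot_env
```

Because the environment is replaced, anything the function previously found in its original environment may no longer be visible, which is why a user-defined simuland should not rely on its own environment.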

Examples


# NOT RUN {
if (spaMM.getOption("example_maxtime")>10) {
 data("blackcap")
 
 # Generate fits of null and full models:
 lrt <- fixedLRT(null.formula=migStatus ~ 1 + Matern(1|latitude+longitude),
       formula=migStatus ~ means + Matern(1|latitude+longitude), 
       HLmethod='ML',data=blackcap)

 # The 'simuland' argument: 
 myfun <- function(y, what=NULL, lrt, ...) { 
    data <- lrt$fullfit$data
    data$migStatus <- y ## replaces original response (! more complicated for binomial fits)
    full_call <- getCall(lrt$fullfit) ## call for full fit
    full_call$data <- data
    res <- eval(full_call) ## fits the full model on the simulated response
    if (!is.null(what)) res <- eval(what) ## post-process the fit
    return(res) ## the fit, or anything produced by evaluating 'what'
  }
  # where the 'what' argument (not required) of myfun() allows one to control 
  # what the function returns without redefining the function.
  
  # Call myfun() with no 'what' argument: returns a list of fits 
  spaMM_boot(lrt$nullfit, simuland = myfun, nsim=1, lrt=lrt)[["bootreps"]] 
  
  # Return only a model coefficient for each fit: 
  spaMM_boot(lrt$nullfit, simuland = myfun, nsim=7,
               what=quote(fixef(res)[2L]), lrt=lrt)[["bootreps"]]       
}
# }
