Use an appropriate mask
Check the extent and spacing of the habitat mask that you are using.
Execution time is roughly proportional to the number of mask points
(nrow(mymask)). Default settings can lead to very large masks
for detector arrays that are elongated `north-south' because the number
of points in the east-west direction is fixed. Compare results with a
much sparser mask (e.g., nx = 32 instead of nx = 64).
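For example, assuming `myCH' is your capthist object (name illustrative), a sparser mask can be built explicitly and passed to secr.fit for comparison:

    ## sketch: build a sparser mask and refit
    sparsemask <- make.mask(traps(myCH), buffer = 100, nx = 32)
    fit32 <- secr.fit(myCH, mask = sparsemask)
    ## compare estimates and run time with a fit using the default nx = 64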
Use conditional likelihood
If you don't need to model variation in density over space or time then
consider maximizing the conditional likelihood in secr.fit (CL =
TRUE). This reduces the complexity of the optimization problem,
especially where there are several sessions and you want
session-specific density estimates (by default, derived() returns a
separate estimate for each session even if the detection parameters are
constant across sessions).
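For example (multi-session capthist name illustrative):

    fitCL <- secr.fit(msCH, CL = TRUE, buffer = 100)
    derived(fitCL)    ## Horvitz-Thompson-like density estimate for each session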
Model selection
Do you really need to fit all those complex models? Chasing down small
decrements in AIC is so last-century. Remember that detection parameters
are mostly nuisance parameters, and models with big differences in AIC
may barely differ in their density estimates. This is a good topic for
further research - we seem to need a `focussed information criterion'
(Claeskens and Hjort 2008) to discern the differences that matter. Be
aware of the effects that can really make a difference: learned
responses (b, bk etc.) and massive unmodelled heterogeneity.
Use score.test() to compare nested models. At each stage this
requires only the simpler model to have been fitted in full; further
processing is required to obtain a numerical estimate of the gradient of
the likelihood surface for the more complex model, but this is much
faster than maximizing the likelihood. The tradeoff is that the score
test is only approximate, and you may want to verify the results later
using a full AIC comparison.
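For example, given a fitted null model `fit0' with constant g0 (name illustrative):

    score.test(fit0, g0 ~ b)    ## learned response vs the null model, without a full fit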
Break problem down
Suppose you are fitting models to multiple separate datasets that fit
the general description of `sessions'. If you are fitting separate
detection parameters to each session (i.e., you do not need to pool
detection information), and you are not modelling trend in density
across sessions, then it is much quicker to fit each session
separately than to try to do it all at once. See Examples.
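A minimal sketch, assuming `msCH' is a multi-session capthist (each component is a single-session capthist):

    fits <- lapply(msCH, secr.fit, buffer = 100)    ## one independent fit per session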
Mash replicated clusters of detectors
If your detectors are arranged in similar clusters (e.g., small square
grids) then try the function mash().
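A sketch, with `clusterCH' standing in for a capthist from replicated clustered grids:

    mashedCH <- mash(clusterCH)
    fitmash <- secr.fit(mashedCH, buffer = 100)    ## see ?mash for how the clusters are carried forward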
Full data from `proximity' detectors has dimensions n x S x K (n is
number of individuals, S is number of occasions, K is number of
traps). If the data are sparse (i.e. multiple detections of an
individual on one occasion are rare) then it is efficient to treat
proximity data as multi-catch data (dimension n x S, maximum of one
detection per occasion). Use reduce(proxCH, outputdetector = "multi").
Use multiple cores when applicable
Some computations can be run in parallel on multiple processors (most
desktops these days have multiple cores), but capability is
limited. Check the `ncores' argument of sim.secr() and secr.fit() and
?ncores. The speed gain is significant for parametric bootstrap
computations in sim.secr. Parallelisation is also allowed for the
session likelihood components of a multi-session model in secr.fit(),
but gains there seem to be small or negative.
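For example, a parametric bootstrap might use several cores (fitted model and core count illustrative):

    sim.secr(myfit, nsim = 99, ncores = 4)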
Functions par.secr.fit, par.derived, and par.region.N are an alternative
and more effective way to take advantage of multiple cores when fitting
several models.
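For example (capthist and models illustrative; see ?par.secr.fit):

    fit1 <- list(capthist = myCH, model = g0 ~ 1)
    fit2 <- list(capthist = myCH, model = g0 ~ b)
    fits <- par.secr.fit(c('fit1', 'fit2'), ncores = 2)
    AIC(fits)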
Avoid covariates with many levels
Categorical (factor) covariates with many levels and continuous
covariates that take many values are not handled efficiently in
secr.fit, and can dramatically slow down analyses and increase memory
requirements.
Simulations
Model fitting is not needed to assess power. The precision of estimates
from secr.fit can be predicted without laboriously fitting models to
simulated datasets. Just use method = "none" to obtain the asymptotic
variance at the known parameter values for which data have been
simulated (e.g. with sim.capthist()).
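A sketch, assuming your version of secr.fit accepts a named list of real parameter values for `start' (check ?secr.fit; otherwise supply starting beta values or a fitted model); the grid and parameter values are illustrative:

    grid <- make.grid(nx = 8, ny = 8, spacing = 30)
    simCH <- sim.capthist(grid, popn = list(D = 5, buffer = 100),
        detectpar = list(g0 = 0.2, sigma = 25), noccasions = 5)
    fit0 <- secr.fit(simCH, buffer = 100, method = "none",
        start = list(D = 5, g0 = 0.2, sigma = 25))
    predict(fit0)    ## asymptotic SE and confidence limits, no maximization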
Suppress computation of standard errors by derived(). For a
model fitted by conditional likelihood (CL = TRUE) the subsequent
computation of derived density estimates can take appreciable time. If
variances are not needed (e.g., when the aim is to predict the bias of
the estimator across a large number of simulations) it is efficient to
set se.D = FALSE in derived().
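For example, assuming `fitCL' is a model fitted with CL = TRUE:

    system.time(derived(fitCL))                   ## includes SE of density
    system.time(derived(fitCL, se.D = FALSE))     ## skips the SE computation; faster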
It is tempting to save a list with the entire `secr' object from
each simulated fit, and to later extract summary statistics as
needed. Be aware that with large simulations the overheads associated
with storage of the list can become very large. The solution is to
anticipate the summary statistics you will want and save only these.
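For example, one might keep just the density estimate and its SE from each replicate (function and object names illustrative, parameter values as in the sketch above):

    grid <- make.grid(nx = 8, ny = 8, spacing = 30)
    onesim <- function (i) {
        CH <- sim.capthist(grid, popn = list(D = 5, buffer = 100),
            detectpar = list(g0 = 0.2, sigma = 25), noccasions = 5)
        fit <- secr.fit(CH, buffer = 100, trace = FALSE)
        unlist(predict(fit)["D", c("estimate", "SE.estimate")])
    }
    Dstats <- t(sapply(1:100, onesim))    ## small matrix instead of 100 secr objects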