Fits a density surface model (DSM) to detection-adjusted counts from a spatially-referenced distance sampling analysis. dsm takes observations of animals, allocates them to segments of line (or strip) transects and optionally adjusts the counts for detectability using a supplied detection function model. A generalized additive model, generalized additive mixed model or generalized linear model is then used to model these adjusted counts based on a formula involving environmental covariates.
dsm(formula, ddf.obj, segment.data, observation.data, engine = "gam",
convert.units = 1, family = quasipoisson(link = "log"),
group = FALSE, control = list(keepData = TRUE), availability = 1,
strip.width = NULL, segment.area = NULL, weights = NULL,
transect = "line", method = "REML", ...)segment data, see dsm-data.
observation data, see dsm-data.
conversion factor to multiply the area of the segments by. See 'Units' below.
response distribution (popular choices include quasipoisson, Tweedie/tw and negbin/nb). Defaults to quasipossion.
if TRUE the abundance of groups will be calculated rather than the abundance of individuals. Setting this option to TRUE is equivalent to setting the size of each group to be 1.
the usual control argument for a gam; keepData must be TRUE for variance estimation to work (though this option cannot be set for GLMs or GAMMs.
an availability bias used to scale the counts/estimated counts by. If we have N animals in a segment, then N/availability will be entered into the model. Uncertainty in the availability is not handled at present.
if ddf.obj, above, is NULL, then this is where the strip width is specified (i.e. for a strip transect survey). This is sometimes (and more correctly) referred to as the half-width, i.e. right truncation minus left truncation.
if `NULL` (default) segment areas will be calculated by multiplying the `Effort` column in `segment.data` by the (right minus left) truncation distance for the `ddf.obj` or by `strip.width`. Alternatively a vector of segment areas can be provided (which must be the same length as the number of rows in `segment.data`) or a character string giving the name of a column in `segment.data` which contains the areas. If segment.area is specified it takes precident.
weights for each observation used in model fitting. The default, weights=NULL, weights each observation by its area (see Details). Setting a scalar value (e.g. weights=1) all observations are equally weighted.
type of transect ("line", the default or "point"). This is overridden by the detection function transect type, this is usually only necessary when no detection function is specified.
The smoothing parameter estimation method. Default is "REML", using Restricted Maximum Likelihood. See gam for other options. Ignored for engine="glm".
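As a rough sketch of how several of these arguments fit together (using the `hr.model`, `segdata` and `obsdata` objects built in the Examples section below; the Tweedie family and the availability value are purely illustrative):
# Tweedie response, counts scaled up by a fixed availability correction
mod.tw <- dsm(count ~ s(x, y), ddf.obj = hr.model,
              segment.data = segdata, observation.data = obsdata,
              family = tw(), availability = 0.7)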
a glm/gam/gamm object, with an additional element, `ddf`, which holds the detection function object.
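For example, with `mod1` fitted as in the Examples below, the detection function used to adjust the counts travels with the fitted model and can be retrieved directly:
# the stored detection function object
mod1$ddf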
It is often the case that distances are collected in metres and segment lengths are recorded in kilometres. dsm allows you to provide a conversion factor (convert.units) to multiply the areas by. For example: if distances are in metres and segment lengths are in kilometres, setting convert.units=1000 will lead to the analysis being in metres, while setting convert.units=1/1000 will lead to the analysis being in kilometres. The conversion factor will also be applied to `segment.area` if that is specified.
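A sketch of both scalings (objects as in the Examples section below; the appropriate value depends on the units your data were actually recorded in):
# distances in metres, Effort in kilometres: analyse in metres...
mod.m  <- dsm(count ~ s(x, y), hr.model, segdata, obsdata,
              convert.units = 1000)
# ...or analyse in kilometres
mod.km <- dsm(count ~ s(x, y), hr.model, segdata, obsdata,
              convert.units = 1/1000)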
For large models, engine="bam" with method="fREML" may be useful. Models specified for bam should be as gam. READ bam before using this option; this option is considered EXPERIMENTAL at the moment. In particular note that the default basis choice (thin plate regression splines) will be slow and that in general fitting is less stable than when using gam. For negative binomial response, theta must be specified when using bam.
The response (LHS of `formula`) can be one of the following:
| n, count, N | count in each segment |
| Nhat, abundance.est | estimated abundance per segment, estimation is via a Horvitz-Thompson estimator. This should be used when there are covariates in the detection function. |
| presence | interpret the data as presence/absence (remember to change the family argument to binomial()); detectability is not accounted for |
| D, density, Dhat, density.est | density per segment |
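For instance, if the detection function were fitted with a covariate, the estimated-abundance response would be used. A sketch, assuming `distdata` (from the Examples below) contains a `beaufort` column; `hr.beau` and the covariate choice are hypothetical:
# hazard-rate detection function with a sea state covariate
hr.beau <- ds(distdata, max(distdata$distance), key = "hr",
              adjustment = NULL, formula = ~ as.factor(beaufort))
# Horvitz-Thompson estimated abundance per segment as the response
mod.est <- dsm(abundance.est ~ s(x, y), hr.beau, segdata, obsdata)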
The offset used in the model is dependent on the response:
| count | area of segment multiplied by average probability of detection in the segment |
| estimated count | area of the segment |
| presence | zero |
| density | zero |
In the latter two cases (density and presence estimation) observations can be weighted by segment areas via the weights= argument. By default (weights=NULL), when density or presence are estimated the weights are set to the segment areas (using segment.area or by calculating 2*(strip width)*Effort). Alternatively, weights=1 will set the weights to all be equal. A third alternative is to pass in a vector of length equal to the number of segments, containing appropriate weights.
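A sketch of the two cases (the family choices are illustrative; objects as in the Examples section below):
# density response, default weights (segment areas), Tweedie family
mod.d <- dsm(density ~ s(x, y), hr.model, segdata, obsdata, family = tw())
# presence/absence response, binomial family, equal weights
mod.p <- dsm(presence ~ s(x, y), hr.model, segdata, obsdata,
             family = binomial(), weights = 1)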
Hedley, S. and S. T. Buckland. 2004. Spatial models for line transect sampling. JABES 9:181-199.
Miller, D. L., Burt, M. L., Rexstad, E. A., Thomas, L. (2013), Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4: 1001-1010. doi: 10.1111/2041-210X.12105 (Open Access, available at http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12105/abstract)
Wood, S.N. 2006. Generalized Additive Models: An Introduction with R. CRC/Chapman & Hall.
# NOT RUN {
library(Distance)
library(dsm)
# load the Gulf of Mexico dolphin data (see ?mexdolphins)
data(mexdolphins)
# fit a detection function and look at the summary
hr.model <- ds(distdata, max(distdata$distance),
               key = "hr", adjustment = NULL)
summary(hr.model)
# fit a simple smooth of x and y to counts
mod1 <- dsm(count~s(x,y), hr.model, segdata, obsdata)
summary(mod1)
# predict over a grid
mod1.pred <- predict(mod1, preddata, preddata$area)
# calculate the predicted abundance over the grid
sum(mod1.pred)
# plot the smooth
plot(mod1)
# }