dfuncEstim: Estimate a detection function from distance-sampling data

Description

Fit a specific detection function to off-transect or off-point (radial) distances.

Usage

dfuncEstim(formula, detectionData, siteData, likelihood = "halfnorm",
  pointSurvey = FALSE, w.lo = 0, w.hi = NULL, expansions = 0,
  series = "cosine", x.scl = 0, g.x.scl = 1, observer = "both",
  warn = TRUE, transectID = NULL, pointID = "point",
  length = "length", control = RdistanceControls())

Arguments

formula

A standard formula object (e.g., dist ~ 1, dist ~ covar1 + covar2). The left-hand side (before ~) is the name of the vector containing distances (off-transect or radial). The right-hand side (after ~) contains the names of covariate vectors to fit in the detection function. If covariates do not appear in detectionData or siteData, they must be found in the parent frame (similar to lm, glm, etc.).

detectionData

A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:

Detection Distances: A single column containing detection distances must be specified on the left-hand side of formula.
Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments transectID and pointID) must specify the site (transect or point) so that this data frame can be merged with siteData.

Optionally, this data frame can contain the following variables:

Group Sizes: The number of individuals in the group associated with each detection. If unspecified, Rdistance assumes all detections are of single individuals (i.e., all group sizes are 1).
When Rdistance allows detection-level covariates, detection-level covariates will appear in this data frame.

See the example data set sparrowDetectionData. See also Input data frames below for information on when detectionData and siteData are required inputs.

siteData

A data.frame containing site (transect or point) IDs and any site level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments transectID and pointID for an explanation of site and transect ID's.

If sites are transects, this data frame must also contain transect length. By default, transect length is assumed to be in column 'length' but can be specified using argument length.

The total number of sites surveyed is nrow(siteData). Duplicate site-level IDs are not allowed in siteData.

See Input data frames for when detectionData and siteData are required inputs.

likelihood

String specifying the likelihood to fit. Built-in likelihoods at present are "uniform", "halfnorm", "hazrate", "negexp", and "Gamma". See the vignette for a way to use user-defined likelihoods.

pointSurvey

A logical scalar specifying whether input data come from point-transect surveys (TRUE), or line-transect surveys (FALSE).

w.lo

Lower or left-truncation limit of the distances in distance data. This is the minimum possible off-transect distance. Default is 0.

w.hi

Upper or right-truncation limit of the distances in dist. This is the maximum off-transect distance that could be observed. If left unspecified (i.e., at the default of NULL), right-truncation is set to the maximum of the observed distances.

expansions

A scalar specifying the number of terms in series to compute. Depending on the series, this could be 0 through 5. The default of 0 equates to no expansion terms of any type. No expansion terms are allowed (i.e., expansions is forced to 0) if covariates are present in the detection function (i.e., right-hand side of formula includes something other than 1).

series

If expansions > 0, this string specifies the type of expansion to use. Valid values at present are 'simple', 'hermite', and 'cosine'.

x.scl

This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.

g.x.scl

This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.

observer

This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.

warn

A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, warn should generally be left at its default value of TRUE. When computing bootstrap confidence intervals, setting warn = FALSE turns off annoying warnings when an iteration does not converge. Regardless of warn, messages about convergence and boundary conditions are printed by print.dfunc, print.abund, and plot.dfunc, so there should be little harm in setting warn = FALSE.

transectID

A character vector naming the transect ID column(s) in detectionData and siteData. Rdistance accommodates two kinds of transects: continuous and point. When continuous transects are used, detections can occur at any point along the route, and these are generally called line-transects. When point transects are used, detections can only occur at a series of stops (points) along the route, and these are generally called point-transects. Transects themselves are the basic sampling unit when pointSurvey=FALSE and are synonymous with sites in this case. Transects may contain multiple sampling units (i.e., points) when pointSurvey=TRUE. For line-transects, the transectID column(s) alone are sufficient to specify unique sample sites. For point-transects, the combination of transectID and pointID specifies unique sampling sites. See Input data frames below.

pointID

When point-transects are used, this is the ID of points on a transect. When pointSurvey=TRUE, the combination of transectID and pointID specifies unique sampling sites. See Input data frames.

If single points are surveyed, meaning surveyed points were not grouped into transects, each 'transect' consists of one point. In this case, set transectID equal to the point's ID and set pointID equal to 1 for all points.

length

Character string specifying the (single) column in siteData that contains transect length. This is ignored if pointSurvey = TRUE.

control

A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the RdistanceControls function for explanation of each value, the defaults, and the requirements for this list. See examples below for how to change controls.
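
The truncation arguments w.lo and w.hi described above can be pictured with a few lines of base R. This is only a sketch of the documented default (w.hi = NULL falls back to the maximum observed distance); it does not call Rdistance, the distances are made up, and treating both limits as inclusive is an assumption.

```r
# Hypothetical perpendicular distances in metres (not from a real survey)
dist <- c(5, 12, 40, 150, 3, 88)

w.lo <- 0      # left-truncation limit (the documented default)
w.hi <- NULL   # right-truncation limit left unspecified

# Documented default: NULL means "use the maximum observed distance"
if (is.null(w.hi)) w.hi <- max(dist)

# Assumption: both truncation limits are inclusive
distUsed <- dist[dist >= w.lo & dist <= w.hi]

# With an explicit limit of 100, the 150 m detection would be dropped
distUsed100 <- dist[dist >= w.lo & dist <= 100]
```

With the default, every observed distance is retained; an explicit w.hi is what removes far-away detections.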

Value

An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:

parameters

The vector of estimated parameter values. Length of this vector for built-in likelihoods is one (for the function's parameter) plus the number of expansion terms plus one if the likelihood is either 'hazrate' or 'uniform' (hazrate and uniform have two parameters).

varcovar

The variance-covariance matrix for coefficients of the distance function, estimated by the inverse of the Hessian of the fit evaluated at the estimates. There is no guarantee this matrix is positive-definite, so it should be viewed with caution. Error estimates derived from bootstrapping are generally more reliable.

loglik

The maximized value of the log likelihood (more specifically, the minimized value of the negative log likelihood).

convergence

The convergence code. This code is returned by optim. Values other than 0 indicate suspect convergence.

like.form

The name of the likelihood. This is the value of the argument likelihood.

w.lo

Left-truncation value used during the fit.

w.hi

Right-truncation value used during the fit.

dist

The input vector of observed distances.

covars

A model.matrix containing the covariates used in the fit.

expansions

The number of expansion terms used during estimation.

series

The type of expansion used during estimation.

call

The original call of this function.

call.x.scl

The distance at which the distance function is scaled. This is the x at which g(x) = g.x.scl. Normally, call.x.scl = 0.

call.g.x.scl

The value of the distance function at distance call.x.scl. Normally, call.g.x.scl = 1.

call.observer

The value of input parameter observer.

fit

The fitted object returned by optim. See documentation for optim.

factor.names

The names of any factors in formula.

pointSurvey

The input value of pointSurvey. This is TRUE if distances are radial from a point and FALSE if distances are perpendicular off-transect.

formula

The formula specified for the detection function.
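
To make the varcovar, convergence, and fit components more concrete, the following base-R sketch fits an untruncated half-normal to simulated distances with optim and inverts the Hessian. It is a generic illustration of the inverse-Hessian idea, not the code Rdistance runs internally; the simulated data, the log(sigma) parameterisation, and the names negLogLik and se are all assumptions made for the example.

```r
set.seed(1)
# Simulated half-normal distances with true sigma = 30 (illustrative only)
x <- abs(rnorm(200, mean = 0, sd = 30))

# Negative log likelihood of an untruncated half-normal,
# parameterised by log(sigma) so the optimiser is unconstrained
negLogLik <- function(logSigma, x) {
  sigma <- exp(logSigma)
  -sum(log(2) + dnorm(x, mean = 0, sd = sigma, log = TRUE))
}

fit <- optim(par = log(20), fn = negLogLik, x = x,
             method = "BFGS", hessian = TRUE)

fit$convergence                 # 0 means optim reported convergence
varcovar <- solve(fit$hessian)  # inverse Hessian of the negative log likelihood
se <- sqrt(diag(varcovar))      # approximate standard error of log(sigma)
sigmaHat <- exp(fit$par)        # back-transform to the distance scale
```

If the Hessian is not positive-definite, solve may fail or the diagonal of the inverse may be negative, which is one reason bootstrap error estimates are generally preferred.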

Input data frames

To save space and to easily specify sites without detections, all site ID's, regardless of whether a detection occurred there, and site level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.

Data frame requirements

The following explains conditions under which various combinations of the input data frames are required.

Detection data and site data both required: Both detectionData and siteData are required if site level covariates are specified on the right-hand side of formula. Detection level covariates are not currently allowed.
Detection data only required: The detectionData data frame alone can be specified if no covariates are included in the distance function (i.e., right-hand side of formula is "~1"). Note that this routine (dfuncEstim) does not need to know about sites where zero targets were detected, hence siteData can be missing when no covariates are involved.
Neither detection data nor site data required: Neither detectionData nor siteData are required if all variables specified in formula are within the scope of this routine (e.g., in the global working environment). Scoping rules here work the same as for other modeling routines in R such as lm and glm. Like other modeling routines, it is possible to mix and match the location of variables in the model. Some variables can be in the .GlobalEnv while others are in either detectionData or siteData.
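
The "mix and match" scoping behaviour mentioned above is the standard behaviour of R modeling functions. The sketch below shows it with lm rather than dfuncEstim, so it runs without Rdistance; the data frame detections and the covariate shrubCover are hypothetical names invented for the illustration.

```r
# Hypothetical detection-level data frame holding only the response
detections <- data.frame(dist = c(10, 25, 40, 5, 60, 33))

# A covariate that lives only in the calling environment, not in the data frame
shrubCover <- c(0.1, 0.4, 0.8, 0.2, 0.9, 0.5)

# lm() finds `dist` in `data` and `shrubCover` in the formula's environment
fit <- lm(dist ~ shrubCover, data = detections)
coef(fit)
```

Variables are looked up first in the supplied data frame and then in the environment where the formula was created, which is why the two sources can be combined.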

Relationship between data frames (transect and point ID's)

The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site ID's specify transects or routes and are unique values of the transectID column in siteData. In this case, the following merge must work: merge(detectionData,siteData,by=transectID).

For point-transects, site ID's specify individual points and are unique values of the combination paste(transectID,pointID). In this case, the following merge must work: merge(detectionData,siteData,by=c(transectID, pointID)).

By default, transectID is NULL and the merge is done on all common columns. That is, when transectID is NULL, this routine assumes unique transects are specified by unique combinations of the common variables (i.e., unique values of intersect(names(detectionData), names(siteData))).

An error occurs if there are no common column names between detectionData and siteData. Duplicate site IDs are not allowed in siteData. If the same site is surveyed in multiple years, specify another transect ID column (e.g., transectID = c("year","transectID")). Duplicate site ID's are allowed in detectionData.
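
The merge requirement can be checked by hand before fitting. The following sketch uses small made-up data frames with a two-column site ID (year and transectID, as in the multi-year case above); the column values and the helper name siteCols are assumptions for illustration only.

```r
# Hypothetical site table: one row per surveyed transect per year
siteData <- data.frame(
  year       = c(2019, 2019, 2020, 2020),
  transectID = c("A", "B", "A", "B"),
  length     = c(500, 500, 500, 450),
  stringsAsFactors = FALSE
)

# Hypothetical detection table: one row per detection; site IDs may repeat
detectionData <- data.frame(
  year       = c(2019, 2019, 2020, 2020, 2020),
  transectID = c("A", "A", "A", "B", "B"),
  dist       = c(12, 40, 7, 55, 21),
  stringsAsFactors = FALSE
)

siteCols <- c("year", "transectID")

# Duplicate site IDs are not allowed in siteData
stopifnot(!any(duplicated(siteData[, siteCols])))

# The merge that must work: one output row per detection
merged <- merge(detectionData, siteData, by = siteCols)

# Sites without detections (here 2019, transect B) appear only with all.y = TRUE
mergedAll <- merge(detectionData, siteData, by = siteCols, all.y = TRUE)
```

If merged has fewer rows than detectionData, some detections reference a site that is missing from siteData.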

To help envision the relationship between data frames, bear in mind that during bootstrap estimation of variance in abundEstim, unique transects (i.e., unique values of the transect ID column(s)), not detections or points, are resampled with replacement.
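
The resampling unit can be sketched in base R as follows. This only illustrates resampling whole transects with replacement and carrying their detections along; it is not the code used by abundEstim, and the data frames and the bsRep column are hypothetical.

```r
set.seed(42)

# Hypothetical line-transect data
siteData <- data.frame(transectID = c("A", "B", "C"),
                       length = c(500, 400, 600),
                       stringsAsFactors = FALSE)
detectionData <- data.frame(transectID = c("A", "A", "B", "C", "C", "C"),
                            dist = c(12, 40, 7, 55, 21, 3),
                            stringsAsFactors = FALSE)

# Resample transects (not detections) with replacement
bsIDs <- sample(siteData$transectID, size = nrow(siteData), replace = TRUE)

# Give each draw its own replicate ID so a transect drawn twice counts twice
bsSites <- data.frame(bsRep = seq_along(bsIDs), transectID = bsIDs,
                      stringsAsFactors = FALSE)
bsSites <- merge(bsSites, siteData, by = "transectID")

# Every detection on a drawn transect follows that transect into the replicate
bsDetections <- merge(bsSites[, c("bsRep", "transectID")], detectionData,
                      by = "transectID")
```

A transect drawn twice contributes its length and all of its detections twice, which is what keeps detections from the same transect together.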

Likelihood functions

Given a specified sighting function (e.g., "halfnorm"), maximum likelihood is used to estimate the parameter(s) of the function (e.g., standard deviation) that best fit the distance data.

When plotted (see Examples), histogram bins are plotted behind the detection function for visualization; however, the function is fit to the actual data, not to the bins.
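
As a rough picture of what fitting to the actual data means, the sketch below maximizes a truncated half-normal likelihood for line-transect distances directly on the unbinned observations. It is written in base R under stated assumptions (simulated data, a single parameter sigma, sighting function g(x) = exp(-x^2 / (2 sigma^2)) normalised over [w.lo, w.hi]) and is not Rdistance's own likelihood code.

```r
set.seed(7)
w.lo <- 0
w.hi <- 100

# Simulated perpendicular distances with true sigma = 40, then truncated
x <- abs(rnorm(500, mean = 0, sd = 40))
x <- x[x >= w.lo & x <= w.hi]

# Half-normal sighting function, scaled so that g(0) = 1
g <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# Negative log likelihood: each distance has density g(x) / integral of g
negLogLik <- function(sigma, x, w.lo, w.hi) {
  # Closed-form integral of g over [w.lo, w.hi]
  area <- sigma * sqrt(2 * pi) *
    (pnorm(w.hi, sd = sigma) - pnorm(w.lo, sd = sigma))
  -sum(log(g(x, sigma) / area))
}

# One-dimensional minimisation over a wide bracket for sigma
fit <- optimize(negLogLik, interval = c(1, 500),
                x = x, w.lo = w.lo, w.hi = w.hi)
sigmaHat <- fit$minimum
```

Histogram bins never enter the calculation; they would only be drawn afterwards for display.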

References

Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.

Examples

# NOT RUN {
# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
data(sparrowSiteData)


# Fit half-normal detection function
dfunc <- dfuncEstim(formula=dist~1,
                    detectionData=sparrowDetectionData,
                    likelihood="halfnorm", w.hi=100)

# Fit a second half-normal detection function, now including
# a categorical covariate for observer who surveyed the site (factor, 5 levels)
# Increase maximum iterations
dfuncObs <- dfuncEstim(formula=dist~observer,
                       detectionData=sparrowDetectionData,
                       siteData=sparrowSiteData,
                       likelihood="halfnorm", w.hi=100, pointSurvey=FALSE,
                       control=RdistanceControls(maxIter=1000))

# Print results
# And plot the detection function for each observer
dfuncObs
plot(dfuncObs,
     newdata=data.frame(observer=levels(sparrowSiteData$observer)))
     
# Show some plotting options
plot(dfuncObs, 
   newdata=data.frame(observer=levels(sparrowSiteData$observer)), 
   vertLines = FALSE, lty=c(1,1), 
   col.dfunc=heat.colors(length(levels(sparrowSiteData$observer))), 
   col=c("grey","lightgrey"), border=NA, 
   xlab="Distance (m)",
   main="Showing plot options")


# }
