Fit a specific detection function to off-transect or off-point (radial) distances.
dfuncEstim(formula, detectionData, siteData, likelihood = "halfnorm",
pointSurvey = FALSE, w.lo = 0, w.hi = NULL, expansions = 0,
series = "cosine", x.scl = 0, g.x.scl = 1, observer = "both",
warn = TRUE, transectID = NULL, pointID = "point",
length = "length", control = RdistanceControls())
formula: A standard formula object (e.g., dist ~ 1, dist ~ covar1 + covar2). The left-hand side (before ~) is the name of the vector containing distances (off-transect or radial). The right-hand side (after ~) contains the names of covariate vectors to fit in the detection function. If covariates do not appear in detectionData or siteData, they must be found in the parent frame (similar to lm, glm, etc.).
detectionData: A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:
Detection Distances: A single column containing detection distances must be specified on the left-hand side of formula.
Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments transectID and pointID) must specify the site (transect or point) so that this data frame can be merged with siteData.
Optionally, this data frame can contain the following variables:
Group Sizes: The number of individuals in the group associated with each detection. If unspecified, Rdistance assumes all detections are of single individuals (i.e., all group sizes are 1).
Detection-level covariates are not currently allowed; once Rdistance supports them, they will appear in this data frame. See example data set sparrowDetectionData.
See also Input data frames below for information on when detectionData and siteData are required inputs.
siteData: A data.frame containing site (transect or point) IDs and any site-level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments transectID and pointID for an explanation of site and transect IDs. If sites are transects, this data frame must also contain transect length. By default, transect length is assumed to be in column 'length' but can be specified using argument length. The total number of sites surveyed is nrow(siteData). Duplicate site-level IDs are not allowed in siteData. See Input data frames for when detectionData and siteData are required inputs.
likelihood: String specifying the likelihood to fit. Built-in likelihoods at present are "uniform", "halfnorm", "hazrate", "negexp", and "Gamma". See the vignette for a way to use user-defined likelihoods.
pointSurvey: A logical scalar specifying whether input data come from point-transect surveys (TRUE) or line-transect surveys (FALSE).
w.lo: Lower or left-truncation limit of the observed distances. This is the minimum possible off-transect distance. Default is 0.
w.hi: Upper or right-truncation limit of the distances (the left-hand side of formula). This is the maximum off-transect distance that could be observed. If left unspecified (i.e., at the default of NULL), right-truncation is set to the maximum of the observed distances.
expansions: A scalar specifying the number of terms in series to compute. Depending on the series, this could be 0 through 5. The default of 0 equates to no expansion terms of any type. No expansion terms are allowed (i.e., expansions is forced to 0) if covariates are present in the detection function (i.e., the right-hand side of formula includes something other than 1).
series: If expansions > 0, this string specifies the type of expansion to use. Valid values at present are 'simple', 'hermite', and 'cosine'.
x.scl: This parameter is passed to F.gx.estim. See the F.gx.estim documentation for its definition.
g.x.scl: This parameter is passed to F.gx.estim. See the F.gx.estim documentation for its definition.
observer: This parameter is passed to F.gx.estim. See the F.gx.estim documentation for its definition.
warn: A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, warn should generally be left at its default value of TRUE. When computing bootstrap confidence intervals, setting warn = FALSE turns off annoying warnings when an iteration does not converge. Regardless of warn, messages about convergence and boundary conditions are printed by print.dfunc, print.abund, and plot.dfunc, so there should be little harm in setting warn = FALSE.
transectID: A character vector naming the transect ID column(s) in detectionData and siteData. Rdistance accommodates two kinds of transects: continuous and point. When continuous transects are used, detections can occur at any point along the route; these are generally called line-transects. When point transects are used, detections can only occur at a series of stops (points) along the route; these are generally called point-transects. Transects themselves are the basic sampling unit when pointSurvey = FALSE and are synonymous with sites in this case. Transects may contain multiple sampling units (i.e., points) when pointSurvey = TRUE. For line-transects, the transectID column(s) alone is sufficient to specify unique sample sites. For point-transects, the amalgamation of transectID and pointID specifies unique sampling sites. See Input data frames below.
pointID: When point-transects are used, this is the ID of points on a transect. When pointSurvey = TRUE, the amalgamation of transectID and pointID specifies unique sampling sites. See Input data frames. If single points are surveyed, meaning surveyed points were not grouped into transects, each 'transect' consists of one point. In this case, set transectID equal to the point's ID and set pointID equal to 1 for all points.
length: Character string specifying the (single) column in siteData that contains transect length. This is ignored if pointSurvey = TRUE.
control: A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the RdistanceControls function for an explanation of each value, the defaults, and the requirements for this list. See Examples below for how to change controls.
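To make the truncation arguments w.lo and w.hi concrete, the base-R sketch below keeps only distances inside the truncation interval and falls back to the largest observed distance when w.hi is NULL. It is an illustration, not Rdistance's internal code: the helper truncateDistances is hypothetical, and treating both limits as inclusive is an assumption.

```r
# Hypothetical helper (not part of Rdistance): keep distances in [w.lo, w.hi].
# Assumption: both truncation limits are inclusive.
truncateDistances <- function(dist, w.lo = 0, w.hi = NULL) {
  if (is.null(w.hi)) {
    # Default right-truncation: the largest observed distance
    w.hi <- max(dist, na.rm = TRUE)
  }
  keep <- !is.na(dist) & dist >= w.lo & dist <= w.hi
  dist[keep]
}

d <- c(5, 12, 40, 87, 103, 150)
truncateDistances(d, w.hi = 100)             # 5 12 40 87
truncateDistances(d)                         # all six (w.hi defaults to 150)
truncateDistances(d, w.lo = 10, w.hi = 100)  # 12 40 87
```

Distances dropped by truncation no longer contribute to the likelihood, which is why w.hi is often set below the largest observed distance to remove a few far-out sightings.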
An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:
The vector of estimated parameter values. For built-in likelihoods, the length of this vector is one (for the function's parameter), plus the number of expansion terms, plus one if the likelihood is either 'hazrate' or 'uniform' (hazrate and uniform have two parameters).
The variance-covariance matrix for coefficients of the distance function, estimated by the inverse of the Hessian of the fit evaluated at the estimates. There is no guarantee that this matrix is positive-definite, so it should be viewed with caution. Error estimates derived from bootstrapping are generally more reliable.
The maximized value of the log likelihood (more specifically, the minimized value of the negative log likelihood).
The convergence code. This code is returned by optim. Values other than 0 indicate suspect convergence.
The name of the likelihood. This is the value of the argument likelihood.
Left-truncation value used during the fit.
Right-truncation value used during the fit.
The input vector of observed distances.
A model.matrix containing the covariates used in the fit.
The number of expansion terms used during estimation.
The type of expansion used during estimation.
The original call of this function.
The distance at which the distance function is scaled. This is the x at which g(x) = g.x.scl. Normally, call.x.scl = 0.
The value of the distance function at distance call.x.scl. Normally, call.g.x.scl = 1.
The value of input parameter observer.
The fitted object returned by optim. See the documentation for optim.
The names of any factors in formula.
The input value of pointSurvey. This is TRUE if distances are radial from a point and FALSE if distances are perpendicular off-transect.
The formula specified for the detection function.
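As a sketch of what the scaling components call.x.scl and call.g.x.scl mean, the base-R function below rescales an unscaled half-normal shape, exp(-x^2 / (2 sigma^2)), so that it equals g.x.scl at distance x.scl. The function name and the choice of the half-normal form are assumptions for illustration; Rdistance performs its own scaling internally.

```r
# Hypothetical helper: half-normal detection function scaled so that
# g(x.scl) == g.x.scl (normally x.scl = 0 and g.x.scl = 1).
scaledHalfnorm <- function(x, sigma, x.scl = 0, g.x.scl = 1) {
  key <- function(d) exp(-d^2 / (2 * sigma^2))  # unscaled half-normal shape
  g.x.scl * key(x) / key(x.scl)
}

scaledHalfnorm(0, sigma = 30)                              # 1
scaledHalfnorm(20, sigma = 30, x.scl = 20, g.x.scl = 0.8)  # 0.8
scaledHalfnorm(c(0, 30, 60), sigma = 30)  # decreasing with distance
```

With the usual x.scl = 0 and g.x.scl = 1, this encodes the standard assumption that detection on the transect line (or at the point) is certain.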
Input data frames
To save space and to easily specify sites without detections, all site IDs, regardless of whether a detection occurred there, and site-level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.
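A minimal sketch of this two-table layout, using made-up data and hypothetical column names (transectID, dist, groupsize, length, observer). Transect T3 appears in siteData even though nothing was detected there, the number of sites surveyed is nrow(siteData), and site IDs in siteData are unique.

```r
# Made-up example data; column names are illustrative assumptions.
detectionData <- data.frame(
  transectID = c("T1", "T1", "T2", "T4"),
  dist       = c(12.5, 40.0, 7.2, 63.1),  # perpendicular distances (m)
  groupsize  = c(1, 3, 1, 2)              # individuals per detection
)

siteData <- data.frame(
  transectID = c("T1", "T2", "T3", "T4"),  # T3 was surveyed, no detections
  length     = c(500, 500, 450, 520),      # default length column is 'length'
  observer   = factor(c("obs1", "obs2", "obs1", "obs2"))
)

nrow(siteData)                                          # 4 sites surveyed
anyDuplicated(siteData$transectID) == 0                 # TRUE: IDs are unique
setdiff(siteData$transectID, detectionData$transectID)  # "T3"
```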
Detection data and site data both required:
Both detectionData and siteData are required if site-level covariates are specified on the right-hand side of formula. Detection-level covariates are not currently allowed.
Detection data only required:
The detectionData data frame alone can be specified if no covariates are included in the distance function (i.e., the right-hand side of formula is "~1"). Note that this routine (dfuncEstim) does not need to know about sites where zero targets were detected, hence siteData can be missing when no covariates are involved.
Neither detection data nor site data required:
Neither detectionData nor siteData is required if all variables specified in formula are within the scope of this routine (e.g., in the global working environment). Scoping rules here work the same as for other modeling routines in R such as lm and glm. As with those routines, it is possible to mix and match the location of variables in the model: some variables can be in the .GlobalEnv while others are in either detectionData or siteData.
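The scoping rule above is the standard one used by base R's model.frame (which lm and glm rely on). The sketch below, using only base R and made-up data, lists the variables a formula needs with all.vars and shows a variable missing from data being found in the formula's environment. It demonstrates base R behaviour, not dfuncEstim's internals; the data frame name covars is hypothetical.

```r
# Which variables must be found? all.vars() lists every name in a formula.
f <- dist ~ observer
all.vars(f)  # "dist" "observer"

# Base R scoping (as in lm/glm): a variable not in `data` is looked up
# in the formula's environment, here the global environment.
dist <- c(10, 25, 4, 60)
covars <- data.frame(observer = factor(c("a", "b", "a", "b")))
mf <- model.frame(f, data = covars)
names(mf)  # "dist" "observer"
```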
The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site IDs specify transects or routes and are unique values of the transectID column in siteData. In this case, the following merge must work: merge(detectionData, siteData, by=transectID).
For point-transects, site IDs specify individual points and are unique values of the combination paste(transectID, pointID). In this case, the following merge must work: merge(detectionData, siteData, by=c(transectID, pointID)).
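Both merge requirements can be checked with base R before fitting. In the sketch below the data are made up and the column names transectID and pointID are assumptions; merge() receives the key column names as a character vector, which is what the transectID and pointID arguments hold.

```r
# Line transects: sites are unique values of the transect ID column.
det  <- data.frame(transectID = c("A", "A", "B"), dist = c(10, 25, 4))
site <- data.frame(transectID = c("A", "B", "C"), length = c(1000, 800, 900))
lineMerged <- merge(det, site, by = "transectID")
nrow(lineMerged)  # 3: every detection matched a site

# Point transects: sites are unique (transectID, pointID) combinations.
detPt  <- data.frame(transectID = c("A", "A", "B"), pointID = c(1, 2, 1),
                     dist = c(15, 30, 8))
sitePt <- data.frame(transectID = rep(c("A", "B"), each = 2),
                     pointID = rep(c(1, 2), times = 2))
ptMerged <- merge(detPt, sitePt, by = c("transectID", "pointID"))
nrow(ptMerged)    # 3

# The combined site key, as in paste(transectID, pointID)
unique(paste(sitePt$transectID, sitePt$pointID))  # "A 1" "A 2" "B 1" "B 2"
```

If nrow() of the merged table is smaller than nrow(detectionData), some detections refer to sites missing from siteData.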
By default, transectID is NULL and the merge is done on all common columns. That is, when transectID is NULL, this routine assumes unique transects are specified by unique combinations of the common variables (i.e., unique values of intersect(names(detectionData), names(siteData))). An error occurs if there are no common column names between detectionData and siteData.
Duplicate site IDs are not allowed in siteData. If the same site is surveyed in multiple years, specify another transect ID column (e.g., transectID = c("year","transectID")). Duplicate site IDs are allowed in detectionData.
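The sketch below mimics, in base R, the default of using the common columns as the site key and the rule against duplicate site IDs, including a multi-year survey in which year must be part of the key. The helpers commonSiteKey and hasDuplicateSites are hypothetical, not Rdistance functions, and the data are made up.

```r
# Hypothetical helpers illustrating the rules above; not Rdistance functions.
commonSiteKey <- function(detectionData, siteData) {
  key <- intersect(names(detectionData), names(siteData))
  if (length(key) == 0) stop("no common columns between detectionData and siteData")
  key
}
hasDuplicateSites <- function(siteData, key) {
  anyDuplicated(siteData[, key, drop = FALSE]) > 0
}

siteData <- data.frame(year = c(2020, 2020, 2021, 2021),
                       transectID = c("A", "B", "A", "B"),
                       length = c(1000, 800, 1000, 800))
detectionData <- data.frame(year = c(2020, 2021), transectID = c("A", "A"),
                            dist = c(12, 30))

commonSiteKey(detectionData, siteData)               # "year" "transectID"
hasDuplicateSites(siteData, "transectID")            # TRUE: IDs repeat by year
hasDuplicateSites(siteData, c("year", "transectID")) # FALSE
```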
To help envision the relationship between data frames, bear in mind that during bootstrap estimation of variance in abundEstim, unique transects (i.e., unique values of the transect ID column(s)), not detections or points, are resampled with replacement.
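A base-R sketch of one bootstrap replicate that resamples whole transects, not detections, with replacement. It only illustrates the resampling unit described above and is not abundEstim's implementation; the function bootReplicate and the data are hypothetical.

```r
# Hypothetical sketch: one bootstrap replicate resampling unique transects.
bootReplicate <- function(detectionData, siteData, idCol = "transectID") {
  ids <- unique(siteData[[idCol]])
  drawn <- sample(ids, size = length(ids), replace = TRUE)
  # A transect drawn k times contributes its site row and detections k times.
  # (A full implementation would also relabel repeats as distinct sites.)
  newSite <- siteData[match(drawn, siteData[[idCol]]), , drop = FALSE]
  newDet <- do.call(rbind, lapply(drawn, function(id)
    detectionData[detectionData[[idCol]] == id, , drop = FALSE]))
  list(siteData = newSite, detectionData = newDet)
}

set.seed(1)
det  <- data.frame(transectID = c("A", "A", "B"), dist = c(10, 25, 4))
site <- data.frame(transectID = c("A", "B", "C"), length = c(1000, 800, 900))
rep1 <- bootReplicate(det, site)
nrow(rep1$siteData)  # 3: same number of transects as the original survey
```

Because transects with no detections (here C) can be drawn, the replicate keeps the survey's effort structure intact.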
Given a specified sighting function (e.g., "halfnorm"), maximum likelihood is used to estimate the parameter(s) of the function (e.g., the scale parameter of the half-normal) that best fit the distance data.
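To illustrate the maximum-likelihood step, the base-R sketch below fits a half-normal detection function to simulated line-transect distances truncated at w.hi using optim, reads the optim convergence code, and takes the inverse Hessian as the variance estimate. It is a simplified stand-in for dfuncEstim, assuming w.lo = 0, no covariates, and no expansion terms; halfnormNLL and the simulated data are hypothetical.

```r
# Simplified sketch: no covariates, no expansions, w.lo = 0.
set.seed(42)
trueSigma <- 30
w.hi <- 100
x <- abs(rnorm(300, mean = 0, sd = trueSigma))  # half-normal distances
x <- x[x <= w.hi]                                # right-truncation

# Negative log-likelihood of the half-normal pdf on [0, w.hi]:
# f(x) = g(x) / integral_0^w.hi g(u) du, with g(x) = exp(-x^2 / (2 sigma^2))
halfnormNLL <- function(sigma, x, w.hi) {
  area <- sigma * sqrt(2 * pi) * (pnorm(w.hi / sigma) - 0.5)
  -sum(-x^2 / (2 * sigma^2)) + length(x) * log(area)
}

fit <- optim(par = 10, fn = halfnormNLL, x = x, w.hi = w.hi,
             method = "L-BFGS-B", lower = 1e-3, hessian = TRUE)

fit$convergence           # 0 indicates successful convergence
fit$par                   # estimated sigma, near trueSigma
sqrt(solve(fit$hessian))  # standard error from the inverse Hessian
```

The normalizing integral is what makes this a proper likelihood on [0, w.hi]; without it, the fit would simply push sigma toward infinity.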
When plotted (see Examples), histogram bins are plotted behind the detection function for visualization; however, the function is fit to the actual data, not to the bins.
Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.
abundEstim, autoDistSamp.
See likelihood-specific help files (e.g., halfnorm.like) for details on each built-in likelihood. See package vignettes for information on custom, user-defined likelihoods.
# NOT RUN {
# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
data(sparrowSiteData)
# Fit half-normal detection function
dfunc <- dfuncEstim(formula=dist~1,
detectionData=sparrowDetectionData,
likelihood="halfnorm", w.hi=100)
# Fit a second half-normal detection function, now including
# a categorical covariate for observer who surveyed the site (factor, 5 levels)
# Increase maximum iterations
dfuncObs <- dfuncEstim(formula=dist~observer,
detectionData=sparrowDetectionData,
siteData=sparrowSiteData,
likelihood="halfnorm", w.hi=100, pointSurvey=FALSE,
control=RdistanceControls(maxIter=1000))
# Print results
# And plot the detection function for each observer
dfuncObs
plot(dfuncObs,
newdata=data.frame(observer=levels(sparrowSiteData$observer)))
# Show some plotting options
plot(dfuncObs,
newdata=data.frame(observer=levels(sparrowSiteData$observer)),
vertLines = FALSE, lty=c(1,1),
col.dfunc=heat.colors(length(levels(sparrowSiteData$observer))),
col=c("grey","lightgrey"), border=NA,
xlab="Distance (m)",
main="Showing plot options")
# }