Last chance! 50% off unlimited learning
Sale ends in
Performs maximum likelihood calibration for constrained quadratic and additive ordination models (CQO and CAO models are better known as QRR-VGLMs and RR-VGAMs respectively).
calibrate.qrrvglm(object, newdata = NULL,
type = c("latvar", "predictors", "response", "vcov", "everything"),
lr.confint = FALSE, cf.confint = FALSE,
level = 0.95, initial.vals = NULL, ...)
Several methods are implemented to obtain
confidence intervals/regions for the calibration estimates.
One method is when lr.confint = TRUE
,
then a 4-column matrix is returned
with the confidence limits being the final 2 columns
(if type = "everything"
then the matrix is
returned in the lr.confint
list component).
Another similar method is when cf.confint = TRUE
.
There may be some redundancy in whatever is returned.
Other methods are returned by using type
and they are described as follows.
The argument type
determines what is returned.
If type = "everything"
then all the type
values
are returned in a list, with the following components.
Each component has length nrow(newdata)
.
Calibrated latent variables or site scores
(the default).
This may have the attribute "objectiveFunction"
which is usually the log-likelihood or the deviance.
linear/quadratic or additive predictors. For example, for Poisson families, this will be on a log scale, and for binomial families, this will be on a logit scale.
Fitted values of the response, evaluated at the calibrated latent variables.
Wald-type estimated variance-covariance matrices of the
calibrated latent variables or site scores. Actually,
these are stored in a 3-D array whose dimension is
c(Rank(object), Rank(object), nrow(newdata))
.
This type has only been implemented for
binomialff
and poissonff
models
with canonical links and noRRR = ~ 1
and,
for CQOs, I.tolerances = TRUE
or eq.tolerances = TRUE
.
The fitted CQO/CAO model.
A data frame with new response data, such as new species data. The default is to use the original data used to fit the model; however, the calibration may take a long time to compute because the computations are expensive.
Note that the creation of the model frame associated with
newdata
is fragile. Factors may not be created
properly. If a variable is binary then its best for it
to be straightforward and have only 0 and 1 as values.
What type of result to be returned.
The first are the calibrated latent variables or site scores.
This is always computed.
The "predictors"
are the linear/quadratic or additive
predictors evaluated at the calibrated latent variables or site
scores.
The "response"
are the fitted values (usually means)
evaluated at the
calibrated latent variables or site scores.
The "vcov"
are the Wald-type estimated variance-covariance
matrices of the
calibrated latent variables or site scores.
The "everything"
is for all of them, i.e., all type
s.
Note that for CAO models,
the "vcov"
type is unavailable.
Compute approximate
likelihood ratio based confidence intervals?
If TRUE
then level
is the confidence level required
and one should have type = "latvar"
or type = "everything"
;
and currently only rank-1 models are supported.
This option works for CLO and CQO models and not for CAO models.
The function uniroot
is called to solve for
the root of a nonlinear equation to obtain each confidence limit,
and this is not entirely reliable.
It is assumed that the likelihood function is unimodal
about its MLE
because only one root is returned if there is more than one.
One root is found on each side of the MLE.
Technically, the default is to find the value of the latent
variable whose difference in deviance (or twice the difference
in log-likelihoods) from the optimal model
is equal to qchisq(level, df = 1)
.
The intervals are not true profile likelihood intervals
because it is not possible to estimate the regression coefficients
of the QRR-VGLM/RR-VGLM based on one response vector.
See confint
to get the flavour of these two
arguments in general.
Compute approximate
characteristic function based confidence intervals?
If TRUE
then level
is the confidence level required
and one should have type = "latvar"
or type = "everything"
;
and currently only rank-1 models are supported.
This option works for
binomialff
and poissonff
CLO and CQO models
and not for CAO models.
The function uniroot
is called to solve for
the root of a nonlinear equation to obtain each confidence limit,
and this is not entirely reliable.
It is assumed that the likelihood function is unimodal because
only one root is returned if there is more than one.
Technically, the CDF of a normalized score statistic is
obtained by Gauss--Hermite numerical integration of a
complex-valued integrand,
and this is based on the inversion formula described in
Abate and Witt (1992).
Initial values for the search.
For rank-1 models, this should be a vector having length
equal to nrow(newdata)
, and for rank-2 models
this should be a two-column matrix with the number of rows equalling
the number of rows in newdata
.
The default is a grid defined by arguments in
calibrate.qrrvglm.control
.
Arguments that are fed into
calibrate.qrrvglm.control
.
T. W. Yee.
Recent work on the standard errors by
David Zucker and
Sam Oman at HUJI
is gratefully acknowledged---these are returned in the
vcov
component and provided inspiration for lr.confint
and cf.confint
.
A joint publication is being prepared on this subject.
This function is computationally expensive.
Setting trace = TRUE
to get a running log can be a good idea.
This function has been tested but not extensively.
Given a fitted regression CQO/CAO model, maximum likelihood calibration is theoretically easy and elegant. However, the method assumes that all the responses are independent, which is often not true in practice. More details and references are given in Yee (2018) and ch.6 of Yee (2015).
The function optim
is used to search for
the maximum likelihood solution. Good initial values are
needed, and arguments in calibrate.qrrvglm.control
allows the user some control over the choice of these.
Abate, J. and Whitt, W. (1992). The Fourier-series method for inverting transforms of probability distributions. Queueing Systems, 10, 5--88.
ter Braak, C. J. F. (1995). Calibration. In: Data Analysis in Community and Landscape Ecology by Jongman, R. H. G., ter Braak, C. J. F. and van Tongeren, O. F. R. (Eds.) Cambridge University Press, Cambridge.
if (FALSE) {
hspider[, 1:6] <- scale(hspider[, 1:6]) # Stdze environmental variables
set.seed(123)
siteNos <- c(1, 5) # Calibrate these sites
pet1 <- cqo(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull, Zoraspin) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
trace = FALSE,
data = hspider[-siteNos, ], # Sites not in fitted model
family = poissonff, I.toler = TRUE, Crow1positive = TRUE)
y0 <- hspider[siteNos, colnames(depvar(pet1))] # Species counts
(cpet1 <- calibrate(pet1, trace = TRUE, newdata = data.frame(y0)))
(clrpet1 <- calibrate(pet1, lr.confint = TRUE, newdata = data.frame(y0)))
(ccfpet1 <- calibrate(pet1, cf.confint = TRUE, newdata = data.frame(y0)))
(cp1wald <- calibrate(pet1, newdata = y0, type = "everything"))
}
if (FALSE) {
# Graphically compare the actual site scores with their calibrated
# values. 95 percent likelihood-based confidence intervals in green.
persp(pet1, main = "Site scores: solid=actual, dashed=calibrated",
label = TRUE, col = "gray50", las = 1)
# Actual site scores:
xvars <- rownames(concoef(pet1)) # Variables comprising the latvar
est.latvar <- as.matrix(hspider[siteNos, xvars]) %*% concoef(pet1)
abline(v = est.latvar, col = seq(siteNos))
abline(v = cpet1, lty = 2, col = seq(siteNos)) # Calibrated values
arrows(clrpet1[, 3], c(60, 60), clrpet1[, 4], c(60, 60), # Add CIs
length = 0.08, col = "orange", angle = 90, code = 3, lwd = 2)
arrows(ccfpet1[, 3], c(70, 70), ccfpet1[, 4], c(70, 70), # Add CIs
length = 0.08, col = "limegreen", angle = 90, code = 3, lwd = 2)
arrows(cp1wald$latvar - 1.96 * sqrt(cp1wald$vcov), c(65, 65),
cp1wald$latvar + 1.96 * sqrt(cp1wald$vcov), c(65, 65), # Wald CIs
length = 0.08, col = "blue", angle = 90, code = 3, lwd = 2)
legend("topright", lwd = 2,
leg = c("CF interval", "Wald interval", "LR interval"),
col = c("limegreen", "blue", "orange"), lty = 1)
}
Run the code above in your browser using DataLab