folds_control(type = c("kfold", "subsampling", "bootstrap"),
K = ifelse(type == "kfold", 5, 30),
prob = 0.5, weights = c("case", "freq"),
seed = NULL)
## S3 method for class 'tvcm':
cvloss(object, folds = folds_control(),
fun = NULL, dfpar = NULL, direction = c("backward", "forward"),
papply = mclapply, verbose = FALSE, ...)
## S3 method for class 'cvloss.tvcm':
print(x, ...)
## S3 method for class 'cvloss.tvcm':
plot(x, legend = TRUE, details = TRUE, ...)
## S3 method for class 'tvcm':
oobloss(object, newdata = NULL, weights = NULL,
fun = NULL, ...)
## S3 method for class 'tvcm':
prune(tree, dfsplit = NULL, dfpar = NULL,
direction = c("backward", "forward"),
alpha = NULL, maxstep = NULL, terminal = NULL,
papply = mclapply, keeploss = FALSE, original, ...)
x: an object of class cvloss.tvcm as produced by cvloss.

prob: the probability for the "subsampling" cross-validation scheme.

weights: whether the weights of object are case weights or frequencies of cases.

newdata: a data.frame with the validation data for oobloss.

fun: the loss function for the validation sets; if NULL, the default deviance-residual based loss (see the details on oobloss below) is applied.

dfpar: the penalty per model coefficient; if NULL, the value of dfpar of the partitioning stage is used.

direction: either "backward" (the default) or "forward". Indicates the pruning algorithm to be used. "backward" applies backward pruning, where in each iteration the inner node that produces the smallest per-node increase in the estimated prediction error is collapsed.

verbose: logical. If TRUE, verbose output is generated during the validation.

dfsplit: the per-split penalty dfsplit with which the partitions are to be cross-validated. If no dfsplit is specified (the default), the parameter is ignored for pruning.

alpha: a significance level; applies to tvcm objects grown with sctest = TRUE.

cvloss returns an object of class cvloss.tvcm with the following essential components: grid, a list with values for dfsplit and nsplit that specifies the grid at which the cross-validated loss was evaluated; the cross-validated loss of each fold, corresponding to the values in grid; and dfsplit.min, the value of dfsplit that minimizes the validated loss (see the details below).

The TVCM algorithm (with sctest = FALSE
) is a two stage procedure that
first grows overly fine partitions and second selects the best-sized
partitions by pruning. Both steps can be carried out with a single call to tvcm. The dfsplit parameter controls the pruning step: the overly fine partition is pruned so as to minimize the estimated in-sample
prediction error. The in-sample prediction error is, in what follows, defined as the mean of the in-sample loss plus dfpar times the number of coefficients plus dfsplit times the number of splits. In the common likelihood setting, the loss equals -2 times the maximized log-likelihood and dfpar = 2. The per-split penalty dfsplit is generally unknown and is estimated by cross-validation.
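To make the criterion concrete, it can be written out directly; the numbers below are arbitrary placeholders rather than values taken from a fitted model:

## estimated in-sample prediction error =
##   mean in-sample loss + dfpar * (number of coefficients) + dfsplit * (number of splits)
inloss  <- 250   # mean in-sample loss (e.g. based on -2 times the log-likelihood)
ncoef   <- 8     # number of fitted coefficients (placeholder)
nsplit  <- 3     # number of splits (placeholder)
dfpar   <- 2     # per-coefficient penalty in the likelihood setting
dfsplit <- 6     # per-split penalty, to be estimated by cross-validation
inloss + dfpar * ncoef + dfsplit * nsplit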
cvloss estimates dfsplit by cross-validation. The function folds_control is used to specify the cross-validation scheme; besides the default "kfold" scheme, the alternatives are type = "subsampling" (random draws without replacement) and type = "bootstrap" (random draws with replacement). For 2-stage models (with random-effects) fitted by olmm, the folds are sampled subject-wise, i.e. at the first stage. If the weights of the fitted model represent frequencies of cases, weights = "freq" should be used.
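For instance, the alternative sampling schemes can be requested as follows; K and prob are simply the defaults made explicit, and the seed is an arbitrary choice:

folds_control(type = "subsampling", K = 30, prob = 0.5, seed = 1)
folds_control(type = "bootstrap", K = 30, seed = 1)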
cvloss evaluates the out-of-bag loss on the validation sets as a function of dfsplit. Out-of-bag loss refers here to the prediction error based on a loss function, which is typically the -2 log-likelihood (see the details for oobloss below). Commonly, dfsplit is used for backward pruning (direction = "backward"), but it is also possible to cross-validate dfsplit for premature stopping (direction = "forward", see the argument dfsplit in tvcm_control). cvloss returns an object for which print and plot generics are provided. The proposed estimate for dfsplit is the one that minimizes the validated loss and can be extracted from the component dfsplit.min.
oobloss estimates the total prediction error for validation data (the newdata argument). By default, the loss is defined as the sum of deviance residuals, see the return value dev.resids of family objects; alternatively, a loss function can be supplied via the argument fun, see the examples below. In general, the sum of deviance residuals is equal to the -2 log-likelihood. A special case is the gaussian family, where the deviance residuals are computed as $\sum_{i=1}^N w_i (y_i-\mu_i)^2$, that is, the deviance residuals ignore the term $\log(2\pi\sigma^2)$. Therefore, the sum of deviance residuals for the gaussian model (and possibly others) is not exactly the -2 log-likelihood prediction error but shifted by a constant. Another special case concerns models with random effects: for models based on olmm, the deviance residuals are based on the marginal predictions, with the random effects integrated out.
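For illustration, the gaussian deviance-residual loss described above corresponds to a user-defined loss function of the following form; the (y, mu, wt) signature matches the examples below, and the function is a sketch rather than part of the package:

## weighted squared error, i.e. the gaussian deviance residuals from above
gfun <- function(y, mu, wt) sum(wt * (y - mu)^2)
## oobloss(model, newdata = ..., fun = gfun) should then agree with the
## default gaussian loss up to the constant term mentioned above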
prune selects the partition that minimizes the estimated in-sample prediction error, defined, as above, as the mean of the in-sample loss plus dfpar times the number of coefficients plus dfsplit times the number of splits. Pruning with direction = "backward"
works as follows: In each iteration, all nested models of the current
model are evaluated, i.e. models which collapse one of the inner nodes
of the current model. The inner node that yields the smallest increase
in the estimated prediction error is collapsed and the resulting model
substitutes the current model. The algorithm is stopped as soon as all
nested models have a higher estimated prediction error than the
current model, which will be returned.
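To make the search explicit, the backward step can be sketched in a few lines of R. The helpers inner_nodes(), collapse_node() and pred_error() are hypothetical stand-ins for the internal bookkeeping, so this is an illustration of the algorithm above rather than the package's implementation:

backward_prune <- function(tree) {
  repeat {
    current <- pred_error(tree)           # estimated prediction error of the current model
    inner <- inner_nodes(tree)            # hypothetical: ids of the inner nodes
    if (length(inner) == 0L) return(tree)
    ## all nested models: collapse one inner node at a time
    nested <- lapply(inner, function(id) collapse_node(tree, id))
    errors <- vapply(nested, pred_error, numeric(1L))
    ## stop as soon as every nested model has a higher estimated prediction error
    if (min(errors) > current) return(tree)
    tree <- nested[[which.min(errors)]]   # keep the best nested model and continue
  }
}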
T. Hastie, R. Tibshirani and J. Friedman (2001), The Elements of Statistical Learning, Springer, New York.
## --------------------------------------------------------- #
## Dummy Example 1:
##
## Model selection for the 'vcrpart_2' data. The example is
## merely a syntax template.
## --------------------------------------------------------- #
## load the data
data(vcrpart_2)
## fit the model
control <- tvcm_control(maxstep = 2L, minsize = 5L, cv = FALSE)
model <- tvcglm(y ~ vc(z1, z2, by = x1) + vc(z1, by = x2),
data = vcrpart_2, family = gaussian(),
control = control, subset = 1:75)
## cross-validate 'dfsplit'
cv <- cvloss(model, folds = folds_control(type = "kfold", K = 2, seed = 1))
cv
plot(cv)
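## prune the model with the cross-validated 'dfsplit'
## (illustrative addition; uses the 'dfsplit.min' component described above)
model.p <- prune(model, dfsplit = cv$dfsplit.min)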
## out-of-bag error
oobloss(model, newdata = vcrpart_2[76:100,])
## use an alternative loss function
rfun <- function(y, mu, wt) sum(abs(y - mu))
oobloss(model, newdata = vcrpart_2[76:100,], fun = rfun)