folds_control(type = c("kfold", "subsampling", "bootstrap"),
K = ifelse(type == "kfold", 5, 30),
prob = 0.5, weights = c("case", "freq"),
seed = NULL)

## S3 method for class 'tvcm':
cvloss(object, folds = folds_control(),
fun = NULL, dfpar = NULL, direction = c("backward", "forward"),
papply = mclapply, verbose = FALSE, ...)
## S3 method for class 'cvloss.tvcm':
print(x, ...)
## S3 method for class 'cvloss.tvcm':
plot(x, legend = TRUE, details = TRUE, ...)
## S3 method for class 'tvcm':
oobloss(object, newdata = NULL, weights = NULL,
fun = NULL, ...)
## S3 method for class 'tvcm':
prune(tree, dfsplit = NULL, dfpar = NULL,
direction = c("backward", "forward"),
alpha = NULL, maxstep = NULL, terminal = NULL,
papply = mclapply, keeploss = FALSE, original, ...)
Arguments:

object, tree: a fitted model of class 'tvcm'.

x: an object of class 'cvloss.tvcm' as produced by cvloss.

type: the resampling scheme used to divide the data into learning and validation sets; one of "kfold", "subsampling" or "bootstrap".

K: the number of folds.

prob: the probability for the "subsampling" cross-validation scheme.

weights: for folds_control, whether the weights of object are case weights or frequencies of cases; for oobloss, optional weights for the observations of the newdata validation set.

seed: a seed for the random number generator.

folds: a list of control arguments as produced by folds_control.

fun: the loss function for the validation sets. If NULL (default), the deviance-based loss of the fitted object is applied (see Details).

dfpar: the per-coefficient penalty. If NULL, the value of dfpar of the partitioning stage is used.

direction: either "backward" (the default) or "forward". Indicates the pruning algorithm to be used. "backward" applies backward pruning, where in each iteration the inner node that produces the smallest per-node increase in the estimated prediction error is collapsed (see Details).

papply: the apply function used to process the folds, e.g. mclapply or lapply.

verbose: logical. If TRUE, verbose output is generated during the validation.

dfsplit: the per-split penalty dfsplit with which the partitions are to be cross-validated resp. pruned. If no dfsplit is specified (default), the parameter is ignored for pruning.

alpha, maxstep, terminal, keeploss, original: further pruning controls; alpha is relevant for models grown with sctest = TRUE, see tvcm_control.

newdata: a data.frame with the validation (out-of-bag) data, including the response variable.

...: further arguments passed to other methods.

Value:

cvloss returns an object of class cvloss.tvcm with the following essential components:

grid: the grid of dfsplit and nsplit values at which the cross-validated loss was evaluated.

oobloss: the cross-validated loss of each fold, corresponding to the values in grid.

dfsplit.min: the value of dfsplit that minimizes the average cross-validated loss.
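For illustration, the validated grid can be inspected directly (this assumes the object cv created in the examples below):

## inspect the grid of 'dfsplit' values at which the loss was validated
## (assumes 'cv' as created in the examples below)
str(cv$grid)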
Details:

Fitting a 'tvcm' object (with sctest = FALSE) is a two stage procedure that first grows overly fine partitions and second selects the best-sized partitions by pruning. Both steps can be carried out with a single call. The prune function collapses inner nodes of the overly fine partition according to the tuning parameter dfsplit in order to minimize the estimated in-sample prediction error. The in-sample prediction error is, in what follows, defined as the mean of the in-sample loss plus dfpar times the number of coefficients plus dfsplit times the number of splits. In the common likelihood setting, the loss is equal to -2 times the maximized log-likelihood and dfpar = 2. The per-split penalty dfsplit is generally unknown and is estimated by cross-validation.
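As a numerical illustration of this criterion (a sketch with assumed values, not code from the package), the penalized in-sample prediction error for one candidate model could be computed as:

## illustrative computation of the penalized in-sample prediction error
## (all values below are assumed for the sketch)
inloss  <- 2.1   # mean in-sample loss, e.g. mean deviance
ncoef   <- 6     # number of coefficients
nsplit  <- 3     # number of splits
dfpar   <- 2     # per-coefficient penalty (2 in the likelihood setting)
dfsplit <- 0.5   # candidate per-split penalty
inloss + dfpar * ncoef + dfsplit * nsplit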
cvloss estimates dfsplit by cross-validation. The function folds_control specifies the cross-validation scheme; besides the default type = "kfold", the alternatives are type = "subsampling" (random draws without replacement) and type = "bootstrap" (random draws with replacement). For 2-stage models (with random effects), weights = "freq" should be used.
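For example, a subsampling scheme with 30 random draws of 50% of the cases could be specified as follows (K and prob shown are the defaults from the usage above; the seed is set only for reproducibility):

## 30 subsampling folds, each drawing 50% of the cases, with a fixed seed
folds_control(type = "subsampling", K = 30, prob = 0.5, seed = 1)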
cvloss evaluates the validated loss over a grid of dfsplit values. Out-of-bag loss refers here to the prediction error based on a loss function, which is typically the -2 log-likelihood (see the details for oobloss below). Commonly, dfsplit is used for backward pruning (direction = "backward"), but it is also possible to cross-validate dfsplit for premature stopping (direction = "forward", see the dfsplit argument of tvcm_control). For the output of cvloss, print and plot generics are provided. The proposed estimate for dfsplit is the one that minimizes the validated loss and can be extracted from the component dfsplit.min.
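For instance, assuming the object cv from the examples below, the proposed penalty is obtained as:

## cross-validation estimate of the per-split penalty
## (assumes 'cv' as created in the examples below)
cv$dfsplit.min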
oobloss estimates the prediction error for validation data (the newdata argument). By default, the loss is defined as the sum of deviance residuals, see the return value dev.resids of family; alternatively, the loss function can be defined manually via the argument fun, see the examples below. In general the sum of deviance residuals is equal to the -2 log-likelihood. A special case is the gaussian family, where the deviance residuals are computed as $\sum_{i=1}^N w_i (y_i-\mu_i)^2$, that is, the deviance residuals ignore the term $\log{2\pi\sigma^2}$. Therefore, the sum of deviance residuals for the gaussian model (and possibly others) is not exactly the -2 log-likelihood prediction error, but is shifted by a constant. Models with random effects are a further special case; for these, the deviance residuals are based on the marginal predictions.
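As a sketch, a manually defined loss that mirrors the gaussian deviance residuals described above would use the same (y, mu, wt) signature as rfun in the examples below:

## weighted squared-error loss, analogous to the gaussian deviance residuals
sqfun <- function(y, mu, wt) sum(wt * (y - mu)^2)
## could then be passed via the 'fun' argument, e.g.
## oobloss(model, newdata = vcrpart_2[76:100, ], fun = sqfun)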
The prune function collapses inner nodes of the overly fine partition to minimize the estimated in-sample prediction error defined above, i.e. the in-sample loss penalized by dfpar times the number of coefficients and dfsplit times the number of splits. Pruning with direction = "backward" works as follows: In each iteration, all nested models of the current model are evaluated, i.e. models which collapse one of the inner nodes of the current model. The inner node that yields the smallest increase in the estimated prediction error is collapsed and the resulting model replaces the current model. The algorithm stops as soon as all nested models have a higher estimated prediction error than the current model, which is then returned.
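For example, assuming model and cv from the examples below, the cross-validated penalty can be plugged into prune to obtain the final model:

## prune the overly fine partition with the cross-validated penalty
## (assumes 'model' and 'cv' as created in the examples below)
model.pruned <- prune(model, dfsplit = cv$dfsplit.min)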
References:

T. Hastie, R. Tibshirani, J. Friedman (2001). The Elements of Statistical Learning. New York: Springer.

Examples:
## --------------------------------------------------------- #
## Dummy Example 1:
##
## Model selection for the 'vcrpart_2' data. The example is
## merely a syntax template.
## --------------------------------------------------------- #
## load the data
data(vcrpart_2)
## fit the model
control <- tvcm_control(maxstep = 2L, minsize = 5L, cv = FALSE)
model <- tvcglm(y ~ vc(z1, z2, by = x1) + vc(z1, by = x2),
data = vcrpart_2, family = gaussian(),
control = control, subset = 1:75)
## cross-validate 'dfsplit'
cv <- cvloss(model, folds = folds_control(type = "kfold", K = 2, seed = 1))
cv
plot(cv)
## out-of-bag error
oobloss(model, newdata = vcrpart_2[76:100,])
## use an alternative loss function
rfun <- function(y, mu, wt) sum(abs(y - mu))
oobloss(model, newdata = vcrpart_2[76:100,], fun = rfun)