## S3 method for class 'tvcm':
prune(tree, cp = NULL, alpha = NULL, maxstep = NULL,
terminal = NULL, original = FALSE, ...)

## S3 method for class 'tvcm':
prunepath(tree, steps = 1L, ...)
## S3 method for class 'tvcm':
cvloss(object, folds = folds_control(), ...)
folds_control(type = c("kfold", "subsampling", "bootstrap"),
K = ifelse(type == "kfold", 5, 100),
prob = 0.5, weights = c("case", "freq"),
seed = NULL)
## S3 method for class 'cvloss.tvcm':
plot(x, legend = TRUE, details = TRUE, ...)
## S3 method for class 'tvcm':
oobloss(object, newdata = NULL, weights = NULL,
fun = NULL, ...)
tree, object: an object of class tvcm.
alpha: numeric significance level. The stopping parameter for tvcm
objects grown with sctest = TRUE, see tvcm_control.
weights: for folds_control, a character string that defines whether
the weights of object are case weights or frequencies of cases; for
oobloss, a numeric vector of weights corresponding to the rows of
newdata.
prob: numeric scalar between 0 and 1. The probability for the
"subsampling" cross-validation scheme.
folds: a list of control arguments as produced by folds_control.
x: an object of class cvloss.tvcm as produced by cvloss.
fun: the loss function for the validation sets; by default, the sum
of deviance residuals as defined by the family of the fitted object
is applied.

cvloss returns an object of class cvloss.tvcm with at least the
following components: the grid of evaluated cp values, the validated
loss for each cp value in each fold, and the estimated penalty
cp.hat. oobloss returns a scalar representing the estimated total
prediction error for newdata.
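As a brief illustration, the components of the returned cvloss.tvcm
object can be accessed directly (a sketch, assuming 'cv' holds the
output of cvloss as in the examples below; the component names follow
the description above):

cv$grid      ## the grid of evaluated 'cp' values
cv$oobloss   ## the validated loss for each 'cp' value in each fold
cv$cp.hat    ## the 'cp' value minimizing the average validation error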
The prune function collapses inner nodes of the fitted tree structure
according to the tuning parameter cp. The aim of pruning by cp is to
collapse inner nodes so as to minimize the cost-complexity criterion
$$error(cp) = error(tree) + cp * complexity(tree)$$
where the training error $error(tree)$ is defined by lossfun and
$complexity(tree)$ is defined as the total number of coefficients
times dfpar plus the total number of splits times dfsplit. The
function lossfun and the parameters dfpar and dfsplit are defined by
the control argument of tvcm, see also tvcm_control. By default,
$error(tree)$ is minus two times the total likelihood of the model
and $complexity(tree)$ is the number of splits. The minimization of
$error(cp)$ is implemented by the following iterative
backward-stepwise algorithm:

1. Fit all subtree models that collapse one inner node of the current
   tree model.
2. Compute the per-complexity increase in the training error,
   $dev = (error(subtree) - error(tree)) /
   (complexity(tree) - complexity(subtree))$,
   for all fitted subtree models.
3. If any $dev < cp$, set as the tree model the subtree that
   minimizes $dev$ and repeat steps 1 to 3; otherwise stop.
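To make the criterion concrete, here is a minimal sketch assuming the
defaults just described, where $error(tree)$ is minus two times the
log-likelihood and $complexity(tree)$ is the number of splits. The
helper cc_crit is hypothetical (not part of the package API), and
counting splits via partykit's width() assumes a single binary tree:

## hypothetical helper: cost-complexity criterion under default settings
cc_crit <- function(model, cp) {
  nsplit <- width(model) - 1L  ## splits = terminal nodes - 1 (assumption)
  as.numeric(-2 * logLik(model)) + cp * nsplit
}
cc_crit(model, cp = 0.5)  ## e.g., for the 'model' of the examples below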
The penalty cp is generally unknown and is estimated adaptively from
the data. The cvloss function implements a cross-validation approach:
for each fold, a new model is fitted on the learning sample and
pruned for increasing values of cp, computing for each cp
the average validation error.
Doing so yields for each fold a sequence of values for cp
and
a sequence of average validation errors. These sequences are then
combined into a finer grid, and the average validation errors are
averaged correspondingly. From these two sequences we choose the cp
value that minimizes the validation error. Notice that the average
validation error is computed as the total prediction error of the
validation set divided by the sum of validation set weights. See also
the argument ooblossfun in tvcm_control and the function oobloss.
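In practice, the estimation reduces to two calls (assuming a fitted
tvcm object 'model', as in the examples below):

cv <- cvloss(model, folds = folds_control(type = "kfold", K = 5))
model.p <- prune(model, cp = cv$cp.hat)  ## prune with the estimated penalty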
The prunepath function can be used to backtrack the pruning
algorithm; the iterations of interest are selected with the steps
argument. The output shows detailed information on the model
performances obtained when collapsing inner nodes. The node labels
shown in the output refer to the initial tree.
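For example, the first two pruning iterations of the pruned model
from the sketch above can be inspected with:

prunepath(model.p, steps = 1:2)  ## node labels refer to the initial tree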
The function folds_control is used to specify the cross-validation
scheme. Besides the default "kfold" scheme, the alternatives are
type = "subsampling" (random draws without replacement) and
type = "bootstrap" (random draws with replacement). For 2-stage
models (with random effects) fitted by olmm, weights = "freq" should
be considered. For the cvloss.tvcm output object, a print and a plot
generic are provided.
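The three sampling schemes can be specified as follows (the values
for K and prob are arbitrary illustrations):

folds_control(type = "kfold", K = 5)                      ## 5-fold cross-validation
folds_control(type = "subsampling", K = 50, prob = 0.5)   ## draws without replacement
folds_control(type = "bootstrap", K = 50)                 ## draws with replacement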
oobloss can be used to estimate the total prediction error for
validation data (the newdata argument). By default, the loss is
defined as the sum of deviance residuals, as defined by the family of
the fitted object (see the return value dev.resids of family).
Otherwise, the loss function can be defined manually via the argument
fun, see the examples below. In general, the sum of deviance
residuals is equal to the sum of the -2 log-likelihood errors. A
special case is the gaussian family, where the deviance residuals are
computed as $\sum_{i=1}^N w_i (y_i-\mu)^2$, that is, the deviance
residuals ignore the term $\log(2\pi\sigma^2)$. Therefore, the sum of
deviance residuals for the gaussian model (and possibly others) is
not exactly the sum of -2 log-likelihood prediction errors, but is
shifted by a constant. Another special case are models with random
effects: for models based on olmm, the deviance residuals are based
on the marginal predictions (where the random effects are integrated
out).
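The gaussian special case can be verified directly via the dev.resids
function of the family object, which returns the weighted squared
errors:

gaussian()$dev.resids(y = c(1, 2), mu = c(1.5, 1.5), wt = c(1, 1))
## [1] 0.25 0.25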
## --------------------------------------------------------- #
## Dummy Example 1:
##
## Model selection for the 'vcrpart_2' data. The example is
## merely a syntax template.
## --------------------------------------------------------- #
## load the data
data(vcrpart_2)
## fit the model
control <- tvcm_control(maxstep = 2L, minsize = 5L, cv = FALSE)
model <- tvcglm(y ~ vc(z1, z2, by = x1) + vc(z1, by = x2),
data = vcrpart_2, family = gaussian(),
control = control, subset = 1:75)
## cross-validate 'dfsplit'
cv <- cvloss(model, folds = folds_control(type = "kfold", K = 2, seed = 1))
cv
plot(cv)
## prune model with estimated 'cp'
model.p <- prune(model, cp = cv$cp.hat)
## backtrack pruning
prunepath(model.p, steps = 1:3)
## out-of-bag error
oobloss(model, newdata = vcrpart_2[76:100,])
## use an alternative loss function
rfun <- function(y, mu, wt) sum(abs(y - mu))
oobloss(model, newdata = vcrpart_2[76:100,], fun = rfun)