## S3 method for class 'tvcm':
prune(tree, cp = NULL, alpha = NULL, maxstep = NULL,
      terminal = NULL, original = FALSE, ...)

folds_control(type = c("kfold", "subsampling", "bootstrap"),
              K = ifelse(type == "kfold", 5, 30),
              prob = 0.5, weights = c("case", "freq"),
              seed = NULL)
## S3 method for class 'tvcm':
cvloss(object, folds = folds_control(), ...)
## S3 method for class 'cvloss.tvcm':
print(x, ...)
## S3 method for class 'cvloss.tvcm':
plot(x, legend = TRUE, details = TRUE, ...)
## S3 method for class 'tvcm':
oobloss(object, newdata = NULL, weights = NULL,
fun = NULL, ...)
Arguments:
tree, object - a fitted model of class tvcm.
x - an object of class cvloss.tvcm as produced by cvloss.
prob - the probability for the "subsampling" cross-validation scheme.
weights - for folds_control, whether the weights of object are case
weights or frequencies of cases; for oobloss, an optional numeric
vector of weights corresponding to the rows of newdata.
alpha - a significance level, used for models fitted with
sctest = TRUE, see tvcm_control.

Value: cvloss returns an object of class cvloss.tvcm with at least a
component grid, the cp grid for each fold.

Details: prune collapses inner nodes of the tree fitted by tvcm
according to the tuning parameter cp. The
aim of pruning by cp is to collapse inner nodes to minimize the
cost-complexity criterion
$$error(cp) = error(tree) + cp * complexity(tree)$$
where the training error $error(tree)$ is defined by lossfun
and $complexity(tree)$ is defined as the total number of coefficients times
dfpar plus the total number of splits times dfsplit. The function
lossfun and the parameters dfpar and dfsplit are defined
by the control argument of tvcm, see also tvcm_control. The
minimization of $error(cp)$ is implemented by the following iterative
backward algorithm:

1. Fit all subtree models that collapse one inner node of the current
tree model.
2. Compute the per-complexity increase in the training error,
$$dev = \frac{error(subtree) - error(tree)}{complexity(tree) - complexity(subtree)}$$
for all fitted subtree models.
3. If any $dev < cp$, set the tree model to the subtree that minimizes
$dev$ and repeat steps 1 to 3; otherwise stop.

The penalty cp is generally unknown and is estimated adaptively from
the data. The function cvloss estimates cp by cross-validation. For
each fold, it fits a new model with tvcm on the training data and
prunes it for increasing values of cp, computing for each cp the
average validation error.
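The backward pruning step described above can be sketched in base R. The error and complexity values below are made up for illustration; this is not the vcrpart implementation.

```r
## One backward pruning step (toy values; not the vcrpart internals).
## Current tree: error 100, complexity 4. Collapsing each of its three
## inner nodes yields the following candidate subtree fits:
sub_error      <- c(101, 108, 103)   # error(subtree)
sub_complexity <- c(3, 3, 2)         # complexity(subtree)
tree_error      <- 100
tree_complexity <- 4
cp <- 2.5

## per-complexity increase in the training error for each collapse
dev <- (sub_error - tree_error) / (tree_complexity - sub_complexity)
dev  # 1.0 8.0 1.5

## collapse the candidate with the smallest dev if it falls below cp
if (min(dev) < cp) {
  best <- which.min(dev)  # here: candidate 1
}
```

With these numbers the first collapse costs only 1 unit of error per unit of complexity saved, so it is performed and the loop would continue with the reduced tree.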
Doing so yields for each fold a sequence of values for cp and a
sequence of average validation errors. The obtained sequences for cp
are combined into a fine grid, and the validation errors are averaged
correspondingly. From these two sequences we choose the cp that
minimizes the validation error. Notice that the average validation error
is computed as the total prediction error of the validation set divided
by the sum of the validation set weights. See also the argument
ooblossfun in tvcm_control and the function oobloss.
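The grid combination and the choice of cp can be illustrated with a base-R toy sketch; the fold sequences below are made-up numbers, and the step-function interpolation via approx is only one plausible way to place each fold's errors on the common grid, not the vcrpart internals.

```r
## Two folds, each with its own cp sequence and validation errors
cp1 <- c(0.1, 0.3, 0.6); err1 <- c(10, 8, 9)
cp2 <- c(0.2, 0.4, 0.6); err2 <- c(11, 7, 8)

## combine the cp sequences into one fine grid
grid <- sort(unique(c(cp1, cp2)))

## carry each fold's errors onto the grid as a left-continuous step
## function, extending the boundary values (rule = 2)
f1 <- approx(cp1, err1, xout = grid, method = "constant", rule = 2)$y
f2 <- approx(cp2, err2, xout = grid, method = "constant", rule = 2)$y

## average across folds and pick the cp minimizing the validation error
avg <- (f1 + f2) / 2
grid[which.min(avg)]  # 0.4
```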
The function folds_control is used to specify the cross-validation
scheme; a random 5-fold scheme is used by default. The alternatives are
type = "subsampling" (random draws without replacement) and
type = "bootstrap" (random draws with replacement). For 2-stage models
(with random effects) fitted by olmm, the subsets are sampled
subject-wise, i.e., at the first stage. For models where the weights
represent frequencies of cases, the option weights = "freq" should be
considered. cvloss returns an object of class cvloss.tvcm, for which a
print and a plot generic are provided.
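The two resampling types differ only in whether indices are drawn with replacement; the following base-R sketch illustrates the idea and is not the folds_control internals.

```r
n <- 100; prob <- 0.5
set.seed(1)

## type = "subsampling": draws without replacement, about n * prob cases
sub <- sample(n, size = floor(n * prob), replace = FALSE)

## type = "bootstrap": n draws with replacement
boot <- sample(n, size = n, replace = TRUE)

anyDuplicated(sub) == 0   # TRUE: no case appears twice
length(unique(boot)) < n  # TRUE with high probability: repeats occur
```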
The function oobloss can be used to estimate the total prediction error
for validation data (the newdata argument). By default, the loss is
defined as the sum of deviance residuals, see the return value
dev.resids of family resp. family.olmm. Alternatively, the loss
function can be defined manually via the argument fun, see the
examples below. In general, the sum of deviance residuals is equal to
the sum of
the -2 log-likelihood errors. A special case is the gaussian family, where
the deviance residuals are computed as $\sum_{i=1}^N w_i (y_i-\mu_i)^2$,
that is, the deviance residuals ignore the term $\log(2\pi\sigma^2)$.
Therefore, the sum of deviance residuals for the gaussian model (and
possibly others) is not exactly the sum of -2 log-likelihood prediction
errors (but shifted by a constant). Another special case concerns
models with random effects: for models based on olmm, the deviance
residuals are based on the marginal predictions, where the random
effects are integrated out.
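For the gaussian family, the identity above can be checked directly with the family's dev.resids function, using only base R (the y, mu, and wt values are arbitrary):

```r
y  <- c(1.0, 2.0, 3.5)
mu <- c(1.2, 1.8, 3.0)
wt <- c(1, 1, 2)

## gaussian()$dev.resids computes wt * (y - mu)^2 per observation
dr <- gaussian()$dev.resids(y, mu, wt)
all.equal(sum(dr), sum(wt * (y - mu)^2))  # TRUE
```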
T. Hastie, R. Tibshirani and J. Friedman (2001). The Elements of
Statistical Learning. New York: Springer.
## --------------------------------------------------------- #
## Dummy Example 1:
##
## Model selection for the 'vcrpart_2' data. The example is
## merely a syntax template.
## --------------------------------------------------------- #
## load the data
data(vcrpart_2)
## fit the model
control <- tvcm_control(maxstep = 2L, minsize = 5L, cv = FALSE)
model <- tvcglm(y ~ vc(z1, z2, by = x1) + vc(z1, by = x2),
data = vcrpart_2, family = gaussian(),
control = control, subset = 1:75)
## cross-validate 'dfsplit'
cv <- cvloss(model, folds = folds_control(type = "kfold", K = 2, seed = 1))
cv
plot(cv)
## out-of-bag error
oobloss(model, newdata = vcrpart_2[76:100,])
## use an alternative loss function
rfun <- function(y, mu, wt) sum(abs(y - mu))
oobloss(model, newdata = vcrpart_2[76:100,], fun = rfun)