rms (version 2.0-2)

validate.rpart: Dxy and Mean Squared Error by Cross-validating a Tree Sequence

Description

Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. There are print and plot methods for objects created by validate.rpart.
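The two accuracy measures named above can be sketched in base R for a binary response. This is a minimal illustration, not the code validate.rpart runs internally. The helper names brier and dxy_binary and the four data points are made up for the example. Somers' Dxy is computed here from the concordance probability C using the standard relation Dxy = 2 * (C - 0.5).

```r
# Brier score: mean squared difference between the predicted
# probability and the observed 0/1 response
brier <- function(p, y) mean((p - y)^2)

# Somers' Dxy for a binary response, via the concordance probability C
# (the Wilcoxon-Mann-Whitney statistic); Dxy = 2 * (C - 0.5)
dxy_binary <- function(p, y) {
  r  <- rank(p)            # midranks, so ties in p count as half
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  C  <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (C - 0.5)
}

p <- c(0.1, 0.4, 0.35, 0.8)  # hypothetical predicted probabilities
y <- c(0,   0,   1,    1)    # hypothetical observed responses
brier(p, y)
dxy_binary(p, y)
```

A lower Brier score and a Dxy closer to 1 both indicate better predictions. A Dxy near 0 means the predictions rank the responses no better than chance.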

Usage

# f <- rpart(formula=y ~ x1 + x2 + ...)
## S3 method for class 'rpart':
validate(fit, method, B, bw, rule, type, sls, aics, pr=TRUE,
    k, rand, xval=10, FUN, ...)
## S3 method for class 'validate.rpart':
print(x, ...)
## S3 method for class 'validate.rpart':
plot(x, what=c("mse","dxy"), legendloc=locator, ...)

Arguments

fit
an object created by rpart. You must have specified the model=TRUE argument to rpart.
method
B
bw
rule
type
sls
aics
are there only for consistency with the generic validate function; these are ignored
x
the result of validate.rpart
k
a sequence of cost/complexity values. By default these are obtained from calling FUN with no optional arguments or from the rpart cptable object in the original fit object. You may also specify a scalar or vec
rand
a random sample (usually omitted)
xval
number of splits (folds) of the data used for cross-validation; the default is 10
FUN
the name of a function which produces a sequence of trees, such as prune.
...
additional arguments to FUN (ignored by print and plot).
pr
set to FALSE to prevent intermediate results for each k from being printed
what
a vector of things to plot. By default, 2 plots will be done, one for mse and one for Dxy.
legendloc
a function that is evaluated with a single argument equal to 1 to generate a list with components x, y specifying coordinates of the upper left corner of a legend, or a 2-vector. For the latter, legendloc spec

Value

  • a list of class "validate.rpart" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, xval. size is the number of nodes, dxy refers to Somers' D, mse refers to mean squared error of prediction, app means apparent accuracy on training samples, val means validated accuracy on test samples, binary is a logical variable indicating whether or not the response variable was binary (a logical or 0/1 variable is binary). size will not be present if the user specifies k.
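The difference between the app and val components can be sketched with a hand-rolled xval-fold cross-validation in base R. This is an illustration only. A logistic regression fitted with glm stands in for the tree so that the example needs no extra packages. The balanced random fold labels are an assumption of this sketch and may differ from how validate.rpart assigns observations to folds. The names brier, mse_app and mse_val are made up for the example.

```r
set.seed(1)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 * (x1 + x2 + rnorm(n) > 1)
d  <- data.frame(y, x1, x2)

brier <- function(p, y) mean((p - y)^2)

# Apparent accuracy: fit and evaluate on the same observations
full    <- glm(y ~ x1 + x2, family = binomial, data = d)
mse_app <- brier(predict(full, type = "response"), d$y)

# Validated accuracy: each observation is predicted by a model
# that was fitted without the fold containing it
xval  <- 10
fold  <- sample(rep(seq_len(xval), length.out = n))  # random fold labels
p_val <- numeric(n)
for (i in seq_len(xval)) {
  test  <- fold == i
  fit_i <- glm(y ~ x1 + x2, family = binomial, data = d[!test, ])
  p_val[test] <- predict(fit_i, newdata = d[test, ], type = "response")
}
mse_val <- brier(p_val, d$y)

c(apparent = mse_app, validated = mse_val)
```

The apparent value tends to be optimistic because the model has already seen the data it is scored on, which is why the validated columns are the ones to read when comparing values of k.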

Side Effects

Prints intermediate results for each k if pr=TRUE.

concept

  • model validation
  • predictive accuracy

See Also

rpart, somers2, rcorr.cens, locator, legend

Examples

n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y  <- 1*(x1+x2+rnorm(n) > 1)
table(y)
library(rms)      # provides validate()
library(rpart)
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))
