Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. For survival trees, where higher predicted event rates correspond to shorter survival times, Dxy is negated so that larger is better. There are print and plot methods for objects created by validate.rpart.
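For a binary response the two accuracy measures reduce to simple formulas. The base R sketch below is illustrative only and is not the package's code; the helper names brier_score and somers_dxy are hypothetical. It computes the Brier score as the mean squared difference between predicted probabilities and 0/1 outcomes, and Somers' Dxy by counting concordant and discordant pairs among pairs whose outcomes differ, which is equivalent to 2(C - 0.5) with C the concordance probability.

```r
# Illustrative sketch (not the package's code) for a binary 0/1 response y
# and predicted probabilities p.

# Brier score: mean squared difference between predictions and outcomes.
brier_score <- function(p, y) mean((p - y)^2)

# Somers' Dxy by brute-force pair counting. Only pairs with different y
# values are usable; pairs tied on p count as neither concordant nor
# discordant, which makes Dxy equal to 2 * (C - 0.5), C = concordance index.
somers_dxy <- function(p, y) {
  conc <- 0
  disc <- 0
  usable <- 0
  n <- length(y)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next                 # no rank information in this pair
      usable <- usable + 1
      s <- sign(p[i] - p[j]) * sign(y[i] - y[j])
      if (s > 0) conc <- conc + 1 else if (s < 0) disc <- disc + 1
    }
  }
  (conc - disc) / usable
}

p <- c(0.9, 0.8, 0.3, 0.6, 0.2)
y <- c(1, 1, 0, 0, 1)
brier_score(p, y)
somers_dxy(p, y)
```

For these toy values the Brier score is 1.14/5 = 0.228, and four of the six usable pairs are concordant while two are discordant, so Dxy = (4 - 2)/6 = 1/3.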
# f <- rpart(formula=y ~ x1 + x2 + ..., model=TRUE)
# S3 method for rpart
validate(fit, method, B, bw, rule, type, sls, aics,
         force, estimates, pr=TRUE,
         k, rand, xval=10, FUN, ...)
# S3 method for validate.rpart
print(x, ...)
# S3 method for validate.rpart
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
fit: an object created by rpart. You must have specified the model=TRUE argument to rpart.
method, B, bw, rule, type, sls, aics, force, estimates: present only for consistency with the generic validate function; these are ignored.
x: the result of validate.rpart.
k: a sequence of cost/complexity values. By default these are obtained from calling FUN with no optional arguments or from the rpart cptable object in the original fit object. You may also specify a scalar or vector.
rand: a random sample (usually omitted).
xval: number of splits, that is, the number of cross-validation folds.
FUN: the name of a function that produces a sequence of trees, such as prune.
...: additional arguments to FUN (ignored by print and plot).
pr: set to FALSE to prevent intermediate results for each k from being printed.
what: a vector of things to plot. By default, two plots are drawn, one for mse and one for dxy.
legendloc: a function that is evaluated with a single argument equal to 1 to generate a list with components x and y specifying the coordinates of the upper left corner of a legend, or a 2-vector. For the latter, legendloc specifies the relative fractions of the plot at which to center the legend. The default, locator, lets you click on the plot to place the legend.
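The arguments k, xval and FUN describe one cross-validation scheme. The sketch below shows the general idea under stated assumptions and is not how validate.rpart is implemented internally: it assumes the rpart package, uses a hypothetical helper cv_tree_mse, and estimates only the validated mean squared error. In this sketch the data are split into xval random folds, a large tree is grown on all but one fold, prune (one possible FUN) turns it into one tree per cost/complexity value, and each pruned tree is scored on the held-out fold.

```r
library(rpart)

# Hypothetical helper (not part of rms or rpart): validated MSE for each
# cost/complexity value in cps, estimated by xval-fold cross-validation.
cv_tree_mse <- function(formula, data, cps, xval = 10, seed = 1) {
  set.seed(seed)                                     # reproducible fold assignment
  n <- nrow(data)
  fold <- sample(rep(seq_len(xval), length.out = n)) # random, nearly equal-sized folds
  yname <- all.vars(formula)[1]                      # response column name
  sse <- numeric(length(cps))                        # squared-error totals, one per cp
  for (g in seq_len(xval)) {
    train <- data[fold != g, , drop = FALSE]
    test  <- data[fold == g, , drop = FALSE]
    # Grow one large tree on the training folds; cp = 0 disables the
    # complexity stopping rule (minsplit and other limits still apply).
    big <- rpart(formula, data = train, cp = 0)
    for (j in seq_along(cps)) {
      pruned <- prune(big, cp = cps[j])              # one tree per cp value
      pred <- predict(pruned, newdata = test)
      sse[j] <- sse[j] + sum((test[[yname]] - pred)^2)
    }
  }
  data.frame(cp = cps, mse = sse / n)                # each row is tested exactly once
}

# Usage on data simulated as in the example below.
set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)
f <- rpart(y ~ x1 + x2 + x3, data = d)
res <- cv_tree_mse(y ~ x1 + x2 + x3, d, cps = f$cptable[, "CP"])
res
```

Growing one large tree per fold and pruning it repeatedly avoids refitting a separate tree for every cost/complexity value.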
validate.rpart returns a list of class "validate.rpart" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, and xval. size is the number of nodes, dxy refers to Somers' Dxy, mse refers to the mean squared error of prediction, app means apparent accuracy on the training samples, val means validated accuracy on the test samples, and binary is a logical variable indicating whether or not the response variable was binary (a logical or 0/1 variable is binary). size is not present if the user specifies k.
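The apparent (app) and validated (val) components are most informative side by side: their difference estimates how optimistic the training-sample accuracy is, and the validated values suggest which k to prefer. The sketch below uses a hypothetical helper, summarize_validation, and assumes the components k, mse.app, mse.val, dxy.app and dxy.val are numeric vectors of equal length. The toy list contains made-up numbers that only mimic those component names; they are not output from validate.

```r
# Hypothetical helper: compare apparent and validated accuracy for each k.
# Assumes v has numeric components k, mse.app, mse.val, dxy.app, dxy.val
# of equal length, as described for "validate.rpart" objects.
summarize_validation <- function(v) {
  best <- which.min(v$mse.val)            # k with the smallest validated MSE
  data.frame(
    k = v$k,
    mse.optimism = v$mse.val - v$mse.app, # how much apparent MSE understates error
    dxy.optimism = v$dxy.app - v$dxy.val, # how much apparent Dxy overstates discrimination
    best = seq_along(v$k) == best
  )
}

# Toy input with made-up numbers, used only to show the computation.
v <- list(k       = c(0.10, 0.05, 0.01),
          mse.app = c(0.22, 0.18, 0.12),
          mse.val = c(0.23, 0.21, 0.26),
          dxy.app = c(0.40, 0.55, 0.75),
          dxy.val = c(0.35, 0.42, 0.30))
s <- summarize_validation(v)
s
```

In this toy input the middle k has the lowest validated MSE even though the smallest k has the best apparent MSE, which is the kind of overfitting that cross-validation is meant to expose.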
Side effects: intermediate results for each k are printed if pr=TRUE.
# NOT RUN {
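n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)                            # unrelated to y
y <- 1*(x1 + x2 + rnorm(n) > 1)           # binary 0/1 response
table(y)
library(rpart)
library(rms)                              # provides validate() and validate.rpart
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)  # model=TRUE is required
v <- validate(f)                          # 10-fold cross-validation (xval=10)
v                                         # note the poor validation
par(mfrow=c(1,2))                         # mse and dxy plots side by side
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))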
# }