# validate.rpart

From rms v2.0-2
by Frank E Harrell Jr

##### Dxy and Mean Squared Error by Cross-validating a Tree Sequence

Uses `xval`-fold cross-validation of a sequence of trees to derive
estimates of the mean squared error and Somers' `Dxy` rank correlation
between predicted and observed responses. In the case of a binary response
variable, the mean squared error is the Brier accuracy score.
There are `print` and `plot` methods for objects created by `validate.rpart`.
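The two accuracy measures named above can be illustrated for a binary response with a small base-R sketch. This is not the code used inside `validate.rpart`; the helper names `brier_score` and `dxy_binary` and the simulated data are hypothetical, and the rank-based formula assumes the usual relationship `Dxy = 2 * (C - 0.5)` between Somers' `Dxy` and the concordance index `C`.

```r
# Brier score: mean squared difference between predicted probability and 0/1 outcome
# (hypothetical helper, not part of rms)
brier_score <- function(p, y) mean((p - y)^2)

# Somers' Dxy for a binary outcome, computed from the concordance index C
# using midranks, so tied predictions count as 1/2 (hypothetical helper)
dxy_binary <- function(p, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  c_index <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (c_index - 0.5)
}

# Simulated predictions and outcomes, for illustration only
set.seed(1)
p <- runif(200)
y <- rbinom(200, 1, p)

brier_score(p, y)
dxy_binary(p, y)
```

A `Dxy` near 0 indicates predictions that do not discriminate between the two outcome classes, and a value of 1 indicates perfect rank separation.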

##### Usage

```
# f <- rpart(formula=y ~ x1 + x2 + ...) # or rpart
## S3 method for class 'rpart':
validate(fit, method, B, bw, rule, type, sls, aics, pr=TRUE,
         k, rand, xval=10, FUN, ...)
## S3 method for class 'validate.rpart':
print(x, ...)
## S3 method for class 'validate.rpart':
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
```

##### Arguments

- `fit`: an object created by `rpart`. You must have specified the `model=TRUE` argument to `rpart`.
- `method`, `B`, `bw`, `rule`, `type`, `sls`, `aics`: are there only for consistency with the generic `validate` function; these are ignored.
- `x`: the result of `validate.rpart`.
- `k`: a sequence of cost/complexity values. By default these are obtained from calling `FUN` with no optional arguments or from the `rpart` `cptable` object in the original fit object. You may also specify a scalar or vector.
- `rand`: a random sample (usually omitted).
- `xval`: number of splits.
- `FUN`: the name of a function which produces a sequence of trees, such as `prune`.
- `...`: additional arguments to `FUN` (ignored by `print` and `plot`).
- `pr`: set to `FALSE` to prevent intermediate results for each `k` from being printed.
- `what`: a vector of things to plot. By default, 2 plots will be done, one for `mse` and one for `Dxy`.
- `legendloc`: a function that is evaluated with a single argument equal to `1` to generate a list with components `x, y` specifying coordinates of the upper left corner of a legend, or a 2-vector. For the latter, `legendloc` spec...
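To make the `k` and `FUN` arguments more concrete, the sketch below shows where a cost/complexity sequence lives in an `rpart` fit and how `prune` turns each value into a tree. It uses only `rpart` itself; the simulated data frame `d` is hypothetical, and treating the `CP` column of `cptable` as the sequence `validate.rpart` would use by default is an assumption based on the description of `k` above.

```r
library(rpart)

# Simulated data, for illustration only
set.seed(1)
d <- data.frame(x1 = runif(100), x2 = runif(100))
d$y <- 1 * (d$x1 + d$x2 + rnorm(100) > 1)

f <- rpart(y ~ x1 + x2, data = d, model = TRUE)

# The complexity table stored in the fit; the "CP" column holds the
# cost/complexity values at which the tree changes size
f$cptable
cps <- f$cptable[, "CP"]

# One pruned tree per complexity value (assumed to mirror what FUN = prune produces)
trees <- lapply(cps, function(cp) prune(f, cp = cp))

# Number of nodes in each pruned tree
sizes <- sapply(trees, function(tr) nrow(tr$frame))
sizes
```

Passing a coarser vector for `k` would restrict validation to fewer candidate trees.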

##### Value

A list of class `"validate.rpart"` with components named `k`, `size`, `dxy.app`, `dxy.val`, `mse.app`, `mse.val`, `binary`, and `xval`. `size` is the number of nodes, `dxy` refers to Somers' `D`, `mse` refers to mean squared error of prediction, `app` means apparent accuracy on training samples, `val` means validated accuracy on test samples, and `binary` is a logical variable indicating whether or not the response variable was binary (a logical or 0/1 variable is binary). `size` will not be present if the user specifies `k`.
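One way to use these components is to pick the cost/complexity value with the lowest validated mean squared error. The sketch below works on a hand-built list that only mimics the component names listed above; the numbers are placeholders rather than output from a real fit, and the assumption that `mse.val` and `dxy.val` are vectors aligned element by element with `k` should be checked against an actual `validate.rpart` object.

```r
# Placeholder object mimicking the documented component names (values are made up)
v <- list(
  k       = c(0.20, 0.05, 0.01),
  mse.val = c(0.25, 0.22, 0.24),
  dxy.val = c(0.00, 0.30, 0.25)
)

# Index of the tree with the smallest validated mean squared error
best <- which.min(v$mse.val)

# Corresponding cost/complexity value and validated Dxy
v$k[best]
v$dxy.val[best]
```

The selected value could then be passed as `cp` to `prune` to obtain the corresponding tree.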

##### Side Effects

Prints intermediate results if `pr=TRUE`.

##### concept

- model validation
- predictive accuracy

##### Examples

```
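library(rpart)   # tree fitting
library(rms)     # provides validate() and its rpart method

n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
# Binary response: only x1 and x2 enter the data-generating rule; x3 is noise
y <- 1*(x1+x2+rnorm(n) > 1)
table(y)

# model=TRUE must be specified for validate() to work on the fit
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation

# Draw the mse and dxy plots side by side, then restore the layout
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))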
```

