rpart
Recursive Partitioning and Regression Trees
Fit a rpart model
 Keywords
 tree
Usage
rpart(formula, data, weights, subset, na.action = na.rpart, method, model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)
Arguments
 formula
 a formula, with a response but no interaction terms. If this is a data frame, it is taken as the model frame (see model.frame).
 data
 an optional data frame in which to interpret the variables named in the formula.
 weights
 optional case weights.
 subset
 optional expression saying that only a subset of the rows of the data should be used in the fit.
 na.action
 the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
 method
 one of "anova", "poisson", "class" or "exp". If method is missing then the routine tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed; if y has 2 columns then method = "poisson" is assumed; if y is a factor then method = "class" is assumed; otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in future.
 Alternatively, method can be a list of functions named init, split and eval. Examples are given in the file ‘tests/usersplits.R’ in the sources, and in the vignettes ‘User Written Split Functions’.
 model
 if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data.
 x
 keep a copy of the x matrix in the result.
 y
 keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.
 parms
 optional parameters for the splitting function.
 Anova splitting has no parameters.
 Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates. The default value is 1.
 Exponential splitting has the same parameter as Poisson.
 For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini.
 control
 a list of options that control details of the rpart algorithm. See rpart.control.
 cost
 a vector of nonnegative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.
 ...
 arguments to rpart.control may also be specified in the call to rpart. They are checked against the list of valid arguments.
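The arguments described above can be combined in a single call. The following is a minimal sketch, not part of the original help page, using the kyphosis data that ships with rpart. The loss matrix values, the cost vector and the object names (fit_class, fit_parms, fit_ctrl, fit_cost) are illustrative assumptions only.

```r
library(rpart)

# Name the method explicitly instead of relying on the guess made from y.
fit_class <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class")

# Classification parms: priors, a loss matrix and the splitting index.
# Illustrative loss matrix: zeros on the diagonal, positive off-diagonal.
loss <- matrix(c(0, 1,
                 2, 0), nrow = 2, byrow = TRUE)
fit_parms <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   parms = list(prior = c(0.65, 0.35), loss = loss,
                                split = "information"))

# rpart.control arguments given directly in the call, through '...'.
fit_ctrl <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", cp = 0.05, minsplit = 10)

# cost: one value per predictor, in formula order (Age, Number, Start).
# Here splits on Number are made to look half as attractive.
fit_cost <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", cost = c(1, 2, 1))

# The control settings actually used are stored in the fitted object.
fit_ctrl$control$cp
fit_ctrl$control$minsplit
```

Passing cp and minsplit through '...' should be equivalent to wrapping them in control = rpart.control(...); which form to use is a matter of taste.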
Details
This differs from the tree function in S mainly in its handling of surrogate variables. In most details it follows Breiman et al. (1984) quite closely. R package tree provides a reimplementation of tree.
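To see how missing predictor values are handled in practice, here is a small sketch under stated assumptions: the missing values are injected artificially into a copy of kyphosis, and the row indices and object names (kyph_na, fit_na, pred) are arbitrary choices made for illustration.

```r
library(rpart)

# Copy the data and blank out a few predictor values (artificial missingness).
kyph_na <- kyphosis
kyph_na$Start[c(3, 10, 25)] <- NA

# With the default na.action, rows are dropped only when the response is
# missing, so rows with a missing Start still take part in the fit.
fit_na <- rpart(Kyphosis ~ Age + Number + Start, data = kyph_na,
                method = "class")

# The root node count should equal the number of rows in the data.
fit_na$frame$n[1]
nrow(kyph_na)

# summary() reports any surrogate splits found at each node.
summary(fit_na)

# Prediction also works for rows with a missing predictor.
pred <- predict(fit_na, kyph_na[c(3, 10, 25), ], type = "class")
pred
```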
Value

An object of class rpart. See rpart.object.
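As a quick illustration of the returned object, the sketch below fits a tree and looks at a few of its components. It is an informal tour rather than a full description; rpart.object remains the authoritative reference for the components.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

class(fit)        # the object's class
names(fit)        # components stored in the object

# One row per node: split variable, node size, deviance, fitted value, ...
head(fit$frame)

# Complexity parameter table, as used when choosing how far to prune.
fit$cptable
```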
References
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
See Also
Examples
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              parms = list(prior = c(.65,.35), split = "information"))
fit3 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              control = rpart.control(cp = 0.05))
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped
plot(fit)
text(fit, use.n = TRUE)
plot(fit2)
text(fit2, use.n = TRUE)
Community examples
# Load rpart so the example is self-contained
library(rpart)

# Set random seed. Don't remove this line.
set.seed(1)

# Take a look at the iris dataset
str(iris)
summary(iris)

# A decision tree model has been built for you
tree <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
              data = iris, method = "class")

# A dataframe containing unseen observations
unseen <- data.frame(Sepal.Length = c(5.3, 7.2),
                     Sepal.Width = c(2.9, 3.9),
                     Petal.Length = c(1.7, 5.4),
                     Petal.Width = c(0.8, 2.3))

# Predict the label of the unseen observations. Print out the result.
predict(tree, unseen, type = "class")