
itree
Description
Fit an itree model.
Usage
itree(formula, data, weights, subset, na.action = na.itree,
      method, penalty = "none", model = FALSE,
      x = FALSE, y = TRUE, parms, control, cost, ...)
Arguments
na.action: the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
method: one of "anova", "class", "extremes", "purity", "class_extremes", "class_purity", "regression_extremes", or "regression_purity". The purity and extremes methods are new to itree. Unlike rpart, itree does not currently support method="poisson" or method="exp". If method is missing then the routine tries to make an intelligent guess -- the default is the CART methodology, as in rpart: if y is a factor then method="class" is assumed, otherwise method="anova" is assumed. Passing a factor with method="purity" is equivalent to passing method="class_purity", and similarly for extremes/regression. It is wisest to specify the method directly, especially as more criteria may be added to the function in future. As in rpart, method can also be a list of functions named init, split and eval; see the rpart documentation for how this works.
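For instance, a minimal sketch of the method guess, using the kyphosis data that ships with itree (as in the Examples below). Since Kyphosis is a factor, leaving method unspecified is treated as method="class":
# Kyphosis is a factor, so these two calls grow the same classification tree;
# the second simply makes the method explicit (the recommended style).
fit.guess    <- itree(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit.explicit <- itree(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")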
"none"
, "newvar"
or "ema"
. The penalty
for splitting a particular node on a specified predictor given the predictors already used in the
branch leading to this node. Default is "none" which corresponds to CART. "newvar"
penalizes predictors not used in the branch leading to the current node. "ema"
implements an
exponential moving average style penalty whereby recently used variables are favored.
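A minimal sketch of the two penalties, mirroring fit6 and fit7 in the Examples below; it assumes mlbench's BostonHousing data and that the penalty strength is set through the interp_param1 control argument, as it is there:
library(mlbench); data(BostonHousing)
fit.cart   <- itree(medv ~ ., BostonHousing, penalty = "none")                        # plain CART
fit.newvar <- itree(medv ~ ., BostonHousing, penalty = "newvar", interp_param1 = 0.2) # discourage new variables
fit.ema    <- itree(medv ~ ., BostonHousing, penalty = "ema", interp_param1 = 0.1)    # favor recently used variables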
model: if logical, keep a copy of the model frame in the result. If the input value for model is a model frame (likely from an earlier call to the itree function), then this frame is used rather than constructing new data.
x: keep a copy of the x matrix in the result.
y: keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.
parms: optional parameters for the splitting function. For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini. For the regression extremes method, parms=1 or parms=-1 specifies whether we are looking for high or low means respectively (see Buja & Lee). The default is 1, for high means. For classification extremes, parms is a list specifying the class of interest -- see the examples for syntax.
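As an illustration of the classification components, a sketch using a hypothetical loss matrix for the kyphosis data (response levels "absent", "present"); the particular losses 2 and 1 are made up for the example:
# zeros on the diagonal, positive losses off it, weighting the two error types unequally
lmat <- matrix(c(0, 2,
                 1, 0), nrow = 2, byrow = TRUE)
fit.loss <- itree(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  parms = list(loss = lmat))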
control: a list of options that control details of the itree algorithm, similar to rpart.control. See itree.control.
...: arguments to itree.control may also be specified in the call to itree. They are checked against the list of valid arguments.
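For example, these two calls should be equivalent (minbucket is an itree.control argument, also passed directly in the Examples below):
fit.a <- itree(Kyphosis ~ Age + Number + Start, data = kyphosis,
               control = itree.control(minbucket = 25))
fit.b <- itree(Kyphosis ~ Age + Number + Start, data = kyphosis, minbucket = 25)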
Value
An object of class itree. See itree.object.
References
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Buja, A. and Lee, Y.-S. (2001) Data mining criteria for tree-based regression and classification. Proceedings of KDD 2001.
See Also
itree.control, itree.object, summary.itree, print.itree
Examples
#CART (same as rpart):
fit <- itree(Kyphosis ~ Age + Number + Start, data=kyphosis)
fit2 <- itree(Kyphosis ~ Age + Number + Start, data=kyphosis,
              parms=list(prior=c(.65,.35), split='information'))
fit3 <- itree(Kyphosis ~ Age + Number + Start, data=kyphosis,
              control=itree.control(cp=.05))
par(mfrow=c(1,2), xpd=NA) # otherwise on some devices the text is clipped
plot(fit)
text(fit, use.n=TRUE)
plot(fit2)
text(fit2, use.n=TRUE)
#### new to itree:
#same example, but using one-sided extremes:
fit.ext <- itree(Kyphosis ~ Age + Number + Start, data=kyphosis, method="extremes",
                 parms=list(classOfInterest="absent"))
#we see buckets with every y="absent":
plot(fit.ext); text(fit.ext,use.n=TRUE)
library(mlbench); data(BostonHousing)
#one sided purity:
fit4 <- itree(medv~.,BostonHousing,method="purity",minbucket=25)
#low means tree:
fit5 <- itree(medv~.,BostonHousing,method="extremes",parms=-1,minbucket=25)
#new variable penalty:
fit6 <- itree(medv~.,BostonHousing,penalty="newvar",interp_param1=.2)
#ema penalty
fit7 <- itree(medv~.,BostonHousing,penalty="ema",interp_param1=.1)
#one-sided-purity + new variable penalty:
fit8 <- itree(medv~.,BostonHousing,method="purity",penalty="newvar",interp_param1=.2)
#one-sided extremes for classification must specify a "class of interest"
data(PimaIndiansDiabetes)
levels(PimaIndiansDiabetes$diabetes)
fit9.a <- itree(diabetes~., PimaIndiansDiabetes, minbucket=50,
                method="extremes", parms=list(classOfInterest="neg"))
plot(fit9.a); text(fit9.a)
#can also pass the index of the class of interest in levels().
fit9.b <- itree(diabetes~., PimaIndiansDiabetes, minbucket=50,
                method="extremes", parms=list(classOfInterest=1))
# so fit9.a = fit9.b
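#optional sanity check that the two parameterizations agree
#(assumes itree objects carry an rpart-style frame component):
#all.equal(fit9.a$frame, fit9.b$frame)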