prune.oblique.tree: Cost-complexity Pruning of Oblique Tree Object

Description

Determines a nested sequence of subtrees of the supplied tree by recursively “snipping” off the least important splits.

Usage

prune.oblique.tree(
	tree, 
	k = NULL, 
	newdata, 
	prune.impurity = c("deviance", "misclass"), 
	penalty = c("complexity", "size"), 
	eps = 1e-3)

Arguments

tree

Fitted model object of class oblique.tree. This is assumed to be the result of some function that produces an object with the same named components as that returned by oblique.tree.

k

Cost-complexity parameter defining either a specific subtree of tree (k a scalar) or the (optional) sequence of subtrees minimizing the cost-complexity measure (k a vector). If missing, k is determined algorithmically.

newdata

Data frame upon which the sequence of cost-complexity subtrees is evaluated. If missing, the data used to grow the tree is used.

prune.impurity

Character string denoting the measure of node heterogeneity used to guide cost-complexity pruning. The default is deviance and the alternative is misclass (number of misclassifications or total loss).

penalty

Character string denoting the measure of tree complexity used to guide cost-complexity pruning. The default is complexity, which accounts for the complexity of the splits used throughout the tree; the alternative is size, which counts the number of leaves of the tree.

eps

A lower bound for the probabilities, used to compute deviances if events of predicted probability zero occur in newdata.
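
The role of eps can be illustrated with a small base R sketch. It assumes the usual classification deviance, minus twice the summed log predicted probability of each observed class; clamped_deviance is a hypothetical helper written for this illustration and is not part of the oblique.tree package.

```r
# Hypothetical helper, not part of oblique.tree: classification deviance with
# the predicted probabilities bounded below by eps.
clamped_deviance <- function(prob, observed, eps = 1e-3) {
  # prob:     matrix of predicted class probabilities (one row per observation)
  # observed: integer index of the observed class for each row
  p_obs <- prob[cbind(seq_along(observed), observed)]
  -2 * sum(log(pmax(p_obs, eps)))
}

prob <- rbind(c(0.9, 0.1),
              c(1.0, 0.0),   # the observed class (2) has predicted probability 0
              c(0.3, 0.7))
observed <- c(1L, 2L, 2L)

clamped_deviance(prob, observed)            # finite, thanks to the lower bound
clamped_deviance(prob, observed, eps = 0)   # Inf, because log(0) is -Inf
```

Without a lower bound, a single observation whose class was given probability zero makes the deviance of the whole subtree infinite, which would make subtrees impossible to compare on newdata.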

Value

size: The complexity of each tree in the cost-complexity pruning sequence.
deviance: Total deviance of each tree in the cost-complexity pruning sequence.
k: The value of the cost-complexity pruning parameter of each tree in the sequence.
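
As a sketch of how these components might be used together, the snippet below picks the value of k with the smallest deviance. toy.seq is a stand-in list with made-up numbers, and accessing the components with $ assumes the returned object behaves like a list with these names.

```r
# Stand-in for a pruning sequence; the numbers are made up for illustration.
toy.seq <- list(size     = c(5, 4, 3, 2, 1),
                deviance = c(212.4, 205.1, 198.7, 203.9, 240.2),
                k        = c(0.5, 2.1, 4.6, 11.3, 30.8))

best      <- which.min(toy.seq$deviance)  # position of the least-deviance subtree
best.k    <- toy.seq$k[best]              # cost-complexity parameter at that position
best.size <- toy.seq$size[best]           # complexity of the selected subtree
```

With a real sequence, best.k (or a nearby value) would be the natural candidate to pass back as the k argument.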

Details

Determines a nested sequence of subtrees of the supplied tree by recursively "snipping" off the least important splits, based upon the cost-complexity measure.
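
For orientation, the cost-complexity measure in its standard form can be written as below. This assumes the package follows the usual definition; the symbols R, C and T are notation introduced here, not names from the package.

```latex
R_k(T) = R(T) + k \, C(T)
```

Here R(T) is the total impurity of subtree T (deviance, or misclassifications or loss, as chosen by prune.impurity), C(T) is its complexity (as chosen by penalty), and k is the cost-complexity parameter. Larger values of k favour smaller subtrees.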

If k is supplied as a scalar, the optimal subtree for that value is returned instead of the full sequence.

The response, as well as the predictors referred to on the right-hand side of the formula in tree, must be present by name in newdata. These data are dropped down each tree in the tree sequence, and deviances or losses are calculated by comparing the supplied response to the prediction.

A plot method exists for the returned tree sequence. It displays the value of the deviance, the number of misclassifications or the total loss for each subtree in the tree sequence. An additional axis displays the values of the cost-complexity parameter at each subtree.

References

Truong, A. (2009). Fast Growing and Interpretable Oblique Trees via Probabilistic Models.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge. Chapter 7.

Examples

library(oblique.tree)

# grow a mixture tree on the Pima Indian dataset
data(Pima.tr, package = "MASS")
ob.tree <- oblique.tree(formula        = type ~ .,
                        data           = Pima.tr,
                        oblique.splits = "on")
plot(ob.tree); text(ob.tree); title(main = "Mixture Tree")

# examine the tree sequence
tree.seq <- prune.oblique.tree(tree = ob.tree)
print(tree.seq); plot(tree.seq)

# examine test error over the tree sequence
data(Pima.te, package = "MASS")
tree.seq <- prune.oblique.tree(tree    = ob.tree,
                               newdata = Pima.te)
print(tree.seq); plot(tree.seq)

# deviance is least when k = 8.148267
pruned <- prune.oblique.tree(tree = ob.tree,
                             k    = 9)
plot(pruned); text(pruned); title(main = "Pruned Tree")