Fit and plot an rpart model for exploratory purposes using the rpart and rpart.plot libraries.
tree_var(
df,
y,
type = 2,
max = 3,
min = 20,
cp = 0,
ohse = TRUE,
plot = TRUE,
explain = TRUE,
title = NA,
subtitle = NULL,
...
)
df: Data frame.
y: Variable or Character. Name of the dependent (response) variable.
type: Type of plot. Possible values:
0 Draw a split label at each split and a node label at each leaf.
1 Label all nodes, not just leaves. Similar to text.rpart's all=TRUE.
2 Default. Like 1 but draw the split labels below the node labels. Similar to the plots in the CART book.
3 Draw separate split labels for the left and right directions.
4 Like 3 but label all nodes, not just leaves. Similar to text.rpart's fancy=TRUE. See also clip.right.labs.
5 Show the split variable name in the interior nodes.
max: Integer. Maximal depth of the tree.
min: Integer. The minimum number of observations that must exist in a node in order for a split to be attempted.
cp: Complexity parameter. Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. For instance, with anova splitting, this means that the overall R-squared must increase by cp at each step. The main role of this parameter is to save computing time by pruning off splits that are obviously not worthwhile. Essentially, the user informs the program that any split which does not improve the fit by cp will likely be pruned off by cross-validation, and that hence the program need not pursue it.
ohse: Boolean. Auto generate One Hot Smart Encoding?
plot: Boolean. Return a plot? If not, the rpart object is returned.
explain: Boolean. Include a brief explanation on the bottom part of the plot.
title, subtitle: Character. Title and subtitle to include in plot. Set to NULL to ignore.
...: Additional parameters passed to rpart.plot().
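The descriptions of max, min, and cp read like rpart's own control settings. The following is a minimal sketch that calls rpart directly on the built-in mtcars data, assuming those arguments correspond to maxdepth, minsplit, and cp in rpart.control(); this page does not confirm how tree_var forwards them, and the helper names fit_tree and n_leaves are hypothetical.

```r
library(rpart)

# Assumption: tree_var's max, min and cp map to rpart.control's
# maxdepth, minsplit and cp. This sketch bypasses tree_var entirely.
fit_tree <- function(cp, maxdepth = 3, minsplit = 20) {
  rpart(
    mpg ~ .,
    data = mtcars,
    method = "anova",
    control = rpart.control(maxdepth = maxdepth, minsplit = minsplit, cp = cp)
  )
}

loose <- fit_tree(cp = 0)     # keep every split allowed by depth and node size
strict <- fit_tree(cp = 0.5)  # drop splits whose relative improvement is below 0.5

# Count terminal nodes (leaves) in a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
n_leaves(loose)
n_leaves(strict)

# The cp table shows the complexity value associated with each tree size
loose$cptable
```

A larger cp can only yield a tree of the same size or smaller, which is why cp = 0 suits exploratory plots where depth is already capped by max.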
(Invisible) list of type 'tree_var' containing the plot (a function), the model, predictions, performance metrics, and auxiliary text for interpretation.
This differs from the tree function in S mainly in its handling of surrogate variables. In most details it follows Breiman et al. (1984) quite closely. R package tree provides a re-implementation of tree.
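As a rough illustration of surrogate handling, the sketch below fits rpart directly (not tree_var) on a copy of mtcars with some predictor values blanked out. The maxsurrogate and usesurrogate settings belong to rpart.control() and are shown at their default values; the data manipulation is made up for the example.

```r
library(rpart)

# Copy a built-in dataset and blank out some values of one predictor
set.seed(1)
dat <- mtcars
dat$cyl[sample(nrow(dat), 8)] <- NA

# maxsurrogate and usesurrogate are rpart.control settings (defaults shown)
fit <- rpart(
  mpg ~ cyl + disp + hp + wt,
  data = dat,
  method = "anova",
  control = rpart.control(maxsurrogate = 5, usesurrogate = 2)
)

# summary() lists the primary split and any surrogate splits for each node
summary(fit)

# With usesurrogate = 2, a row missing the split variable is routed by a
# surrogate, or by the majority direction when no surrogate is usable,
# so every row still receives a prediction
preds <- predict(fit, newdata = dat)
anyNA(preds)
```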
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.
Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), trendsRelated()
Other Visualization: distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), noPlot(), plot_chord(), plot_survey(), plot_timeline()
# NOT RUN {
data(dft)
# Regression Tree
tree <- tree_var(dft, Fare, subtitle = "Titanic dataset")
tree$plot() # tree plot
tree$model # rpart model object
tree$performance # metrics
# Binary Tree
tree_var(dft, Survived_TRUE, explain = FALSE, cex = 0.8)$plot()
# Multiclass tree
tree_var(dft[, c("Pclass", "Fare", "Age")], Pclass, ohse = FALSE)$plot()
# }
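To compare the type values side by side without going through tree_var, one option is to call rpart.plot() on a plain rpart fit. This is only a sketch on the built-in iris data: it assumes the rpart.plot package is installed (the block skips plotting otherwise) and that tree_var passes type through unchanged, which this page does not state explicitly.

```r
library(rpart)

# Small classification tree on a built-in dataset
fit <- rpart(Species ~ ., data = iris, method = "class")

# rpart.plot is a separate package; skip plotting when it is not installed
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  old_par <- par(mfrow = c(2, 3))  # 2 x 3 grid, one panel per type value
  for (type in 0:5) {
    rpart.plot::rpart.plot(fit, type = type, main = paste("type =", type))
  }
  par(old_par)  # restore the previous graphics layout
}
```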