This function instantiates an HHDecisionTree model, which is used to induce classification or regression trees depending on the value of the response parameter. It supports the parameters listed below.
HHDecisionTree(
response = "classify",
n_min = 2,
min_node_impurity = 0.2,
n_trees = 1,
n_folds = 5,
testSize = 0.2,
useIdentity = FALSE,
pruning = FALSE,
dataDescription = "Unknown",
control = mni.control(n_folds = 5),
prune_control = prune.control(prune_type = "all", prune_stochastic_max_nodes = 14,
prune_stochastic_max_depth = 20, prune_stochastic_samples = 3000),
show_progress = FALSE,
seed = NA
)
The response parameter is used to specify what type of model to build, either 'classify' for a classification tree model or 'regressor' for a regression tree model. The default is 'classify'.
The n_min parameter is used to stop splitting a node when a minimum number of samples at that node has been reached. The default value is 2.
The min_node_impurity parameter is used to stop splitting a node if the node impurity at that node is less than this value. The node impurity is calculated using the hyperplane Gini index. The default value is 0.2.
The n_trees parameter is used to specify the number of trees to grow per fold or trial. The default value is 1.
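As a rough illustration of an impurity-based stopping rule, the sketch below computes the ordinary Gini impurity of a node's class labels and compares it with a threshold. It is not the package's implementation: the hyperplane Gini index used by hhcartr may be computed differently, and the helper names gini_impurity and should_stop_splitting are hypothetical.

```r
# Ordinary Gini impurity of a vector of class labels: 1 - sum(p_k^2).
# Assumption: this stands in for the package's hyperplane Gini index,
# which may be calculated differently.
gini_impurity <- function(labels) {
  if (length(labels) == 0) return(0)
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Hypothetical stopping check mirroring the description of min_node_impurity:
# stop splitting when the node impurity is below the threshold.
should_stop_splitting <- function(labels, min_node_impurity = 0.2) {
  gini_impurity(labels) < min_node_impurity
}

pure_node  <- c("a", "a", "a", "a")
mixed_node <- c("a", "a", "b", "b")

gini_impurity(pure_node)           # 0
gini_impurity(mixed_node)          # 0.5
should_stop_splitting(pure_node)   # TRUE
should_stop_splitting(mixed_node)  # FALSE
```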
The n_folds parameter is used to specify the number of folds, i.e. the input data is split into n_folds equal portions. In each of n_folds rounds, one portion is used as the test dataset and the remaining n_folds-1 portions as the training dataset, and the model is trained on that split. The next portion then becomes the test dataset and the remainder the training dataset, and the model is trained again. This is repeated until every portion has been used once as the test dataset. When n_folds=1 the testSize parameter determines the size of the test dataset. The default value is 5.
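The fold rotation described for n_folds can be sketched in base R as follows. This is only an illustration of the general k-fold scheme, not the package's own code: the helper make_folds and the random assignment of rows to folds are assumptions.

```r
# Assign each row index to one of n_folds folds of (nearly) equal size.
# Hypothetical helper; hhcartr may partition the data differently.
make_folds <- function(n_rows, n_folds) {
  fold_id <- rep(seq_len(n_folds), length.out = n_rows)
  split(seq_len(n_rows), sample(fold_id))
}

set.seed(42)
n_rows <- 20
folds  <- make_folds(n_rows = n_rows, n_folds = 5)

for (k in seq_along(folds)) {
  test_idx  <- folds[[k]]                          # fold k is the test set
  train_idx <- setdiff(seq_len(n_rows), test_idx)  # the rest is training data
  # A model would be trained on train_idx and evaluated on test_idx here.
  cat("fold", k, ": train", length(train_idx), "test", length(test_idx), "\n")
}
```

Each row index appears in exactly one test fold, so every observation is used for testing once and for training n_folds-1 times.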
The testSize parameter determines how much of the input dataset is to be used as the test dataset. The remainder is used as the training dataset. This parameter is only used when the parameter n_folds=1. For values of n_folds greater than one, the computed fold size will govern the test dataset size used (see the n_folds parameter for more details). The default value is 0.2.
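For the n_folds=1 case, a single hold-out split governed by testSize might look like the sketch below. The helper holdout_split is hypothetical, and treating testSize as a proportion that is rounded to a whole number of rows is an assumption rather than documented behaviour.

```r
# Split row indices into one training set and one test set.
# Assumption: testSize is a proportion of the rows, rounded to a whole number.
holdout_split <- function(n_rows, testSize = 0.2) {
  n_test   <- round(n_rows * testSize)
  test_idx <- sample(seq_len(n_rows), n_test)
  list(train = setdiff(seq_len(n_rows), test_idx), test = test_idx)
}

set.seed(42)
parts <- holdout_split(150, testSize = 0.2)
length(parts$test)   # 30
length(parts$train)  # 120
```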
The useIdentity parameter when set TRUE will result in hhcartr using the original training data to find the optimal splits rather than using the reflected data. The default value is FALSE.
The pruning parameter when set TRUE will result in tree pruning after all trees are induced. The default value is FALSE.
The dataDescription parameter is a short description used to describe the dataset being modelled. It is used in output displays and plots as documentation. The default value is "Unknown".
The control parameter is used to specify parameters for the mni.control function. See documentation for mni.control for supported parameters.
The prune_control parameter is used to specify parameters for the prune.control function. This parameter is only used when 'pruning = TRUE'. See documentation for prune.control for supported parameters.
The show_progress parameter when set TRUE will cause progress messages to be displayed as trees are induced. A value of FALSE will result in no progress messages being displayed. The default value is FALSE.
The seed parameter is used to seed the random number generator (RNG). Acceptable values are 1-9999. If no value is specified (the default is NA), a random integer in the range 1-9999 is used.
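The seed behaviour described above can be pictured with the following sketch. The helper resolve_seed is hypothetical, and the range check is an assumption about how an out-of-range value might be rejected; the package's actual handling may differ.

```r
# Return the seed to use: the supplied value, or a random integer in 1-9999
# when no seed is given. Hypothetical helper, not part of hhcartr.
resolve_seed <- function(seed = NA) {
  if (is.na(seed)) {
    return(sample(1:9999, 1))
  }
  # Assumed validation of the documented acceptable range.
  stopifnot(seed >= 1, seed <= 9999)
  as.integer(seed)
}

set.seed(resolve_seed(1234))  # reproducible run
set.seed(resolve_seed())      # seed drawn at random from 1-9999
```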
Returns pkg.env$folds_trees, a list of all trees induced during training.
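A call that overrides a few of the defaults might look like the sketch below. It uses only argument names from the usage shown above; the particular values are illustrative, and the call is guarded so that the snippet still runs when hhcartr is not installed.

```r
# Illustrative argument values; the names come from the usage section above.
args <- list(
  response          = "classify",
  n_folds           = 1,       # single train/test split, so testSize applies
  testSize          = 0.25,
  min_node_impurity = 0.2,
  dataDescription   = "Example dataset",
  seed              = 1234
)

# Only attempt the call when the package is available.
if (requireNamespace("hhcartr", quietly = TRUE)) {
  model <- do.call(hhcartr::HHDecisionTree, args)
}
```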