SoftRandomForest (version 0.1.0)

BestForestSplit: Choosing the best variable for splitting.

Description

BestForestSplit searches through the candidate variables to find the most accurate split. It returns the chosen variable, the fitted model, and two sets of fitted values: one in which response 0 is considered a "success" and one in which response 1 is.

Usage

BestForestSplit(response, data, num.features, ntry,
  weights = rep(1, nrow(data)))

Arguments

response

A vector of 0s and 1s denoting the binomial response.

data

A data frame or matrix containing all candidate splitting variables.

num.features

A numeric giving the number of variables in the dataset to consider. The leftmost num.features variables in the dataset are the candidates.

ntry

A numeric giving how many of the num.features candidate variables are attempted for the split. This is useful for building random forests. For a standard tree, choose ntry = num.features.

weights

A vector of weights for use in Weighted Least Squares. Defaults to a vector of 1s.
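
As a rough illustration of how these arguments fit together, the following sketch calls BestForestSplit on a small simulated dataset (the predictor names, sample size, and seed are hypothetical, not part of the package):

set.seed(1)
n <- 100
# Three candidate predictors; the leftmost num.features columns are the candidates.
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Binomial response of 0s and 1s, driven mostly by x1.
response <- rbinom(n, size = 1, prob = plogis(2 * data$x1))

# Standard tree split: attempt every candidate, so ntry = num.features.
tree.split <- BestForestSplit(response, data, num.features = 3, ntry = 3)

# Random forest split: attempt a random subset of the candidates, e.g. ntry = 2.
forest.split <- BestForestSplit(response, data, num.features = 3, ntry = 2)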

Value

List of elements

Feature

The variable chosen for the best split.

fit

A glm object of the fit with the chosen variable.

weights0

A vector of the weights if response 0 was considered a success. Calculated as 1 - weights1.

weights1

A vector of the weights if response 1 was considered a success.
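
Continuing the hypothetical call above, the returned elements can be inspected as follows; the check on the weights simply restates the definition weights0 = 1 - weights1:

tree.split$Feature            # variable chosen for the best split
summary(tree.split$fit)       # the single-variable glm fit
# weights0 is defined as 1 - weights1, so the two vectors sum to 1 elementwise.
all.equal(tree.split$weights0, 1 - tree.split$weights1)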

Details

BestForestSplit searches through the candidate splitting variables, fitting a single-variable logistic regression for each with prior weights supplied to the iteratively reweighted least squares procedure. The variable minimizing residual deviance is chosen. Note that this comparison is valid because all competing models share the same null model, which contains only the intercept with equal weights.
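
The selection rule can be pictured with a minimal sketch, assuming each candidate is fit with stats::glm and that the prior weights are passed through its weights argument; this illustrates the deviance comparison only and is not the package's internal code:

# Candidates are the leftmost num.features columns; ntry of them are sampled
# (all of them for a standard tree) and each gets a weighted single-variable
# logistic regression.
num.features <- 3
ntry <- 3
weights <- rep(1, nrow(data))

tried <- sample(seq_len(num.features), ntry)
deviances <- sapply(tried, function(j) {
  # Prior weights enter the iteratively reweighted least squares fit here.
  glm(response ~ data[, j], family = binomial, weights = weights)$deviance
})
# The variable minimizing residual deviance is chosen for the split.
best.variable <- names(data)[tried[which.min(deviances)]]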