NNS.boost: NNS Boost

Description

Ensemble method for classification using the predictions of the NNS multivariate regression NNS.reg collected from uncorrelated feature combinations.

Usage

NNS.boost(
  IVs.train,
  DV.train,
  IVs.test = NULL,
  type = NULL,
  representative.sample = FALSE,
  depth = "max",
  n.best = NULL,
  learner.trials = 100,
  epochs = NULL,
  CV.size = 0.25,
  balance = FALSE,
  ts.test = NULL,
  folds = 5,
  threshold = NULL,
  obj.fn = expression(sum((predicted - actual)^2)),
  objective = "min",
  extreme = FALSE,
  feature.importance = TRUE,
  status = TRUE,
  ncores = NULL
)

Arguments

IVs.train

a matrix or data frame of variables of numeric or factor data types.

DV.train

a numeric or factor vector with compatible dimensions to (IVs.train).

IVs.test

a matrix or data frame of variables of numeric or factor data types with compatible dimensions to (IVs.train). If NULL, will use (IVs.train) as default.

type

NULL (default). To perform a classification of discrete integer classes from a factor target variable (DV.train), set (type = "CLASS"); for a continuous (DV.train), leave (type = NULL).

representative.sample

logical; FALSE (default) Reduces observations of IVs.train to a set of representative observations per regressor.

depth

options: (integer, NULL, "max"); "max" (default) Specifies the order parameter in the NNS.reg routine, assigning a number of splits in the regressors. (depth = "max") is significantly faster but increases the variance of results, and is suggested for mixed continuous and discrete (unordered, ordered) data.

n.best

integer; NULL (default) Sets the number of nearest regression points to use in weighting for multivariate regression at sqrt(# of regressors). Analogous to k in a k Nearest Neighbors algorithm. If NULL, determines the optimal clusters via the NNS.stack procedure.

learner.trials

integer; 100 (default) Sets the number of trials used to obtain an accuracy threshold level.

epochs

integer; NULL (default) Total number of feature combinations to run. If NULL, uses 2*length(DV.train).

CV.size

numeric [0, 1]; 0.25 (default) Sets the cross-validation size as a fraction of the training set. The default of 0.25 uses a 25 percent random sample of the training set.

balance

logical; FALSE (default) Uses both up and down sampling from caret to balance the classes. type="CLASS" required.

ts.test

integer; NULL (default) Sets the length of the test set for time-series data; typically 2*h parameter value from NNS.ARMA or double known periods to forecast.

folds

integer; 5 (default) Sets the number of folds in the NNS.stack procedure for optimal n.best parameter.

threshold

numeric; NULL (default) Sets the obj.fn threshold to keep feature combinations.

obj.fn

expression; expression( sum((predicted - actual)^2) ) (default) Sum of squared errors is the default objective function. Any expression() using the specific terms predicted and actual can be used. Automatically selects an accuracy measure when (type = "CLASS").

objective

options: ("min", "max"); "min" (default) Select whether to minimize or maximize the objective function obj.fn.

extreme

logical; FALSE (default) Uses the maximum (minimum) threshold obtained from the learner.trials, rather than the upper (lower) quintile level for maximization (minimization) objective.

feature.importance

logical; TRUE (default) Plots the frequency of features used in the final estimate.

status

logical; TRUE (default) Prints status update message in console.

ncores

integer; value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1.
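
The obj.fn and objective arguments work as a pair: obj.fn is an expression() written in terms of predicted and actual, and objective states whether a smaller or larger value is better. The following is a minimal base R sketch of how such expressions evaluate. The vectors and the names sse, mae and acc are hypothetical and used only for illustration; how NNS.boost evaluates the expression internally is not shown here.

```r
# Hypothetical vectors, for illustration only
actual    <- c(1, 2, 3, 4)
predicted <- c(1.25, 2.00, 2.25, 4.25)

# Default from the Usage section: sum of squared errors (pair with objective = "min")
sse <- expression(sum((predicted - actual)^2))

# Alternative: mean absolute error (pair with objective = "min")
mae <- expression(mean(abs(predicted - actual)))

# Alternative: share of exact matches after rounding (pair with objective = "max")
acc <- expression(mean(round(predicted) == actual))

# eval() looks up 'predicted' and 'actual' in the calling environment
sse_value <- eval(sse)
mae_value <- eval(mae)
acc_value <- eval(acc)
```

An expression built this way would then be supplied as, for example, obj.fn = mae together with objective = "min".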

Value

Returns a list containing a vector of fitted values for the dependent variable test set in $results, and the final feature loadings in $feature.weights.

References

Viole, F. (2016) "Classification Using NNS Clustering Analysis" https://www.ssrn.com/abstract=2864711

Examples


## Not run: 
 ## Using 'iris' dataset where test set [IVs.test] is 'iris' rows 141:150.
 a <- NNS.boost(iris[1:140, 1:4], iris[1:140, 5],
                IVs.test = iris[141:150, 1:4],
                epochs = 100, learner.trials = 100,
                type = "CLASS")

 ## Test accuracy
 mean(a$results == as.numeric(iris[141:150, 5]))
## End(Not run)
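
The accuracy check above compares a$results with as.numeric(iris[141:150, 5]) because the predictions are integer class codes while the target is a factor. A short base R sketch of that conversion follows. It does not call NNS.boost, and the names test_labels and test_counts are introduced here for illustration only.

```r
# 'iris' ships with base R; Species is a factor with three levels
levels(iris$Species)

# as.numeric() on a factor returns the integer level codes, not the labels
test_labels <- as.numeric(iris[141:150, 5])
test_labels

# Class counts in the test rows 141:150
test_counts <- table(iris[141:150, 5])
test_counts
```

Rows 141:150 of iris all belong to a single class, so the accuracy computed on this test set reflects performance on that class only.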
