Usage
kernelFactory(x = NULL, y = NULL, cp = 1, rp = round(log(nrow(x), 10)), method = "burn", ntree = 500, filter = 0.01, popSize = rp * cp * 7, iters = 80, mutationChance = 1/(rp * cp), elitism = max(1, round((rp * cp) * 0.05)), oversample = TRUE)
Arguments
x
A data frame of predictors (numeric, integer or factor). Categorical variables need to be factors. Indicator values should not be too imbalanced because this might produce constants in the subsetting process.
y
A factor containing the response vector. Only the levels {0,1} are allowed.
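As a minimal sketch of the input format described for x and y, the following base R code builds a small data frame with integer, numeric and factor predictors and a two-level factor response. The column names (`age`, `income`, `region`) and the simulated values are hypothetical and only illustrate the expected types.

```r
set.seed(1)                      # reproducible toy data
n <- 200

# Predictors: integer, numeric and factor columns (names are hypothetical)
x <- data.frame(
  age    = sample(18:80, n, replace = TRUE),          # integer
  income = rnorm(n, mean = 50, sd = 10),              # numeric
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)

# Response: a factor whose only levels are "0" and "1"
y <- factor(sample(c(0, 1), n, replace = TRUE), levels = c(0, 1))

# Checks mirroring the requirements stated for x and y
stopifnot(is.data.frame(x))
stopifnot(all(sapply(x, function(col) is.numeric(col) || is.factor(col))))
stopifnot(is.factor(y), identical(levels(y), c("0", "1")))
```

Character columns would fail the second check, so convert them with `factor()` before passing the data on.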
cp
The number of column partitions.
rp
The number of row partitions.
method
The kernel function to use. Can be one of the following: POLynomial kernel function (pol), LINear kernel function (lin), Radial Basis kernel Function (rbf), random choice among pol, lin and rbf (random), or burn-in choice of the best function among pol, lin and rbf (burn). Use random or burn if you don't know in advance which kernel function is best.
ntree
Number of trees in the Random Forest base classifiers.
filter
Either NULL (deactivate) or a proportion denoting the minimum class size of dummy predictors, relative to the number of rows. This parameter is used to remove near constants. For example, if nrow(xTRAIN)=100 and filter=0.01, then all dummy predictors with any class size equal to 1 will be removed. Set this higher (e.g., 0.05 or 0.10) in case of errors.
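The arithmetic in the example above can be sketched in base R. This assumes the threshold is filter times the number of rows and that a dummy predictor is flagged when its smallest class has at most that many rows; the package's internal rule may differ, and `is_near_constant` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: flags a 0/1 dummy whose smallest class is at or
# below filter * number of rows. An assumption, not the package's code.
is_near_constant <- function(dummy, filter = 0.01) {
  counts <- table(factor(dummy, levels = c(0, 1)))  # class sizes, zeros included
  min(counts) <= filter * length(dummy)
}

d1 <- c(1, rep(0, 99))    # 100 rows, smallest class has 1 row
d2 <- rep(c(0, 1), 50)    # 100 rows, balanced 50/50

is_near_constant(d1, filter = 0.01)  # smallest class (1) is at the threshold
is_near_constant(d2, filter = 0.01)  # smallest class (50) is above the threshold
```

Under this assumption, raising filter to 0.05 would flag any dummy whose smallest class has five rows or fewer out of 100.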
popSize
Population size of the genetic algorithm.
iters
Number of generations of the genetic algorithm.
mutationChance
Mutation chance of the genetic algorithm.
elitism
Elitism parameter of the genetic algorithm.
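The defaults for rp, popSize, mutationChance and elitism in the Usage signature are all derived from the number of rows and from cp. The sketch below simply evaluates those default expressions in base R; `ga_defaults` is a hypothetical helper introduced for illustration and is not part of the package.

```r
# Hypothetical helper that evaluates the default expressions from the
# Usage signature for a given number of rows and column partitions.
ga_defaults <- function(n_rows, cp = 1) {
  rp <- round(log(n_rows, 10))               # default number of row partitions
  list(
    rp             = rp,
    popSize        = rp * cp * 7,
    mutationChance = 1 / (rp * cp),
    elitism        = max(1, round((rp * cp) * 0.05))
  )
}

# For 1000 rows and cp = 1: rp = 3, popSize = 21,
# mutationChance = 1/3, elitism = 1
str(ga_defaults(1000))
```

Because elitism is floored at 1, it only rises above 1 once rp * cp is large enough for the 5% term to round to 2 or more.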
oversample
Logical. Whether to oversample the smallest class. This helps avoid problems related to the subsetting procedure (e.g., if rp is too high).
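To illustrate the general idea of oversampling the smallest class, the sketch below resamples minority rows with replacement until both classes are the same size. This is an assumption about what oversampling means here, shown on hypothetical toy data; the package may balance classes differently.

```r
set.seed(42)
# Toy imbalanced response: 90 zeros, 10 ones (hypothetical data)
y <- factor(c(rep(0, 90), rep(1, 10)), levels = c(0, 1))
x <- data.frame(v1 = rnorm(100))

counts   <- table(y)
minority <- names(counts)[which.min(counts)]   # label of the smallest class
deficit  <- max(counts) - min(counts)          # rows needed to balance

# Resample minority rows with replacement and append them
min_idx <- which(y == minority)
extra   <- min_idx[sample.int(length(min_idx), deficit, replace = TRUE)]
x_over  <- x[c(seq_len(nrow(x)), extra), , drop = FALSE]
y_over  <- y[c(seq_along(y), extra)]

table(y_over)   # both classes now have 90 rows
```

With more minority rows available, each row partition is less likely to end up containing a single class.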