spFSR.default: Default Function of SP-FSR for Feature Selection and Ranking

Description

This is the default function called by spFeatureSelection. See spFeatureSelection for an example.

Usage

spFSR.default(
  task,
  wrapper = NULL,
  scoring = NULL,
  perturb.amount = 0.05,
  gain.min = 0.01,
  gain.max = 2,
  change.min = 0,
  change.max = 0.2,
  bb.bottom.threshold = 10^(-8),
  mon.gain.A = 100,
  mon.gain.a = 0.75,
  mon.gain.alpha = 0.6,
  hot.start.num.ft.factor = 15,
  hot.start.max.auto.num.ft = 150,
  use.hot.start = TRUE,
  hot.start.range = 0.2,
  rf.n.estimators = 50,
  gain.type = "bb",
  num.features.selected = 0L,
  iters.max = 100L,
  stall.limit = 35L,
  n.samples.max = 5000,
  ft.weighting = FALSE,
  encoding.type = "encode",
  is.debug = FALSE,
  stall.tolerance = 10^(-8),
  random.state = 1,
  rounding = 3,
  run.parallel = TRUE,
  n.jobs = NULL,
  show.info = TRUE,
  print.freq = 10L,
  num.cv.folds = 5L,
  num.cv.reps.eval = 3L,
  num.cv.reps.grad = 1L,
  num.grad.avg = 4L,
  perf.eval.method = "cv"
)
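
A minimal call sketch, assuming the spFSR and mlr3 packages are installed and that spFSR.default is exported from spFSR (an assumption; the documented entry point is spFeatureSelection). The iris task, the rpart learner, and the accuracy measure are illustrative choices, and the reduced iteration settings only keep the run short:

```r
# Sketch only: runs the search when both packages are available, otherwise skips.
fit <- NULL
if (requireNamespace("spFSR", quietly = TRUE) &&
    requireNamespace("mlr3", quietly = TRUE)) {
  task    <- mlr3::tsk("iris")            # built-in classification task
  wrapper <- mlr3::lrn("classif.rpart")   # decision tree learner (needs rpart)
  scoring <- mlr3::msr("classif.acc")     # classification accuracy

  fit <- spFSR::spFSR.default(
    task = task, wrapper = wrapper, scoring = scoring,
    num.features.selected = 2L,  # ask for the two best features
    iters.max = 10L,             # short run for illustration only
    stall.limit = 5L,
    use.hot.start = FALSE,       # skip the random forest hot start
    run.parallel = FALSE,        # keep everything on one core
    show.info = FALSE
  )
  print(fit$features)
}
```

Arguments that are not listed keep the defaults shown in the signature above.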

Value

spFSR returns an object of class "spFSR". An object of class "spFSR" consists of the following:

task.spfs: An mlr3 package tsk object defined on the best performing features.
wrapper: An mlr3 package lrn object or a mlr3pipelines package GraphLearner object as specified by the user.
scoring: An mlr3 package msr as specified by the user.
best.model: An mlr3 package model object trained by the wrapper using task.spfs.
iter.results: A data.frame object containing detailed information on each iteration.
features: Names of the best performing features.
num.features: The number of best performing features.
importance: A vector of importance ranks of the best performing features.
total.iters: The total number of iterations executed.
best.iter: The iteration where the best performing feature subset was encountered.
best.value: The best measure value encountered during execution.
best.std: The standard deviation corresponding to the best measure value encountered.
run.time: Total run time in minutes.
results: A data.frame with a logical flag marking the selected features, the feature names, and the measure.
call: Call.
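
The fields listed above are reached with `$`. The sketch below uses a hypothetical helper, summarise_spfsr, and a hand-built stand-in object with made-up values; neither is part of the package, and a real object would come from spFSR.default or spFeatureSelection:

```r
# Hypothetical helper: condenses some documented fields of an "spFSR" object.
summarise_spfsr <- function(fit) {
  data.frame(
    num.features = fit$num.features,
    best.iter    = fit$best.iter,
    best.value   = fit$best.value,
    best.std     = fit$best.std,
    run.time.min = fit$run.time    # documented as total run time in minutes
  )
}

# Stand-in object with made-up values, used only to exercise the helper.
mock_fit <- structure(
  list(features = c("Petal.Length", "Petal.Width"), num.features = 2L,
       best.iter = 7L, best.value = 0.95, best.std = 0.02, run.time = 0.4),
  class = "spFSR"
)
summarise_spfsr(mock_fit)
```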

Arguments

task: A task tsk object created using the mlr3 package. It must be either a classification task (TaskClassif) or a regression task (TaskRegr).
wrapper: A Learner lrn object created using the mlr3 package, or a GraphLearner object created using the mlr3pipelines package. Multiple learner objects are not supported. If left empty, a random forest is used by default.
scoring: A performance measure msr from the mlr3 package that is supported by the task. If left blank, accuracy is used for classification and R-squared for regression.
perturb.amount: Perturbation amount for feature importances during gradient approximation. It must be a value between 0.01 and 0.1. Default value is 0.05.
gain.min: The minimum gain value. It must be greater than or equal to 0.001. Default value is 0.01.
gain.max: The maximum gain value. It must be greater than or equal to gain.min. Default value is 2.
change.min: The minimum change value. It must be non-negative. Default value is 0.0.
change.max: The maximum change value. It must be greater than change.min. Default is 0.2.
bb.bottom.threshold: The threshold value of denominator for the Barzilai-Borwein gain sequence. It must be positive. Default is 1/10^8.
mon.gain.A: Parameter for the monotone gain sequence. It must be a positive integer. Default is 100.
mon.gain.a: Parameter for the monotone gain sequence. It must be positive. Default is 0.75.
mon.gain.alpha: Parameter for the monotone gain sequence. It must lie in the open interval (0, 1). Default is 0.6.
hot.start.num.ft.factor: The factor of features to select for hot start. Must be an integer greater than 1. Default is 15.
hot.start.max.auto.num.ft: The maximum initial number of features for automatic hot start. Must be an integer greater than 1. Default is 150.
use.hot.start: Logical argument. Whether hot start should be used. Default is TRUE.
hot.start.range: Numeric. The initial range of importances carried over from hot start. It must lie in the open interval (0, 1). Default is 0.2.
rf.n.estimators: Integer. The number of trees to use in the random forest hot start. Default is 50.
gain.type: The gain sequence to use. Accepted values are 'bb' for the Barzilai-Borwein gain sequence or 'mon' for a monotone gain sequence. Default is 'bb'.
num.features.selected: Number of features selected. It must be a nonnegative integer and must not exceed the total number of features in the task. A value of 0 results in automatic feature selection. Default value is 0L.
iters.max: Maximum number of iterations to execute. The minimum value is 2L. Default value is 100L.
stall.limit: Number of iterations to stall, that is, to continue without at least stall.tolerance improvement to the measure value. The minimum value is 2L. Default value is 35L.
n.samples.max: The maximum number of samples to select from sampling. It must be a non-negative integer. Default is 5000.
ft.weighting: Logical argument. Perform simultaneous feature weighting and selection? Default is FALSE.
encoding.type: Encoding method for factor features when feature weighting is used. Default is 'encode'.
is.debug: Logical argument. Print additional debug messages? Default value is FALSE.
stall.tolerance: Value of stall tolerance. It must be strictly positive. Default value is 1/10^8.
random.state: The random state to use. Default is 1.
rounding: The number of digits to round results. It must be a positive integer. Default value is 3.
run.parallel: Logical argument. Perform cross-validations in parallel? Default value is TRUE.
n.jobs: Number of cores to use in a parallel run. It must be less than or equal to the total number of cores on the host machine. If set to NULL when run.parallel is TRUE, it is taken as one less than the total number of cores.
show.info: If set to TRUE, iteration information is displayed at print frequency.
print.freq: Iteration information printing frequency. It must be a positive integer. Default value is 10L.
num.cv.folds: The number of cross-validation folds when 'cv' is selected as perf.eval.method. The minimum value is 3L. Default value is 5L.
num.cv.reps.eval: The number of cross-validation repetitions for feature subset evaluation. It must be a positive integer. Default value is 3L.
num.cv.reps.grad: The number of cross-validation repetitions for gradient averaging. It must be a positive integer. Default value is 1L.
num.grad.avg: Number of gradients to average for gradient approximation. It must be a positive integer. Default value is 4L.
perf.eval.method: Performance evaluation method. It must be either 'cv' for cross-validation or 'resub' for resubstitution. Default is 'cv'.
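
The documented constraints on several arguments can be checked before a long run. The sketch below uses a hypothetical helper, check_spfsr_args, which is not part of the package; it covers only a subset of the arguments and assumes that stated ranges such as "between 0.01 and 0.1" include their end points:

```r
# Hypothetical helper mirroring some documented constraints; not part of spFSR.
check_spfsr_args <- function(perturb.amount = 0.05, gain.min = 0.01, gain.max = 2,
                             change.min = 0, change.max = 0.2,
                             iters.max = 100L, stall.limit = 35L,
                             num.cv.folds = 5L,
                             gain.type = "bb", perf.eval.method = "cv") {
  stopifnot(
    perturb.amount >= 0.01, perturb.amount <= 0.1,  # assumed inclusive bounds
    gain.min >= 0.001, gain.max >= gain.min,
    change.min >= 0, change.max > change.min,
    iters.max >= 2L, stall.limit >= 2L,
    num.cv.folds >= 3L,
    gain.type %in% c("bb", "mon"),
    perf.eval.method %in% c("cv", "resub")
  )
  invisible(TRUE)
}

check_spfsr_args()   # the documented defaults pass every check

# A gain.min below 0.001 violates the documented lower bound, so this is FALSE.
ok <- tryCatch(check_spfsr_args(gain.min = 0.0001), error = function(e) FALSE)
```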

References

David V. Akman et al. (2022) k-best feature selection and ranking via stochastic approximation, Expert Systems with Applications, Vol. 213. doi:10.1016/j.eswa.2022.118864

G.F.A. Yeo and V. Aksakalli (2021) A stochastic approximation approach to simultaneous feature weighting and selection for nearest neighbour learners, Expert Systems with Applications, Vol. 185. doi:10.1016/j.eswa.2021.115671
