naiveWrapper: Naive feature selection method utilising the rFerns shadow importance

Description

Proof-of-concept ensemble of rFerns models, built to stabilise and improve selection based on shadow importance. It builds a super-ensemble of small rFerns forests, their number set by the iterations argument, each on a random subspace of attributes whose cardinality is set by the size argument. The subspace is drawn randomly, but with a higher selection probability for attributes claimed important by previous sub-models. The final selection is the group of attributes that hold a substantial weight at the end of the procedure.
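
The weighted subspace draw described above can be illustrated with a toy sketch in base R. This is not the rFerns implementation: the attribute names, the claimedImportant stand-in for a sub-model, the additive weight update and the final threshold are all assumptions made for illustration only; the actual re-weighting is driven by lambda and is described in the reference below.

```r
# Toy sketch of weighted random-subspace sampling; NOT the rFerns implementation.
set.seed(1)
attrs <- sprintf("A%d", 1:10)                       # hypothetical attribute names
weights <- setNames(rep(1, length(attrs)), attrs)   # start from equal weights
size <- 4                                           # attributes per sub-model
iterations <- 50                                    # number of sub-models

# Hypothetical stand-in for a sub-model's verdict: A1 and A2 are
# "claimed important" whenever they land in the drawn subspace.
claimedImportant <- function(subspace) intersect(subspace, c("A1", "A2"))

for (i in seq_len(iterations)) {
  # Draw a subspace without replacement, favouring heavily weighted attributes
  subspace <- sample(attrs, size, prob = weights)
  hits <- claimedImportant(subspace)
  # Assumed update rule, for illustration only
  weights[hits] <- weights[hits] + 1
}

# Assumed selection rule, for illustration only: keep above-average weights
selected <- names(weights)[weights > mean(weights)]
print(weights)
print(selected)
```

Attributes that are never claimed important keep their initial weight, so they are drawn less and less often as the weights of the other attributes grow.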

Usage

naiveWrapper(
  x,
  y,
  iterations = 1000,
  depth = 5,
  ferns = 100,
  size = 30,
  lambda = 5,
  threads = 0,
  saveHistory = FALSE
)

Arguments

x

Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns. Factors must have fewer than 31 levels. No NA values are permitted.

y

A decision vector. Must be a factor of the same length as nrow(x) for ordinary many-class classification, or a logical matrix with each column corresponding to a class for multi-label classification.

iterations

Number of iterations, i.e., the number of sub-models built.

depth

The depth of the ferns; must be in the 1-16 range. Note that time and memory requirements scale with 2^depth.

ferns

Number of ferns to be built in each sub-model. This should be a small number, around 3-5 times size.

size

Number of attributes considered by each sub-model.

lambda

Lambda parameter driving the re-weighting step of the method.

threads

Number of parallel threads, copied to the underlying rFerns call.

saveHistory

Whether the weight history should be stored.
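
The constraints on x and y listed above can be checked before the call. The helper below is a minimal sketch in base R; checkInput is a hypothetical name and is not part of rFerns, and the requirement that a multi-label matrix has one row per row of x is an assumption.

```r
# Hypothetical helper (not part of rFerns): verifies the documented
# constraints on x and y and stops with an error if one is violated.
checkInput <- function(x, y) {
  stopifnot(is.data.frame(x))
  # Attribute names must be unique
  stopifnot(anyDuplicated(names(x)) == 0)
  # Only numeric, integer or factor columns are allowed
  okType <- vapply(x, function(col) is.numeric(col) || is.factor(col), logical(1))
  stopifnot(all(okType))
  # Factors must have fewer than 31 levels
  tooManyLevels <- vapply(x, function(col) is.factor(col) && nlevels(col) >= 31, logical(1))
  stopifnot(!any(tooManyLevels))
  # No NA values are permitted
  stopifnot(!any(vapply(x, anyNA, logical(1))))
  if (is.factor(y)) {
    # Ordinary classification: one decision per row of x
    stopifnot(length(y) == nrow(x))
  } else {
    # Multi-label classification: logical matrix, one column per class
    # (one row per row of x is assumed here)
    stopifnot(is.matrix(y), is.logical(y), nrow(y) == nrow(x))
  }
  invisible(TRUE)
}

checkInput(iris[, -5], iris$Species)
```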

Value

An object of class naiveWrapper, which is a list with the following components:

found

Names of all selected attributes.

weights

Vector of weights indicating the confidence that a given feature is relevant.

timeTaken

Time of computation.

weightHistory

History of weights over all iterations, present if saveHistory was TRUE.

params

Copies of algorithm parameters, iterations, depth, ferns and size, as a named vector.
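
A short sketch of how the components listed above might be inspected, assuming the rFerns package is installed; the data preparation follows the example in the Examples section. The internal layout of weightHistory is not documented here, so it is only summarised with str.

```r
# Runs only if rFerns is available
if (requireNamespace("rFerns", quietly = TRUE)) {
  set.seed(77)
  # Iris attributes extended with shuffled, irrelevant copies
  noisyIris <- cbind(iris[, -5], apply(iris[, -5], 2, sample))
  names(noisyIris)[5:8] <- sprintf("Nonsense%d", 1:4)

  res <- rFerns::naiveWrapper(noisyIris, iris$Species,
                              iterations = 50, ferns = 20, size = 8,
                              saveHistory = TRUE)

  print(res$found)                            # names of selected attributes
  print(sort(res$weights, decreasing = TRUE)) # final weights, largest first
  print(res$params)                           # iterations, depth, ferns, size
  str(res$weightHistory)                      # present because saveHistory = TRUE
}
```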

References

Kursa MB (2017). Efficient all relevant feature selection with random ferns. In: Kryszkiewicz M., Appice A., Slezak D., Rybinski H., Skowron A., Ras Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham.

Examples

# NOT RUN {
set.seed(77)
#Fetch Iris data
data(iris)
#Extend with random noise
noisyIris<-cbind(iris[,-5],apply(iris[,-5],2,sample))
names(noisyIris)[5:8]<-sprintf("Nonsense%d",1:4)
#Execute selection
naiveWrapper(noisyIris,iris$Species,iterations=50,ferns=20,size=8)
# }