FS.greedy.heuristic.reduct.RST: The greedy heuristic algorithm for computing decision reducts and approximate decision reducts

Description

This function implements a greedy heuristic algorithm for computing decision reducts (or approximate decision reducts) based on RST.

Usage

FS.greedy.heuristic.reduct.RST(decision.table,
  attrDescriptions = attr(decision.table, "desc.attrs"),
  decisionIdx = attr(decision.table, "decision.attr"),
  qualityF = X.gini, nAttrs = NULL, epsilon = 0,
  inconsistentDecisionTable = NULL)

Value

A class "FeatureSubset" that contains the following components:

reduct: a list representing a single reduct. In this case, it could be a superreduct or just a subset of features.
type.method: a string representing the type of method which is "greedy.heuristic".
type.task: a string showing the type of task which is "feature selection".
model: a string representing the type of model. In this case, it is "RST" which means rough set theory.
epsilon: the approximation threshold.

Arguments

decision.table

an object of a "DecisionTable" class representing a decision table. See SF.asDecisionTable.

attrDescriptions

a list containing possible values of attributes (columns) in decision.table. It usually corresponds to attr(decision.table, "desc.attrs").

decisionIdx

an integer value representing an index of the decision attribute.

qualityF

a function used for computation of the quality of attribute subsets. Currently, the following functions are included:

X.entropy: See X.entropy.
X.gini: See X.gini.
X.nOfConflicts: See X.nOfConflicts.

nAttrs

an integer between 1 and the number of conditional attributes. It indicates the attribute sample size for the Monte Carlo selection of candidating attributes. If set to NULL (default) all attributes are used and the algorithm changes to a standard greedy method for computation of decision reducts.

epsilon

a numeric value between [0, 1) representing an approximate threshold. It indicates whether to compute approximate reducts or not. If it equals 0 (the default) a standard decision reduct is computed.

inconsistentDecisionTable

logical indicating whether the decision table is suspected to be inconsistent or NULL (the default) which indicated that a test should be made to determine the data consistency.

Author

Andrzej Janusz

Details

In this implementation, we provided some attribute subset quality measures which can be passed to the algorithm by the parameter qualityF. Those measures guide the computations in the search for a decision/approximated reduct. They are used to assess amount of information gained after addition of an attribute. For example, X.entropy corresponds to the information gain measure.

Additionally, this function can use the value of epsilon parameter in order to compute \(\epsilon\)-approximate reducts. The \(\epsilon\)-approximate can be defined as an irreducable subset of attributes B, such that:

\(Quality_{\mathcal{A}}(B) \ge (1 - \epsilon)Quality_{\mathcal{A}}(A)\),

where \(Quality_{\mathcal{A}}(B)\) is the value of a quality measure (see possible values of the parameter qualityF) for an attribute subset \(B\) in decision table \(\mathcal{A}\) and \(\epsilon\) is a numeric value between 0 and 1 expressing the approximation threshold. A lot of monographs provide comprehensive explanations about this topics, for example (Janusz and Stawicki, 2011; Slezak, 2002; Wroblewski, 2001) which are used as the references of this function.

Finally, this implementation allows to restrain the computational complexity of greedy searching for decision reducts by setting the value of the parameter nAttrs. If this parameter is set to a positive integer, the Monte Carlo method of selecting candidating attributes will be used in each iteration of the algorithm.

References

Andrzej Janusz and Dominik Slezak. "Rough Set Methods for Attribute Clustering and Selection". Applied Artificial Intelligence, 28(3):220–242, 2014.

A. Janusz and S. Stawicki, "Applications of Approximate Reducts to the Feature Selection Problem", Proceedings of International Conference on Rough Sets and Knowledge Technology (RSKT), vol. 6954, p. 45 - 50 (2011).

D. Ślęzak, "Approximate Entropy Reducts", Fundamenta Informaticae, vol. 53, no. 3 - 4, p. 365 - 390 (2002).

J. Wróblewski, "Ensembles of Classifiers Based on Approximate Reducts", Fundamenta Informaticae, vol. 47, no. 3 - 4, p. 351 - 360 (2001).

Examples

Run this code

###################################################
## Example 1: Evaluate reduct and generate
##            new decision table
###################################################
data(RoughSetData)
decision.table <- RoughSetData$hiring.dt

## evaluate a single reduct
res.1 <- FS.greedy.heuristic.reduct.RST(decision.table, qualityF = X.entropy,
                                        epsilon = 0.0)

## generate a new decision table corresponding to the reduct
new.decTable <- SF.applyDecTable(decision.table, res.1)

Run the code above in your browser using DataLab