Learn R Programming

CORElearn (version 1.48.0)

attrEval: Attribute evaluation

Description

The method evaluates the quality of the features/attributes/dependent variables specified by the formula with the selected heuristic method. Feature evaluation algorithms available for classification problems are various variants of Relief and ReliefF algorithms (ReliefF, cost-sensitive ReliefF, ...), gain ratio, gini-index, MDL, DKM, information gain, ... For regression problems there are RREliefF, MSEofMean, MSEofModel, MAEofModel, ... Parallel execution on several cores is supported for speedup.

Usage

attrEval(formula, data, estimator, costMatrix = NULL, outputNumericSplits=FALSE, ...)

Arguments

formula
Either a formula specifying the attributes to be evaluated and the target variable, or a name of target variable, or an index of target variable.
data
Data frame with evaluation data.
estimator
The name of the evaluation method.
costMatrix
Optional cost matrix used with certain estimators.
outputNumericSplits
Controls of the output contain also the best split point for numeric attributes. This is only sensible for impurity based estimators (like gini, MDL, gain ratio in classification and MSEofMean in regression). Additionally, the default value of parameter binaryEvaluateNumericAttributes=TRUE shall not be modified. If the value of outputNumericSplits the output is a list instead of vector, see the returned value description.
...
Additional options used by specific evaluation methods as described in helpCore.

Value

The method returns a vector of evaluations for the features in the order specified by the formula. In case of parameter binaryEvaluateNumericAttributes=TRUE the method returns a list with two components: attrEval and splitPointNum. The attrEval contains a vector of evaluations for the features in the order specified by the formula. The splitPointNum contains the split points of numeric attributes which produced the given attribute evaluation scores.

Details

The parameter formula can be interpreted in three ways, where the formula interface is the most elegant one, but inefficient and inappropriate for large data sets. See also examples below. As formula one can specify:
The optional parameter costMatrix can provide nonuniform cost matrix to certain cost-sensitive measures (ReliefFexpC, ReliefFavgC, ReliefFpe, ReliefFpa, ReliefFsmp,GainRatioCost, DKMcost, ReliefKukar, and MDLsmp). For other measures this parameter is ignored. The format of the matrix is costMatrix(true class, predicted class). By default a uniform costs are assumed, i.e., costMatrix(i, i) = 0, and costMatrix(i, j) = 1, for i not equal to j. The estimator parameter selects the evaluation heuristics. For classification problem it must be one of the names returned by infoCore(what="attrEval") and for regression problem it must be one of the names returned by infoCore(what="attrEvalReg") Majority of these feature evaluation measures are described in the references given below, here only a short description is given. For classification problem they are
For regression problem the implemented measures are:
There are some additional parameters ... available which are used by specific evaluation heuristics. Their list and short description is available by calling helpCore. See Section on attribute evaluation. The attributes can also be evaluated via random forest out-of-bag set with function rfAttrEval. Evaluation and visualization of ordered attributes is covered in function ordEval.

References

Marko Robnik-Sikonja, Igor Kononenko: Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning Journal, 53:23-69, 2003 Marko Robnik-Sikonja: Experiments with Cost-sensitive Feature Evaluation. In Lavrac et al.(eds): Machine Learning, Proceedings of ECML 2003, Springer, Berlin, 2003, pp. 325-336 Igor Kononenko: On Biases in Estimating Multi-Valued Attributes. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'95), pp. 1034-1040, 1995

Some of these references are available also from http://lkm.fri.uni-lj.si/rmarko/papers/

See Also

CORElearn, CoreModel, rfAttrEval, ordEval, helpCore, infoCore.

Examples

Run this code
# use iris data

# run method ReliefF with exponential rank distance  
estReliefF <- attrEval(Species ~ ., iris, 
                            estimator="ReliefFexpRank", ReliefIterations=30)
print(estReliefF)

# alternatively and more appropriate for large data sets 
# one can specify just the target variable
# estReliefF <- attrEval("Species", iris, estimator="ReliefFexpRank", ReliefIterations=30)

# print all available estimators
infoCore(what="attrEval")

Run the code above in your browser using DataLab