evaluate_Weka_classifier: Model Statistics for R/Weka Classifiers

Description

Compute model performance statistics for a fitted Weka classifier.

Usage

evaluate_Weka_classifier(object, newdata = NULL, cost = NULL, 
                         numFolds = 0, complexity = FALSE,
                         class = FALSE, seed = NULL, ...)

Arguments

object

a Weka_classifier object.

newdata

an optional data frame containing the variables on which to evaluate the model. If omitted or NULL, the training instances are used.

cost

a square matrix of (mis)classification costs.

numFolds

the number of folds to use in cross-validation. With the default of 0, no cross-validation is performed.

complexity

logical; whether to include entropy-based statistics.

class

logical; whether to include class statistics.

seed

optional seed for cross-validation.

…

further arguments passed to other methods (see Details).

Value

An object of class Weka_classifier_evaluation, a list of the following components:

string

character, concatenation of the string representations of the performance statistics.

details

vector, base statistics, e.g., the percentage of instances correctly classified.

detailsComplexity

vector, entropy-based statistics (if selected).

detailsClass

matrix, class statistics, e.g., the true positive rate, for each level of the response variable (if selected).

confusionMatrix

table, cross-classification of true and predicted classes.
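
The components listed above can be read with the usual list accessors. The following is a minimal sketch, assuming the built-in iris data and a J48 tree (both chosen only for illustration); it also recomputes the overall accuracy from the confusion matrix.

```r
library(RWeka)

## Fit a J48 tree on the built-in iris data (an arbitrary illustrative choice).
m <- J48(Species ~ ., data = iris)

## Evaluate on the training instances and request the optional components.
e <- evaluate_Weka_classifier(m, complexity = TRUE, class = TRUE)

names(e)               # component names of the returned list
cat(e$string)          # formatted text of the performance statistics
e$details              # base statistics
e$detailsComplexity    # entropy-based statistics
e$detailsClass         # class statistics

## Overall accuracy recomputed from the confusion matrix:
## correctly classified instances lie on the diagonal.
cm <- e$confusionMatrix
acc <- sum(diag(cm)) / sum(cm)
acc
```

The recomputed proportion should agree with the percentage of correctly classified instances reported in e$details, up to the factor of 100.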

Details

The function computes and extracts a non-redundant set of performance statistics that is suitable for model interpretation. By default the statistics are computed on the training data.
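
Statistics computed on the training data tend to be optimistic, so an evaluation on held-out instances via newdata is often more informative. A minimal sketch, assuming the built-in iris data and an arbitrary 100/50 split (neither is part of this page):

```r
library(RWeka)

## Arbitrary split of iris into training and held-out rows.
set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

## Fit on the training rows only.
m <- J48(Species ~ ., data = train)

## Default behaviour: statistics on the training instances.
e_train <- evaluate_Weka_classifier(m)

## Held-out statistics: supply the unseen rows as newdata.
e_test <- evaluate_Weka_classifier(m, newdata = test)

e_train$details
e_test$details
e_test$confusionMatrix
```

Comparing the two details vectors gives a rough sense of how much the training-data figures overstate performance on unseen instances.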

Currently, the only argument supported via … is the logical normalize, which tells Weka to normalize the cost matrix so that the cost of a correct classification is zero.

Note that if the class variable is numeric, only a subset of the statistics is available. The arguments complexity and class are then not applicable and are therefore ignored.
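
To illustrate the numeric case, the sketch below fits LinearRegression (another RWeka interface) to the built-in mtcars data. Both choices are assumptions made for illustration only, and the number of folds is arbitrary.

```r
library(RWeka)

## Numeric response: miles per gallon in the built-in mtcars data.
m <- LinearRegression(mpg ~ ., data = mtcars)

## complexity and class are supplied here only to show that they
## have no effect when the class variable is numeric.
e <- evaluate_Weka_classifier(m, numFolds = 5, seed = 1,
                              complexity = TRUE, class = TRUE)

e
e$details   # the subset of statistics available for a numeric class
```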

References

I. H. Witten and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd Edition, Morgan Kaufmann, San Francisco.

Examples

## Load the package and some example data.
library(RWeka)
w <- read.arff(system.file("arff", "weather.nominal.arff",
                           package = "RWeka"))

## Fit a J48 decision tree.
m <- J48(play ~ ., data = w)
m

## Use 10-fold cross-validation.
e <- evaluate_Weka_classifier(m,
                              cost = matrix(c(0, 2, 1, 0), ncol = 2),
                              numFolds = 10, complexity = TRUE,
                              seed = 123, class = TRUE)
e
summary(e)
e$details