C50 (version 0.1.0-21)

C5.0.default: C5.0 Decision Trees and Rule-Based Models

Description

Fit classification tree models or rule-based models using Quinlan's C5.0 algorithm.

Usage

C5.0(x, ...)

## S3 method for class 'default'
C5.0(x, y, trials = 1, rules = FALSE, weights = NULL, control = C5.0Control(), costs = NULL, ...)

## S3 method for class 'formula'
C5.0(formula, data, weights, subset, na.action = na.pass, ...)

Arguments

x
a data frame or matrix of predictors.
y
a factor vector with two or more levels.
trials
an integer specifying the number of boosting iterations. A value of one indicates that a single model is used.
rules
A logical: should the tree be decomposed into a rule-based model?
weights
an optional numeric vector of case weights. Note that the data used for the case weights will not be used as a splitting variable in the model (see http://www.rulequest.com/see5-win.html#CASEWEIGHT for Quinlan's notes on case weights).
control
a list of control parameters; see C5.0Control
costs
a matrix of costs associated with the possible errors. The matrix should have C rows and C columns, where C is the number of class levels.
formula
a formula, with a response and at least one predictor.
data
an optional data frame in which to interpret the variables named in the formula.
subset
optional expression saying that only a subset of the rows of the data should be used in the fit.
na.action
a function which indicates what should happen when the data contain NAs. The default is to include missing values since the model can accommodate them.
...
other options to pass into the function (not currently used with the default method).
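
As a rough illustration of how the arguments of the default method fit together, the sketch below fits a boosted model with case weights. The built-in iris data and the weighting scheme are assumptions chosen only for illustration; they do not come from the package documentation.

```r
library(C50)

# Illustrative data (an assumption): the built-in iris data set.
x <- iris[, 1:4]      # predictors as a data frame
y <- iris$Species     # outcome as a factor with three levels

# Hypothetical case weights: count each virginica case twice.
w <- ifelse(y == "virginica", 2, 1)

# Ten boosting iterations with weighted cases and default control settings.
boosted <- C5.0(x = x, y = y, trials = 10, weights = w)

# Class predictions for the training rows.
pred <- predict(boosted, newdata = x)
head(pred)
```

Setting rules = TRUE in the same call would request a rule-based model instead of a tree.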

Value

An object of class C5.0 with elements:

  • boostResults: a parsed version of the boosting table(s) shown in the output
  • call: the function call
  • caseWeights: not currently supported
  • control: an echo of the specifications from C5.0Control
  • cost: the text version of the cost matrix (or "")
  • costMatrix: an echo of the model argument
  • dims: original dimensions of the predictor matrix or data frame
  • levels: a character vector of factor levels for the outcome
  • names: a string version of the names file
  • output: a string version of the command line output
  • predictors: a character vector of predictor names
  • rbm: a logical for rules
  • rules: a character version of the rules file
  • size: an integer vector of the tree/rule size (or sizes in the case of boosting)
  • tree: a string version of the tree file
  • trials: a named vector with elements Requested (an echo of the function call) and Actual (how many the model used)
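
The elements listed above can be read directly from the fitted object with `$`. The following is a minimal sketch, assuming the built-in iris data as a stand-in data set:

```r
library(C50)

# Fit a single tree on illustrative data (iris is an assumption).
fit <- C5.0(x = iris[, 1:4], y = iris$Species)

fit$levels       # factor levels of the outcome
fit$predictors   # predictor names
fit$dims         # dimensions of the predictor data
fit$size         # tree size
fit$trials       # named vector: Requested and Actual

# The captured command line output is stored as a single string.
cat(fit$output)
```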

Details

This model extends the C4.5 classification algorithms described in Quinlan (1993). The details of the extensions are largely undocumented. The model can take the form of a full decision tree or a collection of rules (or boosted versions of either).

When using the formula method, factors and other classes are preserved (i.e. dummy variables are not automatically created). This particular model handles non-numeric data of some types (such as character, factor and ordered data).
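
To illustrate the formula method with a factor predictor and missing values, here is a small sketch. The derived column PetalSize, its cut points, and the injected NA values are all hypothetical and exist only for this example.

```r
library(C50)

# Illustrative data (an assumption): iris with one extra factor predictor.
df <- iris
df$PetalSize <- cut(df$Petal.Width,
                    breaks = c(0, 0.8, 1.7, Inf),
                    labels = c("small", "medium", "large"))

# Introduce a few missing values; the default na.action = na.pass keeps these rows.
df$Sepal.Length[c(5, 60, 120)] <- NA

fit <- C5.0(Species ~ ., data = df)

# The factor appears under its own name rather than as dummy variables.
fit$predictors
```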

The cost matrix should be C x C, where C is the number of classes. Diagonal elements are ignored. Columns should correspond to the true classes and rows to the predicted classes. For example, if C = 3 with classes Red, Blue and Green (in that order), a value of 5 in the (2,3) element of the matrix would indicate that the cost of predicting a Green sample as Blue is five times the usual value (of one). Note that when costs are used, class probabilities cannot be generated using predict.C5.0.
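
The layout described above can be sketched as follows. The iris data stands in for the Red/Blue/Green example, so the class names and the choice of which error to penalize are assumptions made for illustration.

```r
library(C50)

classes <- levels(iris$Species)   # "setosa" "versicolor" "virginica"

# Rows are predicted classes, columns are true classes.
# Start from unit costs; the diagonal is ignored by the model.
cost_mat <- matrix(1, nrow = 3, ncol = 3,
                   dimnames = list(classes, classes))
diag(cost_mat) <- 0

# Element (2,3): predicting a true virginica as versicolor costs five times as much.
cost_mat["versicolor", "virginica"] <- 5
cost_mat

cost_fit <- C5.0(x = iris[, 1:4], y = iris$Species, costs = cost_mat)

# Only class predictions are requested, since probabilities are unavailable with costs.
cost_pred <- predict(cost_fit, newdata = iris[, 1:4], type = "class")
table(predicted = cost_pred, true = iris$Species)
```

Naming both dimensions with the class levels makes the orientation explicit and avoids relying on positional order.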

Internally, the code will attempt to halt boosting if it appears to be ineffective. For this reason, the value of trials may be different from what the model actually produced. There is an option to turn this off in C5.0Control.
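
One way to check whether boosting was halted early is to compare the Requested and Actual entries of the trials element. The sketch below assumes that the relevant switch is the earlyStopping argument of C5.0Control; the page itself does not name it, so confirm against the C5.0Control help page for your installed version. The iris data is again only a stand-in.

```r
library(C50)

x <- iris[, 1:4]
y <- iris$Species

# Request 20 boosting iterations; the model may use fewer.
boosted <- C5.0(x = x, y = y, trials = 20)
boosted$trials   # compare Requested with Actual

# Assumed option name: earlyStopping = FALSE asks the model not to halt boosting early.
no_stop <- C5.0(x = x, y = y, trials = 20,
                control = C5.0Control(earlyStopping = FALSE))
no_stop$trials
```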

References

Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, http://www.rulequest.com/see5-unix.html

See Also

C5.0Control, summary.C5.0, predict.C5.0, C5imp

Examples

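library(C50)
data(churn)  # provides churnTrain

# Tree model via the x/y interface; column 20 (churn) is dropped from the predictors
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
treeModel
summary(treeModel)

# Rule-based model via the formula interface
ruleModel <- C5.0(churn ~ ., data = churnTrain, rules = TRUE)
ruleModel
summary(ruleModel)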