RCAR: Regularized Class Association Rules for Multi-class Problems (RCAR+)

Description

Build a classifier based on association rules mined from an input dataset and weighted with LASSO-regularized logistic regression, following RCAR (Azmi et al., 2019). RCAR+ extends RCAR from a binary classifier to a multi-class classifier and can use support-balanced CARs.

Usage

RCAR(formula, data, lambda = NULL, alpha = 1, glmnet.args = NULL, cv.glmnet.args = NULL,
    parameter = NULL, control = NULL, balanceSupport = FALSE,
    disc.method = "mdlp", verbose = FALSE, ...)

Arguments

formula

A symbolic description of the model to be fitted. Must be of the form class ~ . or class ~ predictor1 + predictor2.

data

A data.frame containing the training data.

lambda

The amount of weight given to regularization during the logistic regression learning process. If not specified (NULL) then cross-validation is used to determine the best value (see Details section).

alpha

The elastic net mixing parameter. alpha = 1 is the lasso penalty (default RCAR), and alpha = 0 the ridge penalty.

cv.glmnet.args, glmnet.args

Lists of arguments passed on to cv.glmnet and glmnet, respectively. See the Examples section.

parameter, control

Optional parameter and control lists for apriori.

balanceSupport

Passed on to mineCARs as its balanceSupport parameter.

disc.method

Discretization method for factorizing numeric input (default: "mdlp"). See discretizeDF.supervised for more supervised discretization methods.

verbose

Report progress?

...

For convenience, additional parameters are used to create the parameter control list for apriori (e.g., to specify the support and confidence thresholds).
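As a minimal sketch of the convenience described above, the call below passes mining thresholds through `...`. The values 0.05 and 0.8 are arbitrary and chosen only for illustration, and the sketch assumes the arulesCBA package is installed.

```r
library(arulesCBA)
data("iris")

# support and confidence are forwarded to the parameter list used by apriori;
# the thresholds here are illustrative, not recommended defaults
classifier <- RCAR(Species ~ ., iris, support = 0.05, confidence = 0.8)
classifier
```

The same thresholds could presumably be supplied explicitly as parameter = list(support = 0.05, confidence = 0.8).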

Value

Returns an object of class CBA representing the trained classifier with the additional field model containing a list with the following elements:

all_rules

all rules used to build the classifier, including the rules with a weight of zero.

reg_model

the multinomial logistic regression model as an object of class glmnet.

cv

contains the results of the cross-validation used to determine lambda.

Details

RCAR+ extends RCAR from a binary classifier to a multi-class classifier using regularized multinomial logistic regression via glmnet.

If lambda is not specified (NULL), cross-validation is used to choose it: the largest value of lambda whose error is within 1 standard error of the minimum is selected (see cv.glmnet).
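The following sketch shows one way to look at the cross-validated choice of lambda and to refit with a fixed value instead. It assumes that the cv element of the model is the object returned by cv.glmnet, which exposes lambda.1se and lambda.min. Using lambda.min for the refit is only an illustration, not a recommendation.

```r
library(arulesCBA)
data("iris")

# lambda = NULL (the default) triggers cross-validation
classifier <- RCAR(Species ~ ., iris)

# cross-validation results, assumed to be a cv.glmnet object
cv <- classifier$model$cv
cv$lambda.1se  # largest lambda within 1 standard error of the minimum
cv$lambda.min  # lambda with the minimum cross-validated error

# refit with an explicitly chosen lambda, which skips cross-validation
classifier_fixed <- RCAR(Species ~ ., iris, lambda = cv$lambda.min)
classifier_fixed
```

Because the cross-validation folds are assigned randomly, the selected values vary between runs unless a seed is set.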

See cv.glmnet for performing cross-validation in parallel.
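A hedged sketch of parallel cross-validation: cv.glmnet accepts parallel = TRUE when a foreach backend has been registered. The doParallel package and the choice of 2 cores are assumptions made for this example. Any registered foreach backend should work.

```r
library(arulesCBA)
library(doParallel)  # assumed backend; not required by RCAR itself

data("iris")

# register a parallel backend for foreach with 2 workers
registerDoParallel(cores = 2)

# ask cv.glmnet to fit the cross-validation folds in parallel
classifier <- RCAR(Species ~ ., iris,
  cv.glmnet.args = list(parallel = TRUE))

# release the workers again
stopImplicitCluster()
```

For a dataset as small as iris the parallel overhead probably outweighs any gain. The setting is more likely to help with large rule sets.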

References

M. Azmi, G.C. Runger, and A. Berrado (2019). Interpretable regularized class association rules algorithm for classification in a categorical data space. Information Sciences, Volume 483, May 2019. Pages 313-331.

Azmi's implementation on GitHub: https://github.com/azemi/RCAR.

Examples

# NOT RUN {
data("iris")

classifier <- RCAR(Species~., iris)
classifier

# inspect the rule base sorted by the largest class weight
inspect(sort(rules(classifier), by = "weight"))

# make predictions for the first few instances of iris
predict(classifier, head(iris))

# inspect the regression model and the cross-validation results used to determine lambda
str(classifier$model$reg_model)
plot(classifier$model$cv)

# show a progress report and use 5 instead of the default 10 cross-validation folds
classifier <- RCAR(Species~., iris, cv.glmnet.args = list(nfolds = 5), verbose = TRUE)
# }