
nuggets (version 2.1.0)

add_interest.associations: Add additional interest measures for association rules

Description

[Experimental]

This function calculates various additional interest measures for association rules based on their contingency table counts.

Usage

# S3 method for associations
add_interest(x, measures = NULL, smooth_counts = 0, p = 0.5, ...)

add_interest(x, ...)

Value

An S3 object that is an instance of the associations and nugget classes. It is a tibble containing all the columns of the input nugget x, plus one additional column for each of the requested interest measures.

Arguments

x

A nugget of flavour associations, typically created with dig_associations() with argument contingency_table = TRUE.

measures

A character vector specifying which interest measures to calculate. If NULL (the default), all supported measures are calculated. See the Details section for the list of supported measures.

smooth_counts

A non-negative numeric value specifying the amount of Laplace smoothing to apply to the contingency table counts before calculating the interest measures. Default is 0 (no smoothing). Positive values add the specified amount to each of the counts (pp, pn, np, nn), which can help avoid issues with undefined measures due to zero counts. Use smooth_counts = 1 for standard Laplace smoothing. Use smooth_counts = 0.5 for Haldane-Anscombe smoothing, which is often used for odds ratio estimation and in chi-squared tests.

p

A numeric value in the range [0, 1] representing the conditional probability of the consequent being true given that the antecedent is true. This parameter is used in the calculation of GUHA quantifiers "lci", "uci", "dlci", "duci", "lce", and "uce". The default value is 0.5.

...

Currently unused.

Author

Michal Burda

Details

The input nugget object must contain the columns pp (positive antecedent & positive consequent), pn (positive antecedent & negative consequent), np (negative antecedent & positive consequent), and nn (negative antecedent & negative consequent), representing the counts from the contingency table. These columns are typically produced by dig_associations() when the contingency_table argument is set to TRUE.
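To make the four counts concrete, the following base R sketch derives pp, pn, np, and nn from two logical vectors and computes three measures from them. The toy vectors are hypothetical, and the formulas are the common textbook definitions of conviction, leverage, and Jaccard; they are assumed here for illustration and are not confirmed to match the package's implementation.

```r
# Hypothetical toy data: antecedent and consequent truth values for 10 rows.
antecedent <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
consequent <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

# The four cells of the 2x2 contingency table.
pp <- sum(antecedent & consequent)     # antecedent true,  consequent true
pn <- sum(antecedent & !consequent)    # antecedent true,  consequent false
np <- sum(!antecedent & consequent)    # antecedent false, consequent true
nn <- sum(!antecedent & !consequent)   # antecedent false, consequent false
n  <- pp + pn + np + nn

# Textbook definitions (assumed, not taken from the package source).
support_a  <- (pp + pn) / n            # relative frequency of the antecedent
support_c  <- (pp + np) / n            # relative frequency of the consequent
confidence <- pp / (pp + pn)

leverage   <- pp / n - support_a * support_c
jaccard    <- pp / (pp + pn + np)
conviction <- (1 - support_c) / (1 - confidence)

print(c(pp = pp, pn = pn, np = np, nn = nn))
print(c(leverage = leverage, jaccard = jaccard, conviction = conviction))
```

Because every measure here is a function of the four counts alone, it can be computed from the contingency table columns without access to the original data.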

The supported interest measures that can be calculated include:

All the above measures are primarily intended for use with binary (logical) data. They can also be computed for numerical data, but their interpretation may not be meaningful in that context, so users should exercise caution when applying them to non-binary data.

Many measures are based on the contingency table counts, and some may be undefined for certain combinations of counts (e.g., division by zero). This issue can be mitigated by applying smoothing using the smooth_counts argument.
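The effect of smoothing on a zero count can be sketched in base R. The counts and the odds_ratio() helper below are hypothetical; the odds ratio is used only because it is a familiar quantity that breaks down when a cell is empty, not because it is confirmed to be among the supported measures.

```r
# Hypothetical contingency counts with an empty cell (pn = 0).
counts <- c(pp = 12, pn = 0, np = 5, nn = 8)

# Illustrative helper (not part of nuggets): odds ratio from the four counts.
odds_ratio <- function(k) {
  unname((k["pp"] * k["nn"]) / (k["pn"] * k["np"]))
}

# Without smoothing, the zero in the denominator yields an infinite value.
raw <- odds_ratio(counts)

# Adding 0.5 to every cell (Haldane-Anscombe) gives a finite estimate.
smoothed <- odds_ratio(counts + 0.5)

print(c(raw = raw, smoothed = smoothed))
```

Adding the same constant to all four cells mirrors the description of smooth_counts in the Arguments section; larger constants pull the estimate more strongly towards the value expected under independence.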

See Also

dig_associations()

Examples

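library(nuggets)

# Convert the mtcars columns to logical predicates.
d <- partition(mtcars, .breaks = 2)

# Search for rules predicting the mpg predicates and keep the
# contingency table counts (pp, pn, np, nn) needed by add_interest().
rules <- dig_associations(d,
                          antecedent = !starts_with("mpg"),
                          consequent = starts_with("mpg"),
                          min_support = 0.3,
                          min_confidence = 0.8,
                          contingency_table = TRUE)

# Append the selected interest measures as new columns.
rules <- add_interest(rules,
                      measures = c("conviction", "leverage", "jaccard"))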
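A variant of the example above that also sets smooth_counts might look as follows. It is a sketch that reuses only the arguments documented on this page and assumes the nuggets package is installed; the variable name smoothed is arbitrary.

```r
library(nuggets)

d <- partition(mtcars, .breaks = 2)
rules <- dig_associations(d,
                          antecedent = !starts_with("mpg"),
                          consequent = starts_with("mpg"),
                          min_support = 0.3,
                          min_confidence = 0.8,
                          contingency_table = TRUE)

# Haldane-Anscombe smoothing: 0.5 is added to each of pp, pn, np and nn
# before the requested measures are calculated.
smoothed <- add_interest(rules,
                         measures = c("conviction", "leverage", "jaccard"),
                         smooth_counts = 0.5)

# The result keeps the classes and the columns of the input nugget.
print(class(smoothed))
print(setdiff(names(smoothed), names(rules)))
```

Comparing the smoothed columns with the unsmoothed ones from the original example shows how strongly the chosen constant affects rules with small counts.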
