dig_implications: Search for implicative rules

Description

An implicative rule is a rule of the form \(A \Rightarrow c\), where \(A\) (the antecedent) is a set of predicates and \(c\) (the consequent) is a single predicate.

Usage

dig_implications(
  x,
  antecedent = everything(),
  consequent = everything(),
  disjoint = NULL,
  min_length = 0L,
  max_length = Inf,
  min_coverage = 0,
  min_support = 0,
  min_confidence = 0,
  t_norm = "goguen",
  ...
)

Value

A tibble containing the rules found and their computed quality measures.

Arguments

x: a matrix or data frame with data to search in. The matrix must be numeric (double) or logical. If x is a data frame then each column must be either numeric (double) or logical.
antecedent: a tidyselect expression (see tidyselect syntax) specifying the columns to use in the antecedent (left) part of the rules
consequent: a tidyselect expression (see tidyselect syntax) specifying the columns to use in the consequent (right) part of the rules
disjoint: an atomic vector of size equal to the number of columns of x that specifies the groups of predicates: if some elements of the disjoint vector are equal, then the corresponding columns of x will NOT be present together in a single condition.
min_length: the minimum length of a rule to be generated, i.e., the minimum number of predicates in the antecedent. The value must be greater than or equal to 0. If 0, rules with an empty antecedent are generated first.
max_length: the maximum length of a rule to be generated, i.e., the maximum number of predicates in the antecedent. If equal to Inf, the maximum length is limited only by the number of available predicates.
min_coverage: the minimum coverage of a rule in the dataset x. (See Details for the definition of coverage.)
min_support: the minimum support of a rule in the dataset x. (See Details for the definition of support.)
min_confidence: the minimum confidence of a rule in the dataset x. (See Details for the definition of confidence.)
t_norm: a t-norm used to compute conjunction of weights. It must be one of "goedel" (minimum t-norm), "goguen" (product t-norm), or "lukas" (Lukasiewicz t-norm).
...: further arguments, currently unused.
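
The grouping semantics of the disjoint argument can be illustrated with a small base R sketch. The column names, the group labels, and the helper is_admissible() are all hypothetical and are not part of the package; the code only mirrors the rule stated above (columns with equal disjoint values never appear together in one condition) and says nothing about how the search enforces it internally.

```r
# Hypothetical column names of x and one group label per column.
cols <- c("age_young", "age_old", "sex_male", "sex_female")
disjoint <- c("age", "age", "sex", "sex")

# Illustrative helper (not a package function): a condition, given as a
# set of column names, is admissible when no two of its columns share
# the same group label in `disjoint`.
is_admissible <- function(condition, cols, disjoint) {
  groups <- disjoint[match(condition, cols)]
  !any(duplicated(groups))
}

# Columns from different groups may be combined.
ok <- is_admissible(c("age_young", "sex_male"), cols, disjoint)

# Two columns from the "age" group may not appear together.
bad <- is_admissible(c("age_young", "age_old"), cols, disjoint)
```

Here ok is TRUE and bad is FALSE. A typical use of such a grouping is to keep predicates derived from the same original variable out of a single antecedent.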

Author

Michal Burda

Details

For the following explanations we need a mathematical function \(supp(I)\), which is defined for a set \(I\) of predicates as the relative frequency of rows satisfying all predicates from \(I\). For logical data, \(supp(I)\) equals the relative frequency of rows for which all predicates \(i_1, i_2, \ldots, i_n\) from \(I\) are TRUE. For numerical (double) input, \(supp(I)\) is computed as the mean (over all rows) of the truth degrees of the formula i_1 AND i_2 AND ... AND i_n, where AND is the triangular norm selected by the t_norm argument.
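
The definition of \(supp(I)\) can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper supp() is hypothetical, it assumes x is a data frame, and it uses the standard textbook formulas for the three t-norms named by t_norm (minimum, product, and Lukasiewicz). Logical columns are treated as truth degrees 0 and 1.

```r
# Illustrative helper (not a package function): supp(I) for the columns
# named in `I` of the data frame `x`.
supp <- function(x, I, t_norm = c("goguen", "goedel", "lukas")) {
  t_norm <- match.arg(t_norm)
  conj <- switch(t_norm,
    goedel = function(a, b) pmin(a, b),         # minimum t-norm
    goguen = function(a, b) a * b,              # product t-norm
    lukas  = function(a, b) pmax(0, a + b - 1)  # Lukasiewicz t-norm
  )
  # Convert each selected column to truth degrees (TRUE/FALSE become 1/0).
  degrees <- lapply(I, function(col) as.numeric(x[[col]]))
  # Fold the columns together row by row, starting from 1, the neutral
  # element of every t-norm; an empty I therefore yields supp = 1.
  row_truth <- Reduce(conj, degrees, init = rep(1, nrow(x)))
  mean(row_truth)
}

# Logical data: rows 1 and 4 satisfy both predicates, so supp = 2/4.
d_lgl <- data.frame(a = c(TRUE, TRUE, FALSE, TRUE),
                    b = c(TRUE, FALSE, FALSE, TRUE))
s_lgl <- supp(d_lgl, c("a", "b"))

# Numeric data: the result depends on the chosen t-norm.
d_num <- data.frame(a = c(0.9, 0.5), b = c(0.8, 0.5))
s_goguen <- supp(d_num, c("a", "b"), "goguen")  # mean of 0.72 and 0.25
s_goedel <- supp(d_num, c("a", "b"), "goedel")  # mean of 0.8 and 0.5
s_lukas  <- supp(d_num, c("a", "b"), "lukas")   # mean of 0.7 and 0
```

On crisp (0/1) input all three t-norms reduce to the logical AND, which is why the choice of t_norm only matters for numeric truth degrees strictly between 0 and 1.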

Implicative rules are characterized by the following quality measures.

The length of a rule is the number of predicates in the antecedent.

The coverage of a rule is equal to \(supp(A)\).

The support of a rule is equal to \(supp(A \cup \{c\})\).
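
As a worked example of these measures, the following base R sketch computes the length, coverage, and support of the rule \(\{a, b\} \Rightarrow c\) on a small logical data frame. The data and the helper supp_lgl() are made up for illustration and are not part of the package; the helper covers only the logical case of \(supp(I)\).

```r
# Made-up logical data with five rows.
d <- data.frame(
  a = c(TRUE, TRUE, TRUE,  FALSE, TRUE),
  b = c(TRUE, TRUE, FALSE, TRUE,  TRUE),
  c = c(TRUE, FALSE, TRUE, TRUE,  TRUE)
)

# Illustrative helper (not a package function): for logical data,
# supp(I) is the share of rows in which every column named in I is TRUE.
supp_lgl <- function(x, I) {
  all_true <- Reduce(`&`, as.list(x[I]), rep(TRUE, nrow(x)))
  mean(all_true)
}

antecedent <- c("a", "b")
consequent <- "c"

rule_length   <- length(antecedent)                       # 2 predicates
rule_coverage <- supp_lgl(d, antecedent)                  # rows 1, 2, 5: 3/5
rule_support  <- supp_lgl(d, c(antecedent, consequent))   # rows 1, 5: 2/5
```

The support of a rule can never exceed its coverage, because adding the consequent to the set of predicates can only keep or reduce the number of rows (or the truth degree) that satisfies all of them.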