Calculates the mutual information (MI) for sets of variables, computes the G statistic for each set, tests each set for significance, and keeps only the sets that are significant.
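To make the quantities named above concrete, here is a minimal sketch for a single pair of binary variables. It is not the package's implementation: the helper name `pair_g_test` is hypothetical, and it assumes MI is measured in nats (so that G = 2 * N * MI) and that the test for a pair of binary variables uses a chi-squared distribution with 1 degree of freedom. How the function treats larger sets is not stated here.

```r
# Hypothetical helper (not part of the package): MI in nats, G statistic and
# p-value for two binary vectors coded as 0/1.
pair_g_test <- function(x, y) {
  n <- length(x)
  # Fix the factor levels so the table is always 2 x 2, even if a value is absent.
  tab <- table(factor(x, levels = c(0, 1)), factor(y, levels = c(0, 1)))
  p_xy <- tab / n                                  # joint probabilities
  p_ind <- outer(rowSums(p_xy), colSums(p_xy))     # product of the marginals
  nz <- p_xy > 0                                   # skip empty cells (0 * log 0 = 0)
  mi <- sum(p_xy[nz] * log(p_xy[nz] / p_ind[nz]))  # mutual information in nats
  g <- 2 * n * mi                                  # G statistic
  # Assumption: 1 degree of freedom for a 2 x 2 table.
  p_value <- pchisq(g, df = 1, lower.tail = FALSE)
  list(mi = mi, g = g, p_value = p_value)
}

a <- rep(c(0, 1), times = 50)         # 100 balanced observations
b <- rep(c(0, 0, 1, 1), times = 25)   # every (a, b) combination occurs 25 times

dependent   <- pair_g_test(a, a)      # identical vectors: MI equals log(2)
independent <- pair_g_test(a, b)      # independent vectors: MI equals 0
```

With a significance threshold such as 0.05, the first pair would be kept and the second dropped under these assumptions.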
pairmi(data, alpha = 0.05, MI.threshold = NULL, n_elements = 5, sep = "_")
A list with the following components:
A data frame containing the original variables and the columns for significant sets (e.g., pair/triplet indicators).
Character vector of the original variable names.
A data frame describing significant sets, including their members, size, MI, G statistic, p-value, and constructed name.
Arguments:
`data`: A data frame containing the variables to be paired or combined. All columns should be binary.
`alpha`: Numeric p-value threshold for significance. Defaults to `0.05`.
`MI.threshold`: Numeric mutual information threshold. Defaults to `NULL`. If provided, it overrides `alpha`-based filtering.
`n_elements`: Integer giving the maximum size of sets to evaluate (e.g., `2` for pairs, `3` for triplets). Must be >= 2. Defaults to `5`.
`sep`: String used to join variable names when forming set identifiers. Defaults to `"_"`.
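A call might look like the sketch below. The toy data frame and its column names are made up for illustration, the call is guarded so that the snippet runs even when `pairmi()` is not available in the session, and the final line only illustrates how `sep` could join member names; the exact naming scheme is an assumption.

```r
# Toy binary data; the column names are illustrative only.
df <- data.frame(
  A = rep(c(0, 1), times = 50),
  B = rep(c(0, 1), times = 50),        # identical to A, so strongly associated
  C = rep(c(0, 0, 1, 1), times = 25)   # every (A, C) combination is equally frequent
)

# Check the documented precondition: every column is binary.
is_binary <- vapply(df, function(col) all(col %in% c(0, 1)), logical(1))
stopifnot(all(is_binary))

# Call pairmi() only if it is available (the package is assumed to be loaded).
if (exists("pairmi", mode = "function")) {
  res <- pairmi(df, alpha = 0.05, n_elements = 2, sep = "_")
  str(res, max.level = 1)  # inspect the components of the returned list
}

# Assumed naming scheme: member names joined with `sep`.
set_name <- paste(c("A", "B"), collapse = "_")
```

Setting `n_elements = 2` restricts the search to pairs, which keeps the number of candidate sets small on wide data frames.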