interestMeasure: Calculating various additional interest measures

Description

Provides the generic function interestMeasure and the needed S4 method to calculate various additional interest measures for existing sets of itemsets or rules.

Usage

interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)

Arguments

x

a set of itemsets or rules.

measure

name or vector of names of the desired interest measures (see details for available measures). If measure is missing then all available measures are calculated.

transactions

the transaction data set used to mine the associations, or a different set of transactions from which to calculate the interest measures (Note: you need to set reuse=FALSE in the latter case).

reuse

logical indicating if information in the quality slot should be reused for calculating the measures. This speeds up the process significantly since only very little (or no) transaction counting is necessary if support, confidence and lift are already available.

...

further arguments for the measure calculation.

Value

If only one measure is used, the function returns a numeric vector containing the values of the interest measure for each association in the set of associations x.
If more than one measure is specified, the result is a data.frame containing the different measures for each association. NA is returned for rules/itemsets for which a certain measure is not defined.

Details

For itemsets $X$ the following measures are implemented: "allConfidence" (Omiecinski, 2003), "crossSupportRatio" (Xiong et al., 2003), "lift", and "support".

For rules $X \Rightarrow Y$ the following measures are implemented. In the following we use the notation $supp(X \Rightarrow Y) = supp(X \cup Y)$ to indicate the support of the union of the itemsets $X$ and $Y$, i.e., the proportion of the transactions that contain both itemsets. We also use $\overline{X}$ as the complement itemset to $X$ with $supp(\overline{X}) = 1 - supp(X)$, i.e., the proportion of transactions that do not contain $X$.

Among the measures available for rules are "confidence", "hyperConfidence", "leverage", "oddsRatio" and "support", all of which are used in the Examples section.
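The notation above can be made concrete with a small base R sketch. It does not call interestMeasure; the toy transaction matrix and the helper supp() are hypothetical, and the formulas for confidence, lift and leverage are the usual textbook definitions, which are assumed here to correspond to the measures of the same names.

```r
# Toy data (hypothetical): 5 transactions over items a, b, c as a logical matrix.
trans <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  FALSE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# supp(): proportion of transactions that contain every item in `items`.
# Illustrative helper only, not a function of the arules package.
supp <- function(items) {
  mean(rowSums(trans[, items, drop = FALSE]) == length(items))
}

X <- "a"
Y <- "b"

supp_X    <- supp(X)          # supp(X)
supp_Y    <- supp(Y)          # supp(Y)
supp_XY   <- supp(c(X, Y))    # supp(X => Y) = supp(X u Y)
supp_notX <- 1 - supp_X       # support of the complement of X

# Textbook definitions (assumed to match the measures of the same names):
confidence <- supp_XY / supp_X               # supp(X => Y) / supp(X)
lift       <- supp_XY / (supp_X * supp_Y)    # supp(X => Y) / (supp(X) supp(Y))
leverage   <- supp_XY - supp_X * supp_Y      # supp(X => Y) - supp(X) supp(Y)

print(c(confidence = confidence, lift = lift, leverage = leverage))
```

Here item a and item b each occur in 4 of 5 transactions and co-occur in 3, so the rule a => b has a lift slightly below 1 and a small negative leverage, i.e., the two items appear together a little less often than expected under independence.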

References

Agrawal, R., H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo (1996). Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining 12 (1), 307--328.

Aze, J. and Y. Kodratoff (2004). Extraction de pepites de connaissances dans les donnees: Une nouvelle approche et une etude de sensibilite au bruit. In Mesures de Qualite pour la fouille de donnees. Revue des Nouvelles Technologies de l'Information, RNTI.

Bayardo, R., R. Agrawal, and D. Gunopulos (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2/3):217--240.

Berzal, Fernando, Ignacio Blanco, Daniel Sanchez and Maria-Amparo Vila (2002). Measuring the accuracy and interest of association rules: A new framework. Intelligent Data Analysis 6, 221--235.

Brin, Sergey, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur (1997). Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255--264, Tucson, Arizona, USA.

Hahsler, Michael and Kurt Hornik (2007). New probabilistic interest measures for association rules. Intelligent Data Analysis, 11(5):437--455.

Hofmann, Heike and Adalbert Wilhelm (2001). Visual comparison of association rules. Computational Statistics, 16(3):399--415.

Kenett, Ron and Silvia Salini (2008). Relative Linkage Disequilibrium: A New measure for association rules. In 8th Industrial Conference on Data Mining ICDM 2008, July 16--18, 2008, Leipzig/Germany.

Kodratoff, Y. (1999). Comparing Machine Learning and Knowledge Discovery in Databases: An Application to Knowledge Discovery in Texts. Lecture Notes on AI (LNAI) - Tutorial series.

Kulczynski, S. (1927). Die Pflanzenassoziationen der Pieninen. Bulletin International de l'Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles B, 57--203.

Lerman, I.C. (1981). Classification et analyse ordinale des donnees. Paris.

Liu, Bing, Wynne Hsu, and Yiming Ma (1999). Pruning and summarizing the discovered associations. In KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 125--134. ACM Press.

Omiecinski, Edward R. (2003). Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57--69.

Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In: Knowledge Discovery in Databases, pages 229--248.

Sebag, M. and M. Schoenauer (1988). Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In Proceedings of the European Knowledge Acquisition Workshop (EKAW'88), Gesellschaft fuer Mathematik und Datenverarbeitung mbH, 28.1--28.20.

Smyth, Padhraic and Rodney M. Goodman (1991). Rule Induction Using Information Theory. Knowledge Discovery in Databases, 159--176.

Tan, Pang-Ning and Vipin Kumar (2000). Interestingness Measures for Association Patterns: A Perspective. TR 00-036, Department of Computer Science and Engineering University of Minnesota.

Tan, Pang-Ning, Vipin Kumar, and Jaideep Srivastava (2002). Selecting the right interestingness measure for association patterns. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02), ACM, 32--41.

Tan, Pang-Ning, Vipin Kumar, and Jaideep Srivastava (2004). Selecting the right objective measure for association analysis. Information Systems, 29(4):293--313.

Wu, Tianyi, Yuguo Chen, and Jiawei Han (2007). Association Mining in Large Databases: A Re-examination of Its Measures. In Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2007), Springer-Verlag, Berlin, Heidelberg, 621--628.

Wu, T., Y. Chen, and J. Han (2010). Re-examination of interestingness measures in pattern mining: A unified framework. Data Mining and Knowledge Discovery.

Xiong, Hui, Pang-Ning Tan, and Vipin Kumar (2003). Mining strong affinity association patterns in data sets with skewed support distribution. In Bart Goethals and Mohammed J. Zaki, editors, Proceedings of the IEEE International Conference on Data Mining, November 19--22, 2003, Melbourne, Florida, pages 387--394.

Examples

library("arules")

data("Income")
rules <- apriori(Income)

## calculate a single measure and add it to the quality slot
quality(rules) <- cbind(quality(rules), 
	hyperConfidence = interestMeasure(rules, measure = "hyperConfidence", 
	transactions = Income))

inspect(head(sort(rules, by = "hyperConfidence")))

## calculate several measures
m <- interestMeasure(rules, c("confidence", "oddsRatio", "leverage"), 
	transactions = Income)
inspect(head(rules))
head(m)

## calculate all available measures for the first 5 rules and show them as a 
## table with the measures as rows
t(interestMeasure(head(rules, 5), transactions = Income))

## calculate measures on a different set of transactions (a sample is used here)
## Note: reuse = TRUE (default) would just return the stored support on the
##	data set used for mining
newTrans <- sample(Income, 100)
m2 <- interestMeasure(rules, "support", transactions = newTrans, reuse = FALSE) 
head(m2)
