Defines a method to compute confidence intervals for interest measures for association rules.
# S3 method for rules
confint(
  object,
  parm = "oddsRatio",
  level = 0.95,
  measure = NULL,
  side = c("two.sided", "lower", "upper"),
  method = NULL,
  replications = 1000,
  smoothCounts = 0,
  transactions = NULL,
  ...
)
Returns a matrix with one row for each rule and two columns named "LL" and "UL" with the interval boundaries.
The matrix has the following additional attributes:
the interest measure.
the confidence level
the confidence level
used count smoothing.
name of the method to create the interval
description of the used method to calculate the confidence interval. The mentioned references can be found below.
object: an object of class rules.
parm, measure: name of the interest measure (see interestMeasure()). measure can be used instead of parm.
level: the confidence level required.
side: Should a two-sided confidence interval or a one-sided limit be returned? "lower" returns an interval with only a lower limit and "upper" returns an interval with only an upper limit.
method: method to construct the confidence interval. The available methods depend on the measure, and the most common method is used by default.
replications: number of replications for method "simulation". Ignored for other methods.
smoothCounts: pseudo count for additive smoothing (Laplace smoothing). Often a pseudo count of 0.5 is used for smoothing (see the Details section).
transactions: if the rules object does not contain sufficient quality information, then a set of transactions from which to calculate the confidence interval can be specified.
...: Additional parameters are ignored with a warning.
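To make the interplay of level and side concrete, the following base R sketch compares the normal critical values that a two-sided interval and a one-sided limit would use at the same confidence level. That confint() builds its limits this way for every method is an assumption; the variable names are illustrative only.

```r
# Sketch only: critical normal quantiles for a two-sided interval versus a
# one-sided limit at the same confidence level.
level <- 0.95
z_two_sided <- qnorm(1 - (1 - level) / 2)  # error probability split over both tails
z_one_sided <- qnorm(level)                # all error probability in one tail
c(two.sided = z_two_sided, one.sided = z_one_sided)
```

Because the one-sided critical value is smaller, a one-sided lower limit lies closer to the point estimate than the lower end of a two-sided interval at the same level.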
Michael Hahsler
This method creates a contingency table for each rule and then constructs a confidence interval for the specified measures.
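As a rough illustration of the contingency table step, the 2x2 table for a rule X => Y can be derived from the number of transactions and the counts of X, Y, and both together. The sketch below uses base R only; the helper rule_table() and the counts are hypothetical and not part of arules.

```r
# Hypothetical helper (not an arules function): build the 2x2 contingency
# table for a rule X => Y from plain counts.
rule_table <- function(n, n_x, n_y, n_xy) {
  matrix(
    c(n_xy,       n_x - n_xy,
      n_y - n_xy, n - n_x - n_y + n_xy),
    nrow = 2, byrow = TRUE,
    dimnames = list(X = c("yes", "no"), Y = c("yes", "no"))
  )
}

# Made-up counts: 1000 transactions, 400 contain X, 500 contain Y, 300 both.
tab <- rule_table(n = 1000, n_x = 400, n_y = 500, n_xy = 300)
tab

# Measures that can be read off the table.
support    <- tab["yes", "yes"] / sum(tab)
confidence <- tab["yes", "yes"] / sum(tab["yes", ])
odds_ratio <- (tab["yes", "yes"] * tab["no", "no"]) /
  (tab["yes", "no"] * tab["no", "yes"])
```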
Fast confidence interval approximations are currently available for the measures "support", "count", "confidence", "lift", "oddsRatio", and "phi".
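One example of a closed-form interval for a proportion such as support or confidence is the Wilson score interval (Wilson, 1927, listed in the references). The base R sketch below shows the formula; the helper wilson_ci() is hypothetical, and whether confint() uses this interval by default for a given measure is an assumption.

```r
# Hypothetical helper (not the arules implementation): two-sided Wilson score
# interval for a proportion with x successes out of n trials.
wilson_ci <- function(x, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p <- x / n
  center <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(LL = center - half, UL = center + half)
}

# Made-up counts: the items of a rule occur together in 300 of 1000 transactions.
wilson_ci(300, 1000)
```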
For all other measures, bootstrap sampling from a multinomial distribution
is used.
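The bootstrap idea can be sketched in base R: resample the four cells of the contingency table from a multinomial distribution with the observed cell proportions, recompute the measure for each replicate, and take empirical quantiles. The helper boot_ci(), the use of percentile limits, and the counts are assumptions for illustration, not the arules implementation.

```r
# Sketch only (hypothetical helper): percentile bootstrap interval for a
# measure computed from the four cell counts of a 2x2 table.
boot_ci <- function(counts, measure, level = 0.95, replications = 1000) {
  n <- sum(counts)
  # Each column of sims is one resampled table with the same total n.
  sims <- rmultinom(replications, size = n, prob = counts / n)
  values <- apply(sims, 2, measure)
  alpha <- 1 - level
  q <- quantile(values, c(alpha / 2, 1 - alpha / 2),
                na.rm = TRUE, names = FALSE)
  c(LL = q[1], UL = q[2])
}

# Cells in the order: X&Y, X&!Y, !X&Y, !X&!Y (made-up counts).
counts <- c(300, 100, 200, 400)
# Jaccard coefficient of the rule: n_xy / (n_x + n_y - n_xy).
jaccard <- function(k) k[1] / (k[1] + k[2] + k[3])

set.seed(1)
boot_ci(counts, jaccard)
```

More replications give more stable limits at the cost of run time, which is the trade-off controlled by the replications argument.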
The Haldane-Anscombe correction (Haldane, 1940; Anscombe, 1956), which avoids issues with zero counts, can be specified with smoothCounts = 0.5. Here, 0.5 is added to each count in the contingency table.
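The effect of the correction can be seen in a small base R sketch that computes an odds ratio with a log-scale interval in the style of Woolf (1955). The helper odds_ratio_ci() and the counts are hypothetical; this is not the arules implementation, and the interval method is an assumption chosen for illustration.

```r
# Sketch only (hypothetical helper): odds ratio with a Woolf-type interval on
# the log scale and an optional pseudo count added to every cell.
odds_ratio_ci <- function(counts, level = 0.95, smoothCounts = 0) {
  k <- counts + smoothCounts
  or <- (k[1] * k[4]) / (k[2] * k[3])
  se <- sqrt(sum(1 / k))  # standard error of log(odds ratio)
  z <- qnorm(1 - (1 - level) / 2)
  c(oddsRatio = or, LL = exp(log(or) - z * se), UL = exp(log(or) + z * se))
}

# Cells in the order: X&Y, X&!Y, !X&Y, !X&!Y; one cell is zero (made-up counts).
counts <- c(30, 0, 20, 50)
odds_ratio_ci(counts)                      # not finite because of the zero cell
odds_ratio_ci(counts, smoothCounts = 0.5)  # finite estimate and limits
```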
Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association, 22 (158): 209-212. doi:10.1080/01621459.1927.10502953
Clopper, C.; Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika, 26 (4): 404-413. doi:10.1093/biomet/26.4.404
Doob, J. L. (1935). "The Limiting Distributions of Certain Statistics". Annals of Mathematical Statistics, 6: 160-169. doi:10.1214/aoms/1177732594
Fisher, R.A. (1962). "Confidence limits for a cross-product ratio". Australian Journal of Statistics, 4, 41.
Woolf, B. (1955). "On estimating the relation between blood group and diseases". Annals of Human Genetics, 19, 251-253.
Haldane, J.B.S. (1940). "The mean and variance of the moments of chi-squared when used as a test of homogeneity, when expectations are small". Biometrika, 29, 133-134.
Anscombe, F.J. (1956). "On estimating binomial response relations". Biometrika, 43, 461-464.
Other interest measures: coverage(), interestMeasure(), is.redundant(), is.significant(), support()
library("arules")
data("Income")
# mine some rules with the consequent "language in home=english"
rules <- apriori(Income, parameter = list(support = 0.5),
  appearance = list(rhs = "language in home=english"))
# calculate the confidence interval for the rules' odds ratios.
# note that we use Haldane-Anscombe correction (with smoothCounts = .5)
# to avoid issues with 0 counts in the contingency table.
ci <- confint(rules, "oddsRatio", smoothCounts = .5)
ci
# We add the odds ratio (with Haldane-Anscombe correction)
# and the confidence intervals to the quality slot of the rules.
quality(rules) <- cbind(
  quality(rules),
  oddsRatio = interestMeasure(rules, "oddsRatio", smoothCounts = .5),
  oddsRatio = ci)
rules <- sort(rules, by = "oddsRatio")
inspect(rules)
# use confidence intervals for lift to find rules with a lift significantly
# larger than 1. We set the confidence level to 95%, create a one-sided
# interval and check if the interval does not cover 1 (i.e., the lower limit
# is larger than 1).
ci <- confint(rules, "lift", level = 0.95, side = "lower")
ci
inspect(rules[ci[, "LL"] > 1])