cnaOpt: Find atomic solution formulas with optimal consistency and coverage

Description

cnaOpt attempts to find atomic solution formulas (asfs) for a given outcome that are optimal with respect to consistency and coverage. The asfs are inferred from crisp-set ("cs") or multi-value ("mv") data.
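For crisp-set data, the consistency and coverage of a condition X for an outcome Y are standardly computed as |X and Y| / |X| and |X and Y| / |Y|, where |.| counts cases. The base-R sketch below computes both for the single-factor condition A and the outcome E, using columns A and E of dat1 from Example 2. The helper conCov() is hypothetical and not part of cnaOpt.

```r
# Hypothetical helper (not part of cnaOpt): consistency and coverage of a
# crisp-set condition x (0/1 vector) with respect to an outcome y (0/1 vector).
conCov <- function(x, y) {
  both <- sum(x == 1 & y == 1)   # cases instantiating both x and y
  c(con = both / sum(x == 1),    # share of x-cases that also show y
    cov = both / sum(y == 1))    # share of y-cases that also show x
}

# Columns A and E of dat1 (Example 2).
A <- c(1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0)
E <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1)
res <- conCov(A, E)
print(res)   # con = 0.375, cov = 0.375
```

A and E each occur in 8 cases and co-occur in 3, so both measures equal 3/8.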

Usage

cnaOpt(x, outcome, ..., crit = quote(con * cov), cond = quote(TRUE))

Arguments

x

A data.frame or truthTab of type "cs" or "mv"; if x is of type "mv", x must be a truthTab.

outcome

A character string specifying one outcome, i.e. one factor in x.

…

Additional arguments passed to truthTab, for instance type, rm.dup.factors, rm.const.factors, or case.cutoff.

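crit, cond

Quoted expressions passed to selectMax: crit specifies the optimality criterion used to rank the con-cov optima (default: the product con * cov), and cond specifies additional conditions an optimum must satisfy in order to be selected (default: TRUE, i.e. no restriction).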
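The base-R sketch below illustrates how quoted expressions such as crit and cond can be evaluated against a table of con-cov pairs. The data frame optima and its values are invented for illustration, and the selection step is only an assumption about what selectMax does in spirit, not its actual implementation.

```r
# Hypothetical con-cov pairs, invented for illustration only.
optima <- data.frame(con = c(1.00, 0.90, 0.80, 0.75),
                     cov = c(0.50, 0.70, 0.90, 1.00))

crit <- quote(con * cov)    # the default optimality criterion
cond <- quote(con >= 0.9)   # an additional condition

# Evaluate both expressions with the columns of 'optima' in scope.
score <- eval(crit, optima)
ok    <- eval(cond, optima)

# Among the rows meeting 'cond', pick the one with the highest 'crit' score.
candidates <- which(ok)
best <- candidates[which.max(score[candidates])]
optima[best, ]
```

Here row 2 (con * cov = 0.63) is selected. Without cond, row 4 (0.75) would win; with crit = quote(pmin(con, cov)), row 3 would be preferred.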

Value

cnaOpt returns a data.frame with additional class "condTbl". See the "Value" section in ?condTbl for details.

Details

cnaOpt infers causal models (atomic solution formulas, asfs) for the outcome from data x that are optimal with respect to the optimality criterion crit and that comply with the conditions cond. Data x may be crisp-set ("cs") or multi-value ("mv"), but not fuzzy-set ("fs"). The function first calculates the consistency and coverage optima (con-cov optima) for x. It then selects the optimum that is best according to crit and cond, builds the canonical disjunctive normal form (DNF) realizing that optimum and, finally, generates all minimal forms of this canonical DNF.

Roughly speaking, running cnaOpt amounts to sequentially executing truthTab, conCovOpt, selectMax, DNFbuild and condTbl.
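To make the procedure described above concrete, the following base-R sketch runs a brute-force version of it on a tiny hypothetical crisp-set data set (factors A, B and outcome Y). Every non-empty set of configurations of the exogenous factors defines a canonical DNF, written in cna-style notation (upper case = 1, lower case = 0, "*" = conjunction, "+" = disjunction). The sketch keeps the undominated con-cov pairs and selects the one maximizing con * cov. This exhaustive enumeration is an illustration only and is not how conCovOpt is necessarily implemented; its cost grows exponentially with the number of configurations.

```r
# Hypothetical toy crisp-set data: exogenous factors A, B; outcome Y.
dat <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0),
                  B = c(1, 1, 0, 0, 1, 0, 0),
                  Y = c(1, 1, 1, 0, 0, 0, 1))
exo <- c("A", "B")

# Label each case by its configuration of the exogenous factors.
labelRow <- function(r) paste(ifelse(r == 1, exo, tolower(exo)), collapse = "*")
config  <- apply(dat[exo], 1, labelRow)
configs <- unique(config)

# Brute force: each non-empty set of configurations is one canonical DNF;
# record its consistency and coverage for Y.
res <- NULL
for (k in seq_along(configs)) {
  for (S in combn(configs, k, simplify = FALSE)) {
    x <- config %in% S
    both <- sum(x & dat$Y == 1)
    res <- rbind(res, data.frame(dnf = paste(S, collapse = " + "),
                                 con = both / sum(x),
                                 cov = both / sum(dat$Y == 1)))
  }
}

# Con-cov optima: pairs not dominated by any other pair.
dominated <- sapply(seq_len(nrow(res)), function(i)
  any(res$con >= res$con[i] & res$cov >= res$cov[i] &
      (res$con > res$con[i] | res$cov > res$cov[i])))
optima <- res[!dominated, ]

# Select the optimum maximizing the default criterion con * cov.
best <- optima[which.max(optima$con * optima$cov), ]
print(best)
```

The selected canonical DNF is A*B + A*b + a*b (con = 2/3, cov = 1), which is logically equivalent to the shorter A + b; producing such minimal forms is the last of the steps listed above.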

Examples


# NOT RUN {
# Example 1: Real-life crisp-set data, d.educate.
(res_opt1 <- cnaOpt(d.educate, "E"))

# Using the pipe operator (%>%), the steps performed by cnaOpt in the
# call above can be reproduced as follows:
library(dplyr)
conCovOpt(d.educate, "E") %>% selectMax %>% DNFbuild("E", reduce = "ereduce") %>% 
  paste("<-> E") %>% condTbl(d.educate)


# Example 2: Simulated crisp-set data.
dat1 <- data.frame(
  A = c(1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0), 
  B = c(0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0), 
  C = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0), 
  D = c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1), 
  E = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1), 
  F = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
)

(res_opt2 <- cnaOpt(dat1, "E"))

# Change the optimality criterion.
cnaOpt(dat1, "E", crit = quote(pmin(con, cov)))
# Impose an additional condition.
cnaOpt(dat1, "E", cond = quote(con >= 0.9))

# Example 3: All logically possible configurations.
(res_opt3 <- cnaOpt(full.tt(4), "D"))  # All combinations are equally bad.


# Example 4: Real-life multi-value data, d.pban.
cnaOpt(mvtt(d.pban), outcome = "PB=1")
cnaOpt(mvtt(d.pban), outcome = "PB=1", crit = quote(pmin(con, cov)))
cnaOpt(mvtt(d.pban), outcome = "PB=1", cond = quote(con > 0.93))
cnaOpt(mvtt(d.pban), outcome = "PB=0")
cnaOpt(mvtt(d.pban), outcome = "PB=0", cond = quote(con > 0.93))
cnaOpt(d.pban, type = "mv", outcome = "F=2")
cnaOpt(d.pban, type = "mv", outcome = "F=2", cond = quote(con > 0.75))

# }
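One way to see why Example 3 yields no good model: in a table of all 16 configurations of four binary factors, every configuration of A, B, C occurs once with D = 1 and once with D = 0, so any condition built from A, B, C has consistency exactly 0.5 for D. The base-R sketch below checks this by enumerating all 255 non-empty sets of A-B-C configurations. It uses expand.grid as an assumed stand-in for full.tt(4), on the assumption that both contain the same 16 configurations.

```r
# All 16 configurations of four binary factors (assumed stand-in for full.tt(4)).
tt <- expand.grid(A = 0:1, B = 0:1, C = 0:1, D = 0:1)

# Label each row by its A-B-C configuration, e.g. "101".
abc <- do.call(paste0, tt[c("A", "B", "C")])
configs <- unique(abc)

# Consistency for D of every non-empty disjunction of A-B-C configurations.
cons <- unlist(lapply(seq_along(configs), function(k)
  combn(configs, k, FUN = function(S) {
    x <- abc %in% S
    sum(x & tt$D == 1) / sum(x)
  })))

length(cons)   # 255
range(cons)    # 0.5 0.5
```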