cna: Perform Coincidence Analysis

Description

The cna function performs Coincidence Analysis to identify minimally necessary disjunctions of minimally sufficient conditions of all outcomes in the data and, if possible, combines the recovered conditions to common-cause and/or causal-chain structures.

Usage

cna(x, ordering = NULL, strict = FALSE, con = 1, cov = 1,
      notcols = NULL, maxstep = 5, suff.only = FALSE, what = "mac")
## S3 method for class 'cna':
print(x, what = x$what, digits = 3, nsolutions = 5,
      row.names = FALSE, show.cases = FALSE, ...)

Arguments

x

A data frame or an object of class truthTab (as output by truthTab).

ordering

A list of character vectors reflecting the causal ordering of the factors in x.

strict

Logical; if TRUE, factors on the same level of the causal ordering are not potential causes of each other.

con

Minimum consistency of a sufficient condition (values between 0 and 1).

cov

Minimum coverage of a necessary condition (values between 0 and 1).

maxstep

Maximum number of steps in the algorithm for finding atomic solution formulas.

suff.only

Logical; if TRUE, the function only searches for minimally sufficient conditions and does not search for atomic and complex solution formulas.

notcols

A character vector of factors to be negated in x. If notcols = "all", all factors in x are negated.

what

A character vector specifying what to print.

digits

Number of digits to print in consistency and coverage scores.

nsolutions

Maximum number of conditions, atomic and complex solutions to print.

row.names, ...

Passed to print.data.frame.

show.cases

Logical value specifying whether the attribute cases is printed.

Value

cna returns an object of class cna. Objects of class cna are lists with the following components:

call

The executed function call.

x

The processed data frame or truth table.

ordering

The implemented ordering.

truthTab

The object of class "truthTab".

solution

The solution object, which itself is composed of lists exhibiting msc, asf, and csf for all factors in x.

what

The values given to the what argument.

Details

Separately for each potential outcome factor, cna first searches for all minimally sufficient conditions (msc) that meet the cut-off given by con. Then, it disjunctively combines these minimally sufficient conditions to minimally necessary conditions that meet the cut-off given by cov. The resulting expressions are the atomic solution formulas (asf) for every outcome factor. The default value for con and cov is 1.
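The two cut-offs can be made concrete with a small base R sketch. It assumes crisp-set (0/1) data and the usual definitions: consistency is the share of cases instantiating the condition that also instantiate the outcome, and coverage is the share of outcome cases that instantiate the condition. The helper names consistency and coverage and the toy data are invented for illustration and are not part of the cna package.

```r
# Hypothetical helpers (not part of cna); crisp-set data assumed.
consistency <- function(cond, outcome) sum(cond & outcome) / sum(cond)
coverage    <- function(cond, outcome) sum(cond & outcome) / sum(outcome)

# Toy data, invented for illustration.
toy <- data.frame(
  A = c(1, 1, 1, 0, 0, 0),
  B = c(1, 0, 0, 1, 0, 0),
  C = c(1, 1, 1, 1, 0, 0)
)

out <- toy$C == 1

# Candidate sufficient condition A for outcome C.
con_A <- consistency(toy$A == 1, out)
cov_A <- coverage(toy$A == 1, out)

# Disjunction A + B as a candidate necessary condition for C.
a_or_b  <- toy$A == 1 | toy$B == 1
con_AB <- consistency(a_or_b, out)
cov_AB <- coverage(a_or_b, out)

print(c(con_A = con_A, cov_A = cov_A, con_AB = con_AB, cov_AB = cov_AB))
```

In this toy data, A alone passes con = 1 but not cov = 1, whereas the disjunction A + B passes both.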

Atomic solution formulas are built from the bottom up. That is, in a first step, cna checks whether the msc are necessary for an outcome, next disjunctions of two msc, then of three, etc. are tested for necessity. The argument maxstep defines the number of such steps to be carried out, i.e. disjunctions of up to maxstep msc are examined as potential asf. Differently put, the value of maxstep determines the maximal number of disjuncts in the resulting asf. The default maxstep is 5.
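The bottom-up search described above can be sketched in base R as follows. This is only an illustration of the idea under simplifying assumptions: the msc are taken to be single factors, the data are crisp-set, and only coverage is checked. The function search_disjunctions and the toy data are hypothetical and do not reproduce the internals of cna.

```r
# Hypothetical helper (not part of cna).
coverage <- function(cond, outcome) sum(cond & outcome) / sum(outcome)

# Toy data, invented for illustration; A, B and D are assumed to be msc of E.
toy <- data.frame(
  A = c(1, 0, 0, 1, 0),
  B = c(0, 1, 0, 1, 0),
  D = c(0, 0, 1, 0, 0),
  E = c(1, 1, 1, 1, 0)
)

search_disjunctions <- function(data, msc, outcome, cov = 1, maxstep = 5) {
  out <- data[[outcome]] == 1
  found <- list()
  # Step k examines all disjunctions of exactly k msc.
  for (k in seq_len(min(maxstep, length(msc)))) {
    for (cmb in combn(msc, k, simplify = FALSE)) {
      # Skip supersets of solutions found earlier to keep results minimal.
      if (any(vapply(found, function(s) all(s %in% cmb), logical(1)))) next
      disj <- rowSums(data[cmb] == 1) > 0
      if (coverage(disj, out) >= cov) found[[length(found) + 1]] <- cmb
    }
  }
  vapply(found, paste, character(1), collapse = " + ")
}

res3 <- search_disjunctions(toy, c("A", "B", "D"), "E", maxstep = 3)
res2 <- search_disjunctions(toy, c("A", "B", "D"), "E", maxstep = 2)
print(res3)
print(res2)
```

With maxstep = 3 the sketch recovers the three-disjunct formula; with maxstep = 2 the search stops one step too early and returns nothing, which mirrors how maxstep caps the number of disjuncts.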

Note that the default consistency and coverage cut-offs of 1 frequently will not yield any atomic solution formulas because real-life data tend to feature noise due to uncontrolled background influences. In such cases, users should gradually lower consistency and coverage cut-offs (e.g. in steps of 0.05) until cna finds solution formulas -- for the aim of a CNA is to find solutions with the highest possible consistency and coverage scores. Consistency and coverage cut-offs should only be lowered below 0.75 with great caution. If cut-offs of 0.75 do not result in solutions, the corresponding data feature such a high degree of noise that there is a severe risk of causal fallacies.
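The stepwise lowering of cut-offs can be sketched as a simple loop. The function lower_until and the stand-in has_solution are hypothetical; in practice the stand-in would be replaced by a check on the result of a cna call with con and cov set to the current cut-off. The toy data are invented.

```r
# Hypothetical helper (not part of cna).
consistency <- function(cond, outcome) sum(cond & outcome) / sum(cond)

# Toy data, invented: A is sufficient for C in 7 of 8 cases (0.875).
toy <- data.frame(
  A = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0),
  C = c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
)

# Stand-in for "does the analysis find a solution at this cut-off?".
has_solution <- function(cutoff) {
  consistency(toy$A == 1, toy$C == 1) >= cutoff
}

# Lower the cut-off in steps of 0.05 and stop at the first success;
# never go below 0.75, as recommended above.
lower_until <- function(finder, from = 1, to = 0.75, step = 0.05) {
  for (cutoff in seq(from, to, by = -step)) {
    if (finder(cutoff)) return(cutoff)
  }
  NA_real_
}

chosen <- lower_until(has_solution)
print(chosen)
```

Returning NA when nothing is found at 0.75 makes the "too noisy" case explicit instead of silently continuing to lower the threshold.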

If cna finds asf, it combines them to complex solution formulas (csf). Csf are built by conjunctively concatenating asf with different outcome factors [asf with identical outcome factors are not combined, for they do not represent one complex causal structure but model ambiguities with respect to one outcome]. For instance, the two asf (D + U <-> L) and (G + L <-> E) can be combined to the csf (D + U <-> L) * (G + L <-> E), which represents a causal chain from D + U via L to E.
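The conjunctive concatenation can be illustrated with plain string handling in base R. The sketch below takes one asf per outcome factor and forms every cross-outcome combination. The asf "G + U <-> E" is invented to show how an ambiguity for one outcome multiplies the number of csf, and cna itself may apply further steps that are not modeled here.

```r
# Asf grouped by outcome factor; the second asf for E is invented.
asf_by_outcome <- list(
  L = c("D + U <-> L"),
  E = c("G + L <-> E", "G + U <-> E")
)

# One row per way of picking exactly one asf for every outcome.
grid <- expand.grid(asf_by_outcome, stringsAsFactors = FALSE)

# Conjoin the picked asf; asf with the same outcome never share a row.
csf <- unname(apply(grid, 1, function(row) {
  paste0("(", row, ")", collapse = " * ")
}))

print(csf)
```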

When prior causal knowledge about an investigated process is available, cna can be told not to treat certain factors as potential causes of other factors by means of the argument ordering. If specified, that argument defines a causal ordering for the factors in x. For example, ordering = list(c("A", "B"), "C") determines that C is causally located after A and B, meaning that C is not a potential cause of A and B. In consequence, cna only checks whether A and B can be modeled as causes of C; the test for a causal dependency in the other direction is skipped. If the argument ordering is not specified or if it is given the NULL value (which is the argument's default value), cna searches for dependencies between all factors in x.

The argument strict determines whether the elements of one level in an ordering can be causally related or not. For example, if ordering = list(c("A", "B"), "C") and strict = TRUE, then A and B -- which are on the same level of the ordering -- are excluded from being causally related and cna skips the corresponding tests. By contrast, if ordering = list(c("A", "B"), "C") and strict = FALSE, then cna also searches for dependencies among A and B. The default is strict = FALSE.
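How ordering and strict jointly restrict the search space can be sketched by listing, for each factor, the factors that remain potential causes. The function potential_causes is a hypothetical helper written for this illustration and is not part of the cna package.

```r
# Hypothetical helper (not part of cna): for each factor in an ordering,
# return the factors that are still admissible as its causes.
potential_causes <- function(ordering, strict = FALSE) {
  factors <- unlist(ordering)
  level   <- rep(seq_along(ordering), lengths(ordering))
  out <- lapply(seq_along(factors), function(i) {
    # Later levels never cause earlier ones; strict also excludes
    # factors on the same level.
    keep <- if (strict) level < level[i] else level <= level[i]
    setdiff(factors[keep], factors[i])
  })
  setNames(out, factors)
}

ord <- list(c("A", "B"), "C")
pc_loose  <- potential_causes(ord, strict = FALSE)
pc_strict <- potential_causes(ord, strict = TRUE)

str(pc_loose)
str(pc_strict)
```

In both cases C can be caused by A and B but not vice versa; only with strict = FALSE are A and B also candidate causes of each other.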

The argument notcols is used to calculate asf and csf for negated factors. If notcols = "all", all factors in x are negated, i.e. uppercase factor names are turned into lowercase names and vice versa, and value 1 is turned into value 0 and vice versa. If notcols is given a character vector of factors in x, only the factors in that vector are negated. For example, notcols = c("A", "B") determines that only factors A and B are negated. Note that if an ordering is specified and if some factors are negated by means of notcols, they must likewise appear negated in the ordering. That is, if notcols = c("A", "B"), write ordering = list(c("a", "b"), "C") instead of ordering = list(c("A", "B"), "C"). Likewise, if notcols = "all", an ordering must be given in terms of negated factors, e.g. ordering = list(c("a", "b"), "c"). The default is no negations, i.e. notcols = NULL.
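The effect of negation on a data frame, as described above, can be sketched in base R: values are flipped between 0 and 1 and the case of the factor name is swapped. The function negate_cols is a hypothetical stand-in written for illustration; it is not the routine used inside cna, and the toy data are invented.

```r
# Hypothetical helper (not part of cna); crisp-set (0/1) data assumed.
negate_cols <- function(x, notcols = NULL) {
  if (is.null(notcols)) return(x)
  if (identical(notcols, "all")) notcols <- names(x)
  stopifnot(all(notcols %in% names(x)))
  idx <- match(notcols, names(x))
  # Turn 1 into 0 and 0 into 1.
  x[idx] <- lapply(x[idx], function(v) 1 - v)
  # Swap uppercase and lowercase in the affected names.
  old <- paste(c(LETTERS, letters), collapse = "")
  new <- paste(c(letters, LETTERS), collapse = "")
  names(x)[idx] <- chartr(old, new, names(x)[idx])
  x
}

# Toy data, invented for illustration.
toy <- data.frame(A = c(1, 0, 1), B = c(0, 0, 1), C = c(1, 1, 0))

neg_ab  <- negate_cols(toy, notcols = c("A", "B"))
neg_all <- negate_cols(toy, notcols = "all")

print(neg_ab)
print(neg_all)
```

The renamed columns show why an ordering has to refer to "a" and "b" rather than "A" and "B" once those factors are negated.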

The argument suff.only is applicable in cases of very ambiguous solutions. It may happen that x can be modeled in terms of so many atomic and complex solution formulas that cna does not terminate before the computer's internal memory is exhausted. In such a case, suff.only = TRUE forces cna to stop the analysis after the minimization of sufficient conditions, which will normally yield results even in cases of extreme solution ambiguities. In that manner, it is possible to shed at least some light on the dependencies among the factors in x, in spite of an incomputable solution space.

The argument what can be specified both for the cna and the print function. It regulates what elements of the output of cna are printed. If what is given the value t, the truth table is printed; if it is given an m, the msc are printed; if it is given an a, the asf are printed; if it is given a c, the csf are printed. what = "all" or what = "tmac" determine that the full output is printed. The default is what = "mac".

The arguments digits, nsolutions, and show.cases only apply to the print function, which takes an object of class cna as first input. digits determines how many digits of consistency and coverage scores are printed, while nsolutions fixes the number of conditions and solutions to print. nsolutions applies separately to minimally sufficient conditions, atomic solution formulas, and complex solution formulas. nsolutions = "all" recovers all minimally sufficient conditions, atomic and complex solution formulas. show.cases is applicable if the what argument is given the value t. In that case, show.cases = TRUE yields a truth table featuring a cases column, which assigns cases to configurations.

References

Baumgartner, Michael. 2009a. Inferring Causal Complexity. Sociological Methods & Research 38(1):71-101.

Baumgartner, Michael. 2009b. Uncovering Deterministic Causal Structures: A Boolean Approach. Synthese 170(1):71-96.

Baumgartner, Michael, and Ruedi Epple. 2014. A Coincidence Analysis of a Causal Chain: The Swiss Minaret Vote. Sociological Methods & Research 43(2):280-312.

Krook, Mona Lena. 2010. Women's Representation in Parliament: A Qualitative Comparative Analysis. Political Studies 58(5):886-908.

Lam, Wai Fung, and Elinor Ostrom. 2010. Analyzing the Dynamic Complexity of Development Interventions: Lessons from an Irrigation Experiment in Nepal. Policy Sciences 43(2):1-25.

Wollebaek, Dag. 2010. Volatility and Growth in Populations of Rural Associations. Rural Sociology 75:144-166.

Examples


# Artificial data on high levels of education
#--------------------------------------------
# Load dataset.
data(d.educate)

# Exhaustive CNA without constraints on the search space; print complete solution without
# the truth table.
cna(d.educate)

# The two resulting complex solution formulas represent a common cause structure 
# and a causal chain, respectively. The common cause structure is graphically depicted 
# in (Note, figure (a)), the causal chain in (Note, figure (b)).


# Print only complex solution formulas.
cna(d.educate, what = "c")

# Print only atomic solution formulas.
cna(d.educate, what = "a")

# Print only minimally sufficient conditions.
cna(d.educate, what = "m")

# Print only the truth table.
cna(d.educate, what = "t")

# CNA with negations of the factors E and L.
cna(d.educate, notcols = c("E","L"))

# CNA with negations of all factors.
cna(d.educate, notcols = "all")


# Lam and Ostrom (2010) on the impact of development interventions on water adequacy in Nepal
#--------------------------------------------------------------------------------------------
# Load dataset. 
data(d.irrigate)

# CNA with causal ordering that corresponds to the ordering in Lam & Ostrom (2010); coverage 
# cut-off at 0.9 (consistency cut-off at 1).
cna(d.irrigate, ordering = list(c("A","R","F","L","C"),"W"), cov = 0.9)

# The previous function call yields a total of 12 complex solution formulas, only
# 5 of which are printed in the default output. 
# Here is how to extract all 12 complex solution formulas.
cna.irrigate <- cna(d.irrigate, ordering = list(c("A","R","F","L","C"),"W"), cov = 0.9)
csf(cna.irrigate)

# Extract all atomic solution formulas.
asf(cna.irrigate)

# Extract all minimally sufficient conditions.
msc(cna.irrigate)

# Alternatively, all minimally sufficient conditions, atomic and complex solution formulas
# can be recovered by means of the nsolutions argument of the print function.
print(cna.irrigate, nsolutions = "all")

# Print the truth table with the "cases" column.
print(cna.irrigate, what = "t", show.cases = TRUE)

# Only build solution formulas with maximally 3 disjuncts.
cna(d.irrigate, ordering = list(c("A","R","F","L","C"),"W"), cov = 0.9, maxstep = 3)

# Only print 2 digits of consistency and coverage scores.
print(cna.irrigate, digits = 2)

# Build all but print only two minimally sufficient conditions for each factor and two 
# solution formulas.
print(cna(d.irrigate, ordering = list(c("A","R","F","L","C"),"W"), cov = 0.9), nsolutions = 2)

# CNA with a different ordering such that the factors on one level of the ordering are causally
# unrelated; consistency cut-off at 0.9 (coverage cut-off at 1); print only complex solution
# formulas.
cna(d.irrigate, ordering = list(c("A","R","L"),c("F","C"),"W"), strict = TRUE, con = 0.9, 
       what = "c")

# Same ordering with negation of factor C, consistency cut-off at 0.8, coverage cut-off at 0.9;
# print only complex solution formulas.
cna(d.irrigate, ordering = list(c("A","R","L"),c("F","c"),"W"), notcols = c("C"), con = 0.8,
       cov = 0.9, what = "c")

# Same ordering with negations of all factors, consistency cut-off at 0.75, coverage cut-off 
# at 0.75.
cna(d.irrigate, ordering = list(c("a","r","l"),c("f","c"),"w"), notcols = "all", con = 0.75,
       cov = 0.75)





# Wollebaek (2010) on very high volatility of grassroots associations in Norway
#------------------------------------------------------------------------------
# Load dataset. 
data(d.volatile)

# CNA with ordering from Wollebaek (2010) [Beware: due to massive ambiguities, this analysis
# will not terminate in reasonable time on most computers! In that case, R has to be 
# interrupted manually, e.g. by ESC, Ctrl+C, or Alt+M+Enter.]
cna(d.volatile, ordering = list(c("PG","RB","EL","SE","CS","OD","PC","UP"),"VO2"))

# Using suff.only, CNA can be forced to abandon the analysis after minimization of sufficient 
# conditions. [This analysis terminates reasonably quickly.]
cna(d.volatile, ordering = list(c("PG","RB","EL","SE","CS","OD","PC","UP"),"VO2"), 
       suff.only = TRUE)

# Similarly, using maxstep, CNA can be forced to only search for atomic and complex solutions
# with a maximal number of disjuncts. [This analysis also terminates reasonably quickly, 
# yielding a total of 4264 complex solution formulas.]
cna(d.volatile, ordering = list(c("PG","RB","EL","SE","CS","OD","PC","UP"),"VO2"), maxstep = 3)


# Krook (2010) on representation of women in western-democratic parliaments
#--------------------------------------------------------------------------
# Load dataset. 
data(d.women)

# This example shows that CNA can infer which factors are causes and which ones
# are effects from the data. Without being told which factor is the outcome, 
# CNA reproduces the original QCA of Krook (2010).
cna(d.women)



# Baumgartner and Epple (2014) on the Swiss Minaret Initiative
#-------------------------------------------------------------
# Load dataset.
data(d.minaret)

# Set up the data frame for calibrated factors
smi <- data.frame(
 matrix(numeric(nrow(d.minaret) * (ncol(d.minaret))), ncol = ncol(d.minaret),
  dimnames = list(row.names(d.minaret), c(toupper(names(d.minaret))))
 )
)

# Calibration
smi$A  <- ifelse(d.minaret$a >= 28.0, 1, 0)
smi$L  <- ifelse(d.minaret$l >= 31.9, 1, 0)
smi$S  <- ifelse(d.minaret$s >= 14.5, 1, 0)
smi$T  <- ifelse(d.minaret$t >=  8.0, 1, 0)
smi$X  <- ifelse(d.minaret$x >= 38.2, 1, 0)
smi$M  <- ifelse(d.minaret$m >= 50.0, 1, 0)

# Replicate the results of Baumgartner and Epple (2014:290-96).
cna(smi, ordering = list(c("A","T"), c("L","S"),"X","M"), cov = 0.94)



# User defined data input
#------------------------
# Data input via data.frame()
dat1 <- data.frame(
A = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
B = c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0),
C = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0),
D = c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0),
E = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0),
G = c(1,1,1,0,0,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,0,0,0,0),
H = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0)
)

# CNA of dat1
cna(dat1)

# Same input via the frequency argument of the truthTab function.
dat1 <- data.frame(
A = c(1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0),
B = c(1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0),
C = c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0),
D = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0),
E = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0),
G = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
H = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0)
)
dat1_tt <- truthTab(dat1, frequency = c(3,3,3,1,3,1,3,1,3,1,3,1,3,1,1,4))
cna(dat1_tt)


# Data input via matrix() is also possible
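# (scan(text = ...) reads the values from the string, so the call also works
# when the code is sourced from a file rather than typed at the console.)
dat2 <- matrix(scan(text = "
  1 1 1 1 1
  1 1 1 0 1
  1 0 1 1 1
  1 0 1 0 1
  0 1 1 1 1
  0 1 1 0 1
  0 0 0 1 1
  0 0 0 0 0
  0 0 0 1 1
  0 0 0 1 1
  1 1 1 0 1
  1 1 1 0 1
  1 0 1 0 1
  1 1 1 1 1
", what = integer(0)), ncol = 5, byrow = TRUE)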

cna(dat2)
