eqmcc: QCA minimization using the enhanced Quine-McCluskey algorithm

Description

This function performs the QCA minimization of a set of causal conditions with respect to an outcome. The name “eqmcc” comes from the (e)nhanced Quine-McCluskey algorithm, which differs from the classical Quine-McCluskey algorithm but returns exactly the same solutions.

Usage

eqmcc(data, outcome = "", conditions = "",  relation = "suf", n.cut = 1, incl.cut = 1,  explain = "1", include = "", row.dom = FALSE, all.sol = FALSE, omit = NULL, dir.exp = "", details = FALSE, show.cases = FALSE, inf.test = "", use.tilde = FALSE, use.letters = FALSE, ...)

Arguments

data

A truth table object or a data frame containing calibrated causal conditions and an outcome.

outcome

A string containing the outcome(s) name(s), separated by commas.

conditions

A single string containing the conditions' (columns) names separated by commas, or a character vector of conditions' names.

relation

The set relation to outcome, either "suf" or "sufnec".

n.cut

The minimum number of cases under which a truth table row is declared as a remainder.

incl.cut

The inclusion cutoff(s): either a single value for the presence of the output, or a vector of length 2, the second for the absence of the output.

explain

A vector of output values to explain.

include

A vector of other output values to include in the minimization process.

row.dom

Logical, perform row dominance in the prime implicants' chart to eliminate redundant prime implicants.

all.sol

Derive all possible solutions, irrespective of the number of prime implicants.

omit

A vector of row numbers from the truth table, or a matrix of causal combinations to omit from the minimization process.

dir.exp

A vector of directional expectations to derive intermediate solutions.

details

Logical, print more details about the solution.

show.cases

Logical, print case names.

inf.test

Specifies the statistical inference test to be performed (currently only "binom") and the critical significance level. It can be either a vector of length 2, or a single string containing both, separated by a comma.

use.tilde

Logical, use tilde for negation with bivalent variables.

use.letters

Logical, use letters instead of causal conditions' names.

...

Other arguments (mainly for backwards compatibility).

Value

An object of class "qca" when using a single outcome, or class "mqca" when using multiple outcomes. These objects are lists having the following components:

tt

The truth table object.

excluded

The line number(s) of the negative configuration(s).

initials

The initial positive configuration(s).

PIs

The prime implicant(s).

PIchart

A list containing the PI chart(s).

solution

A list of solution(s).

essential

A list of essential PI(s).

pims

A list of PI membership scores.

SA

A list of simplifying assumptions.

i.sol

A list of components specific to intermediate solution(s), each having a prime implicants chart, prime implicant membership scores, (non-simplifying) easy counterfactuals and difficult counterfactuals.

Details

The argument data can be either a truth table object (created with the function truthTable()) or a data frame containing calibrated columns.

Calibration can be either crisp, with 2 or more values starting from 0, or fuzzy, with continuous scores from 0 to 1. Raw data containing relative frequencies can also be continuous between 0 and 1, but these are not calibrated, fuzzy data.

Some columns can contain the placeholder "-" indicating a “don't care”, which is used to indicate the temporal order between other columns in tQCA. These special columns are not causal conditions, hence no parameters of fit will be calculated for them.

The argument outcome specifies the column name to be explained. If the outcome is a multivalue column, it can be specified in curly bracket notation, indicating the value to be explained (the others being automatically converted to zero).

The outcome can be negated using the tilde operator, as in ~X. The logical argument neg.out, which controlled whether the outcome or its negation is explained, is now deprecated (although still accepted for backwards compatibility) and is replaced by the tilde in front of the outcome name.

If the outcome column is multi-value, the argument outcome should use the standard curly-bracket notation X{value}. Multiple values are allowed, separated by a comma (for example X{1,2}). Negation of the outcome can also be performed using the tilde ~ operator, for example ~X{1,2}, which is interpreted as: "all values in X except 1 and 2" and it becomes the new outcome to be explained.
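The recoding implied by the curly-bracket notation can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the helper recode_outcome() and the toy vector X are hypothetical.

```r
# Hypothetical helper: turn a multi-value outcome column into a binary one.
# `values` are the values named inside the curly brackets, e.g. X{1,2};
# `negate = TRUE` mimics a leading tilde, e.g. ~X{1,2}.
recode_outcome <- function(x, values, negate = FALSE) {
    hit <- x %in% values
    if (negate) hit <- !hit
    as.integer(hit)
}

X <- c(0, 1, 2, 3, 1, 0)                    # toy multi-value outcome

recode_outcome(X, c(1, 2))                  # X{1,2}
recode_outcome(X, c(1, 2), negate = TRUE)   # ~X{1,2}
```

In this sketch, the negated version is simply the complement of the first: every value other than 1 and 2 becomes the new outcome to be explained.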

Using both neg.out = TRUE and a tilde ~ in the outcome name does not cancel the negation: either one (or even both) signals that the outcome should be negated.

This function supports multiple outcomes, in which case all of them should also be specified in the argument conditions.

The argument conditions specifies the causal conditions' names among the other columns in the data. For backwards compatibility, this argument also accepts a character vector of condition variables' names. When this argument is not specified, all other columns except for the outcome are taken as causal conditions (and in case there are multiple outcomes, all columns are considered causal conditions).

As good practice, specify both outcome and conditions in upper case letters. A future version may allow negating outcomes using lower case letters, in which case it really does matter how the outcome and/or conditions are specified.

The argument relation is used to identify solutions which are sufficient for the outcome. When using relation = "suf", the function will return all solutions which are sufficient for the outcome (whether necessary or not). If using relation = "sufnec", only those solutions which are both sufficient and necessary will be returned.

The argument n.cut specifies the frequency threshold under which a truth table row is coded as a remainder, irrespective of its inclusion score.

The argument incl.cut replaces the (deprecated, but still backwards compatible) former arguments incl.cut1 and incl.cut0. Most of the analyses use the inclusion cutoff for the presence of the output (code "1"). When users need both inclusion cutoffs (see below), incl.cut can be specified as a vector of length 2, in the form: c(ic1, ic0) where:

ic1

is the inclusion cutoff for the presence of the output, a minimum sufficiency inclusion score above which the output value is coded with "1".

ic0

is the inclusion cutoff for the absence of the output, a maximum sufficiency inclusion score below which the output value is coded with "0".

If not specifically declared, ic0 is automatically set equal to ic1; otherwise, ic0 should always be lower than ic1.

Using these two cutoffs, the observed combinations are coded with:

"1"

if they have an inclusion score above ic1

"C"

if they have an inclusion score below ic1 and above ic0 (contradiction)

"0"

if they have an inclusion score below ic0
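The coding rule above, together with the frequency threshold n.cut, can be sketched in base R. The helper code_output() is hypothetical and not part of the package. How scores exactly equal to a cutoff are treated is an assumption here: since the default incl.cut is 1, the sketch lets a score equal to ic1 count as "1", and likewise lets a score equal to ic0 count as "C".

```r
# Hypothetical helper, not part of the QCA package.
# incl: sufficiency inclusion score of each truth table row
# n:    number of cases in each row
code_output <- function(incl, n, incl.cut = 1, n.cut = 1) {
    ic1 <- incl.cut[1]
    ic0 <- if (length(incl.cut) > 1) incl.cut[2] else ic1
    out <- rep("0", length(incl))
    out[incl >= ic0 & incl < ic1] <- "C"   # assumption: ic0 itself counts as "C"
    out[incl >= ic1] <- "1"                # assumption: ic1 itself counts as "1"
    out[n < n.cut] <- "?"                  # too few cases: remainder
    out
}

incl <- c(0.95, 0.80, 0.60, 0.30, 1.00)
n    <- c(4,    3,    2,    5,    0)

code_output(incl, n, incl.cut = c(0.9, 0.5))   # both cutoffs
code_output(incl, n, incl.cut = 0.9)           # ic0 defaults to ic1
```

With a single cutoff, no row can be coded "C" in this sketch, because the interval between ic0 and ic1 is empty.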

The argument explain specifies the output values corresponding to the truth table rows which enter in the minimization process. Such values can be "1", "C", "0", "1, C" and "0, C", but not "1, 0" and "1, 0, C". Note that for "0", "C" and "0, C", configurations will be reduced but no solution details printed.

The argument include specifies which other truth table rows are included in the minimization process. Most often, the remainders are included but any value accepted in the argument explain is also accepted in the argument include.

The argument row.dom is used to further eliminate redundant prime implicants when solving the PI chart, applying the principle of row dominance: if a prime implicant X covers the same configurations as another prime implicant Y and at the same time covers other configurations which Y does not cover, then Y is redundant and eliminated.
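A minimal base R sketch of the row dominance principle, on a toy PI chart stored as a logical matrix (rows are prime implicants, columns are configurations). The chart and the helper dominated() are hypothetical, and the package's own implementation may differ.

```r
# Toy PI chart: rows are prime implicants, columns are configurations.
chart <- rbind(
    "A*B" = c(TRUE,  TRUE,  TRUE,  FALSE),
    "A*c" = c(TRUE,  TRUE,  FALSE, FALSE),
    "b*D" = c(FALSE, FALSE, FALSE, TRUE)
)
colnames(chart) <- c("r1", "r2", "r3", "r4")

# Hypothetical helper: TRUE for every row that some other row dominates.
dominated <- function(chart) {
    sapply(seq_len(nrow(chart)), function(y) {
        any(sapply(seq_len(nrow(chart)), function(x) {
            x != y &&
                all(chart[x, ][chart[y, ]]) &&     # X covers everything Y covers
                sum(chart[x, ]) > sum(chart[y, ])  # and at least one more column
        }))
    })
}

# Keep only the prime implicants that are not dominated
chart[!dominated(chart), , drop = FALSE]
```

Here A*c covers r1 and r2, both of which A*B also covers along with r3, so A*c is dropped from the chart.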

When solving the PI chart, the algorithm finds the minimal number of prime implicants needed (k) to cover all configurations, then finds all possible combinations of k prime implicants which do cover those configurations. The argument all.sol presents all possible combinations of n prime implicants which solve the PI chart, where n >= k.

all.sol deactivates the argument row.dom, thus inflating the number of possible solutions. Depending on the complexity of the PI chart, sometimes it is not even possible to get all possible solutions.
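The search for the minimal number k can be illustrated with a brute-force sketch in base R. This is not the algorithm used by the package, only a small enumeration over a hypothetical toy chart; the helper covers() is likewise an assumption.

```r
# Toy PI chart: rows are prime implicants, columns are configurations.
chart <- rbind(
    P1 = c(TRUE,  TRUE,  FALSE, FALSE),
    P2 = c(FALSE, FALSE, TRUE,  TRUE),
    P3 = c(FALSE, TRUE,  TRUE,  FALSE),
    P4 = c(TRUE,  FALSE, FALSE, TRUE)
)

# Hypothetical helper: all sets of `size` PIs that cover every column.
covers <- function(chart, size) {
    sets <- combn(nrow(chart), size, simplify = FALSE)
    keep <- vapply(sets, function(s) {
        all(colSums(chart[s, , drop = FALSE]) > 0)
    }, logical(1))
    lapply(sets[keep], function(s) rownames(chart)[s])
}

# Smallest k for which at least one cover exists
# (assumes every column is covered by at least one PI)
k <- 1
while (length(covers(chart, k)) == 0) k <- k + 1
k
covers(chart, k)
```

Enumerating larger sets with covers(chart, k + 1) grows quickly with the size of the chart, which gives an intuition for why requesting all solutions can become infeasible. The sketch does not attempt to reproduce how the package filters such larger sets.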

The argument omit is used to exclude truth table rows from the minimization process, from the positive configurations and/or from the remainders. It can be specified as a vector of truth table line numbers, or as a matrix of causal combinations.

The argument dir.exp is used to specify directional expectations, as described by Ragin (2003). They can be specified as a single string, with values separated by commas. For multi-value directional expectations, they are specified together, separated by semicolons. The total length of the directional expectations must match the number of causal conditions specified in the analysis, using a dash "-" if there are no particular expectations for a specific causal condition.

Activating the details argument has the effect of printing parameters of fit for each prime implicant and each overall solution, the essential prime implicants being listed in the top part of the table. It also prints the truth table, in case the argument data has been provided as a data frame instead of a truth table object.

When argument show.cases is set to TRUE, the case names will be printed at their corresponding row in the truth table, and also at their corresponding prime implicants in the table containing the parameters of fit. Cases separated by commas belong to the same truth table row, while groups separated by semicolons belong to different truth table rows.

The argument inf.test combines the inclusion score with a statistical inference test, in order to assign values in the output column from the truth table (assuming the argument data is not already a truth table object). For the moment, it is only the binomial test, which needs crisp data (it doesn't work with fuzzy sets). For a given (specified) critical significance level, the output for a truth table row will be coded as:

"1"

if the true inclusion score is significantly higher than ic1,

"C"

contradiction, if the true inclusion score is not significantly higher than ic1 but significantly higher than ic0,

"0"

if the true inclusion score is not significantly higher than ic0.

It should be noted that statistical tests perform well only when the number of cases is large, otherwise they are usually not significant. For a low number of cases, depending on the inclusion cutoff value(s), it will be harder to code a value of "1" in the output, and also harder to obtain contradictions if the true inclusion is not significantly higher than ic0.
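The coding rule can be sketched with binom.test() from base R (package stats). The helper code_row() is hypothetical, and the use of a one-sided exact binomial test against each cutoff is an assumption about how such a test could be set up, not a description of the package's internal code.

```r
# Hypothetical helper, not the package's internal code.
# pos: cases in the row showing the outcome; n: all cases in the row
code_row <- function(pos, n, ic1, ic0 = ic1, alpha = 0.05) {
    # one-sided test: is the true inclusion significantly higher than p?
    higher <- function(p) {
        binom.test(pos, n, p = p, alternative = "greater")$p.value < alpha
    }
    if (higher(ic1)) "1" else if (higher(ic0)) "C" else "0"
}

code_row(pos = 48, n = 50, ic1 = 0.8, ic0 = 0.5)   # large row, high inclusion
code_row(pos = 35, n = 50, ic1 = 0.8, ic0 = 0.5)   # large row, medium inclusion
code_row(pos = 4,  n = 5,  ic1 = 0.8, ic0 = 0.5)   # small row
```

Under these assumptions, the last row has an observed inclusion of 0.8 but only five cases, so neither test is significant and the row is coded "0". This is the small-sample effect described above.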

The argument use.letters controls whether the original names of the causal conditions are used, or whether they are replaced by single letters in alphabetical order. If the causal conditions are already named with single letters, the original letters will be used.

References

Cebotari, V.; Vink, M.P. (2013) “A Configurational Analysis of Ethnic Protest in Europe”. International Journal of Comparative Sociology vol.54, no.4, pp.298-324.

Cebotari, V.; Vink, M.P. (2015) “Replication Data for: A configurational analysis of ethnic protest in Europe”, DOI: http://dx.doi.org/10.7910/DVN/PT2IB9, Harvard Dataverse, V2

Cronqvist, L.; Berg-Schlosser, D. (2009) “Multi-Value QCA (mvQCA)”, in Rihoux, B.; Ragin, C. (eds.) Configurational Comparative Methods. Qualitative Comparative Analysis (QCA) and Related Techniques, SAGE.

Dusa, A.; Thiem, A. (2015) “Enhancing the Minimization of Boolean and Multivalue Output Functions With eQMC” Journal of Mathematical Sociology vol.39, no.2, pp.92-108.

Ragin, C. (2003) Recent Advances in Fuzzy-Set Methods and Their Application to Policy Questions. WP 2003-9, COMPASSS. URL: http://www.compasss.org/wpseries/Ragin2003a.pdf.

Ragin, C. (2009) “Qualitative Comparative Analysis Using Fuzzy-Sets (fsQCA)”, in Rihoux, B.; Ragin, C. (eds.) Configurational Comparative Methods. Qualitative Comparative Analysis (QCA) and Related Techniques, SAGE.

Ragin, C.C.; Strand, S.I. (2008) “Using Qualitative Comparative Analysis to Study Causal Order: Comment on Caren and Panofsky (2005).” Sociological Methods & Research vol.36, no.4, pp.431-441.

Rihoux, B.; De Meur, G. (2009) “Crisp-Set Qualitative Comparative Analysis (csQCA)”, in Rihoux, B.; Ragin, C. (eds.) Configurational Comparative Methods. Qualitative Comparative Analysis (QCA) and Related Techniques, SAGE.

Examples

if (require("QCA")) {

# -----
# Lipset binary crisp data
data(LC)

# the associated truth table
ttLC <- truthTable(LC, "SURV", sort.by = "incl, n")
ttLC

# conservative solution (Rihoux & De Meur 2009, p.57)
cLC <- eqmcc(ttLC)
cLC

# view the Venn diagram for the associated truth table
library(venn)
venn(cLC)

# add details and case names
eqmcc(ttLC, details = TRUE, show.cases = TRUE)

# negating the outcome
ttLCn <- truthTable(LC, "~SURV", sort.by = "incl, n")
eqmcc(ttLCn)

# using a tilde instead of upper/lower case names
eqmcc(ttLCn, use.tilde = TRUE)

# parsimonious solution, positive output
pLC <- eqmcc(ttLC, include = "?", details = TRUE, show.cases = TRUE)
pLC

# the associated simplifying assumptions
pLC$SA

# parsimonious solution, negative output
pLCn <- eqmcc(ttLCn, include = "?", details = TRUE, show.cases = TRUE)
pLCn



# -----
# Lipset multi-value crisp data (Cronqvist & Berg-Schlosser 2009, p.80)
data(LM)

# truth table 
ttLM <- truthTable(LM, "SURV", conditions = "DEV, URB, LIT, IND",
        sort.by = "incl", show.cases = TRUE)

# conservative solution, positive output
eqmcc(ttLM, details = TRUE, show.cases = TRUE)

# parsimonious solution, positive output
eqmcc(ttLM, include = "?", details = TRUE, show.cases = TRUE)

# negate the outcome
ttLMn <- truthTable(LM, "~SURV", conditions = "DEV, URB, LIT, IND",
         sort.by = "incl", show.cases = TRUE)

# conservative solution, negative output
eqmcc(ttLMn, details = TRUE, show.cases = TRUE)

# parsimonious solution, negative output
eqmcc(ttLMn, include = "?", details = TRUE, show.cases = TRUE)



# -----
# Lipset fuzzy sets data (Ragin 2009, p.112)
data(LF)

# truth table using a very low inclusion cutoff
ttLF <- truthTable(LF, "SURV", incl.cut = 0.7,
        show.cases = TRUE, sort.by="incl")

# conservative solution
eqmcc(ttLF, details = TRUE, show.cases = TRUE)

# parsimonious solution
eqmcc(ttLF, include = "?", details = TRUE, show.cases = TRUE)

# intermediate solution using directional expectations
iLF <- eqmcc(ttLF, include = "?", details = TRUE, show.cases = TRUE,
             dir.exp = "1,1,1,1,1")


# -----
# Cebotari & Vink (2013, 2015)
data(CVF) 

ttCVF <- truthTable(CVF, outcome = "PROTEST", incl.cut = 0.8,
                    show.cases = TRUE, sort.by = "incl, n")

pCVF <- eqmcc(ttCVF, include = "?", details = TRUE, show.cases = TRUE)
pCVF

# inspect the PI chart
pCVF$PIchart

# DEMOC*ETHFRACT*poldis is dominated by DEMOC*ETHFRACT*GEOCON
# using row dominance to solve the PI chart
pCVFrd <- eqmcc(ttCVF, include = "?", row.dom = TRUE,
                details = TRUE, show.cases = TRUE)

# plot the prime implicants on the outcome
pims <- pCVFrd$pims

par(mfrow = c(2, 2))
for(i in 1:4) {
    XYplot(pims[, i], CVF$PROTEST, cex.axis = 0.6)
}


# -----
# temporal QCA (Ragin & Strand 2008)
data(RS)
eqmcc(RS, "REC", details = TRUE, show.cases = TRUE)

}
