cv.CausalANOVA: Cross-validation for CausalANOVA.

Description

cv.CausalANOVA implements cross-validation for CausalANOVA to select the cost parameter.

Usage

cv.CausalANOVA(formula,data,cv.cost=c(0.1,0.3,0.5,0.7,1.0), type="bin",
               pair.id=NULL,nway=2,diff=TRUE,eps=1e-5,nfolds=10,seed=1234)

Arguments

formula

a formula that specifies outcome and treatment variables.

data

an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'CausalANOVA' is called.

cv.cost

a vector of candidate values for the cost parameter, each between 0 and 1. A value of 1 corresponds to no regularization; smaller values correspond to stronger regularization. Default is c(0.1,0.3,0.5,0.7,1.0).

type

When the outcome is binary, set type to "bin"; the cross-validation error is then based on misclassification. When the outcome is continuous, set type to "cont"; the cross-validation error is then based on the mean squared error.
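The two error measures named above can be sketched in base R. This is a minimal illustration on invented numbers, not the package's internal code; the variable names and the 0.5 classification cutoff are assumptions made for the sketch.

```r
# Hypothetical held-out outcomes and predictions (invented for illustration).
y_bin  <- c(1, 0, 1, 1, 0)             # binary outcome
p_hat  <- c(0.8, 0.4, 0.3, 0.9, 0.6)   # predicted probabilities
y_cont <- c(2.0, 3.5, 1.0, 4.0)        # continuous outcome
y_hat  <- c(2.5, 3.0, 1.0, 5.0)        # predicted values

# Misclassification rate: share of units whose thresholded prediction
# (cutoff 0.5, an assumption) disagrees with the observed outcome.
misclass <- mean((p_hat > 0.5) != (y_bin == 1))

# Mean squared error for a continuous outcome.
mse <- mean((y_cont - y_hat)^2)

print(c(misclass = misclass, mse = mse))
```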

pair.id

unique identifiers for each pair of comparison. This option is used when diff=TRUE.

nway

"2" when the two way causal interactions are of interest and "3" when the three-way and two-way causal interactions are of interest. Default is 2.

diff

a logical indicating whether the outcome is the choice between a pair. If diff=TRUE, pair.id should identify the pair to which each observation belongs.

eps

a tolerance parameter in the internal optimization algorithm.

nfolds

number of folds. Default is 10. nfolds can be as large as the sample size (leave-one-out CV), but this is not recommended for large datasets.

seed

an argument for set.seed().
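How nfolds and seed interact can be sketched in base R. This is one common way to assign folds reproducibly, under the assumption that units are split at random into roughly equal folds; the package's internal fold assignment may differ, and the names n and fold_id are hypothetical.

```r
n      <- 23     # hypothetical sample size
nfolds <- 10
seed   <- 1234

# Fixing the seed makes the random fold assignment reproducible.
set.seed(seed)

# Recycle the labels 1..nfolds until they cover n units, then shuffle them.
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one unit.
print(table(fold_id))
```

Setting nfolds equal to n in this sketch gives one unit per fold, which is the leave-one-out case mentioned above.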

Value

cv.error

The mean cross-validated error: a vector of length length(cv.cost).

cv.min

value of the cost parameter that gives the minimum cv.error.

cv.1sd

largest value of the cost parameter such that the error is within 1 standard error of the minimum.

cv.each.mat

a matrix containing cross-validation errors for each fold and cost parameter.

cv.cost

the cv.cost used in the function.
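The relationship between these components can be sketched in base R. The error matrix below is invented, its orientation (folds in rows, cost values in columns) is an assumption, and the names cv_each, cv_error, cv_se, cv_min and cv_1sd are hypothetical stand-ins rather than the package's internals.

```r
cv.cost <- c(0.1, 0.3, 0.5, 0.7, 1.0)

# Hypothetical fold-by-cost error matrix: rows are folds, columns are costs.
cv_each <- rbind(
  c(0.40, 0.34, 0.31, 0.32, 0.36),
  c(0.42, 0.36, 0.33, 0.31, 0.35),
  c(0.38, 0.35, 0.29, 0.32, 0.37),
  c(0.44, 0.33, 0.31, 0.31, 0.36)
)

# Mean error and its standard error across folds, for each cost value.
cv_error <- colMeans(cv_each)
cv_se    <- apply(cv_each, 2, sd) / sqrt(nrow(cv_each))

# Cost value with the smallest mean cross-validated error.
best   <- which.min(cv_error)
cv_min <- cv.cost[best]

# One-standard-error rule as described above: the largest cost whose mean
# error is within one standard error of the minimum.
cv_1sd <- max(cv.cost[cv_error <= cv_error[best] + cv_se[best]])

print(c(cv_min = cv_min, cv_1sd = cv_1sd))
```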

Details

Suggested workflow.

1. Specify the order of levels within each factor using levels(). When levels are ordered, the function penalizes the differences between adjacent levels, so the order of levels within each factor must be specified carefully.
2. Run cv.CausalANOVA. Select the cost parameter that minimizes the cross-validation error, or choose the largest value of cost such that the error is within 1 standard error of the minimum. plot.cv.CausalANOVA can be used to investigate how cross-validation errors vary with the cost parameter.
3. Run CausalANOVA. Fit the main model with the chosen cost parameter and inspect the results with summary.CausalANOVA. To compute selection probabilities, set select.prob=TRUE. Because this is computationally intensive, we recommend computing selection probabilities only once the model is finalized. The selection probability for the range of the AME (AMIE) is one minus the proportion of bootstrap replicates in which all coefficients for the corresponding factor (factor interaction) are estimated to be zero. The selection probability of the AME (AMIE) is the proportion of bootstrap replicates in which the sign of the effect is the same as the point estimate.
4. Investigate two-way interactions. Run plot.CausalANOVA and visualize the AMIEs by choosing two factors of interest. Run AMIE to examine the decomposition of the average combination effect into the AMIE and AMEs.
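The two selection probabilities defined in the workflow can be illustrated with a small base R sketch. The bootstrap matrix and point estimate are invented, all object names are hypothetical, and treating the first column as the replicates of a single AME is a simplification; this is not the package's implementation.

```r
# Hypothetical bootstrap output: one row per replicate, one column per
# coefficient of a single factor (values invented for illustration).
boot_coef <- rbind(
  c( 0.00, 0.00),
  c( 0.12, 0.05),
  c( 0.08, 0.00),
  c( 0.00, 0.00),
  c(-0.03, 0.02)
)
point_est <- 0.10             # hypothetical point estimate of one AME
boot_ame  <- boot_coef[, 1]   # assumed replicates of that same AME

# Range: one minus the share of replicates in which every coefficient
# of the factor is estimated to be exactly zero.
all_zero       <- rowSums(boot_coef != 0) == 0
sel_prob_range <- 1 - mean(all_zero)

# Effect: share of replicates whose sign matches the point estimate.
sel_prob_effect <- mean(sign(boot_ame) == sign(point_est))

print(c(range = sel_prob_range, effect = sel_prob_effect))
```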

References

Post, J. B. and Bondell, H. D. 2013. "Factor selection and structural identification in the interaction anova model." Biometrics 69, 1, 70-79.

Egami, Naoki and Kosuke Imai. 2016+. "Causal Interaction in Factorial Experiments: Application to Conjoint Analysis." Working paper. http://imai.princeton.edu/research/files/int.pdf

Examples

# NOT RUN {
library(FindIt)
data(Carlson)

## Specify the order of each factor
Carlson$newRecordF <- factor(Carlson$newRecordF, ordered=TRUE,
                             levels=c("YesLC","YesDis","YesMP",
                                      "noLC","noDis","noMP","noBusi"))
Carlson$promise <- factor(Carlson$promise, ordered=TRUE,
                          levels=c("jobs","clinic","education"))
Carlson$coeth_voting <- factor(Carlson$coeth_voting, ordered=FALSE, levels=c("0","1"))
Carlson$relevantdegree <- factor(Carlson$relevantdegree, ordered=FALSE, levels=c("0","1"))

## Run cv.CausalANOVA
cv.fit <- cv.CausalANOVA(won ~ newRecordF + promise + coeth_voting + relevantdegree,
                         data=Carlson,
                         pair.id=Carlson$contestresp, diff=TRUE, nway=2)

cv.fit
plot(cv.fit)

## Run CausalANOVA with the chosen cost parameter
fit <- CausalANOVA(won ~ newRecordF + promise + coeth_voting + relevantdegree,
                   data=Carlson,
                   pair.id=Carlson$contestresp, diff=TRUE, nway=2, cost=0.15)

## Or when we need selection probabilities.
fit <- CausalANOVA(won ~ newRecordF + promise + coeth_voting + relevantdegree,
                   data=Carlson,
                   pair.id=Carlson$contestresp, diff=TRUE, nway=2, cost=0.15,
                   select.prob=TRUE, boot=500, block.id=Carlson$respcodeS)

summary(fit)

## plot
plot(fit, fac.name=c("newRecordF","coeth_voting"))

## compute AMIEs
amie1 <- AMIE(fit, fac.name=c("promise","newRecordF"),
              level.name=c("jobs","noLC"),
              base.name=c("jobs","YesLC"))

## level names must match the factor levels set above ("noBusi", not "noBus")
amie2 <- AMIE(fit, fac.name=c("newRecordF","coeth_voting"),
              level.name=c("noBusi","1"),
              base.name=c("noMP","0"))
# }
