cv.CausalANOVA: Cross validation for the CausalANOVA.

Description

cv.CausalANOVA implements cross-validation for CausalANOVA to select the collapse.cost parameter. CausalANOVA runs this function internally when defaults when collapse.type=cv.min or collapse.type=cv.1Std.

Usage

cv.CausalANOVA(
  formula,
  int2.formula = NULL,
  int3.formula = NULL,
  data,
  nway = 1,
  pair.id = NULL,
  diff = FALSE,
  cv.collapse.cost = c(0.1, 0.3, 0.7),
  nfolds = 5,
  screen = FALSE,
  screen.type = "fixed",
  screen.num.int = 3,
  family = "binomial",
  cluster = NULL,
  maxIter = 50,
  eps = 1e-05,
  seed = 1234,
  fac.level = NULL,
  ord.fac = NULL,
  verbose = TRUE
)

Arguments

formula

a formula that specifies outcome and treatment variables.

int2.formula

(optional). A formula that specifies two-way interactions.

int3.formula

(optional). A formula that specifies three-way interactions.

data

an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'CausalANOVA' is called.

nway

With nway=1, the function estimates the Average Marginal Effects (AMEs) only. With nway=2, the function estimates the AMEs and the two-way Average Marginal Interaction Effects (AMIEs). With nway=3, the function estimates the AMEs, the two-way and three-way AMIEs. Default is 1.

pair.id

(optional).Unique identifiers for each pair of comparison. This option is used when diff=TRUE.

diff

A logical indicating whether the outcome is the choice between a pair. If diff=TRUE, pair.id should specify a pair of comparison. Default is FALSE.

cv.collapse.cost

A vector containing candidates for a cost parameter ranging from 0 to 1. 1 corresponds to no regularization and the smaller value corresponds to the stronger regularization. Default is c(0.1,0.3,0.7).

nfolds

number of folds - default is 5. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets.

screen

A logical indicating whether select significant factor interactions with glinternet. When users specify interactions using int2.formula or int3.formula, this option is ignored. screen should be used only when users want data-driven selection of factor-interactions. With screen.type, users can specify how to screen factor interactions. We recommend to use this option when the number of factors is large, e.g., more than 6. Default is FALSE.

screen.type

Type for screening factor interactions. (1) "fixed" select the fixed number (specified by screen.num.int) of factor interactions. (2) "cv.min" selects factor-interactions with the tuning parameter giving the minimum cross-validation error. (3) "cv.1Std" selects factor-interactions with the tuning parameter giving a cross-validation error that is within 1 standard deviation of the minimum cv error.

screen.num.int

(optional).The number of factor interactions to select. This option is used when and screen=TRUE and screen.type="fixed". Default is 3.

family

A family of outcome variables. "gaussian" when continuous outcomes "binomial" when binary outcomes. Default is "binomial".

cluster

Unique identifies with which cluster standard errors are computed.

maxIter

The number of maximum iteration for glinternet.

eps

A tolerance parameter in the internal optimization algorithm.

seed

an argument for set.seed().

fac.level

optional. A vector containing the number of levels in each factor. The order of fac.level should match to the order of columns in the data. For example, when the first and second columns of the design matrix is "Education" and "Race", the first and second element of fac.level should be the number of levels in "Education" and "Race", respectively.

ord.fac

optional. logical vectors indicating whether each factor has ordered (TRUE) or unordered (FALSE) levels. When levels are ordered, the function uses the order given by function levels(). If levels are ordered, the function places penalties on the differences between adjacent levels. If levels are unordered, the function places penalties on the differences based on every pairwise comparison.

verbose

whether it prints the value of a cost parameter used.

Value

cv.error

The mean cross-validated error - a vector of length length(cv.t).

cv.min

A value of t that gives minimum cv.missclass.

cv.1Std

The largest value of t such that error is within 1 standard error of the minimum.

cv.each.mat

A matrix containing cross-validation errors for each fold and cost parameter.

cv.cost

The cv.collapse.cost used in the function.

Details

See Details in CausalANOVA.

References

Post, J. B. and Bondell, H. D. 2013. ``Factor selection and structural identification in the interaction anova model.'' Biometrics 69, 1, 70--79.

Egami, Naoki and Kosuke Imai. 2019. Causal Interaction in Factorial Experiments: Application to Conjoint Analysis, Journal of the American Statistical Association. http://imai.fas.harvard.edu/research/files/int.pdf

Examples

Run this code

# NOT RUN {
data(Carlson)
## Specify the order of each factor
Carlson$newRecordF<- factor(Carlson$newRecordF,ordered=TRUE,
                            levels=c("YesLC", "YesDis","YesMP",
                                     "noLC","noDis","noMP","noBusi"))
Carlson$promise <- factor(Carlson$promise,ordered=TRUE,levels=c("jobs","clinic","education"))
Carlson$coeth_voting <- factor(Carlson$coeth_voting,ordered=FALSE,levels=c("0","1"))
Carlson$relevantdegree <- factor(Carlson$relevantdegree,ordered=FALSE,levels=c("0","1"))

## ####################################### 
## Collapsing Without Screening
## ####################################### 
#################### AMEs and two-way AMIEs ####################
## We show a very small example for illustration.
## Recommended to use cv.collapse.cost=c(0.1,0.3,0.5) and nfolds=10 in practice.
fit.cv <- cv.CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                         int2.formula = ~ newRecordF:coeth_voting,
                         data=Carlson, pair.id=Carlson$contestresp,diff=TRUE,
                         cv.collapse.cost=c(0.1,0.3), nfolds=2,
                         cluster=Carlson$respcodeS, nway=2)
fit.cv

# }

Run the code above in your browser using DataLab