
cv.CausalANOVA implements cross-validation for CausalANOVA to select the collapse.cost parameter. CausalANOVA runs this function internally when collapse.type="cv.min" or collapse.type="cv.1Std".
cv.CausalANOVA(
formula,
int2.formula = NULL,
int3.formula = NULL,
data,
nway = 1,
pair.id = NULL,
diff = FALSE,
cv.collapse.cost = c(0.1, 0.3, 0.7),
nfolds = 5,
screen = FALSE,
screen.type = "fixed",
screen.num.int = 3,
family = "binomial",
cluster = NULL,
maxIter = 50,
eps = 1e-05,
seed = 1234,
fac.level = NULL,
ord.fac = NULL,
verbose = TRUE
)
formula: A formula that specifies outcome and treatment variables.
int2.formula: (optional) A formula that specifies two-way interactions.
int3.formula: (optional) A formula that specifies three-way interactions.
data: An optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'CausalANOVA' is called.
nway: With nway=1, the function estimates the Average Marginal Effects (AMEs) only. With nway=2, the function estimates the AMEs and the two-way Average Marginal Interaction Effects (AMIEs). With nway=3, the function estimates the AMEs and the two-way and three-way AMIEs. Default is 1.
pair.id: (optional) Unique identifiers for each comparison pair. This option is used when diff=TRUE.
diff: A logical indicating whether the outcome is the choice between a pair. If diff=TRUE, pair.id should identify each comparison pair. Default is FALSE.
cv.collapse.cost: A vector containing candidate values for the cost parameter, each ranging from 0 to 1. A value of 1 corresponds to no regularization, and smaller values correspond to stronger regularization. Default is c(0.1,0.3,0.7).
nfolds: Number of folds. Default is 5. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets.
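To make the role of nfolds and seed concrete, here is a minimal base R sketch of how observations might be split into folds reproducibly. It is an illustration only and not the package's internal code; the sample size n and the variable names are hypothetical.

```r
# Hypothetical sketch: assign n observations to nfolds folds reproducibly.
# This is NOT the internal implementation of cv.CausalANOVA.
n <- 23          # illustrative sample size
nfolds <- 5
set.seed(1234)   # plays the same role as the `seed` argument

# Repeat the fold labels 1..nfolds up to length n, then shuffle them.
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one observation.
fold_sizes <- table(fold_id)
print(fold_sizes)

# Observations held out in fold 1, and those used for fitting.
test_idx  <- which(fold_id == 1)
train_idx <- which(fold_id != 1)
```

Setting nfolds equal to n in this sketch gives one observation per fold, which is the leave-one-out case mentioned above.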
screen: A logical indicating whether to select significant factor interactions with glinternet. When users specify interactions using int2.formula or int3.formula, this option is ignored. screen should be used only when users want data-driven selection of factor interactions. With screen.type, users can specify how to screen factor interactions. We recommend using this option when the number of factors is large, e.g., more than 6. Default is FALSE.
screen.type: Type for screening factor interactions. (1) "fixed" selects a fixed number (specified by screen.num.int) of factor interactions. (2) "cv.min" selects factor interactions with the tuning parameter giving the minimum cross-validation error. (3) "cv.1Std" selects factor interactions with the tuning parameter giving a cross-validation error that is within 1 standard deviation of the minimum cv error.
screen.num.int: (optional) The number of factor interactions to select. This option is used when screen=TRUE and screen.type="fixed". Default is 3.
family: A family of outcome variables: "gaussian" for continuous outcomes, "binomial" for binary outcomes. Default is "binomial".
cluster: Unique identifiers with which cluster standard errors are computed.
maxIter: The maximum number of iterations for glinternet.
eps: A tolerance parameter in the internal optimization algorithm.
seed: An argument for set.seed().
fac.level: (optional) A vector containing the number of levels in each factor. The order of fac.level should match the order of columns in the data. For example, when the first and second columns of the design matrix are "Education" and "Race", the first and second elements of fac.level should be the number of levels in "Education" and "Race", respectively.
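As a rough illustration of the ordering requirement for fac.level, the vector can be derived from the factor columns themselves so that it follows the column order. The toy data frame and its values below are hypothetical.

```r
# Toy data for illustration only; the values are made up.
dat <- data.frame(
  Education = factor(c("HS", "BA", "MA", "BA", "HS")),
  Race      = factor(c("A", "B", "A", "B", "A")),
  y         = c(1, 0, 1, 1, 0)
)

# Factor columns, listed in the same order as they appear in the data.
factor_cols <- c("Education", "Race")

# Number of levels per factor, in column order.
fac_level <- vapply(dat[factor_cols], nlevels, integer(1))
print(fac_level)
```

Here the first element counts the levels of Education and the second counts the levels of Race, matching the column order.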
ord.fac: (optional) A logical vector indicating whether each factor has ordered (TRUE) or unordered (FALSE) levels. When levels are ordered, the function uses the order given by levels() and places penalties on the differences between adjacent levels. When levels are unordered, the function places penalties on the differences for every pairwise comparison of levels.
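The difference between the two penalty structures can be sketched by listing which pairs of levels are compared in each case. The helper penalty_pairs() below is hypothetical and only enumerates the pairs; it does not compute any penalty and is not part of the package. It assumes a factor with at least two levels.

```r
# Hypothetical helper: list the level pairs whose differences would be
# penalized, for a factor with at least two levels.
penalty_pairs <- function(f) {
  lev <- levels(f)
  k <- length(lev)
  if (is.ordered(f)) {
    # Ordered factor: adjacent levels only, (1,2), (2,3), ..., (k-1,k).
    idx <- cbind(seq_len(k - 1), seq_len(k - 1) + 1)
  } else {
    # Unordered factor: every pairwise comparison, choose(k, 2) pairs.
    idx <- t(utils::combn(k, 2))
  }
  data.frame(level_a = lev[idx[, 1]], level_b = lev[idx[, 2]])
}

lv <- c("jobs", "clinic", "education")
ordered_pairs   <- penalty_pairs(factor(lv, levels = lv, ordered = TRUE))
unordered_pairs <- penalty_pairs(factor(lv, levels = lv, ordered = FALSE))

print(ordered_pairs)    # 2 adjacent pairs
print(unordered_pairs)  # 3 pairwise comparisons
```

With k levels, the ordered case compares k - 1 pairs and the unordered case compares k(k - 1)/2 pairs, so the gap widens quickly as the number of levels grows.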
verbose: Whether to print the value of the cost parameter used.
The mean cross-validated error, a vector of length length(cv.t).
A value of t that gives minimum cv.missclass.
The largest value of t such that error is within 1 standard error of the minimum.
A matrix containing cross-validation errors for each fold and cost parameter.
The cv.collapse.cost used in the function.
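The relationship between the returned quantities can be sketched in base R from a fold-by-cost error matrix. The error values and variable names below are hypothetical, and the standard error is assumed to be the one at the minimizing cost; the package may compute these quantities differently.

```r
# Candidate cost parameters (the default cv.collapse.cost).
cost <- c(0.1, 0.3, 0.7)

# Hypothetical cross-validation errors: one row per fold, one column per cost.
cv_each <- matrix(
  c(0.30, 0.26, 0.28,
    0.34, 0.30, 0.26,
    0.32, 0.22, 0.27,
    0.36, 0.26, 0.28,
    0.33, 0.26, 0.26),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("fold", 1:5), cost)
)

# Mean cross-validated error and its standard error for each cost.
cv_error <- colMeans(cv_each)
cv_se <- apply(cv_each, 2, sd) / sqrt(nrow(cv_each))

# Cost with the minimum mean error.
i_min <- which.min(cv_error)
cost_min <- cost[i_min]

# Largest cost whose mean error is within one standard error of the minimum
# (assumption: the standard error is taken at the minimizing cost).
threshold <- cv_error[i_min] + cv_se[i_min]
cost_1se <- max(cost[cv_error <= threshold])

print(cost_min)
print(cost_1se)
```

In this made-up matrix the minimum mean error occurs at cost 0.3, while cost 0.7 is the largest candidate whose mean error stays within one standard error of that minimum.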
See Details in CausalANOVA.
Post, J. B. and Bondell, H. D. 2013. "Factor selection and structural identification in the interaction ANOVA model." Biometrics 69(1): 70-79.
Egami, Naoki and Kosuke Imai. 2019. "Causal Interaction in Factorial Experiments: Application to Conjoint Analysis." Journal of the American Statistical Association. http://imai.fas.harvard.edu/research/files/int.pdf
# NOT RUN {
library(FindIt)  # package that provides cv.CausalANOVA and the Carlson data
data(Carlson)
## Specify the order of each factor
Carlson$newRecordF <- factor(Carlson$newRecordF, ordered=TRUE,
                             levels=c("YesLC", "YesDis", "YesMP",
                                      "noLC", "noDis", "noMP", "noBusi"))
Carlson$promise <- factor(Carlson$promise, ordered=TRUE,
                          levels=c("jobs", "clinic", "education"))
Carlson$coeth_voting <- factor(Carlson$coeth_voting, ordered=FALSE, levels=c("0", "1"))
Carlson$relevantdegree <- factor(Carlson$relevantdegree, ordered=FALSE, levels=c("0", "1"))
## #######################################
## Collapsing Without Screening
## #######################################
#################### AMEs and two-way AMIEs ####################
## We show a very small example for illustration.
## Recommended to use cv.collapse.cost=c(0.1,0.3,0.5) and nfolds=10 in practice.
fit.cv <- cv.CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                         int2.formula = ~ newRecordF:coeth_voting,
                         data=Carlson, pair.id=Carlson$contestresp, diff=TRUE,
                         cv.collapse.cost=c(0.1,0.3), nfolds=2,
                         cluster=Carlson$respcodeS, nway=2)
fit.cv
# }