- dat
data, a data frame with a treatment assignment or missingness indicator, covariates, and, optionally, outcomes.
- ind
treatment assignment or missingness indicator, a string with the name of the binary treatment or missingness indicator, equal to 1 if treated (missing) and 0 otherwise.
When par$par_est = "aux", ind is omitted.
- out
outcome, a vector of strings with the names of the outcome variables. The default is NULL.
- bal
balance requirements, a list with the requirements for covariate balance with the form list(bal_cov, bal_alg, bal_tol, bal_std, bal_gri, bal_sam), where:
bal_cov
balance covariates, a vector of strings with the names of the covariates in dat to be balanced.
In simple applications, the balance covariates in bal_cov will be the column names of dat for the original covariates in the data set (excluding the treatment and outcome variables). The covariates need to be either continuous or binary; categorical covariates need to be transformed into dummies.
In more complex applications, the covariates in dat can be transformations of the original covariates in order to balance higher-order single- or multidimensional moments, or other basis functions. If the transformations of a covariate are indicators of the quantiles of its empirical distribution, then balancing all these indicators will tend to balance the entire marginal distribution of the covariate.
bal_alg
balance algorithm, a logical that indicates whether the tuning algorithm in Wang and Zubizarreta (2020) is to be used for automatically selecting the degree of approximate covariate balance. The default is TRUE.
See the argument bal_gri below for the candidate values for the degree of approximate covariate balance.
bal_tol
balance tolerances, a scalar or vector of scalars that define the tolerances or maximum differences in means after weighting for the covariates (or transformations thereof) defined in bal_cov.
Note that if bal_tol is a vector, then its length has to be equal to the length of bal_cov. Otherwise, the first element in bal_tol will be taken as the balance tolerance for all the constraints in bal_cov.
bal_std
balance tolerances in standard deviations, a string that specifies how the tolerances are adjusted. If bal_std = "group", the tolerances are proportional to the standard deviations in the group or groups to be weighted. If bal_std = "target", the tolerances are proportional to the standard deviations in the target group. If bal_std = "manual", the tolerances are equal to bal_tol.
The default is "group".
bal_gri
grid of values for the tuning algorithm bal_alg, a vector of candidate values for the degree of approximate covariate balance.
The default is c(0.0001, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1).
The computational time is roughly proportional to the number of grid values.
bal_sam
number of replicates to be used in bal_alg, an integer specifying the number of bootstrap sample replicates to be used to select the degree of approximate covariate balance. See Wang and Zubizarreta (2020) for details. The default is 1000.
- wei
weighting constraints, a list with all the weighting constraints with the form list(wei_sum, wei_pos), where:
wei_sum
sum of weights, a logical variable indicating whether the weights are constrained to sum up to one, or whether their sum is unconstrained. The default is TRUE for the sum of weights equal to one. Note that if wei_sum = TRUE, then wei_pos = TRUE.
wei_pos
positive or zero (non-negative) weights, a logical variable indicating whether the weights are constrained to be non-negative, or whether they are unconstrained. The default is TRUE for non-negative weights. Again, note that if wei_sum = TRUE, then wei_pos = TRUE.
- sol
solver, a list that specifies the solver options with the form list(sol_nam, sol_dis, sol_pog), where:
sol_nam
solver name, a string equal to either "cplex", "gurobi", "mosek", "osqp", "pogs", or "quadprog".
CPLEX, Gurobi and MOSEK are commercial solvers, but free for academic users. POGS and QUADPROG are free for all.
In our experience, POGS is the fastest solver option and able to handle larger datasets, but it can be difficult to install for non-Mac users and more difficult to calibrate. MOSEK is more stable than POGS and faster.
The default option is sol_nam = "quadprog".
sol_dis
solver display, a logical variable indicating whether the solver output is to be displayed or not. The default is FALSE. This option is specific to "cplex", "gurobi", "mosek", "pogs", and "osqp".
sol_pog
solver options specific to "pogs", with the following default parameters:
sol_pog = list(sol_pog_max_iter = 100000, sol_pog_rel_tol = 1e-4, sol_pog_abs_tol = 1e-4, sol_pog_gap_stp = TRUE, sol_pog_adp_rho = TRUE).
See the POGS manual for details.
- par
parameter of interest, a list describing the parameter of interest or estimand with the form list(par_est, par_tar), where:
par_est
estimand. For causal inference, a string equal to "att" (Average Treatment effect on the Treated), "atc" (Average Treatment effect on the Controls), "ate" (Average Treatment Effect), or "cate" (Conditional Average Treatment Effect).
For estimation with incomplete outcome data, a string equal to "pop" (general population means) or "aux" (means for a population specified by the user).
The default is "att".
par_tar
target, a string or a vector of scalars. It specifies the targeted population for inference in terms of the observed covariates when par_est = "cate", "pop" or "aux". Please see the examples.
- mes
a logical variable indicating whether the messages are printed.
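
The arguments above can be assembled as in the following minimal sketch. It is an illustration under stated assumptions, not a reference implementation: the toy data and all column names (treat, y, age, female, region, age_q1, ...) are hypothetical, and the final call assumes that the arguments documented here belong to sbw() in the sbw package, which this section does not state explicitly. The call is only attempted if that package is installed.

```r
set.seed(1)
n <- 200

# Hypothetical toy data: binary treatment `treat`, outcome `y`, three covariates.
dat <- data.frame(
  treat  = rbinom(n, 1, 0.4),
  age    = rnorm(n, mean = 40, sd = 10),
  female = rbinom(n, 1, 0.5),
  region = sample(c("north", "south", "west"), n, replace = TRUE)
)
dat$y <- 1 + 0.5 * dat$treat + 0.02 * dat$age + rnorm(n)

# Categorical covariates have to enter as dummies ("north" is the reference level).
dat$region_south <- as.numeric(dat$region == "south")
dat$region_west  <- as.numeric(dat$region == "west")
dat$region <- NULL

# Quartile indicators of age: balancing all of them tends to balance the
# marginal distribution of age, not only its mean.
age_q <- cut(dat$age,
             breaks = quantile(dat$age, probs = seq(0, 1, by = 0.25)),
             include.lowest = TRUE, labels = FALSE)
for (k in 1:4) dat[[paste0("age_q", k)]] <- as.numeric(age_q == k)

bal_cov <- c("age", "female", "region_south", "region_west",
             paste0("age_q", 1:4))

# A scalar tolerance applies to every covariate; a vector must match bal_cov.
bal_tol <- 0.02
stopifnot(length(bal_tol) %in% c(1, length(bal_cov)))

# bal_alg = FALSE skips the tuning algorithm, so bal_gri and bal_sam are unused.
bal <- list(bal_cov = bal_cov, bal_alg = FALSE,
            bal_tol = bal_tol, bal_std = "group")
wei <- list(wei_sum = TRUE, wei_pos = TRUE)
sol <- list(sol_nam = "quadprog", sol_dis = FALSE)
par <- list(par_est = "att", par_tar = NULL)

args <- list(dat = dat, ind = "treat", out = "y",
             bal = bal, wei = wei, sol = sol, par = par, mes = FALSE)

# Assumption: the documented arguments are those of sbw::sbw().
if (requireNamespace("sbw", quietly = TRUE)) {
  fit <- do.call(sbw::sbw, args)
}
```

Setting bal_alg = FALSE keeps the sketch fast; with bal_alg = TRUE the tolerance would instead be selected from bal_gri using bal_sam bootstrap replicates.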