For multilevel (and binary) treatment variables, the cem weights
are calculated with respect to the baseline. Therefore,
matched units whose treatment variable equals the baseline level receive
weight 1, and the others receive the usual cem weights. Unless specified
otherwise, the baseline is set
to "1". If this level is not one of the possible values taken by
the treatment variable, then the baseline is set to the first level of the treatment variable.
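The baseline fallback described above can be sketched in base R. The helper `pick_baseline` is hypothetical and is not part of cem; it also assumes that "first level" means the first level in the usual factor ordering.

```r
# Hypothetical helper illustrating the documented fallback rule.
# This is NOT cem's internal code, only a sketch of the behaviour.
pick_baseline <- function(treatment, baseline.group = "1") {
  lev <- levels(factor(treatment))
  # Use the requested baseline if it is an observed level,
  # otherwise fall back to the first level.
  if (baseline.group %in% lev) baseline.group else lev[1]
}

b1 <- pick_baseline(c(0, 1, 1, 0))          # "1" is an observed level
b2 <- pick_baseline(c("a", "b", "c", "a"))  # "1" is absent, first level is used
```

Under this sketch, matched units with treatment equal to `b1` (or `b2`) would be the ones receiving weight 1.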
When specifying cutpoints, several automatic methods may be chosen, including
``sturges'' (Sturges' rule, the default),
``fd'' (Freedman-Diaconis' rule), ``scott''
(Scott's rule) and ``ss'' (Shimazaki-Shinomoto's rule).
See references for a description of each rule.
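To get a feel for how the automatic rules differ, base R exposes three of them as bin-count functions in grDevices. The sketch below uses a made-up covariate `age`. Whether cem computes its bins with these exact functions is an assumption, and base R has no counterpart for ``ss''.

```r
# Made-up numeric covariate, for illustration only
set.seed(1)
age <- rnorm(100, mean = 40, sd = 10)

# Number of bins suggested by each rule (base R implementations)
n_bins <- c(
  sturges = grDevices::nclass.Sturges(age),
  fd      = grDevices::nclass.FD(age),
  scott   = grDevices::nclass.scott(age)
)
n_bins
```

In a cem call, a rule would be selected per variable by name, along the lines of cutpoints = list(age = "fd"), where `age` is a hypothetical variable name.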
The grouping option is a list where each element is itself a
list. For example, suppose the variable quest1 has the
possible levels "no answer", NA, "negative", "neutral" and
"positive", and you want to collect ("no answer", NA, "neutral")
into a single group. The grouping argument should then contain
list(quest1=list(c("no answer", NA, "neutral"))). Similarly, if you have
a discrete variable elements with values 1:10 and you want
to collect them into the groups ``1:3,NA'', ``4'',
``5:9'', ``10'', specify in grouping the
list list(elements=list(c(1:3,NA), 5:9)). Values not
listed in the grouping are left as they are. If cutpoints
and groupings are defined for the same variable, the
groupings take precedence and the corresponding cutpoints are set
to NULL.
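The effect of a grouping list can be illustrated with a small base R sketch. The helper `collapse_levels` is hypothetical and only mimics the documented behaviour (listed values are merged, unlisted values are left alone); it is not how cem implements grouping internally.

```r
# Hypothetical helper: merge the values listed in each group into one label,
# leaving every other value unchanged. NA is matched like any other value.
collapse_levels <- function(x, groups) {
  out <- as.character(x)
  for (g in groups) {
    label <- paste(ifelse(is.na(g), "NA", as.character(g)), collapse = "/")
    out[x %in% g] <- label
  }
  out
}

# The two grouping lists from the text
grouping <- list(
  quest1   = list(c("no answer", NA, "neutral")),
  elements = list(c(1:3, NA), 5:9)
)

quest1   <- c("no answer", NA, "negative", "neutral", "positive")
elements <- c(1:10, NA)

quest1_grouped   <- collapse_levels(quest1, grouping$quest1)
elements_grouped <- collapse_levels(elements, grouping$elements)
```

Here `elements_grouped` has four distinct values, matching the four groups named in the text, because 4 and 10 are not listed and so stay as they are.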
verbose: a number greater than or equal to 0. The higher the value, the
more information is printed during the execution of the algorithm.
If eval.imbalance = TRUE,
cem$imbalance contains the imbalance measure by absolute
difference in means for numerical variables and chi-square distance for
categorical variables. If FALSE (the default) then cem$imbalance is set
to NULL. If data contains missing data, the imbalance measures
are not calculated.
If L1.breaks is missing, the default rule used to calculate cutpoints
is Scott's rule.
If k2k is set to TRUE, the algorithm returns strata with
the same number of treated and control units per stratum; otherwise all
the matched units are returned (default). When k2k = TRUE,
the user can choose a method (among `euclidean',
`maximum', `manhattan', `canberra', `binary'
and `minkowski') for nearest neighbor matching inside each
cem stratum. By default method is set to `NULL',
which means random matching inside cem strata. For the Minkowski
distance the power can be specified via the argument mpower.
For more information on method != NULL, refer to
the dist help page.
If k2k is set to TRUE, keep.all is also set to TRUE.
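The method names above are the ones accepted by stats::dist, so their behaviour can be checked directly in base R. The two units below are made up. Treating mpower as the value passed to the p argument of dist is an assumption based on the description above.

```r
# Two made-up units measured on two covariates
units <- rbind(treated = c(0, 0), control = c(3, 4))

d_euclid <- as.numeric(stats::dist(units, method = "euclidean"))  # sqrt(3^2 + 4^2)
d_manh   <- as.numeric(stats::dist(units, method = "manhattan"))  # 3 + 4
d_max    <- as.numeric(stats::dist(units, method = "maximum"))    # max(3, 4)

# Minkowski distance with power 3 (the role presumably played by mpower)
d_mink <- as.numeric(stats::dist(units, method = "minkowski", p = 3))
```

With nearest neighbor matching, each treated unit in a stratum would be paired with the control unit at the smallest such distance.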
By default, cem treats missing values as distinct categories and
matches observations with missing values in the same variable in the
same stratum provided that all the remaining (coarsened) covariates
match.
If argument data is non-NULL and datalist is
NULL, CEM is applied to the single data set in data.
Argument datalist is a list of (multiply imputed) data frames
(i.e., with missing cell values imputed). If data is
NULL, the function cem is applied independently to each
element of the list, resulting in separately matched data sets with
different numbers of treated and control units.
When data and datalist are both non-NULL, each
multiply imputed observation is assigned to the stratum in which it has
been matched most frequently. In this case, the algorithm outputs the
same matching solution for each multiply imputed data set (i.e., an
observation, and the number of treated and control units matched, in one
data set has the same meaning in all, and is the same for all).
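The "most frequent stratum" rule can be sketched in base R as a majority vote across imputed data sets. The stratum ids below are made up, the helper `most_frequent` is hypothetical, and the tie-breaking rule (first id in table order) is an assumption. The text does not say how cem resolves ties.

```r
# Rows are observations, columns are imputed data sets;
# entries are made-up stratum ids from matching each data set separately.
strata <- rbind(
  obs1 = c(3, 3, 5, 3, 3),
  obs2 = c(7, 2, 2, 2, 7),
  obs3 = c(1, 1, 1, 1, 1)
)

# Hypothetical helper: return the id that occurs most often in one row.
# Ties go to the first id in table order (an assumption).
most_frequent <- function(ids) {
  counts <- table(ids)
  names(counts)[which.max(counts)]
}

assigned <- apply(strata, 1, most_frequent)
```

Each observation then carries a single stratum id that is reused in every imputed data set, which is why the matching solution is identical across them.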