For multilevel (and binary) treatment variables, the cem weights are calculated with respect to the baseline. Therefore, matched units whose treatment variable equals the baseline level receive weight 1, and the others receive the usual cem weights. Unless otherwise specified, baseline is set to "1" by default. If this level is not one of the values taken by the treatment variable, the baseline is set to the first level of the treatment variable.
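The weighting rule above can be illustrated with a small base-R sketch. It is not the cem package's internal code: the function name cem_weights_sketch is hypothetical, unmatched units are assumed to be marked with an NA stratum, and the weight for a matched unit of level k in stratum s is assumed to be (baseline units in s / level-k units in s) * (matched level-k units / matched baseline units), which reduces to the familiar CEM control weight in the binary case.

```r
# Hypothetical sketch (not cem internals): CEM-style weights for a
# multilevel treatment, computed relative to a baseline level.
# 'stratum' is NA for unmatched units (an assumption of this sketch).
cem_weights_sketch <- function(stratum, treat, baseline = "1") {
  treat <- as.character(treat)
  # Fall back to the first level if the requested baseline is absent.
  if (!(baseline %in% treat)) baseline <- levels(factor(treat))[1]
  matched <- !is.na(stratum)
  w <- numeric(length(treat))              # unmatched units keep weight 0
  tab <- table(treat[matched])
  m_tot <- setNames(as.integer(tab), names(tab))  # matched units per level
  for (i in which(matched)) {
    in_s <- matched & stratum == stratum[i]
    m_b_s <- sum(in_s & treat == baseline)  # baseline units in this stratum
    m_k_s <- sum(in_s & treat == treat[i])  # same-level units in this stratum
    w[i] <- (m_b_s / m_k_s) * (m_tot[[treat[i]]] / m_tot[[baseline]])
  }
  w
}

# Baseline units get weight 1; controls are reweighted per stratum.
cem_weights_sketch(c(1, 1, 1, 2, 2, NA), c(1, 0, 0, 1, 0, 0))
```

With this rule the control weights sum to the number of matched controls, so the reweighted control group keeps its original size.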
When specifying cutpoints, several automatic methods may be chosen, including "sturges" (Sturges' rule, the default), "fd" (Freedman-Diaconis' rule), "scott" (Scott's rule) and "ss" (Shimazaki-Shinomoto's rule). See the references for a description of each rule.
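As a rough illustration of how such a rule turns into cutpoints, the sketch below picks the number of bins with the base-R helpers grDevices::nclass.Sturges, nclass.FD and nclass.scott and then places equally spaced cutpoints over the range of the variable. The helper name auto_cutpoints is hypothetical, equal spacing is an assumption of the sketch, and the "ss" rule is omitted because base R has no counterpart for it.

```r
# Hypothetical sketch: equally spaced cutpoints whose number of bins is
# chosen by a base-R rule. Not the cem package's own coarsening code.
auto_cutpoints <- function(x, rule = c("sturges", "fd", "scott")) {
  rule <- match.arg(rule)
  x <- x[!is.na(x)]                         # rules are computed on observed values
  k <- switch(rule,
    sturges = grDevices::nclass.Sturges(x),
    fd      = grDevices::nclass.FD(x),
    scott   = grDevices::nclass.scott(x))
  seq(min(x), max(x), length.out = k + 1)   # k bins need k + 1 cutpoints
}

auto_cutpoints(1:100)           # Sturges: ceiling(log2(100) + 1) = 8 bins
auto_cutpoints(1:100, "scott")
```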
The grouping option is a list in which each element is itself a list. For example, suppose that variable quest1 has the possible levels "no answer", NA, "negative", "neutral" and "positive", and you want to collect "no answer", NA and "neutral" into a single group; then the grouping argument should contain list(quest1=list(c("no answer", NA, "neutral"))). Or, if you have a discrete variable elements with values 1:10 and you want to collect it into the groups "1:3,NA", "4", "5:9" and "10", you specify the following list in grouping: list(elements=list(c(1:3,NA), 5:9)). Values not defined in the grouping are left as they are, so 4 and 10 each remain a group of their own. If cutpoints and groupings are defined for the same variable, the groupings take precedence and the corresponding cutpoints are set to NULL.
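The grouping semantics can be mimicked in a few lines of base R. Both helpers below are hypothetical and only illustrate the behaviour described above: apply_grouping collapses each listed set of values (NA included, since %in% matches NA to NA) into one category labelled by its members, and resolve_precedence drops the cutpoints of any variable that also has a grouping.

```r
# Hypothetical helpers (not part of cem) illustrating the grouping rules.
apply_grouping <- function(x, groups) {
  out <- as.character(x)
  for (g in groups) {
    # every value listed in g (NA included) becomes one category
    out[x %in% g] <- paste(g, collapse = ",")
  }
  out                                       # values outside all groups are unchanged
}

resolve_precedence <- function(cutpoints, grouping) {
  # groupings win: drop cutpoints for variables that also have a grouping
  cutpoints[intersect(names(cutpoints), names(grouping))] <- NULL
  cutpoints
}

grouping <- list(elements = list(c(1:3, NA), 5:9))
apply_grouping(c(1:10, NA), grouping$elements)
```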
verbose: a number greater than or equal to 0. The higher the value, the more information is printed during the execution of the algorithm.
If eval.imbalance = TRUE, cem$imbalance contains the imbalance measured as the absolute difference in means for numerical variables and the chi-square distance for categorical variables. If FALSE (the default), cem$imbalance is set to NULL. If the data contain missing values, the imbalance measures are not calculated.
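For the numerical part of this measure, a minimal sketch is shown below. It assumes a 0/1 treatment and unweighted means, and it leaves out the chi-square distance for categorical variables; the function name mean_diff_imbalance is hypothetical. Like the behaviour described above, it returns NULL when the data contain missing values.

```r
# Hypothetical sketch: absolute difference in means for numeric covariates.
# Assumes treat is coded 0/1; categorical variables are simply skipped here.
mean_diff_imbalance <- function(data, treat) {
  if (anyNA(data)) return(NULL)             # no imbalance measures with missing data
  num <- vapply(data, is.numeric, logical(1))
  sapply(data[num], function(v) abs(mean(v[treat == 1]) - mean(v[treat == 0])))
}

d <- data.frame(age = c(20, 30, 40, 50), sex = c("m", "f", "m", "f"))
mean_diff_imbalance(d, c(1, 1, 0, 0))       # age: |25 - 45|
```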
If L1.breaks is missing, the default rule for calculating the cutpoints is Scott's rule.
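The L1 statistic of Iacus, King and Porro is half the sum, over the cells of a histogram, of the absolute differences between the relative frequencies of treated and control units. The univariate sketch below uses Scott's rule (grDevices::nclass.scott) with equally spaced breaks; the function name l1_sketch is hypothetical, and the real measure is computed on the multivariate histogram of all covariates.

```r
# Hypothetical univariate sketch of the L1 imbalance with Scott's rule.
# Assumes treat is coded 0/1 and x has no missing values.
l1_sketch <- function(x, treat) {
  k <- grDevices::nclass.scott(x)
  breaks <- seq(min(x), max(x), length.out = k + 1)
  cell <- cut(x, breaks, include.lowest = TRUE)
  f <- prop.table(table(cell[treat == 1]))  # treated relative frequencies
  g <- prop.table(table(cell[treat == 0]))  # control relative frequencies
  0.5 * sum(abs(f - g))                     # 0 = identical, 1 = no overlap
}

l1_sketch(c(1:10, 1:10), rep(c(1, 0), each = 10))      # identical groups
l1_sketch(c(1:10, 101:110), rep(c(1, 0), each = 10))   # disjoint groups
```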
If k2k is set to TRUE, the algorithm returns strata with the same number of treated and control units per stratum; otherwise all the matched units are returned (the default). When k2k = TRUE, the user can choose a method ("euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski") for nearest-neighbor matching inside each cem stratum. By default, method is set to NULL, which means random matching inside the cem strata. For the Minkowski distance, the power can be specified via the argument mpower. For more information on method != NULL, refer to the dist help page.
If k2k is set to TRUE, keep.all is also set to TRUE.
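The k2k pruning inside a single stratum can be sketched as follows. The function k2k_stratum_sketch is hypothetical: with method = NULL it keeps a random subset of min(treated, controls) units from each group, and otherwise it builds the treated-by-control distance matrix with stats::dist (passing mpower as the p argument for "minkowski") and greedily takes the closest remaining pair. The greedy pairing is an assumption of this sketch, not necessarily the exact procedure used by cem.

```r
# Hypothetical sketch of k2k pruning within one stratum.
# X: covariate matrix for the stratum; treat: 0/1 vector of the same length.
k2k_stratum_sketch <- function(X, treat, method = NULL, mpower = 2) {
  t_idx <- which(treat == 1)
  c_idx <- which(treat == 0)
  n <- min(length(t_idx), length(c_idx))    # pairs to keep
  if (is.null(method)) {
    # random matching: keep n units from each group at random
    return(sort(c(t_idx[sample.int(length(t_idx), n)],
                  c_idx[sample.int(length(c_idx), n)])))
  }
  d <- as.matrix(dist(X, method = method, p = mpower))[t_idx, c_idx, drop = FALSE]
  keep <- integer(0)
  for (step in seq_len(n)) {
    # greedy nearest neighbour: closest remaining treated-control pair
    ij <- which(d == min(d), arr.ind = TRUE)[1, ]
    keep <- c(keep, t_idx[ij[1]], c_idx[ij[2]])
    d[ij[1], ] <- Inf                       # each unit is used at most once
    d[, ij[2]] <- Inf
  }
  sort(keep)
}

X <- matrix(c(0, 10, 1, 11, 50), ncol = 1)
k2k_stratum_sketch(X, c(1, 1, 0, 0, 0), method = "euclidean")
```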
By default, cem treats missing values as distinct categories and matches observations with missing values in the same variable into the same stratum, provided that all the remaining (coarsened) covariates match.
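One way to picture this behaviour is to build each stratum key from the coarsened covariates with an explicit token for missing values, so that NA behaves like any other category. In the hypothetical sketch below, strata_sketch uses the token "<NA>" (an arbitrary choice) and marks a stratum as matched only if it contains both treated and control units, assuming a 0/1 treatment.

```r
# Hypothetical sketch: strata from coarsened covariates, with NA as its own category.
strata_sketch <- function(coarsened, treat) {
  keys <- lapply(coarsened, function(v) ifelse(is.na(v), "<NA>", as.character(v)))
  stratum <- do.call(paste, c(keys, sep = "|"))
  # a stratum is kept only if it has at least one treated and one control unit
  has_both <- tapply(treat, stratum, function(t) any(t == 1) && any(t == 0))
  data.frame(stratum = stratum, matched = as.vector(has_both[stratum]))
}

cx <- data.frame(age = c(1, 1, NA, NA, 2), edu = c("lo", "lo", "hi", "hi", "lo"))
strata_sketch(cx, c(1, 0, 1, 0, 1))
```

Here the two observations with a missing age share the stratum "<NA>|hi" and are matched because their remaining covariate agrees.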
If argument data is non-NULL and datalist is NULL, CEM is applied to the single data set in data.
Argument datalist is a list of (multiply imputed) data frames (i.e., with missing cell values imputed). If data is NULL, the function cem is applied independently to each element of the list, resulting in separately matched data sets with different numbers of treated and control units.
When data and datalist are both non-NULL, each multiply imputed observation is assigned to the stratum in which it has been matched most frequently. In this case, the algorithm outputs the same matching solution for each multiply imputed data set (i.e., an observation, and the numbers of treated and control units matched, in one data set have the same meaning in all data sets and are the same for all).
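The "most frequently matched stratum" rule amounts to taking, for each observation, the mode of its stratum labels across the imputed data sets. The sketch below assumes a matrix with one row per observation and one column per imputation, with NA meaning the observation was unmatched in that imputation; the function name modal_stratum and the tie-breaking (first label in sort order) are assumptions of the sketch.

```r
# Hypothetical sketch: assign each observation to its modal stratum
# across multiply imputed data sets.
modal_stratum <- function(strata_by_imp) {
  apply(strata_by_imp, 1, function(s) {
    s <- s[!is.na(s)]                       # ignore imputations where unmatched
    if (length(s) == 0) return(NA_character_)
    counts <- table(s)
    names(counts)[which.max(counts)]        # ties: first label in sort order
  })
}

m <- matrix(c("a", "a", "b",
              "c", "d", "d"), nrow = 2, byrow = TRUE)
modal_stratum(m)
```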