This function creates a predictor matrix using the
variable selection procedure described in Van Buuren et
al. (1999, pp. 687-688). The function is designed to aid
in setting up a good imputation model for data with many
variables. Basic workings: For each variable pair (i.e.
target-predictor pair), the procedure calculates two
correlations using all available cases per pair. The
first correlation uses the values of the target and the
predictor directly. The second correlation uses the
(binary) response indicator of the target and the values
of the predictor. If the larger (in absolute value) of
these two correlations exceeds mincor, the predictor
will be added to the imputation set. The default value
for mincor is 0.1.
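The correlation rule can be illustrated with a minimal sketch. It is written in plain Python purely to show the logic and is not the function's own code. The helper names pairwise_cor and passes_mincor, the use of None for missing values, and the choice to treat an undefined correlation as 0 are all assumptions made for this illustration.

```python
from math import sqrt

def pairwise_cor(a, b):
    """Pearson correlation over the cases where both a and b are observed."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an undefined correlation counts as 0
    return sxy / sqrt(sxx * syy)

def passes_mincor(target, predictor, mincor=0.1):
    # Correlation 1: observed target values against predictor values.
    r_values = pairwise_cor(target, predictor)
    # Correlation 2: response indicator of the target (1 = observed)
    # against predictor values.
    indicator = [0 if t is None else 1 for t in target]
    r_indicator = pairwise_cor(indicator, predictor)
    return max(abs(r_values), abs(r_indicator)) > mincor

# Hypothetical data: bmi is the target, age the candidate predictor.
bmi = [21.0, None, 25.5, None, 30.1, 27.3]
age = [20, 70, 35, 65, 50, 40]
print(passes_mincor(bmi, age))  # True
```

In this made-up example both correlations are large: bmi rises with age among the observed cases, and bmi tends to be missing for the oldest cases, so age would enter the imputation set for bmi.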
In addition, the procedure eliminates predictors whose
proportion of usable cases fails to meet the minimum
specified by minpuc. The default value is 0, so
predictors are retained even if they have no usable case.
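The minpuc filter can be sketched in the same way. The sketch assumes the usual definition of the proportion of usable cases: among the cases where the target is missing, the fraction for which the predictor is observed. The function names and the None encoding of missing values are hypothetical and serve only as an illustration in Python.

```python
def usable_case_proportion(target, predictor):
    """Among cases with a missing target, the share with an observed predictor."""
    at_missing_target = [p for t, p in zip(target, predictor) if t is None]
    if not at_missing_target:
        return None  # target is complete, so there is nothing to impute
    observed = sum(1 for p in at_missing_target if p is not None)
    return observed / len(at_missing_target)

def passes_minpuc(target, predictor, minpuc=0.0):
    # A predictor is eliminated only when its proportion falls below minpuc.
    puc = usable_case_proportion(target, predictor)
    return puc is not None and puc >= minpuc

# Hypothetical data: bmi is missing in three cases; chl is observed in one of them.
bmi = [21.0, None, 25.5, None, None, 27.3]
chl = [180, None, 200, 210, None, 190]
print(usable_case_proportion(bmi, chl))
print(passes_minpuc(bmi, chl))              # True: the default of 0 never eliminates
print(passes_minpuc(bmi, chl, minpuc=0.5))  # False: only 1 of 3 cases is usable
```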
Finally, the procedure includes any predictor named in
the include argument (which is useful for
background variables like age and sex) and eliminates any
predictor named in the exclude argument. If a
variable is listed in both the include and
exclude arguments, the include argument
takes precedence.
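The precedence rule can be illustrated as follows. This is a sketch in Python under stated assumptions, not the function's own code: the predictor matrix is represented as a nested dictionary (target, then predictor), the name apply_include_exclude is hypothetical, and the sketch assumes that a variable is never used to predict itself.

```python
def apply_include_exclude(pred, include=(), exclude=()):
    """pred maps target -> {predictor: 0 or 1}; a modified copy is returned."""
    out = {target: dict(row) for target, row in pred.items()}
    for target, row in out.items():
        for name in exclude:
            if name in row and name != target:
                row[name] = 0
        # include is applied after exclude, so it wins when a name is in both
        for name in include:
            if name in row and name != target:  # assumption: no self-prediction
                row[name] = 1
    return out

# Hypothetical starting matrix for three variables.
pred = {
    "bmi": {"bmi": 0, "age": 0, "sex": 1},
    "age": {"bmi": 1, "age": 0, "sex": 0},
    "sex": {"bmi": 1, "age": 0, "sex": 0},
}
result = apply_include_exclude(pred, include=["age", "sex"], exclude=["sex", "bmi"])
for target, row in result.items():
    print(target, row)
```

Here sex appears in both lists and ends up as a predictor for bmi and age, while bmi, which is only excluded, is dropped everywhere.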
Advanced topic: mincor and minpuc are
typically specified as scalars, but vectors and square
matrices of appropriate size will also work. Each element
of the vector corresponds to a row of the predictor
matrix, so the procedure can effectively differentiate
between different target variables. Setting high values
can be useful for auxiliary, less important
variables. The set of predictors for those variables can
then remain relatively small. Using a square matrix extends
the idea to the columns, so that one can also apply
cellwise thresholds.
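The three threshold shapes can be compared with a small sketch. It is plain Python for illustration only: threshold_at and select are hypothetical helpers, maxcor stands for a matrix that already holds the larger absolute correlation for each target (row) and predictor (column), and the zero diagonal is an assumption of the sketch.

```python
def threshold_at(threshold, row, col):
    """Look up the threshold for the cell (row = target, col = predictor)."""
    if isinstance(threshold, (int, float)):
        return threshold   # scalar: one value for every cell
    entry = threshold[row]
    if isinstance(entry, (int, float)):
        return entry       # vector: one value per target (row)
    return entry[col]      # square matrix: one value per cell

def select(maxcor, mincor):
    n = len(maxcor)
    return [
        [1 if i != j and maxcor[i][j] > threshold_at(mincor, i, j) else 0
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical maximum absolute correlations for three variables.
maxcor = [
    [1.00, 0.30, 0.15],
    [0.30, 1.00, 0.45],
    [0.15, 0.45, 1.00],
]

print(select(maxcor, 0.1))              # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(select(maxcor, [0.1, 0.1, 0.4]))  # [[0, 1, 1], [1, 0, 1], [0, 1, 0]]
cellwise = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.5], [0.1, 0.1, 0.0]]
print(select(maxcor, cellwise))         # [[0, 1, 0], [1, 0, 0], [1, 1, 0]]
```

With the vector, only the third target gets the stricter threshold of 0.4 and therefore a smaller predictor set; with the matrix, individual target-predictor cells are tightened.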