The caliper
parameter constrains the maximum distance between units
assigned to the same block. This is implemented by restricting the
edge weight in the graph used to construct the blocks (see
sc_clustering
for details). As a result, the caliper
will affect all blocks and, in general, make it harder for
the function to find good matches even for blocks where the caliper is not
binding. In particular, a caliper that is too tight
can lead to discarded
units that otherwise would be assigned to a block satisfying both the
matching constraints and the caliper. For this reason, it is recommended
to set the caliper
value quite high and only use it to avoid particularly
poor blocks. It is strongly recommended to use the caliper
parameter only when primary_unassigned_method = "closest_seed"
in the underlying sc_clustering function (which is the default
behavior).
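
As a rough illustration of the advice above, the following sketch compares a blocking without a caliper to one with a fairly loose caliper. The simulated data, the seed, and the caliper value 0.2 are assumptions chosen for illustration only; a sensible caliper depends on the scale of the distances in the data at hand.

```r
library(distances)
library(quickblock)

# Simulated covariates (illustrative assumption, not real data).
set.seed(1)
my_data <- data.frame(x1 = runif(100), x2 = runif(100))
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))

# Blocking without a caliper.
blocking_free <- quickblock(my_distances)

# Blocking with a fairly loose caliper. Units that cannot be placed in a
# block within the caliper are discarded and appear as NA.
blocking_caliper <- quickblock(my_distances, caliper = 0.2)

# Number of discarded units (may well be zero with a loose caliper).
n_discarded <- sum(is.na(as.integer(blocking_caliper)))
n_discarded
```

If n_discarded is large, the caliper is probably too tight for the data and should be loosened.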
The main algorithm used to construct the blocking may produce
some blocks that are much larger than the minimum size constraint. If
break_large_blocks is TRUE, all blocks twice as large as
size_constraint will be broken into two or more smaller blocks. Blocks
are broken so as to ensure that the new blocks satisfy the size constraint.
In general, large blocks are produced when units are highly clustered,
so breaking up large blocks will often only lead to small improvements. The
blocks are broken using the hierarchical_clustering
function.
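
The effect of break_large_blocks can be inspected by comparing block sizes with and without the option. This is a minimal sketch on simulated data (the data, the seed, and size_constraint = 3 are illustrative assumptions); how much the sizes differ depends on how clustered the units are.

```r
library(distances)
library(quickblock)

# Simulated covariates (illustrative assumption, not real data).
set.seed(1)
my_data <- data.frame(x1 = runif(100), x2 = runif(100))
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))

# Default behavior: large blocks are left as they are.
blocking_default <- quickblock(my_distances, size_constraint = 3L)

# Ask quickblock to break up large blocks after the main algorithm.
blocking_broken <- quickblock(my_distances,
                              size_constraint = 3L,
                              break_large_blocks = TRUE)

# Tabulate the block sizes under each setting.
sizes_default <- table(as.integer(blocking_default))
sizes_broken <- table(as.integer(blocking_broken))

range(sizes_default)
range(sizes_broken)
```

In both cases every block should contain at least size_constraint units; only the upper end of the size range is expected to change.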
quickblock calls sc_clustering with seed_method = "inwards_updating".
The seed_method parameter
governs how the seeds are selected in the nearest neighborhood graph that
is used to construct the blocks (see sc_clustering
for details). The "inwards_updating"
option generally works well
and is safe with most datasets. Using seed_method = "exclusion_updating"
often leads to better performance (in the sense of blocks with more
similar units), but it may increase run time. Discrete data (or, more generally,
data in which units tend to be equidistant from many other units) leads to
particularly poor run time with this option. If the dataset has at least one
continuous covariate, "exclusion_updating"
is typically quick. A third
option is seed_method = "lexical", which decreases the run time relative
to "inwards_updating" (sometimes considerably) at the cost of performance.
quickblock passes parameters on to sc_clustering,
so to change seed_method, call quickblock with the parameter
specified as usual: quickblock(..., seed_method = "exclusion_updating").
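
The three seed_method options discussed above can be tried side by side as follows. This is a sketch on simulated continuous covariates (the data and seed are illustrative assumptions); timings on such a small dataset are not representative of real workloads.

```r
library(distances)
library(quickblock)

# Simulated continuous covariates (illustrative assumption, not real data).
set.seed(1)
my_data <- data.frame(x1 = runif(100), x2 = runif(100))
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))

# Default seed selection ("inwards_updating").
blocking_inwards <- quickblock(my_distances)

# Often better blocks, possibly at the cost of run time.
blocking_exclusion <- quickblock(my_distances,
                                 seed_method = "exclusion_updating")

# Usually the fastest option, at the cost of performance.
blocking_lexical <- quickblock(my_distances, seed_method = "lexical")

# Compare run times on your own data, for example:
system.time(quickblock(my_distances, seed_method = "exclusion_updating"))
```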