List balancing similar to ipu, but using the Newton-Raphson approach to
optimization. Created primarily as a point of comparison for ipu.
ipu_nr(
primary_seed,
primary_targets,
secondary_seed = NULL,
secondary_targets = NULL,
target_priority = 1e+07,
relative_gap = 0.01,
max_iterations = 100,
absolute_diff = 10,
weight_floor = 1e-05,
verbose = FALSE,
max_ratio = 10000,
min_ratio = 1e-04
)
In population synthesis or household survey expansion,
this would be the household seed table (each record would represent a
household). It could also be a trip table, where each row represents an
origin-destination pair. Must contain a pid ("primary ID") field
that is unique for each row. Must also contain a geography field that
starts with "geo_".
A named list of data frames. Each name in the list defines a marginal
dimension and must match a column from the primary_seed table. The data
frame associated with each named list
element must contain a geography field (starts with "geo_"). Each row in
the target table defines a new geography (these could be TAZs, tracts,
clusters, etc.). The other column names define the marginal categories that
targets are provided for. The vignette provides more detail.
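As a minimal sketch of the two inputs described above, the base R snippet below builds a small household seed table and a matching target list. The column names `siz` and `geo_cluster` and all values are made up for illustration, and the commented call assumes the package that provides `ipu_nr()` is loaded.

```r
# Hypothetical household seed: one row per household.
primary_seed <- data.frame(
  pid = c(1, 2, 3, 4),          # unique primary ID
  siz = c(1, 2, 2, 1),          # household size category (illustrative)
  geo_cluster = c(1, 1, 2, 2)   # geography field, name starts with "geo_"
)

# One target table per marginal dimension; the list name matches a seed column.
primary_targets <- list(
  siz = data.frame(
    geo_cluster = c(1, 2),      # one row per geography
    `1` = c(75, 100),           # target households in category 1
    `2` = c(25, 150),           # target households in category 2
    check.names = FALSE         # keep "1" and "2" as literal column names
  )
)

# Assumed call (not run here):
# result <- ipu_nr(primary_seed, primary_targets)
```

Setting `check.names = FALSE` keeps the category columns named exactly `1` and `2`, so they line up with the values in the seed's `siz` column.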
Most commonly, if the primary_seed describes households, each row of the
secondary seed table would describe a unique person. Must also contain the
pid column that links each person to their respective household in
primary_seed. Must not contain any geography fields (starting with "geo_").
Same format as primary_targets, but they constrain the secondary_seed
table.
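A sketch of the secondary inputs, under the same caveats: the `age` column, its categories, and every value are hypothetical. The final lines check the two structural rules stated above (every person links to a household through pid, and no "geo_" column appears in the secondary seed).

```r
# Hypothetical household seed (three households).
primary_seed <- data.frame(
  pid = c(1, 2, 3),
  siz = c(1, 2, 2),
  geo_cluster = c(1, 1, 2)
)

# Hypothetical person seed: one row per person, linked to a household by pid.
secondary_seed <- data.frame(
  pid = c(1, 2, 2, 3, 3),
  age = c("old", "young", "old", "young", "young"),
  stringsAsFactors = FALSE
)

# Same layout as the primary targets, but the totals count persons.
secondary_targets <- list(
  age = data.frame(
    geo_cluster = c(1, 2),
    young = c(60, 180),
    old = c(90, 70)
  )
)

# Structural checks on the rules described in the text.
links_ok <- all(secondary_seed$pid %in% primary_seed$pid)
no_geo <- !any(startsWith(names(secondary_seed), "geo_"))
```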
This argument controls how quickly each set of
targets is relaxed. In other words: how important it is to match the target
exactly. Defaults to 10,000,000, which means that all targets should
be matched exactly.
real
This priority value will be used for each target table.
named list
Each named entry must match an entry in either primary_targets or
secondary_targets and have a real value. This priority will be applied to
that target table. Any targets not in the list will default to 10,000,000.
data.frame
Column target must have values that match an entry in either
primary_targets or secondary_targets. Column priority contains the values
to use for priority. Any targets not in the table will default to
10,000,000.
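The three accepted forms can be sketched side by side. The target names `siz` and `age` and the priority value of 100 are placeholders, not values taken from the package.

```r
# Form 1: a single real value, applied to every target table.
priority_all <- 1e7

# Form 2: a named list; names refer to target tables (hypothetical names here).
priority_list <- list(siz = 1e7, age = 100)

# Form 3: a data frame with columns "target" and "priority".
priority_tbl <- data.frame(
  target = c("siz", "age"),
  priority = c(1e7, 100),
  stringsAsFactors = FALSE
)

# Any of these would be passed the same way (call not run here):
# ipu_nr(primary_seed, primary_targets, target_priority = priority_tbl)
```

In forms 2 and 3, a lower value such as the 100 used for `age` would let that table's targets relax sooner than one left at the default.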
After each iteration, the weights are compared to the previous weights.
If the change is less than the relative_gap threshold, then the process
terminates.
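A rough sketch of such a stopping test follows. The measure used here (root mean squared change as a percent of the mean weight) and the helper name `pct_rmse` are assumptions for illustration; the function's actual measure may differ.

```r
# Hypothetical change measure: RMSE of the weight change, as a percent of
# the mean of the new weights.
pct_rmse <- function(new, old) {
  sqrt(mean((new - old)^2)) / mean(new) * 100
}

old_w <- c(10, 20, 30)              # weights from the previous iteration
new_w <- c(10.001, 20.002, 29.999)  # weights from the current iteration
relative_gap <- 0.01

# Stop iterating once the change drops below the threshold.
converged <- pct_rmse(new_w, old_w) < relative_gap
```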
Maximum number of iterations to perform, even if relative_gap is not
reached.
Upon completion, the ipu() function will report the worst-performing
marginal category and geography based on the percent difference from the
target. absolute_diff is a threshold below which percent differences don't
matter. For example, if a target value was 2 and the expanded weights
equaled 1, the percent difference is large but the absolute difference is
only 1.
Defaults to 10.
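The effect of the threshold can be illustrated with a small comparison table. The table layout, column names, and the strict "greater than" rule at the boundary are assumptions made for this sketch, not the package's internal logic.

```r
# Hypothetical comparison of targets against expanded weights.
comparison <- data.frame(
  category = c("a", "b"),
  target = c(2, 1000),
  result = c(1, 900),
  stringsAsFactors = FALSE
)
comparison$diff <- comparison$result - comparison$target
comparison$pct_diff <- comparison$diff / comparison$target * 100

absolute_diff <- 10

# Only rows whose absolute difference exceeds the threshold are candidates
# for the "worst-performing" report.
flagged <- comparison[abs(comparison$diff) > absolute_diff, ]
```

Category `a` has the larger percent difference but is off by only 1, so it is dropped; category `b` is off by 100 and remains.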
Minimum weight to allow in any cell to prevent zero weights. Set to 0.00001 (1e-05) by default. Should be arbitrarily small compared to your seed table weights.
Print iteration details and worst marginal stats upon completion?
Default FALSE.
real number. The average weight per seed record is calculated by dividing
the total of the targets by the number of records. max_ratio caps the
maximum weight at a multiple of that average. Defaults to 10000 (basically
turned off).
real number. The average weight per seed record is calculated by dividing
the total of the targets by the number of records. min_ratio caps the
minimum weight at a multiple of that average. Defaults to 0.0001
(basically turned off).
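The arithmetic behind both caps can be sketched as follows. The record count, target total, and example weights are invented, and whether the average is computed per geography or overall is not specified here, so a single overall average is assumed.

```r
n_records <- 4       # number of seed records (illustrative)
target_total <- 350  # total of one target table (illustrative)

# Average weight per seed record.
avg_weight <- target_total / n_records

max_ratio <- 10000
min_ratio <- 1e-04

max_weight <- avg_weight * max_ratio  # upper cap on any weight
min_weight <- avg_weight * min_ratio  # lower cap on any weight

# Clamp a set of candidate weights to the allowed range.
weights <- c(0.001, 50, 2e6, 90)
capped <- pmin(pmax(weights, min_weight), max_weight)
```

With the defaults the allowed range is wide enough that only extreme weights are affected, consistent with the description of the caps as "basically turned off".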
A named list containing the primary_seed with weights, a histogram of the
weight distribution, and two comparison tables to aid in reporting.