ipu_nr: Iterative Proportional Updating (Newton-Raphson)

Description

List balancing similar to ipu, but using the Newton-Raphson approach to optimization. Created primarily as a point of comparison for ipu.

Usage

ipu_nr(
  primary_seed,
  primary_targets,
  secondary_seed = NULL,
  secondary_targets = NULL,
  target_priority = 1e+07,
  relative_gap = 0.01,
  max_iterations = 100,
  absolute_diff = 10,
  weight_floor = 1e-05,
  verbose = FALSE,
  max_ratio = 10000,
  min_ratio = 1e-04
)

Arguments

primary_seed

In population synthesis or household survey expansion, this would be the household seed table (each record would represent a household). It could also be a trip table, where each row represents an origin-destination pair. Must contain a pid ("primary ID") field that is unique for each row. Must also contain a geography field that starts with "geo_".
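As a minimal sketch of the requirements above, the following builds a small household seed table in base R and checks for a unique pid and a "geo_" field. The hhsize column, the geo_region name, and all values are hypothetical and chosen only for illustration.

```r
# Hypothetical household seed table. Only pid and the "geo_" prefix are
# required by the documentation; hhsize and the values are made up.
primary_seed <- data.frame(
  pid = 1:4,
  geo_region = c(1, 1, 2, 2),
  hhsize = c(1, 2, 1, 2)
)

# pid must exist and be unique for each row.
has_unique_pid <- "pid" %in% names(primary_seed) &&
  !anyDuplicated(primary_seed$pid)

# At least one column name must start with "geo_".
has_geo_field <- any(startsWith(names(primary_seed), "geo_"))
```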

primary_targets

A named list of data frames. Each name in the list defines a marginal dimension and must match a column from the primary_seed table. The data frame associated with each named list element must contain a geography field (starts with "geo_"). Each row in the target table defines a new geography (these could be TAZs, tracts, clusters, etc.). The other column names define the marginal categories that targets are provided for. The vignette provides more detail.
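The structure described above could be sketched as follows. The marginal name hhsize, the geography geo_region, and the target values are assumptions made for illustration; the vignette remains the authoritative description of the format.

```r
# Hypothetical seed table with one marginal dimension (hhsize).
primary_seed <- data.frame(
  pid = 1:4,
  geo_region = c(1, 1, 2, 2),
  hhsize = c(1, 2, 1, 2)
)

# One list element per marginal dimension. Each data frame has a "geo_"
# column plus one column per marginal category (here "1" and "2").
primary_targets <- list(
  hhsize = data.frame(
    geo_region = c(1, 2),
    `1` = c(20, 30),
    `2` = c(40, 10),
    check.names = FALSE  # keep "1" and "2" as literal column names
  )
)

# Every list name should match a column in the seed table.
names_match <- all(names(primary_targets) %in% names(primary_seed))

# The non-geography columns are the marginal categories.
target_categories <- setdiff(names(primary_targets$hhsize), "geo_region")
```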

secondary_seed

Most commonly, if the primary_seed describes households, each row of the secondary seed table would describe a unique person. Must also contain the pid column that links each person to their respective household in primary_seed. Must not contain any geography fields (starting with "geo_").

secondary_targets

Same format as primary_targets, but they constrain the secondary_seed table.
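A minimal sketch of a person-level secondary seed and its targets, assuming a hypothetical age marginal and a hypothetical geo_region geography. The checks mirror the rules stated above: every person links to a household through pid, and the secondary seed carries no "geo_" field.

```r
# Hypothetical household table (primary seed).
primary_seed <- data.frame(pid = 1:2, geo_region = c(1, 2))

# Hypothetical person table: one row per person, linked by pid.
secondary_seed <- data.frame(
  pid = c(1, 1, 2, 2),
  age = c("adult", "child", "adult", "child")
)

# Same layout as primary_targets: a named list of data frames, each with
# a "geo_" column and one column per marginal category.
secondary_targets <- list(
  age = data.frame(
    geo_region = c(1, 2),
    adult = c(50, 40),
    child = c(30, 10)
  )
)

# Every person must belong to a household in the primary seed.
links_ok <- all(secondary_seed$pid %in% primary_seed$pid)

# The secondary seed must not contain geography fields.
no_geo <- !any(startsWith(names(secondary_seed), "geo_"))
```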

target_priority

This argument controls how quickly each set of targets is relaxed. In other words: how important it is to match the target exactly. Defaults to 10,000,000, which means that all targets should be matched exactly.

real: This priority value will be used for each target table.
named list: Each named entry must match an entry in either primary_targets or secondary_targets and hold a real number. This priority will be applied to that target table. Any targets not in the list will default to 10,000,000.
data.frame: Column target must have values that match an entry in either primary_targets or secondary_targets. Column priority contains the values to use for priority. Any targets not in the table will default to 10,000,000.
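The three accepted forms can be sketched as below. The helper resolve_priority() is hypothetical and not part of the package; it only illustrates how each form could map to one priority per target table, with the 10,000,000 default for targets that are not listed. The target names hhsize and age are also assumptions.

```r
# Hypothetical helper: returns one priority per target table name.
resolve_priority <- function(target_priority, target_names, default = 1e7) {
  out <- setNames(rep(default, length(target_names)), target_names)
  if (is.data.frame(target_priority)) {
    # data.frame form: columns "target" and "priority".
    known <- target_priority$target %in% target_names
    if (any(known)) {
      out[as.character(target_priority$target[known])] <-
        target_priority$priority[known]
    }
  } else if (is.list(target_priority)) {
    # named list form: one real per named target table.
    known <- intersect(names(target_priority), target_names)
    if (length(known) > 0) out[known] <- unlist(target_priority[known])
  } else {
    # single real: applied to every target table.
    out[] <- target_priority
  }
  out
}

target_names <- c("hhsize", "age")
p_real <- resolve_priority(1000, target_names)
p_list <- resolve_priority(list(age = 1000), target_names)
p_df <- resolve_priority(
  data.frame(target = "age", priority = 50),
  target_names
)
```

In the list and data.frame cases, hhsize is not mentioned and therefore keeps the default priority.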

relative_gap

After each iteration, the weights are compared to the previous weights. If the change is less than the relative_gap threshold, then the process terminates.
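A convergence check of this kind might look like the sketch below. It assumes the change between iterations is measured as a percent root-mean-square error relative to the mean of the previous weights; that choice of measure, the function name pct_rmse(), and the weights are assumptions for illustration, not a statement of what ipu_nr computes internally.

```r
# Hypothetical measure of change between two weight vectors:
# RMSE of the difference, as a percent of the mean previous weight.
pct_rmse <- function(new_w, old_w) {
  sqrt(mean((new_w - old_w)^2)) / mean(old_w) * 100
}

old_w <- c(10, 20, 30)
new_w <- c(10.001, 19.999, 30.001)

relative_gap <- 0.01
gap <- pct_rmse(new_w, old_w)

# Stop iterating once the change falls below the threshold.
converged <- gap < relative_gap
```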

max_iterations

Maximum number of iterations to perform, even if relative_gap is not reached.

absolute_diff

Upon completion, the ipu() function will report the worst-performing marginal category and geography based on the percent difference from the target. absolute_diff is a threshold below which percent differences don't matter.

For example, if a target value was 1 and the expanded weights equaled 2, that's a 100% difference, but it is not important because the absolute difference is only 1.

Defaults to 10.
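The thresholding idea can be sketched in base R as follows. The comparison table, its column names, and its values are hypothetical; the point is only that rows with a small absolute difference are ignored before the worst percent difference is picked.

```r
# Hypothetical comparison of targets against expanded weights.
comparison <- data.frame(
  category = c("hhsize_1", "hhsize_2"),
  target = c(2, 1000),
  result = c(1, 900)
)
comparison$diff <- comparison$result - comparison$target
comparison$pct_diff <- comparison$diff / comparison$target * 100

absolute_diff <- 10

# Ignore rows whose absolute difference is at or below the threshold.
# hhsize_1 is off by 50 percent, but only by 1 in absolute terms.
eligible <- comparison[abs(comparison$diff) > absolute_diff, ]

# Among the remaining rows, pick the largest percent difference.
worst <- eligible[which.max(abs(eligible$pct_diff)), ]
```

Here hhsize_1 has the larger percent difference but is filtered out, so hhsize_2 is the one reported as worst.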

weight_floor

Minimum weight to allow in any cell to prevent zero weights. Set to 0.00001 by default. Should be arbitrarily small compared to your seed table weights.
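A floor of this kind can be illustrated with pmax() from base R. The weights are made up, and this is only a sketch of the idea rather than the package's internal code.

```r
weight_floor <- 1e-05

# Hypothetical weights, one of which has collapsed to zero.
weights <- c(0, 0.5, 3)

# Raise any weight below the floor up to the floor.
floored <- pmax(weights, weight_floor)
```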

verbose

Print iteration details and worst marginal stats upon completion? Default FALSE.

max_ratio

Real number. The average weight per seed record is calculated by dividing the total of the targets by the number of records. max_ratio caps the maximum weight at a multiple of that average. Defaults to 10000 (basically turned off).

min_ratio

Real number. The average weight per seed record is calculated by dividing the total of the targets by the number of records. min_ratio limits the minimum weight to a multiple of that average. Defaults to 0.0001 (basically turned off).
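The arithmetic behind max_ratio and min_ratio can be sketched as below. The target total, record count, weights, and the deliberately tight ratios of 5 and 0.2 are hypothetical and chosen so the bounds actually bind; with the defaults they would rarely do so.

```r
# Average weight per seed record: target total / number of records.
targets_total <- 1000
n_records <- 4
avg_weight <- targets_total / n_records  # 250

# Hypothetical, deliberately tight ratios for illustration.
max_ratio <- 5
min_ratio <- 0.2

upper <- avg_weight * max_ratio  # 1250
lower <- avg_weight * min_ratio  # 50

# Hypothetical weights; the first is below the lower bound and the
# third is above the upper bound.
weights <- c(10, 200, 1500, 290)

# Clamp each weight into [lower, upper].
bounded <- pmin(pmax(weights, lower), upper)
```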

Value

A named list containing the primary_seed with weights, a histogram of the weight distribution, and two comparison tables to aid in reporting.