optimum_allocation: Optimum Allocation

Description

Determines the optimum sampling fraction and sample size for each stratum in a stratified random sample, which minimizes the variance of the sample mean according to Neyman Allocation or Exact Optimum Sample Allocation (Wright 2014).

Usage

optimum_allocation(
  data,
  strata,
  y = NULL,
  sd_h = NULL,
  N_h = NULL,
  nsample = NULL,
  ndigits = 2,
  method = c("WrightII", "WrightI", "Neyman"),
  weights = NULL,
  allow.na = FALSE
)

Value

Returns a data frame with the number of samples allocated to each stratum, or just the sampling fractions if nsample is NULL.

Arguments

data

A data frame or matrix with at least one column specifying each unit's stratum, and either 1) a second column holding the value of the continuous variable for which the sample mean variance should be minimized (y) or 2) two columns: one holding the within-stratum standard deviation for the variable of interest (sd_h) and another holding the stratum population sizes (N_h). If data contains a column y holding values for the variable of interest, then data should have one row for each sampled unit. If data holds sd_h and N_h, the within-stratum standard deviations and population sizes, then data should have one row per stratum. Other columns are allowed but will be ignored.
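To illustrate the two layouts, the following sketch collapses unit-level data (one row per unit) into the per-stratum layout using base R only. The column names strata, size, and sd are illustrative choices, not names the function requires; they would be passed through the strata, N_h, and sd_h arguments.

```r
# Unit-level layout: one row per unit, with a stratum column and a y column.
unit_level <- iris[, c("Species", "Sepal.Width")]

# Per-stratum layout: one row per stratum, holding the stratum size and the
# within-stratum standard deviation of the variable of interest.
# (Column names here are illustrative assumptions.)
iris_summary <- data.frame(
  strata = levels(unit_level$Species),
  size = as.vector(table(unit_level$Species)),
  sd = as.vector(tapply(unit_level$Sepal.Width, unit_level$Species, sd))
)

iris_summary
```

Both table() and tapply() return results in factor-level order, so the three columns line up row by row.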

strata

a character string or vector of character strings specifying the name(s) of columns which specify the stratum that each unit belongs to. If multiple column names are provided, each unique combination of values in these columns is taken to define one stratum.

y

a character string specifying the name of the continuous variable for which the variance should be minimized. Defaults to NULL and should be left as NULL when data holds stratum standard deviations and population sizes instead of individual sampling units. If a character vector of length > 1 is supplied, the function performs A-optimal allocation to minimize the sum of variances.

sd_h

a character string specifying the name of the column holding the within-stratum standard deviations for each stratum. Defaults to NULL and should be left as NULL when data holds individual sampling units. If a character vector of length > 1 is supplied, the function performs A-optimal allocation to minimize the sum of variances.

N_h

a character string specifying the name of the column holding the population stratum sizes for each stratum. Defaults to NULL and should be left as NULL when data holds individual sampling units.

nsample

the desired total sample size. Defaults to NULL.

ndigits

a numeric value specifying the number of digits to which the standard deviation and stratum fraction should be rounded. Defaults to 2.

method

a character string specifying the method of optimum sample allocation to use. Must be one of:

"WrightII", the default, uses Algorithm II from Wright (2014) to determine the optimum allocation of a fixed sample size across the strata. It requires that at least two samples are allocated to each stratum.
"WrightI" uses Wright's Algorithm I to determine the optimum sample allocation. It only requires that at least one sample is allocated to each stratum, and can therefore lead to a biased variance estimate.
"Neyman" uses the standard method of Neyman Allocation to determine the optimum sample allocation. When nsample = NULL, the optimal sampling fraction is calculated and returned. When a numeric value is specified for nsample, the number allocated to each stratum is the optimal sampling fraction times nsample, rounded to the nearest integer. After rounding, the allocation may no longer be optimal.
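As a rough illustration of how these methods differ, the base R sketch below computes Neyman fractions and a greedy priority-value allocation in the spirit of Wright (2014). The helper names neyman_sketch and wright_sketch are hypothetical and are not part of the package; the package's own implementation may differ in details such as tie-breaking and rounding.

```r
# Hypothetical helper: Neyman sampling fractions, proportional to N_h * sd_h.
neyman_sketch <- function(N_h, sd_h, nsample = NULL) {
  fraction <- N_h * sd_h / sum(N_h * sd_h)
  if (is.null(nsample)) {
    return(fraction)
  }
  # Rounding can change the total and lose exact optimality.
  round(fraction * nsample)
}

# Hypothetical helper: start every stratum at a minimum size, then hand out
# the remaining units one at a time to the stratum with the largest priority
# value N_h * sd_h / sqrt(n_h * (n_h + 1)).
# min_per_stratum = 2 mirrors the Algorithm II constraint,
# min_per_stratum = 1 mirrors the Algorithm I constraint.
wright_sketch <- function(N_h, sd_h, nsample, min_per_stratum = 2) {
  H <- length(N_h)
  stopifnot(
    length(sd_h) == H,
    all(N_h >= min_per_stratum),
    nsample >= min_per_stratum * H,
    nsample <= sum(N_h)
  )
  n_h <- rep(min_per_stratum, H)
  while (sum(n_h) < nsample) {
    priority <- N_h * sd_h / sqrt(n_h * (n_h + 1))
    priority[n_h >= N_h] <- -Inf  # a stratum cannot exceed its population
    pick <- which.max(priority)
    n_h[pick] <- n_h[pick] + 1
  }
  n_h
}

N_h <- c(50, 50, 50)
sd_h <- c(0.3791, 0.3138, 0.3225)

neyman_sketch(N_h, sd_h)               # fractions only
neyman_sketch(N_h, sd_h, nsample = 40) # rounded counts
wright_sketch(N_h, sd_h, nsample = 40) # integer counts summing to 40
```

Unlike rounded Neyman counts, the greedy allocation always sums to exactly nsample, because it assigns whole units one at a time.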

weights

A numeric vector whose length matches the length of y or sd_h. It is only applicable when that length is > 1. In this case, the values must sum to 1 and give the weight of each variable of interest in A-optimal allocation.

allow.na

logical input specifying whether y should be allowed to have NA values. Defaults to FALSE.

Details

If a character vector of length > 1 is supplied for y or sd_h, the function performs A-optimal allocation, which minimizes the weighted sum of the variances of the variables of interest, with weights given by the weights argument.
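One way to see how a weighted sum of variances can be minimized: the sum depends on the stratum sample sizes n_h only through terms of the form N_h^2 * (w_1 * sd_h1^2 + w_2 * sd_h2^2 + ...) / n_h, so the problem reduces to a single-variable allocation with a combined standard deviation per stratum. The base R sketch below works through this reduction. It is an illustration under that reasoning, and whether the package computes it in exactly this way internally is an assumption.

```r
# Within-stratum standard deviations for two variables of interest
# (values taken from the iris_summary2 example).
N_h <- c(50, 50, 50)
sd1 <- c(0.3791, 0.3138, 0.3225)
sd2 <- c(0.3525, 0.5162, 0.6359)
w <- c(0.5, 0.5)  # weights must sum to 1

# Combined per-stratum standard deviation: square root of the weighted
# sum of the within-stratum variances.
sd_combined <- sqrt(w[1] * sd1^2 + w[2] * sd2^2)

# Neyman-type sampling fractions based on the combined standard deviation.
fraction <- N_h * sd_combined / sum(N_h * sd_combined)
fraction
```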

References

Wright, T. (2014). A Simple Method of Exact Optimal Sample Allocation under Stratification with any Mixed Constraint Patterns, Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Bureau of the Census, Washington, D.C.

Examples


library(optimall)  # package assumed to provide optimum_allocation()

optimum_allocation(
  data = iris, strata = "Species", y = "Sepal.Width",
  nsample = 40, method = "WrightII"
)

# Or if input data is summary of strata sd and N:
iris_summary <- data.frame(
  strata = unique(iris$Species),
  size = c(50, 50, 50),
  sd = c(0.3791, 0.3138, 0.3225)
)

optimum_allocation(
  data = iris_summary, strata = "strata",
  sd_h = "sd", N_h = "size",
  nsample = 40, method = "WrightII"
)

# A-optimal allocation to minimize the sum of variances if a vector is
# supplied for y or sd_h
optimum_allocation(
  data = iris, strata = "Species", y = c("Sepal.Width", "Sepal.Length"),
  weights = c(0.5, 0.5),
  nsample = 40, method = "WrightII"
)

iris_summary2 <- data.frame(
  strata = unique(iris$Species),
  size = c(50, 50, 50),
  sd1 = c(0.3791, 0.3138, 0.3225),
  sd2 = c(0.3525, 0.5162, 0.6359)
)

optimum_allocation(
  data = iris_summary2, strata = "strata",
  sd_h = c("sd1", "sd2"), weights = c(0.5, 0.5),
  N_h = "size",
  nsample = 40, method = "WrightII"
)
