dopt: Optimum Sample Allocation in Stratified Sampling Schemes

Description

A classical problem in survey methodology is optimum sample allocation in stratified sampling. The problem is to determine the vector of strata sample sizes that minimizes the variance of the pi-estimator of the population total of a given study variable, under a constraint on the total sample size.

The dopt() function solves the optimum sample allocation problem under lower and/or upper bound constraints, optionally imposed on strata sample sizes. The computed allocation is valid for all stratified sampling schemes for which the variance of the stratified pi-estimator is of the form: $$D(x_1,...,x_H) = a^2_1/x_1 + ... + a^2_H/x_H - b,$$ where $H$ denotes the total number of strata, $x_1, ..., x_H$ are the strata sample sizes, and $b$, $a_w > 0$ do not depend on $x_w, w = 1, ..., H$.
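
As an illustration of this form, consider stratified simple random sampling without replacement. The symbols $N_w$ (stratum size) and $S_w$ (stratum standard deviation of the study variable) are notation assumed here for the example; they are not arguments of dopt(). The variance of the stratified pi-estimator of the total (see Särndal et al., 1992) can then be written as $$\sum_{w=1}^{H} N_w^2 \left(\frac{1}{x_w} - \frac{1}{N_w}\right) S_w^2 = \sum_{w=1}^{H} \frac{(N_w S_w)^2}{x_w} - \sum_{w=1}^{H} N_w S_w^2,$$ which matches $D$ with $a_w = N_w S_w$ and $b = N_1 S_1^2 + ... + N_H S_H^2$. Neither quantity depends on the sample sizes $x_w$.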

The dopt() function makes use of the following allocation algorithms: rna, sga, sgaplus, and coma for optimal sample allocation under one-sided upper bounds constraints, and lrna for optimal sample allocation under one-sided lower bounds constraints. For allocation under box-constraints, the rnabox algorithm is used. The rna, sga, and coma algorithms are described in Wesołowski et al. (2021), while sgaplus is described in Wójciak (2019). The lrna algorithm is introduced in Wójciak (2022). The rnabox algorithm is a new optimal allocation algorithm that was developed by the authors of this package and will be published soon.

Usage

dopt(n, a, m = NULL, M = NULL, M_method = "rna")

Value

Numeric vector with optimal sample allocation in strata.

Arguments

n: (number)
total sample size. A strictly positive scalar.
a: (numeric)
parameters $a_1, ..., a_H$ of variance function $D$. Strictly positive numbers.
m: (numeric or NULL)
lower bounds constraints optionally imposed on strata sample sizes. If not NULL, it is then required that n >= sum(m). Strictly positive numbers.
M: (numeric or NULL)
upper bounds constraints optionally imposed on strata sample sizes. If not NULL, it is then required that n <= sum(M). Strictly positive numbers.
M_method: (string)
the name of the underlying algorithm to be used for computing a sample allocation under one-sided upper bounds constraints. One of the following: rna (default), sga, sgaplus, coma. This parameter is used only when the m argument is NULL and M is not NULL.

Details

The dopt() function computes: $$\operatorname{argmin} D(x_1,...,x_H),$$ under the equality constraint imposed on the total sample size: $$x_1 + ... + x_H = n,$$ and inequality constraints (optionally) imposed on strata sample sizes: $$m_w \le x_w \le M_w, w = 1,...,H.$$ Here, $H$ denotes the total number of strata, $x_1, ..., x_H$ are the strata sample sizes, and $n > 0$, $b$, $a_w > 0, w = 1, ..., H$, are given numbers. Furthermore, $m_w > 0$ and $M_w > 0, w = 1, ..., H$, are lower and upper bounds, respectively, optionally imposed on sample sizes in strata.

The user of dopt() can choose whether inequality constraints are added to the optimization problem. This is controlled through the m and M arguments of the function. If no inequality constraints are to be added, both m and M must be NULL (the default). If only upper bounds constraints are to be added, they must be specified with the M argument, while leaving m as NULL. If only lower bounds constraints are to be added, they must be specified with the m argument, while leaving M as NULL. Finally, for box-constraints, both m and M must be specified.
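
When neither m nor M is given, the minimization above has a well-known closed-form solution, the Neyman allocation $x_w = n a_w / (a_1 + ... + a_H)$. The following is a minimal sketch of that formula only; the function name neyman_sketch() is hypothetical and is not part of this package, and the code is not claimed to be how dopt() is implemented.

```r
# Hypothetical helper: closed-form (Neyman) allocation when there are
# no lower or upper bounds. Not the package's implementation.
neyman_sketch <- function(n, a) {
  stopifnot(length(n) == 1, n > 0, all(a > 0))
  # Each stratum receives a share of n proportional to a_w.
  n * a / sum(a)
}

a <- c(3000, 4000, 5000, 2000)
x <- neyman_sketch(n = 700, a = a)
x       # 150 200 250 100
sum(x)  # 700
```

The result need not be integer-valued and can exceed a stratum's population size, which is one reason for imposing upper bounds.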

For the case of one-sided upper bounds constraints only, four underlying algorithms are available: "rna" (rna_onesided()), "sga" (sga()), "sgaplus" (sgaplus()), and "coma" (coma()). The name of the function that implements each algorithm is given in parentheses; see the respective help pages for more details. For the case of one-sided lower bounds constraints only, the "lrna" algorithm (rna_onesided()) is used. Finally, for box-constraints, the "rnabox" algorithm is used (rnabox()).
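
To give an idea of how a recursive Neyman-type procedure handles upper bounds, here is a minimal sketch written under the assumption that the recursion works as follows: compute the Neyman allocation on the strata that are still free, fix every stratum that reaches its upper bound at that bound, and repeat with the remaining sample size. The function rna_upper_sketch() is hypothetical; it is an illustration of the idea, not the code of rna_onesided(), and it omits the input checks and numerical safeguards of a real implementation.

```r
# Hypothetical sketch of a recursive Neyman-type allocation under
# upper bounds only. Not the package's rna_onesided() implementation.
rna_upper_sketch <- function(n, a, M) {
  stopifnot(n > 0, all(a > 0), all(M > 0), n <= sum(M))
  x <- numeric(length(a))
  free <- rep(TRUE, length(a))  # strata not yet fixed at their bound
  n_free <- n                   # sample size left for the free strata
  while (any(free)) {
    # Neyman allocation restricted to the free strata.
    x[free] <- n_free * a[free] / sum(a[free])
    over <- free & (x >= M)
    if (!any(over)) break
    # Fix violating strata at their upper bounds and recurse on the rest.
    x[over] <- M[over]
    n_free <- n_free - sum(M[over])
    free <- free & !over
  }
  x
}

a <- c(3000, 4000, 5000, 2000)
M <- c(300, 400, 200, 90)
x <- rna_upper_sketch(n = 700, a = a, M = M)
round(x, 2)  # 175.71 234.29 200.00  90.00
```

In this run the third and fourth strata hit their bounds in the first pass, and the remaining 410 units are split between the first two strata in proportion to their a values.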

References

Wesołowski, J., Wieczorkowski, R., Wójciak, W. (2021), Optimality of the recursive Neyman allocation, Journal of Survey Statistics and Methodology, doi:10.1093/jssam/smab018, doi:10.48550/arXiv.2105.14486

Wójciak, W. (2022), Minimum sample size allocation in stratified sampling under constraints on variance and strata sample sizes, doi:10.48550/arXiv.2204.04035

Wójciak, W. (2019), Optimal allocation in stratified sampling schemes, MSc Thesis, Warsaw University of Technology, Warsaw, Poland. http://home.elka.pw.edu.pl/~wwojciak/msc_optimal_allocation.pdf

Särndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, New York, NY: Springer.

Examples


a <- c(3000, 4000, 5000, 2000)
m <- c(100, 90, 70, 50)
M <- c(300, 400, 200, 90)

# Only lower bounds.
dopt(n = 340, a = a, m = m)
dopt(n = 400, a = a, m = m)
dopt(n = 700, a = a, m = m)

# Only upper bounds.
dopt(n = 190, a = a, M = M)
dopt(n = 700, a = a, M = M)

# Box-constraints.
dopt(n = 340, a = a, m = m, M = M)
dopt(n = 500, a = a, m = m, M = M)
dopt(n = 800, a = a, m = m, M = M)

# Example of execution-time comparison of different algorithms
# using bench R package.
if (FALSE) {
N <- pop969[, "N"]
S <- pop969[, "S"]
a <- N * S
nfrac <- seq(0.01, 0.9, 0.05)
n <- setNames(as.integer(nfrac * sum(N)), nfrac)
lapply(
  n,
  function(ni) {
    bench::mark(
      dopt(ni, a, M = N, M_method = "rna"),
      dopt(ni, a, M = N, M_method = "sga"),
      dopt(ni, a, M = N, M_method = "sgaplus"),
      dopt(ni, a, M = N, M_method = "coma"),
      iterations = 200
    )[c(1, 3)]
  }
)
}
