Learn R Programming

MOSAlloc (version 1.2.3)

mosallocSTRS: (Single-stage) stratified random sampling interface for functions mosalloc() and mosallocStepwiseFirst()

Description

Interface for the functions mosalloc() and mosallocStepwiseFirst() of the same package which allows for a user-friendly data handling in the case of single-stage stratified random sampling.

Usage

mosallocSTRS(
  X_var,
  X_tot,
  N,
  listD,
  listA = NULL,
  listC = NULL,
  fpc = TRUE,
  l = 2,
  u = NULL,
  opts,
  ForceOptimality = FALSE,
  X_cost = NULL,
  X_fixed = NULL
)

Value

A mosaSTRS object or a list of mosaSTRS

objects. A mosaSTRS object is a list containing the following components:

$sense

Sense of optimzation; max precision or min_cost.

$method

The method used, either weighted sum scalarization (WSS) or weighted Chebyshev minimization (WCM).

$init_w

The initial preference weightings (opts$init_w).

$opt_w

The optimal weightings w.r.t. opts$init_w as opts$init_w might not lead to Pareto optimality. NULL if ForceOptimality = FALSE.

$n_opt

The vector of optimal sample sizes.

$objective

The objective values, including the sensitivity and the RSE.

$precision

A data frame corresponding to the precision constraints.

$cost

A data frame corresponding to the cost constraints.

$problem_components

A list containing the input data to the optimization problem.

$output_mosalloc

A list of function returns of mosalloc().

If opts$init_w has multiple rows, the function returns a list of mosaSTRS objects whose length equals the number of rows.

Arguments

X_var

(type: matrix) A matrix containing stratum- (row) and variable- (column) specific precision units (e.g. variances).

X_tot

(type: matrix) A matrix containing stratum- (row) and variable- (column) specific totals.

N

(type: vector) A vector of stratum sizes.

listD

(type: list) A list of lists taking subpopulation- (domain/area) specific arguments.
If opts$sense == "max_precision", elements are lists containing the following components which correspond to one specific precision target each:
..$stratum_id (type: numeric) A vector containing the indices of the strata considered for the current restriction. The indices must coincide with the row numbers of X_var and X_tot.
..$variate (type: character or numeric) The column name or column index of X_var to be addressed.
..$measure (type: character or numeric) A character ("relVAR" (relative variance) or "VAR" (variance)) or a numerical value between 0 and 1 indicates a gradation coefficient that balances between "relVar" (..$measure = 0) and "VAR" (..$measure = 1).
..$name (type: character) The name of the subpopulation (domain/area).
If opts$sense == "min_cost", elements are lists containing the following components which correspond to one specific cost target each:
..$stratum_id (type: numeric) A vector containing the indices of the strata considered for the current objective. The indices must coincide with the row numbers of X_cost.
..$c_type (type: character or numeric) The column name or column index of X_cost to be addressed.
..$name (type: character) The name of the subpopulation (domain/area).

listA

(type: list) A list of lists taking subpopulation- (domain/area) specific arguments. Elements are lists containing the following components which inturn correspond to one specific precision restriction:
..$stratum_id (type: numeric) A vector containing the indices of the strata considered for the current restriction. The indices must coincide with the corresponding row numbers of X_var and X_tot.
..$variate (type: character or numeric) The column name or column index of X_var to be addressed.
..$measure (type: character) "RSE" (relative standard error) or "VAR" (variance).
..$bound (type: numeric) An upper bound to "RSE" (or "VAR").
..$name (type: character) The name of the subpopulation (domain/area).

listC

(type: list) A list of lists taking subpopulation- (domain/area) specific arguments. Elements are lists containing the following components which correspond to one specific cost restriction:
..$stratum_id (type: numeric) A vector containing the indices of the strata considered for the current restriction. The indices must coincide with the row numbers of X_var and X_tot.
..$c_coef (type: numeric) A vector of length length(stratum_id) containing the stratum-specific cost components for the set of strata that is going to be bounded by above and/or below.
..$c_upper (type: numeric) The cost upper bound value. NULL if not present.
..$c_lower (type: numeric) The cost lower bound value. NULL if not present.
..$name (type: character) The name of the subpopulation (domain/area).

fpc

(type: logical) A TRUE or FALSE statement indicating whether the finite population correction should be considered.

l

(type: vector) A vector of lower box constraints.

u

(type: vector) A vector of upper box constraints.

opts

(type: list) The options used by the algorithms:
$sense (type: character) Sense of optimization (default = "max_precision", alternative "min_cost").
$df (type: function) The gradient of f (default = NULL).
$Hf (type: function) The Hesse matrix of f (default = NULL).
$method (type: character) A character indicating scalarization method (default = "WCM", alternative "WSS"), $f must be NULL.
$f (type: function) Decision functional over the objective vector (default = NULL). The weighted Chebyshev minimization (WCM) minimizes for weighted maximum objective component. The weighted sum scalarization (WSS) minimizes for the weighted sum. For either case the weights are given by $init_w.
$init_w (type: numeric or matrix) Preference weightings (default = 1; The weight for first objective component must be 1).
$mc_cores (type: integer) The number of cores for parallelizing multiple input weightings stacked rowwise (default = 1L).
$pm_tol (type: numeric) The tolerance for the projection method (default = 1e-5).
max_iters (type: integer) The maximum number of iterations (default = 100L).
$print_pm (type: logical) A TRUE or FALSE statement whether iterations of the projection method should be printed (default = FALSE).

ForceOptimality

(type: logical) A TRUE or FALSE statement indicating whether Pareto optimality should be ensured. Default is FALSE (for most practical problem formulations Pareto optimality is achieved by construction, e.g. in the case of one overall cost constraint). If TRUE, additional computation time is required. In this case, mosallocStepwiseFirst() is used internally.

X_cost

(type: matrix) A matrix containing stratum- (row) and type- (column) specific cost coefficients associated with fixed cost. Types of cost might be, e.g. '$ US', 'minutes', 'sample size', etc. Default is NULL. The argument is required in the case of cost minimization (opt$sense == "min_cost").

X_fixed

(type: matrix) A matrix containing stratum- (rows) and type- (columns) specific cost coefficients associated with fixed cost. Default is NULL. Used soley in the case of cost minimization (opt$sense == "min_cost").

Examples

Run this code
# Artificial population of 50 568 business establishments and 5 business
# sectors (data from Valliant, R., Dever, J. A., & Kreuter, F. (2013).
# Practical tools for designing and weighting survey samples. Springer.
# https://doi.org/10.1007/978-1-4614-6449-5, Example 5.2 pages 133-9)

# See also https://umd.app.box.com/s/9yvvibu4nz4q6rlw98ac/file/297813512360
# file: Code 5.3 constrOptim.example.R

Nh <- c(6221, 11738, 4333, 22809, 5467) # stratum sizes
ch <- c(120, 80, 80, 90, 150) # stratum-specific cost of surveying

# Revenues
mh.rev <- c(85, 11, 23, 17, 126) # mean revenue
Sh.rev <- c(170.0, 8.8, 23.0, 25.5, 315.0) # standard deviation revenue

# Employees
mh.emp <- c(511, 21, 70, 32, 157) # mean number of employees
Sh.emp <- c(255.50, 5.25, 35.00, 32.00, 471.00) # std. dev. employees

# Proportion of estabs claiming research credit
ph.rsch <- c(0.8, 0.2, 0.5, 0.3, 0.9)

# Proportion of estabs with offshore affiliates
ph.offsh <- c(0.06, 0.03, 0.03, 0.21, 0.77)

budget <- 300000 # overall available budget
n.min  <- 100 # minimum stratum-specific sample size

# Matrix containing stratum-specific variance components
X_var <- cbind(Sh.rev**2,
               Sh.emp**2,
               ph.rsch * (1 - ph.rsch) * Nh/(Nh - 1),
               ph.offsh * (1 - ph.offsh) * Nh/(Nh - 1))
colnames(X_var) <- c("rev", "emp", "rsch", "offsh")

# Matrix containing stratum-specific totals
X_tot <- cbind(mh.rev, mh.emp, ph.rsch, ph.offsh) * Nh
colnames(X_tot) <- c("rev", "emp", "rsch", "offsh")

# Examples
#----------------------------------------------------------------------------
# Example 1: Univariate minimization of the variation of estimates for
# revenue subject to cost restrictions and precision restrictions to the
# relative standard error of estimates for the proportion of businesses with
# offshore affiliates. Additionally, there is one overall cost constraint and
# at least half of the provided budget must be spend to strata 1 to 3.

# Specify objectives via listD
listD <- list(list(stratum_id = 1:5, variate = "rev", measure = "relVAR",
                   name = "pop"))

# Specify precision constraints via listA
listA <- list(list(stratum_id = 1:5, variate = "offsh", measure = "RSE",
                   bound = 0.05, name = "pop"))

# Specify cost constraints via listC
listC <- list(list(stratum_id = 1:5, c_coef = ch, c_lower = NULL,
                   c_upper = budget, name = "Overall"),
              list(stratum_id = 1:3, c_coef = ch[1:3],
                   c_lower = 0.5 * budget, c_upper = NULL, name = "1to3"))

# Specify stratum-specific box constraints
l <- rep(n.min, 5) # minimum sample size per stratum
u <- Nh            # maximum sample size per stratum

# Specify parameter for mosalloc (method = "WSS")
opts <- list(sense = "max_precision",
             f = NULL, df = NULL, Hf = NULL,
             method = "WSS", init_w = 1,
             mc_cores = 1L, pm_tol = 1e-05,
             max_iters = 100L, print_pm = FALSE)

# Run mosallocSTRS with weighted sum scalarization (WSS)
resWSS  <- mosallocSTRS(X_var, X_tot, Nh, listD, listA, listC,
                             fpc = TRUE, l, u, opts)

summary(resWSS)

# Specify parameter for mosalloc (method = "WCM")
opts = list(sense = "max_precision",
            f = NULL, df = NULL, Hf = NULL,
            method = "WCM", init_w = 1,
            mc_cores = 1L, pm_tol = 1e-05,
            max_iters = 100L, print_pm = FALSE)

# Run mosallocSTRS with weighted Chebyshec minimization (WCM)
resWCM  <- mosallocSTRS(X_var, X_tot, Nh, listD, listA, listC,
                             fpc = TRUE, l, u, opts)

summary(resWCM)

# The optimal sample sizes vector can also be obtained by
summary(resWCM)$n_opt

# Hint: For univariate allocation problems 'WSS' and 'WCM' are equivalent!

#----------------------------------------------------------------------------
# Example 2: Minimization of the maximum relative variation of estimates for
# the total revenue, the number of employee, the number of businesses claimed
# research credit and the number of businesses with offshore affiliates
# subject to one overall cost constraint and at least half of the provided
# budget must be spend to strata 1 to 3.

# Specify objectives via listD
listD <- list(list(stratum_id = 1:5, variate = "rev", measure = "relVAR",
                  name = "pop"),
             list(stratum_id = 1:5, variate = "emp", measure = "relVAR",
                  name = "pop"),
             list(stratum_id = 1:5, variate = "rsch", measure = "relVAR",
                  name = "pop"),
             list(stratum_id = 1:5, variate = "offsh", measure = "relVAR",
                  name = "pop"))

# Specify cost constraints via listC
listC <- list(list(stratum_id = 1:5, c_coef = ch, c_lower = NULL,
                   c_upper = budget, name = "Overall"),
              list(stratum_id = 1:3, c_coef = ch[1:3],
                   c_lower = 0.5 * budget, c_upper = NULL, name = "1to3"))

# Specify stratum-specific box constraints
l <- rep(n.min, 5) # minimum sample size per stratum
u <- Nh            # maximum sample size per stratum

# Specify parameter for mosalloc (method = "WSS")
opts = list(sense = "max_precision",
            f = NULL, df = NULL, Hf = NULL,
            method = "WSS", init_w = 1,
            mc_cores = 1L, pm_tol = 1e-05,
            max_iters = 100L, print_pm = FALSE)

# Run mosallocSTRS with weighted sum scalarization (WSS)
resWSS  <- mosallocSTRS(X_var, X_tot, Nh, listD, NULL, listC,
                             fpc = TRUE, l, u, opts)

summary(resWSS)

# Specify parameter for mosalloc (method = "WCM")
opts = list(sense = "max_precision",
            f = NULL, df = NULL, Hf = NULL,
            method = "WCM", init_w = 1,
            mc_cores = 1L, pm_tol = 1e-05,
            max_iters = 100L, print_pm = FALSE)

# Run mosallocSTRS with weighted Chebyshec minimization (WCM)
resWCM  <- mosallocSTRS(X_var, X_tot, Nh, listD, NULL, listC,
                             fpc = TRUE, l, u, opts)

summary(resWCM)

# Since the WCM does not necessarily lead to Pareto optimal allocations,
# we might force this via a internal stepwise procedure by setting 
# ForceOptimality = TRUE.

resWCM_FO  <- mosallocSTRS(X_var, X_tot, Nh, listD, NULL, listC,
                             fpc = TRUE, l, u, opts, ForceOptimality = TRUE)

summary(resWCM_FO)

#----------------------------------------------------------------------------
# Example 3: Example 2 with multiple sets of preference weightings.

# Define a set of preference weightings, e.g.
w_1 <- c(1, 1, 1, 1)
w_2 <- c(1, 2, 2, 1)
w_3 <- c(1, 5, 5, 1)

# Combine the weightings to a matrix stacked rowwise
w <- rbind(w_1, w_2, w_3)

# Specify parameter for mosalloc() (method = "WCM"; not yet possible with WSS)
opts = list(sense = "max_precision",
            f = NULL, df = NULL, Hf = NULL,
            method = "WCM", init_w = w,
            mc_cores = 1L, pm_tol = 1e-05,
            max_iters = 100L, print_pm = FALSE)

# Run mosallocSTRS with weighted Chebyshec minimization (WCM)
res  <- mosallocSTRS(X_var, X_tot, Nh, listD, NULL, listC, fpc = TRUE,
                       l, u, opts)

summary(res)

#----------------------------------------------------------------------------
# Example 4: Minimization of survey cost subject to quality restrictions on 
# subpopulation level.

X_cost <- matrix(ch, 5, 1, dimnames = list(1:5,"cost"))

# Specify cost objectives via listD
listD <- list(list(stratum_id = 1:5, c_type = "cost", name = "pop"))

# Specify quailty restrictions via listD. Here: 5 % relative standard error
listA <- list(list(stratum_id = 1:2, variate = "rev", measure = "RSE",
                  bound = 0.05, name = "S1-2"),
              list(stratum_id = 3:5, variate = "rev", measure = "RSE",
                  bound = 0.05, name = "S3-5"),
             list(stratum_id = 1:2, variate = "emp", measure = "RSE",
                  bound = 0.05, name = "S1-2"),
             list(stratum_id = 3:5, variate = "emp", measure = "RSE",
                  bound = 0.05, name = "S3-5"),
             list(stratum_id = 1:2, variate = "rsch", measure = "RSE",
                  bound = 0.05, name = "S1-2"),
             list(stratum_id = 3:5, variate = "rsch", measure = "RSE",
                  bound = 0.05, name = "S3-5"),
             list(stratum_id = 1:2, variate = "offsh", measure = "RSE",
                  bound = 0.05, name = "S1-2"),
             list(stratum_id = 3:5, variate = "offsh", measure = "RSE",
                  bound = 0.05, name = "S3-5"))

# Specify cost constraints
listC <- NULL

# Specify stratum-specific box constraints
l <- rep(n.min, 5) # minimum sample size per stratum
u <- Nh            # maximum sample size per stratum

# Specify parameters for mosalloc()
opts = list(sense = "min_cost",
            f = NULL, df = NULL, Hf = NULL,
            method = "WCM", init_w = 1,
            mc_cores = 1L, pm_tol = 1e-05,
            max_iters = 100L, print_pm = FALSE)

# Run mosallocSTRS()
res  <- mosallocSTRS(X_var, X_tot, Nh, listD, listA, NULL, fpc = TRUE,
                     l, u, opts, X_cost = X_cost)

summary(res)

# Optimal sample sizes
summary(res)$n_opt

Run the code above in your browser using DataLab