dca: Domain-Controlled Allocation (DCA) Algorithm

Description

Functions implementing the Domain-Controlled Allocation (DCA) algorithm described in [Wesolowski] and [WojciakPhD]. The algorithm solves the following optimum allocation problem, formulated in mathematical optimization terms:

Minimize $$f(T,\, \boldsymbol x) = T$$ over $\mathbb R \times \mathbb R_+^{\lvert \mathcal H \rvert}$, subject to $$\sum_{(d,h) \in \mathcal H} x_{d,h} = n,$$ $$\sum_{h \in \mathcal H_d} \left( \frac{1}{x_{d,h}} - \frac{1}{N_{d,h}} \right) \frac{N_{d,h}^2 S_{d,h}^2}{\rho_d^2} = T, \qquad d \in \mathcal D,$$ where:

$(T,\, \boldsymbol x) = (T,\, (x_{d,h},\, (d,h) \in \mathcal H))$: the optimization variable,
$\mathcal H \subset \mathbb N^2$: the set of domain-stratum indices,
$\mathcal D := \{d \in \mathbb N \colon\; \exists h,\, (d,h) \in \mathcal H\}$: the set of domain indices,
$\mathcal H_d := \{h \in \mathbb N \colon\; (d,h) \in \mathcal H\}$: the set of strata indices in domain $d$,
$N_{d,h} > 0$: size of stratum $(d,h)$,
$S_{d,h} > 0$: standard deviation of the study variable in stratum $(d,h)$,
$\rho_d := t_d\, \sqrt{\kappa_d}$: parameter of domain $d$, where $t_d$ denotes the total in domain $d$, i.e., the sum of the values of the study variable over the population elements in domain $d$, and $\kappa_d$ is a priority weight for domain $d$,
$n \in (0,\, \sum_{(d,h) \in \mathcal H} N_{d,h}]$: total sample size.
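As a minimal sketch of how the domain constraint reads in code, the snippet below evaluates its left-hand side for every domain, given a candidate allocation. The helper `domain_T()` is hypothetical and not part of the package; it assumes that strata are listed domain by domain, in the same order as in `H_counts`, `N`, and `S`. The input values are those used in the Examples section.

```r
# Inputs from the Examples section: two domains with 1 and 3 strata.
H_counts <- c(1, 3)
N <- c(140, 110, 135, 190)
S <- sqrt(c(180, 20, 5, 4))
rho2 <- c(2, 3)^2 * c(0.4, 0.6)  # rho_d^2 = t_d^2 * kappa_d

# Hypothetical helper (not part of the package): for each domain d, returns
# sum over h of (1 / x_{d,h} - 1 / N_{d,h}) * N_{d,h}^2 * S_{d,h}^2 / rho_d^2.
domain_T <- function(x, H_counts, N, S, rho2) {
  d <- rep(seq_along(H_counts), times = H_counts)  # domain index of each stratum
  terms <- (1 / x - 1 / N) * N^2 * S^2 / rho2[d]
  unname(vapply(split(terms, d), sum, numeric(1)))
}

# A census (x = N) makes every domain value exactly zero.
domain_T(N, H_counts, N, S, rho2)

# Sampling half of every stratum gives different values per domain,
# so this allocation does not satisfy the equality constraint.
domain_T(N / 2, H_counts, N, S, rho2)
```

An allocation is feasible only if all entries of the returned vector coincide (their common value is $T$) and the allocation sums to $n$.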

Usage

dca0(n, H_counts, N, S, rho, rho2, details = FALSE)
dca(n, H_counts, N, S, rho, rho2, U = NULL, details = FALSE)
dca_nmax(H_counts, N, S)

Value

If details = FALSE, the optimal $\boldsymbol x^*$ is returned. Otherwise, a list is returned containing the optimal $\boldsymbol x^*$ (element named x) along with other internal details of this algorithm. In particular, the lambda element of the list corresponds to the optimal $T^*$.

Arguments

n

(integerish(1))
total sample size $n$. Must satisfy 0 < n <= sum(N).

H_counts

(integerish)
number of strata in each domain, e.g., c(1, 3) for two domains with 1 and 3 strata, respectively.

N

(integerish)
strata sizes $(N_{d,h},\, (d,h) \in \mathcal H)$.

S

(numeric)
standard deviations $(S_{d,h},\, (d,h) \in \mathcal H)$ of the study variable in the strata.

rho

(numeric)
parameters $(\rho_d,\, d \in \mathcal D)$ of the optimization problem.

rho2

(numeric)
the square of rho (rho^2), provided to reduce potential loss of precision due to finite-precision arithmetic.

details

(logical(1))
whether to produce detailed debug output.

U

(integerish or NULL)
a vector of indices identifying the take-max strata, i.e., the strata $(d,h)$ for which the allocation is fixed to $x_{d,h} = N_{d,h}$. The indices refer to the positions of strata in the set $\mathcal H$, in the same order as in the input vectors (N, S, etc.).

For example, if $\mathcal H = \{(1,1),\, (2,1)\}$ and stratum $(2,1)$ is a take-max stratum, then U = 2.

If U contains all strata from a domain, the dimension of the D matrix is reduced accordingly.

U must satisfy one of the following conditions:

n > sum(N[U]),
n = sum(N[U]) and n = sum(N).
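The two conditions on U can be checked before calling dca(). The sketch below is only an illustration: `is_valid_U()` is a hypothetical helper, not a package function, and it assumes that U = NULL means that no take-max strata are specified.

```r
# Hypothetical helper (not part of the package): TRUE if U satisfies
# n > sum(N[U]), or n = sum(N[U]) and n = sum(N).
is_valid_U <- function(n, N, U = NULL) {
  if (is.null(U)) {
    return(TRUE)  # assumption: NULL means no take-max strata
  }
  n_U <- sum(N[U])  # sample size already used up by the take-max strata
  n > n_U || (n == n_U && n == sum(N))
}

N <- c(140, 110, 135, 190)

is_valid_U(518, N, U = 1)    # TRUE:  518 > 140
is_valid_U(140, N, U = 1)    # FALSE: 140 = sum(N[U]) but 140 < sum(N)
is_valid_U(575, N, U = 1:4)  # TRUE:  census, n = sum(N[U]) = sum(N)
```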

Functions

dca0(): Domain-Controlled Allocation algorithm by [Wesolowski].
dca(): Domain-Controlled Allocation algorithm by [Wesolowski], optionally using a set of take-max strata as described in [WojciakPhD].
dca_nmax(): Computes the maximum total sample size $n_{max}$ such that the optimization problem solved by the Domain-Controlled Allocation (DCA) algorithm admits a strictly positive optimal value $T^*$.

Details

For $n \in (0,\, n_{max})$, the optimal value satisfies $T^* > 0$, where $$ n_{max} := \sum_{d \in \mathcal D} \frac{\bigl( \sum_{h \in \mathcal H_d} N_{d,h} S_{d,h} \bigr)^2}{\sum_{h \in \mathcal H_d} N_{d,h} S_{d,h}^2}. $$ See Proposition 2.1 in [Wesolowski] or [WojciakPhD] for details. The value $n_{max}$ is less than or equal to sum(N) and can be computed with dca_nmax().
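The formula for $n_{max}$ can be evaluated directly in base R. The following is a sketch only: `nmax_by_formula()` is a hypothetical name, not the package's implementation of dca_nmax(), and it assumes that strata are listed domain by domain in the order given by `H_counts`.

```r
# Hypothetical re-implementation of the n_max formula (for illustration only).
nmax_by_formula <- function(H_counts, N, S) {
  d <- rep(seq_along(H_counts), times = H_counts)  # domain index of each stratum
  num <- vapply(split(N * S, d), sum, numeric(1))^2  # (sum_h N S)^2 per domain
  den <- vapply(split(N * S^2, d), sum, numeric(1))  # sum_h N S^2 per domain
  sum(num / den)
}

# Data from the Examples section.
H_counts <- c(1, 3)
N <- c(140, 110, 135, 190)
S <- sqrt(c(180, 20, 5, 4))

n_max <- nmax_by_formula(H_counts, N, S)
n_max            # approximately 519.0416, the value quoted in the Examples
n_max <= sum(N)  # TRUE
```

For a domain with a single stratum, the ratio reduces to $N_{d,h}$, which is why the first domain contributes exactly 140 here.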

References

[WojciakPhD]

[Wesolowski]

[WJWR2017]

Examples

# Two domains with 1 and 3 strata, respectively,
# that is, H = {(1,1), (2,1), (2,2), (2,3)}.
H_counts <- c(1, 3)
N <- c(140, 110, 135, 190) # (N_{1,1}, N_{2,1}, N_{2,2}, N_{2,3})
S <- sqrt(c(180, 20, 5, 4)) # (S_{1,1}, S_{2,1}, S_{2,2}, S_{2,3})
total <- c(2, 3)
kappa <- c(0.4, 0.6)
rho <- total * sqrt(kappa) # (rho_1, rho_2)
rho2 <- total^2 * kappa
sum(N) # 575
n_max <- dca_nmax(H_counts, N, S) # 519.0416

# Total sample size below n_max, so that T* > 0 (see Details).
n <- floor(n_max) - 1

dca0(n, H_counts, N, S, rho, rho2)
x0 <- dca0(n, H_counts, N, S, rho, rho2, details = TRUE)
x0$x
x0$lambda
x0$k
x0$v
x0$s

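# Total sample size above n_max, i.e., outside the range (0, n_max)
# for which T* > 0 is guaranteed.
n <- ceiling(n_max) + 1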
x0 <- dca0(n, H_counts, N, S, rho, rho2, details = TRUE)
x0$x
x0$lambda

# Back to a total sample size below n_max.
n <- floor(n_max) - 1

x1 <- dca(n, H_counts, N, S, rho, rho2, details = TRUE)
x1$x
x1$x_Uc
x1$lambda
x1$s

# Treat stratum (1,1), i.e., position 1 in N and S, as a take-max stratum.
dca(n, H_counts, N, S, rho, rho2, U = 1)
x2 <- dca(n, H_counts, N, S, rho, rho2, U = 1, details = TRUE)
x2$x
x2$x_Uc
x2$lambda
x2$s
