gen_informative_sample: Generate a finite population and take an informative single or two-stage sample.

Description

Used to compare the performance of sampling design-weighted and unweighted estimation procedures.
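The following base-R sketch shows why such a comparison matters. It is independent of growfunctions, and the variance feature `v` and all settings are hypothetical. It draws a sample with probability proportional to a size variable, then contrasts the unweighted sample mean with a Hájek (inverse-probability-weighted) mean. Inclusion probabilities are approximated by n * v / sum(v), which is reasonable here because n / N is small.

```r
# Illustrative sketch only (not growfunctions code): compare unweighted and
# design-weighted (Hajek) means under an informative, size-biased sample.
set.seed(1)
N <- 10000                                # population size
n <- 500                                  # sample size
v <- rgamma(N, shape = 2, rate = 2)       # hypothetical per-unit variance feature, mean 1

p_sel <- v / sum(v)                       # selection probability proportional to v
idx_inf <- sample.int(N, n, replace = FALSE, prob = p_sel)  # informative sample
idx_iid <- sample.int(N, n, replace = FALSE)                # non-informative (iid) sample

# Approximate inclusion probabilities n * p_sel (adequate because n / N is small)
pi_incl <- pmin(1, n * p_sel)
w <- 1 / pi_incl[idx_inf]                 # design weights

pop_mean   <- mean(v)
unweighted <- mean(v[idx_inf])            # biased upward: large v is over-sampled
hajek      <- sum(w * v[idx_inf]) / sum(w)
iid_mean   <- mean(v[idx_iid])

round(c(population = pop_mean, unweighted = unweighted,
        weighted = hajek, iid = iid_mean), 3)
```

The unweighted mean of the informative sample drifts toward large values of `v`. The weighted mean and the iid-sample mean stay close to the population mean.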

Usage

gen_informative_sample(clustering = TRUE, two_stage = FALSE,
  theta = c(0.2, 0.7, 1), M = 3, theta_star = matrix(c(0.3, 0.3, 0.3,
  0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow = TRUE), gp_type = "rq",
  N = 10000, T = 15, L = 10, R = 8, I = 4, n = 750,
  noise_to_signal = 0.05, incl_gradient = "medium")

Arguments

clustering

Logical; whether to generate the population from clusters of covariance parameters. Defaults to clustering = TRUE.

two_stage

Logical; whether to use two-stage sampling (defaults to two_stage = FALSE). The first stage defines a set of L blocks, where membership in blocks is determined by quantiles of the observation unit variance functions. (They are structured like strata, though they are sub-s

theta

A numeric vector of global covariance parameters, used when clustering = FALSE. The length, P, of theta must be consistent with the selected gp_type. Defaults to theta = c(0.2, 0.7, 1).

M

A scalar input denoting the number of clusters to employ when clustering = TRUE. Defaults to M = 3.

theta_star

A P x M matrix of cluster location values associated with the choice of M and the selected gp_type. Defaults to matrix(c(0.3, 0.3, 0.3, 0.31, 0.72, 2.04, 0.58, 0.83, 1.00), 3, 3, byrow = TRUE).

gp_type

The covariance function (default "rq") used to generate the functions for the N population units. Choices are c("se","rq"), where "se" denotes the squared exponential covariance function and
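For readers unfamiliar with the "se" option, the sketch below builds a squared exponential covariance matrix over T time points. It then draws one T x 1 function from a zero-mean Gaussian process using a Cholesky factor. The helper `se_cov()` and its parameterisation (signal variance `sigma2_f`, length-scale `ell`) are hypothetical. This page does not document how the elements of theta map onto these quantities.

```r
# Illustrative sketch: squared exponential (SE) covariance and one GP draw.
# se_cov() and its parameterisation are assumptions, not the package's code.
se_cov <- function(tt, sigma2_f = 1, ell = 2) {
  d <- outer(tt, tt, "-")                 # pairwise time differences
  sigma2_f * exp(-d^2 / (2 * ell^2))      # k(t, t') = sigma2_f * exp(-(t - t')^2 / (2 * ell^2))
}

set.seed(2)
T_pts <- 15                               # number of time points
tt <- seq_len(T_pts)
K <- se_cov(tt, sigma2_f = 1, ell = 2)

# A small diagonal jitter keeps the Cholesky factorisation numerically stable
K_j <- K + diag(1e-6, T_pts)
U <- chol(K_j)                            # upper triangular, K_j = t(U) %*% U
f <- drop(crossprod(U, rnorm(T_pts)))     # t(U) %*% z, so f ~ N(0, K_j)
```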

N

A scalar input denoting the number of population units (or establishments). Defaults to N = 10000.

T

A scalar input denoting the number of time points in each of the N functions (each T x 1) that make up the N x T population data matrix, y. Defaults to T = 15.

L

A scalar input denoting the number of blocks to which the population units are assigned for sub-sampling in the first stage of sampling. Defaults to L = 10.

R

A scalar input denoting the number of blocks to sample from the L blocks, with probability proportional to the average variance of the member functions in each block. Defaults to R = 8.
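The first-stage block selection can be sketched in base R as follows. The block average variances are simulated stand-ins. `sample.int()` with `prob` draws blocks sequentially without replacement, so the realised inclusion probabilities are only approximately proportional to the average variance, especially when R is close to L. The package may use a different PPS scheme.

```r
# Illustrative sketch (not the package's internal code): select R of L blocks
# with probability proportional to each block's average member variance.
set.seed(3)
L <- 10                                        # number of blocks
R <- 8                                         # number of blocks to sample
avg_var <- rgamma(L, shape = 2, rate = 1)      # hypothetical block average variances
sel_prob <- avg_var / sum(avg_var)             # PPS selection probabilities
selected <- sort(sample.int(L, R, replace = FALSE, prob = sel_prob))
selected
```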

I

A scalar input denoting the number of strata to form within each block. Population units are divided into equally-sized strata based on variance quantiles. Defaults to I = 4.
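Forming equally sized strata from variance quantiles can be sketched with `quantile()` and `cut()`. The per-unit variances `v` are simulated here as an assumption. With distinct continuous values and N divisible by I, as in this example, each stratum receives exactly N / I units.

```r
# Illustrative sketch: assign N units to I equally sized strata by the
# quantiles of a per-unit variance (simulated here as an assumption).
set.seed(4)
N <- 1000                                   # number of units
I <- 4                                      # number of strata
v <- rgamma(N, shape = 2, rate = 1)         # hypothetical per-unit variances
breaks <- quantile(v, probs = seq(0, 1, length.out = I + 1))
stratum <- cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)
table(stratum)                              # units per stratum
```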

n

The sample size to be generated. An informative sample is drawn under either a single-stage (two_stage = FALSE) or two-stage (two_stage = TRUE) design, along with a non-informative, iid sample of the same size, n. Defaults to n = 750.

incl_gradient

A character input (default "medium") that controls how steeply the stratum inclusion probabilities increase from the lowest to the highest stratum. If set to "high", they are proportional to the exponential of the cluster number. If set to "medium", the inclusion probabilities are proport
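The "high" setting can be illustrated with a sketch that gives every unit in stratum h = 1, ..., I an inclusion probability proportional to exp(h). The probabilities are scaled so that they sum to the sample size n. Two choices are assumptions made only for this example: indexing the gradient by stratum number, and using equal stratum sizes.

```r
# Illustrative sketch of an exponential ("high") inclusion-probability gradient.
# Stratum-indexed gradient and equal stratum sizes are assumptions.
N <- 10000                          # population size
n <- 750                            # target sample size
I <- 4                              # number of strata
N_h <- rep(N / I, I)                # equally sized strata
grad <- exp(seq_len(I))             # exp(1), ..., exp(I)
pi_h <- n * grad / sum(N_h * grad)  # per-unit inclusion probability in stratum h
n_h <- N_h * pi_h                   # expected number sampled per stratum
round(pi_h, 4)
round(n_h, 1)
```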

noise_to_signal

A numeric input in the interval (0, 1) denoting the ratio of the noise variance to the average variance of the generated functions, bb_i. Defaults to noise_to_signal = 0.05.
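One way to read this definition is sketched below. Each unit's variance over its T time points is computed and averaged across units, then scaled by noise_to_signal to give the variance of the noise added to bb. The matrix `bb` is a random stand-in. This reading of "average variance" is an assumption, not the package's documented computation.

```r
# Illustrative sketch: noise variance = noise_to_signal * average per-unit variance.
# bb is a hypothetical stand-in for the N x T de-noised functions.
set.seed(6)
N <- 200                                     # number of units
T_pts <- 15                                  # number of time points
noise_to_signal <- 0.05
bb <- matrix(rnorm(N * T_pts, sd = 2), nrow = N, ncol = T_pts)
avg_var <- mean(apply(bb, 1, var))           # variance of each row, averaged over units
sigma2_noise <- noise_to_signal * avg_var
y <- bb + matrix(rnorm(N * T_pts, sd = sqrt(sigma2_noise)), nrow = N, ncol = T_pts)
```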

Value

A list object named dat_sim containing objects related to the generated finite population, the informative sample and the non-informative, iid sample. Important objects include:
H: A vector of length N, the population size, with the cluster assignment (1, ..., M) of each establishment (unit).
map.tot: A data.frame object including unit label identifiers (under establishment), the cluster assignment (if clustering = TRUE), the block (if two_stage = TRUE) and stratum assignments, and the sample inclusion probabilities.
map.obs: A data.frame object configured the same as map.tot, but restricted to the establishments/units selected into the informative sample of size n.
map.iid: A data.frame object configured the same as map.tot, but restricted to the establishments/units selected into the non-informative, iid sample of size n.
(y, bb): N x T matrix objects containing the data responses and the de-noised functions, respectively, for each of the N population units. The order of the N units is consistent with map.tot.
(y_obs, bb_obs): n x T matrix objects containing the observed responses and the de-noised functions, respectively, for each of the n units sampled under the informative sampling design. The order of the n units is consistent with map.obs.
(y_iid, bb_iid): n x T matrix objects containing the observed responses and the de-noised functions, respectively, for each of the n units sampled under the non-informative (iid) sampling design. The order of the n units is consistent with map.iid.

Examples

library(growfunctions)
## Use gen_informative_sample() to generate an
## N x T population drawn from a dependent GP.
## By default, 3 clusters are used to generate
## the population.
## A single-stage stratified random sample of size n
## is drawn from the population using I = 4 strata.
## The resulting sample is informative in that its
## distribution differs from that of the population
## from which it was drawn, because the stratum inclusion
## probabilities are proportional to a feature
## of the response, y (in this case, the variance).
## The stratified random sample over-samples
## large-variance strata.
## (The user may also select a 2-stage
## sample, with the first stage
## sampling "blocks" of the population and
## the second stage sampling strata within blocks.)
dat_sim        <- gen_informative_sample(N = 10000,
                                n = 500, T = 10,
                                noise_to_signal = 0.1)

## Extract the n x T observed sample under the informative
## stratified sampling design.
y_obs                       <- dat_sim$y_obs
## Number of time points (named n_time to avoid masking T, R's shorthand for TRUE)
n_time                      <- ncol(y_obs)
