growfunctions (version 0.1)

gen_informative_sample: Generate a finite population and take an informative single or two-stage sample.

Description

Used to compare performance of sample design-weighted and unweighted estimation procedures.

Usage

gen_informative_sample(clustering = TRUE, two_stage = FALSE,
  theta = c(0.2, 0.7, 1), M = 3, theta_star = matrix(c(0.3, 0.3, 0.3,
  0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow = TRUE), gp_type = "rq",
  N = 10000, T = 15, L = 10, R = 8, I = 4, n = 750,
  noise_to_signal = 0.05, incl_gradient = "medium")

Arguments

clustering
Boolean input on whether to generate the population from clusters of covariance parameters. Defaults to clustering = TRUE
two_stage
Boolean input on whether to use two-stage sampling, with the first stage defining a set of L blocks, where membership in blocks is determined by quantiles of observation unit variance functions. (They are structured like strata, though they are sub-s
theta
A numeric vector of global covariance parameters in the case of clustering = FALSE. The length, P, of theta must be consistent with the selected gp_type. Defaults to theta = c(0.2, 0.7, 1.0)
M
Scalar input denoting number of clusters to employ if clustering = TRUE. Defaults to M = 3
theta_star
A P x M matrix of cluster location values associated with the choice of M and the selected gp_type. Defaults to matrix(c(0.3,0.3,0.3,0.31,0.72,2.04,0.58,0.83,1.00),3,3,byrow=TRUE).
gp_type
Input of choice for covariance matrix formulation to be used to generate the functions for the N population units. Choices are c("se","rq"), where "se" denotes the squared exponential covariance function and
N
A scalar input denoting the number of population units (or establishments). Defaults to N = 10000.
T
A scalar input denoting the number of time points in each of N, T x 1 functions that contribute to the N x T population data matrix, y. Defaults to T = 15.
L
A scalar input that denotes the number of blocks in which to assign the population units to be sub-sampled in the first stage of sampling. Defaults to L = 10.
R
A scalar input that denotes the number of blocks to sample from the L blocks, with probability proportional to the average variance of member functions in each block. Defaults to R = 8.
I
A scalar input denoting the number of strata to form within each block. Population units are divided into equally-sized strata based on variance quantiles. Defaults to I = 4.
n
Sample size to be generated. An informative sample is taken under either a single-stage (two_stage = FALSE) or two-stage (two_stage = TRUE) design, along with a non-informative, iid sample of the same size (n). Defaults to n = 750.
incl_gradient
A character input on the gradient of stratum inclusion probabilities from lowest to highest. If set to "high", they are proportional to the exponential of the cluster number. If set to "medium", the inclusion probabilities are proport
noise_to_signal
A numeric input in the interval, (0,1), denoting the ratio of noise variance to the average variance of the generated functions, bb_i. Defaults to noise_to_signal = 0.05
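
The roles of I and incl_gradient described above can be illustrated with a small base R sketch. This is not the package's internal code: the toy population, the variable names (scale_i, n_time, pi_i) and the use of the stratum index in the exponential are assumptions made for illustration. It forms I equally sized strata from quantiles of the unit variances and assigns inclusion probabilities proportional to the exponential of the stratum index, which is one reading of the "high" gradient.

```r
set.seed(1)
N      <- 1000   # population units (hypothetical, smaller than the default)
n_time <- 15     # time points per unit
I      <- 4      # number of strata
n      <- 100    # target sample size

## Toy population: each row is one unit's noisy function, with a
## unit-specific scale so that the row variances differ across units.
scale_i <- rgamma(N, shape = 2, rate = 2)
y       <- matrix(rnorm(N * n_time), N, n_time) * scale_i

## Variance of each unit's function.
v <- apply(y, 1, var)

## Equally sized strata from variance ranks (stratum 1 = lowest variance).
stratum <- ceiling(rank(v, ties.method = "first") * I / N)

## Assumed "high" gradient: inclusion probability proportional to
## exp(stratum index), scaled so the probabilities sum to n.
w_raw <- exp(stratum)
pi_i  <- n * w_raw / sum(w_raw)

table(stratum)                 # stratum sizes
tapply(pi_i, stratum, unique)  # one inclusion probability per stratum
```

Units in the highest-variance stratum receive the largest inclusion probability, which is what makes the resulting sample informative about the response.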

Value

A list object named dat_sim containing objects related to the generated finite population, the informative sample and the non-informative, iid sample. Some important objects include:

  • H: A vector of length N, the population size, with cluster assignments for each establishment (unit) to one of the 1, ..., M clusters.
  • map.tot: A data.frame object including unit label identifiers (under establishment), the cluster assignment (if clustering = TRUE), the block (if two_stage = TRUE) and stratum assignments, and the sample inclusion probabilities.
  • map.obs: A data.frame object configured the same as map.tot, only confined to those establishments/units selected into the informative sample of size n.
  • map.iid: A data.frame object configured the same as map.tot, only confined to those establishments/units selected into the non-informative, iid sample of size n.
  • (y, bb): N x T matrix objects containing data responses and de-noised functions, respectively, for each of the N population units. The order of the N units is consistent with map.
  • (y_obs, bb_obs): n x T matrix objects containing observed responses and de-noised functions, respectively, for each of the n units sampled under an informative sampling design. The order of the n units is consistent with map_obs.
  • (y_iid, bb_iid): n x T matrix objects containing observed responses and de-noised functions, respectively, for each of the n units sampled under a non-informative / iid sampling design. The order of the n units is consistent with map_iid.
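
The kind of comparison these objects are meant to support, design-weighted versus unweighted estimation, can be sketched in base R without the package. Everything below is an assumption made for illustration: the toy population, the per-stratum sample sizes, and the names idx, pi_obs, est_weighted and est_unweighted are hypothetical and are not elements of dat_sim. The sketch draws a stratified sample that over-samples high-variance strata, then estimates the population mean of the unit variances with and without inverse inclusion probability weights.

```r
set.seed(42)
N      <- 2000   # population units (hypothetical)
n_time <- 10     # time points per unit
I      <- 4      # number of strata
n      <- 200    # approximate sample size

## Toy population and per-unit variances.
scale_i <- rgamma(N, shape = 2, rate = 2)
y       <- matrix(rnorm(N * n_time), N, n_time) * scale_i
v       <- apply(y, 1, var)

## Equally sized strata by variance rank (stratum 1 = lowest variance).
stratum <- ceiling(rank(v, ties.method = "first") * I / N)
N_h     <- as.vector(table(stratum))

## Assumed allocation: stratum sample sizes proportional to exp(stratum index).
## Rounding means sum(n_h) may differ slightly from n.
n_h <- pmax(1, round(n * exp(1:I) / sum(exp(1:I))))

## Simple random sample without replacement inside each stratum.
idx <- unlist(lapply(1:I, function(h) {
  units_h <- which(stratum == h)
  units_h[sample.int(length(units_h), n_h[h])]
}))

## Inclusion probability of each sampled unit.
pi_obs <- (n_h / N_h)[stratum[idx]]

## Target: population mean of the unit variances.
truth          <- mean(v)
est_unweighted <- mean(v[idx])                            # ignores the design
est_weighted   <- sum(v[idx] / pi_obs) / sum(1 / pi_obs)  # inverse-probability weighted

c(truth = truth, unweighted = est_unweighted, weighted = est_weighted)
```

Because high-variance units are over-represented, the unweighted estimate is pulled upward, while weighting each sampled unit by the inverse of its inclusion probability corrects for the design.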

See Also

gpdpgrow, gmrfdpgrow

Examples

library(growfunctions)
## use gen_informative_sample() to generate an
## N X T population drawn from a dependent GP
## By default, 3 clusters are used to generate
## the population.
## A single stage stratified random sample of size n
## is drawn from the population using I = 4 strata.
## The resulting sample is informative in that the
## distribution for this sample is
## different from the population from which
## it was drawn because the strata inclusion
## probabilities are proportional to a feature
## of the response, y (in this case, the variance.
## The stratified random sample over-samples
## large variance strata).
## (The user may also select a 2-stage
## sample with the first stage
## sampling "blocks" of the population and
## the second stage sampling strata within blocks).
dat_sim        <- gen_informative_sample(N = 10000,
                                n = 500, T = 10,
                                noise_to_signal = 0.1)

## extract n x T observed sample under informative
## stratified sampling design.
y_obs                       <- dat_sim$y_obs
T                           <- ncol(y_obs)
