lsasim (version 2.1.6)

cluster_gen: Generate cluster sample

Description

Generate cluster sample

Usage

cluster_gen(
  n,
  N = 1,
  cluster_labels = NULL,
  resp_labels = NULL,
  cat_prop = NULL,
  n_X = NULL,
  n_W = NULL,
  c_mean = NULL,
  sigma = NULL,
  cor_matrix = NULL,
  separate_questionnaires = TRUE,
  collapse = "none",
  sum_pop = sapply(N, sum),
  calc_weights = TRUE,
  sampling_method = "mixed",
  rho = NULL,
  theta = FALSE,
  verbose = TRUE,
  print_pop_structure = verbose,
  ...
)

Value

list with background questionnaire data, either grouped by level or combined into fewer data frames (see collapse)

Arguments

n

numeric vector or list with the number of sampled observations (clusters or subjects) on each level

N

population size of each sampled cluster element on each level. Either a numeric vector or a list of numeric vectors. If N is a list, it must have the same length as n and each element of N must have the same length as the corresponding element of n

cluster_labels

character vector with the names of each cluster level

resp_labels

character vector with the names of the questionnaire respondents on each level

cat_prop

list of cumulative proportions for each item. If theta = TRUE, the first element of cat_prop must be a scalar 1, which corresponds to theta.

n_X

number of continuous background variables (n_X, as in questionnaire_gen()), given as a list with one element per cluster level

n_W

number of categorical background variables (n_W, as in questionnaire_gen()), given as a list with one element per cluster level

c_mean

vector of means for the continuous variables or list of vectors for the continuous variables for each level. Defaults to 0, but may change if rho is set.

sigma

vector of standard deviations for the continuous variables or list of vectors for the continuous variables for each level. Defaults to 1, but may change if rho is set.

cor_matrix

Correlation matrix between all variables (except weights). By default, correlations are randomly generated.

separate_questionnaires

if TRUE, each level will have its own questionnaire

collapse

if TRUE, the function output contains only one data frame with all answers. It can also be "none", "partial" or "full" for finer control when there are 3 or more levels

sum_pop

total population at each level (sampled or not)

calc_weights

if TRUE, sampling weights are calculated

sampling_method

can be "SRS" for Simple Random Sampling, "PPS" for Probabilities Proportional to Size, "mixed" to use PPS for schools and SRS otherwise, or a vector with the sampling method for each level

rho

intraclass correlation (scalar, vector or list, as appropriate)

theta

if TRUE, the first continuous variable will be labeled 'theta'. Otherwise, it will be labeled 'q1'.

verbose

if TRUE, prints output messages

print_pop_structure

if TRUE, prints the population hierarchical structure (as long as it differs from the sample structure)

...

Additional parameters to be passed to questionnaire_gen()
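
As a sketch of the cumulative-proportion format that cat_prop expects, the base R snippet below converts per-category proportions into cumulative ones with cumsum(). The object names (props, cat_prop_example, cat_prop_theta) and the proportions themselves are made up for illustration. The leading scalar 1 follows the rule stated for theta = TRUE.

```r
# Hypothetical per-category proportions for two categorical items
props <- list(
  item_a = c(0.2, 0.3, 0.5),  # three categories
  item_b = c(0.6, 0.4)        # two categories
)

# cat_prop takes cumulative proportions, so each vector must end in 1
cat_prop_example <- lapply(props, cumsum)

# With theta = TRUE, the first element must be a scalar 1 (the continuous theta)
cat_prop_theta <- c(list(1), unname(cat_prop_example))

str(cat_prop_theta)
```

A list built this way could then be passed as cat_prop together with theta = TRUE, assuming the remaining arguments describe a compatible number of variables.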

Details

This function relies heavily on two sub-functions, cluster_gen_separate and cluster_gen_together, which can be called independently. This does not make cluster_gen a simple wrapper function, as it performs several operations before calling its sub-functions, such as randomly generating n_X and n_W if they are not set by the user.

n can have unitary length, in which case all clusters will have the same size.

N is not the population size across all elements of a level, but the population size for each element of one level.

The additional parameters to be passed to questionnaire_gen() can be given either in the same format as in questionnaire_gen() or as more complex objects that contain information for each cluster level.
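
To make the distinction between per-element sizes in N and per-level totals concrete, the base R sketch below reuses the n and N objects from the Examples section and applies the expression shown in Usage as the default for sum_pop. It does not call lsasim, and the interpretation in the comments is a reading of the argument descriptions rather than a statement about the package internals.

```r
# Sample and population structure taken from the Examples section
n <- list(3, c(20, 15, 25))               # sampled schools, then sampled students per school
N <- list(5, c(200, 500, 400, 100, 100))  # schools in the population, then students per school

# N stores one size per element; summing within each level gives the
# per-level totals, which is the default expression for sum_pop
sum_pop <- sapply(N, sum)
sum_pop      # 5 and 1300

# Total number of sampled students across the sampled schools
sum(n[[2]])  # 60
```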

See Also

cluster_gen_separate() cluster_gen_together() questionnaire_gen()

Examples

# Simple structure of 3 schools with 5 students each
cluster_gen(c(3, 5))

# Complex structure of 3 schools with different numbers of students,
# sampling weights and custom number of questions
n <- list(3, c(20, 15, 25))
N <- list(5, c(200, 500, 400, 100, 100))
cluster_gen(n, N, n_X = 5, n_W = 2)

# Condensing the output
set.seed(0); cluster_gen(c(2, 4))
set.seed(0); cluster_gen(c(2, 4), collapse = TRUE) # same, but in one dataset

# Condensing the output: 3 levels
str(cluster_gen(c(2, 2, 1), collapse = "none"))
str(cluster_gen(c(2, 2, 1), collapse = "partial"))
str(cluster_gen(c(2, 2, 1), collapse = "full"))

# Controlling the intra-class correlation and the grand mean
x <- cluster_gen(c(5, 1000), rho = .9, n_X = 2, n_W = 0, c_mean = 10)
sapply(1:5, function(s) mean(x$school[[s]]$q1))  # means per school != 10
mean(sapply(1:5, function(s) mean(x$school[[s]]$q1))) # closer to c_mean

# Making the intraclass variance explode by forcing "incompatible" rho and c_mean
x <- cluster_gen(c(5, 1000), rho = .5, n_X = 2, n_W = 0, c_mean = 1:5)
anova(x)
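
For intuition about what rho controls, the sketch below simulates clustered data in base R and recovers the intraclass correlation from a one-way ANOVA. It does not call lsasim and does not claim to reproduce how cluster_gen() generates data. The variances tau2 and sigma2 and all object names are assumptions, chosen so that the true intraclass correlation tau2 / (tau2 + sigma2) equals 0.5.

```r
set.seed(1)
n_schools  <- 200  # number of clusters (assumed)
n_students <- 50   # observations per cluster (assumed)
tau2   <- 1        # between-cluster variance (assumed)
sigma2 <- 1        # within-cluster variance (assumed)

# Intraclass correlation implied by the two variance components
rho_true <- tau2 / (tau2 + sigma2)

# Each observation is a school-level effect plus individual noise
school <- factor(rep(seq_len(n_schools), each = n_students))
school_effect <- rnorm(n_schools, mean = 0, sd = sqrt(tau2))
y <- school_effect[as.integer(school)] +
  rnorm(n_schools * n_students, mean = 0, sd = sqrt(sigma2))

# One-way ANOVA estimator of the intraclass correlation for balanced data
tab <- anova(lm(y ~ school))
msb <- tab["school", "Mean Sq"]     # between-cluster mean square
msw <- tab["Residuals", "Mean Sq"]  # within-cluster mean square
rho_hat <- (msb - msw) / (msb + (n_students - 1) * msw)

c(true = rho_true, estimated = rho_hat)
```

The estimate should land near the true value. Raising tau2 relative to sigma2 pushes it toward 1, which mirrors the effect of a large rho in the examples above, where school means drift away from the grand mean.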
