gustave (version 0.3.0)

define_variance_wrapper: Define a variance estimation wrapper

Description

Given a variance estimation function (specific to a survey), define_variance_wrapper defines a variance estimation wrapper that is easier to use (e.g. automatic domain estimation, linearization).

Usage

define_variance_wrapper(variance_function, reference_id,
  default = list(stat = "total", alpha = 0.05),
  objects_to_include = NULL,
  objects_to_include_from = parent.frame())

Arguments

variance_function

An R function that takes a data matrix as input, possibly with other arguments (e.g. parameters affecting the estimation of variance), and returns a numeric vector of estimated variances (or a list whose first element is a numeric vector of estimated variances).

reference_id

A vector containing the ids of all the responding units of the survey. It is compared with default$id to check whether some observations are missing in the survey file. Observations are reordered according to reference_id.

default

A named list specifying the default values for:

  • id: the name of the default identifying variable in the survey file. It can also be an unevaluated expression (enclosed in substitute()) to be evaluated within the survey file.

  • weight: the name of the default weight variable in the survey file. It can also be an unevaluated expression (enclosed in substitute()) to be evaluated within the survey file.

  • stat: the name of the default statistic to compute when none is specified. It is set to "total" by default.

  • alpha: the default threshold for confidence interval derivation. It is set to 0.05 by default.

objects_to_include

A character vector giving the names of additional R objects to include within the variance wrapper. These objects are to be used to carry out the variance estimation.

objects_to_include_from

The environment to which the additional R objects belong.
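To make the contract of variance_function concrete, here is a minimal sketch of a function with the expected shape: a data matrix in, one estimated variance per column out. The design (simple random sampling without replacement), the function name srs_total_variance and its extra argument N are assumptions made for illustration and are not part of gustave. The confidence interval at the end assumes a normal approximation, which this page does not spell out.

```r
# Hypothetical variance function: variance of an estimated total under
# simple random sampling without replacement, one value per column of y.
# N (population size) is an assumed extra argument.
srs_total_variance <- function(y, N) {
  y <- as.matrix(y)
  n <- nrow(y)
  s2 <- apply(y, 2, var)        # sample variance of each column
  N^2 * (1 - n / N) * s2 / n    # numeric vector of estimated variances
}

y <- cbind(a = c(1, 2, 3, 4), b = c(2, 2, 2, 2))
v <- srs_total_variance(y, N = 100)

# Confidence interval for the total of column "a" with alpha = 0.05,
# assuming a normal approximation.
alpha <- 0.05
total_a <- 100 * mean(y[, "a"])
ci_a <- total_a + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(v[["a"]])
```

A function of this shape could be passed as variance_function, with any extra arguments such as N forwarded as the "possibly other arguments" mentioned above.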

Value

An R function that makes the estimation of variance based on the provided variance function easier. Its parameters are:

  • data: the survey data in which the variables of interest are stored

  • ...: one or more calls to a linearization wrapper (see examples and standard linearization wrappers)

  • where: a logical vector indicating a domain on which the variance estimation is conducted

  • by: a qualitative variable whose levels are used to define domains on which the variance estimation is conducted

  • stat: a character vector of size 1 indicating the linearization wrapper to use when none is specified. Its default value is taken from the stat element of the default argument of define_variance_wrapper

  • alpha: a numeric vector of size 1 indicating the threshold for confidence interval derivation. Its default value is taken from the alpha element of the default argument of define_variance_wrapper

  • id: a character vector of size 1 containing the name of the identifying variable in the survey file. It can also be an unevaluated expression (using substitute()) to be evaluated within the survey file. Its default value is taken from the id element of the default argument of define_variance_wrapper

  • envir: an environment containing a binding to data
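The where and by parameters can be read through the usual device for domain estimation: the variable of interest is set to zero outside the domain before the variance of its total is estimated. The sketch below illustrates that idea in base R on a toy data frame; the data and object names are invented, and the wrapper's internals may differ.

```r
# Toy survey file (all names and values are illustrative)
survey <- data.frame(
  id = 1:6,
  division = c("58", "58", "59", "59", "58", "59"),
  speed = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# where: a single domain described by a logical vector
in_58 <- survey$division == "58"
y_where <- survey$speed * in_58    # zero outside the domain

# by: one domain variable per level of a qualitative variable
levels_by <- sort(unique(survey$division))
y_by <- sapply(levels_by, function(l) survey$speed * (survey$division == l))
```

Each column of y_by is a domain-restricted copy of speed, so a variance function for totals can be applied to all domains in a single call.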

Details

Defining variance estimation wrappers is the key feature of the gustave package.

Analytical variance estimation is often difficult for non-specialists to carry out, owing to the complexity of the underlying sampling and estimation methodology. This complexity yields complex variance estimation functions, which are most often used only by the sampling expert who actually wrote them. A variance estimation wrapper is an intermediate function that is "wrapped around" the (complex) variance estimation function in order to provide the non-specialist with user-friendly features:

  • checks for consistency between the provided dataset and the survey characteristics

  • factor discretization

  • domain estimation

  • linearization of complex statistics (see standard linearization wrappers)
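As an illustration of the last item, the standard Taylor linearization of a ratio replaces the two variables by a single linearized variable whose estimated total has approximately the same variance as the ratio. The sketch below computes that variable in base R; the weights and values are made up, and this is the textbook formula rather than a transcription of the package's linearization wrappers.

```r
# Illustrative weights and variables
w <- c(10, 10, 20, 20)
y <- c(4, 6, 10, 20)
x <- c(2, 3, 5, 8)

Y_hat <- sum(w * y)            # estimated total of y
X_hat <- sum(w * x)            # estimated total of x
R_hat <- Y_hat / X_hat         # estimated ratio

# Linearized variable of the ratio: z_k = (y_k - R_hat * x_k) / X_hat
z <- (y - R_hat * x) / X_hat
```

The variance of R_hat can then be approximated by applying a variance function for totals to z. By construction, the weighted total of z is zero.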

define_variance_wrapper allows the sampling expert to define a variance estimation wrapper around a given variance estimation function and set its default parameters. The produced variance estimation wrapper will be stand-alone in the sense that it can contain additional data which would be necessary to carry out the estimation (see objects_to_include and objects_to_include_from parameters).
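The stand-alone behaviour relies on R closures: objects copied into the environment of a function travel with it. The sketch below shows one way this can be done in base R. The helper make_standalone is hypothetical and only borrows the argument names objects_to_include and objects_to_include_from for readability; it is not the package's implementation.

```r
# Hypothetical helper: copy the named objects into a fresh environment
# and make it the enclosing environment of the returned function.
make_standalone <- function(f, objects_to_include = NULL,
                            objects_to_include_from = parent.frame()) {
  force(objects_to_include_from)   # resolve the calling frame right away
  e <- new.env(parent = environment(f))
  for (name in objects_to_include) {
    assign(name, get(name, envir = objects_to_include_from), envir = e)
  }
  environment(f) <- e
  f
}

scale_factor <- 3
times_factor <- function(y) y * scale_factor
standalone <- make_standalone(times_factor, objects_to_include = "scale_factor")

rm(scale_factor)   # the original binding is gone
standalone(2)      # still works: scale_factor lives in the closure
```

Copying the objects, rather than merely referring to them, is what lets the resulting function keep working after the originals are removed or when it is saved and reloaded in another session.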

See Also

standard linearization wrappers, varDT

Examples

# NOT RUN {
### Example from the Information and communication technologies (ICT) survey

# The subset of the (simulated) ICT survey has the following features: 
# - stratified one-stage sampling design of 650 firms;
# - 612 responding firms, non-response correction through reweighting 
# in homogeneous response groups based on economic sub-sector and turnover;
# - calibration on margins (number of firms and turnover broken down
# by economic sub-sector).

# Step 1: Definition of a variance function

variance_function <- function(y){
  
  # Calibration
  y <- rescal(y, x = x)
  
  # Non-response
  y <- add0(y, rownames = ict_sample$firm_id)
  var_nr <- var_pois(y, pik = ict_sample$response_prob_est, w = ict_sample$w_sample)
  
  # Sampling
  y <- y / ict_sample$response_prob_est
  var_sampling <- var_srs(y, pik = 1 / ict_sample$w_sample, strata = ict_sample$division)
  
  var_sampling + var_nr
  
}

# With x the calibration variables matrix
x <- as.matrix(ict_survey[
  order(ict_survey$firm_id), 
  c(paste0("N_", 58:63), paste0("turnover_", 58:63))
])

# Test of the variance function
y <- as.matrix(ict_survey$speed_quanti)
rownames(y) <- ict_survey$firm_id
variance_function(y)

# Step 2: Definition of a variance wrapper

variance_wrapper <- define_variance_wrapper(
  variance_function = variance_function,
  reference_id = ict_survey$firm_id,
  default = list(id = "firm_id", weight = "w_calib"),
  objects_to_include = c("x", "ict_sample")
)

# The objects "x" and "ict_sample" are embedded
# within the function variance_wrapper
ls(environment(variance_wrapper))
# Note: variance_wrapper is a closure
# (http://adv-r.had.co.nz/Functional-programming.html#closures)
# As a consequence, the variance wrapper will work even if 
# x is removed from globalenv()
rm(x)

# Step 3: Features of the variance wrapper

# Better display of results
variance_wrapper(ict_survey, speed_quanti)

# Mean linearization
variance_wrapper(ict_survey, mean(speed_quanti))
# Ratio linearization
variance_wrapper(ict_survey, ratio(turnover, employees))

# Discretization of qualitative variables
variance_wrapper(ict_survey, speed_quali)
# On-the-fly recoding
variance_wrapper(ict_survey, speed_quali == "Between 2 and 10 Mbs")

# 1-domain estimation
variance_wrapper(ict_survey, speed_quanti, where = division == "58")
# Multiple domains estimation
variance_wrapper(ict_survey, speed_quanti, by = division)

# Multiple variables at a time
variance_wrapper(ict_survey, speed_quanti, big_data)
variance_wrapper(ict_survey, speed_quanti, mean(big_data))
# Flexible syntax for where and by arguments
# (similar to the aes() function in ggplot2)
variance_wrapper(ict_survey, where = division == "58", 
  mean(speed_quanti), mean(big_data * 100)
)
variance_wrapper(ict_survey, where = division == "58", 
  mean(speed_quanti), mean(big_data * 100, where = division == "61")
)
variance_wrapper(ict_survey, where = division == "58", 
  mean(speed_quanti), mean(big_data * 100, where = NULL)
)

# }
