Learn R Programming

⚠️There's a newer version (1.0.0) of this package.Take me there.

gustave

Gustave (Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation) is an R package that provides a toolkit for analytical variance estimation in survey sampling. Apart from the implementation of standard variance estimators (Sen-Yates-Grundy, Deville-Tillé), its main feature is to help the methodologist produce easy-to-use variance estimation "wrappers", where systematic operations (linearization, domain estimation) are handled in a consistent and transparent way for the end user.

Install

As gustave is not yet available on CRAN, for now the simplest way to install it on your computer is to use devtools::install_github():

install.packages("devtools")
devtools::install_github("martinchevalier/gustave")

Example

In this example, we define a variance estimation wrapper adapted to the example data inspired by the Information and communication technology (ICT) survey. The subset of the (simulated) ICT survey has the following features:

  • stratified one-stage sampling design of 650 firms;
  • 612 responding firms, non-response correction through reweighting in homogeneous response groups based on economic sub-sector and turnover;
  • calibration on margins (number of firms and turnover broken down by economic sub-sector).

Step 1: Define the variance function

In this context, the variance estimation function can be defined as follows:

# Definition of the variance function
variance_function <- function(y){
  
  # Calibration
  y <- rescal(y, x = x)
  
  # Non-response
  y <- add0(y, rownames = ict_sample$firm_id)
  var_nr <- var_pois(y, pik = ict_sample$response_prob_est, w = ict_sample$w_sample)

  # Sampling
  y <- y / ict_sample$response_prob_est
  var_sampling <- var_srs(y, pik = 1 / ict_sample$w_sample, strata = ict_sample$division)

  var_sampling + var_nr
  
}

# With x the calibration variables matrix
library(gustave)
x <- as.matrix(ict_survey[
  order(ict_survey$firm_id),
  c(paste0("N_", 58:63), paste0("turnover_", 58:63))
])

# Test of the variance function
y <- as.matrix(ict_survey$speed_quanti)
rownames(y) <- ict_survey$firm_id
variance_function(y)

Step 2: Define the variance wrapper

The next step is the definition of a variance wrapper, which is easier to use than the variance function:

variance_wrapper <- define_variance_wrapper(
  variance_function = variance_function,
  reference_id = ict_survey$firm_id,
  default = list(id = "firm_id", weight = "w_calib"),
  objects_to_include = c("x", "ict_sample")
)

Note The objects x and ict_sample are embedded within the function variance_wrapper() (variance_wrapper is a closure)

Step 3: Features of the variance wrapper

# Better display of results
variance_wrapper(ict_survey, speed_quanti)

# Mean linearization
variance_wrapper(ict_survey, mean(speed_quanti))
# Ratio linearization
variance_wrapper(ict_survey, ratio(turnover, employees))

# Discretization of qualitative variables
variance_wrapper(ict_survey, speed_quali)
# On-the-fly recoding
variance_wrapper(ict_survey, speed_quali == "Between 2 and 10 Mbs")

# 1-domain estimation
variance_wrapper(ict_survey, speed_quanti, where = division == "58")
# Multiple domains estimation
variance_wrapper(ict_survey, speed_quanti, by = division)

# Multiple variables at a time
variance_wrapper(ict_survey, speed_quanti, big_data)
variance_wrapper(ict_survey, speed_quanti, mean(big_data))
# Flexible syntax for where and by arguments
# (similar to the aes() function in ggplot2)
variance_wrapper(ict_survey, where = division == "58", 
 mean(speed_quanti), mean(big_data * 100)
)
variance_wrapper(ict_survey, where = division == "58", 
 mean(speed_quanti), mean(big_data * 100, where = division == "61")
)
variance_wrapper(ict_survey, where = division == "58", 
 mean(speed_quanti), mean(big_data * 100, where = NULL)
)

Copy Link

Version

Install

install.packages('gustave')

Monthly Downloads

344

Version

0.3.0

License

GPL-3

Issues

Pull Requests

Stars

Forks

Maintainer

Martin Chevalier

Last Published

June 6th, 2018

Functions in gustave (0.3.0)

ict_pop

Sampling frame of the Information and communication technologies (ICT) survey in the information and communication sector (NACE rev 2 J section)
sumby

Efficient by-group (weighted) summation
varDT

Variance approximation with Deville-Till<U+00E9> (2005) formula
define_variance_wrapper

Define a variance estimation wrapper
varSYG

Sen-Yates-Grundy variance estimator
var_pois

Variance estimator for a Poisson sampling design
ict_sample

Sample of the Information and communication technologies (ICT) survey in the information and communication sector (NACE rev 2 J section)
linearization_wrapper_standard

Standard linearization wrappers
rescal

Linear Regression Residuals Calculation
ict_survey

Survey data of the Information and communication technologies (ICT) survey in the information and communication sector (NACE rev 2 J section)
add0

Expand a matrix or a data.frame with zeros based on rownames matching
define_linearization_wrapper

Define a linearization wrapper