define_linearization_wrapper: Define a linearization wrapper

Description

Given a linearization function (specific to an estimator), define_linearization_wrapper defines a linearization wrapper to be used together with variance estimation wrappers in order to make variance estimation easier. This function is intended for advanced use only (see Details); standard linearization wrappers are included in the gustave package (see standard linearization wrappers).

Usage

define_linearization_wrapper(linearization_function, arg_type,
  allow_factor = FALSE, arg_not_affected_by_domain = NULL,
  display_function = standard_display)

Arguments

linearization_function

An R function whose inputs are the quantities used in the linearization formula and whose output is a list with two named elements:

lin: a list of numerical vectors (most of the time only one) holding the linearized variable(s)
metadata: a list of metadata to be used by the display function (see display_function argument), including (for the standard display function) est for the point estimate and n for the number of observations used in the estimation.
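As an illustration of the expected return shape, here is a minimal base-R sketch of a linearization function for a ratio of two totals. The function name ratio_lin and its argument names are chosen for illustration only, and the formula is the textbook linearization of a ratio, not necessarily the implementation shipped with gustave.

```r
# Hypothetical linearization function for the ratio R = sum(w * num) / sum(w * denom).
# It returns the two named elements described above: lin and metadata.
ratio_lin <- function(num, denom, weight) {
  est <- sum(num * weight) / sum(denom * weight)
  # Textbook linearized variable of a ratio: (num - R * denom) / X
  lin <- (num - est * denom) / sum(denom * weight)
  list(
    lin = list(lin),
    metadata = list(est = est, n = length(num))
  )
}

# Toy data, for illustration only
num <- c(2, 4, 6, 9)
denom <- c(1, 2, 3, 4)
weight <- c(10, 10, 20, 20)
res <- ratio_lin(num, denom, weight)
```

By construction the weighted sum of the linearized variable is zero, which is a quick sanity check for any hand-written ratio-type formula.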

arg_type

A named list with three character vectors describing the type of each argument of linearization_function:

data: data argument(s), numerical vector(s) to be used in the linearization formula
weight: weight argument, numerical vector to be used as row weights in the linearization formula
param: parameters, non-data arguments (most of the time boolean) to be used to control some aspect of the linearization formula
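The sketch below shows what an arg_type list could look like for a hypothetical linearization function with two data arguments, one weight argument and one parameter. The consistency check that follows is only a sanity test one might run by hand; it is not claimed to be something gustave performs.

```r
# Hypothetical linearization function (body omitted: only its formals matter here)
ratio_lin <- function(num, denom, weight, na.rm = FALSE) NULL

# One character vector per type of argument
arg_type <- list(
  data = c("num", "denom"),
  weight = "weight",
  param = "na.rm"
)

# Hand-made check: every formal is declared, and every declared name is a formal
declared <- unlist(arg_type, use.names = FALSE)
undeclared <- setdiff(names(formals(ratio_lin)), declared)
unknown <- setdiff(declared, names(formals(ratio_lin)))
```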

allow_factor

A logical vector of length 1 (FALSE by default) indicating whether factor variables are accepted as-is by the linearization wrapper. This should be the case when the linearization function has only one data argument (e.g. total or mean linearization formulae).

arg_not_affected_by_domain

A character vector naming the (data) arguments which should not be affected by domain-splitting. Such arguments may appear in some complex linearization formulae, for instance when the At-Risk of Poverty Rate (ARPR) is estimated by region but with a poverty line calculated at the national level.
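The ARPR case can be illustrated at the point-estimate level with a small base-R sketch. It assumes equal weights, toy incomes, and a poverty line set at 60% of the median; none of these come from the gustave package, and the linearization itself is not shown.

```r
# Toy data, equal weights assumed for simplicity
income <- c(10, 20, 30, 40, 50, 60, 12, 100)
region <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Poverty line computed on ALL rows: this quantity must not be split by domain
national_line <- 0.6 * median(income)
arpr_national_line <- tapply(income < national_line, region, mean)

# For contrast: what happens if the line is (wrongly) recomputed inside each region
arpr_regional_line <- tapply(
  seq_along(income), region,
  function(i) mean(income[i] < 0.6 * median(income[i]))
)
```

In region A the two rates differ, which is why the argument carrying the national information has to be shielded from domain-splitting.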

display_function

An R function which produces, for each variance estimation, the data.frame row to be displayed by the variance estimation wrapper. It uses three arguments:

var the estimated variance (output of the variance estimation function)
metadata the metadata associated with the estimation, especially the metadata returned by linearization_function (e.g. est, n)
alpha the level for the construction of confidence intervals (at execution time, its value is taken from the alpha argument of the variance wrapper)

The default display function (standard_display) uses standard metadata to display the usual variance indicators (variance, standard deviation, coefficient of variation, confidence interval) broken down by linearization wrapper, domain (if any) and level (if the variable is a factor, see argument allow_factor).
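A custom display function could look roughly like the sketch below. The argument name var for the estimated variance, the function name my_display and the output column names are assumptions made for illustration; this is not the code of standard_display, and the confidence interval relies on a normal approximation.

```r
# Hypothetical display function: builds one data.frame row per estimation.
# 'var' is assumed to be a length-1 numeric holding the estimated variance.
my_display <- function(var, metadata, alpha) {
  std <- sqrt(var)
  z <- qnorm(1 - alpha / 2)  # normal quantile for a two-sided interval
  data.frame(
    n = metadata$n,
    est = metadata$est,
    variance = var,
    std = std,
    cv = 100 * std / metadata$est,
    lower = metadata$est - z * std,
    upper = metadata$est + z * std
  )
}

# Stand-alone call with made-up values
row <- my_display(var = 4, metadata = list(est = 10, n = 50), alpha = 0.05)
```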

Value

A function to be used within a variance estimation wrapper to perform a specific linearization (see examples). Its formals are the ones of linearization_function with the addition of by and where (for domain estimation, set to NULL by default).

Details

When the estimator is not the estimator of a total, the application of analytical variance estimation formulae developed for the estimator of a total is not straightforward (Deville, 1999). An asymptotically unbiased variance estimator can nonetheless be obtained if the estimation of variance is performed on a variable obtained from the original data through a linearization step.
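The linearization step can be checked numerically in the simplest setting. The sketch below assumes simple random sampling without replacement with equal weights N / n and simulated data; the helper var_total_srs is hypothetical and only implements the usual variance estimator of a total under that design. Applying it to the linearized variable of the mean gives back the classical variance estimator of a mean.

```r
set.seed(1)
N <- 1000                      # population size (assumed)
n <- 50                        # sample size (assumed)
y <- rnorm(n, mean = 20, sd = 5)
w <- rep(N / n, n)             # equal sampling weights

# Hypothetical helper: variance estimator for the estimator of a total under SRS
var_total_srs <- function(z, N) {
  n <- length(z)
  N^2 * (1 - n / N) * var(z) / n
}

# Linearized variable of the weighted mean
est <- sum(w * y) / sum(w)
lin <- (y - est) / sum(w)

v_lin <- var_total_srs(lin, N)           # total formula applied to lin
v_classic <- (1 - n / N) * var(y) / n    # textbook variance of a mean under SRS
```

Both quantities agree up to floating-point error, since the linearized variable is a shifted copy of y divided by N.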

define_linearization_wrapper is the function used to create, given a linearization function implementing a given linearization formula, a linearization wrapper which can be used together with a variance wrapper.

Linearization wrappers are flexible tools for applying a variance function to an estimator that requires a linearization step (i.e. any estimator other than the estimator of a total) with virtually no additional complexity for the end-user. To some extent, linearization wrappers can be compared to ggplot2 geom_ and stat_ functions: they let end-users write down what they want without having to go too deep into the details of the corresponding layers.

Standard linearization wrappers are included within the gustave package and automatically added to the variance estimation wrappers. New linearization wrappers can be defined using define_linearization_wrapper and then explicitly added to the variance estimation wrappers using the objects_to_include argument.

References

Deville J.-C. (1999), "Variance estimation for complex statistics and estimators: linearization and residual techniques", Survey Methodology, 25:193–203.

Examples

# NOT RUN {
### Example from the Information and communication technologies (ICT) survey

# The subset of the (simulated) ICT survey has the following features: 
# - stratified one-stage sampling design of 650 firms;
# - 612 responding firms, non-response correction through reweighting 
# in homogeneous response groups based on economic sub-sector and turnover;
# - calibration on margins (number of firms and turnover broken down
# by economic sub-sector).

# Step 1 : Dummy variance wrapper
# Note : see define_variance_wrapper() for a more 
# realistic variance function and examples.
variance_wrapper <- define_variance_wrapper(
  variance_function = function(y) abs(colSums(y)), 
  reference_id = ict_survey$firm_id, 
  default = list(id = "firm_id", weight = "w_calib")
)
variance_wrapper(ict_survey, total(speed_quanti))

# Step 2 : Redefine the mean linearization wrapper
# The mean() linearization wrapper defined in the gustave 
# package is built on top of the ratio() linearization wrapper.
variance_wrapper(ict_survey, mean(speed_quanti))

# Let's redefine it directly from the formula found for instance
# in (Caron, Deville, Sautory, 1998) and without handling NA
# values
mean2 <- define_linearization_wrapper(
  linearization_function = function(y, weight){
    est <- sum(y * weight) / sum(weight)
    lin <- (y - est) / sum(weight)
    list(
      lin = list(lin), 
      metadata = list(est = est, n = length(y))
    )
  },
  arg_type = list(data = "y", weight = "weight"),
  allow_factor = TRUE
)
variance_wrapper(ict_survey, mean(speed_quanti), mean2(speed_quanti))

# }
