vardomh: Variance estimation for sample surveys in domain for one or two stage surveys by the ultimate cluster method

Description

Computes variance estimates by domain, with results reported at the ID_level1 level.

Usage

vardomh(
  Y,
  H,
  PSU,
  w_final,
  ID_level1,
  ID_level2,
  Dom = NULL,
  period = NULL,
  N_h = NULL,
  PSU_sort = NULL,
  fh_zero = FALSE,
  PSU_level = TRUE,
  Z = NULL,
  dataset = NULL,
  X = NULL,
  periodX = NULL,
  X_ID_level1 = NULL,
  ind_gr = NULL,
  g = NULL,
  q = NULL,
  datasetX = NULL,
  confidence = 0.95,
  percentratio = 1,
  outp_lin = FALSE,
  outp_res = FALSE
)

Value

The function returns a list with the following objects:

lin_out A data.table containing the linearized values of the ratio estimator with ID_level2 and PSU.
res_out A data.table containing the estimated residuals of calibration with ID_level1 and PSU.
betas A numeric data.table containing the estimated coefficients of calibration.
all_result A data.table containing the following variables: variable - names of variables of interest,
Dom - optional variable of the population domains,
period - optional variable of the survey periods,
respondent_count - the count of respondents,
pop_size - the estimated size of population,
n_nonzero - the count of respondents whose answers are larger than zero,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error in percentage,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the positive value for confidence interval,
S2_y_HT - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using non-calibrated weights,
S2_y_ca - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using calibrated weights,
S2_res - the estimated variance of the regression residuals,
var_srs_HT - the estimated variance of the HT estimator under SRS for household,
var_cur_HT - the estimated variance of the HT estimator under current design for household,
var_srs_ca - the estimated variance of the calibrated estimator under SRS for household,
deff_sam - the estimated design effect of sample design for household,
deff_est - the estimated design effect of estimator for household,
deff - the overall estimated design effect of sample design and estimator for household
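
As a rough illustration of how the precision columns in all_result relate to each other, the sketch below derives them from a point estimate and its variance. The input numbers are hypothetical, and the use of a two-sided normal quantile for the margins of error is an assumption about the usual convention, not a statement of the package internals.

```r
# Hypothetical inputs: a point estimate and its estimated variance
estim <- 2500
var_est <- 10000
confidence <- 0.95

se  <- sqrt(var_est)   # standard error
rse <- se / estim      # relative standard error as a proportion
cv  <- 100 * rse       # the same quantity in percent

# Two-sided normal quantile for the chosen confidence level (assumed convention)
z <- qnorm(0.5 * (1 + confidence))

absolute_margin_of_error <- z * se
relative_margin_of_error <- z * cv
CI_lower <- estim - absolute_margin_of_error
CI_upper <- estim + absolute_margin_of_error

round(c(se = se, rse = rse, cv = cv,
        CI_lower = CI_lower, CI_upper = CI_upper), 3)
```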

Arguments

Y: Variables of interest. Object convertible to data.table or variable names as character, column numbers.
H: The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number.
PSU: Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number.
w_final: Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number.
ID_level1: Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.
ID_level2: Variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.
Dom: Optional variables used to define population domains. If supplied, values are calculated for each domain. An object convertible to data.table or variable names as character vector, column numbers.
period: Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to data.table or variable names as character, column numbers.
N_h: Number of primary sampling units in the population for each stratum (and period, if period is not NULL). If N_h = NULL and fh_zero = FALSE (default), N_h is estimated from the sample data as the sum of weights (w_final) in each stratum (and period, if period is not NULL). Optional for a single-stage sampling design, as it will be estimated from the sample data. Recommended for a multi-stage sampling design, as N_h cannot be correctly estimated from the sample data in this case. If N_h is not supplied for a multi-stage sampling design (for example, because this information is not available), it is advisable to set fh_zero = TRUE. If period is NULL: a two-column data object convertible to data.table with one row per stratum; the first column should contain the stratum code and the second the number of primary sampling units in the population of each stratum. If period is not NULL: a three-column data object convertible to data.table with one row per combination of stratum and period; the first column should contain the period, the second the stratum code, and the third the number of primary sampling units in the population of each stratum and period.
PSU_sort: optional; if PSU_sort is defined, the variance is calculated for a systematic sample.
fh_zero: by default FALSE; fh is calculated as n_h divided by N_h in each stratum. If TRUE, fh is set to zero in each stratum.
PSU_level: by default TRUE; if PSU_level is TRUE, fh in each stratum is calculated as the count of PSUs in the sample (n_h) divided by the count of PSUs in the frame (N_h). If PSU_level is FALSE, fh in each stratum is calculated as the count of units in the sample (n_h) divided by the count of units in the frame (N_h), which is calculated as the sum of weights.
Z: Optional variables of denominator for ratio estimation. Object convertible to data.table or variable names as character, column numbers or logical vector (length of the vector has to be the same as the column count of dataset).
dataset: Optional survey data object convertible to data.table.
X: Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.table or variable names as character, column numbers.
periodX: Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers.
X_ID_level1: Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.
ind_gr: Optional variable by which the X matrix of the auxiliary variables for the calibration is divided into independent groups. One dimensional object convertible to one-column data.table or variable name as character, column number.
g: Optional variable of the g weights. One dimensional object convertible to one-column data.table or variable name as character, column number.
q: Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number.
datasetX: Optional survey data object in level1 convertible to data.table.
confidence: Optional positive value for the confidence interval. The default is 0.95.
percentratio: Positive numeric value. All linearized variables are multiplied with percentratio value, by default - 1.
outp_lin: Logical value. If TRUE linearized values of the ratio estimator will be printed out.
outp_res: Logical value. If TRUE estimated residuals of calibration will be printed out.
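
The interaction between N_h and fh_zero described in the arguments can be sketched in base R. This is a minimal illustration under stated assumptions: the data frame smp and its values are hypothetical, the design is single-stage (each unit is its own PSU, so the sum of weights is a sensible estimate of N_h), and the code is not the package's own implementation.

```r
# Hypothetical single-stage sample: stratum, PSU (one unit per PSU), final weight
smp <- data.frame(
  H       = c("A", "A", "A", "B", "B"),
  PSU     = 1:5,
  w_final = c(10, 10, 20, 25, 25)
)

# n_h: number of distinct sampled PSUs in each stratum
n_h <- tapply(smp$PSU, smp$H, function(p) length(unique(p)))

# N_h estimated as the sum of weights per stratum (the N_h = NULL case)
N_h_est <- tapply(smp$w_final, smp$H, sum)

# Sampling fraction per stratum; fh_zero = TRUE would force it to zero instead
fh_zero <- FALSE
f_h <- if (fh_zero) 0 * n_h else n_h / N_h_est
f_h
```

For a multi-stage design the sum of unit weights estimates the number of units rather than the number of PSUs, which is why supplying N_h or setting fh_zero = TRUE is advised in that case.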

Details

Calculates variance estimates in domains for household surveys, based on the book by Hansen, Hurwitz and Madow (1953).
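
To make the ultimate cluster idea concrete, the following base R sketch computes the textbook stratified estimator for a total: within each stratum, the weighted PSU totals are treated as the observations, and their between-PSU variability is scaled by n_h / (n_h - 1) and by (1 - f_h). The data frame smp is hypothetical, f_h is set to zero for simplicity (as with fh_zero = TRUE), and the code illustrates the general method rather than the exact computation inside vardomh.

```r
# Hypothetical two-stage sample: stratum, PSU, weight and study variable
smp <- data.frame(
  H   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  PSU = c(1, 1, 2, 3, 4, 4, 5, 6),
  w   = c(10, 10, 12, 8, 20, 20, 15, 15),
  y   = c(3, 5, 4, 6, 2, 1, 7, 3)
)

# Weighted totals of y within each PSU (the "ultimate clusters")
smp$wy <- smp$w * smp$y
psu_tot <- aggregate(wy ~ H + PSU, data = smp, FUN = sum)

# Per stratum: n_h / (n_h - 1) * sum((t_hi - mean(t_h))^2), i.e. n_h * var(t_h)
var_h <- tapply(psu_tot$wy, psu_tot$H, function(t) length(t) * var(t))

# Sampling fraction assumed to be zero here (no finite population correction)
f_h <- 0
var_total <- sum((1 - f_h) * var_h)
var_total
```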

References

Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Guillaume Osier, Emilio Di Meglio (2012). The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards?
Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL http://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.

Examples

library("data.table")
library("laeken")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
aa <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
             w_final = "rb050", ID_level1 = "db030",
             ID_level2 = "rb030", Dom = "db040", period = NULL,
             N_h = NULL, Z = NULL, dataset = dataset1, X = NULL,
             X_ID_level1 = NULL, g = NULL, q = NULL, 
             datasetX = NULL, confidence = 0.95, percentratio = 1,
             outp_lin = TRUE, outp_res = TRUE)

if (FALSE) {
dataset2 <- copy(dataset1)
dataset1$period <- 1
dataset2$period <- 2
dataset1 <- data.table(rbind(dataset1, dataset2))

# default: fh_zero = FALSE, so the finite population correction is applied
aa2 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030",
               ID_level2 = "rb030", Dom = "db040", period = "period",
               N_h = NULL, Z = NULL, dataset = dataset1,
               X = NULL, X_ID_level1 = NULL,  
               g = NULL, q = NULL, datasetX = NULL,
               confidence = .95, percentratio = 1,
               outp_lin = TRUE, outp_res = TRUE)
aa2

# fh_zero = FALSE set explicitly: the finite population correction is applied
aa3 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030", 
               ID_level2 = "rb030", Dom = "db040",
               period = "period", N_h = NULL, fh_zero = FALSE, 
               Z = NULL, dataset = dataset1, X = NULL,
               X_ID_level1 = NULL, g = NULL, q = NULL,
               datasetX = NULL, confidence = .95,
               percentratio = 1, outp_lin = TRUE,
               outp_res = TRUE)
aa3

# fh_zero = TRUE: fh is set to zero, so no finite population correction is applied
aa4 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030",
               ID_level2 = "rb030", Dom = "db040",
               period = "period", N_h = NULL, fh_zero = TRUE, 
               Z = NULL, dataset = dataset1,
               X = NULL, X_ID_level1 = NULL, 
               g = NULL, q = NULL, datasetX = NULL,
               confidence = .95, percentratio = 1,
               outp_lin = TRUE, outp_res = TRUE)
aa4
}
