vardpoor (version 0.3.4)

varpoord: Estimation of the variance and deff for sample surveys for indicators on social exclusion and poverty

Description

Computes variance estimates for indicators on social exclusion and poverty.

Usage

varpoord(Y, w_final, age = NULL, pl085 = NULL, month_at_work = NULL,
         Y_den = NULL, Y_thres = NULL, wght_thres = NULL,
         ID_household, id = NULL, H, PSU, N_h, fh_zero = FALSE,
         PSU_level = TRUE, sort = NULL, Dom = NULL, period = NULL,
         gender = NULL, dataset = NULL, X = NULL, periodX = NULL,
         X_ID_household = NULL, ind_gr = NULL, g = NULL, datasetX = NULL,
         q, percentage = 60, order_quant = 50, alpha = 20,
         confidence = 0.95, outp_lin = FALSE, outp_res = FALSE,
         several.ok = FALSE, type = "linrmpg")

Arguments

Y
Study variable (for example equalized disposable income or gross pension income). One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value
w_final
Weight variable. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the column count of
age
Age variable. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the column count of
pl085
Retirement variable (Number of months spent in retirement or early retirement). One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (
month_at_work
Variable for total number of months at work (sum of the number of months spent at full-time work as employee, number of months spent at part-time work as employee, number of months spent at full-time work as self-employed (including family worker), number
Y_den
Denominator variable (for example gross individual earnings). One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vect
Y_thres
Variable (for example equalized disposable income) used for computation and linearization of poverty threshold. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with on
wght_thres
Weight variable used for computation and linearization of poverty threshold. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (len
ID_household
Variable for household ID codes. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the
id
Optional variable for unit ID codes. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as
H
The unit stratum variable. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the column
PSU
Primary sampling unit variable. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the c
N_h
Optional matrix whose first column gives the stratum and whose second column gives the population total in that stratum.
fh_zero
Logical value, FALSE by default. If FALSE, fh is calculated as n_h divided by N_h in each stratum; if TRUE, fh is set to zero in each stratum.
PSU_level
by default TRUE; if PSU_level is true, in each strata fh is calculated as division of count of PSU in sample (n_h) and count of PSU in frame(N_h). if PSU_level is false, in each strata fh is calculated as division of count of units in sample (n_h) and cou
sort
Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector ha
Dom
Optional variables used to define population domains. If supplied, estimates are calculated for each domain. An object convertible to data.frame or variable names as character vector, column numbers or logical vector (length of the vector has
period
Optional variable for survey period. If supplied, estimates are calculated for each time period. Object convertible to data.frame or variable names as character, column numbers or logical vector (length of the vector has to be the same as the
gender
Numerical variable for gender, where 1 denotes males and 2 denotes females. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (lengt
dataset
Optional survey data object convertible to data.frame.
X
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.frame or variable names as character, column numbers or logical vector (length of the vector has to be the same as the column count of
periodX
Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.frame or variable names as character, column numbers or logical vector (length of
X_ID_household
Variable for household ID codes. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the
ind_gr
Optional variable by which the auxiliary variables are divided into independent groups. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of
g
Optional variable of the g weights. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as t
datasetX
Optional survey data object in household level convertible to data.frame.
q
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the
percentage
A numeric value in range $[0,100]$ for $p$ in the formula for poverty threshold computation: $$\frac{p}{100} \cdot Z_{\frac{\alpha}{100}}.$$ For example, to compute poverty threshold equal to 60% of some income quantile, $p$ should be set equal to 60.
order_quant
A numeric value in range $[0,100]$ for $\alpha$ in the formula for poverty threshold computation: $$\frac{p}{100} \cdot Z_{\frac{\alpha}{100}}.$$ For example, to compute poverty threshold equal to some percentage of median income, $\alpha$ should be set
alpha
A numeric value in range $[0,100]$ for the order of the income quantile share ratio (in percentage).
confidence
Optional positive value giving the confidence level of the confidence interval. The default is 0.95.
outp_lin
Logical value. If TRUE, the linearized values of the ratio estimator are printed out.
outp_res
Logical value. If TRUE, the estimated residuals of calibration are printed out.
several.ok
Logical value specifying whether type may have more than one element.
type
A character vector (of length one unless several.ok is TRUE), for example "linarpr", "linarpt", "lingpg", "linpoormed", "linrmpg", "lingini", "lingini2", "linqsr", "linarr", "linrmi", "all_choices".
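
The poverty threshold formula used by percentage and order_quant can be illustrated with a minimal base R sketch. The helper weighted_quantile, the incomes and the weights below are hypothetical and only serve the illustration; the quantile definition used internally by varpoord may differ (for example in how ties or interpolation are handled).

```r
# Hypothetical helper (not part of vardpoor): weighted quantile defined as the
# smallest value whose cumulative weight share reaches prob.
weighted_quantile <- function(x, w, prob) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= prob)[1]]
}

income <- c(8000, 12000, 15000, 20000, 40000)  # made-up incomes
weight <- c(1, 1, 1, 1, 1)                     # made-up survey weights

percentage  <- 60  # p in the formula
order_quant <- 50  # alpha in the formula (50 = median)

# Z_{alpha/100}: the weighted income quantile of order alpha/100
z <- weighted_quantile(income, weight, order_quant / 100)

# Poverty threshold: p/100 * Z_{alpha/100}
threshold <- percentage / 100 * z
threshold
```

With the default values percentage = 60 and order_quant = 50, the threshold is 60% of the weighted median of the income variable.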

Value

A list with the following objects is returned by the function:

  • lin_out - A data.table containing the linearized values of the ratio estimator with id and PSU.
  • res_out - A data.table containing the estimated residuals of calibration with id and PSU.
  • all_result - A data.table containing the following variables:
      respondent_count - the count of respondents,
      pop_size - the estimated size of population,
      n_nonzero - the count of respondents whose answers are larger than zero,
      value - the estimated value,
      var - the estimated variance,
      se - the estimated standard error,
      rse - the estimated relative standard error (coefficient of variation),
      cv - the estimated relative standard error (coefficient of variation) in percentage,
      absolute_margin_of_error - the estimated absolute margin of error,
      relative_margin_of_error - the estimated relative margin of error,
      CI_lower - the estimated confidence interval lower bound,
      CI_upper - the estimated confidence interval upper bound,
      var_srs_HT - the estimated variance of the HT estimator under SRS,
      var_cur_HT - the estimated variance of the HT estimator under current design,
      var_srs_ca - the estimated variance of the calibrated estimator under SRS,
      deff_sam - the estimated design effect of sample design,
      deff_est - the estimated design effect of estimator,
      deff - the overall estimated design effect of sample design and estimator,
      n_eff - the effective sample size.
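
The summary columns of all_result are related to each other through standard survey-sampling definitions. The sketch below uses made-up numbers and textbook formulas (normal-approximation confidence interval, design effect as a variance ratio); it is an assumption that these match the package's internal computations exactly, so treat it as an aid for reading the output rather than a description of the implementation.

```r
# Made-up inputs, for illustration only
value      <- 0.15     # estimated indicator value
var_est    <- 0.0004   # estimated variance under the current design
var_srs    <- 0.00025  # estimated variance under simple random sampling
n          <- 1000     # respondent count
confidence <- 0.95

se  <- sqrt(var_est)   # standard error
rse <- se / value      # relative standard error
cv  <- 100 * rse       # relative standard error in percentage

# Normal-approximation confidence interval
z_crit   <- qnorm(0.5 * (1 + confidence))
abs_moe  <- z_crit * se          # absolute margin of error
CI_lower <- value - abs_moe
CI_upper <- value + abs_moe

# Textbook design effect and effective sample size
deff  <- var_est / var_srs
n_eff <- n / deff

c(se = se, cv = cv, CI_lower = CI_lower, CI_upper = CI_upper,
  deff = deff, n_eff = n_eff)
```

A design effect above 1 means the current design and estimator are less precise than simple random sampling of the same size, which is why the effective sample size here is smaller than the respondent count.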

References

Eric Graf and Yves Tille, Variance Estimation Using Linearization for Poverty and Social Exclusion Indicators, Survey Methodology, June 2014, Vol. 40, No. 1, pp. 61-79, Statistics Canada, Catalogue no. 12-001-X, URL http://www.statcan.gc.ca/pub/12-001-x/12-001-x2014001-eng.pdf

Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012

Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL http://ojs.ub.uni-konstanz.de/srm/article/view/369.

Eurostat Methodologies and Working papers, Standard error estimation for the EU-SILC indicators of poverty and social exclusion, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL http://www5.statcan.gc.ca/bsolc/olc-cel/olc-cel?lang=eng&catno=12-001-X19990024882.

Matti Langel and Yves Tille, Corrado Gini, a pioneer in balanced sampling and inequality theory. METRON - International Journal of Statistics, 2011, vol. LXIX, n. 1, pp. 45-65, URL ftp://metron.sta.uniroma1.it/RePEc/articoli/2011-1-3.pdf.

Morris H. Hansen, William N. Hurwitz, William G. Madow (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.

Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL http://www.cros-portal.eu/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013

Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.

See Also

vardom, vardomh, linarpt

Examples

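# Load the package and the example data
library(vardpoor)
data(eusilc)

# Add a unit identifier column (IDd) and take a 1000-row subset
dataset <- data.frame(IDd = 1:nrow(eusilc), eusilc)
dataset1 <- dataset[1:1000, ]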

# Variance estimation for type = "linarpt" by domain db040, on the subset
aa <- varpoord(Y = "eqIncome", w_final = "rb050",
               Y_thres = NULL, wght_thres = NULL,
               ID_household = "db030", id = "IDd",
               H = "db040", PSU = "rb030", N_h = NULL,
               sort = NULL, Dom = "db040",
               gender = NULL, X = NULL,
               X_ID_household = NULL, g = NULL,
               datasetX = NULL,
               q = rep(1, if (is.null(datasetX))
                          nrow(as.data.frame(H)) else nrow(datasetX)),
               dataset = dataset1, percentage = 60, order_quant = 50,
               alpha = 20, confidence = .95, outp_lin = FALSE,
               outp_res = FALSE, several.ok = FALSE, type = "linarpt")
aa


# All indicator types on the full data set, also returning the
# linearized values (outp_lin) and the calibration residuals (outp_res)
aa <- varpoord(Y = "eqIncome", w_final = "rb050",
               Y_thres = NULL, wght_thres = NULL,
               ID_household = "db030", id = "IDd",
               H = "db040", PSU = "rb030", N_h = NULL,
               sort = NULL, Dom = "db040",
               gender = NULL, X = NULL,
               X_ID_household = NULL, g = NULL,
               datasetX = NULL,
               q = rep(1, if (is.null(datasetX))
                          nrow(as.data.frame(H)) else nrow(datasetX)),
               dataset = dataset, percentage = 60, order_quant = 50,
               alpha = 20, confidence = .95, outp_lin = TRUE,
               outp_res = TRUE, several.ok = FALSE, type = "all_choices")

# Rows 20 to 40 of the linearized values and of the residuals
aa$lin_out[20:40]
aa$res_out[20:40]