varpoord: Estimation of the variance and deff for sample surveys for indicators on social exclusion and poverty

Description

Computes variance estimates and design effects (deff) for indicators on social exclusion and poverty in sample surveys.

Usage

varpoord(inc, w_final, income_thres = NULL, wght_thres = NULL,
         ID_household, id = NULL, H, PSU, N_h, fh_zero = FALSE,
         PSU_level = TRUE, sort = NULL, Dom = NULL, period = NULL,
         gender = NULL, dataset = NULL, X = NULL, periodX = NULL,
         X_ID_household = NULL, ind_gr = NULL, g = NULL, datasetX = NULL,
         q, percentage = 60, order_quant = 50, alpha = 20,
         confidence = 0.95, outp_lin = FALSE, outp_res = FALSE,
         na.rm = FALSE, several.ok = FALSE, type = "lin_rmpg")

Arguments

inc

either a numeric vector, 1 column data.frame, matrix, data.table giving the equivalized disposable income, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset

w_final

optional; either a numeric vector, 1 column data.frame, matrix, data.table giving the personal sample weights, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'da

income_thres

either a numeric vector, 1 column data.frame, matrix, data.table giving the equivalized disposable income for computation and linearization of the poverty threshold, or (if dataset is not NULL) a character string, an

wght_thres

either a numeric vector, 1 column data.frame, matrix, data.table giving the personal sample weights for computation and linearization of the poverty threshold, or (if dataset is not NULL) a character string, an intege

ID_household

either 1 column data.frame, matrix, data.table with column names giving the household IDs, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column count)

id

optional; either 1 column data.frame, matrix, data.table with column names giving the personal IDs, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' colum

H

either 1 column data.frame, matrix, data.table with column name giving elements indicating the unit stratum, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset

PSU

either 1 column data.frame, matrix, data.table giving primary sampling unit, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column count) specifying the co

N_h

a matrix whose first column gives the stratum and whose second column gives the population total in that stratum.

fh_zero

by default FALSE, in which case fh is calculated as n_h divided by N_h in each stratum; if TRUE, fh is set to zero in each stratum.

PSU_level

by default TRUE; if PSU_level is true, in each strata fh is calculated as division of count of PSU in sample (n_h) and count of PSU in frame(N_h). if PSU_level is false, in each strata fh is calculated as division of count of units in sample (n_
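The fh calculation described for fh_zero and PSU_level can be illustrated with a small base R sketch. The toy sample, the column names and the frame totals below are made up for illustration, and the code is not the package's internal implementation; it only shows the PSU-level case, where n_h is the number of sampled PSUs in a stratum and N_h the number of PSUs in the frame.

```r
# Hypothetical toy sample: stratum label and PSU identifier for each unit
smp <- data.frame(
  H   = c("A", "A", "A", "B", "B", "B", "B"),
  PSU = c(1, 1, 2, 3, 4, 4, 5)
)

# N_h in the two-column layout: stratum, then population total.
# A data.frame is used here so the stratum can stay a character column;
# the totals are assumed counts of PSUs in the frame.
N_h <- data.frame(H = c("A", "B"), N_h = c(10, 20))

# n_h: number of distinct PSUs sampled in each stratum
n_h <- tapply(smp$PSU, smp$H, function(p) length(unique(p)))

# fh per stratum: n_h divided by N_h, matched by stratum label
fh <- n_h[as.character(N_h$H)] / N_h$N_h

# Setting fh_zero to TRUE would force fh to zero in every stratum
fh_zero <- FALSE
if (fh_zero) fh[] <- 0

fh
```

Stratum A has 2 sampled PSUs out of 10 and stratum B has 3 out of 20, so fh is 0.2 and 0.15 respectively.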

sort

optional; either a numeric vector, 1 column data.frame, matrix, data.table giving the personal IDs to be used as tie-breakers for sorting, or (if dataset is not NULL) a character string, an integer or a logical vector

Dom

optional; either a data.frame, matrix, data.table with column names giving different domains, or (if dataset is not NULL) character strings, integers or a logical vectors (length is the same as 'dataset' column count) sp

period

optional; either a data.frame, matrix, data.table with column names giving different periods, or (if dataset is not NULL) character strings, integers or a logical vectors (length is the same as 'dataset' column coun

gender

either a factor giving the gender, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column count) specifying the corresponding column of dataset

dataset

optional; name of the individual dataset data.frame.
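Many arguments accept either the data itself or, when dataset is supplied, a character string, an integer or a logical vector selecting a column of dataset. The sketch below shows one way such a selector could be resolved in base R; resolve_column is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: returns the data itself when no dataset is given,
# otherwise the column of `dataset` selected by name, position or logical mask
resolve_column <- function(arg, dataset = NULL) {
  if (is.null(dataset)) return(arg)
  dataset[, arg, drop = TRUE]
}

# Made-up individual dataset with an income and a weight column
toy <- data.frame(eqIncome = c(100, 200, 300), rb050 = c(1.5, 2.0, 2.5))

by_name     <- resolve_column("eqIncome", toy)        # character string
by_position <- resolve_column(1L, toy)                # integer
by_mask     <- resolve_column(c(TRUE, FALSE), toy)    # logical, one entry per column
direct      <- resolve_column(c(100, 200, 300))       # data passed directly
```

All four calls yield the same numeric vector, which is why the logical form must have as many entries as dataset has columns.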

X

optional; either a data.frame, matrix, data.table giving auxiliary variables, or (if datasetX is not NULL) character strings, integers or a logical vectors (length is the same as 'dataset' column count) specifying the co

periodX

optional; either a data.frame, matrix, data.table with column names giving different periods for data X, or (if datasetX is not NULL) character strings, integers or a logical vectors (length is the same as 'dataset'

X_ID_household

either 1 column data.frame, matrix, data.table with column name giving the household IDs for auxiliary variables, or (if datasetX is not NULL) a character string, an integer or a logical vector (length is the same as

ind_gr

optional; either a vector, 1 column data.frame, matrix, data.table giving the variable by which divided independently auxiliary variables, or (if datasetX is not NULL) a character string, an integer or a logical vector

g

optional; either a numeric vector, 1 column data.frame, matrix, data.table giving the g weights, or (if datasetX is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column co

datasetX

optional; name of the individual dataset data.frame.

q

optional; either a numeric vector, 1 column data.frame, matrix, data.table giving the positive values accounting for heteroscedasticity, or (if datasetX is not NULL) a character string, an integer or a logical vector (l

percentage

a numeric value in $[0,100]$ giving the percentage of the income quantile to be used for the at-risk-of-poverty threshold (see linarpt).

order_quant

a numeric value in $[0,100]$ giving the order of the income quantile (in percentage) to be used for the at-risk-of-poverty threshold (see linarpt).
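With the defaults percentage = 60 and order_quant = 50, the at-risk-of-poverty threshold is 60% of the weighted median income. The following base R sketch illustrates that arithmetic on made-up data. The helper weighted_quantile is hypothetical and uses a simple step-function definition of the weighted quantile; linarpt may use a different quantile definition, so the numbers are illustrative only.

```r
# Made-up incomes and personal weights
inc <- c(8, 12, 15, 20, 30, 50)
w   <- c(1, 2, 1, 2, 1, 1)

percentage  <- 60   # share of the quantile used as threshold
order_quant <- 50   # order of the quantile, in percent (50 = median)

# Hypothetical helper: smallest income whose cumulative weight share
# reaches `prob` (no interpolation)
weighted_quantile <- function(x, w, prob) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= prob)[1]]
}

q_val     <- weighted_quantile(inc, w, order_quant / 100)
threshold <- percentage / 100 * q_val
```

Here the cumulative weight share first reaches one half at an income of 15, so the threshold is 60% of 15.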

alpha

a numeric value in $[0,100]$ giving the order of the income quantile share ratio (in percentage).

confidence

optional; a positive value giving the confidence level of the confidence interval. The default is 0.95.

outp_lin

logical. If TRUE, the linearized values will be printed out.

outp_res

logical. If TRUE, the estimated residuals of calibration will be printed out.

na.rm

a logical indicating whether missing values should be removed.

several.ok

logical specifying whether type is allowed to have more than one element.

type

a character vector (of length one unless several.ok is TRUE), for example "linarpr", "linarpt", "lingpg", "linpoormed", "linrmpg", "lingini", "lingini2", "linqsr", "all_choises".
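The interplay of type and several.ok resembles base R's match.arg, which takes a several.ok argument of its own. The sketch below assumes that kind of validation; check_type is a hypothetical helper written for illustration, and whether varpoord validates type in exactly this way is an assumption.

```r
# Hypothetical helper mimicking the described behaviour with base R's match.arg
check_type <- function(type, several.ok = FALSE) {
  choices <- c("linarpr", "linarpt", "lingpg", "linpoormed",
               "linrmpg", "lingini", "lingini2", "linqsr", "all_choises")
  match.arg(type, choices, several.ok = several.ok)
}

# A single indicator is accepted with the default several.ok = FALSE
one <- check_type("lingini")

# Several indicators are accepted only when several.ok = TRUE
many <- check_type(c("linarpr", "linqsr"), several.ok = TRUE)

# The same vector with several.ok = FALSE raises an error
res <- try(check_type(c("linarpr", "linqsr")), silent = TRUE)
failed <- inherits(res, "try-error")
```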

Value

The function returns a list with the following components:

estim

a data.frame containing the estimation(s) by domains or (if Dom is NULL) totals.

var

a matrix containing the values of the variance estimation by domains or (if Dom is NULL) totals.

se

a matrix containing the values of the standard error by domains or (if Dom is NULL) totals.

rse

a data.frame containing the values of the relative standard error (coefficient of variation) by domains or (if Dom is NULL) totals in percentage.

cv

a data.frame containing the values of the relative standard error (coefficient of variation) by domains or (if Dom is NULL) totals.

absolute_margin_of_error

a matrix containing the values of the absolute margin of error by domains or (if Dom is NULL) totals.

relative_margin_of_error

a matrix containing the values of the relative margin of error by domains or (if Dom is NULL) totals.

CI_lower

a data.frame containing the values of the confidence interval lower bound by domains or (if Dom is NULL) totals.

CI_upper

a data.frame containing the values of the confidence interval upper bound by domains or (if Dom is NULL) totals.

var_srs_HT

a matrix containing the values of the variance estimation of the HT estimator under SRS by domains or (if Dom is NULL) totals.

var_cur_HT

a matrix containing the values of the variance estimation of the HT estimator under the current design by domains or (if Dom is NULL) totals.

var_srs_ca

a matrix containing the values of the variance estimation of the calibrated estimator under SRS by domains or (if Dom is NULL) totals.

deff_sam

a matrix containing the values of the estimation of the design effect of the sample design by domains or (if Dom is NULL) totals.

deff_est

a matrix containing the values of the estimation of the design effect of the estimator by domains or (if Dom is NULL) totals.

deff

a matrix containing the values of the estimation of the overall design effect of sample design and estimator by domains or (if Dom is NULL) totals.

lin_out

a data.table containing the linearized values with ID_household and id.

res_out

a data.table containing the estimated residuals of calibration with id and PSU.

all_result

a data.frame containing all previously defined values together by domains or (if Dom is NULL) totals.
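Several of the returned quantities are related by simple arithmetic. The sketch below shows the usual textbook relations on made-up numbers, assuming a normal approximation for the confidence interval and the standard definition of a design effect as a ratio of variances. The exact formulas used inside varpoord are not stated on this page, so treat this as an illustration rather than a description of the implementation.

```r
# Made-up point estimate and variance estimates
estim      <- 25
var_est    <- 4
var_cur_HT <- 5   # HT estimator, current design (assumed value)
var_srs_HT <- 4   # HT estimator, simple random sampling (assumed value)
confidence <- 0.95

# Standard error and relative standard error (as a proportion)
se  <- sqrt(var_est)
rse <- se / estim

# Normal-approximation margin of error and confidence bounds
z <- qnorm(0.5 + confidence / 2)
absolute_margin_of_error <- z * se
CI_lower <- estim - absolute_margin_of_error
CI_upper <- estim + absolute_margin_of_error

# Design effect of the sample design as a ratio of variances (assumed definition)
deff_sam <- var_cur_HT / var_srs_HT
```

A deff_sam above 1, as in this toy case, would indicate that the sample design is less efficient than simple random sampling for the HT estimator.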

References

Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL http://www.cros-portal.eu/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013

Working group on Statistics on Income and Living Conditions (2004). Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.

Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.

Matti Langel, Yves Tille (2011). Corrado Gini, a pioneer in balanced sampling and inequality theory. METRON - International Journal of Statistics, vol. LXIX, n. 1, pp. 45-65, URL ftp://metron.sta.uniroma1.it/RePEc/articoli/2011-1-3.pdf.

Deville, J. C. (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL http://www5.statcan.gc.ca/bsolc/olc-cel/olc-cel?lang=eng&catno=12-001-X19990024882.

Examples

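# Assumes the vardpoor and laeken packages are installed;
# eusilc is the synthetic EU-SILC data set shipped with laeken
library("laeken")
library("vardpoor")

data(eusilc)

# Prepend a running row identifier as the first column
dataset <- data.frame(1:nrow(eusilc), eusilc)
colnames(dataset)[1] <- "IDd"

# Gini coefficient by region (db040), with linearized values and
# calibration residuals returned in the output
aa <- varpoord("eqIncome", "rb050", income_thres = NULL,
               wght_thres = NULL, ID_household = "db030",
               id = NULL, H = "db040",
               PSU = "rb030", N_h = NULL, sort = NULL,
               Dom = "db040", gender = NULL, X = NULL,
               X_ID_household = NULL,
               g = NULL,
               datasetX = NULL,
               # one heteroscedasticity factor per row; the original expression
               # referred to datasetX and H, which do not exist at the call site
               q = rep(1, nrow(dataset)),
               dataset = dataset, percentage = 60, order_quant = 50,
               alpha = 20, confidence = .95, outp_lin = TRUE,
               outp_res = TRUE, na.rm = FALSE,
               several.ok = FALSE, type = "lingini")

# Rows 20 to 40 of the linearized values and of the residuals
aa$lin_out[20:40]
aa$res_out[20:40]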