
vardpoor (version 0.2.0.9.2)

vardcros: Variance estimation for cross-sectional and longitudinal measures for any stage cluster sampling designs

Description

Computes variance estimates for cross-sectional and longitudinal measures under cluster sampling designs with any number of stages.
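As a rough illustration of what variance estimation under a stratified cluster design involves, the sketch below applies the "ultimate cluster" estimator to a toy data set in base R: weighted values are summed to PSU totals, and the between-PSU variability within each stratum gives the variance of the estimated total. This is an assumption made for illustration; the source does not state that vardcros uses exactly this formula, and the data and names (toy, psu_totals, stratum_var) are hypothetical.

```r
# Toy data: two strata, two PSUs per stratum (all values are made up).
toy <- data.frame(
  strata = c("A", "A", "A", "A", "B", "B", "B", "B"),
  psu    = c(1, 1, 2, 2, 3, 3, 4, 4),
  w      = c(2, 2, 3, 3, 1, 1, 4, 4),   # final weights
  y      = c(1, 0, 1, 1, 0, 1, 1, 0)    # variable of interest
)

# Weighted values, aggregated to one total per PSU within each stratum.
toy$wy <- toy$w * toy$y
psu_totals <- aggregate(wy ~ strata + psu, data = toy, FUN = sum)

# Ultimate cluster variance contribution of one stratum:
# n / (n - 1) * sum of squared deviations of the PSU totals.
stratum_var <- function(t) {
  n <- length(t)
  n / (n - 1) * sum((t - mean(t))^2)
}

est_total <- sum(toy$wy)
var_total <- sum(tapply(psu_totals$wy, psu_totals$strata, stratum_var))
se_total  <- sqrt(var_total)
```

The sketch needs at least two PSUs per stratum; with a single PSU in a stratum the factor n / (n - 1) is undefined.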

Usage

vardcros(Y, H, PSU, w_final, id, Dom = NULL,
         Z = NULL, country, period,
         dataset = NULL, meanY=TRUE,
         withperiod=TRUE, netchanges=TRUE,
         confidence = .95)

Arguments

Y
Variables of interest. Object convertible to data.frame or variable names as character, column numbers or logical vector with only one TRUE value (length of the vector has to be the same as the column count of dataset).
H
The unit stratum variable. One-dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the column
PSU
Primary sampling unit variable. One-dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the c
w_final
Weight variable. One-dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRUE value (length of the vector has to be the same as the column count of
id
Optional; either a one-column data.frame, matrix or data.table with column names giving the IDs, or (if dataset is not NULL) a character string, an integer or a logical vector (length is the same as 'dataset' column count) specifying t
Dom
Optional variables used to define population domains. If supplied, the variables are calculated for each domain. An object convertible to data.frame or variable names as character vector, column numbers or logical vector (length of the vector has
Z
Optional denominator variables for ratio estimation. If supplied, the ratio estimation is computed. Object convertible to data.frame or variable names as character, column numbers or logical vector (length of the vector has to be the same
country
Optional; either a data.frame, matrix or data.table with column names giving different countries, or (if dataset is not NULL) character strings, integers or logical vectors (length is the same as 'dataset' column count) specifyin
period
Optional variable for survey period. If supplied, the variables are calculated for each time period. One-dimensional object convertible to one-column data.frame or variable name as character, column number or logical vector with only one TRU
dataset
Optional survey data object convertible to data.frame.
meanY
Logical value. If TRUE, the mean of the variables of interest is calculated.
withperiod
Logical value. If TRUE, the results are reported by period; if FALSE, without period.
netchanges
Logical value. If TRUE, two objects are produced: the first is the aggregation of weighted data by period (if available), country, strata and PSU; the second is an estimate for Y, the variance, and the gradient for numerator and denominator b
confidence
Optional positive value for the confidence level of the confidence interval. The default is 0.95.
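Several of the arguments above accept a variable as a name, a column number, or a logical vector with a single TRUE value whose length equals the column count of the data set. The base R sketch below shows that these three forms pick out the same column of a data.frame. The toy data and object names (toy_data, by_name, by_number, by_logical) are hypothetical, and the sketch illustrates ordinary data.frame indexing, not the internals of vardcros.

```r
# Toy data set with two columns (values are made up).
toy_data <- data.frame(income = c(10, 20, 30),
                       weight = c(1.5, 2.0, 2.5))

# Three equivalent ways to point at the "weight" column.
by_name    <- toy_data[, "weight", drop = FALSE]        # variable name
by_number  <- toy_data[, 2, drop = FALSE]               # column number
by_logical <- toy_data[, c(FALSE, TRUE), drop = FALSE]  # one TRUE, length == ncol(toy_data)

# All three selections are the same one-column data.frame.
same_selection <- identical(by_name, by_number) &&
  identical(by_name, by_logical)
```

Using drop = FALSE keeps each selection as a one-column data.frame, which matches the "convertible to one-column data.frame" wording in the argument descriptions.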

Value

A list with three objects is returned by the function:

  • data_net_changes: A data.table containing the aggregation of weighted data by period (if available), country, strata and PSU.
  • var_grad: A data.table containing the estimate for Y, the variance, and the gradient for numerator and denominator by country and period (if available).
  • results: A data.table containing the following columns:
      sample_size - the sample size (in numbers of individuals)
      pop_size - the population size (in numbers of individuals)
      total - the estimated totals
      variance - the estimated variance of cross-sectional or longitudinal measures
      sd_w - the estimated weighted variance of simple random sample
      sd_nw - the estimated variance of simple random sample
      pop - the population size (in numbers of households)
      sampl_siz - the sample size (in numbers of households)
      stderr_w - the estimated weighted standard error of simple random sample
      stderr_nw - the estimated standard error of simple random sample
      se - the estimated standard error of cross-sectional or longitudinal measures
      rse - the estimated relative standard error (coefficient of variation)
      cv - the estimated relative standard error (coefficient of variation) in percentage
      absolute_margin_of_error - the estimated absolute margin of error
      relative_margin_of_error - the estimated relative margin of error
      CI_lower - the estimated confidence interval lower bound
      CI_upper - the estimated confidence interval upper bound
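The precision measures in results are related to one another. The base R sketch below shows one conventional way to derive rse, cv, the margins of error and the confidence bounds from an estimate, its standard error and the confidence level, using a normal quantile. The input numbers are made up, and the formulas are an assumption about the usual definitions rather than a statement of how vardcros computes these columns.

```r
# Hypothetical inputs (not produced by vardcros).
estimate   <- 0.15   # estimated mean of a variable of interest
se         <- 0.01   # its estimated standard error
confidence <- 0.95   # same meaning as the 'confidence' argument

# Two-sided normal quantile for the chosen confidence level (about 1.96).
z <- qnorm(0.5 * (1 + confidence))

rse <- se / estimate   # relative standard error
cv  <- 100 * rse       # the same quantity in percent

absolute_margin_of_error <- z * se
relative_margin_of_error <- z * cv   # in percent (assumed convention)

CI_lower <- estimate - absolute_margin_of_error
CI_upper <- estimate + absolute_margin_of_error
```

A higher confidence level gives a larger quantile z and therefore wider margins and a wider interval for the same standard error.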

References

Eurostat Methodologies and Working papers (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. URL http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-RA-13-024/EN/KS-RA-13-024-EN.PDF

Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC. URL http://www.cros-portal.eu/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013

See Also

domain, lin.ratio

Examples

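# Example 1
library(data.table)  # data.table(), := and merge() are used below
library(laeken)      # provides the eusilc example data
library(vardpoor)    # provides vardcros()
data(eusilc)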
set.seed(1)
data <- data.table(rbind(eusilc, eusilc),
                      year=c(rep(2010, nrow(eusilc)),
                             rep(2011, nrow(eusilc))),
                   country=c(rep("AT", nrow(eusilc)),
                             rep("AT", nrow(eusilc))))
data[age<0, age:=0]
PSU <- data[,.N, keyby="db030"]
PSU[, N:=NULL]
PSU[, PSU:=trunc(runif(nrow(PSU), 0, 100))]
data <- merge(data, PSU, by="db030", all=TRUE)
PSU <- eusilc <- 0
data[, strata:="XXXX"]  # single dummy stratum, already of type character
data[, t_pov:=trunc(runif(nrow(data), 0, 2))]
data[, t_dep:=trunc(runif(nrow(data), 0, 2))]
data[, t_lwi:=trunc(runif(nrow(data), 0, 2))]
data[, exp:= 1]
data[, exp2:= 1 * (age < 60)]

# At-risk-of-poverty (AROP)
data[, pov:= ifelse (t_pov == 1, 1, 0)]

# Severe material deprivation (DEP)
data[, dep:= ifelse (t_dep == 1, 1, 0)]

# Low work intensity (LWI)
data[, lwi:= ifelse (t_lwi == 1 & exp2 == 1, 1, 0)]

# At-risk-of-poverty or social exclusion (AROPE)
data[, arope:= ifelse (pov == 1 | dep == 1 | lwi == 1, 1, 0)]

result11 <- vardcros(Y=c("pov", "dep", "arope"),
                    H="strata", PSU="PSU", w_final="rb050",
                    id="db030", Dom="rb090", Z=NULL,
                    country="country", period="year",
                    dataset=data,
                    meanY=TRUE, 
                    withperiod=TRUE,
                    netchanges=TRUE,
                    confidence = .95)

data2 <- data[exp2==1]
result12 <- vardcros(Y=c("lwi"),
                    H="strata", PSU="PSU", w_final="rb050",
                    id="db030", Dom="rb090", Z=NULL,
                    country="country", period="year",
                    dataset=data2,
                    meanY=TRUE, 
                    withperiod=TRUE,
                    netchanges=TRUE,
                    confidence = .95)

# Example 2
data(eusilc)
set.seed(1)
year <- 2011
data <- data.table(rbind(eusilc, eusilc, eusilc, eusilc),
                   rb010=c(rep(2008, nrow(eusilc)),
                           rep(2009, nrow(eusilc)),
                           rep(2010, nrow(eusilc)),
                           rep(2011, nrow(eusilc))),
                   rb020=c(rep("AT", nrow(eusilc)),
                           rep("AT", nrow(eusilc)),
                           rep("AT", nrow(eusilc)),
                           rep("AT", nrow(eusilc))))
data[, u:=1]
data[age<0, age:=0]
data[, strata:="XXXX"]
PSU <- data[,.N, keyby="db030"]
PSU[, N:=NULL]
PSU[, PSU:=trunc(runif(nrow(PSU), 0, 100))]
data <- merge(data, PSU, by="db030", all=TRUE)
thres <- data.table(rb020=rep("AT",4),
                    thres= c(11406, 11931, 12371, 12791),
                    rb010=2008:2011)
setnames(thres, names(thres), tolower(names(thres)))
setkeyv(data, c("rb010", "rb020"))
setkeyv(thres, c("rb010", "rb020"))
data <- merge(data, thres, all.x=TRUE)
data[is.na(u), u:=0]
data <- data[u==1]
setkeyv(data, c("rb020", "rb030"))

#############
# T3        #
#############

T3 <- data[rb010==year-3]
T3[, strata1:=strata]
T3[, PSU1:=PSU]
T3[, w1:=rb050]
T3[, inc1:=eqIncome]
T3[, rb110_1:=db030]
setnames(T3, "thres", "thres1")
T3[, pov1:=inc1<=thres1]
# T3 keeps the unsuffixed strata and PSU columns; they are passed to vardcros() below
T3 <- T3[, c("rb020", "rb030", "strata", "PSU", "inc1", "pov1"), with=FALSE]

#############
# T2        #
#############
T2 <- data[rb010==year-2]
T2[, strata2:=strata]
T2[, PSU2:=PSU]
T2[, w2:=rb050]
T2[, inc2:=eqIncome]
T2[, rb110_2:=db030]
setnames(T2, "thres", "thres2")
T2[, pov2:=inc2<=thres2]
T2 <- T2[, c("rb020", "rb030","strata2","PSU2","inc2","pov2"), with=FALSE]
#############
# T1        #
#############
T1 <- data[rb010==year-1]
T1[, strata3:=strata]
T1[, PSU3:=PSU]
T1[, w3:=rb050]
T1[, inc3:=eqIncome]
T1[, rb110_3:=db030]
setnames(T1, "thres", "thres3")
T1[, pov3:=inc3<=thres3]
T1 <- T1[, c("rb020", "rb030", "strata3", "PSU3", "inc3", "pov3"), with=FALSE]
#############
# T0        #
#############
T0 <- data[rb010==year]
T0[, PSU4:=PSU]
T0[, strata4:=strata]
T0[, w4:=rb050]
T0[, inc4:=eqIncome]
T0[, rb110_4:=db030]
setnames(T0, "thres", "thres4")
T0[, pov4:=inc4<=thres4]
T0 <- T0[, c("rb020", "rb030", "strata4", "PSU4", "w4", "inc4", "pov4"), with=FALSE]
apv <- merge(T3, T2, all=TRUE)
apv <- merge(apv, T1, all=TRUE)
apv <- merge(apv, T0, all=TRUE)
apv <- apv[(!is.na(inc1)) & (!is.na(inc2)) & (!is.na(inc3)) & (!is.na(inc4))]
# ppr = 1 if poor in the current year (pov4) and in at least two of the three preceding years
apv[, ppr:=ifelse(((pov4==1)&((pov1==1&pov2==1&pov3==1)|(pov1==1&pov2==1&
                  pov3==0)|(pov1==1&pov2==0&pov3==1)|(pov1==0&pov2==1&pov3==1))),1,0)]

result20 <- vardcros(Y="ppr", H="strata", PSU="PSU",
                    w_final="w4", id="rb030",
                    Dom = NULL, Z=NULL,
                    country="rb020", period=NULL,
                    dataset=apv,
                    meanY=TRUE,
                    withperiod=FALSE,
                    netchanges=FALSE,
                    confidence = .95)
