mbes: Model Based Estimation

Description

mbes is used for model based estimation of population means using auxiliary variables. Difference, ratio and regression estimates are available.

Usage

mbes(formula, data, aux, N = Inf, method = 'all', level = 0.95, ...)

Arguments

formula

object of class formula (or one that can be coerced to that class): symbolic description for connection between primary and secondary information

data

data frame containing variables in the model

aux

known mean of auxiliary variable, which provides secondary information

positive integer for population size. Default is N=Inf, which means that calculations are carried out without finite population correction.

method

estimation method. Options are 'simple','diff','ratio','regr','all'. Default is method='all'.

level

coverage probability for confidence intervals. Default is level=0.95.

…

further options for linear regression model

Value

The function mbes returns an object, which is a list consisting of the components

call

is a list of call components: formula formula, data data frame, aux given value for mean of auxiliary variable, N population size, type type of model based estimation and level coverage probability for confidence intervals

info

is a list of further information components: N population size, n sample size, p number of auxiliary variables, aux true mean of auxiliary variables in population and x.mean sample means of auxiliary variables

simple

is a list of result components, if method='simple' or method='all' is selected: mean mean estimate of population mean for primary information, se standard error of the mean estimate, and ci vector of confidence interval boundaries

diff

is a list of result components, if method='diff' or method='all' is selected: mean mean estimate of population mean for primary information, se standard error of the mean estimate, and ci vector of confidence interval boundaries

ratio

is a list of result components, if method='ratio' or method='all' is selected: mean mean estimate of population mean for primary information, se standard error of the mean estimate, and ci vector of confidence interval boundaries

regr

is a list of result components, if type='regr' or type='all' is selected: mean mean estimate of population mean for primary information, se standard error of mean estimate, ci vector of confidence interval boundaries, and model underlying linear regression model

Details

The option method='simple' calculates the simple sample estimation without using the auxiliary variable. The option method='diff' calculates the difference estimate, method='ratio' the ratio estimate, and method='regr' the regression estimate which is based on the selected model. The option method='all' calculates the simple and all model based estimates. For methods 'diff', 'ratio' and 'all' the formula has to be y~x with y primary and x secondary information. For method 'regr', it is the symbolic description of the linear regression model. In this case, it can be used more than one auxiliary variable. Thus, aux has to be a vector of the same length as the number of auxiliary variables in order as specified in the formula.

References

Kauermann, Goeran/Kuechenhoff, Helmut (2010): Stichproben. Methoden und praktische Umsetzung mit R. Springer.

Examples

Run this code

# NOT RUN {
## 1) simple suppositious example
data(pop)
# Draw a random sample of size=3
set.seed(802016)
data <- pop[sample(1:5, size=3),]
names(data) <- c('id','x','y')
# difference estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='diff', level=0.95)
# ratio estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='ratio', level=0.95)
# regression estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='regr', level=0.95)

## 2) Bundestag election
data(election)
# draw sample of size n = 20
N <- nrow(election)
set.seed(67396)
sample <- election[sort(sample(1:N, size=20)),]
# secondary information SPD in 2002
X.mean <- mean(election$SPD_02)
# forecast proportion of SPD in election of 2005
mbes(SPD_05 ~ SPD_02, data=sample, aux=X.mean, N=N, method='all')
# true value
Y.mean <- mean(election$SPD_05)
Y.mean
# Use a second predictor variable
X.mean2 <- c(mean(election$SPD_02),mean(election$GREEN_02))
# forecast proportion of SPD in election of 2005 with two predictors
mbes(SPD_05 ~ SPD_02+GREEN_02, data=sample, aux=X.mean2, N=N, method= 'regr')

## 3) money sample
data(money)
mu.X <-  mean(money$X)
x <- money$X[which(!is.na(money$y))]
y <- na.omit(money$y)
# estimation
mbes(y~x, aux=mu.X, N=13, method='all')

## 4) model based two-phase sampling with mbes() 
id <- 1:1000
x <- rep(c(1,0,1,0),times=c(10,90,70,830))
y <- rep(c(1,0,NA),times=c(15,85,900))
phase <- rep(c(2,1), times=c(100,900))
data <- data.frame(id,x,y,phase)
# mean of x out of first phase
mean.x <- mean(data$x)
mean.x
N1 <- length(data$x) 
# calculation of estimation for y 
est.y <- mbes(y~x, data=data, aux=mean.x, N=N1, method='ratio')
est.y
# correction of standard error with uncertaincy in first phase
v.y <- var(data$y, na.rm=TRUE)
se.y <- sqrt(est.y$ratio$se^2 + v.y/N1)
se.y
# corrected confidence interval
lower <- est.y$ratio$mean - qnorm(0.975)*se.y
upper <- est.y$ratio$mean + qnorm(0.975)*se.y
c(lower, upper)
# }

Run the code above in your browser using DataLab