estGF: Gross Flows estimation

Description

Estimates gross flows under complex electoral surveys.

Usage

estGF(
  sampleBase = NULL,
  niter = 100,
  model = NULL,
  colWeights = NULL,
  nonrft = FALSE
)

Value

estGF returns a list containing:

Est.CIV: a data.frame containing the estimated gross flows.
Params.Model: a list containing the estimated parameters $\hat{\eta}_{i}$, $\hat{p}_{ij}$, $\hat{\psi}(i,j)$, $\hat{\rho}_{RR}(i,j)$ and $\hat{\rho}_{MM}(i,j)$ of the fitted model.
Sam.Est: a list containing the sampling estimators $\hat{N}_{ij}$, $\hat{R}_{i}$, $\hat{C}_{j}$, $\hat{M}$, $\hat{N}$.
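
As a rough illustration of what the quantities in Sam.Est represent, the sketch below builds a weighted two-wave table from a toy sample and splits it into respondents at both times ($N_{ij}$), respondents at the first time only ($R_{i}$), respondents at the second time only ($C_{j}$) and nonrespondents at both times ($M$). The data frame `toy`, its column names and its weights are hypothetical, and the weighted cross-tabulation is an assumption about how such totals can be formed, not a description of what estGF does internally.

```r
# Hypothetical two-wave sample: "Non_Resp" marks nonresponse, w holds made-up weights
toy <- data.frame(
  t0 = c("A", "A", "B", "B", "Non_Resp", "Non_Resp", "A", "B"),
  t1 = c("A", "B", "B", "Non_Resp", "A", "Non_Resp", "Non_Resp", "A"),
  w  = c(10, 12, 8, 9, 11, 10, 7, 13)
)
lev <- c("A", "B", "Non_Resp")
toy$t0 <- factor(toy$t0, levels = lev)
toy$t1 <- factor(toy$t1, levels = lev)

# Weighted two-way table: each cell is the sum of the weights falling in it
tab <- xtabs(w ~ t0 + t1, data = toy)

resp <- c("A", "B")
N_ij <- tab[resp, resp]             # responded at both times
R_i  <- tab[resp, "Non_Resp"]       # responded at the first time only
C_j  <- tab["Non_Resp", resp]       # responded at the second time only
M    <- tab["Non_Resp", "Non_Resp"] # did not respond at either time

# The four pieces add up to the total weight, i.e. the estimated population size
N_hat <- as.numeric(sum(N_ij) + sum(R_i) + sum(C_j) + M)
stopifnot(isTRUE(all.equal(N_hat, sum(toy$w))))
```

The same partition is what the condition on $N$ in "Details" expresses at the population level.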

Arguments

sampleBase: An object of class "data.frame" containing the information on electoral candidates. The data must contain the sampling weights.
niter: The number of iterations used to estimate the model parameters $\eta_{i}$ and $p_{ij}$.
model: A character string indicating the model used to estimate the gross flows. The available models are "I", "II", "III" and "IV" (see "Details").
colWeights: The name of the column containing the sampling weights to be used in the fitting process.
nonrft: A logical value indicating nonresponse at the first time.

Details

The population size $N$ must satisfy the condition $$ N = \sum_{j}\sum_{i} N_{ij} + \sum_{j} C_{j} + \sum_{i} R_{i} + M, $$ where $N_{ij}$ is the number of people interviewed who have classification $i$ at the first time and classification $j$ at the second time, $R_{i}$ is the number of people who responded at the first time but not at the second, $C_{j}$ is the number of people who responded at the second time but not at the first, and $M$ is the number of people who did not respond at either time or could not be reached. Let $\eta_{i}$ be the initial probability that a person has classification $i$ at the first time, and let $p_{ij}$ be the vote transition probability for the cell $(i,j)$, where $\sum_{i} \eta_{i} = 1$ and $\sum_{j} p_{ij} = 1$. Four possible models for the gross flows are then given by:

Model I: This model assumes that a person's initial probability of responding at the first time is the same for everyone, that is, $\psi(i,j) = \psi$. In addition, the transition probabilities between response and nonresponse do not depend on the classification $(i,j)$, that is, $\rho_{MM}(i,j) = \rho_{MM}$ and $\rho_{RR}(i,j) = \rho_{RR}$.
Model II: Unlike Model I, this model assumes that the initial response probability of a person in cell $(i,j)$ depends only on their classification at the first time, that is, $\psi(i,j) = \psi(i)$.
Model III: Unlike Model I, this model assumes that the transition probabilities between response and nonresponse depend only on the classification at the first time, that is, $\rho_{MM}(i,j) = \rho_{MM}(i)$ and $\rho_{RR}(i,j) = \rho_{RR}(i)$.
Model IV: Unlike Model I, this model assumes that the transition probabilities between response and nonresponse depend only on the classification at the second time, that is, $\rho_{MM}(i,j) = \rho_{MM}(j)$ and $\rho_{RR}(i,j) = \rho_{RR}(j)$.
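
To make the difference between the models concrete, the sketch below computes the expected cell probabilities of the two-wave table under Model I, mirroring the construction used in the Examples section, and then under a Model II variant. It reads $\psi$ as the probability of responding at the first time, $\rho_{RR}$ as the probability that a first-time respondent responds again, and $\rho_{MM}$ as the probability that a first-time nonrespondent remains a nonrespondent, which is how the Examples section uses them. All numeric values and object names are hypothetical, and the Model II lines are an assumption derived from letting $\psi$ vary with $i$, not code taken from the package.

```r
# Hypothetical toy inputs with two classes at each time
eta <- c(0.6, 0.4)                        # initial classification probabilities
P   <- matrix(c(0.7, 0.3,
                0.2, 0.8), nrow = 2, byrow = TRUE)  # transition rows sum to 1
rhoRR <- 0.9                              # respondent stays a respondent
rhoMM <- 0.5                              # nonrespondent stays a nonrespondent

joint <- eta * P                          # eta_i * p_ij (row i scaled by eta[i])
labs  <- c("c1", "c2", "Non_Resp")

# Model I: a single psi shared by every cell
psi <- 0.9
cellsI <- matrix(NA_real_, 3, 3, dimnames = list(t0 = labs, t1 = labs))
cellsI[1:2, 1:2] <- psi * rhoRR * joint
cellsI[1:2, 3]   <- psi * (1 - rhoRR) * rowSums(joint)
cellsI[3, 1:2]   <- (1 - psi) * (1 - rhoMM) * colSums(joint)
cellsI[3, 3]     <- (1 - psi) * rhoMM

# Model II (assumed form): psi depends on the first-time class i
psiII <- c(0.95, 0.80)
cellsII <- cellsI
cellsII[1:2, 1:2] <- psiII * rhoRR * joint
cellsII[1:2, 3]   <- psiII * (1 - rhoRR) * rowSums(joint)
cellsII[3, 1:2]   <- (1 - rhoMM) * colSums((1 - psiII) * joint)
cellsII[3, 3]     <- rhoMM * sum((1 - psiII) * eta)

# Both tables are probability distributions over the cells
stopifnot(isTRUE(all.equal(sum(cellsI), 1)),
          isTRUE(all.equal(sum(cellsII), 1)))
```

Multiplying either table by $N$ gives expected counts laid out like the table `citaModI` in the Examples section.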

References

Stasny, E. (1987), 'Some Markov-chain models for nonresponse in estimating gross', Journal of Official Statistics 3, pp. 359-373.
Särndal, C.-E., Swensson, B. & Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York, USA.
Gutierrez, A., Trujillo, L. & Silva, N. (2014), 'The estimation of gross flows in complex surveys with random nonresponse', Survey Methodology 40(2), pp. 285-321.

Examples


library(TeachingSampling)
library(data.table)
# Colombia's electoral candidates in 2014
candidates_t0 <- c("Clara","Enrique","Santos","Martha","Zuluaga","WhiteVote", "NoVote")
candidates_t1 <- c("Santos","Zuluaga","WhiteVote", "NoVote")

N <- 100000
nCanT0 <- length(candidates_t0)
nCanT1 <- length(candidates_t1)
# Initial probabilities
eta <- matrix(c(0.10, 0.10, 0.20, 0.17, 0.28, 0.10, 0.05),
              byrow = TRUE, nrow = nCanT0)
# Transition probabilities (one row per first-time option; each row sums to 1)
P <- matrix(c(0.10, 0.60, 0.15, 0.15,
              0.30, 0.10, 0.25, 0.35,
              0.34, 0.25, 0.16, 0.25,
              0.25, 0.05, 0.35, 0.35,
              0.10, 0.25, 0.45, 0.20,
              0.12, 0.36, 0.22, 0.30,
              0.10, 0.15, 0.30, 0.45),
            byrow = TRUE, nrow = nCanT0)
citaMod <- matrix(NA, ncol = nCanT1, nrow = nCanT0)
row.names(citaMod) <- candidates_t0
colnames(citaMod) <- candidates_t1

# Simulate the second-time votes of each first-time group
for (ii in seq_len(nCanT0)) {
  citaMod[ii, ] <- c(rmultinom(1, size = N * eta[ii, ], prob = P[ii, ]))
}

# Model I
psiI   <- 0.9
rhoRRI <- 0.9
rhoMMI <- 0.5

citaModI <- matrix(nrow = nCanT0 + 1, ncol = nCanT1 + 1)
rownames(citaModI) <- c(candidates_t0, "Non_Resp")
colnames(citaModI) <- c(candidates_t1, "Non_Resp")
citaModI[1:nCanT0, 1:nCanT1] <- P * c(eta) * rhoRRI * psiI
citaModI[(nCanT0 + 1), (nCanT1 + 1)] <- rhoMMI * (1-psiI)
citaModI[1:nCanT0, (nCanT1 + 1)] <- (1-rhoRRI) * psiI * rowSums(P * c(eta))
citaModI[(nCanT0 + 1), 1:nCanT1 ] <- (1-rhoMMI) * (1-psiI) * colSums(P * c(eta))
citaModI <- round_preserve_sum(citaModI * N)
DBcitaModI <- createBase(citaModI)

# Creating auxiliary information
DBcitaModI[,AuxVar := rnorm(nrow(DBcitaModI), mean = 45, sd = 10)]

# Selects a sample with unequal probabilities
res <- S.piPS(n = 3200, as.data.frame(DBcitaModI)[,"AuxVar"])
sam <- res[,1]
pik <- res[,2]
DBcitaModISam <- copy(DBcitaModI[sam,])
DBcitaModISam[,Pik := pik]

# Gross Flows estimation
estima <- estGF(sampleBase = DBcitaModISam, niter = 500, model = "I", colWeights = "Pik")
estima
