
GFE (version 0.1.1)

estGF: Gross Flows estimation

Description

Estimates gross flows under complex electoral surveys.

Usage

estGF(
  sampleBase = NULL,
  niter = 100,
  model = NULL,
  colWeights = NULL,
  nonrft = FALSE
)

Value

estGF returns a list containing:

  1. Est.CIV: a data.frame containing the gross flows estimation.

  2. Params.Model: a list that contains the \(\hat{\eta}_{i}\), \(\hat{p}_{ij}\), \(\hat{\psi}(i,j)\), \(\hat{\rho}_{RR}(i,j)\), \(\hat{\rho}_{MM}(i,j)\) parameters for the estimated model.

  3. Sam.Est: a list containing the sampling estimators \(\hat{N}_{ij}\), \(\hat{R}_{i}\), \(\hat{C}_{j}\), \(\hat{M}\), \(\hat{N}\).

Arguments

sampleBase

An object of class "data.frame" containing the information on electoral candidates. The data must contain the sampling weights.

niter

The number of iterations used to estimate the model parameters \(\eta_{i}\) and \(p_{ij}\).

model

A character string indicating the model to be used in estimating the gross flows. The available models are: "I", "II", "III", "IV" (see also "Details").

colWeights

The column name containing the sampling weights to be used in the fitting process.

nonrft

A logical value indicating nonresponse at the first time.

Details

The population size \(N\) must satisfy the condition: $$ N = \sum_{j}\sum_{i} N_{ij} + \sum_{j} C_{j} + \sum_{i} R_{i} + M$$ where \(N_{ij}\) is the number of people interviewed who have classification \(i\) at the first time and classification \(j\) at the second time, \(R_{i}\) is the number of people with classification \(i\) who responded at the first time but not at the second, \(C_{j}\) is the number of people with classification \(j\) who responded at the second time but not at the first, and \(M\) is the number of people who did not respond at either time or could not be reached.

Let \(\eta_{i}\) be the initial probability that a person has classification \(i\) at the first time, and let \(p_{ij}\) be the vote transition probability for the cell \((i,j)\), where \(\sum_{i} \eta_{i} = 1\) and \(\sum_{j} p_{ij} = 1\) for each \(i\). Four possible models for the gross flows are then given by:

  1. Model I: This model assumes that the initial probability of responding at the first time is the same for everyone, that is, \(\psi(i,j) = \psi\). In addition, the transition probabilities between response and nonresponse do not depend on the classification \((i,j)\), that is, \(\rho_{MM}(i,j) = \rho_{MM}\) and \(\rho_{RR}(i,j) = \rho_{RR}\).

  2. Model II: Unlike Model I, this model assumes that the initial response probability of a person in cell \((i,j)\) depends only on the classification at the first time, that is, \(\psi(i,j) = \psi(i)\).

  3. Model III: Unlike Model I, this model assumes that the transition probabilities between response and nonresponse depend only on the classification at the first time, that is, \(\rho_{MM}(i,j) = \rho_{MM}(i)\) and \(\rho_{RR}(i,j) = \rho_{RR}(i)\).

  4. Model IV: Unlike Model I, this model assumes that the transition probabilities between response and nonresponse depend only on the classification at the second time, that is, \(\rho_{MM}(i,j) = \rho_{MM}(j)\) and \(\rho_{RR}(i,j) = \rho_{RR}(j)\).
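
As a rough illustration of Model I, the cell probabilities used in the simulation under "Examples" can be sketched in base R. The helper `model_I_cells()` and the toy values of `eta`, `P`, `psi`, `rhoRR`, `rhoMM`, and `N` are assumptions made for this sketch and are not part of GFE. The sketch reads \(\psi\) as the probability of responding at the first time, \(\rho_{RR}\) as the probability of responding at the second time given a response at the first, and \(\rho_{MM}\) as the probability of nonresponse at the second time given nonresponse at the first.

```r
# Toy inputs (hypothetical values, chosen only for illustration)
eta <- c(A = 0.6, B = 0.4)                      # initial probabilities, sum to 1
P <- matrix(c(0.7, 0.3,
              0.2, 0.8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))  # each row sums to 1
psi   <- 0.9   # assumed: P(respond at first time)
rhoRR <- 0.9   # assumed: P(respond at second time | responded at first)
rhoMM <- 0.5   # assumed: P(nonresponse at second time | nonresponse at first)

# Hypothetical helper (not a GFE function): expected cell probabilities under
# Model I, with an extra "Non_Resp" row and column. P must carry dimnames.
model_I_cells <- function(eta, P, psi, rhoRR, rhoMM) {
  nI <- nrow(P)
  nJ <- ncol(P)
  joint <- P * eta   # eta is recycled down the columns, so joint[i, j] = eta[i] * P[i, j]
  cells <- matrix(NA_real_, nrow = nI + 1, ncol = nJ + 1,
                  dimnames = list(c(rownames(P), "Non_Resp"),
                                  c(colnames(P), "Non_Resp")))
  cells[1:nI, 1:nJ]     <- psi * rhoRR * joint                     # responded both times
  cells[1:nI, nJ + 1]   <- psi * (1 - rhoRR) * rowSums(joint)      # responded first time only
  cells[nI + 1, 1:nJ]   <- (1 - psi) * (1 - rhoMM) * colSums(joint) # responded second time only
  cells[nI + 1, nJ + 1] <- (1 - psi) * rhoMM                       # never responded
  cells
}

cells <- model_I_cells(eta, P, psi, rhoRR, rhoMM)

# Expected counts for a population of size N, split into the four blocks
N <- 1000
counts <- cells * N
N_ij <- counts[1:2, 1:2]
R_i  <- counts[1:2, 3]
C_j  <- counts[3, 1:2]
M    <- counts[3, 3]

# The blocks add up to the population size
stopifnot(isTRUE(all.equal(sum(N_ij) + sum(R_i) + sum(C_j) + M, N)))
print(round(counts, 1))
```

Because the four blocks of `cells` sum to one, multiplying by `N` gives expected counts that satisfy the population identity stated above. Models II to IV would replace the scalar `psi`, `rhoRR`, or `rhoMM` with values indexed by \(i\) or \(j\); that extension is not shown here.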

References

Stasny, E. (1987), `Some Markov-chain models for nonresponse in estimating gross labor force flows', Journal of Official Statistics 3, pp. 359-373.
Särndal, C.-E., Swensson, B. & Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York, USA.
Gutiérrez, A., Trujillo, L. & Silva, N. (2014), `The estimation of gross flows in complex surveys with random nonresponse', Survey Methodology 40(2), pp. 285-321.

Examples

library(GFE)               # estGF, createBase, round_preserve_sum
library(TeachingSampling)  # S.piPS
library(data.table)
# Colombia's electoral candidates in 2014
candidates_t0 <- c("Clara","Enrique","Santos","Martha","Zuluaga","WhiteVote", "NoVote")
candidates_t1 <- c("Santos","Zuluaga","WhiteVote", "NoVote")

N <- 100000
nCanT0 <- length(candidates_t0)
nCanT1 <- length(candidates_t1)
# Initial probabilities
eta <- matrix(c(0.10, 0.10, 0.20, 0.17, 0.28, 0.10, 0.05),
              byrow = TRUE, nrow = nCanT0)
# Transition probabilities (one row per first-time classification, rows sum to 1)
P <- matrix(c(0.10, 0.60, 0.15, 0.15,
              0.30, 0.10, 0.25, 0.35,
              0.34, 0.25, 0.16, 0.25,
              0.25, 0.05, 0.35, 0.35,
              0.10, 0.25, 0.45, 0.20,
              0.12, 0.36, 0.22, 0.30,
              0.10, 0.15, 0.30, 0.45),
            byrow = TRUE, nrow = nCanT0)
# Simulated table of counts by first-time and second-time classification
citaMod <- matrix(NA_integer_, ncol = nCanT1, nrow = nCanT0)
row.names(citaMod) <- candidates_t0
colnames(citaMod) <- candidates_t1

for (ii in seq_len(nCanT0)) {
  # round() guards against floating-point error in N * eta before truncation to integer
  citaMod[ii, ] <- c(rmultinom(1, size = round(N * eta[ii, ]), prob = P[ii, ]))
}

# Model I: constant response and transition parameters
psiI   <- 0.9  # psi
rhoRRI <- 0.9  # rho_RR
rhoMMI <- 0.5  # rho_MM

citaModI <- matrix(nrow = nCanT0 + 1, ncol = nCanT1 + 1)
rownames(citaModI) <- c(candidates_t0, "Non_Resp")
colnames(citaModI) <- c(candidates_t1, "Non_Resp")
citaModI[1:nCanT0, 1:nCanT1] <- P * c(eta) * rhoRRI * psiI
citaModI[(nCanT0 + 1), (nCanT1 + 1)] <- rhoMMI * (1-psiI)
citaModI[1:nCanT0, (nCanT1 + 1)] <- (1-rhoRRI) * psiI * rowSums(P * c(eta))
citaModI[(nCanT0 + 1), 1:nCanT1 ] <- (1-rhoMMI) * (1-psiI) * colSums(P * c(eta))
citaModI <- round_preserve_sum(citaModI * N)
DBcitaModI <- createBase(citaModI)

# Creating auxiliary information
DBcitaModI[,AuxVar := rnorm(nrow(DBcitaModI), mean = 45, sd = 10)]

# Selects a sample with unequal probabilities
res <- S.piPS(n = 3200, as.data.frame(DBcitaModI)[,"AuxVar"])
sam <- res[,1]
pik <- res[,2]
DBcitaModISam <- copy(DBcitaModI[sam,])
DBcitaModISam[,Pik := pik]

# Gross Flows estimation
estima <- estGF(sampleBase = DBcitaModISam, niter = 500, model = "I", colWeights = "Pik")
estima
