UPmaxentropy: Maximum entropy sampling with fixed sample size and unequal probabilities

Description

Maximum entropy sampling with fixed sample size and unequal probabilities is implemented by means of a sequential method.

Usage

UPmaxentropy(pik) 
UPmaxentropypi2(pik)
UPMEqfromw(w,n)
UPMEpikfromq(q) 
UPMEpiktildefrompik(pik,eps=0.0000001)
UPMEsfromq(q)
UPMEpik2frompikw(pik,w)

Arguments

n

sample size.

pik

vector of prescribed inclusion probabilities.

eps

tolerance in Newton's method; the default is 1E-7.

q

matrix of the conditional selection probabilities for the sequential algorithm.

w

parameter vector of the maximum entropy design.

Details

Maximum entropy sampling maximizes the entropy criterion: $$I(p) = - \sum_s p(s)\log[p(s)]$$ The main procedure is UPmaxentropy, which selects a sample (a vector of 0s and 1s) from a given vector of inclusion probabilities. The procedure UPmaxentropypi2 returns the matrix of joint inclusion probabilities from the vector of first-order inclusion probabilities. The other procedures are intermediate steps; they can be useful for running simulations, as shown in the examples below.

The maximum entropy design is the Poisson design conditioned on the fixed sample size. The procedure UPMEpiktildefrompik computes the inclusion probabilities of that Poisson design (denoted piktilde, and pikt in the examples) from the inclusion probabilities pik of the maximum entropy design. The vector w is then obtained as w=pikt/(1-pikt).

Once piktilde and w have been deduced from pik, the matrix of selection probabilities q is derived from the sample size n and the vector w by means of the procedure UPMEqfromw. A sample can then be selected from q by using UPMEsfromq. To generate several samples, it is more efficient to compute the matrix q once (this is the costly step) and then to call UPMEsfromq repeatedly. The vector of inclusion probabilities can be recomputed from q by using UPMEpikfromq, which makes it possible to check the numerical precision of the algorithm. The procedure UPMEpik2frompikw computes the matrix of joint inclusion probabilities from pik and w.
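To make the link between Poisson sampling and the fixed-size design concrete, the following base R sketch enumerates every sample of size n in a tiny population and conditions a Poisson design on the sample size. It does not call the package; the function name cps_by_enumeration and the input values are made up for illustration, and full enumeration is only feasible for very small populations.

```r
# Hypothetical helper, not part of the sampling package.
# pikt: inclusion probabilities of the underlying Poisson design, all in (0,1)
# n:    fixed sample size
cps_by_enumeration <- function(pikt, n) {
  N <- length(pikt)
  w <- pikt / (1 - pikt)                 # odds, as in w=pikt/(1-pikt)
  samples <- combn(N, n)                 # one column per sample of size n
  # A Poisson sample s has probability proportional to prod(w[s]);
  # conditioning on size n renormalises over the size-n samples only.
  weight <- apply(samples, 2, function(idx) prod(w[idx]))
  p <- weight / sum(weight)
  # First-order inclusion probabilities of the conditional design
  pik <- sapply(seq_len(N), function(k) sum(p[colSums(samples == k) > 0]))
  list(samples = samples, p = p, pik = pik)
}

d <- cps_by_enumeration(c(0.2, 0.4, 0.6, 0.8), n = 2)
sum(d$p)                # design probabilities sum to 1
sum(d$pik)              # inclusion probabilities sum to n
-sum(d$p * log(d$p))    # entropy I(p) of this design
```

In general d$pik differs from the pikt that was passed in, which is why a step such as UPMEpiktildefrompik is needed to find the piktilde that reproduces the prescribed pik.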

References

Chen, S.X., Liu, J.S. (1997). Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statistica Sinica, 7, 875-892.

Deville, J.-C. (2000). Note sur l'algorithme de Chen, Dempster et Liu. Technical report, CREST-ENSAI, Rennes.

Matei, A., Tillé, Y. (2005). Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics, vol. 21, no 1, to appear.

Examples

############
## Example 1
############
# Simple example. Selection of a sample.
pik=c(0.07,0.17,0.41,0.61,0.83,0.91)
# First method
UPmaxentropy(pik)
# Second method by using the intermediate procedures.
n=sum(pik)
pikt=UPMEpiktildefrompik(pik)
w=pikt/(1-pikt)
q=UPMEqfromw(w,n)
UPMEsfromq(q)
# The matrix of inclusion probabilities
# First method: direct computation from pik
UPmaxentropypi2(pik)
# Second method: computation from pik and w
UPMEpik2frompikw(pik,w)
############
## Example 2
############
# Selection of a sample of Belgian municipalities.
data(belgianmunicipalities)
attach(belgianmunicipalities)
n=200
pik=inclusionprobabilities(averageincome,n)
s=UPmaxentropy(pik)
as.character(Commune[s==1])
pi2=UPmaxentropypi2(pik)
# Check: for a fixed-size design each value should be close to 1
rowSums(pi2)/pik/n
############
## Example 3
############
# Selection of 200 samples of Belgian municipalities.
# Once matrix q is computed, the selection of a sample is very quick.
# Simulations are thus possible.
data(belgianmunicipalities)
attach(belgianmunicipalities)
pik=inclusionprobabilities(averageincome,200)
# Remove the units with pik = 1 (selected with certainty)
pik=pik[pik!=1]
n=sum(pik)
pikt=UPMEpiktildefrompik(pik)
w=pikt/(1-pikt)
q=UPMEqfromw(w,n)
N=length(pik)
tt=rep(0,times=N)
sim=200
for(i in 1:sim) tt = tt+UPMEsfromq(q)
tt=tt/sim
sum(abs(tt-pik))
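
Example 3 reuses one matrix q for many draws. As a rough illustration of what such a matrix encodes, the base R sketch below builds q from w with the recursion E(i,z) = w[i]*E(i+1,z-1) + E(i+1,z), draws a sample sequentially, and recomputes the exact inclusion probabilities implied by q. The names q_from_w, sample_from_q and pik_from_q and the vector w are assumptions made for this sketch; the package's UPMEqfromw, UPMEsfromq and UPMEpikfromq may be implemented differently.

```r
# Hypothetical helpers, not the package's code.
# q[i, z]: probability of selecting unit i when z units are still to be
# drawn among units i, ..., N.
q_from_w <- function(w, n) {
  N <- length(w)
  # E[i, z + 1] = sum over all size-z subsets of {i, ..., N} of prod(w)
  E <- matrix(0, N + 1, n + 1)
  E[, 1] <- 1                              # z = 0: empty product
  for (i in N:1) {
    for (z in 1:n) {
      E[i, z + 1] <- w[i] * E[i + 1, z] + E[i + 1, z + 1]
    }
  }
  q <- matrix(0, N, n)                     # unreachable states stay at 0
  for (i in 1:N) {
    for (z in 1:n) {
      if (E[i, z + 1] > 0) q[i, z] <- w[i] * E[i + 1, z] / E[i, z + 1]
    }
  }
  q
}

# Sequential selection: visit the units once, tracking how many remain to draw.
sample_from_q <- function(q) {
  z <- ncol(q)
  s <- integer(nrow(q))
  for (k in seq_len(nrow(q))) {
    if (z > 0 && runif(1) < q[k, z]) {
      s[k] <- 1L
      z <- z - 1L
    }
  }
  s
}

# Exact inclusion probabilities implied by q (no simulation needed).
pik_from_q <- function(q) {
  N <- nrow(q)
  n <- ncol(q)
  pro <- numeric(n + 1)                    # pro[z + 1] = P(z units remain)
  pro[n + 1] <- 1
  pik <- numeric(N)
  for (k in seq_len(N)) {
    take <- pro[-1] * q[k, ]               # z units remain and unit k is taken
    pik[k] <- sum(take)
    pro[-1] <- pro[-1] - take              # unit k skipped: z unchanged
    pro[1:n] <- pro[1:n] + take            # unit k taken: z becomes z - 1
  }
  pik
}

w <- c(0.25, 0.5, 1.5, 4)                  # illustrative odds
q <- q_from_w(w, n = 2)
s <- sample_from_q(q)
sum(s)                                     # the sample size is always n
pik_from_q(q)                              # sums to n
```

Because pik_from_q is exact, it can replace the Monte Carlo average tt when the goal is only to check q against the prescribed pik.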
