Estimate: Update an N-way table given target margins

Description

This function provides several estimation methods to update an N-way table (referred to as the seed) subject to known constraints/totals: iterative proportional fitting procedure (ipfp), maximum likelihood (ml), minimum chi-squared (chi2) and weighted least squares (lsq). Note that the targets can also be multi-dimensional.
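
As a rough illustration of the first of these methods, the following base R sketch runs iterative proportional fitting on a two-way table. It does not call the package and is not its implementation; the toy seed, the targets, the tolerance and the iteration cap are all assumptions made for the example.

```r
# Hypothetical toy data: a 2 x 2 seed and target row/column totals.
seed       <- matrix(c(1, 2, 3, 4), nrow = 2)
row.target <- c(40, 60)
col.target <- c(30, 70)

x <- seed
for (iter in 1:100) {
  # scale each row so that the row sums match row.target
  x <- x * (row.target / rowSums(x))
  # scale each column so that the column sums match col.target
  x <- sweep(x, 2, col.target / colSums(x), "*")
  # stop once the row sums still hold after the column step
  if (max(abs(rowSums(x) - row.target)) < 1e-10) break
}

rowSums(x)
colSums(x)
```

Because each step only rescales whole rows or whole columns, the odds ratio of the seed is carried over to the fitted table.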

Usage

Estimate(seed, target.list, target.data, method = "ipfp", keep.input = FALSE,
         ...)

Arguments

seed

The initial multi-dimensional array to be updated. Each cell must be non-negative if method is ipfp or strictly positive when method is ml, lsq or chi2.

target.list

A list of the dimensions of the marginal target constraints in target.data. Each component of the list is an array whose cells indicate which dimensions the corresponding margin relates to.

target.data

A list containing the data of the target marginal tables. Each component of the list is an array storing a margin. The list order must follow the ordering defined in target.list. Note that the cells of the arrays must be non-negative.

method

An optional character string indicating which method is to be used to update the seed. This must be one of the strings "ipfp", "ml", "chi2", or "lsq". Default is "ipfp".

keep.input

A Boolean indicating whether seed, target.data and target.list should be stored in the returned object. They are kept when set to TRUE. Default is FALSE.

…

Additional arguments that can be passed to the functions Ipfp and ObtainModelEstimates. See their respective documentation for more details.
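
The pairing between target.list and target.data can be illustrated with base R alone. In this sketch the array tab and its values are hypothetical; only the way each margin lines up with its entry in target.list is the point.

```r
# Hypothetical 2 x 3 x 2 table standing in for known population counts.
tab <- array(1:12, dim = c(2, 3, 2))

# Each entry names the dimensions kept by the corresponding margin.
target.list <- list(1, c(2, 3))

# Margins listed in the same order as target.list.
target.data <- list(apply(tab, 1, sum),
                    apply(tab, c(2, 3), sum))

length(target.data[[1]])  # 2
dim(target.data[[2]])     # 3 2
```

Both margins are computed from the same table, so they share the same grand total, which is what makes them mutually consistent targets.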

Value

An object of class mipfp, which is a list containing at least the following components:

x.hat

An array with the same dimensions as seed whose margins match those specified in target.list.

p.hat

An array with the same dimensions as x.hat containing the updated cell probabilities, i.e. x.hat / sum(x.hat).

error.margins

A list giving, for each margin, the maximum absolute deviation between the desired and generated margin.

conv

A boolean indicating whether the algorithm converged to a solution.

evol.stp.crit

The evolution of the stopping criterion over the iterations (if the selected method is "ipfp").

solnp.res

The object returned by the solver (if the selected method is not "ipfp"). In that case the estimation process uses the solnp optimisation function from the R package Rsolnp.

method

The selected method for estimation.

call

The matched call.

The following components are also added if keep.input has been set to TRUE: seed, target.data, target.list.
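
The definitions of p.hat and error.margins given above can be reproduced by hand. The sketch below uses base R only; the fitted table, the targets and the helper names (dims, targets, err) are hypothetical and are not produced by Estimate.

```r
# Hypothetical fitted table and target margins.
x.hat   <- matrix(c(12, 28, 18, 42), nrow = 2)
dims    <- list(1, 2)                    # rows, then columns
targets <- list(c(30, 70), c(41, 59))    # desired row and column totals

# Cell probabilities, following the definition x.hat / sum(x.hat).
p.hat <- x.hat / sum(x.hat)

# Maximum absolute deviation between each generated and desired margin.
err <- mapply(function(d, tgt) max(abs(apply(x.hat, d, sum) - tgt)),
              dims, targets)
err  # 0 1
```

Here the row totals are met exactly, while the column totals are each off by one, which is what the second entry of err reports.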

References

Bacharach, M. (1965). Estimating Nonnegative Matrices from Marginal Data. International Economic Review (Blackwell Publishing) 6 (3): 294-310.

Bishop, Y. M. M., Fienberg, S. E., Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press. ISBN 978-0-262-02113-5.

Deming, W. E., Stephan, F. F. (1940). On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known. Annals of Mathematical Statistics 11 (4): 427-444.

Fienberg, S. E. (1970). An Iterative Procedure for Estimation in Contingency Tables. Annals of Mathematical Statistics 41 (3): 907-917.

Little, R. J., Wu, M. M. (1991). Models for contingency tables with known margins when target and sampled populations differ. Journal of the American Statistical Association 86 (413): 87-95.

Lang, J. B. (2004). Multinomial-Poisson homogeneous models for contingency tables. Annals of Statistics 32 (1): 340-383.

Stephan, F. F. (1942). Iterative method of adjusting frequency tables when expected margins are known. Annals of Mathematical Statistics 13 (2): 166-178.

Examples

# NOT RUN {
# loading the package and the data
library(mipfp)
data(spnamur, package = "mipfp")
# subsetting the data frame, keeping only the first 3 variables
spnamur.sub <- subset(spnamur, select = Household.type:Prof.status)
# true table
true.table <- table(spnamur.sub)
# extracting the margins
tgt.v1        <- apply(true.table, 1, sum)
tgt.v1.v2     <- apply(true.table, c(1,2), sum)
tgt.v2.v3     <- apply(true.table, c(2,3), sum)
tgt.list.dims <- list(1, c(1,2), c(2,3))
tgt.data      <- list(tgt.v1, tgt.v1.v2, tgt.v2.v3)
# creating the seed, a 10 pct random sample of the rows of spnamur
n.rows <- nrow(spnamur.sub)
seed.df <- spnamur.sub[sample(n.rows, round(0.10 * n.rows)), ]
seed.table <- table(seed.df)
# applying one fitting method (ipfp, the default)
r.ipfp <- Estimate(seed = seed.table, target.list = tgt.list.dims,
                   target.data = tgt.data)
print(r.ipfp)
# }
