satMarML: Fitting Saturated Models for the Marginal Probabilities of Categorization via Maximum Likelihood under MAR and MCAR assumption

Description

satMarML fits saturated models for the marginal probabilities of categorization as well as missing at random (MAR) or missing completely at random (MCAR) models for the missingness mechanism by maximum likelihood (ML) methodology. It is based on input data of a readCatdata object. Linear, log-linear and functional linear models may be subsequently fitted, respectively, using functions linML(), loglinML() and funlinWLS().

Usage

satMarML(catdataobj, missing="MAR", method="EM", start, zero, 
maxit=100, trace=0, epsilon1=1e-6, epsilon2=1e-6, zeroN, digits)

Arguments

catdataobj

readCatdata object.

missing

the covariance matrix (based on a Fisher information matrix) of the estimates for the marginal probabilities of categorization may be computed under "MAR" (default) or under "MCAR" model.

method

the iterative processes available are: "EM" (Expectation-Maximization), "FS-MCAR" (Fisher scoring under MCAR), and "NR/FS-MAR" (Fisher scoring under MAR or Newton-Raphson under MAR or MCAR); "EM" is the default option, because it is the most stable, although in some cases, the default maximum number of iterations may not be enough due to its slow rate of convergence; as the ML estimates of the marginal probabilities are the same either under MAR or MCAR, one may use the iterative process "FS-MCAR" even though one is willing to assume MAR; "FS-MCAR" is generally more stable than "NR/FS-MAR" when there are sampling zeros, but both iterative processes still may easily jump to a negative estimate and/or generate a singular covariance matrix.

start

by default, the function uses the proportions of the complete data as starting values in the iterative process, but the current argument allows the user to inform an alternative starting value for all marginal probabilities except the one corresponding to the last category of each multinomial, i.e., a vector of dimension S*(R-1), where S represents the number of subpopulations and R, the number of response categories.

zero

when there are sampling zeros in the complete data, these frequencies are replaced by small values just for the computation of the starting values; this avoids the use of starting values on the boundary of the parameter space and also allows to incorporate information from other missingness patterns in the EM iterative process; by default, the function replaces the values by 1/(R*ns1), where ns1 is the sample size associated to the subpopulation with completely classified data; the user may indicate an alternative vector with S values to be used for each subpopulation or an unique value to be used for all subpopulations; the values must be non-negative and less or equal to 0.5.

maxit

the maximum number of iterations (the default is 100).

trace

the alternatives are: 0 for no printing (default), 1 for showing only the value of the likelihood ratio statistic at each iteration of the iterative process, and 2 for including also the parameter estimates at each iteration.

epsilon1

the convergence criterion of the iterative process is attained if the absolute difference of the values of the likelihood ratio statistic of successive iterations is less than the value defined in epsilon1, 1e-6 by default.

epsilon2

the convergence criterion of the iterative process is attained if the absolute differences of the values of estimates for all parameters of the marginal probabilities of categorization in consecutive iterations are less than the value defined in epsilon2, 1e-6 by default.

zeroN

values used to replace null frequencies in the denominator of the Neyman statistic; by default, the function replaces the values by 1/(R*nst), where nst is the sample size of the missingness pattern associated to the corresponding subpopulation; the user may indicate alternative values in a matrix with S rows and an additional column relatively to the number of columns of Rp; the first column relates to the completely categorized "missingness" patterns, and the remaining columns to the other missingness patterns as they appear in Rp; the values must be non-negative and less or equal to 0.5.

digits

integer value indicating the number of decimal places to round results when shown by print and summary; this argument works also when specified directly in both generic functions; default value is 4.

Value

theta: vector of ML estimates for all product-multinomial probabilities under the saturated model for the marginal probabilities of categorization and an assumption of an ignorable missingness mechanism; this is the same under MAR and under MCAR.
Vtheta: corresponding estimated covariance matrix based on the Fisher information matrix obtained under the assumed missingness mechanism, leading to different results depending whether the assumption is MAR or MCAR).
QvMCAR: likelihood ratio statistic for the conditional test of MCAR given a MAR assumption.
QpMCAR: Pearson statistic for the conditional test of MCAR given a MAR assumption.
QnMCAR: Neyman statistic for the conditional test of MCAR given a MAR assumption.
glMCAR: degrees of freedom for the conditional tests of MCAR given a MAR assumption.
alphast: ML estimates for the conditional probabilities of missingness under the assumed missingness mechanism (MAR or MCAR).
yst: ML estimates for the augmented frequencies under the saturated model for the marginal probabilities and the assumed missingness mechanism (MAR or MCAR).

Details

The generic functions print and summary are used to print the results and to obtain a summary thereof.

References

Paulino, C.D. e Singer, J.M. (2006). Analise de dados categorizados (in Portuguese). Sao Paulo: Edgard Blucher.

Poleto, F.Z. (2006). Analise de dados categorizados com omissao (in Portuguese). Dissertacao de mestrado. IME-USP. http://www.poleto.com/missing.html.

Poleto, F.Z., Singer, J.M. e Paulino, C.D. (2007). Analyzing categorical data with complete or missing responses using the Catdata package. Unpublished vignette. http://www.poleto.com/missing.html.

Poleto, F.Z., Singer, J.M. e Paulino, C.D. (2012). A product-multinomial framework for categorical data analysis with missing responses. To appear in Brazilian Journal of Probability and Statistics. http://imstat.org/bjps/papers/BJPS198.pdf.

Singer, J. M., Poleto, F. Z. and Paulino, C. D. (2007). Catdata: software for analysis of categorical data with complete or missing responses. Actas de la XII Reunion Cientifica del Grupo Argentino de Biometria y I Encuentro Argentino-Chileno de Biometria. http://www.poleto.com/SingerPoletoPaulino2007GAB.pdf.

Examples

Run this code

#Example 13.4 of Paulino and Singer (2006)
e134.TF<-c(12,4,5,2, 50,31, 27,12)
e134.Zp<-cbind(kronecker(diag(2),rep(1,2)),kronecker(rep(1,2),diag(2)))
e134.Rp<-c(2,2)
e134.catdata<-readCatdata(TF=e134.TF,Zp=e134.Zp,Rp=e134.Rp)
e134.satmcarml<-satMarML(e134.catdata,missing="MCAR")
e134.satmarml<-satMarML(e134.catdata,method="FS-MCAR")
e134.satmarml2<-satMarML(e134.catdata,method="NR/FS-MAR")
e134.satmcarml

#Compare the estimates of the probabilities, standard errors, 
#number of iterations and augmented frequencies
summary(e134.satmcarml)
summary(e134.satmarml)
summary(e134.satmarml2)

#Example 1 of Poleto et al (2012)
smoking.TF<-rbind(c(167,17,19,10,1,3,52,10,11, 176,24,121, 28,10,12),
				  c(120,22,19, 8,5,1,39,12,12, 103, 3, 80, 31, 8,14))

smoking.Zp <- kronecker(t(rep(1,2)),
					cbind(kronecker(diag(3),rep(1,3)),
  				    kronecker(rep(1,3),diag(3))))

smoking.Rp<-rbind(c(3,3),c(3,3))
smoking.catdata<-readCatdata(TF=smoking.TF,Zp=smoking.Zp,Rp=smoking.Rp)
smoking.catdata

smoking.satmcarml<-satMarML(smoking.catdata,missing="MCAR")
smoking.satmarml<-satMarML(smoking.catdata,method="FS-MCAR")
smoking.satmarml2<-satMarML(smoking.catdata,method="NR/FS-MAR")
smoking.satmarml
summary(smoking.satmcarml)
summary(smoking.satmarml)
summary(smoking.satmarml2)

Run the code above in your browser using DataLab