pattnpml.fit: NPML estimation for paired comparison models

Description

Fits a mixture model to overdispersed paired comparison data using non-parametric maximum likelihood (Aitkin, 1996a).

Usage

pattnpml.fit(formula, random = ~1, k = 1, design,
        tol = 0.5, startp = NULL, EMmaxit = 500, EMdev.change = 0.001,
        seed = NULL, pr.it = FALSE)

Value

The function produces an object of class pattNPML. The object contains the following 29 components:

coefficients: a named vector of coefficients (including the mass points). In case of Gaussian quadrature, the coefficient given at z corresponds to the standard deviation of the mixing distribution.
residuals: the difference between the true response and the empirical Bayes predictions.
fitted.values: the empirical Bayes predictions (Aitkin, 1996b) on the scale of the responses.
family: the `family' object used.
linear.predictors: the extended linear predictors $\hat{\eta}_{ik}$.
disparity: the disparity (-2logL) of the fitted mixture regression model.
deviance: the deviance of the fitted mixture regression model.
null.deviance: The deviance for the null model (just containing an intercept), comparable with ‘deviance.’
df.residual: the residual degrees of freedom of the fitted model (including the random part).
df.null: the residual degrees of freedom for the null model.
y: the (extended) response vector.
call: the matched call.
formula: the formula supplied.
random: the random term of the model formula.
data: the data argument.
model: the (extended) design matrix.
weights: the case weights initially supplied.
offset: the offset initially supplied.
mass.points: the fitted mass points.
masses: the mass point probabilities corresponding to the patterns.
sdev: a list of the two elements sdev$sdev and sdev$sdevk. The former is the estimated standard deviation of the Gaussian mixture components (estimated over all mixture components), and the latter gives the unequal or smooth component-specific standard deviations. All values are equal if lambda = 0.
shape: a list of the two elements shape$shape and shape$shapek, to be interpreted in analogy to sdev.
rsdev: estimated random effect standard deviation.
post.prob: a matrix of posteriori probabilities.
post.int: a vector of `posteriori intercepts' (as in Sofroniou et al. (2006)).
ebp: the empirical Bayes Predictions on the scale of the linear predictor. For compatibility with older versions.
EMiter: gives the number of iterations of the EM algorithm.
EMconverged: logical value indicating if the EM algorithm converged.
lastglm: the fitted glm object from the last EM iteration.
Misc: contains additional information relevant for the summary and plot functions, in particular the disparity trend and the EM trajectories.

For further details see the help file for function alldist in package npmlreg.

Arguments

formula: A formula defining the response (the count of the number of cases of each pattern) and the fixed effects (e.g. y ~ x).
random: A formula defining the random model. If there are three objects labelled o1, o2, o3, set random = ~o1+o2+o3 to model overdispersion. For more details, see below.
k: The number of mass points (latent classes). Up to 21 mass points are supported.
design: The design data frame for paired comparison data as generated using patt.design (mandatory, even if it is attached to the workspace!).
tol: The tol scalar (usually, $0 < $tol $\le 1$). This scalar sets the scaling factor for the locations of the initial mass points. A larger value means that the starting point locations are more widely spread.
startp: Optional numerical vector of length k specifying the starting probabilities for the mass points to initialise the EM algorithm. The default is to take Gaussian quadrature probabilities.
EMmaxit: The maximum number of EM iterations.
EMdev.change: Stops EM algorithm when deviance change falls below this value.
seed: Seed for random weights. If NULL, the seed is set using the system time.
pr.it: A dot is printed at each iteration cycle of the EM algorithm if set to TRUE.

Author

Originally translated from the GLIM 4 functions alldist and allvc (Aitkin & Francis, 1995) to R by Ross Darnell (2002). Modified, extended, and prepared for publication by Jochen Einbeck and John Hinde (2006). Adapted for paired comparison modelling by Reinhold Hatzinger and Brian Francis (2009).

Details

The function pattnpml.fit is a wrapper function for alldistPC which in turn is a modified version of the function alldist from the npmlreg package.

The non-parametric maximum likelihood (NPML) approach was introduced in Aitkin (1996) as a tool to fit overdispersed generalised linear models. The idea is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of exponential family densities, leading to a simple expression of the marginal likelihood which can then be maximised using a standard EM algorithm.

This function extends the NPML approach to allow fitting of overdispersed paired comparison models. It assumes that overdispersion arises because of dependence in the patterns. Fitting a non-parametric random effects term is equivalent to specifying distinct latent classes of response patterns.

The number of components k of the finite mixture has to be specified beforehand.

The EM algorithm used by the function takes the Gauss-Hermite masses and mass points as starting points. The position of the starting points can be concentrated or extended by setting tol smaller or larger, respectively; the initial mass point probabilities of the starting points can also be specified through startp.

Fitting models for overdispersion can be achieved by specifying the paired comparison items as additive terms in the random part of the model formula. A separate estimate for each item and for each mass point is produced.

Fitting subject covariate models with the same effect for each mass point component is achieved by specifying as part of the formula a) a subject factor giving a different estimate for each covariate combination b) an interaction of the chosen subject covariates with the objects. For models with subject factor covariates only, the first term is simply the interaction of all of the factor covariates.

Fitting subject covariate models with a different effect for each mass point component (sometimes called random coefficient models, see Aitkin, Francis, Hinde and Darnell, 2009, pp. 497) is possible by specifying an interaction of the subject covariates with the items in the random term, and also in the formula part. Thus the setting random = ~x:(o1+o2+o3 gives a model with a set of random slopes (one set for each mass point) and a set of random intercepts, one set for each mass point.

The AIC and BIC functions from the stats-package can be used.

References

Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6(3), 251--262. tools:::Rd_expr_doi("10.1007/BF00140869")

Aitkin, M., Francis, B., Hinde, J., & Darnell, R. (2009). Statistical Modelling in R. Oxford: Oxford University Press.

Einbeck, J., & Hinde, J. (2006). A Note on NPML Estimation for Exponential Family Regression Models with Unspecified Dispersion Parameter. Austrian Journal of Statistics, 35(2&3), 233--243.

Sofroniou, N., Einbeck, J., & Hinde, J. (2006). Analyzing Irish suicide rates with mixture models. Proceedings of the 21st International Workshop on Statistical Modelling in Galway, Ireland, 2006.

Examples

Run this code

# two latent classes for paired comparison data
dfr   <- patt.design(dat4, 4)
modPC <- pattnpml.fit(y ~ 1, random = ~o1 + o2 + o3, k = 2, design = dfr)
modPC

# estimated proportion of cases in each mixture component
apply(modPC$post.prob, 2, function(x){ sum(x * dfr$y / sum(dfr$y)) })

if (FALSE) {
# fitting a model for two latent classes and fixed categorical subject
# covariates to the Eurobarometer 55.2 data (see help("euro55.2.des"))
# on rankings of sources of information on scientific developments

model2cl <- pattnpml.fit(
  y ~ SEX:AGE4 + (SEX + AGE4):(TV + RAD + NEWSP + SCIMAG + WWW + EDINST) - 1,
  random = ~ TV + RAD + NEWSP + SCIMAG + WWW + EDINST,
  k = 2, design = euro55.2.des, pr.it = TRUE)
summary(model2cl)
BIC(model2cl)}

Run the code above in your browser using DataLab