gamlssNP: A function to fit finite mixtures using the gamlss family of distributions

Description

This function will fit a finite (or normal) mixture distribution where the kernel distribution can belong to any gamlss family of distributions using the EM algorithm. The function is based on functions alldist() and allvc of the npmlreg package of Jochen Einbeck, John Hinde and Ross Darnell.

Usage

gamlssNP(formula, random = ~1, family = NO(), data = NULL, K = 4, 
          mixture = c("np", "gq"), 
          tol = 0.5, weights, pluginz, control = NP.control(...), 
          g.control = gamlss.control(trace = FALSE), ...)

Arguments

Value

The function gamlssNP produces an object of class "gamlssNP". This object contain several components.familythe name of the gamlss familytypethe type of distribution which in this case is "Mixture"parametersthe parameters for the kernel gamlss family distributioncallthe call of the gamlssNP functionythe response variablebdthe binomial demominator, only for BI and BB modelscontrolthe NP.control settingsweightsthe vector of weights of te expanded fitG.deviancethe global devianceNthe number of observations in the fitrqresa function to calculate the normalized (randomized) quantile residuals of the object (here is the gamlss object rather than gamlssNP and it should change??)iterthe number of external iterations in the last gamlss fitting (?? do we need this?)typethe type of the distribution or the response variable here set to "Mixture"methodwhich algorithm is used for the gamlss fit, RS(), CG() or mixed()contraststhe type of contrasts use in the fitconvergedwhether the gamlss fit has convergedresidualsthe normalized (randomized) quantile residuals of the modelmu.fvthe fitted values of the extended mu model, also sigma.fv, nu.fv, tau.fv for the other parameters if presentmu.lpthe linear predictor of the extended mu model, also sigma.lp, nu.lp, tau.lp for the other parameters if presentmu.wvthe working variable of the extended mu model, also sigma.wv, nu.wv, tau.wv for the other parameters if presentmu.wtthe working weights of the mu model, also sigma.wt, nu.wt, tau.wt for the other parameters if presentmu.linkthe link function for the mu model, also sigma.link, nu.link, tau.link for the other parameters if presentmu.termsthe terms for the mu model, also sigma.terms, nu.terms, tau.terms for the other parameters if presentmu.xthe design matrix for the mu, also sigma.x, nu.x, tau.x for the other parameters if presentmu.qrthe QR decomposition of the mu model, also sigma.qr, nu.qr, tau.qr for the other parameters if presentmu.coefficientsthe linear coefficients of the mu model, also sigma.coefficients, nu.coefficients, tau.coefficients for the other parameters if presentmu.formulathe formula for the mu model, also sigma.formula, nu.formula, tau.formula for the other parameters if presentmu.dfthe mu degrees of freedom also sigma.df, nu.df, tau.df for the other parameters if presentmu.nl.dfthe non linear degrees of freedom, also sigma.nl.df, nu.nl.df, tau.nl.df for the other parameters if presentdf.fitthe total degrees of freedom use by the modeldf.residualthe residual degrees of freedom left after the model is fitteddatathe original data setEMiterthe number of EM iterationsEMconvergedwhether the EM has convergedallresidualsthe residuas for the long fitmass.pointsthe estimates mass point (if "np" mixture is used)Kthe number of mass points usedpost.probcontains a matrix of posteriori probabilities,probthe estimated mixture probalilitiesaicthe Akaike information criterionsbcthe Bayesian information criterionformulathe formula used in the expanded fitrandomthe random effect formulapweightsprior weightsebpthe Empirical Bayes Predictions (Aitkin, 1996b) on the scale of the linear predictorNote that in case of Gaussian quadrature, the coefficient given at 'z' in coefficients corresponds to the standard deviation of the mixing distribution. As a by-product, gamlssNP produces a plot showing the global deviance against the iteration number. Further, a plot with the EM trajectories is given. The x-axis corresponds to the iteration number, and the y-axis to the value of the mass points at a particular iteration. This plot is not produced when mixture is set to "gq"

Details

The function gamlssNP() is a modification of the R functions alldist() and allvc created by Jochen Einbeck and John Hinde. Both functions were originally created by Ross Darnell (2002). Here the two functions are merged to one gamlssNP and allows finite mixture from gamlss family of distributions. The following are comments from the original Einbeck and Hinde documentation. "The nonparametric maximum likelihood (NPML) approach was introduced in Aitkin (1996) as a tool to fit overdispersed generalized linear models. Aitkin (1999) extended this method to generalized linear models with shared random effects arising through variance component or repeated measures structure. Applications are two-stage sample designs, when firstly the primary sampling units (the upper-level units, e.g. classes) and then the secondary sampling units (lower-level units, e.g. students) are selected, or longitudinal data. Models of this type have also been referred to as multi-level models (Goldstein, 2003). This R function is restricted to 2-level models. The idea of NPML is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of k exponential family densities, leading to a simple expression of the marginal likelihood, which can then be maximized using a standard EM algorithm. When option 'gq' is set, then Gauss-Hermite masses and mass points are used and considered as fixed, otherwise they serve as starting points for the EM algorithm. The position of the starting points can be concentrated or extended by setting tol smaller or larger than one, respectively. Variance component models with random coefficients (Aitkin, Hinde & Francis, 2005, p. 491) are also possible, in this case the option random.distribution is restricted to the setting 'np' . The weights have to be understood as frequency weights, i.e. setting all weights equal to 2 will duplicate each data point and hence double the disparity and deviance. Warning: There might be some options and circumstances which had not been tested and where the weights do not work." Note that in keeping with the gamlss notation disparity is called global deviance.

References

Aitkin, M. and Francis, B. (1995). Fitting overdispersed generalized linear models by nonparametric maximum likelihood. GLIM Newsletter 25 , 37-45. Aitkin, M. (1996a). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6 , 251-262. Aitkin, M. (1996b). Empirical Bayes shrinkage using posterior random effect means from nonparametric maximum likelihood estimation in general random effect models. Statistical Modelling: Proceedings of the 11th IWSM 1996 , 87-94. Aitkin, M., Francis, B. and Hinde, J. (2005) Statistical Modelling in GLIM 4. Second Edition, Oxford Statistical Science Series, Oxford, UK. Einbeck, J. & Hinde, J. (2005). A note on NPML estimation for exponential family regression models with unspecified dispersion parameter. Technical Report IRL-GLWY-2005-04, National University of Ireland, Galway. Einbeck, J. Darnell R. and Hinde J. (2006) npmlreg: Nonparametric maximum likelihood estimation for random effect models, R package version 0.34 Hinde, J. (1982). Compound Poisson regression models. Lecture Notes in Statistics 14 ,109-121. Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape, (with discussion), Appl. Statist., 54, part 3, pp 507-554. Stasinopoulos D. M., Rigby R.A. and Akantziliotou C. (2003) Instructions on how to use the GAMLSS package in R. Accompanying documentation in the current GAMLSS help files, (see also http://www.gamlss.org/).

Examples

Run this code

data(enzyme)
# equivalent model using gamlssNP
mmNP1 <- gamlssNP(act~1, data=enzyme, random=~1,family=NO, K=2)
mmNP2 <- gamlssNP(act~1, data=enzyme, random=~1, sigma.fo=~MASS, family=NO, K=2)
AIC(mmNP1, mmNP2)

Run the code above in your browser using DataLab