flexCWM (version 1.0)

gcwm: Flexible Cluster-Weighted Modeling

Description

Fits the generalized linear Gaussian cluster-weighted model.

Usage

gcwm (Y, X, familyY="Gaussian", k=2, ic=c("BIC", "AIC", "ICL"), mY=1, 
  method="Nelder-Mead", initialization="random.soft", start.z=NULL, 
  iter.max=1000, threshold=1.0e-04, loglikplot=FALSE, seed=NULL)

Arguments

Y
numerical vector for the response variable.
X
matrix for the covariates.
familyY
the exponential family distribution used for Y|x in each cluster; it can be:
  • "Gaussian"
  • "Poisson"
  • "Binomial"
  • "Gamma"
Default value is "Gaussian".
k
a vector containing the numbers of clusters to be tried. The one with the lowest information criterion is selected. Default value is 2.
ic
the information criterion by which the best model is selected when length(k)>1. Possible values are:
  • "BIC"
  • "AIC"
  • "ICL"
mY
When familyY="Binomial", it sets the sample size. Default value is 1 (Bernoulli distribution).
method
optimization method used in the M-step of the EM algorithm (see optim). Default value is "Nelder-Mead".
initialization
initialization strategy for the EM-algorithm. It can be:
  • "random.soft"
  • "random.hard"
  • "manual"
Default value is "random.soft".
start.z
matrix of soft or hard classification: it is used only if initialization="manual".
iter.max
maximum number of iterations in the EM-algorithm. Default value is 1000.
threshold
threshold for Aitken acceleration procedure. Default value is 1.0e-04.
loglikplot
if TRUE, the log-likelihood values are plotted against the iterations. Default value is FALSE.
seed
the seed for the random number generator, when random initializations are used; if NULL, current seed is not changed. Default value is NULL.
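
The threshold argument refers to an Aitken acceleration stopping rule, but this page does not spell out the exact rule used inside gcwm. The sketch below shows one common textbook variant only: it estimates the asymptotic log-likelihood from the last three values and stops when that estimate is within threshold of the current value. The function name aitken_converged is hypothetical and is not part of flexCWM.

```r
# Generic Aitken-based convergence check on a log-likelihood sequence.
# Assumption: this is a textbook variant, not code taken from flexCWM.
aitken_converged <- function(loglik, threshold = 1.0e-04) {
  m <- length(loglik)
  if (m < 3) return(FALSE)               # three consecutive values are needed
  l_prev <- loglik[m - 2]
  l_curr <- loglik[m - 1]
  l_next <- loglik[m]
  denom <- l_curr - l_prev
  if (denom == 0) return(TRUE)           # log-likelihood has stopped changing
  a <- (l_next - l_curr) / denom         # Aitken acceleration factor
  if (a >= 1) return(FALSE)              # no finite asymptotic estimate yet
  l_inf <- l_curr + (l_next - l_curr) / (1 - a)  # asymptotic log-likelihood estimate
  abs(l_inf - l_next) < threshold
}

# Simulated log-likelihood trace converging geometrically towards -100
trace <- -100 - 10 * 0.5^(1:20)
early <- aitken_converged(trace[1:3])    # far from the limit
late  <- aitken_converged(trace)         # close to the limit
```

In an EM loop, a check of this kind would typically run after each iteration, with iter.max acting as the hard upper bound.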

Value

This function returns a list of values related to the model selected. It contains:
  • Y: response variable
  • X: covariates
  • familyY: exponential family distribution used for Y|x in each cluster
  • p: number of covariates
  • k: number of groups
  • n: sample size
  • npar: number of parameters
  • mY: sample size, used when familyY="Binomial"
  • prior: weights for the mixture components
  • muX: covariates means
  • VarX: covariates variances
  • PX: marginal distribution of X for each cluster
  • beta: regression coefficients
  • muY: mean of Y
  • dispY: dispersion parameter of Y
  • VarFunY: variance function of Y
  • VarY: variance of Y
  • nuY: when familyY="Gamma", the gamma distribution is parameterized according to muY and nuY (see McCullagh and Nelder, 1989)
  • PY: conditional distribution of Y|x for each cluster
  • iter.stop: number of iterations performed in EM algorithm
  • z: matrix of posterior probabilities
  • group: classification vector
  • loglik: final log-likelihood value
  • AIC
  • BIC
  • ICL
  • call: an object of class call
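
The relationship between z and group can be illustrated with a small sketch. It assumes, as is standard for mixture models but not stated on this page, that the classification vector assigns each observation to the cluster with the highest posterior probability. The matrix below is made-up toy data, not output from gcwm.

```r
# Toy posterior probability matrix: 3 observations (rows), 2 clusters (columns).
# Assumption: values are invented for illustration only.
z <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.4, 0.6), ncol = 2, byrow = TRUE)

# Maximum a posteriori classification: index of the largest entry in each row
group <- apply(z, 1, which.max)

# Each row of a posterior probability matrix should sum to 1
row_totals <- rowSums(z)
```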

References

Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local statistical modeling via the cluster-weighted approach with elliptical distributions. Journal of Classification, 29(3), 363-401.

Ingrassia, S., Minotti, S. C., Punzo, A., and Vittadini, G. (2012). Generalized linear Gaussian cluster-weighted modeling. arXiv.org e-print 1211.1171, available at: http://arxiv.org/abs/1211.1171.

McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, 2nd edition.

See Also

flexCWM-package, tourism

Examples

data(tourism)
Y <- tourism$overnights
X <- tourism$attendance
res <- gcwm(Y=Y,X=X,k=1:4,seed=1)
plot(cbind(Y,X),col=res$best$group)
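
A further sketch shows how a starting matrix for initialization="manual" might be built. It assumes that start.z is an n-by-k matrix with one row per observation and one column per cluster, which this page does not state explicitly. The data are simulated, the k-means step is just one possible way to obtain a hard partition, and the gcwm call is left as a comment because it requires the package.

```r
set.seed(1)

# Simulated data with two groups that differ in both X and the Y|X relationship
n <- 60
half <- n / 2
X <- c(rnorm(half, mean = 0), rnorm(half, mean = 6))
Y <- c(1 + 2 * X[1:half] + rnorm(half),
       20 - X[(half + 1):n] + rnorm(half))

# Hard partition from k-means (an illustrative choice, not prescribed by flexCWM)
k <- 2
km <- kmeans(cbind(X, Y), centers = k, nstart = 10)

# Hard classification matrix: a single 1 per row, in the assigned cluster's column
z0 <- matrix(0, nrow = n, ncol = k)
z0[cbind(seq_len(n), km$cluster)] <- 1

# Assumed usage, following the signature in the Usage section:
# res <- gcwm(Y = Y, X = X, k = k, initialization = "manual", start.z = z0)
```

A soft classification would use rows of probabilities summing to 1 instead of 0/1 indicators.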
