cwm(formulaY = NULL, familyY = gaussian, data, Xnorm = NULL, Xbin = NULL, Xpois = NULL, Xmult = NULL, modelXnorm = NULL, Xbtrials = NULL, k = 1:3, initialization = c("random.soft", "random.hard", "kmeans", "mclust", "manual"), start.z = NULL, seed = NULL, maxR = 1, iter.max = 1000, threshold = 1.0e-04, eps = 1e-100, parallel = FALSE)
An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
A character string naming a family function, a family function, or the result of a call to a family function.
The following family functions are supported:
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "log")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
student.t(link = "identity")
Default value is gaussian(link = "identity").
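The three accepted ways of specifying a family can be illustrated with base R alone. The sketch below mirrors the family-resolution idiom used by stats::glm; that cwm resolves familyY in exactly this way is an assumption, and resolve_family is a hypothetical helper name, not part of flexCWM.

```r
# Hypothetical helper: turn any of the three accepted forms into a family object.
resolve_family <- function(family) {
  # 1. A character string naming a family function, e.g. "poisson"
  if (is.character(family)) {
    family <- get(family, mode = "function", envir = parent.frame())
  }
  # 2. A family function, e.g. poisson
  if (is.function(family)) {
    family <- family()
  }
  # 3. The result of a call to a family function, e.g. poisson(link = "log")
  if (is.null(family$family)) {
    stop("'family' not recognized")
  }
  family
}

f1 <- resolve_family("poisson")
f2 <- resolve_family(poisson)
f3 <- resolve_family(poisson(link = "log"))

# All three describe the same error distribution and link.
c(f1$family, f2$family, f3$family)
c(f1$link, f2$link, f3$link)
```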
A data.frame, list, or environment with the variables needed to use formulaY.
Xnorm. The default is c("E", "V") for a single continuous covariate, and c("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", "VVV") for multivariate continuous covariates (see mixture::gpcm for details).
Xbin. If omitted, the maximum of each column in Xbin is used.
1:3.
"random.soft"
"random.hard"
"kmeans"
"mclust"
"manual"
Default value is "random.soft".
initialization = "manual".
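As a sketch of how a manual starting partition might be prepared, the code below turns a k-means partition into a hard classification matrix with one row per observation and one column per component. That start.z expects this n-by-k layout is an assumption here; x, km and z are illustrative names.

```r
set.seed(42)

# Illustrative data: two well-separated groups of 30 observations each.
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 5))

# Hard partition into 2 groups.
km <- kmeans(x, centers = 2)

# Indicator matrix: z[i, j] = 1 if observation i is assigned to component j.
z <- matrix(0, nrow = length(x), ncol = 2)
z[cbind(seq_along(x), km$cluster)] <- 1

head(z)
```

Under that assumption, a matrix such as z would be supplied as start.z together with initialization = "manual".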
If NULL, the current seed is not changed. Default value is NULL.
Xnorm. Default value is 1e-100.
If TRUE, the package parallel is used for parallel computation, which reduces computation time when several models are estimated. The number of cores to use may be set with the global option cl.cores; by default it is detected using detectCores().
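A minimal sketch of setting the number of cores before a parallel fit, using only the option name and the detectCores() function mentioned above. The cap of 2 cores is an arbitrary illustrative choice, and the cwm call is left as a comment because it needs the example data to be loaded and attached.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Use at most 2 cores here (arbitrary illustrative cap).
options(cl.cores = min(2L, n_cores))
getOption("cl.cores")

# With the option set, a parallel fit would take the form of the calls in the
# examples on this page, with parallel = TRUE added:
# resXpois <- cwm(Xpois = Xpois, k = 1:2, initialization = "kmeans", parallel = TRUE)
```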
A cwm object, which is a list of values related to the model selected. It contains:
call.
formula containing a symbolic description of the model fitted.
data.frame with the variables needed to use formulaY.
Xnorm, Xbin, Xpois, Xmult.
Xbtrials
Xbin.
posterior posterior probabilities
iter number of iterations performed in EM algorithm
k number of (fitted) mixture components.
size estimated size of the groups.
cluster classification vector
loglik final log-likelihood value
df overall number of estimated parameters
prior weights for the mixture components
IC list containing values of the information criteria
converged logical; TRUE if EM algorithm converged
GLModels a list; each element is related to a mixture component and contains:
model a "glm" class object.
sigma estimated local scale parameters of the conditional distribution of $Y$, when familyY is gaussian or student.t
t_df estimated degrees of freedom of the t distribution, when familyY is student.t
nuY estimated shape parameter, when familyY is Gamma. The gamma distribution is parameterized according to McCullagh & Nelder (1989, p. 30)
concomitant a list with estimated concomitant variables parameters for each mixture component
normal.d, multinomial.d, poisson.d, binomial.d marginal distribution of concomitant variables
normal.mu mixture component means for Xnorm
normal.Sigma mixture component covariance matrices for Xnorm
normal.model models fitted for Xnorm
multinomial.probs multinomial distribution probabilities for Xmult
poisson.lambda lambda parameters for Xpois
binomial.p binomial probabilities for Xbin
When familyY = binomial, the response variable must be a matrix with two columns, where the first column is the number of "successes" and the second column is the number of "failures".
When several models have been estimated, the summary and print methods consider the best model according to the information criterion given in criterion, among the estimated models with a number of components among those in k, an error distribution among those in familyY, and a parsimonious model among those in modelXnorm.
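The two-column response layout can be sketched with base R. The example below builds the matrix with cbind and fits it with stats::glm, which uses the same successes/failures convention; the simulated data and variable names are illustrative only.

```r
set.seed(1)

# Illustrative data: 20 observations, 10 trials each.
trials <- rep(10, 20)
x <- seq(-2, 2, length.out = 20)
successes <- rbinom(20, size = trials, prob = plogis(0.5 * x))
failures <- trials - successes

# Response matrix: first column successes, second column failures.
y <- cbind(successes, failures)
head(y)

# stats::glm accepts the same layout for a binomial response.
fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)
```

With cwm, the analogous call would presumably place the same two-column matrix on the left-hand side of formulaY with familyY = binomial; this is inferred from the description above rather than tested here.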
Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions. Journal of Classification, 29(3), 363-401.
Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.
Ingrassia, S., Punzo, A., and Vittadini, G. (2015). The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification, 32(forthcoming).
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, 2nd edition.
Punzo, A. (2014). Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model. Statistical Modelling, 14(3), 257-291.
flexCWM-package
## an example with artificial data
data("ExCWM")
attach(ExCWM)
str(ExCWM)
# Mixtures of binomial distributions
resXbin <- cwm(Xbin = Xbin, k = 1:2, initialization = "kmeans")
getParXbin(resXbin)
# Mixtures of Poisson distributions
resXpois <- cwm(Xpois = Xpois, k = 1:2, initialization = "kmeans")
getParXpois(resXpois)
# Parsimonious mixtures of multivariate normal distributions
resXnorm <- cwm(Xnorm = cbind(Xnorm1,Xnorm2), k = 1:2, initialization = "kmeans")
getParXnorm(resXnorm)
## an example with real data
data("students")
attach(students)
str(students)
# CWM
fit2 <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F , Xnorm = cbind(HEIGHT, HEIGHT.F),
k = 2, initialization = "kmeans", modelXnorm = "EEE")
summary(fit2, concomitant = TRUE)
plot(fit2)