flexCWM (version 1.4)

cwm: Fit for the CWM

Description

Maximum likelihood fitting of the cluster-weighted model by the EM algorithm.

Usage

cwm(formulaY = NULL, familyY = gaussian, data, Xnorm = NULL, Xbin = NULL,
  Xpois = NULL, Xmult = NULL, modelXnorm = NULL, Xbtrials = NULL, k = 1:3, 
  initialization = c("random.soft", "random.hard", "kmeans", "mclust", "manual"), 
  start.z = NULL, seed = NULL, maxR = 1, iter.max = 1000, threshold = 1.0e-04, 
  eps = 1e-100, parallel = FALSE)

Arguments

formulaY
an optional object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
familyY
a description of the error distribution and link function to be used for the conditional distribution of $Y$ in each mixture component. This can be a character string naming a family function
data
an optional data.frame, list, or environment with the variables
Xnorm, Xbin, Xpois, Xmult
optional matrices containing the covariates to be used for marginalization, assumed to follow normal, binomial, Poisson and multinomial distributions, respectively.
modelXnorm
an optional vector of character strings indicating the parsimonious models to be fitted for variables in Xnorm. The default is c("E", "V") for a single continuous covariate, and c("EII", "VII", "EEI", "VEI", "EVI", "VVI",
Xbtrials
an optional vector containing the number of trials for each column in Xbin. If omitted, the maximum of each column in Xbin is used.
k
an optional vector containing the numbers of mixture components to be tried. Default value is 1:3.
initialization
an optional character string. It sets the initialization strategy for the EM-algorithm. It can be:
  • "random.soft"
  • "random.hard"
  • "kmeans"
  • "mclust"
  • "manual"
start.z
matrix of soft or hard classification: it is used only if initialization = "manual".
seed
an optional scalar. It sets the seed for the random number generator, when random initializations are used; if NULL, current seed is not changed. Default value is NULL.
maxR
number of initializations to be tried. Default value is 1.
iter.max
an optional scalar. It sets the maximum number of iterations in the EM-algorithm. Default value is 1000.
threshold
an optional scalar. It sets the threshold for the Aitken acceleration procedure. Default value is 1.0e-04.
eps
an optional scalar. It sets the smallest value for eigenvalues of covariance matrices for Xnorm. Default value is 1e-100.
parallel
When TRUE, the package parallel is used for parallel computation. When several models are estimated, computational time is reduced. The number of cores to use may be s

Value

This function returns a class cwm object, which is a list of values related to the model selected. It contains:

  • call: an object of class call.
  • formulaY: an object of class formula containing a symbolic description of the model fitted.
  • familyY: the distribution used for the conditional distribution of $Y$ in each mixture component.
  • data: a data.frame with the variables needed to use formulaY.
  • concomitant: a list containing Xnorm, Xbin, Xpois, Xmult.
  • Xbtrials: number of trials used for Xbin.
  • models: a list; each element is related to one of the models fitted. Each element is a list and contains:
    • posterior: posterior probabilities.
    • iter: number of iterations performed in EM algorithm.
    • k: number of (fitted) mixture components.
    • size: estimated size of the groups.
    • cluster: classification vector.
    • loglik: final log-likelihood value.
    • df: overall number of estimated parameters.
    • prior: weights for the mixture components.
    • IC: list containing values of the information criteria.
    • converged: logical; TRUE if EM algorithm converged.
    • GLModels: a list; each element is related to a mixture component and contains:
      • model: a "glm" class object.
      • sigma: estimated local scale parameters of the conditional distribution of $Y$, when familyY is gaussian or student.t.
      • t_df: estimated degrees of freedom of the t distribution, when familyY is student.t.
      • nuY: estimated shape parameter, when familyY is Gamma. The gamma distribution is parameterized according to McCullagh & Nelder (1989, p. 30).
    • concomitant: a list with estimated concomitant variables parameters for each mixture component:
      • normal.mu
      • normal.Sigma
      • normal.model
      • multinomial.probs
      • poisson.lambda
      • binomial.p
      • normal.d, multinomial.d, poisson.d, binomial.d

Details

When familyY = binomial, the response variable must be a matrix with two columns, where the first column is the number of "successes" and the second column is the number of "failures".

When several models have been estimated, the summary and print methods consider the best model according to the information criterion in criterion, among the estimated models having a number of components among those in k, an error distribution among those in familyY, and a parsimonious model among those in modelXnorm.
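The two-column response required for familyY = binomial can be sketched as follows. The data are simulated, and the names toy, trials, successes and failures are hypothetical. The cwm call assumes that the response may be written with cbind() inside formulaY, as in glm(), and that "BIC" is an accepted value of criterion; it runs only if flexCWM is installed.

```r
# Simulated toy data; all object names here are illustrative assumptions.
set.seed(1)
n <- 100
x <- rnorm(n)
trials <- rep(10, n)
successes <- rbinom(n, size = trials, prob = plogis(0.5 + x))
toy <- data.frame(x = x, successes = successes,
                  failures = trials - successes)

# Two-column response: successes first, failures second.
y <- with(toy, cbind(successes, failures))
stopifnot(ncol(y) == 2, all(rowSums(y) == trials))

# Assumed usage: the same cbind() response inside formulaY.
if (requireNamespace("flexCWM", quietly = TRUE)) {
  fit <- flexCWM::cwm(cbind(successes, failures) ~ x, familyY = binomial,
                      data = toy, Xnorm = cbind(x = toy$x), k = 1:2,
                      initialization = "kmeans", seed = 1)
  # With several values of k fitted, summary reports the best model;
  # "BIC" is an assumed value for the criterion argument.
  summary(fit, criterion = "BIC")
}
```

Each row of y sums to the number of trials for that observation, which is a quick check that the two columns are in the right order.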

References

Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions. Journal of Classification, 29(3), 363-401.
Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.
Ingrassia, S., Punzo, A., and Vittadini, G. (2015). The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification, 32(forthcoming).
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, 2nd edition.
Punzo, A. (2014). Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model. Statistical Modelling, 14(3), 257-291.

See Also

flexCWM-package

Examples

## an example with artificial data
library(flexCWM)
data("ExCWM")
attach(ExCWM)
str(ExCWM)

# mixtures of binomial distributions
resXbin <- cwm(Xbin = Xbin, k = 1:2, initialization = "kmeans")
getParXbin(resXbin)

# Mixtures of Poisson distributions
resXpois <- cwm(Xpois = Xpois, k = 1:2, initialization = "kmeans")
getParXpois(resXpois)

# parsimonious mixtures of multivariate normal distributions
resXnorm <- cwm(Xnorm = cbind(Xnorm1,Xnorm2), k = 1:2, initialization = "kmeans")
getParXnorm(resXnorm)

## an example with real data
data("students")
attach(students)
str(students)
# CWM
fit2 <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F , Xnorm = cbind(HEIGHT, HEIGHT.F), 
  k = 2, initialization = "kmeans", modelXnorm = "EEE")
summary(fit2, concomitant = TRUE)
plot(fit2)
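
The examples above use "kmeans" initialization. As a rough sketch of initialization = "manual", the block below builds a hard classification matrix for start.z from k-means labels. The names toy, labels and z_hard are hypothetical, and the n-by-k layout (one row per observation, one column per component) is an assumption about the shape start.z expects. The cwm call runs only if flexCWM is installed.

```r
# Simulated toy data with two groups; all names are illustrative.
set.seed(42)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
y <- c(1 + 2 * x[1:50], 10 - x[51:100]) + rnorm(100, sd = 0.5)
toy <- data.frame(x = x, y = y)
k <- 2

# Hard classification: a single 1 per row marks the assigned component.
labels <- kmeans(toy, centers = k)$cluster
z_hard <- outer(labels, seq_len(k), "==") * 1

stopifnot(nrow(z_hard) == nrow(toy), ncol(z_hard) == k,
          all(rowSums(z_hard) == 1))

# Assumed usage: start.z is read only when initialization = "manual".
if (requireNamespace("flexCWM", quietly = TRUE)) {
  fit <- flexCWM::cwm(y ~ x, data = toy, Xnorm = cbind(x = toy$x), k = k,
                      initialization = "manual", start.z = z_hard)
  summary(fit)
}
```

A soft classification would use the same layout, with each row holding membership probabilities that sum to 1 instead of a single 1.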
