flexmixedruns fits a latent class
mixture (clustering) model where some variables are continuous
and modelled within the mixture components by Gaussian distributions
and some variables are categorical and modelled within components by
independent multinomial distributions. The fit is by maximum
likelihood estimation computed with the EM-algorithm. The number of
components can be estimated by the BIC.

Note that at least one categorical variable is needed, but it is possible to use data without continuous variables.
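The model described above can be written out as a formula. This is only a sketch: the symbols (K components, mixing proportions \pi_k, continuous part x^{(c)}, categorical variables x^{(d)}_1, ..., x^{(d)}_q, category probabilities \tau_{kj}) are notation assumed here and are not taken from the help page. The BIC is given in the sign convention of R's BIC(), where smaller values are better.

```latex
f(x) = \sum_{k=1}^{K} \pi_k \,
       \varphi\!\left(x^{(c)};\, \mu_k, \Sigma_k\right)
       \prod_{j=1}^{q} \tau_{kj}\!\left(x^{(d)}_j\right),
\qquad
\mathrm{BIC}(K) = -2 \log \hat{L}_K + p_K \log n .
```

Here \varphi is the Gaussian density with mean \mu_k and covariance matrix \Sigma_k (diagonal if diagonal=TRUE), \tau_{kj}(c) is the probability of category c for the j-th categorical variable in component k, \hat{L}_K is the maximised likelihood with K components, p_K the number of free parameters, and n the number of observations.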
Usage:

flexmixedruns(x, diagonal=TRUE, xvarsorted=TRUE,
              continuous, discrete, ppdim=NULL, initial.cluster=NULL,
              simruns=20, n.cluster=1:20, verbose=TRUE, recode=TRUE,
              allout=TRUE, control=list(minprior=0.001), silent=TRUE)

Arguments:

diagonal: If TRUE, Gaussian models are fitted
restricted to diagonal covariance matrices. Otherwise, covariance
matrices are unrestricted. TRUE is consistent with the
"within class independence" assumption for the multinomial variables.

xvarsorted: If TRUE, the continuous variables
are assumed to be the first ones, and the categorical variables to
be behind them.

continuous: If xvarsorted=TRUE, a single integer,
number of continuous variables.

discrete: If xvarsorted=TRUE, a single integer,
number of categorical variables.

ppdim: If recode=TRUE, this can be omitted and is computed
automatically.

initial.cluster: This corresponds to the cluster
parameter in flexmix and should only be specified if
simruns=1 and n.cluster is a single number.
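Either a matrix with n.cluster columns of initial cluster membership
probabilities for each observation, or a factor or integer vector with
the initial cluster assignments of the observations.

verbose: If TRUE, some information about the
different runs of the EM algorithm is given out.

recode: If TRUE, the function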
discrete.recode is applied in order to recode categorical
data so that the lcmixed-method can use it. Only set this
to FALSE if your data already has that format.

allout: If TRUE, the regular
flexmix-output is given out for every single number of
clusters, which can create a huge output object.

control: List of control parameters for flexmix, for
details see the help page of FLXcontrol-class.

silent: Passed on to the try-function. If FALSE, error messages from
failed runs of flexmix are suppressed. (The information that
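a flexmix-error occurred is still given out if verbose=TRUE.)

Value:

optsummary: Summary of the flexmix object with
optimal number of components.

optimalk: Optimal number of components.

flexout: If allout=TRUE, list of flexmix output objects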
for all numbers of components, for details see the help page of
flexmix-class. Slots that can be used
include for example cluster and components. So
if fo is the flexmixedruns-output object,
fo$flexout[[fo$optimalk]]@cluster gives a component number
vector for the observations (maximum posterior rule), and
fo$flexout[[fo$optimalk]]@components gives the estimated
model parameters, named as in lcmixed and therefore
flexmixedruns.
If allout=FALSE, only the flexmix output object for the
optimal number of components is returned, i.e., the [[fo$optimalk]]
indexing above can then be omitted.

... flexmixedruns as category
1, 2, 3 etc.

Details:

flexmixedruns tolerates failed runs of flexmix
and treats them as non-optimal runs. (Higher simruns or
different control may be required to get a valid solution.)
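The advice in parentheses can be tried out as in the following sketch, which raises simruns and passes a modified control list. The toy data mirror the example on this page; the particular values of simruns, minprior and iter.max are illustrative assumptions, not recommendations. minprior and iter.max are settings of flexmix's FLXcontrol class.

```r
library(fpc)  # provides flexmixedruns; the flexmix package must be installed

set.seed(776655)
# Toy data: two continuous and two categorical variables, continuous first
ldata <- cbind(v1 = rnorm(100), v2 = rnorm(100),
               d1 = sample(1:5, 100, replace = TRUE),
               d2 = sample(1:4, 100, replace = TRUE))

# More random starts per number of components and a modified control list
# (values chosen for illustration only)
fr2 <- flexmixedruns(ldata, continuous = 2, discrete = 2,
                     simruns = 5, n.cluster = 2:3, allout = FALSE,
                     verbose = FALSE,
                     control = list(minprior = 0.01, iter.max = 300))
print(fr2$optimalk)
```

With verbose=TRUE, the progress of the individual runs is printed, which can help to judge whether more starts are needed.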
General documentation on flexmix can be found in
Friedrich Leisch's "FlexMix: A General Framework for Finite Mixture
Models and Latent Class Regression in R".

See Also:

lcmixed, flexmix,
FLXcontrol-class,
flexmix-class,
discrete.recode.

Examples:

set.seed(776655)
v1 <- rnorm(100)
v2 <- rnorm(100)
d1 <- sample(1:5,100,replace=TRUE)
d2 <- sample(1:4,100,replace=TRUE)
ldata <- cbind(v1,v2,d1,d2)
fr <- flexmixedruns(ldata,
continuous=2,discrete=2,simruns=2,n.cluster=2:3,allout=FALSE)
print(fr$optimalk)
print(fr$optsummary)
print(fr$flexout@cluster)
print(fr$flexout@components)
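The example above uses allout=FALSE. As a complement, the following sketch uses allout=TRUE (the default) and applies the [[fo$optimalk]] indexing described in the Value section. It assumes the same toy data; the object names fo, best and cl are chosen here for illustration.

```r
library(fpc)  # provides flexmixedruns; the flexmix package must be installed

set.seed(776655)
v1 <- rnorm(100)
v2 <- rnorm(100)
d1 <- sample(1:5, 100, replace = TRUE)
d2 <- sample(1:4, 100, replace = TRUE)
ldata <- cbind(v1, v2, d1, d2)

# allout=TRUE: flexout is a list of flexmix objects, one per number of components
fo <- flexmixedruns(ldata, continuous = 2, discrete = 2,
                    simruns = 2, n.cluster = 2:3, allout = TRUE,
                    verbose = FALSE)

# Pick the fit with the optimal number of components
best <- fo$flexout[[fo$optimalk]]
cl <- best@cluster        # component number for each observation
print(table(cl))
print(best@components)    # estimated model parameters
```

Keeping all fits makes it possible to inspect solutions other than the optimal one, at the cost of a larger output object.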