Computes an ML-estimator for clusterwise linear regression under a
regression mixture model with Normal errors. Parameters are
proportions, regression coefficients and error variances, all
independent of the values of the independent variable, and all may
differ for different clusters. Computation is by the EM-algorithm. The
number of clusters is estimated via the Bayesian Information Criterion
(BIC). Note that package `flexmix`

has more sophisticated tools to do the
same thing and is recommended. The functions are kept in here only for
compatibility reasons.

`regmix(indep, dep, ir=1, nclust=1:7, icrit=1.e-5, minsig=1.e-6, warnings=FALSE)`regem(indep, dep, m, cln, icrit=1.e-5, minsig=1.e-6, warnings=FALSE)

indep

numerical matrix or vector. Independent variables.

dep

numerical vector. Dependent variable.

ir

positive integer. Number of iteration runs for every number of clusters.

nclust

vector of positive integers. Numbers of clusters.

icrit

positive numerical. Stopping criterion for the iterations (difference of loglikelihoods).

minsig

positive numerical. Minimum value for the variance parameters (likelihood is unbounded if variances are allowed to converge to 0).

warnings

logical. If `TRUE`

, warnings are given during the
EM iteration in case of collinear regressors, too small mixture
components and error variances
smaller than minimum. In the former two cases, the algorithm is
terminated without a result, but an optimal solution is still
computed from other algorithm runs (if there are others). In the
latter case, the corresponding variance is set to the minimum.

cln

positive integer. (Single) number of clusters.

m

matrix of positive numericals. Number of columns must be
`cln`

. Number of rows must be number of data points. Columns
must add up to 1. Initial configuration for the EM iteration in
terms of a probabilty vector for every point which gives its degree
of membership to every cluster. As generated by `randcmatrix`

.

`regmix`

returns a list
containing the components ```
clnopt, loglik, bic,
coef, var, eps, z, g
```

.

`regem`

returns a list
containing the components ```
loglik,
coef, var, z, g, warn
```

.

optimal number of clusters according to the BIC.

loglikelihood for the optimal model.

vector of BIC values for all numbers of clusters in `nclust`

.

matrix of regression coefficients. First row: intercept parameter. Second row: parameter of first independent variable and so on. Columns corresponding to clusters.

vector of error variance estimators for the clusters.

vector of cluster proportion estimators.

matrix of estimated a posteriori probabilities of the points
(rows) to be generated by the clusters (columns). Compare input
argument `m`

.

integer vector of estimated cluster numbers for the points
(via argmax over `z`

).

logical. `TRUE`

if one of the estimated clusters has
too few points and/or collinear regressors.

The result of the EM iteration depends on the initial configuration,
which is generated randomly by `randcmatrix`

for `regmix`

.
`regmix`

calls `regem`

. To provide the initial configuration
manually, use parameter `m`

of
`regem`

directly. Take a look at the example about how to
generate `m`

if you want to specify initial parameters.

The original paper DeSarbo and Cron (1988) suggests the AIC for
estimating the number of clusters. The use of the BIC is advocated by
Wedel and DeSarbo (1995). The BIC is defined here as
`2*loglik - log(n)*((p+3)*cln-1)`

, `p`

being the number of
independent variables, i.e., the larger the better.

See the entry for the input parameter `warnings`

for the
treatment of several numerical problems.

DeSarbo, W. S. and Cron, W. L. (1988) A maximum likelihood methodology
for clusterwise linear regression, *Journal of
Classification* 5, 249-282.

Wedel, M. and DeSarbo, W. S. (1995) A mixture likelihood approach for
generalized linear models, *Journal of
Classification* 12, 21-56.

Regression mixtures can also (and probably better) be computed with the
flexmix package, see `flexmix`

. (When I first write
the `regmix`

-function, `flexmix`

didn't exist.)

`fixreg`

for fixed point clusters for clusterwise linear regression.

`EMclust`

for Normal mixture model fitting (non-regression).

# NOT RUN { # This apparently gives slightly different # but data-analytically fine results # on some versions of R. set.seed(12234) data(tonedata) attach(tonedata) rmt1 <- regmix(stretchratio,tuned,nclust=1:2) # nclust=1:2 makes the example fast; # a more serious application would rather use the default. rmt1$g round(rmt1$bic,digits=2) # start with initial parameter values cln <- 3 n <- 150 initcoef <- cbind(c(2,0),c(0,1),c(0,2.5)) initvar <- c(0.001,0.0001,0.5) initeps <- c(0.4,0.3,0.3) # computation of m from initial parameters m <- matrix(nrow=n, ncol=cln) stm <- numeric(0) for (i in 1:cln) for (j in 1:n){ m[j,i] <- initeps[i]*dnorm(tuned[j],mean=initcoef[1,i]+ initcoef[2,i]*stretchratio[j], sd=sqrt(initvar[i])) } for (j in 1:n){ stm[j] <- sum(m[j,]) for (i in 1:cln) m[j,i] <- m[j,i]/stm[j] } rmt2 <- regem(stretchratio, tuned, m, cln) # }