regmix: Mixture Model ML for Clusterwise Linear Regression

Description

Computes an ML-estimator for clusterwise linear regression under a regression mixture model with Normal errors. Parameters are proportions, regression coefficients and error variances, all independent of the values of the independent variable, and all may differ for different clusters. Computation is by the EM algorithm. The number of clusters is estimated via the Bayesian Information Criterion (BIC). Note that the package flexmix offers more sophisticated tools for the same task and is recommended; these functions are kept here only for compatibility reasons.
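
The EM iteration alternates between computing membership probabilities (E-step) and refitting proportions, coefficients and variances (M-step). The following is a minimal sketch of one such iteration in base R, assuming Normal errors and a design with an intercept. em_step is a hypothetical helper written for illustration; it is not the code used inside regem.

```r
# Hypothetical helper (not part of fpc): one EM iteration for a mixture of
# Normal linear regressions, starting from an n x K membership matrix z.
em_step <- function(x, y, z, minsig = 1e-6) {
  X <- cbind(1, as.matrix(x))              # design matrix with intercept column
  K <- ncol(z)
  coef <- matrix(0, nrow = ncol(X), ncol = K)
  sigma2 <- numeric(K)
  eps <- colMeans(z)                       # M-step: cluster proportions
  for (k in seq_len(K)) {
    w <- z[, k]
    fit <- lm.wfit(X, y, w)                # M-step: weighted least squares
    coef[, k] <- fit$coefficients
    res <- y - drop(X %*% coef[, k])
    # Weighted residual variance, bounded below by minsig
    sigma2[k] <- max(sum(w * res^2) / sum(w), minsig)
  }
  # E-step: eps_k times the Normal density of y under cluster k's regression
  dens <- sapply(seq_len(K), function(k)
    eps[k] * dnorm(y, mean = drop(X %*% coef[, k]), sd = sqrt(sigma2[k])))
  list(coef = coef, var = sigma2, eps = eps,
       z = dens / rowSums(dens),           # posterior membership probabilities
       loglik = sum(log(rowSums(dens))))   # loglikelihood at the new parameters
}

# Simulated two-cluster data, random initial memberships, 20 iterations.
set.seed(1)
n <- 200
x <- runif(n)
grp <- rbinom(n, 1, 0.5)                   # hidden cluster labels
y <- ifelse(grp == 1, 1 + 2 * x, 3 - x) + rnorm(n, sd = 0.1)
z <- matrix(runif(n * 2), nrow = n)
z <- z / rowSums(z)
ll <- numeric(20)
for (iter in 1:20) {
  s <- em_step(x, y, z)
  z <- s$z
  ll[iter] <- s$loglik
}
```

Because every M-step here is an exact weighted maximization (as long as the minsig bound is not active), the loglikelihood sequence ll should not decrease from one iteration to the next.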

Usage

regmix(indep, dep, ir=1, nclust=1:7, icrit=1.e-5, minsig=1.e-6, warnings=FALSE)
regem(indep, dep, m, cln, icrit=1.e-5, minsig=1.e-6, warnings=FALSE)

Arguments

indep

numerical matrix or vector. Independent variables.

dep

numerical vector. Dependent variable.

ir

positive integer. Number of EM runs, each from a random initial configuration, for every number of clusters.

nclust

vector of positive integers. Numbers of clusters.

icrit

positive numerical. Stopping criterion for the iterations (difference of loglikelihoods).

minsig

positive numerical. Minimum value for the variance parameters (likelihood is unbounded if variances are allowed to converge to 0).

warnings

logical. If TRUE, warnings are given during the EM iteration in case of collinear regressors, mixture components that are too small, and error variances smaller than the minimum. In the first two cases, the algorithm is terminated without

cln

positive integer. (Single) number of clusters.

m

matrix of positive numericals. Number of columns must be cln. Number of rows must be the number of data points. Rows must add up to 1. Initial configuration for the EM iteration in terms of a probability vector for every point, giving its membership probabilities for the clusters.
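
The constraints on m can be checked before calling regem. This is a minimal sketch: check_membership is a hypothetical helper, not an fpc function, and the tolerance is an arbitrary choice.

```r
# Hypothetical helper (not part of fpc): check that m is a valid initial
# configuration for regem with n data points and cln clusters.
check_membership <- function(m, n, cln, tol = 1e-8) {
  is.matrix(m) && is.numeric(m) &&
    nrow(m) == n && ncol(m) == cln &&    # one row per point, one column per cluster
    all(m > 0) &&                        # entries positive, as documented above
    all(abs(rowSums(m) - 1) < tol)       # each point's probabilities sum to 1
}

m_ok  <- matrix(c(0.2, 0.5, 0.8, 0.5), nrow = 2)  # rows (0.2, 0.8) and (0.5, 0.5)
m_bad <- matrix(c(0.2, 0.5, 0.2, 0.5), nrow = 2)  # rows sum to 0.4 and 1.0
check_membership(m_ok, n = 2, cln = 2)   # TRUE
check_membership(m_bad, n = 2, cln = 2)  # FALSE
```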

Value

regmix returns a list containing the components clnopt, loglik, bic, coef, var, eps, z, g.
regem returns a list containing the components loglik, coef, var, z, g, warn.

clnopt

optimal number of clusters according to the BIC.

loglik

loglikelihood for the optimal model.

bic

vector of BIC values for all numbers of clusters in nclust.

coef

matrix of regression coefficients. First row: intercept parameter. Second row: parameter of first independent variable and so on. Columns corresponding to clusters.

var

vector of error variance estimators for the clusters.

eps

vector of cluster proportion estimators.

z

matrix of estimated a posteriori probabilities of the points (rows) to be generated by the clusters (columns). Compare input argument m.

g

integer vector of estimated cluster numbers for the points (via argmax over z).

warn

logical. TRUE if one of the estimated clusters has too few points and/or collinear regressors.
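
The component g is described as the argmax over z. In base R this corresponds to taking, for each row of z, the column with the largest value. A minimal sketch with a made-up 3 x 2 matrix of posterior probabilities; which.max resolves ties to the first maximum, which is a choice of this sketch and not necessarily what regem does.

```r
# Made-up posterior probabilities: rows are points, columns are clusters.
z <- rbind(c(0.9, 0.1),
           c(0.3, 0.7),
           c(0.6, 0.4))
g <- apply(z, 1, which.max)  # cluster with the largest posterior probability per point
g  # 1 2 1
```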

Details

The result of the EM iteration depends on the initial configuration, which regmix generates randomly with randcmatrix. regmix calls regem. To provide the initial configuration manually, call regem directly with the parameter m. The example shows how to compute m from given initial parameter values.
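
When calling regem directly without prior parameter values, one possible random soft configuration can be built by normalizing uniform draws row by row. random_membership is a hypothetical helper, and this is not claimed to reproduce what randcmatrix does.

```r
# Hypothetical helper (not part of fpc): random n x cln membership matrix
# whose rows are probability vectors.
random_membership <- function(n, cln) {
  u <- matrix(runif(n * cln), nrow = n, ncol = cln)
  u / rowSums(u)  # divide each row by its sum so that rows add up to 1
}

set.seed(1)
m0 <- random_membership(150, 3)
range(rowSums(m0))  # both values equal 1 up to rounding
```

With fpc loaded and the tonedata variables from the example attached (150 points), m0 could then be passed as regem(stretchratio, tuned, m0, 3).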

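The original paper DeSarbo and Cron (1988) suggests the AIC for estimating the number of clusters. The use of the BIC is advocated by Wedel and DeSarbo (1995). The BIC is defined here as 2*loglik - log(n)*((p+3)*cln-1), where n is the number of data points and p the number of independent variables. The term (p+3)*cln-1 counts the free parameters: per cluster, p+1 regression coefficients, one error variance and one proportion, minus one because the proportions sum to 1. With this sign convention, larger BIC values are better.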
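
The BIC formula can be written as a small function. A minimal sketch: regmix_bic is a hypothetical name, and the loglikelihood value in the usage line is made up purely to show the arithmetic.

```r
# Hypothetical helper (not part of fpc): BIC as defined above (larger is better).
# Per cluster: p + 1 regression coefficients, 1 error variance, 1 proportion;
# one parameter is subtracted because the proportions sum to 1.
regmix_bic <- function(loglik, n, p, cln) {
  npar <- (p + 3) * cln - 1
  2 * loglik - log(n) * npar
}

# Made-up input: loglik = 100, n = 150 points, p = 1 independent variable,
# cln = 2 clusters, hence 7 free parameters.
regmix_bic(100, n = 150, p = 1, cln = 2)  # 200 - 7 * log(150), about 164.93
```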

See the entry for the input parameter warnings for the treatment of several numerical problems.

References

DeSarbo, W. S. and Cron, W. L. (1988) A maximum likelihood methodology for clusterwise linear regression, Journal of Classification 5, 249-282.

Wedel, M. and DeSarbo, W. S. (1995) A mixture likelihood approach for generalized linear models, Journal of Classification 12, 21-56.

Examples

library(fpc)
set.seed(12234)
data(tonedata)
attach(tonedata)
rmt1 <- regmix(stretchratio, tuned, nclust=1:2)
# nclust=1:2 makes the example fast;
# a more serious application would use the default.
rmt1$g
rmt1$bic
# Start with initial parameter values instead of a random configuration.
cln <- 3
n <- length(tuned)                          # number of data points
initcoef <- cbind(c(2,0), c(0,1), c(0,2.5)) # column i: intercept and slope of cluster i
initvar <- c(0.001, 0.0001, 0.5)            # initial error variances
initeps <- c(0.4, 0.3, 0.3)                 # initial cluster proportions
# Computation of m from the initial parameters:
# m[j,i] is proportional to initeps[i] times the Normal density of tuned[j]
# around the regression line of cluster i.
m <- matrix(nrow=n, ncol=cln)
for (i in 1:cln)
  m[,i] <- initeps[i]*dnorm(tuned, mean=initcoef[1,i]+initcoef[2,i]*stretchratio,
                            sd=sqrt(initvar[i]))
m <- m/rowSums(m)                           # normalize each row to sum to 1
rmt2 <- regem(stretchratio, tuned, m, cln)
detach(tonedata)
