Learn R Programming

mixreg (version 0.0-2)

mixreg: Fit a mixture of linear regressions.

Description

Estimates the parameters for a mixture of linear regressions, assuming Gaussian errors, using the EM algorithm.

Usage

mixreg(x, y, ncomp=2, intercept=TRUE, eq.var=FALSE,
       theta.start=NULL, itmax=1000, eps=1e-06, verb=TRUE,
       digits=7, max.try=5, data.name=NULL)

Arguments

x
A matrix of predictors for each of the regression models in the mixture. It should NOT include an initial column of 1s. If there is only one predictor, x may be a vector.
y
The vector of responses for the regression models.
ncomp
The number of components in the mixture.
intercept
Logical argument specifying whether the linear regressions should have intercepts fitted.
eq.var
Logical argument specifying whether the error variance should be the same for all components, or each component should be allowed a different error variance.
theta.start
A list giving starting values for the estimation procedure. Each component of the list is in turn a list with components beta (vector of linear coefficients), sigsq (variance) and lambda (mixing probability). If eq.var is TRUE, then it is sensible to ha
itmax
The maximum number of EM steps to be undertaken.
eps
A value specifying the convergence criterion for the EM algorithm. If the maximum absolute value of the change in the parameters is less than eps the algorithm is considered to have converged.
verb
Logical argument; if verb is TRUE then details of the progress of the algorithm are printed out at each EM step.
digits
The number of digits to which the details are printed out, when verb is TRUE.
max.try
If the algorithm encounters a singularity in the likelihood (as may occur when eq.var is FALSE) the algorithm is restarted using new (randomly generated) starting values. The restart is attempted a maximum of max.try times.
data.name
A character string specifying a name associated with the data being analyzed, for identification purposes.

Value

  • A list, of class mixreg, with components
  • parmatThe parameters of the fitted model arranged as a matrix, each row corresponding to one component of the mixture.
  • thetaThe parameters of the fitted model as a list, each entry of the list being itself a list (like those in theta.start) corresponding to one component of the mixture.
  • log.likeThe log likelihood of the fitted model, based on Gaussian errors.
  • aicThe Akaike Information Criterion value for the fitted model; aic is equal to -2 * log.like + 2*M where M is the number of parameters in the model.
  • interceptThe intercept argument of the call to mixreg.
  • eq.varThe eq.var argument of the call to mixreg.
  • bnmsA vector of names associated with the linear components of the regression models. The names are formed from the column names of the argument x if these exist; otherwise they are "beta1", "beta2", .... The name "Int" is prepended if intecept is TRUE.
  • nstepsThe number of steps the EM algorithm took to converge.
  • convergedLogical value indicating whether the algorithm did converge or stopped because it reach the itmax EM step.
  • data.nameThe data.name argument if supplied; otherwise is formed as "name-of-y.on.name-of-x".

Details

Even if eq.var is TRUE, each component of theta still has its own sigsq component. The values of these will all be equal however if eq.var is TRUE.

References

Turner, T. R. (2000) Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Appl. Statist. vol. 49, Part 3, pp. 371 -- 384.

Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. B, vol. 39, pp. 1--22, 1977.

See Also

bootcomp, cband, covmix, plot.cband, plot.mresid, qq.mix, resid.mix

Examples

Run this code
data(aphids)
x   <- aphids$n.aphids
y   <- aphids$n.inf
TS  <- list(list(beta=c(3.0,0.1),sigsq=16,lambda=0.5),
            list(beta=c(0.0,0.0),sigsq=16,lambda=0.5))
fit <- mixreg(x,y,ncomp=2,theta.start=TS,data.name='aphids')
cvm <- covmix(fit,x,y)
cbd <- cband(fit,cvm,x,y)
plot(cbd)
r <- resid.mix(fit,x,y)
plot(r)
r <- resid.mix(fit,x,y,std=TRUE)
qq.mix(r)

Run the code above in your browser using DataLab