oregMclust: Orthogonal Regression Clustering

Description

Computation of center points for regression data by means of orthogonal regression. A cluster method based on redescending M-estimators is used.

Usage

oregMclust(datax, datay, bw, method = "const",
    xrange = range(datax), yrange = range(datay),
    prec = 4, na = 1, sa = NULL, nl = 10, nc = NULL,
    brmaxit = 1000)

regparm(reg)

# S3 method for oregMclust
plot(x, datax, datay, prec = 3, rcol = "black",
    rlty = 1, rlwd = 3, ...)

# S3 method for oregMclust
print(x, ...)

Arguments

datax, datay

numerical vectors of coordinates of the observations. Alternatively, a matrix with two or three columns can be given. Then, the first two columns are interpreted as coordinates of the observations and, if available, the third is passed to parameter sa.

bw

positive number. Bandwidth for the cluster method.

method

optional string. Method of choosing starting values for the maximization. Possible values are:

"const": a constant number of angles is used for every observation. By default, one horizontal line through each observation is used as a starting value. If a value for the parameter na is passed, na lines through each observation are used. Alternatively, a starting angle for every observation can be specified with the parameter sa. In this case, na is ignored, and the length of sa must equal the number of observations.
"all": every line through any two observations is used.
"prob": clusters are searched iteratively with randomly chosen starting values until either no new clusters are found (default) or nc clusters are found. The precision of distinguishing the clusters can be tuned with the parameter prec. In each iteration, nl lines, each through two randomly chosen observations, are used as starting values.

xrange, yrange

optional numerical intervals describing the domains of the observations. They are used only for normalization of the data. Note that both intervals should have approximately the same length; otherwise the data should be transformed beforehand. This is not done automatically, since such a transformation affects the choice of the bandwidth.
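The following is a minimal base R sketch of one way to bring both coordinate ranges to the same length before clustering. The helper rescale_to_x is hypothetical and not part of the package; it only illustrates the kind of manual transformation the note above refers to.

```r
# Hypothetical helper (not part of the package): rescale y so that its
# range has the same length as the range of x.
rescale_to_x = function(x, y) {
  f = diff(range(x)) / diff(range(y))   # scale factor applied to y
  list(y = y * f, factor = f)
}

set.seed(1)
x = runif(50, 0, 10)
y = 100 * x + rnorm(50)      # y spans a much longer interval than x
sc = rescale_to_x(x, y)

# Both range lengths now agree
c(diff(range(x)), diff(range(sc$y)))
```

A line y' = m x + b found on the rescaled data corresponds to y = (m / factor) x + b / factor in the original units. The bandwidth has to be chosen on the rescaled scale.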

prec

optional positive integer. Tuning parameter for distinguishing different clusters, which is passed to deldupMclust.

na

optional positive integer. Number of angles per observation used as starting values for method = "const" (default).

sa

optional numerical vector. Angles (within [0,2pi)) used as starting values for method = "const" (default).

nl

optional positive integer. Number of starting lines in each iteration for method = "prob".

nc

optional positive integer. Number of clusters to search for if method = "prob" is chosen. Note that if nc is too large, i.e., nc clusters cannot be found, the function does not terminate. Warning: on Windows, the routine cannot be interrupted manually in this case.

brmaxit

optional positive integer. Since the maximization could be very slow in some cases depending on the starting value, the maximization is stopped after brmaxit iterations.

reg, x

object returned from oregMclust.

rcol, rlty, rlwd

optional graphic parameters used for plotting regression lines.

...

additional parameters passed to plot.

Value

A numerical matrix containing one row for every found regression center line. The columns "alpha" and "beta" are their parameters in the representation (cos(alpha), sin(alpha)) * (x,y)' = beta, where alpha is within [0,2pi). For the alternative representation y = mx + b, the return value can be passed to regparm.

The columns "value" and "count" give the value of the objective function at each center line and the number of times that line was found.
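To make the (alpha, beta) representation concrete, the sketch below computes both parameters for the line through two points, using plain trigonometry in base R. The helper line_through is hypothetical and is not a function of the package.

```r
# Hypothetical helper: (alpha, beta) of the line through the points p and q,
# in the representation cos(alpha) * x + sin(alpha) * y = beta.
line_through = function(p, q) {
  d = q - p                                   # direction of the line
  alpha = atan2(d[1], -d[2]) %% (2 * pi)      # angle of the normal (-d_y, d_x)
  beta = cos(alpha) * p[1] + sin(alpha) * p[2]
  c(alpha = alpha, beta = beta)
}

p = c(0, 1)
q = c(2, 5)
ab = line_through(p, q)

# Both points satisfy the line equation, so the two residuals are (numerically) zero
cos(ab[["alpha"]]) * c(p[1], q[1]) + sin(ab[["alpha"]]) * c(p[2], q[2]) - ab[["beta"]]
```

Note that the representation is not unique: (alpha + pi, -beta) describes the same line.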

Details

oregMclust implements a cluster method based on redescending M-estimators for the case of orthogonal regression. The method was introduced by Mueller and Garlipp (2005; see References).
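As a rough illustration of the idea, the sketch below evaluates a kernel-type objective for a candidate line: orthogonal residuals are scaled by the bandwidth and passed through a redescending score function, so that only observations close to the line contribute. The Gaussian kernel and the function name objective are assumptions made for this sketch; the package's actual score function and implementation may differ.

```r
# Illustrative objective: mean kernel weight of the orthogonal residuals.
# Assumption: Gaussian kernel; the package's score function may differ.
objective = function(alpha, beta, x, y, bw) {
  r = cos(alpha) * x + sin(alpha) * y - beta   # signed orthogonal distances
  mean(dnorm(r / bw)) / bw
}

set.seed(1)
x = runif(200, -5, 5)
y = c(2 * x[1:100] + 1, -x[101:200] + 3) + rnorm(200, 0, 0.2)

# The line y = 2x + 1 written as cos(alpha) * x + sin(alpha) * y = beta
alpha1 = atan2(1, -2)        # normal direction proportional to (-2, 1)
beta1 = 1 * sin(alpha1)      # since the intercept equals beta / sin(alpha)

h_true = objective(alpha1, beta1, x, y, bw = 0.5)        # close to a true line
h_off = objective(alpha1, beta1 + 5, x, y, bw = 0.5)     # shifted away from it
c(h_true, h_off)
```

Local maxima of such an objective correspond to center lines; a line close to one of the two simulated clusters scores clearly higher than a line shifted away from it.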

regparm transforms the columns "alpha" and "beta" to "intersept" and "slope".
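The conversion follows from solving cos(alpha) * x + sin(alpha) * y = beta for y, which gives the intercept beta / sin(alpha) and the slope -cos(alpha) / sin(alpha). The base R sketch below shows this calculation; the helper to_slope_intercept is hypothetical and is not the package's regparm.

```r
# Hypothetical helper, not the package's regparm: convert (alpha, beta)
# to intercept and slope of y = slope * x + intercept.
to_slope_intercept = function(alpha, beta) {
  s = sin(alpha)
  vertical = abs(s) < 1e-12   # vertical lines have no slope-intercept form
  cbind(intercept = ifelse(vertical, NA, beta / s),
        slope = ifelse(vertical, NA, -cos(alpha) / s))
}

# A horizontal line y = 2 and the diagonal y = x
m = to_slope_intercept(alpha = c(pi / 2, 3 * pi / 4), beta = c(2, 0))
m
```

Vertical lines (alpha equal to 0 or pi) are returned as NA in this sketch, since they cannot be written as y = mx + b.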

See also bestMclust, projMclust, and envMclust for choosing the 'best' clusters out of all found clusters.

References

Mueller, C. H., & Garlipp, T. (2005). Simple consistent cluster methods based on redescending M-estimators with an application to edge identification in images. Journal of Multivariate Analysis, 92(2), 359--385.

Examples

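# NOT RUN {
  library(edci)  # assumed to be the package providing oregMclust and the helpers below

  # Simulate two noisy linear clusters of 100 observations each
  x = c(rnorm(100, 0, 3), rnorm(100, 5, 3))
  y = c(-2 * x[1:100] - 5, 0.5 * x[101:200] + 30)/2
  x = x + rnorm(200, 0, 0.5)
  y = y + rnorm(200, 0, 0.5)

  # Search for center lines with bandwidth 1 and random starting lines
  reg = oregMclust(x, y, 1, method = "prob")
  reg = projMclust(reg, x, y)
  reg
  # Plot the 2 best center lines chosen with crit = "proj"
  plot(bestMclust(reg, 2, crit = "proj"), x, y)
# }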
