Computes linear regression fixed point clusters (FPCs), i.e., subsets of the data that consist exactly of the non-outliers w.r.t. themselves and may be interpreted as generated from a homogeneous linear regression relation between independent and dependent variables. FPCs may overlap, are not necessarily exhaustive, and do not require the number of clusters to be specified.

Note that while `fixreg` has lots of parameters, usually only one (or a few) of them has to be specified, cf. the examples. The philosophy is to allow much flexibility, but to always provide sensible defaults.

```
fixreg(indep=rep(1,n), dep, n=length(dep),
       p=ncol(as.matrix(indep)),
       ca=NA, mnc=NA, mtf=3, ir=NA, irnc=NA,
       irprob=0.95, mncprob=0.5, maxir=20000, maxit=5*n,
       distcut=0.85, init.group=list(),
       ind.storage=FALSE, countmode=100,
       plot=FALSE)

# S3 method for rfpc
summary(object, ...)

# S3 method for summary.rfpc
print(x, maxnc=30, ...)

# S3 method for rfpc
plot(x, indep=rep(1,n), dep, no, bw=TRUE,
     main=c("Representative FPC No. ",no),
     xlab="Linear combination of independents",
     ylab=deparse(substitute(indep)),
     xlim=NULL, ylim=range(dep),
     pch=NULL, col=NULL, ...)

# S3 method for rfpc
fpclusters(object, indep=NA, dep=NA, ca=object$ca, ...)

rfpi(indep, dep, p, gv, ca, maxit, plot)
```

indep

numerical matrix or vector. Independent variables. Leave out for clustering one-dimensional data. `fpclusters.rfpc` does not need specification of `indep` if `fixreg` was run with `ind.storage=TRUE`.

dep

numerical vector. Dependent variable. `fpclusters.rfpc` does not need specification of `dep` if `fixreg` was run with `ind.storage=TRUE`.

n

optional positive integer. Number of cases.

p

optional positive integer. Number of independent variables.

ca

optional positive number. Tuning constant, specifying required cluster separation. By default determined automatically as a function of `n` and `p`, see function `can` and Hennig (2002a).

mnc

optional positive integer. Minimum size of clusters to be reported. By default determined automatically as a function of `mncprob`. See Hennig (2002a).

mtf

optional positive integer. FPCs must be found at least `mtf` times to be reported by `summary.rfpc`.

ir

optional positive integer. Number of algorithm runs. By default determined automatically as a function of `n`, `p`, `irnc`, `irprob`, `mtf`, `maxir`. See function `itnumber` and Hennig (2002a).

irnc

optional positive integer. Size of the smallest cluster to be found with approximated probability `irprob`.

irprob

optional value between 0 and 1. Approximated probability for a cluster of size `irnc` to be found.

mncprob

optional value between 0 and 1. Approximated probability for a cluster of size `mnc` to be found.

maxir

optional integer. Maximum number of algorithm runs.

maxit

optional integer. Maximum number of iterations per algorithm run (usually an FPC is found much earlier).

distcut

optional value between 0 and 1. A similarity measure between FPCs, given in Hennig (2002a), is computed, together with the corresponding Single Linkage groups of FPCs with similarity larger than `distcut`. A single representative FPC is selected for each group.

init.group

optional list of logical vectors of length `n`. Every vector indicates a starting configuration for the fixed point algorithm. This can be used for datasets with high dimension, where the vectors of `init.group` indicate cluster candidates found by graphical inspection or background knowledge.

ind.storage

optional logical. If `TRUE`, then all indicator vectors of found FPCs are given in the value of `fixreg`. May need lots of memory, but is a bit faster.

countmode

optional positive integer. After every `countmode` algorithm runs, `fixreg` shows a message.

plot

optional logical. If `TRUE`, you get a scatterplot of the first independent vs. the dependent variable at each iteration.

object

object of class `rfpc`, output of `fixreg`.

x

object of class `rfpc`, output of `fixreg`.

maxnc

positive integer. Maximum number of FPCs to be reported.

no

positive integer. Number of the representative FPC to be plotted.

bw

optional logical. If `TRUE`, the plot is black/white and the FPC is indicated by a different symbol. Otherwise the FPC is indicated in red.

main

plot title.

xlab

label for x-axis.

ylab

label for y-axis.

xlim

plotted range of x-axis. If `NULL`, the range of the plotted linear combination of independent variables is used.

ylim

plotted range of y-axis.

pch

plotting symbol, see `par`. If `NULL`, the default is used.

col

plotting color, see `par`. If `NULL`, the default is used.

gv

logical vector of length `n`. Indicates the initial configuration for the fixed point algorithm.

...

additional parameters to be passed to `plot` (no effects elsewhere).

`fixreg` returns an object of class `rfpc`. This is a list containing the components `nc`, `g`, `coefs`, `vars`, `nfound`, `er`, `tsc`, `ncoll`, `grto`, `imatrix`, `smatrix`, `stn`, `stfound`, `sfpc`, `ssig`, `sto`, `struc`, `n`, `p`, `ca`, `ir`, `mnc`, `mtf`, `distcut`.

`summary.rfpc` returns an object of class `summary.rfpc`. This is a list containing the components `coefs`, `vars`, `stfound`, `stn`, `sn`, `ser`, `tsc`, `sim`, `ca`, `ir`, `mnc`, `mtf`.

`fpclusters.rfpc` returns a list of indicator vectors for the representative FPCs of stable groups.

`rfpi` returns a list with the components `coef`, `var`, `g`, `coll`, `ca`.

nc

integer. Number of FPCs.

g

list of logical vectors. Indicator vectors of FPCs. `FALSE` if `ind.storage=FALSE`.

coefs

list of numerical vectors. Regression coefficients of FPCs. In `summary.rfpc`, only for representative FPCs of stable groups and sorted according to `stfound`.

vars

list of numbers. Error variances of FPCs. In `summary.rfpc`, only for representative FPCs of stable groups and sorted according to `stfound`.

nfound

vector of integers. Number of findings for the FPCs.

er

numerical vector. Expectation ratios of FPCs. Can be taken as a stability measure.

tsc

integer. Number of algorithm runs leading to FPCs that were too small or found too seldom.

ncoll

integer. Number of algorithm runs where collinear regressor matrices occurred.

grto

vector of integers. Numbers of FPCs to which algorithm runs led that were started by `init.group`.

imatrix

vector of integers. Size of intersection between FPCs. See `sseg`.

smatrix

numerical vector. Similarities between FPCs. See `sseg`.

stn

integer. Number of representative FPCs of stable groups. In `summary.rfpc` sorted according to `stfound`.

stfound

vector of integers. Number of findings of members of all groups of FPCs. In `summary.rfpc` sorted according to `stfound`.

sfpc

vector of integers. Numbers of representative FPCs.

ssig

vector of integers. As `sfpc`, but only for stable groups.

sto

vector of integers. Number of the representative FPC of the most, 2nd most, ..., often found group of FPCs.

struc

vector of integers. Number of the group an FPC belongs to.

n, p, ca, ir, mnc, mtf, distcut

see arguments.

sn

vector of integers. Number of points of representative FPCs.

ser

numerical vector. Expectation ratio for stable groups.

sim

vector of integers. Size of intersections between representative FPCs of stable groups. See `sseg`.

coef

vector of regression coefficients.

var

error variance.

g

logical indicator vector of iterated FPC.

coll

logical. `TRUE` means that singular covariance matrices occurred during the iterations.

A linear regression FPC is a data subset that reproduces itself under the following operation: compute the linear regression and the error variance estimator for the data subset, and determine all points of the dataset for which the squared residual is smaller than `ca` times the error variance. Fixed points of this operation can be considered as clusters, because they contain only non-outliers (as defined by the above mentioned procedure) and all other points are outliers w.r.t. the subset.
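
The operation described above can be sketched in base R. This is only an illustration of the definition, not the code used by `rfpi`: the helper names `fpc_step` and `iterate_fpc` are hypothetical, a single regressor with intercept is assumed, and the error variance is assumed to be the residual sum of squares divided by the residual degrees of freedom.

```r
# One application of the FPC operation: fit on the subset g, then flag
# every point of the full dataset whose squared residual is < ca * variance.
# Hypothetical helper, not part of the package.
fpc_step <- function(x, y, g, ca) {
  X <- cbind(1, x)                            # design matrix with intercept
  stopifnot(sum(g) > ncol(X))                 # need residual degrees of freedom
  fit <- lm.fit(X[g, , drop = FALSE], y[g])   # regression on the subset only
  sigma2 <- sum(fit$residuals^2) / fit$df.residual  # assumed variance estimator
  res <- as.vector(y - X %*% fit$coefficients)      # residuals of all points
  res^2 < ca * sigma2
}

# Repeat the operation until the indicator vector reproduces itself.
iterate_fpc <- function(x, y, g, ca, maxit = 100) {
  for (i in seq_len(maxit)) {
    g_new <- fpc_step(x, y, g, ca)
    if (identical(g_new, g)) return(g)        # fixed point reached
    g <- g_new
  }
  g                                           # maxit reached without convergence
}

# Toy data: 30 points near y = 2x, 10 points on a clearly different line.
x <- 1:40
y <- c(2 * (1:30) + 0.1 * (-1)^(1:30), 200 - (31:40))
g0 <- rep(c(TRUE, FALSE), c(30, 10))
fpc <- iterate_fpc(x, y, g0, ca = 10)
```

In this toy setting the first 30 points form a fixed point: refitting on them flags exactly those 30 points again as non-outliers.
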
`fixreg` performs `ir` fixed point algorithms started from random subsets of size `p+2` to look for FPCs. Additionally an algorithm is started from the whole dataset, and algorithms are started from the subsets specified in `init.group`.
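
As a rough illustration of these starting configurations, the following base R sketch builds the list of initial indicator vectors. The helpers `random_start` and `make_starts` are hypothetical names, and the order in which `fixreg` actually processes the starts is an assumption.

```r
# Random starting subset of size p + 2, coded as a logical indicator vector.
# Hypothetical helper, not part of the package.
random_start <- function(n, p) {
  g <- logical(n)
  g[sample.int(n, p + 2)] <- TRUE
  g
}

# All starting configurations: ir random subsets, the whole dataset,
# and any user-supplied candidates (assumed order).
make_starts <- function(n, p, ir, init.group = list()) {
  c(replicate(ir, random_start(n, p), simplify = FALSE),
    list(rep(TRUE, n)),
    init.group)
}

set.seed(1)
starts <- make_starts(n = 20, p = 1, ir = 5)
```

With `ir = 0`, only the whole dataset and the `init.group` candidates remain, which corresponds to the `ir=0` usage mentioned for high-dimensional data.
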
Usually some of the FPCs are unstable, and more than one FPC may correspond to the same significant pattern in the data. Therefore the number of FPCs is reduced: FPCs with fewer than `mnc` points are ignored. Then a similarity matrix is computed between the remaining FPCs. Similarity between sets is defined as the ratio between 2 times the size of the intersection and the sum of the sizes of both sets. The Single Linkage clusters (groups) of level `distcut` are computed, i.e. the connectivity components of the graph where edges are drawn between FPCs with similarity larger than `distcut`. Groups of FPCs whose members are found `mtf` times or more are considered as stable enough. A representative FPC is chosen for every Single Linkage cluster of FPCs according to the maximum expectation ratio `ser`. `ser` is the ratio between the number of findings of an FPC and the estimated expectation of the number of findings of an FPC of this size, called *expectation ratio* and computed by `clusexpect`. Usually only the representative FPCs of stable groups are of interest.
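
The similarity and grouping step can be sketched in base R as follows. The helpers `fpc_similarity` and `group_fpcs` are hypothetical and only mirror the definitions given in the text; they are not the package's internal code, and the group labels (smallest member index) are an arbitrary choice made here.

```r
# Similarity between two FPCs given as logical indicator vectors:
# 2 * |intersection| / (|A| + |B|).
fpc_similarity <- function(a, b) 2 * sum(a & b) / (sum(a) + sum(b))

# Single Linkage groups at level distcut, i.e. connected components of the
# graph with an edge wherever similarity > distcut.
# Hypothetical helper, not part of the package.
group_fpcs <- function(fpcs, distcut) {
  k <- length(fpcs)
  sim <- outer(seq_len(k), seq_len(k),
               Vectorize(function(i, j) fpc_similarity(fpcs[[i]], fpcs[[j]])))
  adj <- sim > distcut
  diag(adj) <- TRUE                 # every FPC belongs to its own group
  grp <- seq_len(k)                 # start with one group per FPC
  repeat {                          # propagate the smallest label along edges
    changed <- FALSE
    for (i in seq_len(k)) {
      m <- min(grp[adj[i, ]])
      if (m < grp[i]) { grp[i] <- m; changed <- TRUE }
    }
    if (!changed) break
  }
  grp
}

a <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
b <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
d <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
groups <- group_fpcs(list(a, b, d), distcut = 0.85)
```

Here `a` and `b` share 4 points, giving similarity 8/9, which exceeds 0.85, so they fall into one group, while `d` stays on its own.
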
The choice of the involved tuning constants such as `ca` and `ir` is discussed in detail in Hennig (2002a). Statistical theory is presented in Hennig (2003). Generally, the default settings are recommended for `fixreg`. In cases where they lead to too many algorithm runs (e.g., always for `p>4`), the use of `init.group` together with `mtf=1` and `ir=0` is useful. Occasionally, `irnc` may be chosen smaller than the default if smaller clusters are of interest, but this may lead to too many clusters and too many algorithm runs. Decreasing `ca` will often lead to too many clusters, even for homogeneous data. Increasing `ca` will produce only very strongly separated clusters. Both may be of interest occasionally.

`rfpi` is called by `fixreg` for a single fixed point algorithm and will usually not be executed alone.

`summary.rfpc` gives a summary of the representative FPCs of stable groups.

`plot.rfpc` is a plot method for the representative FPC of stable group no. `no`. It produces a scatterplot of the linear combination of independent variables determined by the regression coefficients of the FPC vs. the dependent variable. The regression line and the region of non-outliers determined by `ca` are plotted as well.
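
From the FPC definition, a point is a non-outlier when its squared residual is smaller than `ca` times the error variance, so the non-outlier region is a band of half-width `sqrt(ca * var)` around the regression line. The sketch below computes such a band in base R; `nonoutlier_band` is a hypothetical helper, and whether `plot.rfpc` draws exactly these limits is an assumption.

```r
# Lower and upper limits of the non-outlier band around fitted values,
# derived from: residual^2 < ca * sigma2  <=>  |residual| < sqrt(ca * sigma2).
# Hypothetical helper, not part of the package.
nonoutlier_band <- function(fitted, sigma2, ca) {
  half <- sqrt(ca * sigma2)
  cbind(lower = fitted - half, upper = fitted + half)
}

band <- nonoutlier_band(fitted = c(1, 2), sigma2 = 4, ca = 9)
```
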

`fpclusters.rfpc` produces a list of indicator vectors for the representative FPCs of stable groups.

Hennig, C. (2002) Fixed point clusters for linear regression:
computation and comparison, *Journal of
Classification* 19, 249-276.

Hennig, C. (2003) Clusters, outliers and regression:
fixed point clusters, *Journal of
Multivariate Analysis* 86, 183-212.

`fixmahal` for fixed point clusters in the usual setup (non-regression).

`regmix` for clusterwise linear regression by mixture modeling ML.

`can`, `itnumber` for computation of the default settings.

`clusexpect` for estimation of the expected number of findings of an FPC of given size.

`itnumber` for the generation of the number of fixed point algorithms.

`minsize` for the smallest FPC size to be found with a given probability.

`sseg` for indexing the similarity/intersection vectors computed by `fixreg`.

    set.seed(190000)
    options(digits=3)
    data(tonedata)
    attach(tonedata)
    tonefix <- fixreg(stretchratio,tuned,mtf=1,ir=20)
    summary(tonefix)
    # This is designed to have a fast example; default setting would be better.
    # If you want to see more (and you have a bit more time),
    # try out the following:

    # Not run:
    set.seed(1000)
    tonefix <- fixreg(stretchratio,tuned)
    # Default - good for these data
    summary(tonefix)
    plot(tonefix,stretchratio,tuned,1)
    plot(tonefix,stretchratio,tuned,2)
    plot(tonefix,stretchratio,tuned,3,bw=FALSE,pch=5)
    toneclus <- fpclusters(tonefix,stretchratio,tuned)
    plot(stretchratio,tuned,col=1+toneclus[[2]])
    tonefix2 <- fixreg(stretchratio,tuned,distcut=1,mtf=1,countmode=50)
    # Every found fixed point cluster is reported,
    # no matter how unstable it may be.
    summary(tonefix2)
    tonefix3 <- fixreg(stretchratio,tuned,ca=7)
    # ca defaults to 10.07 for these data.
    summary(tonefix3)
    subset <- c(rep(FALSE,5),rep(TRUE,24),rep(FALSE,121))
    tonefix4 <- fixreg(stretchratio,tuned,
                       mtf=1,ir=0,init.group=list(subset))
    summary(tonefix4)