Note that while fixreg has lots of parameters, usually only one (or a few) of them have to be specified, cf. the examples. The philosophy is to allow much flexibility, but always to provide sensible defaults.
fixreg(indep=rep(1,n), dep, n=length(dep),
p=ncol(as.matrix(indep)),
ca=NA, mnc=NA, mtf=3, ir=NA, irnc=NA,
irprob=0.95, mncprob=0.5, maxir=20000, maxit=5*n,
distcut=0.85, init.group=list(),
ind.storage=FALSE, countmode=100,
plot=FALSE)
## S3 method for class 'rfpc':
summary(object, ...)
## S3 method for class 'summary.rfpc':
print(x, maxnc=30, ...)
## S3 method for class 'rfpc':
plot(x, indep=rep(1,n), dep, no, bw=TRUE,
main=c("Representative FPC No. ",no),
xlab="Linear combination of independents",
ylab=deparse(substitute(indep)),
xlim=NULL, ylim=range(dep),
pch=NULL, col=NULL,...)
## S3 method for class 'rfpc':
fpclusters(object, indep=NA, dep=NA, ca=object$ca, ...)
rfpi(indep, dep, p, gv, ca, maxit, plot)
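
indep: independent variables (numerical vector or matrix). fpclusters.rfpc does not need specification of indep if fixreg was run with ind.storage=TRUE.
dep: dependent variable (numerical vector). fpclusters.rfpc does not need specification of dep if fixreg was run with ind.storage=TRUE.
n, p: number of cases and number of independent variables; by default computed from dep and indep.
ca: tuning constant specifying the required cluster separation; it determines the region of non-outliers. By default it is determined automatically as a function of n and p, see function can and Hennig (2002a).
mnc: minimum size of FPCs to be reported. By default it is determined automatically as a function of mncprob. See Hennig (2002a).
mtf: FPCs must be found at least mtf times to be reported by summary.rfpc.
ir: number of algorithm runs. By default it is determined automatically as a function of n, p, irnc, irprob, mtf and maxir. See function itnumber.
irnc: size of the smallest cluster to be found with approximated probability irprob.
irprob: approximated probability for a cluster of size irnc to be found.
mncprob: approximated probability for a cluster of size mnc to be found.
maxir: maximum number of algorithm runs.
maxit: maximum number of iterations per algorithm run.
distcut: similarity threshold. The Single Linkage groups of FPCs with similarity larger than distcut are computed, and a single representative FPC is selected for each group.
init.group: list of logical vectors of length n. Every vector indicates a starting configuration for the fixed point algorithm. This can be used for datasets with high dimension, where the vectors of init.group indicate candidate clusters.
ind.storage: if TRUE, then all indicator vectors of found FPCs are given in the value of fixreg. This may need lots of memory, but is a bit faster.
countmode: fixreg shows a message after every countmode algorithm runs.
plot: if TRUE, you get a scatterplot of the first independent vs. the dependent variable at each iteration.
object: object of class rfpc, output of fixreg.
x: object of class rfpc, output of fixreg.
no: number of the representative FPC of a stable group to be plotted.
bw: if TRUE, the plot is black/white and the FPC is indicated by a different symbol. Otherwise the FPC is indicated in red.
xlim: if NULL, the range of the plotted linear combination of independent variables is used.
pch: plotting symbol, see par. If NULL, the default is used.
col: plotting color, see par. If NULL, the default is used.
gv: logical vector of length n. Indicates the initial configuration for the fixed point algorithm.
...: additional parameters to be passed to plot (no effects elsewhere).

fixreg returns an object of class rfpc. This is a list containing the components nc, g, coefs, vars, nfound, er, tsc, ncoll, grto, imatrix, smatrix, stn, stfound, sfpc, ssig, sto, struc, n, p, ca, ir, mnc, mtf, distcut.
summary.rfpc returns an object of class summary.rfpc. This is a list containing the components coefs, vars, stfound, stn, sn, ser, tsc, sim, ca, ir, mnc, mtf.
fpclusters.rfpc returns a list of indicator vectors for the representative FPCs of stable groups.
rfpi returns a list with the components coef, var, g, coll, ca.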
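
g: indicator vectors of the FPCs; FALSE if ind.storage=FALSE.
coefs: regression coefficients of the FPCs. In summary.rfpc, only for representative FPCs of stable groups and sorted according to stfound.
vars: error variances of the FPCs. In summary.rfpc, only for representative FPCs of stable groups and sorted according to stfound.
grto: numbers of the FPCs to which the algorithm runs started from init.group led.
imatrix: sizes of the intersections between FPCs, stored as a vector; see sseg.
smatrix: similarities between FPCs, stored as a vector; see sseg.
stn, stfound: in summary.rfpc, sorted according to stfound.
ssig: as sfpc, but only for stable groups.
sim: similarities between the representative FPCs; see sseg.
coll: (value of rfpi) TRUE means that singular covariance matrices occurred during the iterations.

A linear regression FPC is a data subset that reproduces itself under the following operation: compute the linear regression and the error variance estimator for the data subset, and then compute all points of the dataset for which the squared residual is smaller than ca times the error variance. Fixed points of this operation can be considered as clusters, because they contain only non-outliers (as defined by this procedure) and all other points are outliers w.r.t. the subset.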
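The fixed point operation can be sketched in base R as follows. This is a hedged illustration, not the code of rfpi. The function name fp_iterate, the added intercept column and the variance estimator (residual sum of squares divided by the number of points in the subset minus the number of coefficients) are assumptions.

```r
# Hypothetical sketch of one linear regression fixed point algorithm.
# Assumptions (not taken from fpc): an intercept column is added to indep,
# and the error variance is RSS / (points in subset - number of coefficients).
fp_iterate <- function(indep, dep, g, ca, maxit = 100) {
  X <- cbind(1, as.matrix(indep))                # design matrix with intercept
  converged <- FALSE
  for (it in seq_len(maxit)) {
    if (sum(g) <= ncol(X)) stop("subset too small to estimate the variance")
    fit <- lm.fit(X[g, , drop = FALSE], dep[g])  # regression on the subset only
    coef <- fit$coefficients
    if (anyNA(coef)) stop("collinear regressor matrix in subset")
    v <- sum(fit$residuals^2) / (sum(g) - ncol(X))  # error variance estimate
    res2 <- as.vector((dep - X %*% coef)^2)     # squared residuals of ALL points
    g_new <- res2 < ca * v                      # non-outliers w.r.t. the subset
    if (identical(g_new, g)) {                  # subset reproduces itself
      converged <- TRUE
      break
    }
    g <- g_new                                  # if maxit is reached, converged stays FALSE
  }
  list(coef = coef, var = v, g = g, converged = converged, iterations = it)
}
```

Started from the points of one of two noisy crossing lines, such an iteration will typically stop after one or two steps at a subset containing that line, possibly together with points near the crossing.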
fixreg performs ir fixed point algorithms started from random subsets of size p+2 to look for FPCs. Additionally, an algorithm is started from the whole dataset, and algorithms are started from the subsets specified in init.group.
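The set of starting configurations can be illustrated with the following sketch. start_configs is a hypothetical helper, not an fpc function. It only builds the logical start vectors and does not run any fixed point algorithm.

```r
# Hypothetical helper (not part of fpc): list of logical start vectors, i.e.
# ir random subsets of size p + 2, the whole dataset, and the user-supplied
# init.group vectors.
start_configs <- function(n, p, ir, init.group = list()) {
  random_starts <- lapply(seq_len(ir), function(i) {
    g <- rep(FALSE, n)
    g[sample.int(n, p + 2)] <- TRUE   # p + 2 distinct random cases
    g
  })
  c(random_starts, list(rep(TRUE, n)), init.group)
}

set.seed(2)
starts <- start_configs(n = 50, p = 1, ir = 5)
sapply(starts, sum)   # 3 3 3 3 3 50
```

With ir=0, only the whole dataset and the init.group vectors remain. This corresponds to the mtf=1, ir=0 setting recommended for high-dimensional data.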
Usually some of the FPCs are unstable, and more than one FPC may correspond to the same significant pattern in the data. Therefore the number of FPCs is reduced: FPCs with fewer than mnc points are ignored. Then a similarity matrix is computed between the remaining FPCs. The similarity between two sets is defined as twice the size of their intersection divided by the sum of their sizes. The Single Linkage clusters (groups) of level distcut are computed, i.e. the connectivity components of the graph where edges are drawn between FPCs with similarity larger than distcut. Groups of FPCs whose members are found mtf times or more are considered stable enough.
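The reduction step can be sketched in base R. fpc_similarity, sl_groups and stable_groups are hypothetical names. sl_groups finds the connected components by depth-first search. stable_groups assumes that "found mtf times or more" means that the numbers of findings (nfound, an assumed per-FPC input) of all members of a group are summed.

```r
# Similarity of two FPCs given as logical indicator vectors:
# 2 * |A intersect B| / (|A| + |B|)
fpc_similarity <- function(a, b) 2 * sum(a & b) / (sum(a) + sum(b))

# Single Linkage groups at level distcut: connected components of the graph
# with an edge between two FPCs whose similarity is larger than distcut.
sl_groups <- function(fpcs, distcut = 0.85) {
  k <- length(fpcs)
  group <- rep(NA_integer_, k)
  current <- 0L
  for (start in seq_len(k)) {
    if (!is.na(group[start])) next
    current <- current + 1L
    group[start] <- current
    stack <- start                          # depth-first search from 'start'
    while (length(stack) > 0) {
      i <- stack[length(stack)]
      stack <- stack[-length(stack)]
      for (j in which(is.na(group))) {      # only FPCs not yet assigned
        if (fpc_similarity(fpcs[[i]], fpcs[[j]]) > distcut) {
          group[j] <- current
          stack <- c(stack, j)
        }
      }
    }
  }
  group
}

# Assumption: a group is stable if its members were found at least mtf
# times in total.
stable_groups <- function(group, nfound, mtf) {
  found <- tapply(nfound, group, sum)
  as.integer(names(found)[found >= mtf])
}

a <- c(rep(TRUE, 10), rep(FALSE, 10))   # FPC with cases 1..10
b <- c(rep(TRUE, 9), rep(FALSE, 11))    # FPC with cases 1..9
fpc_similarity(a, b)                    # 18/19, about 0.947
sl_groups(list(a, b, !a))               # 1 1 2
stable_groups(c(1, 1, 2), nfound = c(3, 1, 1), mtf = 2)  # 1
```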
A representative FPC is chosen for every Single Linkage cluster of FPCs according to the maximum expectation ratio ser. The expectation ratio ser is the ratio between the number of findings of an FPC and the estimated expected number of findings of an FPC of this size, which is computed by clusexpect.
Usually only the representative FPCs of stable groups are of interest.
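Choosing the representatives can be sketched as follows. representatives is a hypothetical helper. The expected numbers of findings are passed in as an input vector rather than computed as clusexpect does.

```r
# Hypothetical sketch: in every Single Linkage group, pick the FPC with the
# largest expectation ratio nfound / expected. 'expected' stands for the
# estimated expected numbers of findings (supplied here, not computed).
representatives <- function(group, nfound, expected) {
  ratio <- nfound / expected
  members <- split(seq_along(group), group)   # FPC indices per group
  sapply(members, function(idx) idx[which.max(ratio[idx])])
}

representatives(group = c(1, 1, 2), nfound = c(3, 1, 1),
                expected = c(1.5, 0.2, 0.5))
# ratios are 2, 5, 2, so FPC 2 represents group 1 and FPC 3 represents group 2
```

Dividing by the expected number of findings keeps large FPCs, which are found more often simply because of their size, from automatically dominating their group.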
The choice of the involved tuning constants such as ca and ir is discussed in detail in Hennig (2002a). Statistical theory is presented in Hennig (2003).
Generally, the default settings are recommended for fixreg. In cases where they lead to too large a number of algorithm runs (e.g., always for p>4), the use of init.group together with mtf=1 and ir=0 is useful. Occasionally, irnc may be chosen smaller than the default if smaller clusters are of interest, but this may lead to too many clusters and too many algorithm runs. Decreasing ca will often lead to too many clusters, even for homogeneous data. Increasing ca will produce only very strongly separated clusters. Both may be of interest occasionally.

rfpi is called by fixreg for a single fixed point algorithm and will usually not be executed alone.
summary.rfpc gives a summary of the representative FPCs of stable groups.

plot.rfpc is a plot method for the representative FPC of stable group number no. It produces a scatterplot of the linear combination of independent variables determined by the regression coefficients of the FPC vs. the dependent variable. The regression line and the region of non-outliers determined by ca are plotted as well.
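A plot of this kind can be sketched with base graphics. This is not plot.rfpc. The function fpc_plot_sketch is hypothetical, and it makes three assumptions:

- coef[1] is the intercept.
- The x-axis shows the remaining coefficients applied to indep, so the regression line has slope 1.
- The non-outlier region consists of the points whose squared residual is below ca times the error variance.

```r
# Hypothetical base-R sketch of such a plot (not plot.rfpc itself).
# Assumptions: coef[1] is the intercept, coef[-1] belongs to the columns of
# indep, and the non-outlier region is |residual| < sqrt(ca * var).
fpc_plot_sketch <- function(indep, dep, coef, var, ca,
                            g = rep(TRUE, length(dep)), draw = TRUE) {
  lincomb <- as.vector(as.matrix(indep) %*% coef[-1])  # linear combination
  halfwidth <- sqrt(ca * var)                          # half-width of band
  if (draw) {
    plot(lincomb, dep, pch = ifelse(g, 16, 1),         # FPC points filled
         xlab = "Linear combination of independents", ylab = "dep")
    abline(a = coef[1], b = 1)                         # regression line
    abline(a = coef[1] + halfwidth, b = 1, lty = 2)    # upper boundary
    abline(a = coef[1] - halfwidth, b = 1, lty = 2)    # lower boundary
  }
  invisible(list(x = lincomb, halfwidth = halfwidth))
}
```

With draw = FALSE, only the plotted x coordinates and the band half-width are returned, which makes the geometry easy to check without a graphics device.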
fpclusters.rfpc produces a list of indicator vectors for the representative FPCs of stable groups.
Hennig, C. (2003) Clusters, outliers and regression: fixed point clusters, Journal of Multivariate Analysis 86, 183-212.
fixmahal for fixed point clusters in the usual setup (non-regression).
regmix for clusterwise linear regression by ML mixture modeling.
can, itnumber for computation of the default settings.
clusexpect for estimation of the expected number of findings of an FPC of given size.
itnumber for the generation of the number of fixed point algorithms.
minsize for the smallest FPC size to be found with a given probability.
sseg for indexing the similarity/intersection vectors computed by fixreg.
library(fpc)
set.seed(190000)
data(tonedata)
attach(tonedata)
tonefix <- fixreg(stretchratio,tuned,mtf=1,ir=20)
summary(tonefix)
# This is designed to have a fast example; default setting would be better.
# If you want to see more (and you have a bit more time),
# try out the following:
# set.seed(1000)
# tonefix <- fixreg(stretchratio,tuned)
## Default - good for these data
# summary(tonefix)
# plot(tonefix,stretchratio,tuned,1)
# plot(tonefix,stretchratio,tuned,2)
# plot(tonefix,stretchratio,tuned,3,bw=FALSE,pch=5)
# toneclus <- fpclusters(tonefix,stretchratio,tuned)
# plot(stretchratio,tuned,col=1+toneclus[[2]])
# tonefix2 <- fixreg(stretchratio,tuned,distcut=1,mtf=1,countmode=50)
## Every found fixed point cluster is reported,
## no matter how unstable it may be.
# summary(tonefix2)
# tonefix3 <- fixreg(stretchratio,tuned,ca=7)
## ca defaults to 10.07 for these data.
# summary(tonefix3)
# subset <- c(rep(FALSE,5),rep(TRUE,24),rep(FALSE,121))
# tonefix4 <- fixreg(stretchratio,tuned,
# mtf=1,ir=0,init.group=list(subset))
# summary(tonefix4)