Various methods for classification of unclustered points from
clustered points for use within functions `nselectboot` and `prediction.strength`.

```
classifdist(cdist, clustering,
            method="averagedist",
            centroids=NULL, nnk=1)

classifnp(data, clustering,
          method="centroid", cdist=NULL,
          centroids=NULL, nnk=1)
```

cdist

dissimilarity matrix or `dist`-object. Necessary for `classifdist` but optional for `classifnp`, and there only used if `method="averagedist"` (if not provided, `dist` is applied to `data`).

data

something that can be coerced into an `n*p`-data matrix.

clustering

integer vector. Gives the cluster number (between 1 and k for k clusters) for clustered points and should be -1 for points to be classified.

method

one of `"averagedist"`, `"centroid"`, `"qda"`, `"knn"`. See details.

centroids

for `classifnp` a k times p matrix of cluster centroids. For `classifdist` a vector of numbers of centroid objects as provided by `pam`. Only used if `method="centroid"`; in that case mandatory for `classifdist` but optional for `classifnp`, where cluster mean vectors are computed if `centroids=NULL`.

nnk

number of nearest neighbours if `method="knn"`.
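To make the shapes of `clustering` and `centroids` concrete, here is a minimal base-R sketch. It builds a label vector with -1 marking the points to be classified, computes a k times p matrix of cluster means from the already clustered rows, and applies a closest-centroid rule. The toy data and the names `x`, `labels`, `cent` and `out` are made up for illustration; this is not the package's internal code, only an example of the inputs described above.

```r
# Toy data: 6 observations, p = 2 variables (made-up values).
x <- cbind(c(0, 1, 0, 10, 11, 10),
           c(0, 0, 1, 10, 10, 11))

# Cluster labels 1..k for clustered points, -1 for points to be classified.
labels <- c(1L, 1L, -1L, 2L, 2L, -1L)

clustered <- labels > 0
k <- max(labels[clustered])

# k times p matrix of cluster mean vectors, one row per cluster.
cent <- t(sapply(seq_len(k), function(j) {
  colMeans(x[labels == j, , drop = FALSE])
}))

# Assign each unclustered point to the cluster with the closest centroid
# (Euclidean distance); labels of clustered points stay unchanged.
out <- labels
for (i in which(!clustered)) {
  d <- sqrt(rowSums((cent - matrix(x[i, ], k, ncol(x), byrow = TRUE))^2))
  out[i] <- which.min(d)
}
out
```

A matrix shaped like `cent` is what `classifnp` accepts in `centroids`; for `classifdist`, the argument instead holds the indices of the centroid objects.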

An integer vector giving cluster numbers for all observations; those for the observations already clustered in the input are the same as in the input.

`classifdist` is for data given as dissimilarity matrix, `classifnp` is for data given as n times p data matrix.
The following methods are supported:

- "centroid"
assigns observations to the cluster with the closest cluster centroid as specified in argument `centroids` (this is associated with k-means and pam/clara clustering).

- "qda"
only in `classifnp`. Classifies by quadratic discriminant analysis (this is associated with Gaussian clusters with flexible covariance matrices), calling `qda` with default settings. If `qda` gives an error (usually because a class was too small), `lda` is used.

- "lda"
only in `classifnp`. Classifies by linear discriminant analysis (this is associated with Gaussian clusters with equal covariance matrices), calling `lda` with default settings.

- "averagedist"
assigns each observation to the cluster to which it has the minimum average dissimilarity over all points in the cluster (this is associated with average linkage clustering).

- "knn"
classifies by `nnk` nearest neighbours (for `nnk=1`, this is associated with single linkage clustering). Calls `knn` in `classifnp`.

- "fn"
classifies by the minimum distance to the farthest neighbour (this is associated with complete linkage clustering).
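The dissimilarity-based rules above differ only in how the dissimilarities between a new point and the members of a cluster are summarised. The following base-R sketch illustrates this on made-up data: the mean corresponds to "averagedist", the minimum to nearest neighbour (like "knn" with `nnk=1`), and the maximum to "fn". The helper `classify_by` and all variable names are hypothetical and written for this example; it is an illustration of the rules as described, not the package's implementation.

```r
# Toy one-dimensional data (made-up values); the last point is unclustered.
x <- c(0, 1, 2, 4, 20, 3.2)
labels <- c(1L, 1L, 1L, 2L, 2L, -1L)

d <- as.matrix(dist(x))          # full dissimilarity matrix
clustered <- which(labels > 0)
new <- which(labels < 0)

# Hypothetical helper: summarise the dissimilarities between each new point
# and the members of every cluster with `f`, then choose the cluster with
# the smallest summary value.
classify_by <- function(f) {
  out <- labels
  for (i in new) {
    score <- tapply(d[i, clustered], labels[clustered], f)
    out[i] <- as.integer(names(score)[which.min(score)])
  }
  out
}

avg <- classify_by(mean)  # minimum average dissimilarity
nn  <- classify_by(min)   # nearest neighbour
fn  <- classify_by(max)   # minimum distance to the farthest neighbour
```

In this toy example the new point at 3.2 is closest to a single member of cluster 2 (the point at 4), so the nearest-neighbour rule picks cluster 2, while the average and farthest-neighbour rules pick cluster 1 because cluster 2 also contains the distant point at 20.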

```
library(fpc)

# Simulate three groups in two dimensions.
set.seed(20000)
x1 <- rnorm(50)
y <- rnorm(100)
x2 <- rnorm(40, mean=20)
x3 <- rnorm(10, mean=25, sd=100)
x <- cbind(c(x1,x2,x3), y)
truec <- c(rep(1,50), rep(2,40), rep(3,10))

# Mark five observations as unclustered (-1) so that they get classified.
topredict <- c(1,2,51,52,91)
clumin <- truec
clumin[topredict] <- -1

classifnp(x, clumin, method="averagedist")
classifnp(x, clumin, method="qda")
classifdist(dist(x), clumin, centroids=c(3,53,93), method="centroid")
classifdist(dist(x), clumin, method="knn")
```