
RclusTool (version 0.91.61)

computeSemiSupervised: Semi-supervised clustering

Description

Perform semi-supervised clustering based on pairwise constraints, with the number of clusters K either supplied by the user or computed automatically.

Usage

computeSemiSupervised(
  data.sample,
  ML,
  CNL,
  K = 0,
  kmax = 20,
  method.name = "Constrained_KM",
  maxIter = 2,
  pca = FALSE,
  pca.nb.dims = 0,
  spec = FALSE,
  use.sampling = FALSE,
  sampling.size.max = 0,
  scaling = FALSE,
  RclusTool.env = initParameters(),
  echo = TRUE
)

Value

The function returns a list containing:

label

vector of labels.

summary

data.frame containing cluster summaries (min, max, sum, average, sd).

nbItems

number of observations.
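The summary component can be pictured with a short base-R sketch. The helper `cluster_summary()` below is hypothetical, not part of RclusTool, and the layout it produces (one row per cluster, one `feature.statistic` column per feature/statistic pair) is an assumption about the shape of such a table, not the package's actual output.

```r
# Hypothetical helper (not part of RclusTool): per-cluster min, max, sum,
# average and sd of every feature, one row per cluster.
cluster_summary <- function(features, label) {
  stats <- function(v) c(min = min(v), max = max(v), sum = sum(v),
                         average = mean(v), sd = sd(v))
  blocks <- split(as.data.frame(features), label)   # one data.frame per cluster
  rows <- lapply(blocks, function(block) unlist(lapply(block, stats)))
  as.data.frame(do.call(rbind, rows))               # row names = cluster labels
}

# Usage: summarise the iris features by species
cluster_summary(iris[, 1:4], iris$Species)
```

A cluster containing a single observation gets `NA` for its standard deviation, as `sd()` does for one value.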

Arguments

data.sample

list containing features, profiles and clustering results.

ML

list of must-link (ML) constrained pairs, given as row names of the features.

CNL

list of cannot-link (CNL) constrained pairs, given as row names of the features.

K

number of clusters. If K=0 (default), this number is computed automatically using the elbow method.

kmax

maximum number of clusters.

method.name

character vector specifying the constrained algorithm to use. Must be 'Constrained_KM' (default) or 'Constrained_SC' (Constrained Spectral Clustering).

maxIter

number of iterations of the semi-supervised algorithm.

pca

boolean: if TRUE, Principal Components Analysis is applied to reduce the data space.

pca.nb.dims

number of principal components kept. If pca.nb.dims=0, this number is computed automatically.

spec

boolean: if TRUE, spectral embedding is applied to reduce the data space.

use.sampling

boolean: if TRUE, data sampling is used; if FALSE (default), it is not.

sampling.size.max

numeric: maximum size of the sampling set.

scaling

boolean: if TRUE, scaling is applied.

RclusTool.env

environment in which data and intermediate results are stored.

echo

boolean: if FALSE, no description is printed in the console (default TRUE).
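To make the meaning of the ML and CNL arguments concrete, the sketch below counts how many constraints a given labelling breaks: a must-link pair is broken when its two points land in different clusters, a cannot-link pair when they land in the same one. It assumes each constraint is stored as a length-2 character vector of feature row names; the exact structure returned by `visualizeSampleClustering()` may differ, and `count_violations()` is a hypothetical helper.

```r
# Assumed representation: each pair is a length-2 character vector of
# feature row names (the real RclusTool structure may differ).
ML  <- list(c("1", "2"), c("2", "3"))
CNL <- list(c("1", "4"))

# Hypothetical helper: number of must-link pairs split across clusters and
# cannot-link pairs placed in the same cluster. `label` is named by row name.
count_violations <- function(label, ML, CNL) {
  same <- function(p) label[[p[1]]] == label[[p[2]]]
  c(ML  = sum(!vapply(ML,  same, logical(1))),
    CNL = sum( vapply(CNL, same, logical(1))))
}

label <- c("1" = 1, "2" = 1, "3" = 2, "4" = 2)
count_violations(label, ML, CNL)  # ML = 1 (pair 2-3 is split), CNL = 0
```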
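When `pca = TRUE` and `pca.nb.dims = 0`, the number of principal components is chosen automatically. The rule RclusTool applies is not stated on this page; a common choice, sketched here purely as an assumption, keeps the smallest number of components whose cumulative explained variance reaches a threshold (95% below). `choose_pca_dims()` is a hypothetical helper built on `stats::prcomp()`.

```r
# Hypothetical rule (not necessarily RclusTool's): smallest number of
# principal components whose cumulative explained variance >= threshold.
choose_pca_dims <- function(x, threshold = 0.95) {
  pc <- prcomp(x, center = TRUE, scale. = FALSE)
  explained <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
  which(explained >= threshold)[1]
}

# Usage: five columns that only span two dimensions
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
x <- cbind(x1, x2, x1 + x2, x1 - x2, 2 * x1)
choose_pca_dims(x)  # 2
```

Centering without scaling mirrors `scaling = FALSE`; scaling the features first would change which components dominate.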
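The `spec` option applies a spectral embedding before clustering. RclusTool's exact construction (kernel, bandwidth, normalization) is not described here; the sketch below follows the common Ng, Jordan and Weiss recipe with a Gaussian kernel whose bandwidth `sigma` is an assumed parameter. `spectral_embedding()` is a hypothetical helper.

```r
# Sketch of a normalized spectral embedding (Ng, Jordan and Weiss style).
# The Gaussian kernel and `sigma` are assumptions, not RclusTool settings.
spectral_embedding <- function(x, n.dims = 2, sigma = 1) {
  d2 <- as.matrix(dist(x))^2
  W <- exp(-d2 / (2 * sigma^2))        # Gaussian affinities
  diag(W) <- 0
  deg <- rowSums(W)                    # must be > 0 (no isolated points)
  M <- W / sqrt(outer(deg, deg))       # D^{-1/2} W D^{-1/2}
  ev <- eigen(M, symmetric = TRUE)     # eigenvalues in decreasing order
  U <- ev$vectors[, seq_len(n.dims), drop = FALSE]
  U / sqrt(rowSums(U^2))               # scale each row to unit length
}

# Usage: embed three well-separated 2-D groups into 3 dimensions
set.seed(1)
pts <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
             matrix(rnorm(100, 2, 0.3), ncol = 2),
             matrix(rnorm(100, 4, 0.3), ncol = 2))
emb <- spectral_embedding(pts, n.dims = 3)
```

The leading eigenvectors of the normalized affinity matrix correspond to the smallest eigenvalues of the normalized graph Laplacian, which is why the first `n.dims` columns are kept.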

Details

computeSemiSupervised performs semi-supervised clustering based on pairwise must-link and cannot-link constraints, with the number of clusters K either fixed by the user or computed automatically (K=0).
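With `K = 0`, the number of clusters is chosen by the elbow method. The exact elbow criterion RclusTool uses is not specified here; the sketch below uses one common variant as an assumption: run `stats::kmeans()` for K = 1, ..., kmax and keep the K whose point on the normalized within-cluster sum-of-squares curve lies farthest from the chord joining the first and last points. `elbow_k()` is a hypothetical helper.

```r
# Hypothetical elbow criterion (not necessarily RclusTool's): maximum
# distance between the normalized WSS curve and the chord from K=1 to K=kmax.
elbow_k <- function(x, kmax = 10, nstart = 10) {
  stopifnot(kmax >= 2)
  k <- seq_len(kmax)
  wss <- vapply(k, function(kk)
    kmeans(x, centers = kk, nstart = nstart, iter.max = 50)$tot.withinss,
    numeric(1))
  kx <- (k - 1) / (kmax - 1)                      # K rescaled to [0, 1]
  wy <- (wss - min(wss)) / (max(wss) - min(wss))  # WSS rescaled to [0, 1]
  # distance from (kx, wy) to the line through (0, wy[1]) and (1, wy[kmax])
  dy <- wy[kmax] - wy[1]
  dist.to.chord <- abs(dy * kx - wy + wy[1]) / sqrt(dy^2 + 1)
  which.max(dist.to.chord)
}

# Usage: three well-separated groups
set.seed(1)
pts <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
             matrix(rnorm(100, 2, 0.3), ncol = 2),
             matrix(rnorm(100, 4, 0.3), ncol = 2))
elbow_k(pts, kmax = 10)
```

Rescaling both axes makes the distance independent of the units of the features and of `kmax`.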
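The `Constrained_KM` method combines K-means with the ML and CNL constraints. How RclusTool enforces them is not detailed on this page; as an illustration only, the sketch below implements COP-KMeans (Wagstaff et al., 2001), a classic constrained K-means in which each point goes to the nearest centre that does not break a constraint with an already assigned point. `cop_kmeans()` is hypothetical, and for brevity it takes constraints as integer row indices rather than row names.

```r
# Illustrative COP-KMeans (not necessarily RclusTool's Constrained_KM).
# ML and CNL: lists of length-2 integer row indices (assumed format).
cop_kmeans <- function(x, k, ML = list(), CNL = list(), max.iter = 50) {
  x <- as.matrix(x)
  n <- nrow(x)
  centers <- x[sample.int(n, k), , drop = FALSE]
  partner <- function(p, i) if (p[1] == i) p[2] else if (p[2] == i) p[1] else NA
  # TRUE if putting point i in cluster cl breaks a constraint with a point
  # that already has a label in `lab`
  breaks <- function(i, cl, lab) {
    for (p in ML) {
      j <- partner(p, i)
      if (!is.na(j) && !is.na(lab[j]) && lab[j] != cl) return(TRUE)
    }
    for (p in CNL) {
      j <- partner(p, i)
      if (!is.na(j) && !is.na(lab[j]) && lab[j] == cl) return(TRUE)
    }
    FALSE
  }
  label <- rep(NA_integer_, n)
  for (iter in seq_len(max.iter)) {
    new <- rep(NA_integer_, n)
    for (i in seq_len(n)) {
      d <- colSums((t(centers) - x[i, ])^2)  # squared distance to each centre
      for (cl in order(d)) {                  # nearest feasible centre first
        if (!breaks(i, cl, new)) { new[i] <- cl; break }
      }
      if (is.na(new[i])) stop("no feasible cluster for point ", i)
    }
    if (identical(new, label)) break          # assignments are stable
    label <- new
    for (cl in seq_len(k)) if (any(label == cl))
      centers[cl, ] <- colMeans(x[label == cl, , drop = FALSE])
  }
  label
}

# Usage: three groups with a few must-link and cannot-link pairs
set.seed(2)
pts <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
             matrix(rnorm(100, 2, 0.3), ncol = 2),
             matrix(rnorm(100, 4, 0.3), ncol = 2))
lab <- cop_kmeans(pts, k = 3, ML = list(c(1, 2), c(51, 52)),
                  CNL = list(c(1, 51), c(51, 101)))
table(lab)
```

COP-KMeans satisfies every constraint by construction but can stop with no feasible assignment when the constraints are contradictory; soft-constraint variants trade some violations for robustness.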

See Also

computeCKmeans, computeCSC, KwaySSSC

Examples

# \donttest{
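library(RclusTool)

# Simulate three groups of 50 two-dimensional points
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))

# Write the features to a temporary CSV file, then import them as a sample
tf <- tempfile()
write.table(dat, tf, sep = ",", dec = ".")
x <- importSample(file.features = tf)

# Select must-link and cannot-link pairs interactively in the viewer
pairs.abs <- visualizeSampleClustering(x, selection.mode = "pairs",
                                       profile.mode = "whole sample", wait.close = TRUE)

# Constrained K-means, with K chosen automatically (K = 0)
res.ckm <- computeSemiSupervised(x, ML = pairs.abs$ML, CNL = pairs.abs$CNL, K = 0)
plot(dat[, 1], dat[, 2], type = "p", xlab = "x", ylab = "y",
     col = res.ckm$label, main = "Constrained K-means clustering")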


# }
