fpc (version 2.1-6)

cluster.varstats: Variablewise statistics for clusters

Description

This function gives some helpful variable-wise information for cluster interpretation, given a clustering and a data set. The output object contains some tables. For categorical variables, tables compare clusterwise distributions with overall distributions. Continuous variables are categorised for this.

If desired, tables, histograms, some standard statistics of continuous variables, and validation plots as available through discrproj (Hennig 2004) are printed or plotted on the fly.

Usage

cluster.varstats(clustering, vardata, contdata=vardata,
                 clusterwise=TRUE,
                 tablevar=NULL, catvar=NULL,
                 quantvar=NULL, catvarcats=10,
                 proportions=FALSE,
                 projmethod="none", minsize=ncol(contdata)+2,
                 ask=TRUE, rangefactor=1)

## S3 method for class 'varwisetables'
print(x, digits=3, ...)

Arguments

clustering
vector of integers. Clustering (needs to be in standard coding, 1,2,...).
vardata
data matrix or data frame whose variables are summarised.
contdata
variable matrix or data frame, normally all or some variables from vardata, on which cluster visualisation by projection methods is performed unless projmethod="none". It should make sense to interpret these variables
clusterwise
logical. If FALSE, only the output tables are computed; no further details or graphs are produced on the fly.
tablevar
vector of integers. Numbers of variables treated as categorical (i.e., no histograms and statistics, just tables) if clusterwise=TRUE. Note that an error will be produced by factor type variables unless they are declared as ca
catvar
vector of integers. Numbers of variables to be categorised by proportional quantiles for table computation. Recommended for all continuous variables.
quantvar
vector of integers. Numbers of variables for which means, standard deviations and quantiles should be printed if clusterwise=TRUE.
catvarcats
integer. Number of categories used for categorisation of variables specified in catvar.
proportions
logical. If TRUE, output tables contain proportions, otherwise numbers of observations.
projmethod
one of "none", "dc", "bc", "vbc", "mvdc", "adc", "awc" (recommended if not "none"), "arc", "nc", "wnc",
minsize
integer. Projection is not carried out for clusters with fewer points than this. (If this is chosen smaller, it may lead to errors with some projection methods.)
ask
logical. If TRUE, par(ask=TRUE) is set at the beginning to prompt the user before each plot, and par(ask=FALSE) is set at the end.
rangefactor
numeric. Factor by which to multiply the range for projection plot ranges.
x
an object of class "varwisetables", output object of cluster.varstats.
digits
integer. Number of digits after the decimal point to print out.
...
not used.
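
To illustrate what categorising a continuous variable by quantiles (as described for catvar and catvarcats) can look like, here is a minimal base R sketch. The data and the names x, k and xcat are hypothetical, and this is not necessarily how cluster.varstats implements the step internally.

```r
# Hypothetical data: 100 draws from a skewed continuous distribution
set.seed(1)
x <- rexp(100)
k <- 10  # plays the role of catvarcats in this sketch

# Break points at the 0%, 10%, ..., 100% empirical quantiles
breaks <- quantile(x, probs = (0:k) / k)

# include.lowest = TRUE keeps the minimum value in the first category
xcat <- cut(x, breaks = breaks, labels = 1:k, include.lowest = TRUE)

# Each category holds roughly the same number of observations
print(table(xcat))
```

Quantile-based breaks give categories of roughly equal size, which keeps the resulting tables readable even for skewed variables.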

Value

  • An object of class "varwisetables", which is a list with a table for each variable, giving (categorised) marginal distributions by cluster.
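
The idea of comparing a variable's distribution within each cluster with its overall distribution can be sketched in base R as follows. The clustering vector and the colour variable are made up for illustration, and the layout of the tables returned by cluster.varstats may differ from this sketch.

```r
# Hypothetical clustering (standard coding 1, 2, ...) and one categorical variable
clustering <- c(1, 1, 1, 2, 2, 2, 2, 1)
colour <- factor(c("red", "red", "blue", "blue", "blue", "red", "blue", "red"))

# Number of observations in each category, by cluster
counts <- table(cluster = clustering, colour = colour)

# Row-wise proportions: distribution of the variable within each cluster
within <- prop.table(counts, margin = 1)

# Overall distribution, to compare against each cluster's row
overall <- prop.table(table(colour))

print(counts)
print(round(within, 3))
print(round(overall, 3))
```

A cluster whose row differs markedly from the overall distribution is characterised by that variable, which is the kind of reading these tables are meant to support.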

References

Hennig, C. (2004) Asymmetric linear dimension reduction for classification. Journal of Computational and Graphical Statistics 13, 930-945.

Examples

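library(fpc)      # cluster.varstats, discrete.recode, lcmixed
library(MASS)     # Cars93 data set
library(flexmix)  # flexmix

set.seed(112233)
data(Cars93)
# Two categorical variables (columns 1, 4) and two continuous ones (columns 2, 3)
Cars934 <- Cars93[, c(3, 5, 8, 10)]
cc <- discrete.recode(Cars934, xvarsorted=FALSE,
                      continuous=c(2, 3), discrete=c(1, 4))
# Latent class mixture model with two clusters
fcc <- flexmix(cc$data ~ 1, k=2,
               model=lcmixed(continuous=2, discrete=2,
                             ppdim=c(6, 3), diagonal=TRUE))
# Variable-wise tables, statistics and projection plots for the fitted clustering
cv <- cluster.varstats(fcc@cluster, Cars934, contdata=Cars934[, c(2, 3)],
                       tablevar=c(1, 4), catvar=c(2, 3), quantvar=c(2, 3),
                       projmethod="awc", ask=FALSE)
print(cv)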
