seqpropclust: Monothetic clustering of state sequences

Description

Monothetic divisive clustering of the data using object properties. For state sequence objects, several sets of properties are automatically extracted.

Usage

seqpropclust(seqdata, diss, properties = c("state", "duration", "spell.age", 
		"spell.dur", "transition", "pattern", "AFtransition", "AFpattern", 
		"Complexity"), other.prop = NULL, prop.only = FALSE, pmin.support = 0.05, 
		max.k = -1, with.missing = TRUE, R = 1, weight.permutation = "diss", 
		min.size = 0.01, max.depth = 5, maxcluster = NULL, ...)
		
wcPropertyClustering(diss, properties, maxcluster = NULL, ...)
dtcut(st, k, labels = TRUE)

Arguments

seqdata

State sequence object (see seqdef).

diss

a dissimilarity matrix or a dist object.

properties

Character or data.frame. In seqpropclust, it can be a character vector naming the properties to extract from seqdata. It can also be a data.frame specifying the properties to use for the clustering.

other.prop

data.frame. Additional properties to be considered to cluster the sequences.

prop.only

Logical. If TRUE, the function returns a data.frame containing the extracted properties (without clustering the data).

pmin.support

Numeric. Minimum support (as a proportion of sequences). See seqefsub.

max.k

Numeric. The maximum number of events allowed in a subsequence. See seqefsub.

with.missing

Logical. If TRUE, properties of missing spells are also extracted.

R

Number of permutations used to assess the significance of the split. See disstree.

weight.permutation

Weight permutation method: "diss" (attach weights to the dissimilarity matrix), "replicate" (replicate cases using weights), "rounded-replicate" (replicate cases using rounded weights), "random-sampling" (random assignment of covariate profiles to the objects using distributions defined by the weights). See disstree.

min.size

Minimum number of cases in a node; treated as a proportion if less than 1. See disstree.

max.depth

Maximum depth of the tree. See disstree.

maxcluster

Maximum number of clusters to consider.

st

A divisive clustering tree as produced by seqpropclust.

k

The number of groups to extract.

labels

Logical. If TRUE, the rules used to assign an object to a cluster are used to label the clusters (instead of numbers).

…

Arguments passed to/from other methods.
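
As an illustration of the prop.only argument described above, the following sketch extracts the properties without clustering. It assumes that the TraMineR and WeightedCluster packages are installed and reuses the mvad data from the Examples section; the object name props is only illustrative.

```r
library(TraMineR)
library(WeightedCluster)

data(mvad)
mvad.seq <- seqdef(mvad[1:100, 17:86])
diss <- seqdist(mvad.seq, method = "HAM")

## Extract the "state" and "duration" properties only (no clustering)
props <- seqpropclust(mvad.seq, diss = diss, prop.only = TRUE,
                      properties = c("state", "duration"))

## One row per sequence, one column per extracted property
dim(props)
head(names(props))
```

A data.frame of this kind could in principle be edited and passed back through the properties or other.prop arguments.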

Value

Returns a seqpropclust object, which is in fact a disstree object. See disstree.

Details

The method implements the DIVCLUS-T algorithm.
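
To give an idea of what a single monothetic split looks like, here is a minimal base R sketch. It is not the package's implementation: the helper names group_ss and best_monothetic_split are hypothetical, and the criterion (share of discrepancy explained, computed on unsquared dissimilarities) is an assumption made for illustration. Each binary property is tried in turn, and the one that best separates the objects according to the dissimilarity matrix is kept.

```r
## Within-group discrepancy: sum of pairwise dissimilarities
## (each pair counted once) divided by the group size.
group_ss <- function(d, idx) {
  if (length(idx) < 2) return(0)
  sub <- d[idx, idx, drop = FALSE]
  sum(sub) / (2 * length(idx))
}

## Hypothetical helper: choose the binary property whose two groups
## leave the smallest within-group discrepancy.
best_monothetic_split <- function(diss, props) {
  d <- as.matrix(diss)
  total <- group_ss(d, seq_len(nrow(d)))
  r2 <- vapply(props, function(p) {
    left <- which(p)
    right <- which(!p)
    ## A property shared by all or by no object cannot split the data
    if (length(left) == 0 || length(right) == 0) return(NA_real_)
    1 - (group_ss(d, left) + group_ss(d, right)) / total
  }, numeric(1))
  list(property = names(r2)[which.max(r2)], r2 = r2)
}

## Toy data: two well-separated groups on a line
x <- c(0, 1, 2, 10, 11, 12)
diss <- dist(x)
props <- data.frame(
  high = x > 5,        # separates the two groups exactly
  odd  = x %% 2 == 1   # unrelated to the group structure
)

res <- best_monothetic_split(diss, props)
res$property
res$r2
```

A divisive procedure would repeat this search within each resulting group, so that every cluster is defined by a short sequence of rules on single properties.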

References

Studer, M. (2018). Divisive property-based and fuzzy clustering for sequence analysis. In G. Ritschard and M. Studer (Eds.), Sequence Analysis and Related Approaches: Innovative Methods and Applications, Life Course Research and Social Policies. Springer.

Piccarreta R, Billari FC (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(4), 1061-1078.

Chavent M, Lechevallier Y, Briant O (2007). DIVCLUS-T: A monothetic divisive hierarchical clustering method. Computational Statistics & Data Analysis, 52(2), 687-701.

Examples

# NOT RUN {
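## Load the packages (mvad, seqdef and seqdist come from TraMineR)
library(TraMineR)
library(WeightedCluster)

data(mvad)
mvad.seq <- seqdef(mvad[1:100, 17:86])

## Compute dissimilarities using the Hamming distance
diss <- seqdist(mvad.seq, method="HAM")

## Property-based clustering using the "state" and "duration" properties
pclust <- seqpropclust(mvad.seq, diss=diss, maxcluster=5, properties=c("state", "duration"))

## Run it to visualize the results
##seqtreedisplay(pclust, type="d", border=NA, showdepth=TRUE)

## Cluster quality for the solutions retained from the tree
pclustqual <- as.clustrange(pclust, diss=diss, ncluster=5)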
# }
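
The example above does not show dtcut. The following sketch, which assumes the TraMineR and WeightedCluster packages are installed, extracts a three-group solution from the tree, once with numeric labels and once with rule-based labels. The object names cl3 and cl3.rules are illustrative, and the exact form of the returned labels is not documented here, so the results are only tabulated.

```r
library(TraMineR)
library(WeightedCluster)

data(mvad)
mvad.seq <- seqdef(mvad[1:100, 17:86])
diss <- seqdist(mvad.seq, method = "HAM")
pclust <- seqpropclust(mvad.seq, diss = diss, maxcluster = 5,
                       properties = c("state", "duration"))

## Membership in three groups, labelled by numbers
cl3 <- dtcut(pclust, k = 3, labels = FALSE)
table(cl3)

## Same partition, labelled by the rules defining each group
cl3.rules <- dtcut(pclust, k = 3, labels = TRUE)
table(cl3.rules)
```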
