This is the main R wrapper function for the 'segmenTier'
segmentation algorithm. It takes an ordered sequence of cluster
labels and returns segments of consistent clusterings, where
cluster-cluster or cluster-position similarities are
maximal. Its main input (argument seq) is either a
"clustering" object returned by clusterTimeseries
(scenario I) or an integer vector of cluster labels (scenario
II). The function then runs the dynamic programming algorithm
(calculateScore) for a selected scoring function
and a corresponding cluster similarity matrix, followed by the
back-tracing step (backtrace) to find segment
borders.
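The two steps named above can be illustrated with a small base-R sketch. It is not the package's calculateScore or backtrace code: the function name segment_sketch, the argument penalty, and the scoring rule (summed similarity of each position to the segment's cluster, minus a fixed penalty per segment) are assumptions made for illustration, and the sketch handles neither a nuisance cluster nor the package's other parameters.

```r
# Simplified dynamic programming segmentation (illustrative assumption,
# NOT the segmenTier implementation). 'seq' holds cluster labels 1..C,
# 'csim' is a C x C similarity matrix, 'penalty' is charged per segment.
segment_sketch <- function(seq, csim, penalty = 2) {
  N <- length(seq)
  C <- nrow(csim)
  # cum[i + 1, cl] = summed similarity of positions 1..i to cluster cl
  cum <- rbind(0, matrix(apply(csim[seq, , drop = FALSE], 2, cumsum), ncol = C))
  S <- matrix(-Inf, N + 1, C)     # S[i + 1, cl]: best score for prefix 1..i
  S[1, ] <- 0                     # empty prefix
  K <- matrix(NA_integer_, N, C)  # start of the last segment
  P <- matrix(NA_integer_, N, C)  # cluster of the preceding segment
  for (i in 1:N) for (cl in 1:C) for (j in 1:i) {
    prev <- S[j, ]                # best scores for prefix 1..(j - 1)
    if (j > 1) prev[cl] <- -Inf   # neighbouring segments must differ
    d <- which.max(prev)
    val <- prev[d] + cum[i + 1, cl] - cum[j, cl] - penalty
    if (val > S[i + 1, cl]) {
      S[i + 1, cl] <- val
      K[i, cl] <- j
      P[i, cl] <- d
    }
  }
  # back-tracing: follow the stored start positions from the end
  i <- N
  cl <- which.max(S[N + 1, ])
  segs <- NULL
  while (i >= 1) {
    j <- K[i, cl]
    segs <- rbind(c(cl, j, i), segs)
    d <- P[i, cl]
    i <- j - 1
    cl <- d
  }
  colnames(segs) <- c("cluster", "start", "end")  # illustrative names
  segs
}

# Toy input: position 6 is a single outlier inside a run of cluster 2.
cls <- c(1, 1, 1, 2, 2, 1, 2, 2, 3, 3, 3)
csim <- matrix(-2, 3, 3)
diag(csim) <- 1
segs <- segment_sketch(cls, csim, penalty = 2)
# three segments: cluster 1 (1-3), cluster 2 (4-8), cluster 3 (9-11)
```

With this toy scoring, the per-segment penalty makes it cheaper to absorb the single outlier at position 6 into the surrounding segment than to open two extra segments.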
The main result, list item "segments" of the returned
object, is a 3-column matrix, where column 1 is the cluster
assignment and columns 2 and 3 are start and end indices of the
segments. For the batch function segmentCluster.batch,
the "segments" item is a data.frame containing
additional information; see ?segmentCluster.batch.
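As a small illustration of the 3-column layout described above, the following base-R sketch uses a hand-made matrix (hypothetical values, not package output) to derive segment lengths and per-position labels. Marking uncovered positions with 0 is a choice made for this sketch only.

```r
# Hypothetical matrix in the described layout:
# column 1 = cluster, column 2 = start index, column 3 = end index.
segs <- matrix(c(1, 1, 3,
                 2, 4, 8,
                 3, 9, 11), ncol = 3, byrow = TRUE)

# segment lengths from start and end indices
seg_len <- segs[, 3] - segs[, 2] + 1

# expand to one label per position; uncovered positions remain 0
N <- 12
labels <- integer(N)
for (k in seq_len(nrow(segs))) {
  labels[segs[k, 2]:segs[k, 3]] <- segs[k, 1]
}
```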
As shown in the publication, the parameters M, E
and nui have the strongest impact on the resulting
segment borders. Other parameters can be fine-tuned but had
little impact on our test data set.
In the default and tested scenario I, when the input is an object
of class "clustering" produced by clusterTimeseries,
the cluster-cluster and cluster-position similarity matrices are
already provided by this object.
In scenario II, intended for custom use, argument seq can
be a simple clustering vector, where a nuisance cluster must be
indicated by cluster label "0" (zero). The cluster-cluster or
cluster-position similarities MUST be provided (argument
csim) for scoring functions "ccor" and "icor",
respectively. For the simplest scoring function "ccls", a uniform
cluster similarity matrix is constructed from arguments a
and nui, with cluster self-similarities of 1,
"dissimilarities" between different clusters using argument
a<0, and nuisance cluster self-similarity of -a.
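The uniform matrix described for "ccls" can be sketched in base R as follows. The helper uniform_csim is hypothetical and not part of the package. The diagonal, off-diagonal and nuisance self-similarity entries follow the description above; the entries between the nuisance cluster and the real clusters are not specified there and are set to -nui purely as an assumption, which should be checked against the package source.

```r
# Hypothetical helper (not a segmenTier function): uniform similarity
# matrix for K clusters plus the nuisance cluster with label "0".
uniform_csim <- function(K, a = -2, nui = 1) {
  stopifnot(K >= 1, a < 0)
  labels <- as.character(0:K)
  csim <- matrix(a, nrow = K + 1, ncol = K + 1,
                 dimnames = list(labels, labels))  # different clusters: a
  diag(csim) <- 1          # cluster self-similarity of 1
  csim["0", ] <- -nui      # ASSUMPTION: nuisance vs. real clusters
  csim[, "0"] <- -nui      # ASSUMPTION: kept symmetric
  csim["0", "0"] <- -a     # nuisance self-similarity of -a
  csim
}

csim <- uniform_csim(K = 3, a = -2, nui = 1)
```

Row and column names carry the cluster labels, so individual entries can be inspected by label, for example csim["0", "0"] or csim["1", "2"].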
The function returns a list (class "segments") comprising the
main result (list item "segments"), "warnings" from the dynamic
programming and backtracing phases, the similarity matrix
csim that was used, extended by the nuisance cluster, and
optionally (see option save.matrix) the scoring vectors S1(i,c),
the total score matrix S(i,c) and the backtracing matrix
K(i,c) for analysis of algorithm performance on novel data
sets. Additional convenience data is reported, such as cluster
colors and sortings if argument seq was of class
'clustering'. These allow for convenient inspection of all data
processing steps with the plot methods. A plot method exists that
plots segments aligned to "timeseries" and "clustering"
plots.