segmenTier's core dynamic programming routine in Rcpp
calculateScore(seq, C, score, csim, M, Mn, multi = "max")
seq: the cluster sequence (where clusters at positions k:i are considered). Note that, unlike in the R wrapper, cluster numbers here are 0-based, where 0 is the nuisance cluster.
C: the list of clusters, including nuisance cluster '0'; see seq
score: the scoring function to be used, one of "ccor" or "icor"; an apt similarity matrix must be supplied via option csim
csim: a matrix providing either the cluster-cluster similarities (scoring function "ccor") or the position-cluster similarities (scoring function "icor")
M: minimal sequence length. Note that this is not a strict cut-off but is defined as an accumulating penalty that must be "overcome" by a good score
Mn: minimal sequence length for the nuisance cluster; Mn<M will allow shorter distances between segments
multi: if multiple k are found which return the same maximal score, should the "max" (shorter segment) or "min" (longer segment) be used? This has little effect on real-life large data sets, since the situation will rarely occur. Default is "max".
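The tie-breaking rule for multiple equally good start positions k can be illustrated with a small Python sketch. This is not the package's Rcpp code; the function name pick_k and the dictionary input are hypothetical and chosen only for illustration.

```python
def pick_k(scores, multi="max"):
    """Return the start position k with the highest score.

    scores maps each candidate start position k to its score
    (hypothetical input format, not the package's data structure).
    On ties, multi="max" picks the largest k (shorter segment),
    multi="min" picks the smallest k (longer segment).
    """
    best = max(scores.values())
    tied = [k for k, s in scores.items() if s == best]
    return max(tied) if multi == "max" else min(tied)


# positions 3 and 5 share the maximal score
candidates = {2: 1.0, 3: 5.0, 5: 5.0}
k_short = pick_k(candidates, "max")
k_long = pick_k(candidates, "min")
```

With a fixed end position i, a larger k gives the shorter segment k:i, which is why "max" corresponds to the shorter segment.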
Returns the total score matrix S(i,c) and the matrix K(i,c), which stores the position k that delivered the maximal score at position i. K(i,c) is used in the back-tracing phase.
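How K(i,c) could be used in a back-tracing phase is sketched below in Python. This page does not describe the back-tracing procedure itself, so the rule used here is an assumption: start at the last position, take the cluster with the highest S, read its start position from K, and continue from the position before that start. The function name backtrace and the nested-list inputs are hypothetical.

```python
def backtrace(S, K):
    """Recover segments (start, end, cluster) from S and K.

    S[i][c] and K[i][c] are nested lists with 0-based positions i.
    Assumed rule (not taken from this page): at each end position i
    choose the best-scoring cluster, then jump to K[i][c] - 1.
    Requires K[i][c] <= i for all entries.
    """
    segments = []
    i = len(S) - 1
    while i >= 0:
        # cluster with the highest total score at end position i
        c = max(range(len(S[i])), key=lambda d: S[i][d])
        k = K[i][c]
        segments.append((k, i, c))
        i = k - 1
    segments.reverse()
    return segments


# hand-made toy matrices for 4 positions and 2 clusters
S = [[0, 1], [0, 2], [3, 0], [4, 1]]
K = [[0, 0], [0, 0], [2, 0], [2, 0]]
segs = backtrace(S, K)
```

In this toy input the walk first finds a segment of cluster 0 ending at the last position, then a segment of cluster 1 covering the first two positions.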
This is segmenTier's core dynamic programming routine. It constructs the total score matrix S(i,c), based on the passed scoring function ("icor" or "ccor") and the length penalty M. "Nuisance" cluster "0" can have a smaller penalty Mn to allow for shorter distances between "real" segments.
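The construction of S(i,c) and K(i,c) can be sketched in plain Python. This is a minimal illustration, not the Rcpp implementation, and it rests on two assumptions that this page does not spell out: (1) the recursion S(i,c) = max over k <= i of [max over c' != c of S(k-1,c')] + score(k,i,c), and (2) the penalty (M, or Mn for cluster 0) is subtracted once per candidate segment. The function name total_score and the precomputed table sim are hypothetical.

```python
def total_score(sim, M, Mn, multi="max"):
    """Fill S(i,c) and K(i,c) from a position-by-cluster table.

    sim[j][c]: similarity of position j to cluster c (0 = nuisance);
    at least two clusters are required.
    Assumed segment score: sum(sim[k..i][c]) - penalty, where the
    penalty is Mn for the nuisance cluster and M otherwise.
    """
    n, n_cl = len(sim), len(sim[0])
    S = [[0.0] * n_cl for _ in range(n)]
    K = [[0] * n_cl for _ in range(n)]
    for i in range(n):
        for c in range(n_cl):
            penalty = Mn if c == 0 else M
            best, best_k = None, 0
            seg = 0.0
            # extend the candidate segment k:i leftwards, k = i .. 0
            for k in range(i, -1, -1):
                seg += sim[k][c]
                prev = 0.0
                if k > 0:
                    # best score of a different cluster ending at k-1
                    prev = max(S[k - 1][d] for d in range(n_cl) if d != c)
                total = prev + seg - penalty
                better = best is None or total > best
                # on ties, "min" moves on to the smaller k (longer segment)
                tie = best is not None and total == best and multi == "min"
                if better or tie:
                    best, best_k = total, k
            S[i][c], K[i][c] = best, best_k
    return S, K


# toy input: every position resembles cluster 1, not the nuisance cluster
S, K = total_score([[-1, 1]] * 4, M=1, Mn=1)
```

The triple loop with the inner maximum is quadratic in sequence length and is written for readability only; a production routine would avoid recomputing the inner maximum.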
Scoring function "icor" calculates the sum of similarities of the data at positions k:i to cluster centers c, over all k and i. The similarities are calculated, e.g., as a (Pearson) correlation between the data at an individual position and the center of the tested cluster c.
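As an illustration of the "icor" idea, the following Python sketch sums Pearson correlations between the data at positions k..i and one cluster center. The names pearson and icor_sum, and the list-of-lists data layout, are hypothetical; the length penalty is left out, and rows with zero variance are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def icor_sum(data, center, k, i):
    """Sum of position-to-center correlations over positions k..i (inclusive)."""
    return sum(pearson(data[j], center) for j in range(k, i + 1))


# three positions with three measurements each (made-up values)
data = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
center = [1, 2, 3]
score_01 = icor_sum(data, center, 0, 1)  # two positions that follow the center
score_02 = icor_sum(data, center, 0, 2)  # third position is anti-correlated
```

In the routine documented here, such position-cluster similarities are expected to be supplied precomputed via csim rather than computed on the fly.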
Scoring function "ccor" calculates the sum of similarities of the clusters at positions k:i to cluster c, over all k and i.
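The "ccor" sum for a single candidate segment can be written out in a few lines of Python. The function name ccor_sum and the example similarity values are hypothetical, and the length penalty is not included in this sketch.

```python
def ccor_sum(seq, csim, c, k, i):
    """Sum of cluster-cluster similarities csim[seq[j]][c] for j = k..i.

    seq holds 0-based cluster labels (0 = nuisance) and csim is a
    square cluster-by-cluster similarity table.
    """
    return sum(csim[seq[j]][c] for j in range(k, i + 1))


# made-up similarities between nuisance cluster 0 and clusters 1 and 2
csim = [
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, 0.5],
    [-1.0, 0.5, 1.0],
]
seq = [1, 1, 2, 0]
score_cluster1 = ccor_sum(seq, csim, c=1, k=0, i=2)  # 1.0 + 1.0 + 0.5
```

Because similar clusters contribute positive values, a segment of cluster 1 can extend over a position labelled with the similar cluster 2 without losing score.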
Scoring function "ccls" is a special case of "ccor" and is NOT handled here, but is reflected in the cluster similarity matrix csim. It is handled and automatically constructed in the R wrapper segmentClusters, and merely counts the number of clusters in sequence k:i, over all k and i, that are identical to the tested cluster c, and subtracts a penalty for the count of non-identical clusters.
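One way to picture how "ccls" reduces to "ccor" is a similarity matrix with 1 on the diagonal and a negative value everywhere else. The Python sketch below is an illustration under that assumption only: the names ccls_matrix and segment_sum and the mismatch value -2 are hypothetical, and the nuisance cluster is treated like any other cluster, which may differ from what segmentClusters actually constructs.

```python
def ccls_matrix(n_clusters, mismatch=-2.0):
    """Similarity table: 1 for identical clusters, mismatch otherwise.

    The mismatch value is a hypothetical penalty chosen for illustration.
    """
    return [
        [1.0 if a == b else mismatch for b in range(n_clusters)]
        for a in range(n_clusters)
    ]


def segment_sum(seq, csim, c, k, i):
    """Sum csim[seq[j]][c] over j = k..i, as in a "ccor"-style score."""
    return sum(csim[seq[j]][c] for j in range(k, i + 1))


csim = ccls_matrix(3)
seq = [1, 1, 2, 1]
# three positions match cluster 1 (+3), one does not (-2)
score = segment_sum(seq, csim, c=1, k=0, i=3)
```

With such a matrix the sum equals the count of identical clusters minus the penalty times the count of non-identical ones, which matches the counting description above.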
Machne, Murray & Stadler (2017) <doi:10.1038/s41598-017-12401-8>