calculateScore: segmenTier's core dynamic programming routine in Rcpp

Description

segmenTier's core dynamic programming routine in Rcpp

Usage

calculateScore(seq, C, score, csim, M, Mn, multi = "max")

Arguments

seq

the cluster sequence (where clusters at positions k:i are considered). Note that, unlike in the R wrapper, cluster numbers here are 0-based, where 0 is the nuisance cluster.

C

the list of clusters, including the nuisance cluster '0'; see seq

score

the scoring function to be used, one of "ccor" or "icor"; a matching similarity matrix must be supplied via argument csim

csim

a matrix providing either the cluster-cluster similarities (scoring function "ccor") or the position-cluster similarities (scoring function "icor")

M

minimal sequence length. Note that this is not a strict cut-off but is defined as an accumulating penalty that must be "overcome" by a good score

Mn

minimal sequence length for the nuisance cluster; Mn<M will allow shorter distances between segments

multi

if multiple k are found which return the same maximal score, should the "max" (shorter segment) or "min" (longer segment) be used? This has little effect on real-life large data sets, since the situation will rarely occur. Default is "max".
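The tie-breaking rule can be illustrated in a few lines of base R. This is only a sketch of the idea, not the Rcpp code; the score vector sc and the variable names are made up for the example.

```r
# Hypothetical scores for candidate segment starts k = 1..5, with the
# segment end i fixed; k = 2 and k = 4 tie for the maximal score
sc <- c(0.5, 2.0, 1.0, 2.0, -1.0)

ties <- which(sc == max(sc))  # all k that deliver the maximal score

k_max <- max(ties)  # multi = "max": larger k, i.e. the shorter segment k:i
k_min <- min(ties)  # multi = "min": smaller k, i.e. the longer segment k:i
```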

Value

Returns the total score matrix S(i,c) and the matrix K(i,c) which stores the position k which delivered the maximal score at position i. This is used in the back-tracing phase.

Details

This is segmenTier's core dynamic programming routine. It constructs the total score matrix S(i,c) based on the passed scoring function ("icor" or "ccor") and the length penalty M. The "nuisance" cluster "0" can have a smaller penalty Mn to allow for shorter distances between "real" segments.
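The general shape of such a recursion can be sketched in plain R. This is a simplified illustration under stated assumptions, not the Rcpp implementation: the function name dp_sketch is hypothetical, the penalty M is subtracted once per segment, the nuisance cluster and Mn are ignored, at least two clusters are required, and ties are resolved in favour of the smallest k.

```r
# Simplified sketch of a segmentation DP; NOT the segmenTier Rcpp code.
# sim: N x nC matrix, sim[j, cl] = similarity of position j to cluster cl
# M:   penalty subtracted once per segment (assumption of this sketch)
dp_sketch <- function(sim, M) {
  N <- nrow(sim)
  nC <- ncol(sim)            # must be >= 2
  S <- matrix(-Inf, N, nC)   # total score S(i, cl)
  K <- matrix(1L, N, nC)     # start k of the best segment ending at i
  for (i in seq_len(N)) {
    for (cl in seq_len(nC)) {
      for (k in seq_len(i)) {
        # best total score of any OTHER cluster ending at k - 1
        # (0 if the segment starts at the first position)
        prev <- if (k == 1) 0 else max(S[k - 1, -cl])
        sc <- prev + sum(sim[k:i, cl]) - M
        if (sc > S[i, cl]) {   # strict '>' keeps the smallest k on ties
          S[i, cl] <- sc
          K[i, cl] <- k
        }
      }
    }
  }
  list(S = S, K = K)
}

# Toy input: positions 1-3 resemble cluster 1, positions 4-6 cluster 2
sim <- rbind(c(1, -1), c(1, -1), c(1, -1),
             c(-1, 1), c(-1, 1), c(-1, 1))
res <- dp_sketch(sim, M = 1)
res$K[6, 2]  # start of the best cluster-2 segment ending at position 6
```

In this toy case the best cluster-2 segment ending at position 6 starts at position 4, directly after the cluster-1 segment spanning positions 1 to 3.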

Scoring function "icor" calculates the sum of similarities of data at positions k:i to cluster centers c over all k and i. The similarities are calculated, e.g., as a (Pearson) correlation between the data at individual positions and the center of the tested cluster c.
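As a rough illustration of the "icor" idea, the following base R sketch builds a position-cluster similarity matrix from Pearson correlations and sums it over a candidate segment. The toy data, the cluster centers, and the helper icor_score are assumptions made for the example and are not part of segmenTier.

```r
# Hypothetical toy data: 5 positions (rows) x 3 measurements (columns)
dat <- rbind(c(1, 2, 3), c(2, 4, 7), c(3, 2, 1), c(1, 3, 4), c(5, 3, 2))
# Hypothetical cluster centers: 2 clusters x 3 measurements
centers <- rbind(c(1, 2, 3), c(3, 2, 1))

# Position-cluster similarity: Pearson correlation of each position's
# data with each cluster center, giving a 5 x 2 matrix
csim <- cor(t(dat), t(centers))

# Sum of similarities of positions k:i to cluster cl
icor_score <- function(csim, k, i, cl) sum(csim[k:i, cl])

icor_score(csim, 1, 2, 1)  # positions 1:2 scored against cluster 1
```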

Scoring function "ccor" calculates the sum of similarities between the clusters at positions k:i to cluster c over all k and i.
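A minimal base R sketch of the "ccor" sum is given here. The similarity values, the cluster sequence cls, and the helper ccor_score are hypothetical; the sketch uses R's 1-based indexing, whereas the Rcpp routine expects 0-based cluster labels.

```r
# Hypothetical cluster-cluster similarity matrix for 3 clusters
csim <- rbind(c( 1.0,  0.2, -0.5),
              c( 0.2,  1.0, -0.1),
              c(-0.5, -0.1,  1.0))

# Hypothetical cluster sequence along 5 positions (1-based labels)
cls <- c(1, 1, 2, 3, 3)

# Sum of similarities between the clusters at positions k:i and cluster cl
ccor_score <- function(cls, csim, k, i, cl) sum(csim[cls[k:i], cl])

ccor_score(cls, csim, 1, 3, 1)  # 1.0 + 1.0 + 0.2
```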

Scoring function "ccls" is a special case of "ccor" and is NOT handled here, but is reflected in the cluster similarity matrix csim. It is handled and automatically constructed in the R wrapper segmentClusters, and merely counts the number of clusters in sequence k:i, over all k and i, that are identical to the tested cluster c, and subtracts a penalty for the count of non-identical clusters.
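The counting behaviour of "ccls" can be mimicked with an identity-like similarity matrix, as in the following sketch. The penalty value a and the sequence cls are chosen for illustration only, nuisance handling is left out, and the matrix actually built by segmentClusters may differ.

```r
# Sketch: similarity 1 for identical clusters, penalty a otherwise
nC <- 3
a <- -2                      # hypothetical penalty for non-identical clusters
csim <- matrix(a, nC, nC)
diag(csim) <- 1

# Hypothetical cluster sequence (1-based labels for R indexing)
cls <- c(1, 1, 2, 1)

# Score of segment 1:4 for cluster 1:
# 3 identical clusters and 1 non-identical, so 3 * 1 + 1 * (-2) = 1
score <- sum(csim[cls[1:4], 1])
```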

References

Machne, Murray & Stadler (2017) <doi:10.1038/s41598-017-12401-8>