This function exemplifies the processing of oscillatory transcriptome
time-series data, as used in the development of this algorithm and in
the demo segment_data. As suggested by Machne & Murray (PLoS ONE 2012)
and Lehmann et al. (BMC Bioinformatics 2014), a Discrete Fourier
Transform of time-series data allows time-series to be clustered by
their change pattern.
Note that NA values are interpreted as 0 here. Handle NA values
yourself beforehand if you do not want this behavior.
Rows consisting only of 0 (or NA) values, or whose total signal (sum
over all time points) is below the value passed in argument low.thresh,
are detected: they result in NA values in the transformed data and are
assigned to the "nuisance" cluster in clusterTimeseries.
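The NA handling and low-signal detection described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's R code: NA is represented by None or NaN, rows are plain lists, and the function preprocess_rows and its low_thresh argument are hypothetical stand-ins that only mirror the role of low.thresh.

```python
import math


def preprocess_rows(rows, low_thresh):
    """Replace NA by 0 and flag rows without usable signal.

    Hypothetical helper: rows is a list of lists of numbers, where None or
    float("nan") stands for NA. A row is flagged when it contains only zeros
    after NA replacement, or when its total signal is below low_thresh.
    """
    cleaned, flagged = [], []
    for row in rows:
        # NA values are interpreted as 0
        values = [0.0 if v is None or math.isnan(v) else float(v) for v in row]
        total = sum(values)
        # flagged rows would later yield NA in the transformed data
        # and end up in the "nuisance" cluster
        flagged.append(all(v == 0.0 for v in values) or total < low_thresh)
        cleaned.append(values)
    return cleaned, flagged


rows = [[1, 2, None, 4], [None, None, 0, 0], [0.1, 0.0, 0.2, 0.0]]
cleaned, flagged = preprocess_rows(rows, low_thresh=1.0)
print(cleaned[0])  # [1.0, 2.0, 0.0, 4.0]
print(flagged)     # [False, True, True]
```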
Discrete Fourier Transform (DFT): if requested (option use.fft=TRUE),
a DFT is applied using base R's mvfft function, reporting either all
DFT components or only those requested via option dft.range. The
first, or DC ("direct current"), component equals the total signal
(sum over all time points); the other components are numbered 1:n,
reflecting the number of full cycles in the time-series. Values are
reported as complex numbers, from which both amplitude and phase can
be calculated. All returned DFT components are used by
clusterTimeseries.
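A naive DFT makes the meaning of the components concrete. The sketch below is written in Python rather than R and uses a hypothetical helper dft, assuming the same unnormalized, negative-exponent convention as R's fft and mvfft: for a row x of length n, component k is X[k] = sum over t of x[t]*exp(-2*pi*i*k*t/n), so k = 0 is the DC component (total signal) and k > 0 counts full cycles.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT, unnormalized, with the same sign convention as R's fft:
    X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n) for k = 0..n-1."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


# 1 + cos(2*pi*2*t/8): a constant offset plus exactly two full cycles
x = [2, 1, 0, 1, 2, 1, 0, 1]
X = dft(x)
amplitude = [abs(c) for c in X]
# phase[2] is ~0 because the cosine peaks at t = 0
phase = [cmath.phase(c) for c in X]

print(round(X[0].real, 6))                   # 8.0, the DC component (total signal)
print([round(a, 6) for a in amplitude[:5]])  # [8.0, 0.0, 4.0, 0.0, 0.0]
```

Note that mvfft transforms each column of a matrix, so a matrix with one time-series per row has to be transposed for it; the sketch handles one row at a time.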
Additional Transformations: data can be transformed prior to the DFT
(options trafo, smooth.time, smooth.space) or after the DFT (options
use.snr and dc.trafo). Amplitude scaling (a signal-to-noise ratio
transformation, see option documentation) is recommended. The separate
transformation of the DC component allows the total signal to be
de-emphasized in subsequent clustering and segmentation. Additionally,
though not tested in the context of segmentation, a Box-Cox
transformation of the DFT can be performed (option lambda). This
transformation proved useful in DFT-based clustering with the
model-based clustering algorithm in package flowClust, and is
available here for further tests with k-means clustering.
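The two post-DFT transformations can be sketched on a single row of DFT components. Both definitions are assumptions for illustration, not the package's code: snr_scale divides every component by the mean amplitude of the non-DC components (check the use.snr documentation for the exact definition), and box_cox_dft applies the sign-preserving Box-Cox variant of Bickel and Doksum, (sign(v)|v|^lambda - 1)/lambda, separately to the real and imaginary parts. All function names are hypothetical.

```python
import math


def snr_scale(X):
    """ASSUMED definition: divide each DFT component of one row by the mean
    amplitude of its non-DC components (X[1:])."""
    non_dc = [abs(c) for c in X[1:]]
    mean_amp = sum(non_dc) / len(non_dc)
    if mean_amp == 0:
        # flat row: the scaling is undefined, return NaN components
        return [complex(math.nan, math.nan)] * len(X)
    return [c / mean_amp for c in X]


def signed_box_cox(v, lam):
    """Sign-preserving Box-Cox of Bickel and Doksum: (sign(v)*|v|**lam - 1)/lam.
    This sketch only accepts lam > 0."""
    if lam <= 0:
        raise ValueError("this sketch requires lam > 0")
    return (math.copysign(abs(v) ** lam, v) - 1) / lam


def box_cox_dft(X, lam):
    """Transform real and imaginary parts separately (an assumption)."""
    return [complex(signed_box_cox(c.real, lam), signed_box_cox(c.imag, lam))
            for c in X]


# DFT of [2, 1, 0, 1, 2, 1, 0, 1]: DC = 8, components 2 and 6 = 4
X = [8 + 0j, 0j, 4 + 0j, 0j, 0j, 0j, 4 + 0j, 0j]
print(snr_scale(X)[2])             # mean non-DC amplitude is 8/7
print(box_cox_dft([4 + 0j], 0.5))  # [(2-2j)]
```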
Phase, Amplitude and Permutation Analysis: this time-series processing
and subsequent clustering can also be used without segmentation, e.g.,
for conventional microarray data or RNA-seq data already mapped to
genes. The option perm performs a permutation test (perm
randomizations) and adds a matrix of empirical p-values for all DFT
components to the results object, i.e., the fraction of the perm
randomized time-series whose amplitude was higher than the observed
amplitude. Phases and amplitudes can be derived from the complex
numbers in matrix "dft" of the result object.
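A permutation test of this kind can be sketched per row as below. The details are assumptions rather than the package's implementation: each randomization shuffles the order of the time points, ties are counted as exceedances, and dft and permutation_pvalues are hypothetical helpers written in Python. Amplitude and phase come from abs() and cmath.phase() of each complex component, analogous to Mod() and Arg() in R.

```python
import cmath
import math
import random


def dft(x):
    """Naive unnormalized DFT, same sign convention as R's fft."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def permutation_pvalues(x, perm, seed=None):
    """Empirical p-value per DFT component of one time-series.

    Hypothetical helper: each randomization shuffles the time points, and a
    randomization counts when its amplitude is >= the observed amplitude
    (ties counted; the package may compare differently)."""
    rng = random.Random(seed)
    observed = [abs(c) for c in dft(x)]
    counts = [0] * len(x)
    for _ in range(perm):
        shuffled = list(x)
        rng.shuffle(shuffled)
        for k, c in enumerate(dft(shuffled)):
            if abs(c) >= observed[k]:
                counts[k] += 1
    return [n_hits / perm for n_hits in counts]


x = [2, 1, 0, 1] * 6  # 24 time points, 6 full cycles
X = dft(x)
amplitude = [abs(c) for c in X]      # Mod() in R
phase = [cmath.phase(c) for c in X]  # Arg() in R
pvals = permutation_pvalues(x, perm=200, seed=1)
# pvals[6] should be small; pvals[0] is 1 here because shuffling never
# changes the total signal (DC component) of this integer series
print(round(amplitude[6], 6), pvals[6], pvals[0])
```

Because the DC component is invariant under shuffling, its p-value carries no information about oscillation.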