Usage:
clusterCTSS(object, threshold = 1, nrPassThreshold = 1, thresholdIsTpm = TRUE, method = "distclu", maxDist = 20, removeSingletons = FALSE, keepSingletonsAbove = Inf, minStability = 1, maxLength = 500, reduceToNonoverlapping = TRUE, customClusters = NULL, useMulticore = FALSE, nrCores = NULL)
Arguments:
object: A CAGEset object.
threshold, nrPassThreshold: Only CTSSs with signal >= threshold in >= nrPassThreshold experiments will be used for clustering and will contribute towards the total signal of the cluster.
method: Method to be used for clustering: "distclu", "paraclu" or "custom". See Details.
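maxDist: Maximal distance between two neighbouring CTSSs for them to be part of the same cluster. Used only when method = "distclu", otherwise ignored.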
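removeSingletons: Logical, should tag clusters composed of only one CTSS (singletons) be removed. Ignored when method = "custom".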
keepSingletonsAbove: When removeSingletons = TRUE, only singletons with signal < keepSingletonsAbove will be removed. This is useful to avoid removing highly supported singleton tag clusters. The default value Inf results in removing all singleton TCs when removeSingletons = TRUE. Ignored when removeSingletons = FALSE or method = "custom".
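minStability: Minimal stability of a cluster, as defined for Paraclu (Frith et al., Genome Research, 2007). Clusters with stability < minStability will be discarded. Used only when method = "paraclu", otherwise ignored.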
maxLength: Maximal length of a cluster in base pairs. Clusters with length > maxLength will be discarded. Ignored when method = "custom".
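reduceToNonoverlapping: Logical, should the nested clusters be reduced to a final set of non-overlapping tag clusters. Used only when method = "paraclu". See Details.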
customClusters: Predefined genomic regions to be used as tag clusters, given as a data.frame with the following columns: chr (chromosome name), start (0-based start coordinate), end (end coordinate) and strand (either "+" or "-"). Used only when method = "custom".
useMulticore: Logical, should multicore be used. useMulticore = TRUE is supported only on Unix-like platforms.
nrCores: Number of cores to use when useMulticore = TRUE. The default value NULL uses all detected cores.
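The two filtering rules set by threshold/nrPassThreshold and by removeSingletons/keepSingletonsAbove can be illustrated in base R. This is a minimal sketch, not CAGEr's code: the helper names passesThreshold() and filterSingletons(), the signal matrix layout (rows = CTSSs, columns = experiments) and the nr_ctss and signal columns of the toy cluster table are all hypothetical.

```r
# Hypothetical helpers (not CAGEr functions) illustrating two filtering
# rules described by the arguments above.

# 1) threshold / nrPassThreshold: keep a CTSS only if its signal is
#    >= threshold in at least nrPassThreshold experiments.
passesThreshold <- function(signal, threshold = 1, nrPassThreshold = 1) {
  # signal: numeric matrix, one row per CTSS, one column per experiment
  rowSums(signal >= threshold) >= nrPassThreshold
}

# 2) removeSingletons / keepSingletonsAbove: when removing singletons,
#    drop only single-CTSS clusters whose signal is < keepSingletonsAbove.
filterSingletons <- function(clusters, removeSingletons = FALSE,
                             keepSingletonsAbove = Inf) {
  if (!removeSingletons) return(clusters)
  toRemove <- clusters$nr_ctss == 1 & clusters$signal < keepSingletonsAbove
  clusters[!toRemove, , drop = FALSE]
}

# Toy CTSS signal: 4 CTSSs x 3 experiments (values are made up)
signal <- matrix(c(60, 10,  5,
                   40, 45, 55,
                    0,  0, 80,
                   20, 30, 10),
                 nrow = 4, byrow = TRUE)
passesThreshold(signal, threshold = 50)                       # TRUE TRUE TRUE FALSE
passesThreshold(signal, threshold = 40, nrPassThreshold = 2)  # FALSE TRUE FALSE FALSE

# Toy tag clusters (hypothetical columns)
tc <- data.frame(cluster = 1:4, nr_ctss = c(1, 5, 1, 3),
                 signal = c(150, 80, 20, 60))
filterSingletons(tc, removeSingletons = TRUE, keepSingletonsAbove = 100)$cluster  # 1 2 4
filterSingletons(tc, removeSingletons = TRUE)$cluster                             # 2 4
```

With keepSingletonsAbove left at Inf, every singleton satisfies signal < keepSingletonsAbove, so all of them are dropped, matching the documented default.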
Value:
The slots clusteringMethod, filteredCTSSidx and tagClusters of the provided CAGEset object will be populated with the clustering method used, the CTSSs included in the clusters, and the list of tag clusters per CAGE experiment, respectively. To retrieve the tag clusters for an individual CAGE dataset, use the tagClusters function.

Details:
Two methods for clustering individual CTSSs into tag clusters are implemented: "distclu" and "paraclu". "distclu" is an implementation of simple distance-based clustering of data attached to sequences, where two neighbouring TSSs are joined together if they are closer than a specified distance (maxDist). "paraclu" is an implementation of the Paraclu algorithm for parametric clustering of data attached to sequences, developed by M. Frith (Frith et al., Genome Research, 2007, http://www.cbrc.jp/paraclu/). Since Paraclu finds clusters within clusters (unlike distclu), additional parameters (removeSingletons, keepSingletonsAbove, minStability, maxLength and reduceToNonoverlapping) can be specified to simplify the output by discarding clusters that are too small (singletons) or too big, and to reduce the clusters to a final set of non-overlapping clusters.

Clustering is done separately for every CAGE dataset within the CAGEset object, resulting in a different set of tag clusters for every CAGE dataset. TCs from different datasets can further be aggregated into a single reference set of consensus clusters by calling the aggregateTagClusters function.
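The distclu rule described above can be sketched in a few lines of base R for CTSSs on a single chromosome strand. This is an illustrative sketch, not CAGEr's implementation: the function name distcluSketch() and its output columns are hypothetical, and it assumes that neighbouring CTSSs separated by at most maxDist bases are joined (CAGEr's exact boundary handling may differ).

```r
# Minimal distclu-style sketch for CTSSs on one chromosome strand.
# Assumption: a gap <= maxDist joins neighbouring CTSSs; a larger gap
# starts a new cluster. Not CAGEr's own implementation.
distcluSketch <- function(pos, signal, maxDist = 20) {
  o <- order(pos)
  pos <- pos[o]
  signal <- signal[o]
  # TRUE marks the first CTSS of each new cluster
  newCluster <- c(TRUE, diff(pos) > maxDist)
  id <- cumsum(newCluster)
  data.frame(
    cluster = unique(id),
    start   = as.vector(tapply(pos, id, min)),
    end     = as.vector(tapply(pos, id, max)),
    nr_ctss = tabulate(id),
    signal  = as.vector(tapply(signal, id, sum))  # total signal per cluster
  )
}

# Made-up CTSS positions (unsorted) and their signal
pos    <- c(175, 100, 300, 105, 160, 118)
signal <- c(3, 5, 40, 10, 7, 2)
distcluSketch(pos, signal, maxDist = 20)
#   cluster start end nr_ctss signal
# 1       1   100 118       3     17
# 2       2   160 175       2     10
# 3       3   300 300       1     40
```

Because a single pass over sorted positions suffices, this kind of clustering is linear after sorting; the last cluster above is a singleton that removeSingletons could then discard.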
See Also:
tagClusters
Examples:
library(CAGEr)
load(system.file("data", "exampleCAGEset.RData", package="CAGEr"))
# distclu: use CTSSs with >= 50 TPM in >= 1 sample; drop singletons below 100 TPM
clusterCTSS(object = exampleCAGEset, threshold = 50, thresholdIsTpm = TRUE,
            nrPassThreshold = 1, method = "distclu", maxDist = 20,
            removeSingletons = TRUE, keepSingletonsAbove = 100)
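For method = "custom", customClusters must be a data.frame with the chr, start, end and strand columns listed under Arguments. A minimal base-R sketch of building one follows; the chromosome names and coordinates are made up, and the commented clusterCTSS() call assumes the exampleCAGEset loaded in the example above.

```r
# Illustrative predefined regions for method = "custom" (made-up coordinates)
customClusters <- data.frame(
  chr    = c("chr17", "chr17", "chr5"),
  start  = c(26000L, 26450L, 1200L),  # 0-based start coordinate
  end    = c(26100L, 26600L, 1350L),  # end coordinate
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Basic sanity checks on the expected layout
stopifnot(identical(names(customClusters), c("chr", "start", "end", "strand")),
          all(customClusters$strand %in% c("+", "-")),
          all(customClusters$end > customClusters$start))

# With CAGEr attached, the regions would then be used as tag clusters:
# clusterCTSS(object = exampleCAGEset, method = "custom",
#             customClusters = customClusters)
```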