BSraw objects
Within a BSraw object, clusterSites searches for agglomerations of CpG sites across all samples. In a first step, the data is reduced to CpG sites covered in at least round(perc.samples*ncol(object)) samples; these are called 'frequently covered CpG sites'. In a second step, regions are detected where at least min.sites frequently covered CpG sites are sufficiently close to each other (no more than max.dist apart). Note that the frequently covered CpG sites only define the boundaries of the CpG clusters. The subsequent analysis uses the methylation data of all CpG sites within these clusters.
clusterSites(object, groups, perc.samples, min.sites, max.dist,
mc.cores, ...)
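object: A BSraw object.
perc.samples: Passed to filterBySharedRegions.
min.sites: Clusters must contain at least min.sites CpG sites that are covered in at least perc.samples of samples; otherwise they are dropped.
max.dist: Within a cluster, CpG sites that are covered in at least perc.samples of samples should be no more than max.dist bp apart from their nearest neighbors.
mc.cores: Passed to mclapply. Default is 1.
...: Further arguments passed to the filterBySharedRegions function.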
A BSraw object reduced to CpG sites within CpG cluster regions. A cluster.id metadata column on the rowRanges assigns cluster memberships per CpG site.
Three parameters are important: perc.samples, min.sites and max.dist.
For example, if perc.samples=0.5, the algorithm detects all CpG sites that are covered in at least 50% of the samples. These are called frequently covered CpG sites. In the next step, the algorithm determines the distances between neighbouring frequently covered CpG sites. When two of them are no more than max.dist base pairs apart, those frequently covered CpG sites and all less frequently covered CpG sites in between belong to the same cluster. In the third step, each cluster is checked for its number of frequently covered CpG sites. If this number is less than min.sites, the cluster is discarded.
In other words:
1. The perc.samples parameter defines which CpG sites are frequently covered.
2. The frequently covered CpG sites determine the boundaries of the clusters,
depending on their distance to each other.
3. Clusters are discarded if they contain too few frequently covered CpG sites.
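The three steps above can be sketched in base R on a toy coverage matrix. This is only an illustration of the described logic, not the package's implementation: the positions, coverage values, parameter settings and the integer cluster ids are all made up, and a single chromosome is assumed.

```r
# Toy data (hypothetical): 8 CpG positions in bp, 4 samples.
# cov[i, j] > 0 means site i is covered in sample j.
pos <- c(100, 120, 150, 400, 420, 430, 900, 2000)
cov <- matrix(c(
  5, 3, 0, 2,   # 100
  0, 0, 1, 0,   # 120
  4, 4, 6, 1,   # 150
  2, 0, 3, 5,   # 400
  0, 1, 0, 0,   # 420
  7, 2, 2, 0,   # 430
  1, 1, 1, 1,   # 900
  0, 0, 0, 3    # 2000
), ncol = 4, byrow = TRUE)

perc.samples <- 0.75
min.sites    <- 2
max.dist     <- 100

# Step 1: frequently covered sites are covered in enough samples.
n.required <- round(perc.samples * ncol(cov))
frequent   <- rowSums(cov > 0) >= n.required

# Step 2: start a new cluster whenever the gap between neighbouring
# frequently covered sites exceeds max.dist.
fpos <- pos[frequent]
run  <- cumsum(c(TRUE, diff(fpos) > max.dist))

# Step 3: keep only clusters with at least min.sites frequent sites.
keep   <- which(tabulate(run) >= min.sites)
starts <- tapply(fpos, run, min)[keep]
ends   <- tapply(fpos, run, max)[keep]

# All sites inside the cluster boundaries are retained, including the
# less frequently covered ones.
cluster.id <- rep(NA_integer_, length(pos))
for (k in seq_along(keep)) {
  inside <- pos >= starts[k] & pos <= ends[k]
  cluster.id[inside] <- k
}
result <- data.frame(pos = pos, frequent = frequent, cluster.id = cluster.id)
result <- result[!is.na(result$cluster.id), ]
print(result)
```

In this toy input the sites at 120 and 420 are not frequently covered but are kept because they lie between cluster boundaries, while the isolated site at 900 forms a cluster of one frequently covered site and is discarded.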
If the groups argument is given, perc.samples (or no.samples) is applied to all group levels.
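One way to read the group rule is that a CpG site counts as frequently covered only if it reaches the perc.samples fraction within every group level. The base R sketch below illustrates that reading on made-up data; it is an assumption about the behaviour, not code taken from the package, and it compares fractions directly instead of rounding to a sample count.

```r
# Hypothetical coverage indicator: 3 sites (rows) in 6 samples (columns).
covered <- matrix(c(
  TRUE, TRUE, TRUE,  TRUE,  TRUE,  FALSE,   # site 1
  TRUE, TRUE, TRUE,  TRUE,  FALSE, FALSE,   # site 2
  TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE    # site 3
), ncol = 6, byrow = TRUE)
groups <- factor(c("A", "A", "A", "B", "B", "B"))
perc.samples <- 0.5

# Fraction of covered samples per site, computed within each group level.
frac <- sapply(levels(groups), function(g) {
  rowMeans(covered[, groups == g, drop = FALSE])
})

# Assumed rule: the threshold must hold in all group levels.
frequent <- apply(frac >= perc.samples, 1, all)

# For comparison: the same threshold applied to the pooled samples.
pooled <- rowMeans(covered) >= perc.samples

print(frac)
print(frequent)
print(pooled)
```

Under this reading, sites 2 and 3 pass the pooled threshold but fail in group B, so only site 1 would be treated as frequently covered.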
filterBySharedRegions, mclapply
# clusterSites and the rrbs example data are provided by the BiSeq package
library(BiSeq)
data(rrbs)
rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group,
                           perc.samples = 4/5, min.sites = 20,
                           max.dist = 100)