clusterSites: Assigns CpG cluster memberships on CpG sites within BSraw objects
Within a BSraw object, clusterSites searches for agglomerations of CpG sites across
all samples. In a first step, the data is reduced to CpG sites covered in at least
round(perc.samples*ncol(object)) samples; these are
called 'frequently covered CpG sites'. In a second step, regions are detected
where at least min.sites frequently covered CpG sites are sufficiently
close to each other (max.dist). Note that the frequently covered CpG sites
only define the boundaries of the CpG clusters. The subsequent
analysis uses the methylation data of all CpG sites within these clusters.
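clusterSites(object, groups, perc.samples, min.sites, max.dist,
             mc.cores, ...)

object: A BSraw object.
groups: Optional. A factor specifying sample groups within the given object. If given, perc.samples is applied to all group levels.
perc.samples: The fraction of samples in which a CpG site must be covered. Passed to filterBySharedRegions.
min.sites: Clusters must comprise at least min.sites CpG sites that are covered in at least perc.samples of samples; otherwise they are dropped.
max.dist: CpG sites covered in at least perc.samples of samples within a cluster should be no more than max.dist bp apart from their nearest neighbors.
mc.cores: Passed to mclapply. Default is 1.
...: Further arguments passed to the filterBySharedRegions function.

Returns a BSraw object reduced to CpG sites within CpG cluster regions. A cluster.id metadata column on the rowRanges assigns cluster memberships per CpG site.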
There are three parameters that are important: perc.samples, min.sites and max.dist.
For example, if perc.samples=0.5, the algorithm detects all CpG sites that are covered in at least 50%
of the samples. Those CpG sites are called frequently covered CpG sites. In the next step the algorithm
determines the distances between neighboring frequently covered CpG sites.
When they are at most max.dist base pairs apart,
those frequently covered CpG sites and all other, less frequently covered CpG sites
in between belong to the same cluster. In the third step, each cluster is checked
for the number of frequently covered CpG sites. If this number is less than min.sites,
the cluster is discarded.
In other words:
1. The perc.samples parameter defines which are the frequently covered CpG sites.
2. The frequently covered CpG sites determine the boundaries of the clusters,
depending on their distance to each other.
3. Clusters are discarded if they have too few frequently covered CpG sites.
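The three steps can be illustrated on toy data with a minimal base-R sketch. This is not the BiSeq implementation: the vectors pos and cov are hypothetical, positions are assumed to be sorted and on a single chromosome, and "covered" is assumed to mean at least one read.

```r
# Toy data (hypothetical): 8 sorted CpG positions, read coverage in 4 samples
pos <- c(100, 120, 150, 160, 400, 410, 900, 905)
cov <- matrix(c(
  5, 3, 0, 2,   # pos 100: covered in 3 samples
  0, 0, 4, 0,   # pos 120: covered in 1 sample
  2, 6, 1, 3,   # pos 150: covered in 4 samples
  1, 2, 2, 0,   # pos 160: covered in 3 samples
  3, 3, 3, 3,   # pos 400: covered in 4 samples
  0, 1, 0, 0,   # pos 410: covered in 1 sample
  2, 2, 0, 2,   # pos 900: covered in 3 samples
  0, 0, 0, 1    # pos 905: covered in 1 sample
), ncol = 4, byrow = TRUE)

perc.samples <- 0.75
min.sites    <- 2
max.dist     <- 60

# Step 1: frequently covered sites (assumption: covered = at least one read)
n.required <- round(perc.samples * ncol(cov))
frequent   <- rowSums(cov > 0) >= n.required
freq.pos   <- pos[frequent]

# Step 2: a new cluster starts wherever the gap between neighboring
# frequently covered sites exceeds max.dist
cluster.of.freq <- cumsum(c(TRUE, diff(freq.pos) > max.dist))
starts <- tapply(freq.pos, cluster.of.freq, min)
ends   <- tapply(freq.pos, cluster.of.freq, max)
n.freq <- tapply(freq.pos, cluster.of.freq, length)

# Step 3: drop clusters with fewer than min.sites frequently covered sites
keep   <- n.freq >= min.sites
starts <- starts[keep]
ends   <- ends[keep]

# All sites inside the cluster boundaries get a cluster id,
# including the less frequently covered ones
cluster.id <- rep(NA_integer_, length(pos))
for (i in seq_along(starts)) {
  cluster.id[pos >= starts[i] & pos <= ends[i]] <- i
}
cluster.id
```

In this toy case the sites at 100, 150 and 160 form the only surviving cluster, and the rarely covered site at 120 is included because it lies within the cluster boundaries. Sites left as NA correspond to those that would be removed from the reduced object.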
If the groups argument is given, perc.samples (or no.samples) is
applied to all group levels.
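The effect of grouping can be sketched in base R as follows. This assumes that "applied for all group levels" means a CpG site must reach the perc.samples threshold within every group. That reading is an interpretation, not something confirmed here against the package source, and the objects cov and groups are hypothetical toy data.

```r
# Toy coverage (hypothetical): 3 CpG sites (rows) x 6 samples (columns)
cov <- matrix(c(
  4, 2, 3, 1, 0, 0,   # site 1: 3 of 3 in group A, 1 of 3 in group B
  1, 1, 0, 2, 5, 0,   # site 2: 2 of 3 in group A, 2 of 3 in group B
  0, 0, 0, 3, 3, 3    # site 3: 0 of 3 in group A, 3 of 3 in group B
), ncol = 6, byrow = TRUE)
groups <- factor(c("A", "A", "A", "B", "B", "B"))
perc.samples <- 2/3

covered <- cov > 0   # assumption: covered = at least one read

# Without groups: one threshold over all samples
pooled <- rowSums(covered) >= round(perc.samples * ncol(cov))

# With groups (assumed reading): the threshold must hold in every group level
ok.by.group <- sapply(levels(groups), function(g) {
  in.g <- groups == g
  rowSums(covered[, in.g, drop = FALSE]) >= round(perc.samples * sum(in.g))
})
grouped <- apply(ok.by.group, 1, all)

pooled
grouped
```

Under this reading, site 1 counts as frequently covered when the samples are pooled but not when the threshold is required in each group, because it is covered in only one of the three group B samples.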
See also: filterBySharedRegions, mclapply
library(BiSeq)

# Load the example BSraw object shipped with the package
data(rrbs)
rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group,
                           perc.samples = 4/5, min.sites = 20,
                           max.dist = 100)