cobindR (version 1.10.0)

testCpG: function to cluster sequences based on their CpG and GC content

Description

Diagnostic function. The GC content and CpG content of the sequences are clustered using 2D Gaussian mixture models (Mclust), considering up to max.clust clusters. The number of subgroups is chosen using the Bayesian information criterion (BIC), and FALSE is returned if more than one subgroup is found. If do.plot=TRUE, the results are visualized.
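
The clustering step described above can be sketched directly with mclust. This is only an illustration under stated assumptions, not the package's source: the data are synthetic values made up for the example, the names gc_mat, fit and homogeneous are hypothetical, and the mclust package is assumed to be installed.

```r
library(mclust)  # assumed to be installed; provides Mclust()

set.seed(1)
# Synthetic GC / CpG values (made up for illustration): two well-separated groups
gc_mat <- rbind(
  cbind(GC = rnorm(50, 0.40, 0.02), CpG = rnorm(50, 0.02, 0.005)),
  cbind(GC = rnorm(50, 0.65, 0.02), CpG = rnorm(50, 0.08, 0.005))
)

max.clust <- 4
# Fit Gaussian mixtures with 1..max.clust components; Mclust selects
# the covariance model and number of components by BIC
fit <- Mclust(gc_mat, G = 1:max.clust)

# TRUE only if BIC prefers a single component (no subgroups)
homogeneous <- fit$G == 1
```

Here fit$G holds the number of components selected by BIC, so the logical flag is FALSE whenever the data split into more than one subgroup.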

Usage

testCpG(x, max.clust = 4, do.plot = FALSE, n.cpu = NA)

Arguments

x
an object of class "cobindr" that holds all necessary information about the sequences and the hits.
max.clust
integer giving the maximum number of clusters used to separate the data.
do.plot
logical flag; if do.plot=TRUE, a scatterplot of the GC and CpG content of each sequence is produced, with the clusters color coded.
n.cpu
number of CPUs to be used for parallelization. The default value is NA, in which case the number of available CPUs is detected and then used.
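
The NA default for n.cpu can be pictured as follows. This is a minimal sketch, assuming the available CPUs are detected with parallel::detectCores(); the helper name resolve_n_cpu is hypothetical and not part of cobindR, and the package may detect CPUs differently.

```r
# Hypothetical helper, not part of cobindR: resolve an NA n.cpu
# to the number of CPUs reported by the 'parallel' package.
resolve_n_cpu <- function(n.cpu = NA) {
  if (is.na(n.cpu)) {
    n.cpu <- parallel::detectCores()
    # detectCores() can return NA on some platforms; fall back to 1
    if (is.na(n.cpu)) n.cpu <- 1L
  }
  as.integer(n.cpu)
}

resolve_n_cpu(2)  # an explicit value is used as given
resolve_n_cpu()   # NA triggers detection
```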

Value

result
logical flag; FALSE is returned if more than one subgroup is found using the Bayesian information criterion (BIC)
gc
matrix with rows corresponding to sequences and columns corresponding to GC and CpG content
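
To make the layout of such a matrix concrete, the following base R sketch computes GC and CpG content for a few toy sequences. It assumes GC content is the fraction of G or C bases and CpG content is the fraction of dinucleotide positions equal to "CG"; cobindR may use a different definition (for example an observed/expected ratio). The helper gc_cpg_content and the input sequences are hypothetical, and sequences are assumed to have at least two bases.

```r
# Hypothetical helper, not part of cobindR: per-sequence GC and CpG content.
gc_cpg_content <- function(seqs) {
  seqs <- toupper(seqs)
  n <- nchar(seqs)
  # GC content: fraction of bases that are G or C
  gc_frac <- vapply(seqs, function(s) {
    sum(strsplit(s, "")[[1]] %in% c("G", "C"))
  }, numeric(1)) / n
  # CpG content: fraction of dinucleotide positions that read "CG"
  cpg_frac <- vapply(seqs, function(s) {
    starts <- seq_len(nchar(s) - 1)
    sum(substring(s, starts, starts + 1) == "CG")
  }, numeric(1)) / (n - 1)
  out <- cbind(GC = gc_frac, CpG = cpg_frac)
  rownames(out) <- names(seqs)  # one row per sequence
  out
}

# Toy sequences, made up for illustration
m <- gc_cpg_content(c(a = "CGCG", b = "ATAT", c = "ACGT"))
```

Each row of m corresponds to one sequence, with the first column holding GC content and the second CpG content.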

References

The method uses clustering functions from the package "mclust" (http://www.stat.washington.edu/mclust/).

See Also

plot.gc

Examples

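# build a configuration that reads the example FASTA file shipped with cobindR
cfg <- cobindRConfiguration()
sequence_type(cfg) <- 'fasta'
sequence_source(cfg) <- system.file('extdata/example.fasta', package = 'cobindR')
# set a PFM path and an empty pairs entry to avoid complaints from the validation mechanism
pfm_path(cfg) <- system.file('extdata/pfms', package = 'cobindR')
pairs(cfg) <- ''
# create the cobindr object, then test for GC/CpG subgroups using at most 2 clusters
runObj <- cobindr(cfg)
testCpG(runObj, max.clust = 2, do.plot = TRUE)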
