
MMDiff (version 1.8.0)

compHistDists: Compute distances between pairs of histograms

Description

For each peak, this function computes pairwise distances between histograms according to the specified method. Currently implemented are Maximum Mean Discrepancy (MMD), Generalized Minimum Distance (GMD) and simple Pearson correlation (Pearson).
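To give an intuition for what an MMD distance measures, here is a minimal base-R sketch of a biased MMD^2 estimate between two one-dimensional samples using a Gaussian kernel. It is an illustration only and not the MMDiff implementation: the helper names rbf_kernel and mmd2_biased, the simulated data and the choice sigma = 1 are all assumptions made for this example.

```r
# Illustrative only: biased MMD^2 estimate with a Gaussian (RBF) kernel.
# rbf_kernel() and mmd2_biased() are hypothetical helpers, not MMDiff functions.
rbf_kernel <- function(x, y, sigma) {
  # squared differences between every pair (x_i, y_j)
  d2 <- outer(x, y, function(a, b) (a - b)^2)
  exp(-d2 / (2 * sigma^2))
}

mmd2_biased <- function(x, y, sigma) {
  # mean within-sample similarity minus twice the between-sample similarity
  mean(rbf_kernel(x, x, sigma)) +
    mean(rbf_kernel(y, y, sigma)) -
    2 * mean(rbf_kernel(x, y, sigma))
}

set.seed(1)
a       <- rnorm(200, mean = 0)  # simulated sample
b       <- rnorm(200, mean = 0)  # same distribution as a
c_shift <- rnorm(200, mean = 2)  # shifted distribution

mmd_same  <- mmd2_biased(a, b, sigma = 1)        # close to zero
mmd_shift <- mmd2_biased(a, c_shift, sigma = 1)  # clearly larger
```

In this sketch sigma sets the kernel width: smaller values make the estimate sensitive to fine-scale differences, larger values to broad shifts. How compHistDists chooses sigma when it is left as NULL is not stated on this page.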

Usage

compHistDists(DBA, method = 'MMD', CompIDs = NULL, Usefiltered = TRUE, PeakIDs = NULL, NormMethod = 'DESeq', overWrite = FALSE, HistField = 'PeakRawHists', run.parallel = TRUE, verbose = 2, save.file = TRUE, out.dir = '.', sigma = NULL)

Arguments

DBA
DBA object, after running getPeakProfiles. Specifically, it uses the element MD, which contains a list of histogram matrices (see the getPeakProfiles documentation for more information about this data type).
method
Method used to determine distances between histograms: 'MMD' [1], 'GMD' [2] or simple 'Pearson' correlation.
CompIDs
2 x nComps matrix, specifying sample ids of pairwise comparisons
Usefiltered
If TRUE, only peaks that have passed the outlier filter are considered. findOutliers must be run first; otherwise all peaks are used.
PeakIDs
Subset of peaks for which distances should be computed.
NormMethod
Normalization method to use; currently only the 'DESeq' method [3] is implemented. Note that unless NormMethod = NULL, getNormFactors has to be called first.
overWrite
if TRUE, overwrites earlier computed distances.
HistField
Name of the element in MD that is used to determine distances. This element should again be a list of nPeaks peaks, each containing a matrix of histograms (nSamples x nbins). It can be generated by running getPeakProfiles. Note that nbins may vary between peaks if they have different lengths.
run.parallel
distribute over available CPUs
verbose
for debugging, set to 3 for some extra output
save.file
if TRUE, DBA objects are saved
out.dir
directory for saving output files
sigma
parameter controlling the Kernel size
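As an illustration of the CompIDs layout described above, the following sketch builds a 2 x nComps matrix of all unordered sample pairs with base R's combn. The sample ids are the ones used in the Examples section; whether compHistDists builds its default comparisons this way when CompIDs is NULL is an assumption not confirmed here.

```r
# Sample ids as used in the Examples section of this page
samples <- c("WT.AB2", "Null.AB2", "Resc.AB2")

# combn() returns one column per unordered pair, i.e. a 2 x nComps matrix
CompIDs <- combn(samples, 2)

# number of pairwise comparisons (3 for three samples)
nComps <- ncol(CompIDs)
```

For three samples this yields the same three columns as the hand-written cbind call in the Examples: WT vs Null, WT vs Resc and Null vs Resc.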

Value

DBA object, with an additional list element DISTS added to MD. DISTS in turn contains a list element named according to the method applied (e.g. MMD). This element is a matrix (nPeaks x nComps) containing all pairwise distances.
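The sketch below shows one way such a result could be inspected. It uses a mock nested list that imitates the documented MD / DISTS / method layout rather than a real DBA object, so the random values, the peak names and the comparison labels are all hypothetical, and it assumes the returned object can be indexed with $ like an ordinary list.

```r
# Mock object imitating the documented layout; NOT a real DBA object.
set.seed(42)
comp_names <- c("WT-Null", "WT-Resc", "Null-Resc")  # hypothetical labels
mock <- list(MD = list(DISTS = list(
  GMD = matrix(runif(12), nrow = 4, ncol = 3,
               dimnames = list(paste0("peak", 1:4), comp_names))
)))

# nPeaks x nComps matrix of distances for the chosen method
d <- mock$MD$DISTS$GMD

# rank peaks by their largest distance over all comparisons
max_dist  <- apply(d, 1, max)
top_peaks <- names(sort(max_dist, decreasing = TRUE))
```

Ranking by raw distance is only a quick exploratory view; the page points to detPeakPvals for deciding which peaks differ significantly between groups.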

References

[1] Gretton A. et al. (2006). A kernel method for the two-sample-problem. In NIPS, pages 513--520, MIT Press.

[2] Zhao et al. (2012). GMD: Measuring the distance between histograms with applications on high-throughput sequencing reads. Bioinformatics, 28 (8): 1164-1165.

[3] Anders S. and Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11 (10): R106.

See Also

getPeakProfiles, findOutliers, getNormFactors, detPeakPvals, plotHistDists, plotPeak

Examples


# load DBA objects with peak profiles 
data(Cfp1Profiles)

# get normalization factors
Cfp1Norm <- getNormFactors(Cfp1Profiles)

# get all pairwise distances for the samples WT, Null and Resc, i.e. WT
# vs Null, WT vs Resc and Null vs Resc. The recommended method is 'MMD'
# [1]; however, this may take a little while. Here, we compute the GMD
# distance instead [2].

Cfp1Dists <- compHistDists(Cfp1Norm, method = 'GMD', 
           NormMethod = 'DESeq') 




# You can also specify which pairwise distances you are interested in,
# e.g.:

CompIDs <- cbind(c("WT.AB2", "Null.AB2"),
                 c("WT.AB2", "Resc.AB2"),
                 c("Null.AB2", "Resc.AB2"))

Cfp1Dists2 <- compHistDists(Cfp1Norm, method='GMD', CompIDs=CompIDs,
            NormMethod='DESeq')




# To view pairwise distances you can use the function plotHistDists. For
# example, treating WT and Resc as control replicates and Null as a
# treatment group, you can contrast the 'within-group' distances with 
# 'between-group' distances:

group1 <- c("WT.AB2", "Resc.AB2")
group2 <- c("Null.AB2")
plotHistDists(Cfp1Dists, group1 = group1, group2 = group2, method = 'GMD')

# See detPeakPvals to determine which peaks are significantly different
# between the two groups.
