Earth Mover's Distance for Differential Analysis of Genomics
Data
Description
The EMDomics algorithm is used to perform a supervised
multi-class analysis to measure the magnitude and statistical
significance of observed continuous genomics data between
groups. Usually the data will be gene expression values from
array-based or sequence-based experiments, but data from other
types of experiments can also be analyzed (e.g. copy number
variation). Traditional methods like Significance Analysis of
Microarrays (SAM) and Linear Models for Microarray Data (LIMMA)
use significance tests based on summary statistics (mean and
standard deviation) of the distributions. This approach
lacks power to identify expression differences between groups
that show high levels of intra-group heterogeneity. The Earth
Mover's Distance (EMD) algorithm instead computes the "work"
needed to transform one distribution into another, thus
providing a metric of the overall difference in shape between
two distributions. Permutation of sample labels is used to
generate q-values for the observed EMD scores. This package also
incorporates the Komolgorov-Smirnov (K-S) test and the Cramer von Mises
test (CVM), which are both common distribution comparison tests.