distGPS(x, metric='tanimoto', weights, uniqueRows=FALSE, genomelength=NULL, mc.cores=1)
If uniqueRows is TRUE and x is a matrix or data.frame, duplicated rows are removed prior to distance calculation. This can save substantial computing time and memory. Note, however, that the dimension of the distance matrix then equals the number of unique rows in x instead of nrow(x).

If mc.cores > 1 and the parallel package is loaded, computations are performed in parallel with mc.cores processors when possible.

The function returns an object of class distGPS, with a matrix of pairwise dissimilarities (distances) between objects.
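The effect of uniqueRows on the dimension of the result can be previewed with base R alone. The sketch below does not call distGPS. It only counts the distinct rows of a small binary matrix (illustrative data, with rows a and b identical), which is the dimension the distance matrix would have after duplicates are dropped.

```r
# Illustrative binary matrix: rows a and b are identical
x <- rbind(a = c(rep(0, 15), rep(1, 5)),
           b = c(rep(0, 15), rep(1, 5)),
           c = c(rep(0, 19), 1),
           d = c(rep(1, 5), rep(0, 15)))

n_all <- nrow(x)               # number of rows before removing duplicates
x_unique <- unique(x)          # base R: keeps the first copy of each distinct row
n_unique <- nrow(x_unique)     # expected dimension of the distance matrix

# Rows that would be dropped as duplicates
dropped <- rownames(x)[duplicated(x)]
```

Here n_all is 4 and n_unique is 3, because row b duplicates row a.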
For RangedDataList input, each element of x is assumed to indicate the binding sites for a different sample, e.g. epigenetic factor. Typically space(x) indicates the chromosome, start(x) the start position and end(x) the end position (in bp). Strand information is ignored.

For matrix input, rows in x contain the individuals for which we want to compute distances. Columns in x contain the variables, and should contain only 0's and 1's, or FALSE and TRUE.

For RangedDataList objects, distances are defined as follows.
Let a1 and a2 be two RangedData objects. Define n1 as the number of a1 intervals overlapping with some interval in a2. Define n2 analogously.
The Tanimoto distance between a1 and a2 is defined as (n1+n2)/(nrow(a1)+nrow(a2)).
The average distance between a1 and a2 is defined as .5*(n1/nrow(a1) + n2/nrow(a2)).
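As a rough illustration, the two expressions can be evaluated in base R on toy interval sets. This is a minimal sketch, not the package's implementation: count_overlapping is a hypothetical helper, the intervals are plain data frames rather than RangedData objects, and a single chromosome with closed intervals is assumed. It evaluates the formulas exactly as written in the text and does not call distGPS.

```r
# Hypothetical helper (not part of chroGPS): number of intervals in `a`
# that overlap at least one interval in `b`.
# Assumes closed intervals on one chromosome, with columns `start` and `end`.
count_overlapping <- function(a, b) {
  hits <- vapply(seq_len(nrow(a)), function(i) {
    any(a$start[i] <= b$end & a$end[i] >= b$start)
  }, logical(1))
  sum(hits)
}

# Toy binding sites (illustrative values only)
a1 <- data.frame(start = c(1, 100, 300), end = c(50, 150, 350))
a2 <- data.frame(start = c(40, 500), end = c(60, 550))

n1 <- count_overlapping(a1, a2)   # a1 intervals overlapping some a2 interval
n2 <- count_overlapping(a2, a1)   # a2 intervals overlapping some a1 interval

# The two expressions from the text
tanimoto_value <- (n1 + n2) / (nrow(a1) + nrow(a2))
average_value <- 0.5 * (n1 / nrow(a1) + n2 / nrow(a2))
```

With these toy intervals only [1, 50] and [40, 60] overlap, so n1 and n2 are both 1.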
The wtanimoto distance in chroGPS-genes weights each epigenetic factor
(table columns) according to its frequency (table rows).
The chi-square distance is defined as the usual chi-square distance on a binary matrix B, which is automatically computed by distGPS.
The binary matrix B is the matrix with length(x) rows and number of columns equal to the genome length, where B[i,j]==1 indicates that element i has a binding site at base pair j.
The chi distance is simply defined as the square root of the chi-square distance.
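To make "the usual chi-square distance" concrete, here is a minimal base R sketch. It assumes the correspondence-analysis definition, in which row profiles are compared with each column weighted by the inverse of its mass. Whether distGPS uses exactly this scaling is an assumption; chisq_dist is a hypothetical helper and the matrix B below is toy data, far smaller than a genome-length matrix.

```r
# Hypothetical helper (not part of chroGPS): chi-square distances between
# the rows of a binary matrix B, correspondence-analysis style.
# Assumes every row of B contains at least one 1.
chisq_dist <- function(B) {
  P <- B / sum(B)                                  # relative frequencies
  row_mass <- rowSums(P)
  col_mass <- colSums(P)
  keep <- col_mass > 0                             # all-zero columns contribute nothing
  profiles <- P[, keep, drop = FALSE] / row_mass   # row i divided by its mass
  scaled <- sweep(profiles, 2, sqrt(col_mass[keep]), "/")
  as.matrix(dist(scaled))^2                        # squared weighted Euclidean distance
}

# Toy binary matrix: rows a and b share all sites, row c shares none
B <- rbind(a = c(1, 1, 0, 0),
           b = c(1, 1, 0, 0),
           c = c(0, 0, 1, 1))

chisq <- chisq_dist(B)   # chi-square distances
chi <- sqrt(chisq)       # chi distances: square root of the above
```

For this toy matrix the chi-square distance between a and b is 0 and between a and c is 4.5, so the corresponding chi distance is sqrt(4.5).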
Finally, the euclidean and manhattan metrics have the same definition as in the base R function dist.

When choosing a metric one should consider the effect of outliers, i.e. samples with large distance to all other samples.
Tanimoto and Average distances take values between 0 and 1, and therefore outlying distances have a limited effect.
Chi-square and Chi distances are not limited to values between 0 and 1, i.e. some distances may be much larger than others. The Chi metric is slightly more robust to outliers than the Chi-square metric.
For matrix or data.frame objects, x must be a matrix with 0's and 1's (or FALSE and TRUE).
The usual definitions are used for Tanimoto (which is equivalent to Jaccard's index), Chi-square and Chi.
Average overlap between rows i and j is simply the average between the proportion of elements in i also in j and the proportion of elements in j also in i.
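For two binary rows, Jaccard's index and the average overlap described above can be sketched in base R as follows. jaccard_index and average_overlap are hypothetical helper names, not chroGPS functions, and the sketch does not call distGPS, so the values it returns may be on a different scale from the distances the package reports.

```r
# Hypothetical helpers (not part of chroGPS) for two binary vectors u and v.
# Both assume u and v each contain at least one 1.
jaccard_index <- function(u, v) {
  u <- as.logical(u)
  v <- as.logical(v)
  sum(u & v) / sum(u | v)          # shared elements / elements in either row
}

average_overlap <- function(u, v) {
  u <- as.logical(u)
  v <- as.logical(v)
  shared <- sum(u & v)
  # mean of: proportion of u also in v, and proportion of v also in u
  0.5 * (shared / sum(u) + shared / sum(v))
}

# Toy rows (illustrative data)
x <- rbind(a = c(rep(0, 15), rep(1, 5)),
           c = c(rep(0, 19), 1),
           d = c(rep(1, 5), rep(0, 15)))

jac_ac <- jaccard_index(x["a", ], x["c", ])     # 1 shared out of 5 in the union
avg_ac <- average_overlap(x["a", ], x["c", ])   # 0.5 * (1/5 + 1/1)
jac_ad <- jaccard_index(x["a", ], x["d", ])     # no shared elements
```

Rows a and c give a Jaccard index of 0.2 but an average overlap of 0.6, because every element of c is also in a. This shows how the two measures can diverge when one row is nested in another.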
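See mds to create MDS-oriented objects and procrustesAdj for Procrustes adjustment.

library(chroGPS)

# Four binary profiles over 20 variables; rows a and b are identical
x <- rbind(c(rep(0,15),rep(1,5)), c(rep(0,15),rep(1,5)), c(rep(0,19),1), c(rep(1,5),rep(0,15)))
rownames(x) <- letters[1:4]

# Tanimoto distances, without and with removal of duplicated rows
d <- distGPS(x, metric='tanimoto')
du <- distGPS(x, metric='tanimoto', uniqueRows=TRUE)
mds1 <- mds(d)
mds1
plot(mds1)

# Same analysis with the chi-square metric
d <- distGPS(x, metric='chisquare')
mds1 <- mds(d)
mds1
plot(mds1)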