- Cosine similarity is a measure of similarity between two vectors of an inner product space, given by the cosine of the angle between them.
- Tversky index is an asymmetric similarity measure on sets that compares a variant to a prototype.
- Overlap coefficient is a similarity measure related to the Jaccard index. It measures the overlap between two sets and is defined as the size of the intersection divided by the size of the smaller of the two sets.
- Jaccard index is a statistic used for comparing the similarity and diversity of sample sets.
- Morisita's overlap index is a statistical measure of dispersion of individuals in a population. It is used to compare overlap among samples (Morisita 1959). This formula is based on the assumption that increasing the size of the samples will increase the diversity because it will include different habitats (i.e. different faunas).
- Horn's overlap index is an overlap measure based on Shannon's entropy.
cosine.similarity(.alpha, .beta, .do.norm = NA, .laplace = 0)
tversky.index(x, y, .a = 0.5, .b = 0.5)
overlap.coef(.alpha, .beta)
jaccard.index(.alpha, .beta, .intersection.number = NA)
morisitas.index(.alpha, .beta, .do.unique = T)
horn.index(.alpha, .beta, .do.unique = T)
... tversky.index and overlap.coef, matrix or data.frame with 2 columns for morisitas.index and horn.index, either ...
For morisitas.index, input data are matrices or data.frames with two columns: the first column holds the elements (species or individuals), the second holds the number of those elements (species or individuals) in a population.
Formulas:
Cosine similarity: cos(a, b) = (a . b) / (||a|| * ||b||), where a . b is the dot product of a and b
Tversky index: S(X, Y) = |X and Y| / (|X and Y| + a*|X - Y| + b*|Y - X|)
Overlap coefficient: overlap(X, Y) = |X and Y| / min(|X|, |Y|)
Jaccard index: J(A, B) = |A and B| / |A U B|
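The four formulas above can be sketched in base R as follows. This is a minimal illustration, not the package's own implementation: the `*_sketch` function names are hypothetical, inputs are treated as plain vectors, and the package-specific arguments (`.do.norm`, `.laplace`, `.intersection.number`) are not modelled.

```r
# Hypothetical helpers that mirror the formulas above; not the package's code.

cosine_sketch <- function(a, b) {
  # Dot product divided by the product of the Euclidean norms.
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

tversky_sketch <- function(x, y, a = 0.5, b = 0.5) {
  # Treat both inputs as sets, so duplicates are dropped first.
  x <- unique(x)
  y <- unique(y)
  common <- length(base::intersect(x, y))
  only_x <- length(base::setdiff(x, y))   # |X - Y|
  only_y <- length(base::setdiff(y, x))   # |Y - X|
  common / (common + a * only_x + b * only_y)
}

overlap_sketch <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  length(base::intersect(x, y)) / min(length(x), length(y))
}

jaccard_sketch <- function(x, y) {
  length(base::intersect(x, y)) / length(base::union(x, y))
}

x <- 1:10
y <- 2:20
jaccard_sketch(x, y)                    # 9 shared elements out of 20 distinct ones
overlap_sketch(x, y)                    # 9 shared elements, smaller set has 10
tversky_sketch(x, y)                    # a = b = 0.5
tversky_sketch(x, y, a = 1, b = 1)      # a = b = 1 reduces to the Jaccard index
cosine_sketch(c(1, 0, 1), c(1, 1, 0))
```

With a = b = 0.5 the Tversky index coincides with the Dice coefficient, and with a = b = 1 it coincides with the Jaccard index, which is a convenient sanity check for any implementation.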
The formula for Morisita's overlap index is more involved and is not reproduced here; see http://en.wikipedia.org/wiki/Morisita
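For orientation, Morisita's overlap index as it is commonly stated is C = 2 * sum(x_i * y_i) / ((D_x + D_y) * X * Y), where x_i and y_i are the counts of element i in the two samples, X and Y are the total counts, and D_x = sum(x_i * (x_i - 1)) / (X * (X - 1)) (likewise for D_y). The sketch below applies that textbook formula to two-column inputs of the kind described above. The function name `morisita_sketch` and the internal column names are hypothetical, each element is assumed to appear in one row per sample, and the result is not guaranteed to match what `morisitas.index` returns (for example, `.do.unique` is not modelled).

```r
# Hypothetical sketch of the textbook formula; not the package's implementation.
morisita_sketch <- function(alpha, beta) {
  # Column 1: element (species or individual); column 2: its count.
  names(alpha) <- c("element", "n_alpha")
  names(beta)  <- c("element", "n_beta")

  # Align both samples on the element; keep elements seen in only one sample.
  both <- merge(alpha, beta, by = "element", all = TRUE)
  both$n_alpha[is.na(both$n_alpha)] <- 0
  both$n_beta[is.na(both$n_beta)]   <- 0

  x <- as.numeric(both$n_alpha)
  y <- as.numeric(both$n_beta)
  X <- sum(x)
  Y <- sum(y)

  # Simpson-type dominance term for each sample.
  Dx <- sum(x * (x - 1)) / (X * (X - 1))
  Dy <- sum(y * (y - 1)) / (Y * (Y - 1))

  2 * sum(x * y) / ((Dx + Dy) * X * Y)
}

s1 <- data.frame(species = c("A", "B"), count = c(4, 6))
s2 <- data.frame(species = c("C", "D"), count = c(5, 5))
morisita_sketch(s1, s2)   # no shared species, so the index is 0
morisita_sketch(s1, s1)   # identical samples
```

Note that with small samples the index for identical inputs can come out slightly above 1, which is a property of the formula rather than an error in the sketch.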
jaccard.index(1:10, 2:20)
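# Count unique (CDR3 sequence, V segment) pairs; length() on a data.frame would count columns, so use nrow().
a <- nrow(unique(immdata[[1]][, c('CDR3.amino.acid.sequence', 'V.segments')]))
b <- nrow(unique(immdata[[2]][, c('CDR3.amino.acid.sequence', 'V.segments')]))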
jaccard.index(a, b, intersect(immdata[[1]], immdata[[2]], 'ave'))