- Cosine similarity measures the similarity between two vectors of an inner product space as the cosine of the angle between them.
- Tversky index is an asymmetric similarity measure on sets that compares a variant to a prototype.
- Overlap coefficient is a similarity measure related to the Jaccard index that measures the overlap between two sets. It is defined as the size of the intersection divided by the size of the smaller of the two sets.
- Jaccard index is a statistic used for comparing the similarity and diversity of sample sets.
- Morisita's overlap index is a statistical measure of dispersion of individuals in a population. It is used to compare overlap among samples (Morisita 1959). This formula is based on the assumption that increasing the size of the samples will increase the diversity because it will include different habitats (i.e. different faunas).
cosine.similarity(.alpha, .beta, .do.norm = NA, .laplace = 0)
tversky.index(x, y, .a = 0.5, .b = 0.5)
overlap.coef(.alpha, .beta)
jaccard.index(.alpha, .beta, .intersection.number = NA)
morisitas.index(.alpha, .beta, .do.unique = T)
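For morisitas.index, input data are matrices or data.frames with two columns: the first column contains the elements (species or individuals), the second the number of those elements in a population.
For jaccard.index, there are two ways to compute the index: pass the two sets as .alpha and .beta, or pass the sizes of the two sets as .alpha and .beta together with the precomputed size of their intersection as .intersection.number.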
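To make the two-column input layout concrete, here is a minimal base R sketch. The toy populations and the helper name morisita.sketch are hypothetical, and the computation follows the general textbook definition of Morisita's overlap index, C = 2 * sum(x_i * y_i) / ((D_x + D_y) * X * Y) with D_x = sum(x_i * (x_i - 1)) / (X * (X - 1)). Whether morisitas.index computes exactly this quantity is an assumption, not something this page confirms.

```r
# Hypothetical toy populations: column 1 = element, column 2 = its count.
pop.x <- data.frame(species = c("a", "b", "c"), count = c(10, 5, 1),
                    stringsAsFactors = FALSE)
pop.y <- data.frame(species = c("b", "c", "d"), count = c(4, 6, 2),
                    stringsAsFactors = FALSE)

# Illustrative helper (not part of the package): Morisita's overlap index
# computed from two two-column inputs.
morisita.sketch <- function(x, y) {
  # Align the counts on the union of elements; absent elements count as 0.
  elems <- union(x[[1]], y[[1]])
  cx <- x[[2]][match(elems, x[[1]])]
  cy <- y[[2]][match(elems, y[[1]])]
  cx[is.na(cx)] <- 0
  cy[is.na(cy)] <- 0

  X <- sum(cx)  # total number of individuals in the first sample
  Y <- sum(cy)  # total number of individuals in the second sample

  # Simpson's index of each sample.
  dx <- sum(cx * (cx - 1)) / (X * (X - 1))
  dy <- sum(cy * (cy - 1)) / (Y * (Y - 1))

  2 * sum(cx * cy) / ((dx + dy) * X * Y)
}

morisita.sketch(pop.x, pop.y)
```

Samples that share no elements give 0 under this definition, because every product x_i * y_i vanishes.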
Formulas:
Cosine similarity: cos(a, b) = (a . b) / (||a|| * ||b||), where a . b is the dot product of a and b
Tversky index: S(X, Y) = |X and Y| / (|X and Y| + a*|X - Y| + b*|Y - X|)
Overlap coefficient: overlap(X, Y) = |X and Y| / min(|X|, |Y|)
Jaccard index: J(A, B) = |A and B| / |A U B|
The formula for Morisita's overlap index is more involved and is not reproduced here; see http://en.wikipedia.org/wiki/Morisita
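The closed-form formulas above can be sketched directly in base R. The *.sketch names below are hypothetical helpers written only to illustrate the formulas. They are not the package's implementations and ignore package-specific arguments such as .do.norm and .laplace.

```r
# Cosine similarity: dot product divided by the product of Euclidean norms.
cosine.sketch <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Tversky index: a and b weight the elements unique to x and to y.
tversky.sketch <- function(x, y, a = 0.5, b = 0.5) {
  x <- unique(x)
  y <- unique(y)
  common <- length(intersect(x, y))
  common / (common + a * length(setdiff(x, y)) + b * length(setdiff(y, x)))
}

# Overlap coefficient: intersection size over the size of the smaller set.
overlap.sketch <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  length(intersect(x, y)) / min(length(x), length(y))
}

# Jaccard index: intersection size over union size.
jaccard.sketch <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

jaccard.sketch(1:10, 2:20)                 # 9 shared elements out of 20
tversky.sketch(1:10, 2:20, a = 1, b = 1)   # same value as the Jaccard index
overlap.sketch(1:10, 2:20)                 # 9 shared out of the smaller set of 10
```

With a = b = 1 the Tversky denominator becomes the size of the union, which is why it reduces to the Jaccard index; with a = b = 0.5 it gives the Dice coefficient.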
jaccard.index(1:10, 2:20)
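# Count unique clonotypes (CDR3 amino acid sequence + V segment) in each sample.
# nrow() is used because length() of a data frame returns its number of columns.
a <- nrow(unique(immdata[[1]][, c('CDR3.amino.acid.sequence', 'V.segments')]))
b <- nrow(unique(immdata[[2]][, c('CDR3.amino.acid.sequence', 'V.segments')]))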
jaccard.index(a, b, intersect(immdata[[1]], immdata[[2]], 'ave'))