# similarity-methods

##### Compute Similarities

Provides the generic function `similarity` and the S4 method to compute similarities among a collection of sequences.

`is.subset` and `is.superset` find subsequence or supersequence relationships among a collection of sequences.

- Keywords
- manip

##### Usage

`similarity(x, y = NULL, ...)`

# S4 method for sequences
similarity(x, y = NULL,
method = c("jaccard", "dice", "cosine", "subset"),
strict = FALSE)

# S4 method for sequences
is.subset(x, y = NULL, proper = FALSE)
# S4 method for sequences
is.superset(x, y = NULL, proper = FALSE)

##### Arguments

- x, y
- an object.
- …
- further (unused) arguments.
- method
- a string specifying the similarity measure to use (see details).
- strict
- a logical value specifying if strict itemset matching should be used.
- proper
- a logical value specifying if only strict relationships (omitting equality) should be indicated.

##### Details

Let the number of *common* elements of two sequences refer to
those that occur in a longest common subsequence. The following
similarity measures are implemented:

- `jaccard`
- The number of common elements divided by the total number of elements (the sum of the lengths of the sequences minus the length of the longest common subsequence).
- `dice`
- Uses two times the number of common elements.
- `cosine`
- Uses the square root of the product of the sequence lengths for the denominator.
- `subset`
- Zero if the first sequence is not a subsequence of the second. Otherwise the number of common elements divided by the number of elements in the first sequence.
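The measures above can be illustrated with a minimal base R sketch. It is not the package's implementation: it treats each sequence as a plain character vector (so elements match only when equal), the helper names `lcs_length` and `seq_similarity` are hypothetical, and the denominator used for `dice` (the sum of the sequence lengths) is an assumption taken from the standard Dice coefficient, since the description above only states the numerator.

```r
# Length of a longest common subsequence of two atomic vectors,
# computed with the standard O(n * m) dynamic programme.
lcs_length <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # L[i + 1, j + 1] holds the LCS length of a[1:i] and b[1:j]
  L <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      L[i + 1, j + 1] <- if (a[i] == b[j]) {
        L[i, j] + 1
      } else {
        max(L[i, j + 1], L[i + 1, j])
      }
    }
  }
  L[n + 1, m + 1]
}

# Similarity of two sequences from the number of common elements.
seq_similarity <- function(a, b,
                           method = c("jaccard", "dice", "cosine", "subset")) {
  method <- match.arg(method)
  n <- length(a)
  m <- length(b)
  common <- lcs_length(a, b)
  switch(method,
    jaccard = common / (n + m - common),
    dice    = 2 * common / (n + m),      # assumed denominator
    cosine  = common / sqrt(n * m),
    # a is a subsequence of b exactly when all of a is common
    subset  = if (common == n) common / n else 0
  )
}

a <- c("A", "B", "C", "D")
b <- c("A", "C", "D", "E", "F")
seq_similarity(a, b, "jaccard")  # 3 / (4 + 5 - 3) = 0.5
seq_similarity(a, b, "dice")     # 2 * 3 / (4 + 5)
seq_similarity(a, b, "cosine")   # 3 / sqrt(4 * 5)
seq_similarity(a, b, "subset")   # 0, since a is not a subsequence of b
```

Here the longest common subsequence is `A, C, D`, so the number of common elements is 3.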

If `strict = TRUE`, the elements (itemsets) of the sequences must be equal to be matched. Otherwise, matches are quantified by the similarity of the itemsets (as specified by `method`), thresholded at 0.5, and the common sequence is quantified by the sum of the similarities.
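One way to picture the difference between strict and non-strict matching is a weighted variant of the longest common subsequence computation. The sketch below is an illustration under stated assumptions, not the package's code: sequences are represented as lists of character vectors (itemsets), itemset similarity is fixed to Jaccard, similarities below 0.5 are assumed to count as no match, and the names `itemset_jaccard` and `common_weight` are hypothetical.

```r
# Jaccard similarity of two itemsets given as character vectors.
itemset_jaccard <- function(s, t) {
  length(intersect(s, t)) / length(union(s, t))
}

# Weighted "number of common elements" of two sequences of itemsets.
common_weight <- function(x, y, strict = FALSE, threshold = 0.5) {
  n <- length(x)
  m <- length(y)
  # L[i + 1, j + 1] holds the best total weight for x[1:i] and y[1:j]
  L <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      w <- if (strict) {
        # strict: itemsets match only if they are equal
        as.numeric(setequal(x[[i]], y[[j]]))
      } else {
        itemset_jaccard(x[[i]], y[[j]])
      }
      # Assumption: similarities below the threshold count as no match.
      if (w < threshold) w <- 0
      L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j], L[i, j] + w)
    }
  }
  L[n + 1, m + 1]
}

x <- list(c("A", "B"), "C")
y <- list(c("A", "B", "C"), "C")
common_weight(x, y, strict = TRUE)  # 1: only the itemset {C} matches exactly
common_weight(x, y)                 # 2/3 + 1: {A,B} partially matches {A,B,C}
```

With strict matching only equal itemsets contribute, each with weight 1; without it, sufficiently similar itemsets contribute their similarity to the sum.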

##### Value

`similarity` returns an object of class `dsCMatrix` if the result is symmetric (or `method = "subset"`) and an object of class `dgCMatrix` otherwise.

`is.subset` and `is.superset` return an object of class `lgCMatrix`.

##### Note

Computation of the longest common subsequence of two sequences of lengths `n` and `m` takes `O(n*m)` time.

The supported set of operations for the above matrix classes depends on package Matrix. In case of problems, expand to full storage representation using `as(x, "matrix")` or `as.matrix(x)`.

For efficiency, use `as(x, "dist")` to convert a symmetric result matrix for clustering.
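As a rough sketch of the clustering step, the following base R code uses a small dense matrix with made-up values as a stand-in for a symmetric `similarity` result (what `as.matrix(x)` would give). It uses `as.dist` from package stats in place of the `as(x, "dist")` coercion mentioned above, and it assumes that similarities are turned into dissimilarities as `1 - s` before clustering.

```r
# Hypothetical similarity values for four sequences, standing in for a
# symmetric result of similarity() expanded to full storage.
s <- matrix(c(1.0, 0.8, 0.2, 0.1,
              0.8, 1.0, 0.3, 0.2,
              0.2, 0.3, 1.0, 0.7,
              0.1, 0.2, 0.7, 1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

# Clustering expects dissimilarities, so convert similarities first.
d <- as.dist(1 - s)

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
groups
```

In this toy example `s1` and `s2` fall into one group and `s3` and `s4` into the other.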

##### See Also

Class `sequences`, method `dissimilarity`.

##### Examples

```
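## use example data
library(arulesSequences)
data(zaki)
z <- as(zaki, "timedsequences")
## pairwise similarities (default method "jaccard")
similarity(z)
## require equality of itemsets
similarity(z, strict = TRUE)
## emphasize common elements
similarity(z, method = "dice")
## subsequence relationships
is.subset(z)
is.subset(z, proper = TRUE)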

*Documentation reproduced from package arulesSequences, version 0.2-19, License: GPL-2*