get_sim_grid: Get similarity matrix of pairwise similarities of term sets.

Description

Create a numeric matrix of 'between-term set' similarities from either an ontology_index object together with a numeric vector of information content per term, or a matrix of between-term similarities (e.g. the output of get_term_sim_mat). Term set similarity is computed with either the 'best-match-average' or the 'best-match-product' approach: the asymmetric 'best-match' similarity function is applied to the two term sets in each order, and the two resulting scores are combined by taking their average or their product, respectively. Either Lin's (default) or Resnik's definition of term similarity can be used. If information_content is not specified, a default value is generated with descendants_IC.
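The combination step described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the toy similarity matrix and the helpers best_match and set_sim are hypothetical, and it assumes the asymmetric 'best-match' score is the mean, over the terms of the first set, of each term's highest similarity to any term in the second set.

```r
# Toy between-term similarity matrix (values are made up for illustration).
terms <- c("A", "B", "C")
term_sim <- matrix(
  c(1.0, 0.5, 0.2,
    0.5, 1.0, 0.4,
    0.2, 0.4, 1.0),
  nrow = 3, byrow = TRUE, dimnames = list(terms, terms)
)

# Hypothetical helper: asymmetric best-match score (assumed definition).
# For each term in `from`, take its best similarity to any term in `to`,
# then average over the terms in `from`. Empty sets score 0.
best_match <- function(from, to, sim) {
  if (length(from) == 0 || length(to) == 0) return(0)
  mean(apply(sim[from, to, drop = FALSE], 1, max))
}

# Hypothetical helper: combine the two directions by average or product.
set_sim <- function(x, y, sim, combine = c("average", "product")) {
  combine <- match.arg(combine)
  xy <- best_match(x, y, sim)
  yx <- best_match(y, x, sim)
  if (combine == "average") (xy + yx) / 2 else xy * yx
}

s1 <- c("A", "B")
s2 <- "C"
bma <- set_sim(s1, s2, term_sim, "average")  # (0.3 + 0.4) / 2
bmp <- set_sim(s1, s2, term_sim, "product")  # 0.3 * 0.4
```

Because the two directional scores differ here (0.3 and 0.4), the example shows why the best-match function is asymmetric and why a combining step is needed to obtain a symmetric term set similarity.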

Usage

get_sim_grid(
  ontology,
  information_content,
  term_sim_method,
  term_sim_mat,
  term_sets,
  term_sets2 = term_sets,
  combine = "average"
)

Value

Numeric matrix of pairwise term set similarities.

Arguments

ontology: ontology_index object.
information_content: Numeric vector of information contents of terms (named by term).
term_sim_method: Character string equalling either "lin" or "resnik" to use Lin's or Resnik's expression for the similarity of terms.
term_sim_mat: Numeric matrix with rows and columns corresponding to (and named by) term IDs, and cells containing the similarity between the row and column term.
term_sets: List of character vectors of ontological term IDs.
term_sets2: Second set of term sets.
combine: Character string, either "average" or "product", indicating whether to use the 'best-match-average' or 'best-match-product' method, or a function accepting two arguments: the first, the similarity matrix obtained by averaging across term sets in term_sets, and the second, the matrix obtained by averaging across those in term_sets2.

Details

Note that if any term set within term_sets has 0 terms associated with it, it will get a similarity of 0 to any other set. If you do not want to compare term sets with no annotation, take care to filter out empty sets first, e.g. by `term_sets=term_sets[sapply(term_sets, length) > 0]`.
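The filtering step can be checked on a small list before calling get_sim_grid. The sketch below uses only base R; the list contents are illustrative, with case2 deliberately left empty, and non_empty is a hypothetical variable name.

```r
# Example term sets; `case2` has no annotated terms.
term_sets <- list(
  case1 = c("HP:0001873", "HP:0011877"),
  case2 = character(0),
  case3 = "HP:0001873"
)

# Keep only the term sets that contain at least one term.
# lengths() returns the length of each list element.
non_empty <- term_sets[lengths(term_sets) > 0]
names(non_empty)
```

Here lengths(term_sets) gives the same counts as sapply(term_sets, length) and always returns an integer vector, including for an empty list.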

Examples

library(ontologyIndex)
library(ontologySimilarity)  # provides get_sim_grid
data(hpo)
# Three example term sets of HPO term IDs
term_sets <- list(
  case1 = c("HP:0001873", "HP:0011877"),
  case2 = c("HP:0001872", "HP:0001892"),
  case3 = "HP:0001873"
)
get_sim_grid(ontology = hpo, term_sets = term_sets)