get_sim_mat: Get similarity matrix from a term similarity matrix

Description

Using a matrix of between-term similarities (e.g. the kind returned by get_term_sim_mat), create a numeric matrix of similarities between term sets, using either the 'best-match-average' or the 'best-match-product' approach. The asymmetric 'best-match' similarity function is applied to each pair of term sets in both orders, and the two resulting scores are combined by taking their average or their product respectively.
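
The combination step described above can be sketched in base R on a toy matrix. This is only an illustration, not the package's implementation: it assumes that 'best-match' means the mean, over the terms in one set, of the highest similarity to any term in the other set. The helper names best_match and set_similarity and the similarity values are hypothetical.

```r
# Toy term-by-term similarity matrix (values invented for illustration).
terms <- c("t1", "t2", "t3")
term_sim <- matrix(
  c(1.0, 0.5, 0.2,
    0.5, 1.0, 0.4,
    0.2, 0.4, 1.0),
  nrow = 3, byrow = TRUE, dimnames = list(terms, terms)
)

# Asymmetric best match (assumed definition): for each term in `a`, take its
# highest similarity to any term in `b`, then average those maxima.
best_match <- function(sim, a, b) {
  mean(apply(sim[a, b, drop = FALSE], 1, max))
}

# Combine the scores from both directions by average or by product.
set_similarity <- function(sim, a, b, combine = c("average", "product")) {
  combine <- match.arg(combine)
  ab <- best_match(sim, a, b)
  ba <- best_match(sim, b, a)
  if (combine == "average") (ab + ba) / 2 else ab * ba
}

set_a <- c("t1", "t2")
set_b <- c("t3")

# The two directions differ here: a -> b gives 0.3, b -> a gives 0.4.
bma <- set_similarity(term_sim, set_a, set_b, "average")  # 0.35
bmp <- set_similarity(term_sim, set_a, set_b, "product")  # 0.12
```

Because the two directional scores need not agree, combining them is what makes the resulting set-to-set similarity symmetric.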

Usage

get_sim_mat(term_sim_mat, term_sets, combine = c("average", "product"))

Arguments

term_sim_mat

Numeric matrix with rows and columns corresponding to (and named by) term IDs, and cells containing the similarity between the row term and the column term.

term_sets

List of character vectors of ontological term IDs.

combine

Character string, either "average" or "product", indicating whether to use the 'best-match-average' or the 'best-match-product' method.

Value

Numeric matrix of similarities between term sets.

Examples

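suppressPackageStartupMessages(library(ontologyIndex))
#get_term_info_content, get_term_sim_mat and get_sim_mat come from ontologySimilarity
library(ontologySimilarity)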
data(hpo)
set.seed(1)
#random set of terms with ancestors
terms <- get_ancestors(hpo, sample(hpo$id, size=30))
#set information content of terms (as if each term occurs with frequency `1/n`)
information_content <- get_term_info_content(hpo, term_sets=as.list(terms))
#similarity of term pairs
tsm <- get_term_sim_mat(hpo, information_content)
#5 random term sets (call them *phenotypes*) with (at most) 8 terms (removing redundant ones)
phenotypes <- lapply(replicate(simplify=FALSE, n=5, 
  expr=sample(terms, size=8)), minimal_set, ontology=hpo)
get_sim_mat(tsm, phenotypes)
