GSNA (version 0.1.4.2)

scoreOCMatrix_C: scoreOCMatrix_C

Description

Takes a presence/absence matrix with genes as rows and gene sets (modules) as columns and calculates a matrix of pairwise overlap coefficients (also known as Szymkiewicz–Simpson coefficients^1).

Usage

scoreOCMatrix_C(geneSetCollection_m)

Value

This function returns a matrix of overlap coefficient values between gene modules. Values on the diagonal corresponding to self-overlap coefficients are returned as NA.

Arguments

geneSetCollection_m

(required) A logical presence/absence matrix representation of a gene set collection in which columns correspond to gene sets, rows correspond to genes and values are TRUE if a gene is present in a gene set and FALSE otherwise. Row and column names correspond to gene symbols and gene set identifiers, respectively. NOTE: for a typical GSNA analysis, this matrix would include only observed filtered genes and significant gene set hits from pathways analysis. Using a matrix version of the full MSigDB without filtering genes, for example, would likely be unworkably slow and memory intensive.
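
As an illustration of the expected input shape, the following base R sketch builds a small logical presence/absence matrix by hand. The gene set names and gene symbols are made up for the example; in a real analysis the matrix would normally come from makeFilteredGenePresenceAbsenceMatrix(), as in the Examples section.

```r
# Hypothetical toy gene set collection (names and members are illustrative only).
gene_sets <- list(
  setA = c("TP53", "BRCA1", "EGFR"),
  setB = c("TP53", "BRCA1"),
  setC = c("EGFR", "MYC")
)

# All genes that occur in at least one gene set become the rows.
genes <- sort(unique(unlist(gene_sets)))

# One logical column per gene set: TRUE where the gene is a member.
pa_mat <- vapply(gene_sets,
                 function(gs) genes %in% gs,
                 logical(length(genes)))
rownames(pa_mat) <- genes

print(pa_mat)
```

The result is a logical matrix with gene symbols as row names and gene set identifiers as column names, which is the layout described above.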

Details

The overlap (or Szymkiewicz–Simpson) coefficient for two sets A and B is defined as:

$$ OC(A,B) = \dfrac{\lvert A \cap B \rvert}{\min(\lvert A \rvert, \lvert B \rvert)} $$
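
The definition can be checked on a toy matrix with a few lines of base R. This is only a sketch for illustration, not the package's compiled implementation; the helper name overlap_coefficient_matrix and the toy data are assumptions introduced here. The diagonal is set to NA to mirror the return value described under Value.

```r
# Toy presence/absence matrix: rows are genes, columns are gene sets.
pa <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, TRUE,
    FALSE, FALSE, TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("setA", "setB", "setC"))
)

# Hypothetical helper: pairwise overlap coefficients between columns.
overlap_coefficient_matrix <- function(pa) {
  pa_num <- pa * 1                  # logical -> numeric, keeps dimnames
  shared <- crossprod(pa_num)       # |A n B| for every pair of columns
  sizes  <- colSums(pa_num)         # |A| for every column
  oc <- shared / outer(sizes, sizes, pmin)
  diag(oc) <- NA                    # self-overlap reported as NA
  oc
}

oc <- overlap_coefficient_matrix(pa)
print(oc)
```

Here setB is a subset of setA, so their coefficient is 1; setA and setC share one gene and the smaller set has two members, giving 0.5; setB and setC share nothing, giving 0. Note that in this sketch an empty gene set would produce a division by zero.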

References

  1. M.K V, K K. A Survey on Similarity Measures in Text Mining. MLAIJ. 2016;3: 19–28. doi:10.5121/mlaij.2016.3103

See Also

buildGeneSetNetworkOC, scoreLFMatrix_C

Examples

library(GSNA)

# Get the background set of observable genes from the
# expression data:
gene_background <- toupper(rownames( Bai_empty_expr_mat ))

# Using the sample gene set collection Bai_gsc.tmod,
# generate a gene presence/absence matrix filtered for the
# ref.background of observable genes:
presence_absence.mat <-
  makeFilteredGenePresenceAbsenceMatrix( ref.background = gene_background,
                                         geneSetCollection = Bai_gsc.tmod )

# Now generate an overlap coefficient matrix.
oc.mat <- scoreOCMatrix_C( presence_absence.mat )

