GSNA (version 0.1.4.2)

buildGeneSetNetworkJaccard

Description

Using a gene set collection and a background of observable genes, calculate a matrix of Jaccard similarity indices and return a GSNData object.

Usage

buildGeneSetNetworkJaccard(
  object = NULL,
  ref.background = NULL,
  geneSetCollection = NULL,
  distMatrixFun = scoreJaccardMatrix_C
)

Value

This function returns a GSNData object with the $default_distance field set to 'jaccard' and $distances$jaccard$optimal_extreme set to 'max'.

Arguments

object

An object of type GSNData. If NULL, a new one is instantiated.

ref.background

(required) A character vector corresponding to the genes observable in a differential expression, ATAC-Seq or other dataset. This corresponds to the background used in tools like DAVID.

geneSetCollection

(required) A gene set collection, either as a tmod object or as a named list of gene sets / modules, in which each element is a character vector of gene symbols and each name is the corresponding gene module identifier.

distMatrixFun

(optional) Function for calculating the distance matrix. Defaults to scoreJaccardMatrix_C. Functions used for this purpose are expected to return a square numeric matrix corresponding to the distances between all gene sets.
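
As an illustration of the list form accepted for geneSetCollection, the following minimal sketch builds a named list of character vectors. The identifiers (GS_A, GS_B, GS_C) and gene symbols are hypothetical and chosen only for this example; they are not data shipped with GSNA.

```r
# Hypothetical gene set collection in list form.
# Each name is a gene set / module identifier; each element is a
# character vector of gene symbols.
gene_sets <- list(
  GS_A = c("TP53", "MDM2", "CDKN1A"),
  GS_B = c("TP53", "ATM", "CHEK2"),
  GS_C = c("MYC", "MAX")
)

# Basic structural checks on the collection.
is_named_list  <- is.list(gene_sets) && !is.null(names(gene_sets))
all_char_elems <- all(vapply(gene_sets, is.character, logical(1)))
```

A list like this would be passed as the geneSetCollection argument, together with a ref.background character vector of observable genes.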

Details

This function wraps the process of creating a GSNData object and calculating a Jaccard similarity matrix. By default, the Jaccard index matrix is calculated using scoreJaccardMatrix_C(), which is implemented in C++.
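
To make the quantity concrete, here is a plain R sketch of a pairwise Jaccard index matrix, where the index for two sets is the size of their intersection divided by the size of their union. This is not the package's C++ implementation. The function name jaccard_matrix and the example gene sets are hypothetical, and any handling of the background is left out.

```r
# Illustrative only: pure R Jaccard index matrix for a named list of
# gene sets. jaccard_matrix() is a hypothetical helper, not a GSNA function.
jaccard_matrix <- function(gene_sets) {
  ids <- names(gene_sets)
  n <- length(gene_sets)
  m <- matrix(NA_real_, n, n, dimnames = list(ids, ids))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      a <- unique(gene_sets[[i]])
      b <- unique(gene_sets[[j]])
      n_union <- length(union(a, b))
      # Jaccard index: |A intersect B| / |A union B|; NA if both sets are empty.
      m[i, j] <- if (n_union == 0) NA_real_ else length(intersect(a, b)) / n_union
    }
  }
  m
}

# Hypothetical gene sets for demonstration.
gene_sets <- list(
  GS_A = c("TP53", "MDM2", "CDKN1A"),
  GS_B = c("TP53", "ATM", "CHEK2"),
  GS_C = c("MYC", "MAX")
)

J <- jaccard_matrix(gene_sets)
```

In this toy example GS_A and GS_B share one gene out of five distinct genes, so their index is 1/5, while GS_A and GS_C share none and score 0.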

Note: Jaccard indices are similarities, not distances: higher values indicate a closer match between sets. The optimal_extreme is therefore "max", and some operations, such as construction of a hierarchical tree, may require the values to be transformed before they are used for clustering.
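
One common way to transform a similarity for clustering is to take 1 minus the Jaccard index as a distance. The sketch below applies that transformation to a small hand-written similarity matrix and builds a tree with stats::hclust(). The matrix values and identifiers are hypothetical, and this is an assumption about a reasonable transformation, not a description of what GSNA does internally.

```r
# Hypothetical Jaccard similarity matrix for three gene sets.
ids <- c("GS_A", "GS_B", "GS_C")
J <- matrix(
  c(1.0, 0.2, 0.0,
    0.2, 1.0, 0.5,
    0.0, 0.5, 1.0),
  nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# Convert similarity to a distance: identical sets get 0, disjoint sets get 1.
D <- as.dist(1 - J)

# Average-linkage hierarchical clustering on the transformed values.
tree <- hclust(D, method = "average")
```

With these values the most similar pair, GS_B and GS_C, is merged first, which is the behavior one would want from a tree built on similarities.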

See Also

scoreJaccardMatrix_C buildGeneSetNetworkLFFast buildGeneSetNetworkSTLF

Examples

library(GSNA)
library(tmod)

# With tmod version >= 0.50.11, convert exported Bai_gsc.tmod **tmod** object to **tmodGS**:
if( utils::packageVersion( 'tmod' ) >= '0.50.11' )
  Bai_gsc.tmod <- tmod::tmod2tmodGS( GSNA::Bai_gsc.tmod )

# Get list of observable genes from expression data:
observable_genes <- toupper( rownames( Bai_empty_expr_mat ) )

# Subset GSEA data for significant results.
significant.Gsea <- subset( Bai_CiHep_dorothea_DN.Gsea, `FDR q-val` <= 0.05 )

# Subset tmod object for the gene sets with significant results:
gsc_subset.tmod <- Bai_gsc.tmod[ significant.Gsea$NAME ]

# Now, create a GSN object with Jaccard indices:
GSN <- buildGeneSetNetworkJaccard( ref.background = observable_genes,
                                   geneSetCollection = gsc_subset.tmod )
