gsnPareNetGenericToNearestNNeighbors: gsnPareNetGenericToNearestNNeighbors

Description

General method to pare GSNData distance matrices to a nearest-neighbor subset, applying any low or high value cutoffs that may be required.

Usage

gsnPareNetGenericToNearestNNeighbors(
  object,
  distance = NULL,
  extreme = NULL,
  cutoff = 0,
  keepOrphans = TRUE,
  N = 1
)

Value

A GSNData object containing a pared distance matrix for the specified distance metric.

Arguments

object: An object of type GSNData containing a distance matrix.
distance: (optional) Character vector of length 1 indicating which distance matrix is to be pared. This defaults to the 'default_distance'.
extreme: (optional) Either 'min' or 'max', indicating whether low or high values are most significant, i.e. to be interpreted as the shortest distance for nearest neighbor paring. This defaults to the value set for the optimal_extreme field of the specified distance matrix.
cutoff: (optional) A cutoff specifying the maximal or minimal value that will be retained, depending on the distance metric being used. The default value is 0, but this is likely incorrect for most purposes. For 'lf' and 'stlf' distances, we recommend a value of -90. For 'jaccard' distances, we recommend 0.3-0.4. (See Details.)
keepOrphans: A boolean indicating whether 'orphan' gene sets that have no nearest neighbors should be retained in the final network. (default TRUE)
N: Integer indicating the number of nearest neighbors to retain. (default 1)

Details

This method pares the GSN networks down to N nearest neighbors, with several tunable parameters. It is generally useful to include a cutoff for this method to remove weak associations between gene sets, but this is heavily dependent on the distance metric being used. A histogram or density plot showing the distribution of raw distances may be useful for determining a suitable value, since inflection points can guide selection of this cutoff. Such a plot may be generated using the gsnDistanceHistogram() method.
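To make the interplay of N, cutoff, and extreme concrete, the following is a minimal base-R sketch of nearest-neighbor paring on a toy matrix. It is an illustration only, not the GSNA implementation: the helper pare_to_nearest_n(), the gene set names GS1 to GS4, and the distance values are all hypothetical, and the sketch assumes that excluded entries are simply set to NA.

```r
# Toy symmetric distance matrix for four hypothetical gene sets
# (lower values = closer; the diagonal is NA).
d <- matrix(c(NA,  0.2, 0.7, 0.9,
              0.2, NA,  0.5, 0.8,
              0.7, 0.5, NA,  0.95,
              0.9, 0.8, 0.95, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("GS", 1:4), paste0("GS", 1:4)))

# Hypothetical helper (NOT part of GSNA): for each row, keep the N most
# significant values that also pass the cutoff, and set the rest to NA.
pare_to_nearest_n <- function(dist_mat, N = 1, cutoff = NULL, extreme = "min") {
  pared <- dist_mat
  pared[] <- NA
  for (i in seq_len(nrow(dist_mat))) {
    row_vals <- dist_mat[i, ]
    # Rank non-NA entries: ascending for "min", descending for "max".
    ranked <- order(row_vals, decreasing = (extreme == "max"), na.last = NA)
    keep <- ranked[seq_len(min(N, length(ranked)))]
    if (!is.null(cutoff)) {
      passes <- if (extreme == "min") row_vals[keep] <= cutoff else row_vals[keep] >= cutoff
      keep <- keep[passes]
    }
    pared[i, keep] <- row_vals[keep]
  }
  pared
}

pared <- pare_to_nearest_n(d, N = 1, cutoff = 0.6, extreme = "min")

# Gene sets with no retained edge in any row or column are 'orphans'.
orphans <- rownames(pared)[rowSums(!is.na(pared)) == 0 & colSums(!is.na(pared)) == 0]
print(pared)
print(orphans)
```

In this toy case the nearest neighbor of GS4 lies beyond the cutoff, so GS4 ends up with no edges; in the real method, keepOrphans controls whether such a gene set is retained in the network.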

An alternative to this paring method is hierarchical clustering, implemented in the gsnPareNetGenericHierarchic() method.

Examples

library(GSNA)

# In this example, we generate a gene set network from CERNO example
# data. We begin by subsetting the CERNO data for significant results:
sig_pathways.cerno <- subset( Bai_CiHep_DN.cerno, adj.P.Val <= 0.05 )

# Now create a gene set collection containing just the gene sets
# with significant CERNO results, by subsetting Bai_gsc.tmod using
# the gene set IDs as keys:
sig_pathways.tmod <- Bai_gsc.tmod[sig_pathways.cerno$ID]

# And obtain a background gene set from differential expression data:
background_genes <- toupper( rownames( Bai_CiHep_v_Fib2.de ) )

# Build a gene set network:
sig_pathways.GSN <-
   buildGeneSetNetworkJaccard(geneSetCollection = sig_pathways.tmod,
                              ref.background = background_genes )

# Now import the CERNO data:
sig_pathways.GSN <- gsnImportCERNO( sig_pathways.GSN,
                                    pathways_data = sig_pathways.cerno )

# Now we can pare the network. By default, the distances are complemented
# and converted into ranks for the sake of generating a network.
sig_pathways.GSN <- gsnPareNetGenericToNearestNNeighbors( object = sig_pathways.GSN )