gsnImportGSEA: gsnImportGSEA

Description

Add GSEA search data, as generated by the GSEA package, to a GSNData object. The data set can be supplied either as a data.frame or as a delimited text file to be imported. (See Details below.)

Usage

gsnImportGSEA(
  object,
  pathways_data = NULL,
  filename = NULL,
  id_col = NULL,
  stat_col = NULL,
  sig_order = NULL,
  n_col = NULL,
  sep = "\t"
)

Value

This returns a GSNData object containing imported pathways data.

Arguments

object: A GSNData object.
pathways_data: An (optional) data.frame containing the results of GSEA analysis. (Either this or the filename argument must be set.)
filename: An (optional) filename for data sets read from a text file containing GSEA results. This is ignored if the pathways_data argument is set.
id_col: (optional) A character vector of length 1 indicating the name of the column used as a key for gene sets or modules. This is normally the NAME field of GSEA data, which must match the names of the gene sets specified in the tmod object, or in the list of gene set vectors, passed as the geneSetCollection argument when building the gene set network. By default this value is 'NAME'. The IDs must correspond to the names of the gene sets provided, or an error will be thrown. NOTE: In the tmod::tmodImportMSigDB function provided by the tmod package, the default ID is an MSigDB accession, but GSEA data sets do not use this accession. The NAME column in GSEA results sets corresponds instead to the STANDARD_NAME field in the MSigDB XML database file. The standard tmod::tmodImportMSigDB utility function does not preserve STANDARD_NAME as-is; it reformats it by converting underscores to spaces and non-initial letters to lower case. Therefore, when using GSEA data sets with an MSigDB gene set collection imported using tmod::tmodImportMSigDB, the NAME fields need to be mapped to the IDs or vice versa.
stat_col: (optional) A character vector of length 1 indicating the name of the column used as a statistic to evaluate the quality of pathways results. If it is not specified, the function scans through the possible stat_col values ("FDR q-val", "FDR.q.val", "FWER p-val", "FWER.p.val", "NOM p-val", "NOM.p.val") in that order and uses the first one it finds. (Because GSEA column names contain spaces and hyphens, flexibility is needed here: depending on how GSEA results sets are read in, spaces and hyphens may be replaced with periods.)
sig_order: (optional) Either 'loToHi' (default) or 'hiToLo', depending on whether lower or higher values of the statistic used to evaluate pathways results indicate greater significance.
n_col: (optional) The name of the column containing the number of genes in each gene set. Generally, this is the number of genes in the gene set that are present in the expression data set. Defaults to 'SIZE'.
sep: A separator for text file import, defaults to "\t". Ignored if filename is not specified.
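The stat_col fallback described above amounts to taking the first candidate name, in the listed priority order, that exists among the column names. The base R sketch below illustrates that rule; pick_stat_col and gsea_stat_candidates are hypothetical names introduced here, and this is not GSNA's internal code.

```r
# Hypothetical helper, not part of GSNA: pick the first candidate
# statistic column (in priority order) present in a GSEA results table.
gsea_stat_candidates <- c("FDR q-val", "FDR.q.val",
                          "FWER p-val", "FWER.p.val",
                          "NOM p-val", "NOM.p.val")

pick_stat_col <- function(df, candidates = gsea_stat_candidates) {
  # intersect() keeps the order of its first argument, so the
  # earliest-listed candidate found in df wins, regardless of the
  # order of the columns in df.
  hits <- intersect(candidates, colnames(df))
  if (length(hits) == 0) stop("No recognized GSEA statistic column found.")
  hits[[1]]
}

# Toy table (made-up values) whose column names already use periods:
toy <- data.frame(NAME = c("SET_A", "SET_B"),
                  NOM.p.val = c(0.001, 0.2),
                  FDR.q.val = c(0.01, 0.4))
pick_stat_col(toy)  # "FDR.q.val", since FDR q-values outrank nominal p-values
```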

Details

GSEA results directories generally contain files whose names begin with gsea_report_for_ and end with the .xls suffix. Despite the suffix, these are tab-delimited text files, which is why sep defaults to "\t". This method is designed to handle such data sets.
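The base R sketch below writes a small tab-delimited file laid out like a GSEA report (the contents are made up) and reads it back with read.delim(). It shows why the stat_col candidates include both spaced and dotted spellings: with its default check.names = TRUE, read.delim() converts names such as "FDR q-val" into syntactic names like "FDR.q.val".

```r
# Toy tab-delimited file mimicking a gsea_report_for_*.xls file
# (column subset and values are illustrative only).
tmp <- tempfile(fileext = ".xls")
writeLines(c("NAME\tSIZE\tNOM p-val\tFDR q-val",
             "SET_A\t25\t0.001\t0.01",
             "SET_B\t40\t0.2\t0.4"),
           tmp)

# Default import: check.names = TRUE replaces spaces and hyphens with periods.
gsea_dotted <- read.delim(tmp)
colnames(gsea_dotted)  # "NAME" "SIZE" "NOM.p.val" "FDR.q.val"

# check.names = FALSE keeps the original GSEA column names.
gsea_raw <- read.delim(tmp, check.names = FALSE)
colnames(gsea_raw)     # "NAME" "SIZE" "NOM p-val" "FDR q-val"
```

Either data.frame could then be passed as pathways_data; the dotted form is the one that needs the period-substituted candidate names.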

Note: An error is thrown if not all gene set IDs in the $genePresenceAbsence matrix are present in the GSEA NAME column. However, if gene set IDs present in the pathways data are absent from the $genePresenceAbsence matrix, this method emits a warning. It also checks for the standard GSEA data set column names and emits a warning if any are missing.
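The asymmetry in these ID checks (missing network IDs are fatal, extra GSEA rows only warn) can be expressed with two setdiff() calls. The sketch below is a hypothetical helper, check_gsea_ids, that mirrors the behaviour described in the note; it is not GSNA's implementation, and gsn_ids simply stands for the gene set IDs held by the GSNData object.

```r
# Hypothetical helper mirroring the checks described above.
# gsn_ids:    gene set IDs stored in the network (assumed character vector)
# gsea_names: values of the GSEA NAME (id_col) column
check_gsea_ids <- function(gsn_ids, gsea_names) {
  # Every network gene set must have a GSEA result: otherwise stop.
  missing_in_gsea <- setdiff(gsn_ids, gsea_names)
  if (length(missing_in_gsea) > 0)
    stop("Gene set IDs missing from GSEA NAME column: ",
         paste(missing_in_gsea, collapse = ", "))
  # Extra GSEA rows with no matching network gene set only warn.
  extra_in_gsea <- setdiff(gsea_names, gsn_ids)
  if (length(extra_in_gsea) > 0)
    warning("GSEA gene sets absent from the network: ",
            paste(extra_in_gsea, collapse = ", "))
  invisible(TRUE)
}

res <- tryCatch(
  check_gsea_ids(c("SET_A", "SET_B"), c("SET_A", "SET_B", "SET_C")),
  warning = function(w) conditionMessage(w)
)
res  # "GSEA gene sets absent from the network: SET_C"
```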

Examples



library(GSNA)

# In this example, we generate a gene set network from GSEA example
# data. We begin by subsetting the GSEA data for significant results:
sig_pathways.gsea <- subset( Bai_CiHep_dorothea_DN.Gsea, `FDR q-val` <= 0.05 )

# Now create a gene set collection containing just the gene sets
# with significant GSEA results, by subsetting Bai_gsc.tmod using
# the gene set NAME as keys:
sig_pathways.tmod <- Bai_gsc.tmod[sig_pathways.gsea$NAME]

# And obtain a background gene set from the expression data used
# to generate the gsea results:
background_genes <- toupper( rownames( Bai_empty_expr_mat ) )

# Build a gene set network:
sig_pathways.GSN <-
   buildGeneSetNetworkSTLF(geneSetCollection = sig_pathways.tmod,
                           ref.background = background_genes )

# Now import the GSEA data.
sig_pathways.GSN <- gsnImportGSEA( sig_pathways.GSN,
                                   pathways_data = sig_pathways.gsea )
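The example above keys the tmod collection directly by GSEA NAME. If the collection was instead imported with tmod::tmodImportMSigDB, the id_col note suggests mapping GSEA NAME values onto tmod IDs. The sketch below approximates the reformatting described there (underscores to spaces, non-initial letters lower-cased) and then looks up IDs through a named vector. standard_name_to_title and title_to_id are hypothetical, the ID value is a placeholder, and tmod's exact title rules may differ, so check the result against the titles actually stored in your tmod object.

```r
# Approximation of the STANDARD_NAME -> title reformatting described in
# the id_col argument; verify against your tmod object's titles.
standard_name_to_title <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)
  paste0(substr(x, 1, 1), tolower(substring(x, 2)))
}

standard_name_to_title("HALLMARK_TNFA_SIGNALING_VIA_NFKB")
# "Hallmark tnfa signaling via nfkb"

# Hypothetical lookup of tmod IDs named by title. In practice it would be
# built from the ID and title fields of the tmod object; "ID_1" is a
# placeholder, not a real MSigDB accession.
title_to_id <- c("Hallmark tnfa signaling via nfkb" = "ID_1")
gsea_names <- "HALLMARK_TNFA_SIGNALING_VIA_NFKB"
mapped_ids <- unname(title_to_id[standard_name_to_title(gsea_names)])
mapped_ids  # "ID_1"
```

Because lower-casing discards information, mapping in this direction (GSEA NAME to tmod title to ID) is generally easier than reconstructing STANDARD_NAME from a tmod title.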