GSNA (version 0.1.4.2)

gsnImportDAVID: gsnImportDAVID

Description

Add DAVID search data to a GSNData object. The data are generated by the DAVID web application (https://david.ncifcrf.gov/) using either the "Functional Annotation Chart" or "Functional Annotation Cluster" results output option. The data set can be supplied either as a data.frame or as the name of an output text file to import. (See Details below.)

Usage

gsnImportDAVID(
  object,
  pathways_data = NULL,
  filename = NULL,
  id_col = NULL,
  stat_col = NULL,
  sig_order = NULL,
  n_col = NULL,
  sep = "\t"
)

Value

This returns a GSNData object containing imported pathways data.

Arguments

object

A GSNData object.

pathways_data

An (optional) data.frame containing the results of DAVID analysis. (Either this or the filename argument must be set.) Such a data.frame can be obtained by using the read_david_data_file() function to parse a DAVID "Functional Annotation Chart" or "Functional Annotation Cluster" results text file with the default options (output = "flat", redundant = FALSE, sep = "\t").

filename

An (optional) filename for data sets read from a text file containing DAVID results. This is ignored if the pathways_data argument is set.

id_col

(optional) A character vector of length 1 indicating the name of the column used as a key for gene sets or modules. This is normally the Term field of DAVID data, which must be the same as the names of gene sets in the gene set collection specified with the geneSetCollection argument used when building the gene set network. By default this value is 'Term'. The IDs must correspond to the names of the gene sets provided, or an error will be thrown.
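
Because mismatched IDs lead to an error, it can help to compare the two sets of names before importing. The following is a minimal base R sketch, not the check that GSNA itself performs; gene_set_names and david_df are hypothetical stand-ins for the names in the gene set collection and for a parsed DAVID results table.

```r
# Hypothetical gene set names, standing in for the names of the gene set
# collection used to build the network.
gene_set_names <- c("GO:0006915~apoptotic process",
                    "GO:0008283~cell proliferation")

# Toy stand-in for a parsed DAVID results table (not real DAVID output).
david_df <- data.frame(
  Term = c("GO:0006915~apoptotic process",
           "GO:0008283~cell proliferation",
           "GO:0007049~cell cycle"),
  PValue = c(0.001, 0.02, 0.04),
  stringsAsFactors = FALSE
)

id_col <- "Term"

# Gene sets in the collection that have no row in the DAVID table.
missing_from_david <- setdiff(gene_set_names, david_df[[id_col]])

# DAVID terms that are not part of the gene set collection.
extra_in_david <- setdiff(david_df[[id_col]], gene_set_names)
```

An empty missing_from_david and a short extra_in_david are what one would hope to see before calling gsnImportDAVID().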

stat_col

(optional) A character vector of length 1 indicating the name of the column used as a statistic to evaluate the quality of pathways results. The function scans through possible stat_col values ("FDR", "Bonferroni", "Benjamini", "PValue" ), and uses the first one it finds.
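
The "first one it finds" behavior can be pictured with a short base R sketch. This is an illustration under the assumption that candidates are tried in the order listed above; pick_stat_col() is a hypothetical helper and not part of GSNA.

```r
# Candidate statistic columns, in the order given in the documentation.
candidate_stat_cols <- c("FDR", "Bonferroni", "Benjamini", "PValue")

# Hypothetical helper: return the first candidate that is a column of df.
pick_stat_col <- function(df, candidates = candidate_stat_cols) {
  found <- candidates[candidates %in% colnames(df)]
  if (length(found) == 0) {
    stop("None of the candidate statistic columns were found.")
  }
  found[1]
}

# Toy table (not real DAVID output) with two of the candidate columns.
toy <- data.frame(Term = "a", PValue = 0.01, Benjamini = 0.05)
chosen <- pick_stat_col(toy)
```

Under that assumption, Benjamini would be chosen over PValue here because it appears earlier in the candidate list, regardless of the column order in the table.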

sig_order

(optional) Either 'loToHi' (default) or 'hiToLo' depending on the statistic used to evaluate pathways results.

n_col

(optional) Specifies the column containing the number of genes in the gene set. Generally, this is the number of genes in the gene set that are attested in an expression data set (Defaults to 'Count', if that is present, otherwise

sep

A separator for text file import, defaults to "\t". Ignored if filename is not specified.
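
Before passing a filename, the columns of a tab-separated results file can be inspected with base R. The sketch below writes a small toy file (not real DAVID output, and with only a few illustrative columns) and reads it back with the same separator; toy_file and toy_df are hypothetical names.

```r
# Write a toy tab-separated file to a temporary location.
toy_file <- tempfile(fileext = ".txt")
writeLines(
  c("Term\tCount\tPValue\tFDR",
    "GO:0006915~apoptotic process\t12\t0.001\t0.01",
    "GO:0007049~cell cycle\t8\t0.04\t0.2"),
  toy_file
)

# Read it back using a tab separator, keeping column names unchanged.
toy_df <- read.delim(toy_file, sep = "\t", check.names = FALSE,
                     stringsAsFactors = FALSE)

# Column names available for id_col, stat_col and n_col.
file_columns <- colnames(toy_df)

# Clean up the temporary file.
unlink(toy_file)
```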

Details

Note: An error is thrown if all gene set IDs in the $genePresenceAbsence matrix are not present in the GSEA NAME column. However, if there are gene set IDs present in the pathways data that are absent from the $genePresenceAbsence matrix, then this method emits a warning. It also checks for the standard GSEA data set column names and emits a warning if some are missing.

See Also

gsnAddPathwaysData gsnImportCERNO gsnImportGSNORA gsnImportGenericPathways